Announcement

Collapse
No announcement yet.

Archaic Introgression - GHOST - MUC7 GENE

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Archaic Introgression - GHOST - MUC7 GENE

    Whole-genome sequence analysis of a Pan African set of samples reveals archaic gene flow from an extinct basal population of modern humans into sub-Saharan populations 2019


    Here, we examine 15 African populations covering all major continental linguistic groups, ecosystems, and lifestyles within Africa through analysis of whole-genome sequence data of 21 individuals sequenced at deep coverage. We observe a remarkable correlation among genetic diversity and geographic distance, with the hunter-gatherer groups being more genetically differentiated and having larger effective population sizes throughout most modern-human history. Admixture signals are found between neighbor populations from both hunter-gatherer and agriculturalists groups, whereas North African individuals are closely related to Eurasian populations. Regarding archaic gene flow, we test six complex demographic models that consider recent admixture as well as archaic introgression. We identify the fingerprint of an archaic introgression event in the sub-Saharan populations included in the models (~ 4.0% in Khoisan, ~ 4.3% in Mbuti Pygmies, and ~ 5.8% in Mandenka) from an early divergent and currently extinct ghost modern human lineage.

    [...]

    Archaic hominins could have also left a footprint in the gene pool of extant populations, which would represent another confounding parameter when analyzing the genetic diversity within the African continent. Initial studies carried out on archaic genomes reported that Neanderthal or Denisovan signatures were found in non-African groups but not in the genomes of sub-Saharan populations [30, 31]. Recent analyses, though, revealed a more complex panorama. Traces of Neanderthal introgression have been observed not only in North African populations [32], who are in fact historically and genetically different from sub-Saharan peoples [33, 34], but also in other African populations, for instance in Yoruba genomes, although they were most likely introduced through recent Eurasian admixture [28, 35, 36]. Furthermore, some evidence of introgression from unknown now-extinct hominins in African groups is accumulating [37,38,39,40,41,42]. More precisely, archaic introgression has been estimated to be around 5 to 7.9% in Yoruba [37, 42], 2% in Khoisan and Biaka Pygmy [38], and 2% in Hadza, Sandawe, and Western Pygmy populations [39]. Specific candidate introgressed regions have also been identified, for instance, a 20 kbp block found exclusively in sub-Saharan populations that covers the entire MUC7 gene, a protein abundantly expressed in saliva and associated with the composition of oral microbiome [40], or 265 loci spanning ~ 20 Mbp spread across the genome that were detected in two Western African Pygmy populations [41]. Moreover, the first study with whole-genome sequences from prehistoric Africans suggests the existence of a basal modern human lineage that separated before Khoisan ancestors did and have left asymmetrical signatures on different present day western African populations [43]. An alternative model that also fits their data would involve lasting and long-range gene flow that resulted in eastern and southern Africans being unequally connected to different western African groups. With either model, this study has unraveled that basal diversifications of modern humans were complex. In fact, this complexity is in line with the scenario described in previous studies of several events of gene flow that occurred further back in time among archaic hominins, such as between a population that diverged early from AMHs in Africa and ancestors of the Neanderthals [44, 45] or between unknown archaic hominins and ancestors of Denisovans [36].



    We collected 21 samples from the four major continental African linguistic groups that belong to 15 different African populations which are either agriculturalists or hunter-gatherers (Fig. 1a). In addition, we included four Eurasian samples for this study. Whole-genome sequencing of the 25 male individuals was conducted on Illumina sequencing platforms. Nine samples were newly sequenced for this project while the whole-genome shotgun read data was already published for the remaining 16 individuals. All samples were paired-end sequence at deep coverage (21–47x) (Table 1, Additional file 1: Table S1.2).

    [...]

    Hunter-gatherers present the highest genetic diversity of all populations, with Khoisan having greater amount of genetic differences than Pygmies (Fig. 1b top, Additional file 1: Figure S4.1). The four Khoisan samples show similar measures of genetic differences to non-Khoisan samples even belonging to three different groups. Pygmies do not form a single cluster; instead, the Baka Pygmy, in comparison with Mbuti Pygmies, displays less genetic differences to other sub-Saharan and North African populations. Sub-Saharan agriculturalist individuals share highly similar values of genetic diversity relative to all other samples, with lower levels than the ones observed in hunter-gatherers but not as reduced as the non-African samples. The only exception is the Toubou individual, who also maintains similar genetic distance to other sub-Saharan samples but is genetically closer to North African and non-African samples. As expected, North African samples are genetically closer to non-African samples than to sub-Saharan individuals, showing a considerable reduction of genetic diversity.

    We determined long homozygous regions, or runs of homozygosity (ROH), of at least 0.5, 1, and 1.5 Mbp of callable genome in each sample (Fig. 1b bottom, Additional file 1: Figure S4.2). Overall, the total length of ROH within a genome depends largely on the geographical origin of the individual; this is, relatively similar values are observed within continents while the amount increase as the distance to Africa gets bigger [54]. However, long ROH over 1.5 Mbp do not follow this geographical tendency. Instead, those segments are more frequent in populations in which isolation and consanguineous unions are more common. We observed that sub-Saharan agriculturalists present the lowest amounts of ROH, whereas both Khoisan and Pygmies show higher levels of ROH that are closer to the ones found in North African or Eurasian populations (Fig. 1b bottom). Moreover, there are three samples (Saharawi, Toubou, and Yoruba_HGDP00927) as well as almost all hunter-gatherers with long ROH, which might indicate in-breeding at the population or individual level.

    [...]

    Genetic ancestries and gene flow in African individuals

    We explored the correspondence between genetic and geographic diversity in our African samples (Additional file 1: Figure S5.1). We obtained a significant correlation between the first two dimensions of a multidimensional scaling analysis from a genetic distance matrix and the coordinates of the sampled individuals in an African map (R = 0.58; p value based on 1000 replications = 0.003. Removing Bantu individuals, R = 0.655; p value based on 1000 replications = 0.001). This correlation suggests that genetics tends to fit the geographic location of the sampled individuals. In fact, we observed that genetic differentiation tends to increase monotonically with geographic distance between individuals (Additional file 1: Figure S5.2), a pattern that is consistent with a main genetic gradient among African populations. Finally, by means of a Bearing procedure [55], we found that the genetic differentiation in the African continent is in the north-west to the south-east axis (Additional file 1: Figure S5.3). This direction is similar to the north to south angle described by [56] using Fst-based distances and SNP microarray data and is consistent with the Sahara desert acting as a genetic barrier between populations at both sides [56]. The fact that our pattern is somehow rotated could be explained by the particular geographical sampling scheme of our study, which tends to be on the north-west/south-east spatial axis (correlation between latitude and longitude of our sampled locations = − 0.536, p value = 0.012).

    [...]

    Overall, results from both analyses suggest that African populations can be clustered in four major genetic groups: Khoisan, Pygmy, sub-Saharan agriculturalist, and North Africa (Fig. 2). Consistent with the highest amount of differences observed (Fig. 1b), we found the maximum genetic variance was found between Khoisan and Eurasian populations. With the exception of the Baka individual, the other hunter-gatherer samples in our dataset are mostly represented by a single ancestry; however, it should be noted that the general picture for hunter-gatherers is more complex, with mixed ancestries for most populations (Additional file 1: Figure S6.3). On the other hand, most sub-Saharan agriculturalist individuals present some hunter-gatherer ancestry. The proportion is mainly related to the geographic distance between mixed populations. Dinkas, South African, and West African Bantus present the highest proportions of hunter-gatherer ancestries, and they are geographically the closest populations to Mbuti, Khoisan, and Baka, respectively. The East African Bantu, Laal, and Mandenka individuals show lower proportions of hunter-gatherer ancestries, with values following a dwindling gradient that is concordant with the ascending distance to the Mbuti Pygmy location. Finally, North African samples are closer to Eurasian populations than to any sub-Saharan populations, implying that the Sahara Desert might have represented a major barrier within African populations.

    https://genomebiology.biomedcentral....84-5/figures/2


    To formally test admixture, we applied the D-statistics test [59] addressing two scenarios: the admixture between hunter-gatherer populations and their respective geographically surrounding agriculturalist populations (South African Bantu for Khoisans; Laal, Toubou, Dinka, and Eastern Bantu for Mbuti Pygmies; Yoruba and Western Bantu for Baka Pygmies), and the putative gene flow from west Eurasian to African populations. Additionally, we evaluated the latter scenario by calculating F4-ratio estimates [59], which provide accurate proportions of European ancestry into African populations. The ratios we constructed were f4(Han, Yoruba; X, Chimp)/f4(Han, Yoruba; French, Chimp), being X a hunter-gatherer population, and f4(Sardinian, Han; X, Yoruba)/f4(Sardinian, Han; French, Yoruba) when X refers to other African groups.

    We found clear evidence of admixture between Khoisan populations and the South African Bantu individual, as well as between Dinka and Mbuti Pygmies, as this was consistently observed in several comparisons made using different African populations (Additional file 1: Tables S6.1–2). We also detected signatures of gene flow between Mbuti Pygmies and both Chadian individuals (Laal and Toubou), although with lower significance (Additional file 1: Table S6.2). By contrast, East African Bantu, West African Bantu, or Yoruba populations show no evidence of gene flow with their neighbors, Mbuti and Baka Pygmies (Additional file 1: Tables S6.2–3).



    As expected, evidence for admixture between west Eurasians (represented by the French sample) and North African populations was formally identified with the D-statistics test (Additional file 1: Table S6.4). We then estimated an F4-ratio [29, 59] and obtained a significant proportion of the Eurasian component present in North African populations, with values as high as 84.9% for the Saharawi individual and 76.0% for the Libyan sample (Additional file 1: Table S6.5). Two other northeastern sub-Saharan populations (Toubou and East African Bantu) also stood out with highly significant D-statistics values, although of lower magnitude. This is concordant with an estimated west Eurasian ancestry proportion found of 31.4% and 14.9%, respectively (Additional file 1: Tables S6.4–5). Finally, the three Khoisan groups present significant small proportions (3.83–4.11%) of Eurasian ancestry. This signature, which was estimated with the F4-ratio, was not detectable by the D-statistics test (Additional file 1: Tables S6.4–5).

    [...]


    Archaic introgression from known hominins

    Archaic introgression from either known or unknown extinct hominins has been suggested in different African populations [26, 30, 33,34,35,36,37,38,39]. In our data, we confirmed previous findings [28,29,30], as the results of the D-statistics of the form D(X = African population 1, Y = African population 2; Neanderthal/Denisova; Chimpanzee) showed that Eurasian samples as well as North African individuals exhibit a significant enrichment of Neanderthal DNA (higher in East Asia than in West Eurasia or North Africa) when compared to sub-Saharan African samples (Additional file 1: Figure S8.1). Z-score values are generally lower for signatures of Denisovan introgression than for Neanderthal, meaning that a lower proportion of gene flow is observed when admixture has taken place. Asian samples were enriched in archaic DNA from Denisovans, and the European and North African samples too, but at lower levels. This is probably due to the fact that Neanderthal and Denisova are sister groups and consequently share derived alleles that might confound their admixture signals. We found no signals of Neanderthal or Denisovan introgression in the sub-Saharan individuals, which was additionally confirmed with an F4-ratio test for the Neanderthal introgression (Additional file 1: Table S8.1).


    Demographic model

    We aimed to explore the impact of recent population admixture on the genetic landscape of sub-Saharan populations in an integrative manner, as well as the presence and nature of archaic introgression from hominin populations. To this end, we conducted an Approximate Bayesian Computation (ABC) analysis coupled to a Deep Learning (DL) framework [50] (Additional file 1: Figure S9.1).

    We implemented six demographic models (Fig. 4; Additional file 1: Table S9.1) of increasing complexity from a basic one (model A).

    Model A summarizes accepted features of human demography [65]: (i) presence of archaic populations out of the African continent, represented by the Neanderthal and Denisovans lineages, (ii) introgression from early anatomically modern humans into Neanderthal [44, 45], (iii) introgression from an extremely archaic population into Denisovans [36], (iv) Khoisans at the root of mankind [11, 14,15,16,17,18], (v) Out-of-Africa event of AMHs [3], (vi) archaic introgression of a Neanderthal-like population after the Out-of-Africa event in Eurasian populations [30], and (vii) archaic introgression from a Denisovan-like population in East Asians [31]. Furthermore, we included recent migrations between Europeans to West Africans, Europeans to Mbutis, Europeans to Khoisans, West Africans to Mbutis, West Africans to Khoisans, Mbutis to West Africans, Mbuti to Khoisans, and Khoisans to Mbutis. These last parameters, as well as the introgression of the archaic population in Denisovans, can be considered as nuisance parameters.

    Model Bextends model A by adding a “ghost” archaic population, XAf, directly related to the lineage leading to AMHs. In this model, XAf independently inbreeds with each of the AMH African populations.

    Model C extends A by considering that the ghost archaic population is directly related to the Neanderthal lineage, Xn.

    Model D considers that Xn appears in the archaic lineage out of Africa before the Neanderthal and Denisovan split.

    Model E is a mixture of model B and C. It considers two ghost archaic populations, one that directly split from the lineage that will produce the AMHs and another related to the Neanderthal lineage, both admixing with AMH populations within Africa. Finally,

    Model F mixes the ghost features of models B and D.


    First, we estimated the power of the ABC-DL framework to distinguish among the six considered models by using simulated datasets from known models as observed data and running the ABC-DL framework to estimate the posterior probability of each model. Additional file 1: Table S9.2 shows the confusion matrix for the six models using 100 simulations for each model as observed data. Our analysis suggests that the ABC-DL framework cannot identify all the models with the same accuracy; model F shows the lowest P (real model = X | predicted model by ABC-DL = X) = 0.41, whereas models A, B, C, and D show posterior probabilities of correct assignment > 0.5. This is not surprising given that models E and F are the most general ones. Given these results, we applied the ABC-DL to our observed data. Out of the six considered models, the one showing the largest posterior probability is model B (P (model = B|Data) = 0.85), namely the presence of a ghost archaic population directly related with the lineage that produced the anatomically modern humans. Notably, this posterior probability of model B is 11 times greater than the one from the second most supported model (model D) (P (model = D|Data) = 0.078)), a substantial Bayes factor difference [66] that suggests that the best model out of all the compared ones is model B. Remarkably, basic model A, which does not include any kind of archaic introgression in Africa, has a posterior probability close to 0.

    Next, we aimed to estimate the posterior probability of each of the 52 parameters of model B by applying the ABC-DL approach. As a preliminary step, we quantified the performance of the ABC-DL framework in simulated data. For each parameter, we ascertained 1000 simulations at random and estimated the posterior distribution using the ABC-DL. Next, we computed the factor 2 statistic (Additional file 1: Table S9.3), which is the number of times that the estimated mean is within the range 50% and 200% of the true value of the parameter (see Excoffier et al. [67] for details). In 96% of the times, the mean of the posterior distribution of the time of split of XAf with the AMH lineage is within the factor 2, suggesting high confidence in using the mean of this parameter as proxy of the real value. The factor 2 of the amount of introgression of XAf to the different African populations ranges between 77% (XAf to West African) and 72% (XAf to Khoisan) and the times that XAf introgression to the African populations is within the factor 2 range are also ~ 80%, much higher than the expected under randomness. According to the factor 2 analysis, the worse performance of using the mean as a proxy is for migration parameters, which show percentages of factor 2 of ~ 50%, similar to the ones that are observed if the mean of the posterior is sampled at random from the prior distribution. Overall, these analyses support that the mean of the posterior distribution obtained by the ABC-DL framework is a good proxy of the real value used in the simulations for most of the parameters.

    Finally, we estimated the posterior distributions of the parameters that describe the most supported demographic model (Fig. 4, Table 2, and Additional file 1: Table S9.4). The ABC-DL produced posterior distributions that strongly deviated from the prior distributions that we considered (see Additional file 1: Figure S9.3) for most of the parameters, suggesting that the ABC-DL approach could properly extract the information present in the observed data to update the prior distributions of each parameter. Not surprisingly, most of the parameters showing posterior distributions similar to the prior distributions are the same that showed low factor 2 values in our former analysis. According to our ABC-DL analyses (Table 2), the AMH lineage and the one from the archaic Eurasian populations diverged 603 kya (95% credible interval (CI) ranging from 495.85 to 796.86 kya). The ghost XAf archaic population and the AMH lineage split 528 kya (95% CI of 230.16 to 700.06 kya), whereas the Denisovan and Neanderthal lineages split 426 kya (95% CI from 332.77 to 538.37 kya). Archaic introgression estimates from XAf to African populations range from 3.8% (95% CI 1.7 to 4.8%) in Khoisan and 3.9% (95% CI 1.3 to 4.9%) in Mbuti to 5.8% (95% CI 0.7 to 0.97%) in West Africa. Our analyses also identified the archaic introgression from early AMHs into Neanderthals (mean of the posterior distribution = 1.2%), yet the 95% CI included 0% (95% CI ranging from 0 to 4%).

    The obtained estimates of Neanderthal introgression in Eurasian populations in model B are larger (3.9%, 95% CI from 0.017 to 0.048%) than usually reported. Since sub-Saharan populations are traditionally used as outgroup for detecting archaic introgression out of Africa, we wondered whether these estimated values of archaic introgression in Eurasia could be higher than previously by the fact that we were considering in model B archaic introgression within Africa. We conducted the ABC-DL analysis using the model A, the basic model that does not consider XAf (Additional file 1: Table S9.4). The mean of the posterior distribution of the introgression of Neanderthal ancestry in Eurasian populations was 1.1% (95% CI 0.35 to 3.6%), 3.3 times smaller than that obtained in model B and closer to the range of previously reported values.


    [...]

    Compelling evidence accumulates in favor of interbreeding between early hominin species being common instead of exceptional. Neanderthal and Denisovan introgression in Asia, Europe, and North Africa has been well established in previous studies [30,31,32] and confirmed in our data with a D-statistics analysis. Although the poor DNA preservation in ancient samples hinders direct analyses [69], indirect evidence increasingly supports the contribution of unknown now-extinct hominins to the African genetic pool in sub-Saharan Africa [28, 35,36,37,38,39,40,41,42], where the ancestors of modern humans coexisted during the Pleistocene with different archaic humans [41]. Our ABC-DL analysis is a new incorporation to this bulk of indicia. Indeed, it corroborates that a model in which there is no archaic introgression is extremely unlikely, as was previously observed in [38]. Applying this novel strategy that includes a trained machine learning algorithm as first step, the output of which we used in the ABC analysis, we have been able to inquire complex models circumventing the demanding computational requirements for modeling such complex scenarios.



    Our results suggest interbreeding of AMHs with an archaic ghost population that diverged from the AMH lineage at a temporal scale similar to the one between the Neanderthals and Denisovans. This observation would indicate the presence of a deep archaic population substructure also in the African continent and contrasts with previous studies that suggested that a basal lineage had a major impact only on particular western African populations [43]. Furthermore, our analyses showed that the estimated proportion of Neanderthal ancestry in Eurasian populations is highly sensitive to the presence of XAf population, increasing by a threefold the amount of archaic introgression. This result suggests that the amount of Neanderthal ancestry out of Africa that so far has been estimated could be an underestimation by not having considered events of archaic introgression in Africa in the tested models.


    https://genomebiology.biomedcentral....19-1684-5#Fig2






    Population demography and gene flow among African groups, as well as the putative archaic introgression of ancient hominins, have been poorly explored at the genome level. Here, we examine 15 African populations covering all major continental linguistic groups, ecosystems, and lifestyles within Africa through analysis of whole-genome sequence data of 21 individuals sequenced at deep coverage. We observe a remarkable correlation among genetic diversity and geographic distance, with the hunter-gatherer groups being more genetically differentiated and having larger effective population sizes throughout most modern-human history. Admixture signals are found between neighbor populations from both hunter-gatherer and agriculturalists groups, whereas North African individuals are closely related to Eurasian populations. Regarding archaic gene flow, we test six complex demographic models that consider recent admixture as well as archaic introgression. We identify the fingerprint of an archaic introgression event in the sub-Saharan populations included in the models (~ 4.0% in Khoisan, ~ 4.3% in Mbuti Pygmies, and ~ 5.8% in Mandenka) from an early divergent and currently extinct ghost modern human lineage. The present study represents an in-depth genomic analysis of a Pan African set of individuals, which emphasizes their complex relationships and demographic history at population level.

  • #2
    Insights into the human demographic history of Africa through whole-genome sequence analysis

    Evidence for archaic introgression in African populations



    Interbreeding between archaic and modern humans has happened several times across time, involving Neanderthals, Denisovans, as well as several unidentified hominins for which no DNA samples have been found.

    The archaic introgression question in Africa has been object of great
    debate in recent years (Hammer et al., 2011; Lachance et al., 2012; Hsieh et al., 2016; Xu et al., 2017; Durvasula and Sankararaman, 2018). For a long time, it has remained the only continent without genetic evidences for archaic hominin presence, due to the rapid degradation of fossils in sub-Saharan African environments, except for North Africa, which is the only region where Neanderthal introgression has been detected (Sánchez-Quinto et al., 2012).

    However, several methodologies have been developed in order to infer archaic introgression when no ancient sample is available (Racimo et al., 2015). Using these techniques, Hammer et al. (2011) inferred archaic introgression in 61 non-coding regions from two hunter-gatherergroups (Biaka Pygmies and Khoisan). 2% of the genome was estimated to have been introgressed in these populations approximately 35,000 years ago from archaic hominins diverging from the modern human lineage around 700,000 years ago. Lachance et al. (2012) used whole-genome sequences of three Sub-Saharan hunter-gatherers (Western Africa Pygmies, Hadza and Sandawe), estimating archaic introgression with one or more archaic hominin populations about 40,000 years ago. Hsieh et al. (2016) inferred one admixture event from an unknown archaic population into the ancestors of Western Pygmies during the last 30,000 yr. According to Skoglund et al. (2017), an extinct basal western African lineage has contributed to current West Africans’ gene pool, specifically, 13% for Mende and 9% for Yoruba. Durvasula and Sankararaman (2018) estimated 8% of Yoruba population gene pool traces its origin to an unidentified, archaic population.

    Our study contributes to the complexity of the archaic introgression landscape in Africa, using an ABC-DL approach on whole-genome sequences not only discarding a scenario without archaic introgression, whose likelihood is almost zero, but also estimating extensive archaic admixture across Africa, with ancient traces detected in West Africa, East Africa and South Africa, into both hunter-gatherer and non-hunter-gatherer groups. In particular a proportion of 8.8% in Khoisan, 5.8% in Mbuti Pygmies, and 3.3% in Mandenka were estimated. Our analysis also pointed to the nature of the ghost population involved, which is likely to be an extinct sister lineage of Neanderthals from whom they diverged 450,000 years ago.



    Comment


    • #3
      Recovering signals of ghost archaic introgression in African populations 2020


      While introgression from Neanderthals and Denisovans has been documented in modern humans outside Africa, the contribution of archaic hominins to the genetic variation of present-day Africans remains poorly understood. We provide complementary lines of evidence for archaic introgression into four West African populations. Our analyses of site frequency spectra indicate that these populations derive 2 to 19% of their genetic ancestry from an archaic population that diverged before the split of Neanderthals and modern humans. Using a method that can identify segments of archaic ancestry without the need for reference archaic genomes, we built genome-wide maps of archaic ancestry in the Yoruba and the Mende populations. Analyses of these maps reveal segments of archaic ancestry at high frequency in these populations that represent potential targets of adaptive introgression. Our results reveal the substantial contribution of archaic ancestry in shaping the gene pool of present-day West African populations.

      https://www.science.org/doi/10.1126/sciadv.aax5097

      Comment

      Working...
      X