  • Review Article
  • Published: 06 June 2016

Determinants of genetic diversity

  • Hans Ellegren 1 &
  • Nicolas Galtier 2  

Nature Reviews Genetics volume 17, pages 422–433 (2016)

  • Evolutionary biology
  • Evolutionary genetics
  • Genetic variation
  • Molecular evolution
  • Next-generation sequencing

This article has been updated

Lewontin's paradox — the much larger variation in species abundance than in genetic diversity — is closer to being explained.

The reproductive strategy of species has an impact on genome-wide diversity, providing a connection between population dynamic processes and the long-term effective population size (Ne).

Selection at linked sites also affects genome-wide diversity, but not to an extent that it is sufficient alone to explain Lewontin's paradox.

Selection and demography, among other factors, contribute to variation in Ne within genomes and lead to variation in diversity in different genomic regions of the same species.

Genetic polymorphism varies among species and within genomes, and has important implications for the evolution and conservation of species. The determinants of this variation have been poorly understood, but population genomic data from a wide range of organisms now make it possible to delineate the underlying evolutionary processes, notably how variation in the effective population size (Ne) governs genetic diversity. Comparative population genomics is on its way to providing a solution to 'Lewontin's paradox', the discrepancy between the many orders of magnitude of variation in population size and the much narrower distribution of diversity levels. It seems that linked selection plays an important part both in the overall genetic diversity of a species and in the variation in diversity within the genome. Genetic diversity also seems to be predictable from the life history of a species.
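As a rough illustration of why the discrepancy described above is surprising, the sketch below uses the standard neutral equilibrium expectation for a diploid population, theta = 4 * Ne * mu. That formula is not stated in this abstract and is assumed here; the mutation rate and the Ne values are illustrative placeholders, not estimates for any species.

```python
def expected_neutral_diversity(ne: float, mu: float) -> float:
    """Expected equilibrium diversity per site under strict neutrality.

    Assumes a diploid, randomly mating population and the infinite-sites
    approximation, so the result is only meaningful when 4 * ne * mu << 1.
    """
    if ne <= 0 or mu <= 0:
        raise ValueError("ne and mu must be positive")
    return 4.0 * ne * mu


# Hypothetical per-site, per-generation mutation rate (illustration only).
ILLUSTRATIVE_MU = 1e-8

# Hypothetical effective population sizes spanning three orders of magnitude.
for ne in (1e3, 1e4, 1e5, 1e6):
    theta = expected_neutral_diversity(ne, ILLUSTRATIVE_MU)
    print(f"Ne = {ne:>9,.0f}  ->  expected diversity = {theta:.1e}")
```

Under this simple expectation, diversity scales linearly with Ne, so population sizes that differ by many orders of magnitude would predict an equally wide range of diversity. The narrower range actually observed is the discrepancy that the abstract refers to.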

Change history

08 June 2016

In the original version of this article, the author name in reference 73 (Stebbins, G. L. Self fertilization and population variability in the higher plants. Am. Naturalist 91, 41–46 (1957)) was misspelled. This has now been corrected. The authors apologise for this error.

References

Lewontin, R. C. & Hubby, J. L. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura . Genetics 54 , 595–609 (1966).

Harris, H. Enzyme polymorphisms in man. Proc. R. Soc. Lond. B 164 , 298–310 (1966).

Quintana-Murci, L. & Clark, A. G. Population genetic tools for dissecting innate immunity in humans. Nat. Rev. Immunol. 13 , 280–293 (2013).

Bodmer, W. Genetic characterization of human populations: from ABO to a genetic map of the British people. Genetics 199 , 267–279 (2015).

Hake, S. & Ross-Ibarra, J. Genetic, evolutionary and plant breeding insights from the domestication of maize. eLife 4 , e05861 (2015).

Soares, M. P. & Weiss, G. The Iron Age of host–microbe interactions. EMBO Rep. 16 , 1482–1500 (2015).

Vander Wal, E., Garant, D., Festa-Bianchet, M. & Pelletier, F. Evolutionary rescue in vertebrates: evidence, applications and uncertainty. Phil. Trans. R. Soc. B 368 , 20120090 (2012).

Forcada, J. & Hoffman, J. I. Climate change selects for heterozygosity in a declining fur seal population. Nature 511 , 462–465 (2014).

Begun, D. J. et al. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans . PLoS Biol. 5 , e310 (2007).

Lack, J. B. et al. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics 199 , 1229–1241 (2015).

McVean, G., Spencer, C. C. A. & Chaix, R. Perspectives on human genetic variation from the HapMap project. PLoS Genet. 1 , e54 (2005).

The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526 , 68–74 (2015).

Tenaillon, M. I. et al. Patterns of DNA sequence polymorphism along chromosome 1 of maize ( Zea mays ssp. mays L.). Proc. Natl Acad. Sci. USA 98 , 9161–9166 (2001).

Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana . PLoS Biol. 3 , e196 (2005).

Doniger, S. W. et al. A catalog of neutral and deleterious polymorphism in yeast. PLoS Genet. 4 , e1000183 (2008).

Wong, G. K. S. et al. A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature 432 , 717–722 (2004).

Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409 , 928–933 (2001).

Hodgkinson, A. & Eyre-Walker, A. Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12 , 756–766 (2011).

Lynch, M. Evolution of the mutation rate. Trends Genet. 26 , 345–352 (2010).

Charlesworth, B. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 10 , 195–205 (2009).

Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, 1983).

Lewontin, R. The Genetic Basis of Evolutionary Change (Columbia Univ. Press, 1974). This book is a remarkably clear and early introduction to the problem of variation in genetic diversity and the first statement of the so-called Lewontin's paradox.

Leffler, E. M. et al. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10 , e1001388 (2012). This article contains a thorough review of the distribution of DNA sequence diversity across hundreds of eukaryotic species.

Reed, D. H. & Frankham, R. Correlation between fitness and genetic diversity. Conserv. Biol. 17 , 230–237 (2003).

Reed, D. H. & Frankham, R. How closely correlated are molecular and quantitative measures of genetic variation? A meta-analysis. Evolution 55 , 1095–1103 (2001).

Bjørnstad, O. N. & Grenfell, B. T. Noisy clockwork: time series analysis of population fluctuations in animals. Science 293 , 638–643 (2001).

Sun, J., Cornelius, S. P., Janssen, J., Gray, K. A. & Motter, A. E. Regularity underlies erratic population abundances in marine ecosystems. J. R. Soc. Interface 12 , 20150235 (2015).

Banks, S. C. et al. How does ecological disturbance influence genetic diversity? Trends Ecol. Evol. 28 , 670–679 (2013).

Alcala, N. & Vuilleumier, S. Turnover and accumulation of genetic diversity across large time-scale cycles of isolation and connection of populations. Proc. R. Soc. B 281 , 20141369 (2014).

Mayr, E. Animal Species and Evolution (Harvard Univ. Press, 1963).

Hewitt, G. The genetic legacy of the Quaternary ice ages. Nature 405 , 907–913 (2000).

Stuessy, T. F., Takayama, K., López-Sepúlveda, P. & Crawford, D. J. Interpretation of patterns of genetic variation in endemic plant species of oceanic islands. Bot. J. Linnean Soc. 174 , 276–288 (2014).

Aguilar, R., Quesada, M., Ashworth, L., Herrerias-Diego, Y. & Lobo, J. Genetic consequences of habitat fragmentation in plant populations: susceptible signals in plant traits and methodological approaches. Mol. Ecol. 17 , 5177–5188 (2008).

Caplins, S. A. et al. Landscape structure and the genetic effects of a population collapse. Proc. R. Soc. B 281 , 20141798 (2014).

Coltman, D. W. Molecular ecological approaches to studying the evolutionary impact of selective harvesting in wildlife. Mol. Ecol. 17 , 221–235 (2008).

Lynch, M. The Origins of Genome Architecture (Sinauer Associates, 2007).

Romiguier, J. et al. Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature 515 , 261–263 (2014). This study shows a comparative analysis of patterns of diversity across animals revealing a strong influence of the life-history traits of species.

Sung, W., Ackerman, M. S., Miller, S. F., Doak, T. G. & Lynch, M. Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl Acad. Sci. USA 109 , 18488–18492 (2012).

Ness, R. W., Morgan, A. D., Vasanthakrishnan, R. B., Colegrave, N. & Keightley, P. D. Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii . Genome Res. 25 , 1739–1749 (2015).

Wright, S. Size of population and breeding structure in relation to evolution. Science 87 , 430–431 (1938).

Weber, D., Stewart, B. S., Garza, J. C. & Lehman, N. An empirical genetic assessment of the severity of the northern elephant seal population bottleneck. Curr. Biol. 10 , 1287–1290 (2000).

Hedrick, P. W. Conservation genetics and North American bison ( Bison bison ). J. Hered. 100 , 411–420 (2009).

Spielman, D., Brook, B. W. & Frankham, R. Most species are not driven to extinction before genetic factors impact them. Proc. Natl Acad. Sci. USA 101 , 15261–15264 (2004).

Nabholz, B., Mauffrey, J. -F., Bazin, E., Galtier, N. & Glemin, S. Determination of mitochondrial genetic diversity in mammals. Genetics 178 , 351–361 (2008).

McCusker, M. R. & Bentzen, P. Positive relationships between genetic diversity and abundance in fishes. Mol. Ecol. 19 , 4852–4862 (2010).

Perry, G. H. et al. Comparative RNA sequencing reveals substantial genetic variation in endangered primates. Genome Res. 22 , 602–610 (2012).

Pinsky, M. L. & Palumbi, S. R. Meta-analysis reveals lower genetic diversity in overfished populations. Mol. Ecol. 23 , 29–39 (2014).

Ho, S. Y. W. & Shapiro, B. Skyline-plot methods for estimating demographic history from nucleotide sequences. Mol. Ecol. Resour. 11 , 423–434 (2011).

Drummond, A. J., Rambaut, A., Shapiro, B. & Pybus, O. G. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22 , 1185–1192 (2005).

Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475 , 493–496 (2011).

Liu, X. & Fu, Y. -X. Exploring population size changes using SNP frequency spectra. Nat. Genet. 47 , 555–559 (2015).

Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46 , 919–925 (2014).

Nadachowska-Brzyska, K., Li, C., Smeds, L., Zhang, G. & Ellegren, H. Temporal dynamics of avian populations during Pleistocene revealed by whole-genome sequences. Curr. Biol. 25 , 1375–1380 (2015).

Jarne, P. Mating system, bottlenecks and genetic polymorphism in hermaphroditic animals. Genet. Res. 65 , 193–207 (1995).

Charlesworth, D. & Wright, S. Breeding systems and genome evolution. Curr. Opin. Genet. Dev. 11 , 685–690 (2001).

Glémin, S., Bazin, E. & Charlesworth, D. Impact of mating systems on patterns of sequence polymorphism in flowering plants. Proc. R. Soc. B 273 , 3011–3019 (2006).

Glémin, S. & Muyle, A. Mating systems and selection efficacy: a test using chloroplastic sequence data in angiosperms. J. Evol. Biol. 27 , 1386–1399 (2014).

Hartfield, M. Evolutionary genetic consequences of facultative sex and outcrossing. J. Evol. Biol. 29 , 5–22 (2016). This review discusses the theoretical predictions and empirical evidence regarding genome evolution in asexual versus sexual contexts.

Slotte, T. et al. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat. Genet. 45 , 831–835 (2013).

Burgarella, C. et al. Molecular evolution of freshwater snails with contrasting mating systems. Mol. Biol. Evol. 32 , 2403–2416 (2015).

Thomas, C. G. et al. Full-genome evolutionary histories of selfing, splitting, and selection in Caenorhabditis . Genome Res. 25 , 667–678 (2015).

Dey, A., Chan, C. K. W., Thomas, C. G. & Cutter, A. D. Molecular hyperdiversity defines populations of the nematode Caenorhabditis brenneri . Proc. Natl Acad. Sci. USA 110 , 11056–11060 (2013).

Dolgin, E. S., Charlesworth, B. & Cutter, A. D. Population frequencies of transposable elements in selfing and outcrossing Caenorhabditis nematodes. Genet. Res. 90 , 317–329 (2008).

Wright, S. I., Kalisz, S. & Slotte, T. Evolutionary consequences of self-fertilization in plants. Proc. R. Soc. B 280 , 20130133 (2013).

Balloux, F., Lehmann, L. & de Meeûs, T. The population genetics of clonal and partially clonal diploids. Genetics 164 , 1635–1644 (2003).

Mark Welch, D. B. & Meselson, M. Evidence for the evolution of Bdelloid rotifers without sexual reproduction or genetic exchange. Science 288 , 1211–1215 (2000).

Delmotte, F. et al. Phylogenetic evidence for hybrid origins of asexual lineages in an aphid species. Evolution 57 , 1291–1303 (2003).

Schaefer, I. et al. No evidence for the 'Meselson effect' in parthenogenetic oribatid mites (Oribatida, Acari). J. Evol. Biol. 19 , 184–193 (2006).

Schwander, T., Henry, L. & Crespi, B. J. Molecular evidence for ancient asexuality in Timema stick insects. Curr. Biol. 21 , 1129–1134 (2011).

Hollister, J. D. et al. Recurrent loss of sex is associated with accumulation of deleterious mutations in Oenothera . Mol. Biol. Evol. 32 , 896–905 (2015).

Maynard Smith, J. The Evolution of Sex (Cambridge Univ. Press, 1978).

McDonald, M. J., Rice, D. P. & Desai, M. M. Sex speeds adaptation by altering the dynamics of molecular evolution. Nature 531 , 233–236 (2016).

Stebbins, G. L. Self fertilization and population variability in the higher plants. Am. Naturalist 91 , 41–46 (1957).

Judson, O. P. & Normark, B. B. Ancient asexual scandals. Trends Ecol. Evol. 11 , 41–46 (1996).

Simon, J. C., Delmotte, F., Rispe, C. & Crease, T. Phylogenetic evidence for hybrid origins of asexual lineages in an aphid species. Evolution 57 , 1291–1303 (2003).

Igic, B. & Busch, J. W. Is self-fertilization an evolutionary dead end? New Phytol. 198 , 386–397 (2013).

Tajima, F. Relationship between DNA polymorphism and fixation time. Genetics 125 , 447–454 (1990).

Cutter, A. D. & Payseur, B. A. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat. Rev. Genet. 14 , 262–274 (2013).

Maynard Smith, J. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23 , 23–35 (1974).

Kaplan, N. L., Hudson, R. R. & Langley, C. H. The “hitchhiking effect” revisited. Genetics 123 , 887–899 (1989).

Gillespie, J. H. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155 , 909–919 (2000).

Gillespie, J. H. Is the population size of a species relevant to its evolution? Evolution 55 , 2161–2169 (2001). This paper shows a theoretical examination of the effects of recurrent adaptive substitutions on linked loci and their relationship to N e .

Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134 , 1289–1303 (1993). This study shows a theoretical examination of the effects of recurrent deleterious substitutions on linked loci and the background selection model.

Charlesworth, B. The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63 , 213–227 (1994).

Corbett-Detig, R. B., Hartl, D. L. & Sackton, T. B. Natural selection constrains neutral diversity across a wide range of species. PLoS Biol. 13 , e1002112 (2015). This article demonstrates the role of linked selection in shaping the within-genome variation in polymorphism and its relationship with N e .

Coop, G. Does linked selection explain the narrow range of genetic diversity across species? bioRxiv http://dx.doi.org/10.1101/042598 (2016).

Elyashiv, E. et al. A genomic map of the effects of linked selection in Drosophila . arXiv http://arXiv.org//abs/1408.5461v1 (2014).

Comeron, J. M. Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet. 10 , e1004434 (2014).

Enard, D., Messer, P. W. & Petrov, D. A. Genome-wide signals of positive selection in human evolution. Genome Res. 24 , 885–895 (2014).

Gossmann, T. I., Woolfit, M. & Eyre-Walker, A. Quantifying the variation in the effective population size within a genome. Genetics 189 , 1389–1402 (2011).

Wu, C.-I. The genic view of the process of speciation. J. Evol. Biol. 14 , 851–865 (2001).

Begun, D. J. & Aquadro, C. F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster . Nature 356 , 519–520 (1992).

Nachman, M. W. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17 , 481–485 (2001).

Lercher, M. J. & Hurst, L. D. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18 , 337–340 (2002).

Dvorak, J., Luo, M. C. & Yang, Z. L. Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species. Genetics 148 , 423–434 (1998).

Stephan, W. & Langley, C. H. DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150 , 1585–1593 (1998).

Cutter, A. D. & Choi, J. Y. Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae . Genome Res. 20 , 1103–1111 (2010).

Fay, J. C. & Wu, C. I. Hitchhiking under positive Darwinian selection. Genetics 155 , 1405–1413 (2000).

Campos, J. L., Halligan, D. L., Haddrill, P. R. & Charlesworth, B. The relation between recombination rate and patterns of molecular evolution and variation in Drosophila melanogaster . Mol. Biol. Evol. 31 , 1010–1028 (2014).

Messer, P. W. & Petrov, D. A. Frequent adaptation and the McDonald–Kreitman test. Proc. Natl Acad. Sci. USA 110 , 8615–8620 (2013).

Sella, G., Petrov, D. A., Przeworski, M. & Andolfatto, P. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5 , e1000495 (2009). This article reviews the evidence for a pervasive role of linked selection on patterns of genetic variation in Drosophila species.

Slotte, T. The impact of linked selection on plant genomic variation. Brief. Funct. Genomics 13 , 268–275 (2014).

Lohmueller, K. E. et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 7 , e1002326 (2011).

Messer, P. W. SLiM: simulating evolution with selection and linkage. Genetics 194 , 1037–1039 (2013).

Hernandez, R. D. A flexible forward simulator for populations subject to selection and demography. Bioinformatics 24 , 2786–2787 (2008).

Bank, C., Ewing, G. B., Ferrer-Admettla, A., Foll, M. & Jensen, J. D. Thinking too positive? Revisiting current methods of population genetic selection inference. Trends Genet. 30 , 540–546 (2014).

Coop, G. & Ralph, P. Patterns of neutral diversity under general models of selective sweeps. Genetics 192 , 205–224 (2012).

Bolívar, P., Mugal, C. F., Nater, A. & Ellegren, H. Recombination rate variation modulates gene sequence evolution mainly via GC-biased gene conversion, not Hill–Robertson interference, in an avian system. Mol. Biol. Evol. 33 , 216–227 (2016).

Payseur, B. A. & Nachman, M. W. Gene density and human nucleotide polymorphism. Mol. Biol. Evol. 19 , 336–340 (2002).

Charlesworth, B. Background selection and patterns of genetic diversity in Drosophila melanogaster . Genet. Res. 68 , 131–149 (1996).

Hudson, R. R. & Kaplan, N. L. Deleterious background selection with recombination. Genetics 141 , 1605–1617 (1995).

Nordborg, M., Charlesworth, B. & Charlesworth, D. The effect of recombination on background selection. Genet. Res. 67 , 159–174 (1996).

Flowers, J. M. et al. Natural selection in gene-dense regions shapes the genomic pattern of polymorphism in wild and domesticated rice. Mol. Biol. Evol. 29 , 675–687 (2012).

Burri, R. et al. Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genome Res. 25 , 1656–1665 (2015). This study is a high-resolution examination of genome-wide patterns of diversity and the role of recombination and linked selection in several species of flycatcher.

Nabholz, B. et al. Transcriptome population genomics reveals severe bottleneck and domestication cost in the African rice ( Oryza glaberrima ). Mol. Ecol. 23 , 2210–2227 (2014).

Hellmann, I., Ebersberger, I., Ptak, S. E., Pääbo, S. & Przeworski, M. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72 , 1527–1535 (2003).

Yang, S. et al. Parent-progeny sequencing indicates higher mutation rates in heterozygotes. Nature 523 , 463–467 (2015).

Arbeithuber, B., Betancourt, A. J., Ebner, T. & Tiemann-Boege, I. Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proc. Natl Acad. Sci. USA 112 , 2109–2114 (2015).

Rattray, A., Santoyo, G., Shafer, B. & Strathern, J. N. Elevated mutation rate during meiosis in Saccharomyces cerevisiae . PLoS Genet. 11 , e1004910 (2015).

Duret, L. & Galtier, N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu. Rev. Genom. Hum. Genet. 10 , 285–311 (2009).

Wallberg, A., Glémin, S. & Webster, M. T. Extreme recombination frequencies shape genome variation and evolution in the honeybee, Apis mellifera . PLoS Genet. 11 , e1005189 (2015).

Hammer, M. F. et al. The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nat. Genet. 42 , 830–831 (2010).

Arbiza, L., Gottipati, S., Siepel, A. & Keinan, A. Contrasting X-linked and autosomal diversity across 14 human populations. Am. J. Hum. Genet. 94 , 827–844 (2014).

Gottipati, S., Arbiza, L., Siepel, A., Clark, A. G. & Keinan, A. Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing. Nat. Genet. 43 , 741–743 (2011).

Charlesworth, B. The role of background selection in shaping patterns of molecular evolution and variation: evidence from variability on the Drosophila X chromosome. Genetics 191 , 233–246 (2012).

Frankham, R. How closely does genetic diversity in finite populations conform to predictions of neutral theory? Large deficits in regions of low recombination. Heredity 108 , 167–178 (2012). This paper reviews and demonstrates the reduction in genetic diversity in low-recombining genomic regions, including sex chromosomes, in plants and animals.

Mank, J. E., Vicoso, B., Berlin, S. & Charlesworth, B. Effective population size and the faster-X effect: empirical results and their interpretation. Evolution 64 , 663–674 (2010).

Corl, A. & Ellegren, H. The genomic signature of sexual selection in the genetic diversity of the sex chromosomes and autosomes. Evolution 66 , 2138–2149 (2012).

Huang, H. & Rabosky, D. L. Sex-linked genomic variation and its relationship to avian plumage dichromatism and sexual selection. BMC Evol. Biol. 15 , 199 (2015).

Smeds, L. et al. Genomic identification and characterization of the pseudoautosomal region in highly differentiated avian sex chromosomes. Nat. Commun. 5 , 5448 (2014).

Lien, S., Szyda, J., Schechinger, B., Rappold, G. & Arnheim, N. Evidence for heterogeneity in recombination in the human pseudoautosomal region: high resolution analysis by sperm typing and radiation-hybrid mapping. Am. J. Hum. Genet. 66 , 557–566 (2000).

Bussell, J. J., Pearson, N. M., Kanda, R., Filatov, D. A. & Lahn, B. T. Human polymorphism and human–chimpanzee divergence in pseudoautosomal region correlate with local recombination rate. Gene 368 , 94–100 (2006).

Charlesworth, B. & Charlesworth, D. The degeneration of Y chromosomes. Phil. Trans. R. Soc. Lond. B 355 , 1563–1572 (2000).

Bachtrog, D. Y-chromosome evolution: emerging insights into processes of Y-chromosome degeneration. Nat. Rev. Genet. 14 , 113–124 (2013).

Mank, J. E. Small but mighty: the evolutionary dynamics of W and Y sex chromosomes. Chromosome Res. 20 , 21–33 (2011).

Hellborg, L. & Ellegren, H. Low levels of nucleotide diversity in mammalian Y chromosomes. Mol. Biol. Evol. 21 , 158–163 (2004).

Bachtrog, D., Thornton, K., Clark, A., Andolfatto, P. & Harrison, R. Extensive introgression of mitochondrial DNA relative to nuclear genes in the Drosophila yakuba species group. Evolution 60 , 292–302 (2006).

Shen, P. et al. Population genetic implications from sequence variation in four Y chromosome genes. Proc. Natl Acad. Sci. USA 97 , 7354–7359 (2000).

Qiu, S., Bergero, R., Forrest, A., Kaiser, V. B. & Charlesworth, D. Nucleotide diversity in Silene latifolia autosomal and sex-linked genes. Proc. R. Soc. B 277 , 3283–3290 (2010).

Filatov, D. A., Laporte, V., Vitte, C. & Charlesworth, D. DNA diversity in sex-linked and autosomal genes of the plant species Silene latifolia and Silene dioica . Mol. Biol. Evol. 18 , 1442–1454 (2001).

Smeds, L. et al. Evolutionary analysis of the female-specific avian W chromosome. Nat. Commun. 6 , 7330 (2015).

Wilson Sayres, M. A., Lohmueller, K. E. & Nielsen, R. Natural selection reduced diversity on human Y chromosomes. PLoS Genet. 10 , e1004064 (2014).

Ellegren, H. Characteristics, causes and evolutionary consequences of male-biased mutation. Proc. R. Soc. B 274 , 1–10 (2007).

Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467 , 1099–1103 (2010).

Venn, O. et al. Strong male bias drives germline mutation in chimpanzees. Science 344 , 1272–1275 (2014).

Cutter, A. D., Jovelin, R. & Dey, A. Molecular hyperdiversity and evolution in very large populations. Mol. Ecol. 22 , 2074–2095 (2013). This article discusses the specificities and challenges associated with very highly polymorphic species, with a focus on Caenorhabditis nematodes.

Drouin, G. Characterization of the gene conversions between the multigene family members of the yeast genome. J. Mol. Evol. 55 , 14–23 (2002).

Borts, R. H. & Haber, J. E. Meiotic recombination in yeast: alteration by multiple heterozygosities. Science 237 , 1459–1465 (1987).

Dobzhansky, T. Evolution, Genetics, and Man (Wiley, 1955).

Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246 , 96–98 (1973).

Hubby, J. L. & Lewontin, R. C. A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura . Genetics 54 , 577–594 (1966).

Soulé, M. in Molecular Evolution (ed. Ayala, F.) 60–77 (Sinauer Associates, 1976).

Nevo, E., Beiles, A. & Ben-Shlomo, R. in Evolutionary Dynamics of Genetic Diversity: Proceedings of a Symposium held in Manchester, England, March 29–30, 1983 (ed. Mani, G. S.) (Springer, 1984).

Hamrick, J. L. & Godt, M. J. W. Effects of life history traits on genetic diversity in plant species. Phil. Trans. R. Soc. Lond. B 351 , 1291–1298 (1996).

Cole, C. T. Genetic variation in rare and common plants. Annu. Rev. Ecol. Evol. Systemat. 34 , 213–237 (2003).

Avise, J. C. et al. Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu. Rev. Ecol. Systemat. 18 , 489–522 (1987).

Bazin, E., Glémin, S. & Galtier, N. Population size does not influence mitochondrial genetic diversity in animals. Science 312 , 570–572 (2006).

Nabholz, B., Glémin, S. & Galtier, N. The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evol. Biol. 9 , 1–13 (2009).

Ballard, J. W. O. & Whitlock, M. C. The incomplete natural history of mitochondria. Mol. Ecol. 13 , 729–744 (2004).

Berlin, S., Tomaras, D. & Charlesworth, B. Low mitochondrial variability in birds may indicate Hill–Robertson effects on the W chromosome. Heredity 99 , 389–396 (2007).

Hurst, G. D. D. & Jiggins, F. M. Problems with mitochondrial DNA as a marker in population, phylogeographic and phylogenetic studies: the effects of inherited symbionts. Proc. R. Soc. B 272 , 1525–1534 (2005).

Galtier, N., Nabholz, B., Glémin, S. & Hurst, G. D. D. Mitochondrial DNA as a marker of molecular diversity: a reappraisal. Mol. Ecol. 18 , 4541–4550 (2009).

Piganeau, G. & Eyre-Walker, A. Evidence for variation in the effective population size of animal mitochondrial DNA. PLoS ONE 4 , e4396 (2009).

Jarne, P. & Lagoda, P. J. L. Microsatellites, from molecules to populations and back. Trends Ecol. Evol. 11 , 424–429 (1996).

Väli, Ü., Einarsson, A., Waits, L. & Ellegren, H. To what extent do microsatellite markers reflect genome-wide genetic diversity in natural populations? Mol. Ecol. 17 , 3808–3817 (2008).

Fungtammasan, A. et al. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res. 25 , 736–749 (2015).

Ellegren, H. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29 , 51–63 (2014).

Lynch, M. & Conery, J. S. The origins of genome complexity. Science 302 , 1401–1404 (2003).

Wright, S. Evolution in Mendelian populations. Genetics 16 , 97–159 (1931).

Luikart, G., Ryman, N., Tallmon, D., Schwartz, M. & Allendorf, F. Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches. Conserv. Genet. 11 , 355–373 (2010).

Palstra, F. P. & Fraser, D. J. Effective/census population size ratio estimation: a compendium and appraisal. Ecol. Evol. 2 , 2357–2365 (2012).

Gilbert, K. J. & Whitlock, M. C. Evaluating methods for estimating local effective population size with and without migration. Evolution 69 , 2154–2166 (2015).

Browning, S. R. & Browning, B. L. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97 , 404–418 (2015).

Kirin, M. et al. Genomic runs of homozygosity record population history and consanguinity. PLoS ONE 5 , e13996 (2010).

Palamara, P. F., Lencz, T., Darvasi, A. & Pe'er, I. Length distributions of identity by descent reveal fine-scale demographic history. Am. J. Hum. Genet. 91 , 809–822 (2012).

Acknowledgements

This work was supported by Swedish Research Council grants (2010–5650 and 2013–8271), a European Research Council grant (AdG 249976) and the Knut and Alice Wallenberg Foundation to H.E., and by a European Research Council grant (AdG 232971) and a French National Research Agency grant (ANR-10-BINF-01-01) to N.G. The authors thank N. Bierne, S. Glemin and M. Lascoux for comments on the manuscript.

Author information

Authors and affiliations.

Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, Uppsala, SE-753 36, Sweden

Hans Ellegren

Institute of Evolutionary Sciences, French National Centre for Scientific Research (CNRS), University of Montpellier 2, Place E. Bataillon, Montpellier, 34095, France

Nicolas Galtier


Corresponding authors

Correspondence to Hans Ellegren or Nicolas Galtier.

Ethics declarations

Competing interests.

The authors declare no competing financial interests.


(Also known as genetic polymorphism). Variation in a DNA sequence between distinct individuals (or chromosomes) of a given species (or population).

Allelic variants of proteins that can be separated by electrophoresis based on differences in charge or structure.

The complete spread of a mutation in the population such that it replaces all other alleles at a site.

Fluctuation of allele frequency among generations in a population owing to the randomness of survival and reproduction of individuals, irrespective of selective pressures.

Effective population size (Ne). The number of breeding individuals in an idealized population that would show the same amount of genetic drift (or inbreeding, or any other variable of interest) as the population under consideration.

Census population size (Nc). The number of individuals in a population.

A form of selection in which the selective advantage or disadvantage of a genotype is dependent on its frequency relative to other genotypes.

A sharp and rapid reduction in the size of a population.

The probability that two randomly sampled gene copies in a population carry distinct alleles; a measure of the genetic diversity.

The idea, based on the concept of diminishing returns, that selection can only improve a trait up to a point at which the next incremental improvement will be overwhelmed by the power of genetic drift.

A retrospective model of the distribution of gene divergence in a genealogy.

Chromosomal segments carried by two or more individuals that are identical because they have been inherited from a common ancestor, without recombination.

A form of genome evolution in which the number of sets of chromosomes increases.

The non-random association of alleles at two loci, often but not always due to physical linkage on the same chromosome.

Elimination or reduction of genetic diversity in the neighbourhood of a beneficial allele that increases in frequency in the population, typically after an environmental change.

Selective sweeps in which the beneficial allele corresponds to a single, new mutation appearing after an environmental change.

Selective sweeps in which the beneficial allele exists before an environmental change (thus representing standing variation) and is initially neutral or even slightly deleterious, or appears several times independently.

Pervasive reduction of genetic diversity owing to recurrent selective sweeps.

Reduction of genetic diversity owing to selection against deleterious mutations at linked loci.

New alleles entering the population by hybridization with members of a differentiated population or even a different species.

The change in allele frequency at a locus that itself is not necessarily affected by selection but is genetically linked to a locus that is.

The distribution of the frequency of variants across biallelic loci in a population sample.

A mating system in which males mate with more than one female.

A mating system in which females mate with more than one male.

When an organism of a particular sex carries two different types of sex chromosomes: that is, males of many animals and plants and females of birds, some fish and lizards, butterflies, and others.

The situation when there is only one chromosome copy in an individual of a diploid species, as for the X chromosome in males of many species.


About this article

Cite this article.

Ellegren, H., Galtier, N. Determinants of genetic diversity. Nat Rev Genet 17, 422–433 (2016). https://doi.org/10.1038/nrg.2016.58


Published: 06 June 2016

Issue Date: July 2016

DOI: https://doi.org/10.1038/nrg.2016.58




Frontiers for Young Minds


What Is Genetic Diversity and Why Does it Matter?


All living things on Earth contain a unique code within them, called DNA. DNA is organised into genes, similar to the way letters are organised into words. Genes give our bodies instructions on how to function. However, the exact DNA code is different even between individuals within the same species. We call this genetic diversity. Genetic diversity causes differences in the shape of bird beaks, in the flavours of tomatoes, and even in the colour of your hair! Genetic diversity is important because it gives species a better chance of survival. However, genetic diversity can be lost when populations get smaller and isolated, which decreases a species’ ability to adapt and survive. In this article, we explore the importance of genetic diversity, discuss how it is formed and maintained in wild populations, how it is lost and why that is dangerous, and what we can do to conserve it.

Why is Everything and Everyone A Little Bit Different?

Earth contains millions of different species that all look different from one another. While some species look more similar to each other than others, like lions and tigers, they will still have differences between them. Even within each species, individuals look similar to each other but they are not identical. These similarities and differences arise from many small differences between individuals’ genes. All organisms have DNA and each individual’s DNA is organised into genes. These contain the instructions to build our bodies. This is similar to the way that letters are combined to make words that then make a story. DNA can be seen as the letters, genes the words, and their instructions are the story. Small differences in DNA might change blue eyes to green, or a butterfly’s wings from black to white, like how a word can change when you replace a letter.

The combined differences in the DNA of all individuals in a species make up the genetic diversity of that species. Genetic diversity causes individuals to have different characteristics, which we can see even in our groceries. Although all tomatoes belong to the same species, the tomatoes we eat are hugely diverse, ranging from giant beefsteak tomatoes to tiny cherry tomatoes. There are also hundreds of apple varieties (Figure 1) that range from red to green, tart to sweet, and some apples even have pink flesh inside! Genetic diversity is what makes these types of tomatoes and apples look so different [1]. Genetic diversity is also seen in animals. For example, dogs can be large enough to pull sledges or small enough to sit nicely on your lap. All dogs are from the same species, but they look different because of genetic diversity! Though often more difficult to see, genetic diversity is also extremely important in wild animals and plants.

Figure 1 - An example of genetic diversity in the food we eat. All these apples are one species. Different alleles of the genes that control their colour cause the apples to be green, yellow, red, or almost purple. Differences in the alleles that control flavour make each type taste different.

How is Genetic Diversity Generated?

Changes to an individual’s DNA are called mutations (Figure 2). Mutations can arise when mistakes are made while cells are copying DNA, like making a spelling mistake when copying a word. These mutations make up a species’ genetic diversity. Over generations, more and more mistakes are made, leading to more mutations. Most mutations are either harmful or have no impact at all, but sometimes these mutations can cause changes that are helpful for a species. The individuals that have these helpful mutations might have greater chances of survival, and have more babies as a result [2]. This is adaptation. When a mother and a father have babies, the DNA of their baby is a mix of the parents’ DNA. Babies have two copies of every gene in their DNA, one from each parent. Copies of the same gene with different mutations are called alleles. When parents make a sperm or an egg, alleles in each parent are shuffled and recombined, and only one allele of a gene ends up in each sperm or egg cell. When the reshuffled alleles from a mother and a father are combined when sperm and eggs join, new mixes of alleles are created in the babies [2, 3]. The mixing of alleles allows for new combinations of mutations and characteristics, adding to a species’ genetic diversity (Figure 2).

Figure 2 - (A) Genetic diversity is generated when mutations create new alleles over time. Mixing alleles from parents creates new combinations of alleles in their babies. Organisms that can clone themselves, like bacteria, can pass alleles to each other. Each coloured dot represents a different allele. (B) Genetic diversity can be lost when habitat loss divides populations or when buildings or highways isolate populations. (C) Creating protected areas where individuals from different populations can migrate and spread their genes can help a species to maintain its genetic diversity.

Not all species need a mother and a father to make a baby. Bacteria can clone themselves (Figure 2) and pass their alleles directly from a parent to its identical clone [3]. Any mistakes in the parent’s DNA will be passed on to the clone. Amazingly, bacteria can also give alleles to each other, even if they are not related! This is a unique way that simple species like bacteria can increase their genetic diversity, without relying on the mixing of alleles between a mother and a father [4].

Why is Genetic Diversity Important?

When a species has a lot of differences in its DNA, we say that genetic diversity is high [2]. In species with high genetic diversity, there are lots of mutations in the DNA, which cause differences in the way individuals look as well as differences in important traits that we cannot see [2]. These differences are what make adaptation possible. For example, some types of apples can grow better in hotter environments, thanks to their genes. The variety of characteristics in species with high genetic diversity means they are more likely to successfully cope with changes in their environment. A great example of this is seen in the peppered moths during the Industrial Revolution [4]. Natural genetic diversity in peppered moths produced different wing colours, ranging from light to dark. Before the Industrial Revolution, peppered moths with light wings were more common because they had the best camouflage on white tree trunks. The Industrial Revolution caused a lot of air pollution that started to cover tree trunks, making them black. Light-winged moths were no longer camouflaged and were easy prey for birds. But dark-winged individuals were now hidden! This meant that dark moths had an advantage and were more likely to live long enough to have babies. The babies of dark moths were also dark because of the alleles they inherited from their parents, so they were also more likely to survive. The dark moths had higher fitness, meaning they were better at surviving and having babies, and became more common as a result [4].

What Happens When Genetic Diversity is Low?

When few mutations are found in the DNA of a species, genetic diversity is said to be low [2]. Low genetic diversity means that there is a limited variety of alleles for genes within that species and so there are not many differences between individuals. This can mean that there are fewer opportunities to adapt to environmental changes. Low genetic diversity often occurs due to habitat loss. For example, when a species’ habitat is destroyed or broken up into small pieces, populations become small. Small, fragmented populations can lead to loss of genetic diversity because fewer individuals can survive in the remaining habitat, so fewer individuals breed to pass on their alleles. In small populations, the choice of mates is also limited. Over time, individuals will all become related and will be forced to mate with relatives. This is inbreeding. Inbred animals often have two identical alleles for their genes because the same allele was passed on from both parents. If this allele has harmful mutations, an inbred baby can be unhealthy. This is called inbreeding depression [2].
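To get a feel for how quickly a small population can lose genetic diversity, here is a minimal Python sketch. It is not taken from this article: it assumes a standard textbook approximation in which an idealized, randomly mating population of N breeding individuals keeps a fraction (1 - 1/(2N)) of its heterozygosity (a common measure of genetic diversity) each generation. The starting value and the population sizes are made up for illustration.

```python
def expected_heterozygosity(h0: float, n: int, generations: int) -> float:
    """Expected heterozygosity after some generations of random loss (drift).

    Assumes an idealized diploid population of constant size n, where a
    fraction 1/(2n) of heterozygosity is lost every generation.
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if generations < 0:
        raise ValueError("generations must not be negative")
    return h0 * (1.0 - 1.0 / (2.0 * n)) ** generations


if __name__ == "__main__":
    start = 0.5  # hypothetical starting heterozygosity
    for size in (10, 100, 1000):  # hypothetical population sizes
        remaining = expected_heterozygosity(start, size, generations=50)
        print(f"N = {size:4d}: heterozygosity after 50 generations = {remaining:.3f}")
```

Under these assumptions the smallest population loses most of its diversity within 50 generations, while the largest one barely changes, which matches the pattern described above.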

If genetic diversity gets too low, species can go extinct and be lost forever. This is due to the combined effects of inbreeding depression and failure to adapt to change. In such cases, the introduction of new alleles can save a population. This is called genetic rescue [2]. In the 1990s, conservation scientists had to use genetic rescue to save the Florida panther, which was threatened by extinction due to low genetic diversity (Figure 3) [5]. Very few Florida panthers remained and their genetic diversity was extremely low. Many Florida panther babies were sick because of inbreeding depression. A closely related panther population with high genetic diversity lived in Texas. Texan panthers were moved to Florida to have babies with the Florida panthers. This increased genetic diversity through the mixing of alleles described earlier. Soon after the Texan panthers arrived, many healthy kittens were born [5].

Figure 3 - (A) The Florida panther was once widespread, with high genetic diversity. (B) Hunting and habitat loss reduced population size and resulted in very low genetic diversity and inbreeding. (C) Eight female panthers from Texas were moved to Florida to breed with Florida panthers. (D) When the Texas and Florida panthers bred, new alleles were introduced into the population, helping the Florida panther population become bigger and healthier over time.

What’s Happening to Genetic Diversity Around the World?

We hear a lot about the loss of species in the world, but we are also seeing a loss of genetic diversity within species. The increasing number of people on Earth and our increasing use of natural resources have reduced space and resources for wild species. Over time, many wild animal and plant populations have become smaller or more isolated. Many species have also gone through local extinctions. This has led to a global loss of genetic diversity. Scientists think that the genetic diversity within species may have declined by as much as 6% globally since the Industrial Revolution [6]. This means that many species are less able to adapt when facing new challenges, like climate change, pollution, and new diseases. If too much genetic diversity is lost, more and more species could become unhealthy and in need of conservation actions similar to the Florida panther. However, there are steps we can take to conserve and restore genetic diversity across many species.

How Do We Stop Genetic Diversity Loss?

We must preserve and protect genetic diversity. This can be done through the conservation of our remaining wild populations [2]. We can use nature reserves and wildlife bridges to reconnect wild populations that have become separated by our cities and highways. We can also restore habitats, because this will allow wild populations to get bigger. Sometimes we can even remove harmful stressors and pests so that populations can naturally regrow. We can also reintroduce species that have been lost from habitats they used to live in. Taken together, these strategies can help stop genetic diversity loss. It is important to protect genetic diversity because it is the foundation for healthy species. Healthy species are necessary for human health and for the health of the whole planet!

Gene: A section of DNA that contains the instructions for a trait.

Genetic Diversity: The overall diversity in the DNA between the individuals of a species.

Mutation: A change in an organism’s DNA. This can be a change of a single letter or a much bigger change of hundreds of letters at once.

Adaptation: The process of a species changing in order to better survive in its environment.

Alleles: Different variations of a gene caused by mutations. Many species have two alleles for every gene, one copy from each parent.

Inbreeding: Breeding between closely related individuals. Inbreeding often happens when populations are small and there are few options for mating. Inbred individuals are usually less healthy.

Inbreeding Depression: Reduced health in inbred individuals. Because their parents share ancestors, inbred individuals are more likely to have two identical copies of a gene. If these copies contain harmful mutations, the mutations will be expressed and lower the individual’s health.

Genetic Rescue: A conservation strategy in which new individuals are moved into a population to increase genetic diversity and improve population health.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

[1] Meyer, R., and Purugganan, M. 2013. Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14:840–52. doi: 10.1038/nrg3605

[2] Frankham, R., Ballou, J. D., and Briscoe, D. A. 2002. Introduction to Conservation Genetics. Cambridge: Cambridge University Press. p. 617.

[3] Emamalipour M., Seidi K., Zununi V. S., Jahanban-Esfahlan A., Jaymand M., Majdi H., et al. 2020. Horizontal gene transfer: from evolutionary flexibility to disease progression. Front. Cell. Dev. Biol. 8:229. doi: 10.3389/fcell.2020.00229

[4] Cook, L. M., and Saccheri, I. J. 2013. The peppered moth and industrial melanism: evolution of a natural selection case study. Heredity 110:207–12. doi: 10.1038/hdy.2012.92

[5] Johnson, W. E., Onorato, D. P., Roelke, M. E., Land, E. D., Cunningham, M., Belden, R. C., et al. 2010. Genetic restoration of the Florida panther. Science 329:1641–5. doi: 10.1126/science.1192891

[6] Leigh, D. M., Hendry, A. P., Vázquez-Domínguez, E., and Friesen, V. L. 2019. Estimated six per cent loss of genetic variation in wild populations since the industrial revolution. Evol. Appl. 12:1505–12. doi: 10.1111/eva.12810


Diversity in Genomic Research

Diversity among genomics research participants is essential for improving the health of everyone.

The Big Picture

Human DNA sequences (that is, our genomes) are more than 99.9% identical among people.

The 0.1% genomic differences come from variations among the nearly 3 billion bases (or “letters”) in our DNA; sometimes these variations can influence our chances of developing a disease.

So far, most people who have given permission for their DNA to be used for research are of European ancestry, making many populations from across the globe underrepresented in genomics research.

The National Human Genome Research Institute (NHGRI) is working to enhance the diversity of people who participate in genomics research, thereby improving our knowledge of human genomic variation and genomic information for all populations.
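As a rough illustration of the figures above, the following Python sketch converts the 0.1% difference into an approximate number of bases. It assumes a rounded genome length of 3 billion bases and treats 0.1% as an exact fraction, so the result is an order-of-magnitude estimate only; the function name is hypothetical.

```python
GENOME_LENGTH_BASES = 3_000_000_000  # "nearly 3 billion bases", rounded up
FRACTION_DIFFERENT = 0.001           # the 0.1% that differs between two people


def approx_differing_bases(genome_length: int, fraction: float) -> int:
    """Return the approximate number of bases that differ between two genomes."""
    return round(genome_length * fraction)


if __name__ == "__main__":
    count = approx_differing_bases(GENOME_LENGTH_BASES, FRACTION_DIFFERENT)
    print(f"Roughly {count:,} bases differ between two people")
```

Under these assumptions, a difference of 0.1% still corresponds to about 3 million positions, which is why such a small percentage can matter for health.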

How it affects you

The code embedded within the human genome is complex, and genomics research has only scratched the surface of determining everything there is to know about what makes us all different at the DNA level.

Historically, the people who have provided their DNA for genomics research have been overwhelmingly of European ancestry, which creates gaps in knowledge about the genomes from people in the rest of the world. Scientists are now expanding their data collection to better understand how genomics can be used to improve the health and wellbeing of all people.

What makes human genomes diverse?

While humans are similar in most ways, our biological processes make each of us unique. Many aspects of those processes are encoded in our DNA, which is based on the sequence of the four letters of life — A, C, G, and T. The dissimilarities among human genomes, referred to as variants, come from differences in our DNA sequences.

The human genome is more than 3 billion letters long, which means that a variant could reside at any place among those letters. These variants occur at different frequencies across different human populations; some are rare and unique to specific families, while others are common and found across populations.

Genomes from distinct populations differ due to multiple factors, including who people decide to reproduce with, as well as human migration patterns. Also, in specific populations, certain variants became widespread as they provided an advantage that helped them adapt to environmental changes.

Graphic: Genomes from distinct populations differ due to multiple factors, including reproduction, migration and random fluctuations.

Why should researchers collect genomic data from diverse populations?

Based on work completed before and after the Human Genome Project, researchers found that the genome sequences of human populations have changed significantly over 250,000 years of our species’ expansion and migration across the Earth.

Even with the high degree of similarity between any two human genomes, enough differences exist that it is not appropriate to use a single, or even a few, genomes to represent the world’s populations. This highlights that the original human genome reference sequence, produced by the Human Genome Project and based on just a handful of research participants, was just the starting point for human genomics.

To address this limitation, efforts are underway to create human reference genome sequences that better represent diverse populations. NHGRI funds the Human Pangenome Reference Program, which is generating a collection of reference genome sequences that better represent human diversity.

Graphic: The percentage of ancestry populations included in large-scale genomic studies is overwhelmingly European. By percentage, from highest to lowest: 78% European; 10% Asian; 8.5% unreported; 2% African; 1% Hispanic; and 0.5% other countries.

How does studying diverse human genomes improve health outcomes?

Every human has some baseline genetic risk of developing a given disease. Extensive research has been performed to both understand and learn how to respond to these risks. In some cases, the same variant consistently causes a disease (e.g., Huntington's disease and cystic fibrosis), but this might not be the case for more complex diseases (e.g., coronary artery disease, obesity, cancer and Alzheimer’s disease).

By including populations that reflect the full diversity of human populations in genomic studies, researchers can identify genomic variants associated with various health outcomes at the individual and population levels. This way, researchers can better define a person's risk of developing a specific disease and design a clinical management strategy that is tailored to the individual. In addition, they can pursue genomic medicine strategies that benefit specific populations.


Why has enhancing diversity in genomics research been a difficult task?

Increasing the representation of diverse participants in genomics research requires an investment of both resources and time to intentionally establish trusting and respectful long-term relationships between communities and researchers. To ensure that genomics research is both equitable and inclusive, it is crucial for the genomics research workforce to reflect a similar diversity as the communities that the research is intended to serve.

In the past, both inaccessible and insufficient communication left some research participants unclear about the benefits of their participation and how their data would be used after the studies concluded. To overcome this, researchers must seek to understand people’s reasons for not participating in genomic studies and to communicate with participants in a more accessible manner. This can take additional time, effort and resources, which may discourage some researchers from including these important, diverse populations in their studies. However, such exclusion can lead to notable gaps in scientific understanding and potentially reinforce existing disparities in genomics research.


What are some genomics research projects that are enhancing diversity?

Genomics researchers have initiated dozens of research projects to enhance the representation of research participants in genomics research. These studies are addressing a variety of research topics, including the effects of genomic diversity on disease risk, how to tailor genomic medicine for underrepresented populations, the impact of genomics research on diverse populations and the history of the human population.

NIH's All of Us Research Program is working to build a diverse health resource by collecting genome-related data and other information from about 1 million people. The Global Alliance for Genomics and Health (GA4GH) is developing a framework for storing, analyzing and sharing genomic data among international researchers. The Human Cell Atlas aims to be a resource that includes in-depth information about all cell types found in people across the world.

How is NHGRI helping to improve diversity in genomics research?

NHGRI is dedicated to increasing the diversity of the genomics workforce. In addition, NHGRI supports projects that work to increase the diversity of people participating in genomics research, including:

  • The 1,000 Genomes Project (2002 - 2015): The most extensive public catalog of human variation and genomic data, with over 2,000 genomic samples from 26 populations across North and South America, Africa, Asia and Europe.
  • Human Heredity and Health in Africa (H3Africa) (2012 - 2022): The largest pan-African genomic research consortium that investigates the genomics of disease in Africa. The project also aims to build a sustainable African genomics research enterprise. This project is a collaborative effort that also involves the NIH Common Fund, the Wellcome Trust and the African Academy of Sciences.
  • Polygenic Risk Score (PRS) Diversity Consortium (2021 - 2027): The consortium uses insights from genomic diversity to predict health and disease risk across diverse populations using a PRS approach.
  • Implementing Genomics in Practice (IGNITE) Network (2018 - 2022): This network assesses approaches for real-world applications of genomic medicine in diverse clinical settings.
  • Electronic Medical Records and Genomics (eMERGE) Network (2020 - 2025): This network establishes protocols and methodologies for improving genomic risk assessments for diverse populations and for integrating their use in clinical care.


Last updated: May 9, 2023

  • Open access
  • Published: 15 May 2024

Genetic diversity, phylogeography, and maternal origin of yak (Bos grunniens)

  • Xingdong Wang 1 , 2 ,
  • Jie Pei 1 , 2 ,
  • Lin Xiong 1 , 2 ,
  • Pengjia Bao 1 , 2 ,
  • Min Chu 1 , 2 ,
  • Xiaoming Ma 1 , 2 ,
  • Yongfu La 1 , 2 ,
  • Chunnian Liang 1 , 2 ,
  • Ping Yan 1 , 2 &
  • Xian Guo 1 , 2  

BMC Genomics volume 25, Article number: 481 (2024)


Background

There is no consensus as to the origin of the domestic yak (Bos grunniens). Previous studies on yak mitochondria mainly focused on the mitochondrial displacement loop (D-loop), a region with low phylogenetic resolution. Here, we analyzed the entire mitochondrial genomes of 509 yaks to obtain greater phylogenetic resolution and a comprehensive picture of geographical diversity.

Results

A total of 278 haplotypes were defined in 509 yaks from 21 yak breeds. Among them, 28 haplotypes were shared between breeds, and 250 haplotypes were unique to a single breed. The overall haplotype diversity and nucleotide diversity of yak were 0.979 ± 0.0039 and 0.00237 ± 0.00076, respectively. Phylogenetic tree and network analyses showed that yaks form three highly differentiated clades with high support. The divergence time between clades I and II was about 0.4328 Ma, and that between clades (I and II) and III was 0.5654 Ma. Yushu yaks were found in all haplogroups. Most (94.70%) of the genetic variation occurred within populations, and only 5.30% occurred between populations. Domestic and wild yaks clustered together first, and yaks as a whole then clustered with American bison. Altitude had the greatest influence on the distribution of yaks.

Conclusions

Yaks have high genetic diversity. Yak populations have undergone expansion and lack obvious phylogeographic structure. During the glacial period, yaks had at least three glacial refugia.
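The two summary statistics reported in the abstract, haplotype diversity and nucleotide diversity, can be illustrated with a short Python sketch. The abstract does not state which software or estimator was used, so the code below assumes Nei's commonly used definitions: haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and nucleotide diversity as the mean number of pairwise differences per site. The function names and the toy sequences are invented for illustration and are not yak data.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def haplotype_diversity(seqs: Sequence[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences count as one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site over all sequence pairs."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


if __name__ == "__main__":
    # Hypothetical aligned sequences: three haplotypes among four individuals.
    toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGT"]
    print(f"haplotype diversity:  {haplotype_diversity(toy):.4f}")
    print(f"nucleotide diversity: {nucleotide_diversity(toy):.4f}")
```

A haplotype diversity close to 1, as reported here, means that two randomly chosen individuals almost always carry different haplotypes, whereas nucleotide diversity stays small because those haplotypes differ at only a few sites along the mitochondrial genome.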


Introduction

The Qinghai-Tibet Plateau (QTP), one of the largest and youngest plateaus in the world, was formed around 40 million years ago (Ma) following the collision of the Indian tectonic plate with the Asian plate through several uplift events [ 1 ]. A large number of endemic species have appeared in the QTP and adjacent areas [ 2 ], due to its unique ecological environment. Yak is one of the representative species of QTP. At present, there are about 17.5 million yaks in the world, of which 94.4% are distributed in China [ 3 ]. Contemporary highland pastoralists rely on the strength and hardiness of domestic yak for transportation across vast mountainous terrain and for supplies of milk, meat, fiber, and dung for fuel [ 4 ]. Yaks play a vital role in socioeconomic development, pasture ecosystem maintenance, and agricultural biodiversity conservation in the QTP region [ 5 ]. Hence, yaks are referred to as “all-round animals” [ 6 ]. Morphological data suggest that yaks outside China originated from the Chinese yak [ 7 ]. According to previous studies, yaks were first domesticated in Tibet [ 8 ]. The combined archaeological and mitochondrial DNA (mtDNA) evidence suggests that Qinghai is one of the places where the yak either originated or was domesticated [ 7 , 9 , 10 ]. Qiu et al. re-sequenced the whole yak genome to find that the domestication of yak occurred 7 300 years ago [ 11 ]. However, the available genetic data do not provide a definitive conclusion and it is not known whether yak domestication occurred as a single event or multiple events in a single wild gene pool [ 7 ].

Domestication of animals is one of the major achievements of human civilization [ 12 ]. Although there have been many studies on yak ancestry, origin, and domestication, the answers to these questions remain unclear. Studies have suggested an association between yak fossils and early human activity in Tibet [ 13 ], consistent with yaks having first been domesticated in Tibet [ 8 ]. mtDNA is a good molecular marker for studying animal origins, evolution, classification, and population genetic diversity [ 14 ]. In recent years, the mtDNA D-loop region sequence has been widely used to evaluate the origin, domestication, and genetic diversity of yaks [ 15 , 16 ]; cytochrome b ( Cytb ) is also a commonly used marker gene for studying the molecular genetic diversity of populations. Qi et al. [ 17 ] conducted a cluster analysis of the mtDNA D-loop region and Cytb gene of 428 yak individuals from 29 yak populations in China and its surrounding countries. Phylogenetic analysis showed that the 29 yak populations clustered into three groups. In addition to the mtDNA D-loop region and Cytb , other regions can also be used for genetic diversity and phylogenetic analysis. Zhao et al. [ 18 ] determined the mtDNA cytochrome c oxidase polypeptide III ( COIII ) sequence of 111 yak individuals from 11 yak populations in Tibet, indicating that Tibetan yaks have rich genetic diversity.
Recently, researchers have discussed the importance of complete mtDNA sequencing, because such analysis can produce detailed genetic maps when the sample size is large enough [ 19 ]. Wang et al. [ 10 ] sequenced the whole mitochondrial genome of yak and used the 10 710 bp of protein-coding sequence, excluding NADH dehydrogenase 6 ( ND6 ), for phylogenetic analysis. Wild yaks were divided into three groups and domestic yaks into two, suggesting that there may be three maternal origins of wild yaks and two maternal origins of domestic yaks. However, most studies to date have been limited to the mitochondrial D-loop region and the Cytb gene and have failed to clearly distinguish some important ancient clades in the domestic yak [ 15 , 20 , 21 ]. The D-loop region is highly variable and informative for determining intraspecific diversity but often carries parallel mutations [ 22 , 23 ]. Complete mtDNA sequencing [ 19 ] allows the investigation of 18 times as many sites as the D-loop region [ 10 ], so more detailed information can be obtained.

At present, studies using the complete mitochondrial genome of the yak have mainly focused on a single yak genetic resource [ 24 , 25 , 26 ], and few studies on the genetic evolution of the yak have used the complete mitochondrial genome. Wild yaks that share a common ancestor with domestic yaks still exist, making the yak an excellent model species for studying the domestication of large animals. In our previous studies [ 27 , 28 ] and here, we collected samples of genetic resources from every yak breed (Fig. 1 ) in China to ensure a large sample size for a detailed genetic map. The genetic diversity and phylogenetic structure of the yak were analyzed by sequencing the complete mitochondrial genomes of these samples, which also provides a foundation for the effective protection and utilization of yak genetic resources.

figure 1

Collection locations of experimental samples. The details of the yak population represented at each sampling site are provided in Supplementary Table S1

Identification and analysis of haplotypes

In total, 278 haplotypes were defined from 509 yaks. Of these, 28 haplotypes were shared by different breeds, and 250 haplotypes were unique to a specific breed. Among the 278 haplotypes, the H8 haplotype was the most common (found 68 times) and was shared by 18 yak breeds, that is, all breeds except the wild yak, Tianzhu white yak, and Pamir yak. In total, 23 haplotypes were identified in 25 wild yaks, of which only the H28 haplotype was shared by the wild yak and Changtai yak; the other 22 haplotypes were unique to wild yaks. In total, 11 haplotypes were defined in Tianzhu white yak; the H20 haplotype was shared by the Tianzhu white, Changtai, Huanhu, and Qinghai Plateau yaks, while the other 10 haplotypes were unique to the Tianzhu white yak. Twenty-two haplotypes were defined in 25 Pamir yaks; only the H150 haplotype was shared by the Pamir yak and Sibu yak, while the other 21 haplotypes were unique to the Pamir yak. Wild yak (95.65%), Pamir yak (95.45%), Tianzhu white yak (90.91%), Jiulong yak (89.47%), Yushu yak (81.25%), and Xueduo yak (80.95%) exhibited the highest proportions of breed-specific haplotypes (Table 1 ); for detailed information, see Supplementary Table S2 .
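The breed-specific proportions quoted above appear to be the share of a breed's haplotypes that occur in no other breed. The following Python sketch reproduces that arithmetic for two breeds whose counts are stated in the text; the function name `private_haplotype_share` is hypothetical and is not taken from the authors' pipeline.

```python
def private_haplotype_share(total_haplotypes: int, shared_haplotypes: int) -> float:
    """Percentage of a breed's haplotypes that are not shared with any other breed."""
    if total_haplotypes <= 0:
        raise ValueError("total_haplotypes must be positive")
    if not 0 <= shared_haplotypes <= total_haplotypes:
        raise ValueError("shared_haplotypes must lie between 0 and total_haplotypes")
    unique = total_haplotypes - shared_haplotypes
    return 100.0 * unique / total_haplotypes


# Counts taken from the paragraph above: (haplotypes defined, haplotypes shared).
counts = {
    "Pamir yak": (22, 1),          # only H150 is shared
    "Tianzhu white yak": (11, 1),  # only H20 is shared
}

for breed, (total, shared) in counts.items():
    print(f"{breed}: {private_haplotype_share(total, shared):.2f}%")
# Pamir yak: 95.45%
# Tianzhu white yak: 90.91%
```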

Phylogenetic analysis of yaks

Phylogenetic analysis was performed using 278 haplotypes (with Bison bison [GU947006.1, GU946996.1] as outgroups). The basic topologies of the maximum likelihood (ML) and MrBayes phylogenetic trees were the same, showing three highly differentiated genetic clades with high support (Fig. 2 ). Most of the haplotypes were distributed within clades I and II. The yak phylogenetic tree contained six haplogroups (A-F) forming two clades, with clade I containing haplogroups A, C, E, and F and clade II containing haplogroups B and D; A-C were the three main haplogroups. Haplogroup A included all the yak populations, while haplogroup B included all populations except Tianzhu white yak and haplogroup C included all populations except Qinghai Plateau yak, Tianzhu white yak, and Pamir yak. Haplogroups D-F contained only a few yak breeds: haplogroup D contained only the Yushu, wild, Sunan, and Pamir yaks; haplogroup E contained only the Yushu yak; and haplogroup F contained only the Yushu and wild yaks. Among all yak populations, the Yushu yak was present in all haplogroups.

figure 2

Phylogenetic tree of 278 haplotypes in the yak. Clades I, II, and III represent the three clades of yak. A, B, C, D, E, and F represent the six haplogroups of yak

Combined with the geographical distribution of yak populations, a haplotype network was constructed based on yak mtDNA (Fig. 3 ). Consistent with the phylogenetic tree, the haplotype network also showed that yaks were divided into three clades and six haplogroups. The three main haplogroups A-C each showed a star-shaped radial pattern; the H8 and H36 haplotypes were shared by multiple individuals and were located at the center of a star-shaped radiation. Haplogroup A was found in all populations. Haplogroup B was found in all yak distribution areas except Tianzhu in Gansu. Haplogroup C was found in all yak distribution areas except the Pamir region, Tianzhu in Gansu, and parts of Qinghai. Only the wild and Yushu yaks were widely distributed across haplogroups, with the Yushu yak present in all of them.

figure 3

Network diagram of haplotypes. ( a ) The total network of mtDNA of 509 individuals, with different colors indicating yaks from different provinces. The network diagram of haplogroups ( b ) A, ( c ) B, and ( d ) C

Genetic diversity analysis of yak mtDNA

Higher haplotype diversity ( H d) and nucleotide diversity ( P i) values are indicative of greater genetic diversity in a population. Genetic diversity analysis showed that the haplotype and nucleotide diversities of the complete mitochondrial genome sequences of 509 individuals from 21 yak breeds/populations were 0.979 ± 0.0039 and 0.00237 ± 0.00076, respectively, indicating a high overall genetic diversity in the yak. The haplotype diversity of wild yak (0.993 ± 0.013), Sunan yak (0.996 ± 0.015), Pamir yak (0.990 ± 0.014), and Xueduo yak (0.992 ± 0.015) was relatively high, while Muli yak (0.810 ± 0.063) and Tianzhu white yak (0.830 ± 0.068) exhibited the lowest haplotype diversity. The nucleotide diversity of wild yak (0.00352 ± 0.00145), Leiwuqi yak (0.00319 ± 0.00065), Pamir yak (0.00309 ± 0.00018), and Tibet Gaoshan yak (0.00309 ± 0.00067) was relatively high; the lowest nucleotide diversity was of Tianzhu white yak (0.00034 ± 0.00013). The comprehensive analysis of haplotype and nucleotide diversities revealed that the genetic diversity of the wild yak was the highest, and that of the Tianzhu white yak was the lowest. Additional details including variable loci ( S ), H d, and P i are shown in Table  2 .
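For readers unfamiliar with the two statistics, the sketch below shows how H d and P i are commonly defined (Nei's haplotype diversity and the per-site average number of pairwise differences). It is an illustration in Python on a toy alignment, not the DnaSP implementation used in the study; it assumes equal-length aligned sequences and does not handle gaps or ambiguous bases.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (assumes aligned, gap-free input)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / (n_pairs * length)


# Toy alignment (hypothetical data): four sequences, three distinct haplotypes.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(haplotype_diversity(toy))   # 5/6, about 0.833
print(nucleotide_diversity(toy))  # 7/48, about 0.146
```

A single substitution is enough to create a new haplotype and raise H d, whereas P i grows only as differences accumulate across many sites, which is why the two measures can diverge between breeds.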

Population dynamics analysis

The historical population dynamics of each yak haplogroup were examined with neutrality tests and mismatch distribution analysis (Table 3 ). For the Total group, Tajima’s D (-1.230) was negative but not significant ( P > 0.05), whereas Fu and Li’s D (-7.262, P < 0.05) and Fu and Li’s F (-4.543, P < 0.05) were significantly negative, so the neutrality test results for the Total group were contradictory. The mismatch analysis showed that the SSD and H rag values of the Total group were 0.0084 ( P > 0.05) and 0.0038 ( P > 0.05), respectively; that is, the observed distribution did not deviate significantly from that expected under the expansion model, indicating that the yak population had experienced expansion. In haplogroups A and C, the neutrality tests gave P < 0.05 and the mismatch distribution gave P > 0.05, indicating that haplogroups A and C had experienced population expansion. In haplogroup B, both the neutrality tests and the mismatch distribution gave P > 0.05, so the inferences about the historical population dynamics of haplogroup B were contradictory. The historical population dynamics of yaks were also analyzed by Bayesian Skyline Plot (BSP) (Supplementary Fig. S1 ), which revealed that each haplogroup experienced large-scale expansion. Because haplogroups D, E, and F contained few individuals, the neutrality test and mismatch distribution analysis were not performed for them. We used the PermutCpSSR-2.0 software to analyze the N st and G st values at the population level. The results showed that N st (0.05225) was greater than G st (0.03993), but not significantly so ( P > 0.05). Analysis of Molecular Variance (AMOVA) of the yak whole-mtDNA genomes showed that most (94.70%) of the genetic variation occurred within populations, with only 5.30% observed between populations. Together, these results indicated the lack of obvious phylogeographical structure in yaks.
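As background to the neutrality tests, the following Python sketch computes Tajima's D from an alignment using the standard formula of Tajima (1989). It is an illustrative reimplementation on hypothetical toy data, not the DnaSP routine used in the study; it assumes equal-length aligned sequences without gaps and does not compute the P value.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences of equal length (no significance test)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (variable) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without variation

    # k: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    k = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample in which every variant is a singleton, the pattern
# expected after a recent expansion; D comes out negative.
singletons = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
print(tajimas_d(singletons) < 0)  # True
```

An excess of rare variants lowers the mean pairwise difference relative to the number of segregating sites, which is what drives D below zero in expanding populations.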

Estimation of differentiation time

Based on the Bayesian method, the differentiation time of the respective yak clades was calculated. The differentiation time of the two major clades (I and II) was about 0.4328 Ma with a 95% highest posterior density (HPD) of 0.3218–0.5326 Ma. The differentiation time of clades I and II and clade III was about 0.5654 Ma, with a 95% HPD of 0.4283–0.7162 Ma.

Taxonomic status of the yak in the Bovidae

The Caprinae, including Ovis ammon (NC_047196.1) and Ovis aries (NC_001941.1), were selected as an outgroup for phylogenetic analysis. The results showed that Bos grunniens and Bos mutus were located together, and yaks as a whole clustered with Bison bison . The overall clustering relationship is shown in Fig.  4 : ((((((( Bos grunniens + Bos mutus ) +  Bison bison ) + ( Bos gaurus + Bos javanicus )) + (( Bos taurus + Bos primigenius + Bos indicus ) +  Bison bonasus )) + ((( Bubalus bubalis + Bubalus arnee ) + ( Bubalus depressicornis + Bubalus quarlesi )) + ( Syncerus caffer )) +  Pseudoryx nghetinhensis ) + (( Boselaphus tragocamelus + Tetracerus quadricornis ) +  Tragelaphus spekii )) + ( Ovis ammon  +  Ovis aries )). Based on the differentiation time of the respective Bovidae species, the differentiation time of Bos grunniens and Bison bison was 2.2011 Ma, the differentiation time of Bos grunniens, Bos gaurus , and Bos javanicus was 3.9735 Ma, and the differentiation time of Bos grunniens and Bos taurus was 4.7392 Ma.

figure 4

Taxonomic status of the yak in the Bovidae family

Population distribution dynamics of the yak

The dynamic distribution of the yak in different periods was simulated by the MaxEnt model based on 11 selected environmental factors. The average areas under the curve (AUCs) of the training (0.918) and test (0.873) data were similar across periods, indicating high accuracy of the MaxEnt model simulation (Supplementary Table S3 ). The contribution of each environmental factor was tested by the jackknife method. Elevation (74.6%) contributed the most and annual temperature range (0.0009%) the least (Supplementary Table S4 ). A dynamic distribution map of yaks from the last interglacial (LIG) to 2100 was constructed (Fig. 5 ), showing that yaks were mainly located at the edge of the QTP during the last glacial maximum (LGM).

figure 5

Dynamic distribution of yaks from the last interglacial period to 2100. LIG: last interglacial. LGM: last glacial maximum. MH: Mid-Holocene

In the neutrality test, Tajima’s D > 0 with P < 0.05 indicates that the population underwent a bottleneck or balancing selection, while Tajima’s D < 0 with P < 0.05 indicates that the population experienced expansion or directional selection [ 29 ]. Fu and Li’s D* and F* indices can also be used for neutrality testing [ 30 ]. However, the historical dynamics of a population cannot be inferred from Tajima’s D alone; they are usually analyzed by combining neutrality tests with the mismatch distribution. A significantly negative neutrality statistic ( P < 0.05) indicates that the population deviates significantly from neutral expectations, for example because of artificial or natural selection, and a unimodal mismatch distribution curve indicates that the population is expanding. In this study, the neutrality test results of the Total group were contradictory, although the mismatch analysis indicated that the Total group experienced population expansion. In haplogroup B, the neutrality test results disagreed with the mismatch distribution results. These contradictions make it difficult to judge whether the population has expanded during its evolution. The mismatch distribution assumes an idealized population, which limits its practical application; in fact, historical population dynamics are often more complex than the parametric models underlying these methods. The BSP method is based on coalescent theory and is used to quantify the relationship between the genealogy of gene sequences and the demographic history of the population. Using a molecular clock or fossil calibration, the BEAST software applies a Markov Chain Monte Carlo (MCMC) algorithm to estimate the change in effective population size over time. This method can better estimate the effective population size, especially when different genes and a small number of individuals are analyzed. Therefore, a neutrality statistic that deviates from 0 with P < 0.05 is only a prerequisite for inferring expansion. If the neutrality test results are inconsistent with the BSP results, the BSP results (Supplementary Fig. S1 ) should be preferred.

Genetic variation in yaks occurred largely within populations, without any obvious geographical structure. A study in goats also showed no obvious phylogeographical structure among the highly differentiated genetic clades and haplogroups, owing to the large-scale migration of goats around the world [ 31 ]. Notably, the domestication of yaks started 7300 years ago [ 11 ]. After domestication, yaks migrated with herdsmen on a large scale, enabling admixture with yak populations in distant geographical regions, which would account for the lack of an obvious phylogeographical structure in yaks.

Genetic diversity is a central facet of biological diversity [ 32 ]. High H d and P i values indicated high genetic diversity in wild, Pamir, Xueduo, Qinghai Plateau, Yushu, Tibet Gaoshan, Sunan, and Jiulong yaks. The Gannan, Niangya, Bazhou, and Sibu yaks had high H d but low P i values. A single base mutation can generate a new haplotype but has little impact on nucleotide diversity, so nucleotide diversity takes longer to increase than haplotype diversity. Therefore, a bottleneck caused by repeated environmental changes, followed by rapid population expansion and accumulation of new mutations, can lead to high H d and low P i values [ 33 ]. The low H d and high P i values of Leiwuqi and Muli yaks can be attributed to selective pressure from a new environment after migration, to contact between two relatively independent populations, or simply to an insufficient number of samples [ 33 ]; this requires further analysis. Zhongdian and Tianzhu white yaks exhibited low H d and P i values, indicating the influence of a founder effect, i.e., the establishment of a new population from a few individuals, in which numbers increase without a corresponding increase in genetic diversity [ 33 ]. Low levels of genetic diversity can lead to inbreeding and decreased population fitness [ 34 ]. The low diversity of the Tianzhu white yak is consistent with its history of artificial selection. For 130 years, the Tianzhu white yak has been bred under strict selection for coat color [ 35 ], and this strict artificial selection and control have led to low gene flow between the Tianzhu white yak and other yak populations, resulting in a high degree of genetic differentiation, consistent with earlier findings that the Tianzhu white yak differed from other domestic yak populations [ 11 ].

Previous studies based on mtDNA D-loop fragments showed that yaks were divided into two clades (I and II), with a differentiation time of 100 000–130 000 years ago [ 7 ]. The differentiation time of the main clades in the yak phylogenetic tree was later calculated based on the third codon positions of the mtDNA protein-coding genes; this new method, together with a new fossil calibration (2.5 Ma), placed the differentiation of three yak clades between 420 000 and 580 000 years ago [ 10 ]. Our analysis of the whole mtDNA also revealed three main clades in the yak evolutionary tree. Differentiation between clades I and II occurred about 0.4328 Ma with a 95% HPD of 0.3218–0.5326 Ma. Differentiation between clade III and clades (I, II) occurred about 0.5654 Ma with a 95% HPD of 0.4283–0.7162 Ma. This is in agreement with previous findings [ 10 ]. In addition, our estimated differentiation time is also consistent with the records of glacial events in the middle and late Pleistocene in the Tibetan Plateau [ 36 ], suggesting that glacial activity in the middle and late Pleistocene may have triggered yak migration and therefore genetic differentiation.

There have been three great glacial epochs in the evolutionary history of the earth. The Quaternary glacial epoch was the most recent in geological history and had a major impact on modern biology [ 37 ]. Due to specific buffering environmental characteristics, the glacial refugia contained unique genetic lineages during a series of climatic fluctuations occurring during the Tertiary and Quaternary epochs [ 38 ], allowing animals and plants to escape from the harsh climatic conditions of the glacial epoch [ 39 ]. Glacial refuges are the starting point for the post-glacial redistribution of species after deglaciation [ 40 ]. Large glacial refugia may last for hundreds of thousands of years or even longer, and, therefore, the isolation of glacial refugia may have accelerated the differentiation of a species’ populations [ 41 ]. The Hengduan Mountains in the southeastern part of the QTP rose rapidly between the Late Miocene and Late Pliocene [ 42 ]. The QTP was never completely covered by glaciers during the Quaternary [ 43 ] and is one of the most important biodiversity research hotspots in the world [ 44 ]. The QTP also formed an important refuge and place of origin for species during the glacial epoch, resulting in rich species diversity and unique geological characteristics [ 45 , 46 ]. Studies on Juniperus przewalskii [ 47 ], Metagentiana striata [ 48 ], and Pedicularis longiflora [ 49 ] indicated that some species may have retreated to refuges on the edge of the QTP during the glacial epoch and recolonized the plateau and adjacent highlands at the end of epoch. In addition, some glacial refugia on the QTP supported the survival of plant species, such as Hippophae rhamnoides [ 50 ] and Spiraea alpina [ 51 ], during climate change [ 1 , 52 ]. Research on Aconitum gymnandrum [ 53 ], Hippophae tibetana [ 54 ], Rhodiola alsia [ 55 ], and Rhodiola chrysanthemifolia [ 56 ] has shown that there were several miniature refugia on the QTP. 
The multiple glacial refugia scenario fits well with the hypothesis of multiregional and multiscale glaciation during the Pleistocene [ 49 , 57 ]. The mtDNA phylogenetic analysis of yak revealed three clades. Wang et al. [ 53 ] found four Aconitum gymnandrum populations that possibly evolved from four independent glacial refugia during the LGM. Under the single-refugium hypothesis, the entire population is generally expected to form one star-shaped haplotype network [ 58 ]. The network diagram of the yak mitochondrial genome (Fig. 3 ) indicated at least three star-shaped haplotype network structures, suggesting at least three glacial refugia in the evolutionary history of the yak. The MaxEnt model was used to predict the dynamic distribution of yaks from the LIG to the present time (Fig. 5 ). Yaks were mainly but not entirely located on the edge of the QTP during the LGM, suggesting that the yak refugia during the glacial epoch were associated mainly with the marginal areas of the QTP. In Alopex lagopus , populations did not shift their distribution to track available habitat during the post-glacial contraction phase but instead went extinct [ 59 ]. This suggests that arctic species did not respond to climate change in the same way as temperate species, i.e., their distribution ranges contracted rather than expanded during interglacial warming. It is still unclear whether species migrated to refugia to cope with climate change or whether populations outside the refugia simply went extinct [ 58 ]. Owing to the lack of detailed information about some sampling points, yak fossil records, and other data, it is not yet possible to pinpoint the location of the glacial refugia or to determine whether yaks outside the refugia went extinct directly.

The classification of the yak within the Bovidae has varied among authors. Linnaeus placed the yak together with Bos taurus and Bos indicus in the genus Bos . Based on its morphological and skeletal differences from other cattle species, Gray, Olsen, and Geraads classified the yak into an independent yak genus [ 60 ]. This study explored the taxonomic status of the yak from the perspective of maternal inheritance. Representative mtDNA sequences of yaks from the three clades and six haplogroups were selected, and those of other bovids were downloaded from the GenBank database to determine the phylogenetic relationships among bovids. Each species was represented by at least two individuals. The clustering results showed that the individuals of each species clustered together first; the yaks as a whole then clustered with Bison bison , and then with other cattle. This is consistent with a previous report [ 60 ] on the Cytb gene of domestic yak mtDNA and another report [ 61 ] on the mtDNA of wild yak. The time of separation between the yak and other bovid species was calculated by the Bayesian method, giving a value of 2.2011 Ma for the separation between yak and Bison bison , which was consistent with the phylogenetic tree structure. This study supports the classification of Li [ 60 ] and Zhong [ 61 ], who classified the yak into a separate genus that included two species, namely, domestic and wild yak.

In conclusion, yaks generally showed high genetic diversity; wild yaks had the highest genetic diversity and Tianzhu white yaks the lowest. The genetic variation in the yak mitochondrial genome mainly occurs within populations and lacks obvious phylogeographic structure. Phylogenetic analysis revealed that yak had three highly differentiated genetic clades with high support. The differentiation time of clades I and II was about 0.4328 Ma, and that of clades (I and II) and III was about 0.5654 Ma. Yak contains six haplogroups. Among all yak populations, the Yushu yak is present in all haplogroups, and the three major haplogroups A-C each show a star-shaped radial pattern. Yaks have experienced population expansion. During the LGM, the glacial refugia of the yak were mainly located at the edge of the QTP. Altitude had the greatest effect and annual temperature range the smallest effect on the dynamic distribution of yaks. Phylogenetic classification showed that domestic and wild yaks clustered together first, and that yaks as a whole clustered with the American bison.

Animals and sample collection

The mitochondrial genomes of 372 yaks were sequenced. These yaks were from 15 breeds identified by the National Livestock and Poultry Genetic Resources Commission. Supplementary Table S1 lists the breed, numbers, mitochondrial genome accession numbers, and geographical coordinates of the sampling points of the yaks. Blood samples were collected from 20 to 26 yaks of each breed/population. There was no sex restriction, and the yaks were three to eight years old and unrelated to one another. Blood samples were collected from the jugular veins of the yaks into EDTA anticoagulant tubes, immediately stored in a vehicle-mounted refrigerator, and transported to the laboratory within 24 h. The samples were then stored at -80 °C in the Key Laboratory of Yak Breeding Engineering of Gansu Province. The blood storage numbers were R-5-1-001, R-5-1-002, and R-5-1-003, and the DNA was extracted from the samples within one month. No yaks were sacrificed in this experiment, and all blood samples were taken from live yaks. The jugular vein area of the yaks was locally disinfected with alcohol before blood sample collection. After the blood samples were collected, the puncture site was wiped with iodine to prevent infection of any wound.

Extraction, amplification, and sequencing of the mitochondrial genome

The primers designed by Wang [ 10 ] and synthesized by Xi’an Qingke Biotechnology Co., Ltd. (Xi’an, China) were used to amplify the whole mitochondrial genome of the yaks. The reagents, methods, and reaction conditions used for the extraction and amplification of the mitochondrial genome were as previously described [ 27 ]. After amplification, the quality of the products was assessed using 1% agarose gel electrophoresis before sequencing at Xi’an Qingke Biotechnology Co., Ltd. (Xi’an, China).

Analysis of genetic structure and genetic diversity

The complete mitochondrial genomes (accession numbers OK375501–OK375872) of the 372 yak individuals were sequenced. After the inclusion of the mitochondrial sequences obtained in an earlier study (accession numbers MW414100–MW414210, MK124955.1), a total of 484 domestic yak individuals were used for experimental analysis. This final sample set covered all morphological groups and distribution ranges of the domestic yaks. In addition, the complete mitochondrial genome sequences (accession numbers GQ464246.1–GQ464266.1, MK033130.1, KR106993.1, KY829451.1, NC_025563.1) of 25 wild yaks were downloaded from National Center for Biotechnology Information (NCBI), increasing the total number of yak mitochondrial complete genome sequences to 509.

MAFFT 7.0 was used for sequence alignment [ 62 ]. AMOVA was performed using Arlequin v 3.5 software to determine the level of differentiation among the yak populations; the number of permutations was set to 10 000 [ 63 ]. DnaSP 6.0 [ 64 ] software was used to calculate the F st among populations, detect the degree of genetic differentiation among populations, and perform genetic diversity analysis and neutrality tests. PermutCpSSR-2.0 software was used to calculate the N st and G st values at the species level (parameters set to 1 000 permutations) to explore the relationship between genetic and geographical distances and examine whether the species distribution had a phylogeographical structure [ 65 ]. The yak haplotype network was constructed with PopART [ 66 ] software. Default settings were used for all software parameters not described in detail.

Construction of phylogenetic tree

The yak phylogenetic tree was constructed with PhyloSuite [ 67 ]. The IQ-TREE module was used to construct the ML phylogenetic tree. The best-fit nucleotide substitution model was K3Pu + F + R5, “Bootstrap” was set to Standard, “Num of Bootstrap” was 1 000, and the SH-aLRT test was enabled. The number of replicates was left at the default of 1 000. The MrBayes module was used to construct the Bayesian phylogenetic tree with 4 MCMC chains, and the best-fit nucleotide substitution model was HKY + F + I + G4. The number of generations was 100 000 000, the sampling frequency was 1000, the number of runs was 2, and the first 25% of samples were discarded as burn-in.
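To make the Bayesian sampling settings concrete, the short Python sketch below works out how many posterior samples remain after discarding 25% of the samples from each run. The helper name `retained_samples` is hypothetical, and the count ignores the initial state that some programs also record at generation 0, so real output files may differ by one sample per run.

```python
def retained_samples(generations: int, sample_every: int,
                     discard_fraction: float, runs: int = 1) -> int:
    """Number of posterior samples kept across all runs after discarding the early fraction."""
    if generations <= 0 or sample_every <= 0 or runs <= 0:
        raise ValueError("generations, sample_every and runs must be positive")
    if not 0.0 <= discard_fraction < 1.0:
        raise ValueError("discard_fraction must be in [0, 1)")
    per_run = generations // sample_every
    discarded = int(per_run * discard_fraction)
    return (per_run - discarded) * runs


# Settings stated in the text: 100 000 000 generations, sampled every 1000,
# two runs, 25% of samples discarded.
print(retained_samples(100_000_000, 1000, 0.25, runs=2))  # 150000
```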

BEAST v1.10 [ 68 ] was used to estimate the differentiation time. The optimal model was TIM2 + F + I + G4 under the Bayesian Information Criterion and GTR + F + R5 under the corrected Akaike Information Criterion. According to the TimeTree website ( http://www.timetree.org/ ), the point of divergence between Bos mutus and Bison bison was 1.5 (0.2–3.9) Ma; the relaxed molecular clock model in BEAST v1.10 was used to calculate the differentiation time. The analysis was run for 5 000 000 generations, sampling a tree every 1 000 generations, and an ESS > 200 was required.

Research on population distribution dynamics

The climate data for all periods were downloaded from the WorldClim database ( http://www.worldclim.org/ ). Chinese soil data were obtained from the Harmonized World Soil Database (HWSD), downloaded from the National Ice Frozen Desert Science Data Center and the National Special Environment and Special Function Observation Research Station sharing service platform ( http://www.crensed.ac.cn/portal/ ). The map data were downloaded from the National Basic Geographic Information Center ( http://www.ngcc.cn/ngcc/ ). The mask method in ArcGIS 10.5 ( https://developers.arcgis.com/ ) was used to extract the variable data for all environmental factors based on the map data and environmental data of China. MaxEnt v 3.4.1 ( https://biodiversityinformatics.amnh.org/open_source/maxent/ ) was used to model the distribution dynamics of the species in different periods. Model performance was judged by the AUC: AUC < 0.60 indicated that the prediction model failed; 0.60–0.70, poor predictions; 0.70–0.80, fair predictions; 0.80–0.90, accurate predictions; and 0.90–1.00, very accurate predictions [ 69 ].
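The AUC bands above can be expressed as a small lookup. The Python sketch below is illustrative only: the function name `classify_auc` and the short labels are paraphrases of the five bands, and the handling of boundary values (a value exactly on a threshold is assigned to the higher band) is an assumption, since the stated ranges share their endpoints.

```python
def classify_auc(auc: float) -> str:
    """Map an AUC value to the qualitative band described in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.60:
        return "failed"
    if auc < 0.70:
        return "poor"
    if auc < 0.80:
        return "fair"
    if auc < 0.90:
        return "accurate"
    return "very accurate"


# Average AUCs reported for the MaxEnt simulations in this study.
print(classify_auc(0.918))  # very accurate (training data)
print(classify_auc(0.873))  # accurate (test data)
```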

Data availability

The datasets generated and analyzed during the present study are available in the GenBank repository under accession numbers OK375501–OK375872 (https://www.ncbi.nlm.nih.gov/genbank/). The data supporting the conclusions of this study are available in the supplementary table.

Abbreviations

Displacement loop

Qinghai-Tibet Plateau

Million years ago

Mitochondrial DNA

Cytochrome b

Haplotype diversity

Nucleotide diversity

Highest posterior density

Last interglacial

Last glacial maximum

National Center for Biotechnology Information

Area under the curve

References

Xia M, Tian Z, Zhang F, Khan G, Gao Q, Xing R, Zhang Y, Yu J, Chen S. Deep intraspecific divergence in the endemic herb Lancea tibetica (Mazaceae) distributed over the Qinghai-Tibetan Plateau. Front Genet. 2018;9:492.


Liu JQ, Sun YS, Ge XJ, Gao LM, Qiu YX. Phylogeographic studies of plants in China: advances in the past and directions in the future. J Syst Evol. 2012;50(4):9.


Li Y, Zong W, Zhao S, Qie M, Yang X, Zhao Y. Nutrition and edible characteristics, origin traceability and authenticity identification of yak meat and milk: a review. Trends Food Sci Technol. 2023;139:104133.

Chen N, Zhang Z, Hou J, Chen J, Gao X, Tang L, Wangdue S, Zhang X, Sinding MS, Liu X, et al. Evidence for early domestic yak, taurine cattle, and their hybrids on the Tibetan Plateau. Sci Adv. 2023;9(50):eadi6857.

Shah M, Xu C, Wu S, Zhao W, Luo H, Yi C, Liu W, Cai X. Isolation and characterization of spermatogenic cells from cattle, yak and cattleyak. Anim Reprod Sci. 2018;193:182–90.


Wiener G, Han J, Long R. The yak. 2nd edn. Bangkok: Regional Office for Asia and the Pacific, Food and Agriculture Organization of the United Nations.

Guo S, Savolainen P, Su J, Zhang Q, Qi D, Zhou J, Zhong Y, Zhao X, Liu J. Origin of mitochondrial DNA diversity of domestic yaks. BMC Evol Biol. 2006;6(1):73.


Li J. Genetic diversity and phylogenetic analysis of yak mtDNA in the Karakoram-Pamir region. Master's thesis. Kashgar University; 2019. p. 12.

Ma Z, Xia X, Chen S, Zhao X, Zeng L, Xie Y, Chao S, Xu J, Sun Y, Li R, et al. Identification and diversity of Y-chromosome haplotypes in Qinghai yak populations. Anim Genet. 2018;49(6):618–22.


Wang Z, Shen X, Liu B, Su J, Yonezawa T, Yu Y, Guo S, Ho SYW, Vila C, Hasegawa M, et al. Phylogeographical analyses of domestic and wild yaks based on mitochondrial DNA: new data and reappraisal. J Biogeogr. 2010;37(12):2332–44.


Qiu Q, Wang L, Wang K, Yang Y, Ma T, Wang Z, Zhang X, Ni Z, Hou F, Long R, et al. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions. Nat Commun. 2015;6:10283.

Zhang S, Liu W, Liu X, Du X, Zhang K, Zhang Y, Song Y, Zi Y, Qiu Q, Lenstra J, et al. Structural variants selected during yak domestication inferred from Long-Read whole-genome sequencing. Mol Biol Evol. 2021;38(9):3676–80.

Brantingham PJ, Olsen JW, Schaller GB. Lithic assemblages from the Chang Tang region, Northern Tibet. Antiquity. 2001;75(288):319–27.

Ma ZJ, Zhong JC, Han JL, Xu JT, Liu ZN, Bai WL. Research progress on molecular genetic diversity of yaks. Genetics. 2013;35(02):151–60.


Ma ZJ, Zhong JC, Han JL, Xu JT, Dou QL, Chang HP. Genetic diversity of mtDNA D-Loop region in wild yaks (Bos grunniens mutus). J Ecol. 2009;29(09):4798–803.

Song QQ, Zhong JC, Zhang CF, Xin JW, Ji QM, Cai ZX. Analysis on genetic diversity and phyletic evolution of mitochondrial DNA from Tibetan yaks. Acta Theriol Sinica. 2014;34(04):356–65.


Qi XB, Han JL, Blench R, et al. Understanding the yak pastoralism in central Asian highlands: genetic evidence for origin, domestication and dispersion of domestic yak. In: Sanchez-Mazas A, Blench R, Ross MD, Peiros I, Lin M, editors. Past human migrations in East Asia: matching archaeology, linguistics and genetics. London and New York: Routledge, Taylor & Francis Group; 2008. pp. 427–42.

Zhao SJ, Chen ZH, Ji QM, Cai ZX, Zhang CF, Xin JW, Zhong JC. Sequence analysis of mtDNA COIII of Tibetan yaks. Scientia Agricultura Sinica. 2011;44(23):4902–10.

Derenko M, Denisova G, Malyarchuk B, Dambueva I, Bazarov B. Mitogenomic diversity and differentiation of the Buryats. J Hum Genet. 2018;63(1):71–81.

Chang GB, Chang H, Chen GH, Chen R, Zhao J, Zhuo Y, Guan YP. Analysis of genetic diversity and phylogenetic status of Bazhou yak based on partial sequence of Cytb gene. Chin J Anim Sci. 2010;46(17):19–21.

Zhang CF, Xu LJ, Ji QM, Xin JW, Zhong JC. Genetic diversity and evolution relationship on mtDNA D-loop in Tibetan yaks. Acta Ecol Sin. 2012;32(05):1387–95.

Excoffier L, Yang Z. Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Mol Biol Evol. 1999;16(10):1357–68.

Ingman M, Gyllensten U. Analysis of the complete human mtDNA genome: methodology and inferences for human evolution. J Heredity. 2001;92(6):454–61.

Bao P, Pei J, Ding X, Wu X, Chu M, Xiong L, Liang C, Guo X, Yan P. Characterisation of the complete mitochondrial genome of the Jinchuan Yak (Bos grunniens). Mitochondrial DNA B Resour. 2019;4(2):3856–7.

Liang CN, Wu X, Ding X, Wang H, Guo X, Chu M, Bao P, Yan P. Characterization of the complete mitochondrial genome sequence of wild yak (Bos mutus). Mitochondrial DNA Part A DNA Mapp Seq Anal. 2016;27(6):4266–7.

Guo X, Wu X, Chu M, Bao P, Xiong L, Liang C, Ding X, Pei J, Yan P. Characterization of the complete mitochondrial genome of the Pamir yak (Bos grunniens). Mitochondrial DNA B Resour. 2019;4(2):3165–6.

Wang X, Pei J, Bao P, Cao M, Guo S, Song R, Song W, Liang C, Yan P, Guo X. Mitogenomic diversity and phylogeny analysis of yak (Bos grunniens). BMC Genomics. 2021;22(1):325.

Wu X, Zhou X, Ding X, Liang C, Guo X, Chu M, Wang H, Pei J, Bao P, Yan P. Characterization of the complete mitochondrial genome of the Huanhu yak (Bos grunniens). Mitochondrial DNA Part B. 2019;4(1):1235–6.

Zhong D, Ding L. Rising process of the Qinghai-Xizang (Tibet) Plateau and its mechanism. Sci China (Series D). 1996;(04):289–95.

Chassot P, Nemomissa S, Yuan YM, Kupfer P. High paraphyly of Swertia L. (Gentianaceae) in the Gentianella-lineage as revealed by nuclear and chloroplast DNA sequence variation. Plant Syst Evol. 2001;229(1–2):1–21.

Luikart G, Gielly L, Excoffier L, Vigne J, Bouvet J, Taberlet P. Multiple maternal origins and weak phylogeographic structure in domestic goats. Proc Natl Acad Sci USA. 2001;98(10):5927–32.

Guo SC, Qi DL, Chen GH, Xu SX, Zhao XQ. Genetic diversity and classification of mitochondrial DNA (mtDNA) in yaks. J Ecol. 2008;(09):4286–94.

Hua Y. Population genetics and phylogeny of scorpionflies based on mitochondrial DNA. Doctoral thesis. Northwest A&F University; 2020. pp. 20–1.

Jump A, Marchant R, Peñuelas J. Environmental change and the option value of genetic diversity. Trends Plant Sci. 2009;14(1):51–8.

Zhang MQ, Xu X, Luo SJ. The genetics of brown coat color and white spotting in domestic yaks (Bos grunniens). Anim Genet. 2014;45(5):652–9.

Zheng BX, Xu QQ, Shen YP. The relationship between climate change and Quaternary glacial cycles on the Qinghai-Tibetan Plateau: review and speculation. Quatern Int. 2002;97–98:93–101.

Bandelt H, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999;16(1):37–48.

Tang CQ, Matsui T, Ohashi H, Dong YF, Momohara A, Herrando-Moraira S, Qian S, Yang Y, Ohsawa M, Luu HT, et al. Identifying long-term stable refugia for relict plant species in East Asia. Nat Commun. 2018;9(1):4488.

Beck RA, Burbank DW, Sercombe WJ, Riley GW, Barndt JK, Berry JR, Afzal J, Khan AM, Jurgen H, Metje J, et al. Stratigraphic evidence for an early collision between northwest India and Asia. Nature. 1995;373(6509):55–8.

Beheregaray LB. Twenty years of phylogeography: the state of the field and the challenges for the Southern Hemisphere. Mol Ecol. 2008;17(17):3754–74.

Molnar P, Boos WR, Battisti DS. Orographic controls on climate and paleoclimate of Asia: thermal and mechanical roles for the Tibetan Plateau. In: Jeanloz R, Freeman KH, editors. Annual Review of Earth and Planetary Sciences, vol. 38; 2010. pp. 77–102.

Sun BN, Wu JY, Liu YS, Ding ST, Li XC, Xie SP, Yan DF, Lin ZC. Reconstructing Neogene vegetation and climates to infer tectonic uplift in western Yunnan, China. Palaeogeogr Palaeoclimatol Palaeoecol. 2011;304(3–4):328–36.

Shi Y. Characteristics of late Quaternary monsoonal glaciation on the Tibetan Plateau and in East Asia. Quatern Int. 2002;97–98:79–91.

Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GA, Kent J. Biodiversity hotspots for conservation priorities. Nature. 2000;403(6772):853–8.

He K, Jiang X. Sky islands of southwest China. I: an overview of phylogeographic patterns. Chin Sci Bull. 2014;59(7):585–97.

Marchese C. Biodiversity hotspots: a shortcut for a more complicated concept. Global Ecol Conserv. 2015;3:297–309.

Zhang Q, Chiang TY, George M, Liu JQ, Abbott RJ. Phylogeography of the Qinghai-Tibetan Plateau endemic Juniperus przewalskii (Cupressaceae) inferred from chloroplast DNA sequence variation. Mol Ecol. 2005;14(11):3513–24.

Chen S, Wu G, Zhang D, Gao Q, Duan Y, Zhang F, Chen S. Potential refugium on the Qinghai-Tibet Plateau revealed by the chloroplast DNA phylogeography of the alpine species Metagentiana striata (Gentianaceae). Bot J Linn Soc. 2008;157(1):125–40.

Yang F-S, Li Y-F, Ding X, Wang X-Q. Extensive population expansion of Pedicularis longiflora (Orobanchaceae) on the Qinghai-Tibetan Plateau and its correlation with the quaternary climate change. Mol Ecol. 2008;17(23):5135–45.

Jia DR, Abbott RJ, Liu TL, Mao KS, Bartish IV, Liu JQ. Out of the Qinghai-Tibet Plateau: evidence for the origin and dispersal of Eurasian temperate plants from a phylogeographic study of Hippophaë rhamnoides (Elaeagnaceae). New Phytol. 2012;194(4):1123–33.

Khan G, Zhang F, Gao Q, Fu P, Zhang Y, Chen S. Spiroides shrubs on Qinghai-Tibetan Plateau: multilocus phylogeography and palaeodistributional reconstruction of Spiraea alpina and S. mongolica (Rosaceae). Mol Phylogenetics Evol. 2018;123:137–48.

Wang B, Xie F, Li J, Wang G, Li C, Jiang J. Phylogeographic investigation and ecological niche modelling of the endemic frog species Nanorana pleskei revealed multiple refugia in the eastern Tibetan Plateau. PeerJ. 2017;5:e3770.

Wang L, Abbott RJ, Zheng W, Chen P, Wang Y, Liu J. History and evolution of alpine plants endemic to the Qinghai-Tibetan Plateau: Aconitum gymnandrum (Ranunculaceae). Mol Ecol. 2009;18(4):709–21.

Wang H, Qiong L, Sun K, Lu F, Wang Y, Song Z, Wu Q, Chen J, Zhang W. Phylogeographic structure of Hippophae tibetana (Elaeagnaceae) highlights the highest microrefugia and the rapid uplift of the Qinghai-Tibetan Plateau. Mol Ecol. 2010;19(14):2964–79.

Gao Q, Zhang D, Duan Y, Zhang F, Li Y, Fu P, Chen S. Intraspecific divergences of Rhodiola alsia (Crassulaceae) based on plastid DNA and internal transcribed spacer fragments. Bot J Linn Soc. 2012;168(2):204–15.

Gao QB, Zhang FQ, Xing R, Gornall RJ, Fu PC, Li Y, Gengji ZM, Chen SL. Phylogeographic study revealed microrefugia for an endemic species on the Qinghai-Tibetan Plateau: Rhodiola chrysanthemifolia (Crassulaceae). Plant Syst Evol. 2016;302(9):1179–93.

Owen LA, Finkel RC, Barnard PL, Ma HZ, Asahi K, Caffee MW, Derbyshire E. Climatic and topographic controls on the style and timing of late Quaternary glaciation throughout Tibet and the Himalaya defined by Be-10 cosmogenic radionuclide surface exposure dating. Quat Sci Rev. 2005;24(12–13):1391–411.

Provan J, Bennett KD. Phylogeographic insights into cryptic glacial refugia. Trends Ecol Evol. 2008;23(10):564–71.

Dalen L, Nystrom V, Valdiosera C, Germonpre M, Sablin M, Turner E, Angerbjorn A, Arsuaga JL, Gotherstrom A. Ancient DNA reveals lack of postglacial habitat tracking in the arctic fox. Proc Natl Acad Sci USA. 2007;104(16):6726–9.

Li QF, Li YF, Zhao XB, Liu ZS, Zhang QB, Song DW, Qu XG, Li N, Xie Z. Sequencing of mitochondrial DNA cytochrome b gene of yak and its origin, classification and position. J Anim Husb Veterinary Med. 2006;(11):1118–23.

Zhong JC, Chai ZX, Ma ZJ, Wang Y, Yang WY, La H. Sequencing and phylogenetic analysis of the mitochondrial genome of wild yaks. J Ecol. 2015;35(05):1564–72.

Katoh K, Rozewicki J, Yamada K. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 2019;20(4):1160–6.

Excoffier L, Smouse P, Quattro J. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics. 1992;131(2):479–91.

Rozas J, Ferrer-Mata A, Sánchez-DelBarrio J, Guirao-Rico S, Librado P, Ramos-Onsins S, Sánchez-Gracia A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.

Hartnup K, Huynen L, Te Kanawa R, Shepherd L, Millar C, Lambert D. Ancient DNA recovers the origins of Māori feather cloaks. Mol Biol Evol. 2011;28(10):2741–50.

Leigh JW, Bryant D. POPART: full-feature software for haplotype network construction. Methods Ecol Evol. 2013;6:1110–6.

Zhang D, Gao F, Jakovlić I, Zou H, Zhang J, Li W, Wang G. PhyloSuite: an integrated and scalable desktop platform for streamlined molecular sequence data management and evolutionary phylogenetics studies. Mol Ecol Resour. 2020;20(1):348–55.

Suchard M, Lemey P, Baele G, Ayres D, Drummond A, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):vey016.

Bellard C, Bertelsmeier C, Leadley P, Thuiller W, Courchamp F. Impacts of climate change on the future of biodiversity. Ecol Lett. 2012;15(4):365–77.


Acknowledgements

The authors would like to thank all the reviewers who participated in the review and the brothers and sisters for their help in the experiment.

This work was supported by the China Agriculture Research System of MOF and MARA (CARS-37) and the Innovation Project of Chinese Academy of Agricultural Sciences (25-LZIHPS-01).

Author information

Authors and affiliations.

Key Laboratory of Yak Breeding in Gansu Province, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, 730050, P.R. China

Xingdong Wang, Jie Pei, Lin Xiong, Pengjia Bao, Min Chu, Xiaoming Ma, Yongfu La, Chunnian Liang, Ping Yan & Xian Guo

Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou, 730050, P.R. China


Contributions

X.G. and X.W. conceptualized this study. P.B. and P.Y. helped in the investigation. J.P., L.X., and X.W. helped in methodology and software. X.W., Y.L., and X.M. performed data curation. X.W. and M.C. helped in writing the original draft. X.G., C.L., and X.W. helped in writing, reviewing, and editing the manuscript. X.G. helped in funding acquisition. All authors contributed to the interpretation of the results and writing of the manuscript.

Corresponding author

Correspondence to Xian Guo .

Ethics declarations

Ethics approval and consent to participate.

All procedures involving animals were performed according to the guidelines of the China Council on Animal Care and the Ministry of Agriculture of the People’s Republic of China and approved by the Animal Ethics Committee of Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Wang, X., Pei, J., Xiong, L. et al. Genetic diversity, phylogeography, and maternal origin of yak ( Bos grunniens ). BMC Genomics 25 , 481 (2024). https://doi.org/10.1186/s12864-024-10378-z


Received : 26 December 2023

Accepted : 06 May 2024

Published : 15 May 2024

DOI : https://doi.org/10.1186/s12864-024-10378-z


  • Genetic diversity
  • Phylogeography
  • Glacial refugia

BMC Genomics

ISSN: 1471-2164


Genetic Research Boosts Black-footed Ferret Conservation Efforts


DENVER –  Black-footed ferret recovery efforts aimed at increased genetic diversity and disease resistance took a bold step forward Dec. 10, 2020, with the birth of “Elizabeth Ann,” created from the frozen cells of “Willa,” a black-footed ferret that lived more than 30 years ago. The groundbreaking effort to explore solutions to help recover this endangered species results from an innovative partnership among the U.S. Fish and Wildlife Service and species recovery partners and scientists at Revive & Restore, ViaGen Pets & Equine, San Diego Zoo Global, and the Association of Zoos and Aquariums.

“The Service sought the expertise of valued recovery partners to help us explore how we might overcome genetic limitations hampering recovery of the black-footed ferret, and we’re proud to make this announcement today,” said  Noreen Walsh, Director of the Service’s Mountain-Prairie Region , where the Service’s National Black-footed Ferret Conservation Center is located. “Although this research is preliminary, it is the first cloning of a native endangered species in North America, and it provides a promising tool for continued efforts to conserve the black-footed ferret.”

“Maintaining and increasing wild populations and suitable habitat continues to be essential for black-footed ferret recovery and will remain a priority for the Service,”  Walsh continued . “Successful genetic cloning does not diminish the importance of addressing habitat-based threats to the species or the Service’s focus on addressing habitat conservation and management to recover black-footed ferrets.”

Today, all black-footed ferrets are descended from seven individuals, resulting in unique genetic challenges to recovering this species. Cloning may help address significant genetic diversity and disease resilience barriers to support habitat conservation and reestablishment of additional populations in the wild. Without sufficient genetic diversity, a species often becomes more susceptible to diseases and genetic abnormalities and may show limited adaptability to conditions in the wild and a decreased fertility rate. Limited genetic diversity makes it extremely difficult to fully recover a species.

This effort is consistent with the Service’s black-footed ferret recovery plan, which addresses the use of various assisted reproductive techniques. It also encourages the incorporation of any newly discovered black-footed ferrets into the current captive population and the use of these genetic materials to maximize genetic diversity.

Once thought to be extinct and currently listed as an endangered species, black-footed ferrets were brought back from nearly vanishing forever by the Service and its partners after a Wyoming rancher discovered a small population on his land in 1981. Ferrets from this population were captured by the Wyoming Game & Fish Department and others to begin a captive breeding program to recover the species.

This small number of individuals has put limitations on the species’ genetic diversity, creating challenges for resiliency to changing environments and emerging disease threats. Willa, a black-footed ferret captured among the last wild individuals, has no living descendants and is therefore not one of the seven founders. The Wyoming Game & Fish Department had the foresight to preserve her genes and sent tissue samples from Willa to San Diego Zoo Global’s Frozen Zoo in 1988. The Frozen Zoo established a cell culture and has stewarded these precious frozen cells ever since, making today’s achievement possible.

“San Diego Zoo Global’s Frozen Zoo was created more than 40 years ago with the hope that it would provide solutions to future conservation challenges,” said  Oliver Ryder, Director of Conservation Genetics, San Diego Zoo Global . “We are delighted that we have been able to cryobank and, years later, provide viable cell cultures for this groundbreaking project.”

A genomic study revealed Willa’s genome possessed three times more unique variations than the living population. Therefore, if Elizabeth Ann successfully mates and reproduces, she could provide unique genetic diversity to the species.

“We’ve come a long way since 2013 when we began the funding, permitting, design and development of this project with the U.S. Fish and Wildlife Service. Genomics revealed the genetic value that Willa could bring to her species,” said  Ryan Phelan, Revive & Restore Executive Director . “But it was a commitment to seeing this species survive that has led to the successful birth of Elizabeth Ann. To see her now thriving ushers in a new era for her species and for conservation-dependent species everywhere. She is a win for biodiversity and for genetic rescue.”

In 2018, the Service issued the first-ever recovery permit for cloning research of an endangered species, allowing Revive & Restore to initiate genetic analyses and proof of concept trials. This work builds upon recent advancements in cloning processes developed by ViaGen Pets & Equine, which successfully created embryos from the frozen cell line and implanted them into a domestic ferret surrogate.

“The ability to utilize our proven Somatic Cell Nuclear Technology to enable the cloning of such an ecologically important species is a great privilege,” said  ViaGen Pets and Equine President Blake Russell .

The surrogate mother was transferred from ViaGen Pets & Equine to the Service’s National Black-Footed Ferret Conservation Center (NBFFCC) mid-gestation to give birth to the cloned kit under the Service's authority. The NBFFCC staff’s extensive experience breeding and caring for black-footed ferrets ensured the safe arrival of the first U.S. endangered species clone. This research is still in the early stages, and researchers continue to closely monitor the young kit for viability and other developments. Elizabeth Ann and her surrogate mother are kept separate from other breeding black-footed ferrets, and she will live her life at the NBFFCC as additional research is completed. The team is working to produce more black-footed ferret clones in the coming months as part of continuing research efforts.

“Zoos accredited by the Association of Zoos and Aquariums, like San Diego Zoo, have worked for decades with the U.S. Fish and Wildlife Service to breed and reintroduce black-footed ferrets,” said  Dan Ashe, President and CEO of the Association of Zoos and Aquariums . “Today’s news is another exciting step in this inspiring recovery story, and AZA looks forward to more successful cooperation with the Service through its Saving Animals From Extinction (SAFE) program.”

The U.S. Fish and Wildlife Service works with others to conserve, protect, and enhance fish, wildlife, plants, and their habitats for the continuing benefit of the American people. For more information, visit www.fws.gov , or connect with us through any of these social media channels: Facebook, Twitter, Flickr, YouTube, and Instagram.

Revive & Restore  is the leading wildlife conservation organization promoting the incorporation of biotechnologies into standard conservation practice. The Sausalito, California nonprofit was formed in 2012 with the idea that 21st century biotechnology can and should be used to enhance genetic diversity, build disease resistance, facilitate adaptation and more. Its mission is to enhance biodiversity through the genetic rescue of endangered and extinct species.

ViaGen Pets & Equine is the worldwide leader in cloning the animals we love. We provide the option of hope through DNA storage of your unique dog, cat or horse. Then through our amazing cloning technology we provide joy to clients all over the world with a genetic twin to their original animal. Our team is dedicated to providing outstanding service, quality animal care and a love that lasts forever. ViaGen Pets & Equine is dedicated to conservation through partnership efforts with the San Diego Zoo and Revive & Restore.

San Diego Zoo Global  - Bringing species back from the brink of extinction is the goal of San Diego Zoo Global. As a leader in conservation, the work of San Diego Zoo Global includes on-site wildlife conservation efforts (representing both plants and animals) at the San Diego Zoo, San Diego Zoo Safari Park, and San Diego Zoo Institute for Conservation Research, as well as international field programs on six continents. The work of these entities is made accessible to over 1 billion people annually, reaching 150 countries via social media, our websites and the San Diego Zoo Kids network, in children’s hospitals in 12 countries. The work of San Diego Zoo Global is made possible with support from our incredible donors committed to saving species from the brink of extinction.

Association of Zoos and Aquariums  - Founded in 1924, the AZA is a nonprofit organization dedicated to the advancement of zoos and aquariums in the areas of conservation, animal welfare, education, science, and recreation. AZA is the accrediting body for the top zoos and aquariums in the United States and 12 other countries. Look for the AZA accreditation logo whenever you visit a zoo or aquarium as your assurance that you are supporting a facility dedicated to providing excellent care for animals, a great experience for you, and a better future for all living things. The AZA is a leader in saving species and your link to helping animals all over the world. To learn more, visit www.aza.org .


ScienceDaily

High genetic diversity discovered in South African leopards

Researchers say the discovery of very high genetic diversity in leopards found in the Highveld region of South Africa has increased the need for conservation efforts to protect leopards in the country.

Declan Morris, a PhD candidate with the University of Adelaide's School of Animal and Veterinary Sciences, led the research project, which discovered that the two maternal lineages of leopards found in Africa overlap in the Highveld, leading to the high genetic diversity.

One lineage can be found across most of the African continent, while the other is confined mostly to the Western Cape, Eastern Cape, KwaZulu-Natal and Mpumalanga regions of South Africa.

"We compiled the most comprehensive mitochondrial DNA (mtDNA) data set to date to explore the trends and leopard genetics on a continental scale," says Morris.

"The results of our analysis, using a combination of mtDNA, microsatellites, and comparisons with results of other published studies, is what enabled us to determine that the leopard population in the Highveld of Mpumalanga had the highest levels of genetic diversity in the country."

Genetic diversity is important for a species' long-term survival.

"High genetic diversity increases the ability for a species to adapt to a changing environment around it; therefore, it can make species more resilient to events such as climate change or the introduction of new diseases," says Morris.

"The discovery that the leopards in the Highveld have the highest recorded levels of genetic diversity in South Africa is significant as it places a high conservation priority for the population in the region."

It is likely the two lineages of leopards diverged between 960,000 and 440,000 years ago due to the aridification of the Limpopo basin between 1,000,000 and 600,000 years ago. Both leopard lineages are now commingling in the Mpumalanga Province, where Morris' PhD work was conducted.

"We had originally hypothesised that the Highveld leopards would be isolated as they exist in a highly fragmented region, but this discovery shows us that it's not as isolated as we thought," Morris says.

"Gene flow is occurring with Lowveld areas and Kruger National Park. We found an unexpected level of connectivity, even across landscapes highly modified by humans."

Morris, whose research team included the University of Adelaide's Dr Todd McWhorter and Associate Professor Wayne Boardman, and collaborators from University of Pretoria and University of Venda, hopes this discovery will place a higher importance on the conservation of leopard populations in South Africa.

"This information will hopefully help change attitudes towards the management of leopards and be used to inform management decisions -- such as choosing translocation instead of issuing destruction permits for problem-causing animals," he says.

"One of the biggest measures that could protect leopards in the Highveld is community engagement. Building better, stronger relationships between the community, government, researchers, and conservation organisations allows for efficient, targeted management programs to be designed."

This discovery was published in the journal PeerJ and builds upon another recent leopard study published by the research team.


Story Source:

Materials provided by University of Adelaide . Original written by Johnny von Einem. Note: Content may be edited for style and length.

Journal Reference :

  • Declan R. Morris, Todd J. McWhorter, Wayne S. J. Boardman, Gregory Simpson, Jeanette Wentzel, Jannie Coetzee, Yoshan Moodley. Unravelling the maternal evolutionary history of the African leopard (Panthera pardus pardus) . PeerJ , 2024; 12: e17018 DOI: 10.7717/peerj.17018

AFRICAN GENETIC DIVERSITY: Implications for Human Demographic History, Modern Human Origins, and Complex Disease Mapping

Michael C. Campbell

1 Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19107; mcam@mail.med.upenn.edu

Sarah A. Tishkoff

2 Department of Biology, University of Pennsylvania, School of Arts and Sciences, Philadelphia, Pennsylvania 19104; tishkoff@mail.med.upenn.edu

Comparative studies of ethnically diverse human populations, particularly in Africa, are important for reconstructing human evolutionary history and for understanding the genetic basis of phenotypic adaptation and complex disease. African populations are characterized by greater levels of genetic diversity, extensive population substructure, and less linkage disequilibrium (LD) among loci compared to non-African populations. Africans also possess a number of genetic adaptations that have evolved in response to diverse climates and diets, as well as exposure to infectious disease. This review summarizes patterns and the evolutionary origins of genetic diversity present in African populations, as well as their implications for the mapping of complex traits, including disease susceptibility.

INTRODUCTION

One of the “grand challenges” of the post-genome era is to “develop a detailed understanding of the heritable variation in the human genome” ( 36 ). By characterizing genetic variation among individuals and populations, we may gain a better understanding of differential susceptibility to disease, differential response to pharmacological agents, human evolutionary history, and the complex interaction of genetic and environmental factors in producing phenotypes. Africa is an important region to study human genetic diversity because of its complex population history and the dramatic variation in climate, diet, and exposure to infectious disease, which result in high levels of genetic and phenotypic variation in African populations. A better understanding of levels and patterns of variation in African genomes, together with phenotype data on variable traits, including susceptibility to disease and drug response, will be critical for reconstructing modern human origins, the genetic basis of adaptation to diverse environments, and the development of more effective vaccines and other therapeutic treatments for disease. This information will also be important for identifying variants that play a role in susceptibility to a number of complex diseases in people of recent African ancestry ( 172 , 187 , 208 ).

HUMAN EVOLUTIONARY HISTORY IN AFRICA

Africa is a region of considerable genetic, linguistic, cultural, and phenotypic diversity. There are more than 2000 distinct ethno-linguistic groups in Africa, speaking languages that constitute nearly a third of the world’s languages ( http://www.ethnologue.com/ ) ( Figure 1 ). These populations practice a wide range of subsistence patterns including various modes of agriculture, pastoralism, and hunting-gathering. Africans also live in climates that range from the world’s largest desert and second largest tropical rainforest to savanna, swamps, and mountain highlands, and these climates have, in some cases, undergone dramatic changes in the recent past ( 106 , 172 ).

Figure 1

A map of African language family distributions and hypothesized migration events within and out of Africa. African languages have been classified into four major language families: Niger-Kordofanian (spoken predominantly by agriculturalist populations across a broad geographic distribution in Africa), Afro-Asiatic (spoken predominantly by northern and eastern Africa pastoralists and agropastoralists), Nilo-Saharan (spoken predominantly by eastern and central African pastoralists), and Khoisan (a language containing click-consonants, spoken by southern and eastern African hunter-gatherer populations). Also plotted are the geographic origins of African samples included in the Center d’Etude du Polymorphisme Humain (CEPH) Human Genome Diversity Panel (CEPH-HGDP). Diagram adapted from Reference 170 .

According to the Out of Africa (OOA) model of modern human origins, anatomically modern humans originated in Africa and then spread across the rest of the globe within the past ~100,000 years ( 206 ). The transition to modern humans within Africa was not sudden; rather, the paleobiological record indicates an irregular mosaic of modern, archaic, and regional morphological and behavioral traits that occurred over a substantial period of time and across a broad geographic range within Africa ( 127 ). The earliest known derived suite of morphological traits associated with modern humans appears in fossil remains from Ethiopia, dated to ~150–190 kya ( 128 , 229 ). However, this finding does not rule out the existence of modern morphological traits in other regions of Africa before 100 kya, particularly where specimens may be less well preserved and/or where extensive archaeological and paleobiological investigations have not been conducted ( 172 ). Indeed, a multiregional origin model for modern humans within Africa is not as unlikely as it would be for global populations, considering the greater potential for migration and admixture within a single continental region ( 172 , 241 ). A more fully modern suite of traits appears in East Africa and Southwest Asia around 90 kya, followed by a rapid spread of modern humans throughout the rest of Africa and Eurasia within the past 40,000–80,000 years ( 120 , 172 ) ( Figure 2 ).

Figure 2

Ancestral Africans have maintained a large and subdivided population structure and have experienced complex patterns of population expansions, contractions, migration, and admixture during their evolutionary history. The bottleneck associated with the founding of non-African populations (~50–100 kya) resulted in lower levels of genetic diversity, an increase in linkage disequilibrium (LD), and more similar patterns of LD. In addition, several recent studies have suggested that a serial founder model of migration occurred in the history of non-Africans in which the geographic expansion of these populations occurred in many small steps, and each migration involved a sampling of variation from the previous population ( 36 , 90 , 109 , 167 ). Solid horizontal lines indicate gene flow between populations and the dashed horizontal line indicates recent gene flow from Asia to Australia/Melanesia.

Two migration routes of modern humans out of Africa have been proposed. The presence of modern humans in Oceania as early as ~50 kya ( 65 , 66 ), which predates their presence in Europe ~40 kya, has suggested a southern coastal route around the Indian Ocean in which modern humans first left Africa (possibly via Ethiopia) by crossing the Bab-el-Mandeb strait at the mouth of the Red Sea and then rapidly migrated to Southeast Asia and Oceania ( 62 , 172 ) ( Figure 1 ). This migration model is supported by the presence of very old mtDNA haplotypes in South Asia and their absence in the Levant ( 120 , 168 , 197 ). Other models have traditionally favored a second (or single) northern route via the Sinai Peninsula into the Levant ( 62 , 172 ) ( Figure 1 ). Regardless of the route of migration of modern humans out of Africa, the shared patterns of genetic diversity among non-African populations [e.g., at the CD4 locus ( 200 )] and the divergent patterns of genetic variation among African populations argue against repeated sampling of African diversity from multiple source populations ( 172 , 200 , 206 ). However, analyses of more independent loci and a larger number of African populations, particularly from East Africa, will be necessary to better estimate the number and source of migration events out of Africa ( 172 ). After modern humans migrated from Africa, there could have been some admixture of modern humans with archaic populations in Eurasia, such as Neanderthals. This hypothesis remains a topic of considerable interest and debate and is the subject of a number of recent studies and reviews ( 46 , 59 , 71 , 73 , 77 , 144 , 158 , 172 , 184 , 185 , 224 )

The migration of modern humans out of Africa is thought to be accompanied by a population bottleneck. The size of the population(s) migrating out of Africa is estimated to be ~600 effective founding females (i.e., census size of ~1800 females) on the basis of mtDNA evidence ( 62 , 120 ), to be ~1000 effective founding males and females (i.e., census size of ~3000 individuals) based on the analysis of 783 autosomal microsatellites genotyped in the Center d’Etude du Polymorphisme Humain (CEPH) human genome diversity panel (HGDP) ( 112 ), and to be ~1500 (i.e., a census size of ~4500 individuals) based on a combined analysis of mtDNA, Y chromosome, and X chromosome nucleotide diversity data ( 72 ). These estimates imply that Eurasians must have rapidly expanded to a larger size to account for estimates of a long-term effective population size (N e ) of ~10,000 individuals (census size of ~30,000 individuals) for global populations ( 172 , 243 ). Indeed, several recent studies indicate a rapid expansion of Eurasian populations within the past ~50,000 years, whereas Africans have maintained a large effective population size ( 72 , 125 , 243 ).

PATTERNS OF GENETIC VARIATION IN AFRICA

The pattern of genetic variation in modern African populations is influenced by demographic history (e.g., changes in population size, short- and long-range migration events, and admixture) as well as locus-specific forces such as natural selection, recombination, and mutation. For example, the migration of agricultural Bantu speakers from West Africa throughout sub-Saharan Africa within the past ~4000 years and subsequent admixture with indigenous populations has had a major impact on patterns of variation in modern African populations ( 157 , 167 , 172 , 201 , 235a ) ( Figure 1 ). Although Africa is critical for understanding modern human origins and genetic risk factors for disease, it has been under-represented in human genetic studies. Much of what we currently know about genetic diversity is from a limited number of the ~2000 ethno-linguistic groups in Africa, and the majority of these data are from mtDNA and Y chromosome studies. Large-scale autosomal studies of African genetic diversity are only now beginning to become available.

Mitochondrial DNA and Y Chromosome Variation

Phylogenetic analyses of both mtDNA and Y chromosome DNA indicate that the oldest lineages are specific to Africa and have a Time to Most Recent Common Ancestor (TMRCA) of ~200 kya ( 75 , 206 ). Interestingly, the most ancient mtDNA lineage (L0d) [dated to ~106 kya ( 75 )], which is common in click-speaking southern African Khoisan (SAK) populations, has recently been identified at low frequency (5%) in the click-speaking Sandawe population from Tanzania ( 75 , 201 ). The maximum likelihood estimate for the time of divergence of these populations based on all mtDNA lineages is ~44 kya, indicating that any common ancestry is quite old. This finding supports studies of classical polymorphisms as well as archeological data that suggest that Khoisan-speaking populations may have originated in eastern Africa and subsequently migrated into southern Africa ( 26 ), although a southern African origin of Khoisan-speakers cannot be ruled out.

Phylogenetic analysis indicates that the most recent African-specific mtDNA haplogroup lineage, L3, is the likely precursor of modern European and Asian mtDNA haplotypes ( 226 ). Indeed, a subset of this lineage (M1) is observed at high frequency in Ethiopian populations ( 101 , 168 ) and may have expanded out of Africa ~60 kya ( 168 ). This observation adds strength to the proposal that the dispersal of modern humans out of Africa may have occurred via Ethiopia ( 117 , 200 ). However, more recent analysis of whole mtDNA genomes suggests that the M1 lineage may have originated in southwestern Asia and then was introduced into East Africa from Asia ~40–45 kya ( 150 ), whereas others have argued for a much more recent introduction of the M1 lineage into Africa from the Middle East ( 63 ).

Nucleotide and Haplotype Variation

The migration of modern humans out of Africa resulted in a population bottleneck and a concomitant loss of genetic diversity ( 112 , 169 ). Numerous studies have shown higher levels of nucleotide and haplotype diversity in Africans compared to non-Africans in both nuclear and mitochondrial genomes ( 40 , 72 , 93 , 111 , 200 , 202 , 206 , 208 ). Non-African populations appear to have a subset of the genetic diversity present in sub-Saharan Africa and more private alleles and haplotypes are observed in Africa relative to other regions ( 38 , 93 , 111 , 169 , 200 , 202 , 206 , 208 , 243 ) as expected under an OOA model. For example, a resequencing study of 3873 genes in 154 chromosomes from European, Latino/Hispanic, Asian, and African American populations observed that African Americans had the highest percentage of rare single nucleotide polymorphisms (SNPs) (64%) and the lowest percentage of common SNPs (36%). Additionally, 44% of all SNPs in this population were private ( 78 ). The high level of genetic diversity in African populations is also consistent with a larger long-term effective population size ( N e ) compared to non-Africans ( 72 , 195 , 196 , 202 , 206 ); N e is estimated to be ~15,000 for Africans and ~7500 for non-Africans based on a resequencing analysis of several 10-kb regions ( 243 ) (see Supplemental Material ).

Structural Variation

Although most studies of genetic variation in humans have focused on nucleotide and microsatellite diversity, a number of recent studies have demonstrated considerable amounts of structural variation (SV) in the human genome, including both copy number variation (which can include insertions and deletions as well as gene duplications) and inversions ( 17 , 37 , 191 , 211 ) ( http://projects.tcag.ca/variation/ ). Some of these structural variants are also associated with phenotypic variability ( 37 , 171 , 193 ). For example, variation in copy number of the amylase gene, which plays a role in digestion of starch, is correlated with enzyme activity level and with diet in ethnically diverse human populations ( 156 ). Additionally, SVs may play an important role in susceptibility to common disease ( 109 , 124 ). A recent study that used high-resolution paired-end mapping to identify SVs in the genomes of a single African (Yoruba from Nigeria) individual and an individual of European descent led to the identification of 1175 insertions/deletions (INDELs) and 122 inversions ( 103 ). By extrapolation, these researchers predicted 761 and 887 SVs in the full genomes of these European and African individuals, respectively. Additionally, 45% of the SVs were shared between these samples, suggesting that a large proportion of SV events occurred prior to the divergence of African and non-African populations. The majority of these SVs were less than 10 kb in size, but at least 15% were larger than 100 kb and some SVs were predicted to be several megabases in size in both the European and African sample, indicating that the genomes of healthy individuals may differ by megabases of nucleotide sequence ( 103 ). To date, few population genetic studies of SVs across ethnically diverse populations have been performed ( 37 ). Instead, most studies have focused on the European, Japanese, Chinese, and African (Yoruba) HapMap populations ( 37 ). 
A study of 67 common copy number variants (CNVs) in these populations indicated that 11% of the variation was due to differences among populations and that many of the variants were shared among populations from different regions, further supporting the argument that these variants existed prior to migration of modern humans out of Africa ( 171 ). There are currently no studies of SV variability within and between ethnically diverse African populations. Such knowledge will be informative for reconstructing human evolutionary history and for understanding the role of SVs in normal phenotypic diversity and in susceptibility to disease.

POPULATION STRUCTURE IN AFRICA

Measures of population structure on a global level indicate that only ~10%–16% (Wright’s fixation index, F ST = 0.10–0.16) of observed genetic variation is due to differences among populations from Africa, Europe, and Asia ( 26 , 40 , 206 , 228 ). Analysis of population structure using the program STRUCTURE ( 162 ), based on 1048 individuals from the CEPH human diversity panel genotyped for 993 genome-wide microsatellite and insertion/deletion markers, indicates that individuals cluster into five major geographic regions: Africa, Europe/Middle East, East Asia, Oceania, and the New World ( 175 ). Two recent studies of >500,000 SNPs genotyped in the CEPH diversity panel support these initial findings ( 93 , 111 ). Analyses within the African populations indicate that additional substructure exists, particularly between hunter-gatherer and agriculturalist populations ( 93 , 111 ). However, the CEPH diversity panel includes just eight African populations, four of which are agricultural Bantu-speakers likely to share recent common ancestry ( Figure 1 ). Thus, results from these studies may not reflect the full extent of population structure within Africa.
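To make the F ST figures above concrete, the following minimal sketch computes Wright's fixation index for a single biallelic locus as (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the mean expected heterozygosity within subpopulations. The function name and the allele frequencies are hypothetical and chosen only for illustration; the studies cited here used multilocus estimators on real data.

```python
def fst_biallelic(freqs, weights=None):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs   -- allele frequency of one allele in each subpopulation
    weights -- relative subpopulation sizes (defaults to equal weights)
    """
    n = len(freqs)
    if weights is None:
        weights = [1.0 / n] * n
    # Pooled allele frequency and total expected heterozygosity
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within subpopulations
    h_s = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, freqs))
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no variation to partition
    return (h_t - h_s) / h_t


# Hypothetical frequencies in three regional groups (not real data)
print(round(fst_biallelic([0.3, 0.5, 0.7]), 3))
```

With these made-up frequencies, roughly 11% of the variation lies among groups, which falls inside the 0.10–0.16 range quoted above.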

Several studies of nucleotide and haplotype variation have indicated that ancestral African populations were geographically structured prior to the migration of modern humans out of Africa ( 72 , 73 , 82 , 158 , 200 , 241 ). Additionally, a recent study of 800 short tandem repeat polymorphisms (STRPs) and 400 INDELs genotyped in more than 3000 geographically and ethnically diverse Africans indicates the presence of at least 13 genetically distinct ancestral populations in Africa and high levels of population admixture in many regions (F.A. Reed and S.A. Tishkoff, unpublished data). Population clusters are correlated with self-described ethnicity and shared cultural and/or linguistic properties (e.g., Pygmies, Khoisan-speaking hunter-gatherers, Bantu speakers, Cushitic speakers). This study reveals extensive admixture between inferred ancestral populations in most African populations. One exception is among West African Niger-Kordofanian (i.e., Bantu) speakers who are more genetically homogeneous compared with other African populations, likely reflecting the recent and rapid spread of Bantu speakers from a common origin in Cameroon/Nigeria (although fine-scale genetic structure can be detected amongst these populations). Thus, the pattern of genetic diversity in Africa indicates that African populations have maintained a large and subdivided population structure throughout much of their evolutionary history ( Figure 2 ). Historic subdivision among African populations is likely due to ethnic and linguistic barriers, as well as a number of geographic, ecological, and climatic factors (including periods of glaciation and warming) that could have contributed to population expansions, contractions, fragmentations, and extinctions during recent human evolution in Africa ( 172 , 206 ).

PATTERNS OF LINKAGE DISEQUILIBRIUM

Linkage disequilibrium (LD), the nonrandom association between alleles at different loci, is typically measured using two different estimators: D′ and r² ( 161 ). Levels and patterns of LD depend on a number of demographic factors including population size and structure, as well as locus-specific factors such as selection, mutation, recombination ( 1 , 161 , 206 , 207 ), and gene conversion (see Supplemental Material ). LD is particularly useful for inferring evolutionary and demographic processes, as well as for mapping disease-susceptibility loci. Therefore, an understanding of levels and patterns of LD has broader implications for studies of human evolutionary history and disease.
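As an illustration of the two LD estimators named above, the sketch below computes the raw disequilibrium coefficient D, its normalized absolute value |D′|, and the squared correlation r² for two biallelic loci, using the standard textbook definitions. The function name and the haplotype frequencies are assumptions made for the example and are not drawn from any dataset discussed in this review.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab -- frequency of the haplotype carrying allele A and allele B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    # D is bounded by the allele frequencies; D' rescales it by that bound
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical case: allele B occurs only on chromosomes that carry A
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
print(d, d_prime, r2)
```

In this made-up case |D′| is at its maximum because one of the four possible haplotypes is absent, while r² is much lower because the allele frequencies at the two loci differ. This is one reason the two measures are usually reported together.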

Empirical Studies of Linkage Disequilibrium

Several haplotype studies have indicated lower levels of LD in African populations compared to non-Africans ( 200 , 202 , 206 , 207 ). Studies of long-range LD between SNP markers at multiple nuclear loci confirmed these initial results and demonstrated that haplotype blocks (where SNPs are in strong LD) extend over greater genomic distances and are more uniform in non-Africans compared to African populations ( 69 , 116 , 174 , 183 ).

Given that recombination is an important determinant of the extent of LD, an alternative way to assess LD is to estimate the population recombination rate (ρ = 4N e r , where N e is effective population size and r is the meiotic recombination rate/kb) ( 161 ). Empirical studies have shown that African Americans have higher ρ estimates compared to Europeans and Asians ( 33 , 47 , 60 ), consistent with the results of previous studies that described less LD in Africans relative to non-African populations. The divergent patterns of LD can be explained by the distinct demographic histories of African and non-African populations ( 206 , 208 ) ( Figure 2 ). Specifically, African populations have shorter blocks of LD because ancestral Africans maintained a larger effective population size (N e ), and because there has been more time for recombination to decrease levels of LD. Greater LD in non-African populations is likely the result of a founding event during expansion of modern humans out of Africa within the past 100,000 years ( 200 , 206 , 208 ).
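The relationship ρ = 4N e r can be worked through numerically. The sketch below plugs in the long-term N e estimates quoted earlier in this review (~15,000 for Africans and ~7500 for non-Africans); the per-kb recombination rate is an assumed round value used only for illustration, and the function name is hypothetical.

```python
def population_recombination_rate(n_e, r_per_kb):
    """Return rho = 4 * N_e * r, expressed per kb."""
    return 4.0 * n_e * r_per_kb


# Assumption: an illustrative recombination rate per kb per generation,
# not a value taken from the text
R_PER_KB = 1e-5

rho_african = population_recombination_rate(15_000, R_PER_KB)
rho_non_african = population_recombination_rate(7_500, R_PER_KB)

# For a fixed r, rho scales linearly with N_e
print(rho_african / rho_non_african)
```

Because r is held fixed here, a twofold difference in N e translates directly into a twofold difference in ρ, whatever value of r is assumed. Real estimates also reflect demographic history and local variation in r, so this is only a first-order picture.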

However, an ongoing challenge has been to characterize patterns of LD among populations within continental regions, especially in Africa. Some evidence has suggested variance in levels and patterns of LD among subpopulations in Africa. Tishkoff and colleagues ( 200 ) noted that African populations have divergent patterns of LD; specifically, alleles that were in positive association in one population were in negative association in another. Additionally, a resequencing analysis of the IL-13 gene in 126 geographically diverse Africans identified divergent patterns of LD across West and East African populations ( 195 ). These observations suggest that not all African populations are characterized by a single discrete pattern of LD and each may have distinct haplotype block structures ( Figure 3 ). Theoretically, under a model of population subdivision, allelic associations can differ between populations due to the stochastic effects of genetic drift ( 1 ).

Figure 3

Several analyses have indicated that haplotype blocks [where single nucleotide polymorphisms (SNPs) are in strong linkage disequilibrium (LD)] extend over greater genomic distances and are more uniform in non-Africans compared to African populations. Additionally, the size and location of haplotype blocks can vary among African samples owing to the distinct demographic histories of populations from different geographic regions in Africa. The blue bars represent haplotype blocks and the thin orange bars denote regions of recombination. Vertical lines indicate SNPs and vertical arrows indicate haplotype tag SNPs (htSNPs). Because haplotype blocks are more variable in Africans compared to non-Africans, identification of htSNPs in diverse African ethnic groups and more dense tag htSNP coverage are needed to detect an association between marker(s) and disease loci in association studies.

Recombination Hotspots

Recombination hotspots, where historical crossing-over events are clustered and separate relatively large haplotype blocks, have been a topic of considerable interest in the scientific literature ( 12 , 42 , 43 ). The occurrence of recombination hotspots in human DNA has been demonstrated empirically from studies of single sperm DNA and from pedigree analyses ( 12 , 43 , 94 ). However, the extent to which this pattern is a general feature of the genome remains unknown, particularly because genetic drift can result in a similar pattern of LD ( 12 , 47 ).

Several recent studies have observed that the locations of most hotspots tend to be shared between diverse populations ( 38 , 47 , 76 ). However, several datasets have suggested that some hotspots may be population specific ( 34 , 38 , 47 , 76 ) and that African and African-American populations have more recombination hotspots relative to non-Africans ( 34 , 47 ). Given that recombination rates vary between species ( 165 , 223 ) and even individuals ( 43 , 101a , 102 ), it is possible that hotspots could also differ among ethnically distinct populations, including Africans. Indeed, the identification of haplotypes at the RNF212 gene associated with recombination rate, which occur at different frequencies in the HapMap populations ( 102 ), raises the possibility that population-specific genetic variants may influence recombination rates.

Implications of Linkage Disequilibrium for Association Studies

The mapping of complex disease genes relies on the identification of an association of polymorphic markers, either individually or as haplotypes, with disease susceptibility loci ( 207 ). The International HapMap Project ( http://www.hapmap.org/ ) has characterized patterns of haplotype structure and LD across the human genome to facilitate mapping of complex disease genes ( 40 , 41 , 129 ). Another goal of this project has been to identify haplotype tag SNPs (htSNPs) that distinguish major haplotypes, thereby reducing the number of SNPs needed for association studies ( 207 ).

To date, 3.4 million SNPs have been characterized in 270 individuals from four populations: Yoruba from Nigeria, European-Americans, Japanese, and Chinese. Knowledge of the frequency and distribution of these SNPs across ethnically diverse populations is important to assess their usefulness as markers for gene mapping studies in diverse ethnic groups ( 206 , 207 ). A survey of 3024 SNPs spaced across 36 genomic regions genotyped in 927 unrelated individuals from the CEPH human genetic diversity panel indicates that although haplotype block sharing with the HapMap populations is high in European and East Asian populations, sharing for most other populations is low, particularly for haplotypes in African hunter-gatherer populations ( 38 ). These results suggest that development of distinct panels of htSNPs and more dense coverage of SNPs will be needed for African populations ( 38 , 207 ). Seven additional populations have been added to the HapMap initiative: Luhya (Bantu) and Maasai (Nilo-Saharan) from Kenya, Tuscans from Italy, Gujarati Indians from Texas, metropolitan Chinese in Denver, people of Mexican ancestry in Los Angeles, and African Americans from the Southwest United States. The characterization of SNP and haplotype diversity in these additional populations will be important for the identification of htSNPs that are more informative across ethnically diverse populations.

Although the SNPs used in the HapMap study have been highly informative for use in association mapping studies, the initial identification of SNPs in one or a few populations can result in an ascertainment bias (AB) toward high-frequency, presumably older, SNPs. Several studies have shown that AB can distort estimates of migration rate ( 221 ), mutation rates ( 140 ), recombination rates ( 33 , 140 ), and LD ( 7 , 33 ). Although the effects of AB can sometimes be corrected ( 33 , 142 , 143 ), these correction methods make a number of assumptions that are not applicable to African populations, including the assumption of no population substructure ( 142 ). To more accurately infer human genetic variation it will be necessary to characterize the entire frequency distribution of nucleotide variants in diverse populations. Additionally, because variants associated with disease could be geographically restricted due to new mutation, genetic drift, or regional-specific selection pressure, de novo identification of genetic variation in diverse African populations will be important. The HapMap ENCODE ( http://www.hapmap.org ) and the proposed “1000 genomes” ( http://www.genome.gov ) resequencing projects aim to discover novel variation, including rare SNPs and structural variants, in targeted regions of the genome (as well as in whole genomes for a subset of samples) from the extended HapMap populations and other ethnically diverse populations. The extensive levels of substructure identified in Africa will likely require analysis of additional ethnically and geographically diverse African populations.

NATURAL SELECTION

Natural selection, the process by which favorable heritable traits become more common in successive generations, operates to either increase or decrease the frequency of mutations that have an effect on an individual’s fitness. When a mutation is advantageous it can rapidly increase in frequency, together with linked variants (i.e., genetic hitchhiking), due to positive selection and replace pre-existing variation in a given population (i.e., a selective sweep) ( 83 , 141 , 180 , 206 ). The strength of selection and local rates of recombination dictate how large of a genomic region is affected by a selective sweep. If selection is recent, there may not be enough time for the selected variant to become fixed in the population, resulting in an incomplete selective sweep. The genetic signatures of a selective sweep include a region of extensive LD [extended haplotype homozygosity (EHH)] and low variation on high-frequency chromosomes with the derived beneficial mutation relative to chromosomes with the ancestral allele ( 179 , 203 , 219 ). After this selective sweep, given enough time, new mutations and recombination will occur, leading to an excess of rare variants and a decrease in the extent of LD. Weak purifying selection is also expected to result in an increase of low frequency variants. Under this scheme of selection, deleterious mutations entering the population generally remain at low frequencies because their adverse effect on fitness makes it unlikely that they will reach high frequencies. In contrast, long-term balancing selection (resulting from greater fitness of heterozygotes or when maintenance of multiple alleles in a population is adaptively advantageous) is expected to result in an excess of alleles at intermediate frequency.
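The EHH signature described above can be made concrete with a small sketch. It follows the usual definition of EHH as the probability that two randomly chosen chromosomes carrying the same core allele are identical over the interval from the core to a given marker. The function name, the encoding of haplotypes as strings of 0s and 1s, and the toy data are assumptions for illustration only, not data from any study cited here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, end_index):
    """Extended haplotype homozygosity between a core site and a marker.

    haplotypes -- equal-length strings, all carrying the same core allele
    core_index -- position of the core site within each string
    end_index  -- position of the marker out to which EHH is evaluated
    """
    lo, hi = sorted((core_index, end_index))
    n = len(haplotypes)
    if n < 2:
        return 0.0  # need at least two chromosomes to form a pair
    # Group chromosomes that are identical over the whole interval
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of chromosome pairs that are identical over the interval
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (hypothetical): the core site is at index 0
derived_carriers = ["1101", "1101", "1101", "1100"]
ancestral_carriers = ["0010", "0111", "0100", "0001"]

for end in (1, 3):
    print(end, ehh(derived_carriers, 0, end), ehh(ancestral_carriers, 0, end))
```

In this toy example, homozygosity decays more slowly on chromosomes with the derived allele than on those with the ancestral allele, which is the contrast an EHH-based scan looks for around a recently selected variant.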

Demographic processes can also cause similar skews in the frequency of polymorphisms. For example, genetic drift has less effect in a rapidly expanding population, leading to an excess of rare polymorphisms (mimicking the pattern seen under positive or purifying selection). In contrast, a population bottleneck is expected to cause the loss of low-frequency variants, and thus produce an excess of intermediate-frequency variants (mimicking the pattern observed under balancing selection) ( 141 ). Although natural selection and demographic history can cause similar departures from a neutral equilibrium model, it is possible to distinguish these forces either by simulating the expected pattern of variation under different demographic scenarios or by using an outlier approach in which targets of selection are identified because they show an unusual pattern of variation or population differentiation compared with an empirical distribution observed at other loci across the genome ( 98 , 141 ). Given the vast number of studies published on natural selection, the next sections focus on a few case studies of genetic and phenotypic adaptation in sub-Saharan Africa.
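The outlier approach mentioned above can be sketched very simply: compute a summary statistic at every locus (for example F ST or a measure of haplotype homozygosity), rank the loci, and flag those in the extreme tail of the genome-wide empirical distribution. The function name, the tail fraction, and the locus labels below are hypothetical; real scans must also consider which tail is relevant for the statistic and how many false positives a given cutoff admits.

```python
def empirical_outliers(stats, tail=0.01):
    """Return loci whose statistic lies in the upper `tail` fraction.

    stats -- dict mapping a locus name to its summary statistic
    tail  -- fraction of loci to flag (at least one locus is returned)
    """
    ranked = sorted(stats.items(), key=lambda kv: kv[1], reverse=True)
    k = max(1, int(len(ranked) * tail))
    return [name for name, _ in ranked[:k]]


# Hypothetical per-locus values of some differentiation statistic
toy_stats = {
    "locus1": 0.05, "locus2": 0.08, "locus3": 0.11, "locus4": 0.09,
    "locus5": 0.62, "locus6": 0.07, "locus7": 0.10, "locus8": 0.45,
}
print(empirical_outliers(toy_stats, tail=0.25))
```

Flagged loci are candidates only: because demography shapes the whole empirical distribution, an outlier is unusual relative to the rest of the genome but is not by itself proof of selection.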

Malaria Resistance in Africa

Malaria (caused by infection with the Plasmodium falciparum parasite) is a major cause of mortality in sub-Saharan Africa, resulting in more than 1 million deaths (primarily children) each year ( 107 ). Given the enormous impact of malaria in Africa, it is not surprising that this disease has exerted strong selective pressure on African populations during recent human evolutionary history.

A number of genetic variants in African populations have been shown to confer resistance to malaria. One of the best known genetic adaptations is the HbS mutation in the β-globin gene, which causes sickle cell anemia in homozygous individuals. Individuals who are heterozygous for the sickle cell trait are protected against malarial infection and have higher reproductive fitness ( 107 ), which results in the maintenance of the HbS allele at high frequency in many malaria endemic regions. A recent genetic study observed long range LD extending over 400 kb at the β-globin locus on haplotypes with the HbS mutation in West African and Caribbean African populations, consistent with the pattern expected under positive selection ( 81 ). Other well-known hemoglobin variants associated with malaria resistance in African populations include hemoglobins C (HbC) and E (HbE). Studies have also identified patterns of LD on chromosomes that contain either the HbC variant ( 236 ) or the HbE variant ( 147 ) that are consistent with recent positive selection.

Glucose-6-phosphate dehydrogenase ( G6PD ) mutations that result in reduced enzyme activity are also associated with malaria resistance ( 205 ). The most common G6PD mutation in Africa, G6PD A-, occurs at a frequency of ~25% in malaria endemic regions ( 205 ). Several empirical studies have found evidence for recent selection at the G6PD locus in African populations. For example, a study of SNP and microsatellite haplotype variability demonstrated a signature of strong positive selection of the A- variant ( 205 ). On the basis of the breakdown of LD between the microsatellite markers and the G6PD A- variant, the age of this variant was estimated to be between 3840 and 11,760 years ( 205 ). Similarly, nucleotide sequence analyses of the G6PD locus in Africa also showed patterns of variation consistent with recent positive selection ( 181 , 217 ). A signature of a recent partial selective sweep was also supported by two studies that showed extensive LD extending >400 kb on chromosomes with the G6PD A- mutation ( 179 , 182 ). A comparative analysis of human and nonhuman primates suggested that signatures of selection at the A- allele are unique to humans ( 218 ). Overall, these data are consistent with other evidence suggesting that the malaria parasite has had a significant impact on humans only within the past 10,000 years ( 96 , 220 ), possibly corresponding with the development of agriculture and/or pastoralism in Africa ( 205 ).
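The logic of dating a variant from the breakdown of LD can be illustrated with a textbook approximation: if a fraction P of chromosomes carrying the variant still carry the ancestral marker allele, and the recombination fraction between marker and variant is c per generation, then P ≈ exp(−c·t) after t generations. The sketch below applies this approximation only; it is not the procedure used in the cited study, it ignores marker mutation and population growth, and the 25-year generation time and all input values are assumptions.

```python
from math import log


def allele_age_years(ancestral_fraction, recombination_fraction, generation_years=25.0):
    """Approximate allele age from LD decay, using P = exp(-c * t).

    ancestral_fraction: share of variant-bearing chromosomes that still carry
        the ancestral marker allele (0 < P <= 1).
    recombination_fraction: recombination fraction c between marker and
        variant per generation.
    generation_years: assumed generation time in years.
    """
    if not 0 < ancestral_fraction <= 1:
        raise ValueError("ancestral_fraction must be in (0, 1]")
    if recombination_fraction <= 0:
        raise ValueError("recombination_fraction must be positive")
    generations = -log(ancestral_fraction) / recombination_fraction
    return generations * generation_years


# Hypothetical inputs: half of the haplotypes are still ancestral, c = 0.01.
print(round(allele_age_years(0.5, 0.01)))
```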

The Duffy gene on chromosome 1 confers resistance to malaria caused by Plasmodium vivax , which is not prevalent in Africa today but may have been in the past. The Duffy gene encodes a receptor on the surface of erythrocytes and is characterized by three alleles ( FY*A , FY*B , and FY*O ). The frequency of the FY*O allele is at or close to fixation in most sub-Saharan African populations, but is very rare outside of Africa ( 79 ). A resequencing study of nucleotide variation at the FY locus in five sub-Saharan African populations, and in a comparative Italian population ( 79 ), reported that variation at this locus is two- to threefold lower in Africans than in the Italian sample, which is the opposite of the pattern observed at most loci. A more extensive resequencing analysis of this locus ( 80 ) also revealed reduced sequence variation around the FY*O mutation and an excess of high-frequency derived alleles at linked sites, consistent with a selective sweep of this region. Additionally, researchers observed unusually large F ST values for the FY*A and FY*O variants at this locus across African, European, and Asian populations, consistent with local adaptation in different geographic regions. These results have led to the conclusion that positive selection has been a dominant force in shaping the distribution of Duffy alleles among human populations.
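To make the F ST comparison concrete, the sketch below computes Wright's F ST for a single biallelic site as (H T − H S) / H T from subpopulation allele frequencies. It is a simplified estimator that assumes equally sized populations and no sampling correction, so it is not necessarily the estimator used in the cited studies. The allele frequencies in the example are invented for illustration.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic site from per-population allele frequencies.

    Assumes equally weighted populations and no sample-size correction.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two populations")
    mean_p = sum(freqs) / len(freqs)
    h_total = 2.0 * mean_p * (1.0 - mean_p)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                                   # site is monomorphic overall
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# Hypothetical frequencies: an allele fixed in one population and absent in another.
print(fst_biallelic([1.0, 0.0]))   # complete differentiation
print(fst_biallelic([0.5, 0.5]))   # no differentiation
```

An allele that is near fixation in one region and very rare elsewhere, as described for FY*O, gives values close to the upper end of this range.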

Dietary Adaptations in African Populations

Lactase persistence (LP), the ability to digest milk and other dairy products into adulthood, is a classic example of a genetic adaptation in humans. LP varies in frequency in different human populations; it is most common in northern Europeans and certain African and Arabian nomadic tribes that practice pastoralism and is at low frequency in East Asians and West sub-Saharan Africans ( 88 ). A number of studies have demonstrated a strong association between LP and the presence of the T allele at a C/T SNP located 13,910 bp upstream of the lactase gene ( LCT ) in European populations ( 58 , 159 ). In vitro studies also showed that the T-13910 variant enhances gene transcription from the LCT promoter ( 110 , 148 ). In a study of long-range LD in Europeans, Bersaglieri and colleagues ( 21 ) found that haplotypes containing the LP-associated T-13910 variant were largely identical over nearly a 1-Mb region, consistent with a strong selective sweep. In genome-wide scans of selection in the HapMap samples, the LCT gene showed the strongest signal of positive selection in Europeans ( 180 , 219 ).

Although the T-13910 variant is likely the causal mutation of the lactase persistence trait in Europeans, analyses of this SNP in ethnically and geographically diverse African populations indicated that it is present in only a few West African pastoralist populations, such as the Fulani (or Fulbe) and Hausa from Cameroon ( 35 , 135 , 136 ). These results suggested that the T-13910 allele may not be a strong predictor of lactase persistence in most sub-Saharan Africans. A more recent genotype and phenotype association study in a sample of 43 populations from Tanzania, Kenya, and the Sudan identified three novel SNPs located ~14 kb upstream of LCT that are significantly associated with the LP trait in African populations ( 203 ). These SNPs are located within 100 bp of the European LP-associated variant (C/T-13910). One LP-associated SNP (G/C-14010) is common in Tanzanian and Kenyan pastoralist populations, whereas the other two (T/G-13915 and C/G-13907) are common in northern Sudanese and Kenyans. The derived alleles at these loci (C-14010, G-13915, and G-13907) were shown to enhance transcription from the LCT promoter in vitro ( 203 ). Genotyping of 123 SNPs across a 3-Mb region in these populations demonstrated that these African LP-associated variants exist on haplotype backgrounds that are distinct from the European LP-associated variant and from each other. In addition, haplotype homozygosity extends >2 Mb on chromosomes with the LP-associated C-14010 variant, consistent with an ongoing selective sweep over the past 3000–7000 years. An independent study of Sudanese populations also identified a significant association of the G-13915 allele with LP in that region ( 90 ), and a recent study confirmed the enhancer effect of the G-13915 variant, together with a C-3712 variant, on the same haplotype background ( 57 ).
These data provide a striking example of convergent evolution and local adaptation due to strong selective pressure resulting from shared cultural traits (e.g., cattle domestication and adult milk consumption) in Europeans and Africans. These studies also demonstrate the effect of local adaptation on patterns of genetic variation and the importance of resequencing across geographically and ethnically diverse African populations to identify population-specific variants associated with variable traits, including disease susceptibility.
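The haplotype homozygosity signal discussed for the LP-associated variants can be illustrated with a minimal EHH calculation: among chromosomes that carry a given core allele, EHH at a given distance is the probability that two randomly chosen chromosomes are identical across the whole interval from the core to that position. The sketch below assumes phased haplotypes encoded as equal-length strings of '0' and '1'; the encoding, function name, and toy data are assumptions for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from the core site out to end_index.

    haplotypes: phased haplotypes as equal-length strings (assumed encoding).
    Only chromosomes carrying core_allele at core_index are considered.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes with the core allele")
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their exact sequence over the interval.
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy data: the core site is position 0; '1' marks the derived allele.
haps = ["1000", "1000", "1000", "1011", "0101", "0110"]
print(ehh(haps, 0, "1", 3))   # derived-allele chromosomes stay similar far from the core
print(ehh(haps, 0, "0", 2))   # ancestral-allele chromosomes diverge quickly
```

A partial sweep is suggested when EHH decays much more slowly on chromosomes carrying the derived allele than on those carrying the ancestral allele.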

Another important dietary adaptation in human populations is the ability to taste bitter compounds. A hypothesized selective advantage of bitter taste is that it helps individuals avoid ingesting toxic substances in plants. Variation at the TAS2R genes is associated with sensitivity to bitter taste substances ( 54 ). An analysis of nucleotide and haplotype variation at 24 TAS2R genes in 55 globally diverse individuals also identified substantial amino acid diversity, an excess of nonsynonymous substitutions, and high levels of population differentiation at variable sites, suggesting that amino acid variability at these loci may be maintained due to natural selection ( 99 ).

There have also been a number of in-depth studies at individual TAS2R loci for which there are known associations with bitter taste perception. For example, the ability to taste phenylthiocarbamide (PTC), a synthetic bitter substance, is a highly variable trait in humans ( 83 ). Although several TAS2R loci contribute to variability in PTC taste perception ( 54 , 55 ), 50%–85% of the phenotypic variance in PTC sensitivity is attributed to variation at the TAS2R38 gene ( 237 ). Studies have identified three amino acid substitutions at TAS2R38 that are in nearly complete LD in non-African populations and that form two common amino acid haplotypes (a taster haplotype PAV and a nontaster haplotype AVI; PAV is dominant). Furthermore, considerably more haplotype variability has been observed in Africa (M.C. Campbell and S.A. Tishkoff, unpublished data, 238) and these haplotypes are associated with a broad range of taste perception phenotypes (M.C. Campbell and S.A. Tishkoff, unpublished data).

Genetic analyses of both African and non-African populations have detected signatures of balancing selection at the TAS2R38 locus, including an excess of intermediate-frequency variants, a low amount of genetic differentiation between the continental populations ( F ST = 0.056), and an ancient divergence between the major taster and nontaster haplotypes ( 238 ). Furthermore, Wooding and coworkers ( 237 ) showed that PTC taste sensitivity in chimpanzees is associated with different amino acid haplotypes at the TAS2R38 gene compared to humans, implying a unique origin of the taster/nontaster variants in humans and chimpanzees.

It has also been suggested that low sensitivity to bitter taste may provide a selective advantage against malarial infection in some African populations. Recent data have shown that the K172 allele at the TAS2R16 gene (which is associated with low sensitivity to bitter taste substances, including salicin) occurs at moderately high frequencies in malaria endemic regions in central Africa ( 189 ). Furthermore, sequence analysis of the entire TAS2R16 coding region, as well as part of the 5' and 3' untranslated regions (UTRs), in 997 individuals from 60 human populations detected a signature of positive selection on chromosomes with the K172N variant ( 189 ). Although the variant driving the signal of positive selection was not conclusively determined, the authors speculated that differential selection in malarial (favoring the K variant) and non-malarial (favoring the N variant) environments has maintained both alleles at relatively high frequencies in Africa. An earlier study suggested that the higher dietary intake of naturally occurring bitter substances, such as organic cyanogens, may be protective against malarial infection in populations from Central and Southeast Africa ( 91 ). Additionally, an inhibitory effect of cyanide on the normal development of the P. falciparum parasite has been observed in vitro ( 137a ). Thus, individuals with a low sensitivity to the bitter taste of cyanide compounds may have a survival advantage against malarial infection through a higher intake of this bitter compound ( 189 ). However, this hypothesis remains to be tested.

Selection in African Versus Non-African Populations

Several studies have reported more evidence for positive selection in populations outside of Africa relative to those in Africa. For example, Akey and colleagues ( 6 ) examined 132 genes in 24 African Americans and 23 European Americans and found evidence for selection at eight genes only in the European-derived population. A number of genome scans for selection (which aim to identify de novo targets of selection) have identified differential patterns of selection in Africans and non-Africans ( 25 , 84 , 97 , 98 , 137 , 180 , 192 , 219 , 225 , 234 ). Several of these studies have observed more loci under recent selection in non-African relative to African populations ( 25 , 97 , 137 , 192 , 234 ). Furthermore, it has been hypothesized that non-African populations have experienced more recent strong local adaptation as modern humans migrated out of Africa into novel and diverse environments ( 6 , 84 , 192 , 234 ).

Although an increase in positive selection might conceivably occur in populations that have migrated into new environments, it is premature to conclude that the amount of recent positive selection is greater in non-African versus African populations. For example, Voight and colleagues ( 219 ) identified widespread evidence of recent selection in each of the HapMap samples in a genome-wide scan. Furthermore, they observed the strongest signals of selection in the HapMap Nigerian population compared to HapMap European and Asian populations (although the power to detect selection may be greater in larger African populations). Additionally, a recent study demonstrated that non-Africans have an excess proportion of nonsynonymous variation, including many variants that are likely to be deleterious ( 114 ), which is attributed to population demographic history rather than increased adaptive evolution. Therefore, demographic factors may have influenced the differential pattern of selection observed in African and non-African populations.

However, the primary limitation in comparing the frequency of selective events in African and non-African populations is that African populations have been severely understudied. For instance, many studies have used African Americans as the sole representative of African populations. However, the statistical power to detect selective sweeps is likely to be lower in studies using African American samples because of their recent admixture with Europeans ( 25 , 234 ). Demographic parameters such as a population bottleneck in the Eurasian populations may also have mimicked patterns of variation caused by selective events in non-African populations ( 25 ). Indeed, one might predict that Africans would have relatively high amounts of local adaptation, considering that Africa has the highest levels of genetic diversity and contains populations living in a wide range of environments and with high exposure to infectious disease. However, signatures of selection in African populations may be missed because studies have relied mainly on one or two African populations. To gain a clearer understanding of genetic and phenotypic adaptations in Africa, it is important to scan for genetic signatures of selection in a broad range of ethnically diverse African populations living in distinct environments. Additionally, it will be important to clearly identify functional variants that are likely to be targets of selection and to verify their impact on phenotypic variation ( 32 ) before the relative number of selection events in African and non-African populations can be clearly determined.

INFECTIOUS DISEASE

Host genetic variation plays a key role in influencing susceptibility to many infectious diseases in humans. Through recurrent exposure to different pathogens, a number of genetic adaptations have evolved that provide resistance to infection. Although the number of known candidate genes related to infectious disease has expanded, progress in the identification of genes that influence infectious disease susceptibility and/or resistance in diverse African populations has been slow. Understanding the genetic basis of infectious disease in Africans may provide useful insight into devising effective strategies to combat these diseases that have a large impact on African populations. Here we focus on three infectious diseases that cause the highest number of deaths in Africa: acquired immune deficiency syndrome (AIDS), tuberculosis (TB), and malaria.

Acquired Immune Deficiency Syndrome

It is estimated that 42 million people are infected worldwide with HIV, the virus that causes AIDS ( 146 ). Moreover, greater than 75% of HIV-1 infections and 84% of all AIDS-related deaths occur in sub-Saharan Africa ( 235 ). Although most individuals exposed to HIV rapidly progress to more advanced stages of this disease, researchers have identified a number of Africans who do not progress to AIDS despite exposure to HIV ( 146 ). This observation suggests that polymorphisms associated with disease susceptibility and resistance may be present in African populations ( Table 1 ).

Table 1: Infectious disease loci

Chemokine receptors, which aid in the entry of HIV into the host cell, play a role in AIDS susceptibility ( 146 ). The protective effect of the Δ32 mutation at the chemokine receptor 5 ( CCR5 ) gene has been well established in populations of Northern European ancestry ( 108 ). Because this mutation occurs at a relatively low frequency in sub-Saharan Africa, it is unlikely to play a major role in disease resistance in that region. However, several studies have demonstrated that mutations at the CCR2 gene play a role in resistance to HIV in Africans; for example, the CCR2–64I allele is associated with delayed HIV disease progression in African and African American populations ( 118 , 233 ), although this effect was not observed in a large number of individuals from Uganda ( 170 ), indicating that underlying genetic and/or environmental factors that affect resistance may vary across African populations. See Table 1 for more details about other genes identified in African populations.

Tuberculosis

Mycobacterium tuberculosis infection leading to TB causes significant mortality throughout the world, particularly in resource-poor countries. Furthermore, HIV infection is strongly associated with an increased risk for TB in sub-Saharan Africa ( 45 , 119 ). In Africa, the rates of tuberculosis range from 50 to greater than 300 cases per 100,000 individuals ( 215 ). Genetic variation has been shown to influence susceptibility to TB in African populations. For example, polymorphisms in the NRAMP1 gene have been associated with increased TB susceptibility in ethnically diverse populations from the Gambia ( 20a ) and a single population in northern Tanzania ( 188 ). Other genes that are thought to play a role in TB susceptibility are shown in Table 1 .

Malaria

As previously mentioned, malaria infection has been a strong selective force in recent human evolution. Approximately 40% of the world is at risk for malaria infection, and approximately 90% of all malaria deaths occur in sub-Saharan Africa ( 67 ). It is estimated that 500 million new cases of malarial illness caused by the Plasmodium falciparum parasite occur every year in Africa ( http://www.rbm.who.int/amd2003/amr2003/ch1.htm ).

Most of the common variants associated with resistance to malarial infection in Africans are expressed in red blood cells or play a role in immune response. These variants include hemoglobin HbS, HbC, HbE and α+-thalassemia, the G6PD A- allele, the FY*O Duffy allele (which prevents P. vivax from invading erythrocytes), and a number of HLA alleles ( 67 , 83 , 86 ).

Interestingly, several studies have shown that ethnically diverse African populations may differ in regard to genetic susceptibility to malarial infection. A variant in the promoter region of the IL4 gene, for example, is associated with a decrease in P. falciparum infection in the pastoralist Fulani from Mali, as evidenced by lower parasite load, but no such genetic association is observed in the neighboring agriculturalist Dogon population ( 52 , 213 ). Other studies also reported lower prevalence of malaria parasites and fewer clinical attacks of malaria among the Fulani compared to other ethnically distinct populations in neighboring villages ( 107 , 133 , 152a ). Differences in the expression profile of genes involved in immune response in the Fulani have been suggested to explain the distinct resistance to malaria in this population, but not in neighboring African populations ( 210 ). Therefore, novel genetic adaptations to malaria have evolved in genetically distinct populations in response to differential exposure to pathogens. These studies demonstrate that ethnically diverse African populations may have different resistance or susceptibility alleles ( Table 1 ), motivating the need to include a large number of genetically distinct populations in studies of infectious disease susceptibility in Africa.

OTHER COMPLEX DISEASES IN AFRICAN POPULATIONS

There are a number of useful approaches for mapping complex disease traits that involve analyses of the association of markers and disease traits in pedigrees or parent/offspring trios [i.e., linkage and transmission disequilibrium test (TDT) analyses], and/or in populations (i.e., case/control association studies) ( 87 ). Several genome-wide SNP panels have been developed for use in genome-wide association studies (GWAS) in populations. Association mapping for complex disease relies on having some a priori knowledge of genetic diversity, population structure, and LD in both case and control populations to identify polymorphisms associated with disease genes and to avoid erroneous associations ( 122 ). Given the high levels of substructure and admixture between genetically distinct populations in Africa, even within small geographic regions, it is particularly important to control for population heterogeneity and substructure in GWAS of African populations [using, for example, programs such as PLINK ( 166 )]. Another useful method for mapping complex traits in highly admixed populations (e.g., African Americans who have African and European ancestry) is mapping by admixture linkage disequilibrium (MALD) (see Supplemental Material ). The MALD approach assumes that the genomic region that contains the susceptibility allele for a given disease will have enriched ancestry from populations in which the disease phenotype is more prevalent. Thus, detailed characterization of allelic variation across ancestral West African populations, which will enable accurate inference of African ancestry, will be important for the success of this approach.
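The risk of erroneous associations from population substructure can be shown with a small worked example. The sketch below uses a plain 2×2 allelic chi-square test on invented allele counts from two hypothetical subpopulations that differ in both allele frequency and disease prevalence. It is not the procedure implemented in PLINK or any other program mentioned above, only an illustration of why stratification must be controlled.

```python
def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denom


# Invented counts. Within each subpopulation the allele frequency is the same
# in cases and controls (80% in A, 20% in B), so there is no true association.
pop_a = dict(case_alt=80, case_ref=20, ctrl_alt=40, ctrl_ref=10)   # mostly cases
pop_b = dict(case_alt=10, case_ref=40, ctrl_alt=20, ctrl_ref=80)   # mostly controls

pooled = {key: pop_a[key] + pop_b[key] for key in pop_a}

print(allelic_chi2(**pop_a))    # 0.0 within subpopulation A
print(allelic_chi2(**pop_b))    # 0.0 within subpopulation B
print(allelic_chi2(**pooled))   # large when subpopulations are pooled
```

Pooling the two groups produces a statistic well above the 3.84 threshold for nominal significance at one degree of freedom, even though neither group shows any association on its own.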

Overall, the genetic factors underlying complex diseases are still poorly understood. To date, two models of complex disease have been proposed. The common disease/common variant (CD/CV) hypothesis posits that alleles influencing complex diseases are relatively common and are therefore found in multiple populations ( 51 , 206 ). In contrast, some data have suggested that complex disease is caused by rare susceptibility alleles at many loci with small effect ( 50 , 160 ). Additionally, gene × environment interactions as well as epistatic interactions among loci likely influence complex disease susceptibility (including infectious disease susceptibility) ( 44 , 230 , 232 ). Due to local adaptation, there may be population- or region-specific susceptibility alleles underlying some complex diseases in Africa. Here we discuss three complex diseases that are common in populations of recent African descent (see Supplemental Material for a review of genetic susceptibility to prostate cancer in African descent populations and Reference 187 for a detailed review of additional genetic disease susceptibility studies in Africa).

Obesity

Obesity is a multifactorial disease that disproportionately affects African Americans and Afro-Caribbeans living in the United States ( 151 ). Moreover, this disease is increasing in prevalence in sub-Saharan Africa ( 13 ), particularly among urban residents. A recent study reported that the incidence of obesity in urban West Africa has more than doubled (114%) over the past 15 years ( 2 ). Obesity is a serious health concern because it is closely associated with other common disorders, such as type 2 diabetes and hypertension ( 208 ).

Although environmental factors are important determinants of obesity ( 92 , 132 ), studies have also identified candidate loci that contribute to the onset of this disease. For example, the human uncoupling gene UCP3 has been correlated with obesity and lower resting energy expenditure in African Americans and the Mende tribe of Sierra Leone ( 10 , 100 ); in contrast, risk for obesity associated with this gene varied among non-African populations ( 48 , 134 , 152 ). Another closely linked human uncoupling gene, UCP2 , was also correlated with obesity in African American children ( 240 ). However, whether or not these tightly linked genes exert an independent effect on obesity in populations of African ancestry has not been firmly established.

Other candidate gene studies have reported associations between polymorphisms in promoter regions and weight-related phenotypes in populations of African descent. For example, polymorphisms in the promoter of the angiotensin-converting enzyme ( ACE ) gene are correlated with obesity in Nigerians and African Americans ( 105 ). Also, allelic variation in the promoter of the agouti-related protein ( AGRP ) gene was found to affect gene expression and was implicated in the regulation of body weight in people of African origin ( 11 , 22 ).

Genome-wide linkage analyses of obesity-related phenotypes have indicated strong linkage to chromosomes 1, 2, 5, 7, 8, and 11 in West Africans ( 4 , 29 ). These results demonstrate that multiple loci, together with environmental factors, likely contribute to the phenotypic variance of obesity-related traits.

Type 2 Diabetes Mellitus

Type 2 diabetes mellitus (T2DM) is a late-onset metabolic disorder characterized by reduced insulin secretion and insulin action ( 89 ). African Americans have a twofold increase in risk for T2DM compared to other populations in the United States. Furthermore, the prevalence of T2DM is lower in Africa (~1%–2%) than among people of African descent in industrialized nations (~11%–13%) ( 177 ).

One of the earliest T2DM susceptibility loci implicated in this disease is the calpain 10 ( CAPN10 ) gene on chromosome 2 in Mexican Americans ( 89 ). Subsequent association studies, however, have yielded inconsistent results among geographically distinct populations ( 70 ). In addition, the risk associated with this gene also differs between ethnically diverse populations from West Africa. Specifically, a CAPN10 haplotype defined by known risk polymorphisms was associated with T2DM in populations from Nigeria, but not in distinct populations from Ghana ( 31 ). Further study in a larger set of Africans will be needed to resolve this discrepancy.

Other recently identified candidate loci associated with diabetes or diabetes-related traits in Africans and African Americans include the AGRP gene ( 22 ), the transcription factor 7-like 2 ( TCF7L2 ) gene ( 85 ), and the proprotein convertase subtilisin/kexin-type 2 ( PCSK2 ) gene ( 108a ). Genome-wide linkage analyses in West African families have also indicated suggestive linkage to diabetes on chromosomes 12 and 19 as well as stronger evidence of linkage on chromosome 20 ( 177 ). Other studies have reported strong linkage for several quantitative traits that contribute to diabetes on chromosomes 4, 6, 8, 10, 15, 16, 17, and 18 ( 28 , 30 , 178 ).

Increased risk for type 2 diabetes in Africans and other indigenous populations is also suggested to be due to changes in selective pressure (i.e., the thrifty gene hypothesis) ( 139 ). Prior to 10,000 years ago, all modern humans subsisted as hunter-gatherers who likely experienced frequent cycles of feast and famine. According to the thrifty gene hypothesis, ancestral genetic variants that once promoted the efficient absorption, storage, or utilization of nutrients in this ancestral environment are now maladaptive in more modern environments, increasing risk for disease ( 51 , 153 ). Although genetic evidence for this hypothesis has been inconclusive ( 34 , 68 , 214 ), recent data from the Macaque Genome Project indicated that a number of polymorphisms in the macaque correspond to known disease-predisposing alleles in humans ( 39 ). These results suggest that ancestral variants may influence disease susceptibility in humans.

Hypertension

Hypertension (HT) disproportionately affects people of African descent living in Western environments. For example, African Americans have a 1.6-fold higher prevalence of HT than European Americans ( 49 ). However, HT was not considered to be a major disease in sub-Saharan Africa until recently. Studies have now reported a growing incidence of HT, especially in urban settings in Nigeria, Cameroon, and Tanzania ( 56 , 212 ).

To date, candidate gene studies have shown inconsistent associations with HT. For example, increased risk for disease was correlated with the G protein β3-subunit ( GNB3 ) gene, angiotensinogen (AGT), and angiotensin II receptor (AGTR1) in some populations of African ancestry ( 53 , 123 , 186 , 245 ), but these results were not replicated in other studies ( 61 , 216 , 231 ). A study of the heritability of AGT and ACE levels in Nigerians and African Americans indicated that heritability is high in Nigerians (77% for AGT and 67% for ACE) but low in African Americans (18% for AGT and ACE), suggesting a strong environmental component to variability in African American populations ( 44 ).

Other evidence has suggested that interactions between susceptibility loci may play an important role in HT susceptibility. For example, polymorphisms in the ACE, ACE4, and ACE8 genes were jointly associated with blood pressure through epistatic interaction in Nigerian families ( 244 ). Moreover, interaction between genes ( ACE and GRK4 ) on different chromosomes was also found to influence blood pressure in a cohort of West African families ( 231 ). These studies suggest a complex model of disease susceptibility that involves epistatic interactions and may account for inconsistent associations in previous studies.

The results of genome-wide linkage studies have also suggested that chromosomes 6q24 and 21q21 likely contain genes that influence risk for hypertension in African Americans. In another study, fine mapping of HT susceptibility loci on chromosome 6 uncovered polymorphisms in the vanin 1 ( VNN1 ) gene that were correlated with HT in African Americans and Mexican Americans ( 246 ).

From an evolutionary perspective, salt retention (a characteristic of HT) has been proposed to represent a phenotypic adaptation to heat. Specifically, ancient human populations living in hot, humid areas, who consumed low levels of dietary salt, were theorized to have adapted to their environment by retaining salt, while populations in cooler, temperate climates adapted to conditions of higher sodium levels ( 14 ). Under this scenario, polymorphisms that promote salt retention will increase in frequency in hot and humid environments due to selective pressure. This hypothesis was recently supported by several studies that observed the highest frequency of variants associated with salt retention and/or high blood pressure in Africans and decreasing allele frequency outside of Africa ( 138 , 198 , 242 ).

Pharmacogenetics

Individuals with distinct ancestry are known to vary in drug response. For example, drugs commonly used to treat heart disease are known to be less effective in individuals of African descent relative to individuals of European descent (see Supplemental Material ). To better understand variation in drug response, it is important to characterize levels and patterns of genetic diversity at genes that may influence drug response, drug metabolism, and/or transport in ethnically diverse populations ( 23 ). To date, some of the most intensively studied drug metabolism enzyme (DME) loci include cytochrome P450s (CYPs), N-acetyl transferases (NATs), and multidrug transporters (MDR). These genes are highly polymorphic and their variation results in proteins with increased, normal, or decreased activity ( 187 ).

Studies have shown differences in the distribution of variation at these DME loci in African and non-African populations. For example, the CYP2B6 gene is involved in the metabolism of several clinically important drugs, including artemisinin and efavirenz used to treat malaria and HIV infection, respectively. Common CYP2B6 polymorphisms vary in frequency between ethnically diverse populations from West Africa, as well as between African and non-African populations. The functional significance of most of these variants is not yet known ( 131 ). However, recent studies have shown that two polymorphisms (983T>C and 516G>T), found at a higher frequency in African populations relative to non-Africans, are associated with a reduction in CYP2B6 protein expression as well as a large increase in levels of the antiretroviral drug efavirenz in the plasma of African HIV patients ( 130 , 145 , 239 ).

Additionally, the frequency of functional variants at the NAT2 gene, known to play a role in the metabolism of the drug isoniazid (used to treat TB), was found to vary among ethnically diverse African populations. In particular, haplotypes associated with the fast-acetylation and the slow-acetylation phenotypes at the NAT2 gene differ in frequency among African populations, particularly between hunter-gatherer and agriculturalist populations (H.M. Mortensen and S.A. Tishkoff, unpublished data, 154, 155). However, Africans have high levels of haplotype diversity, and the effect of many of these haplotypes remains unknown. These studies imply that ethnically diverse Africans may differ in response to drugs used to treat infectious disease due to variation at genes involved in drug metabolism or transport. Given the high numbers of deaths due to infectious disease in Africa, the study of variation at genes that play a role in response to these drugs across ethnically diverse Africans is of critical importance.

FUTURE DIRECTIONS

To date, only a fraction of the ~2000 linguistically distinct ethnic groups in Africa has been extensively studied for genome-wide variation. Extensive sampling of East African populations will be informative for testing models of the origin and dispersal of modern humans out of Africa, whereas in-depth sampling of West African populations will be informative for determining African American ancestry and for the identification of markers and populations useful in MALD mapping. It is important to use an ethical approach when collecting samples to be used for genetic diversity studies, particularly if those samples and genetic data will be made publicly available. In addition to obtaining research permits from the local African governments and informed consent from individual participants (including an explanation of the benefits and risks involved in the use of samples in current and future studies), results should ideally be made available to participants after translation into the local language. Additionally, efforts should be made to train local African scientists and to build resources across Africa for independent human genetics research. The African Society of Human Genetics was formed in 2004 to help achieve many of these goals ( 176 ) ( http://www.afshg.org ).

As we begin to build diverse sets of samples, a shift toward genome-wide studies of genetic diversity will be particularly informative for making inferences about African demographic history and the genetic basis of complex traits. A better understanding of the distribution and frequency of structural variation and its role in regard to phenotypic diversity will be of particular interest. The development of high-throughput SNP genotyping methodology (e.g., the Affymetrix 6.0 and Illumina 1M SNP chips), which is rapidly becoming less expensive, will facilitate GWAS in a large number of Africans. Indeed, several recent GWAS in Europeans that have identified genetic variants associated with traits such as height, skin pigmentation, and eye color indicate that very large sample sizes (>3000–5000) are required to identify variants associated with complex traits influenced by multiple loci and the environment ( 194 , 227 ). However, to date no GWAS have been performed on a comparable scale in Africa. Such studies will be highly informative for identifying genes that play a role in a number of common traits (e.g., height), as well as for identifying genes that play a role in susceptibility to infectious and other complex diseases. The high levels of genetic substructure in Africa, even within small geographic regions (F.A. Reed and S.A. Tishkoff, unpublished data), require determination of individual ancestry and proper correction for substructure in association studies.

Genotyping ethnically diverse African populations living in distinct climates and with distinct subsistence patterns for these high-density SNP panels will also be useful for conducting whole-genome scans of selection to identify genes that have played a role in local adaptation and disease. The continued development of statistical and computational methods for inferring demographic history and natural selection will shed light on human evolutionary history in Africa. Approaches that incorporate detailed geographic information such as natural boundaries (e.g., mountain ranges, rivers, and deserts) will be particularly useful for inferring African demographic history ( 163 , 172 ).

Given that African populations possess a large fraction of population-specific alleles and may have experienced local adaptation, resequencing across diverse African populations will be important for identifying population-specific functional variants. Targeted resequencing of genes that play a key role in disease susceptibility and drug response will be particularly important for the design of more effective treatments in individuals of recent African descent. Additionally, whole genome resequencing (a goal of the 1000 genomes project) will be informative for identifying large-scale structural variants and rare variants that may play an important role in disease and for reconstructing human evolutionary history.

Supplementary Material

Acknowledgments

We thank S.M. Williams, J.M. Akey, J.S. Friedlaender, and N.A. Rosenberg for critical review of sections of the manuscript and/or figures and for helpful suggestions. The authors are funded by the U.S. National Science Foundation (NSF) grant BSC-0552486, U.S. National Institutes of Health (NIH) grant R01GM076637, and a David and Lucile Packard Career Award to S.A.T.

DISCLOSURE STATEMENT

The authors are not aware of any biases that might be perceived as affecting the objectivity of this review.

LITERATURE CITED

Genetic Diversity: Sources, Threats, and Conservation

  • Living reference work entry
  • First Online: 25 June 2019

  • Marina Nonić 7 &
  • Mirjana Šijačić-Nikolić 7  

Part of the book series: Encyclopedia of the UN Sustainable Development Goals ((ENUNSDG))

Diversity of genes ; Genetic differences ; Genetic variation ; Genetic variety

Genetic diversity is a fundamental source of biodiversity which has been defined by different authors as “any measure that quantifies the magnitude of genetic variability within a population” (Hughes et al. 2008 ) or “the very makeup of the variation of organisms and species on Earth” (Elliott 2002 ). According to Ennos et al. ( 2000 ), genetic diversity presents “the range and sum of genetic variation within a population or populations,” where the term diversity , which simply means the state of displaying dissimilarities, differences, or variety, has acquired an extended meaning which signifies the sum of the variation.

Each individual species is made up of individuals that possess their genes, which are the source of its own unique features (genes are responsible for both the similarities and the differences between organisms). A species may have different populations, each having different...

Abdel-Mawgood AL (2012) DNA based techniques for studying genetic diversity. In: Caliskan M (ed) Genetic diversity in microorganisms. InTech, Rijeka, pp 95–123

Avise JC (2004) Molecular markers, natural history, and evolution, 2nd edn. Sinauer Associates, Sunderland

Avise JC, Hamrick JL (1996) Conservation genetics: case histories from nature. Chapman and Hall, New York

Baye TM, Abebe T, Wilke RA (2011) Genotype-environment interactions and their translational implications. Per Med 8(1):59–70

Beatty CR, Cox NA, Kuzee ME (2018) Biodiversity guidelines for forest landscape restoration opportunities assessments, 1st edn. IUCN, Gland. v + 43pp

Bošković J, Isajev V (2007) Genetika (Genetics). Megatrend University, Belgrade, 552 pages

Brook BW, Tonkyn DW, O’Grady JJ, Frankham R (2002) Contribution of inbreeding to extinction risk in threatened species. Conserv Ecol 6(1):16. http://www.consecol.org/vol6/iss1/art16

Caliskan M (2012) Genetic diversity in microorganisms. InTech, Rijeka. https://doi.org/10.5772/2641 . ISBN: 978-953-51-0064-5

Candolin U, Heuschele J (2008) Is sexual selection beneficial during adaptation to environmental change? Trends Ecol Evol 23(8):446–452

Charlesworth B (2001) The effect of life-history and mode of inheritance on neutral genetic variability. Genet Res 77:153–166

Charlesworth B (2009) Effective population size and patterns of molecular evolution and variation. Nat Rev Genet 10:195–205

Corl A, Ellegren H (2012) The genomic signature of sexual selection in the genetic diversity of the sex chromosomes and autosomes. Evolution 66(7):2138–2149

Dražić G (2015) Biological resources in the service of ecoremediation. In: Milovanović J, Djordjević S (eds) Conservation and enhancement of biological resources in the service of ecoremediation. Faculty of applied ecology Futura, Singidunum University, Belgrade, pp 13–28. ISBN: 978-86-86859-41-9

Elliott LJ (2002) The effects of the decline of genetic biodiversity on the prosperity and well-being of mankind. In: Genetics in human affairs. https://projects.ncsu.edu/cals/course/gn301/GeneticBiodiversity.html

Ennos RA, Worrell R, Arkle P, Malcolm DC (2000) Genetic diversity and conservation. In: Genetic variation and conservation of British native trees and shrubs current knowledge and policy implications. Forestry commission technical paper 31. Forestry Commission, Edinburgh

Eriksson G, Ekberg I (2001) An introduction to forest genetics. Swedish University of Agricultural Sciences, Uppsala, pp 1–166

Fjeldsa J, Lovett J (1997) Biodiversity and environmental stability. Biodivers Conserv 6:315–323

Freeman WH (2000) Sources of variation. In: Griffiths AJF, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM (eds) An introduction to genetic analysis, 7th edn. W.H. Freeman, New York. ISBN-10: 0-7167-3520-2. https://www.ncbi.nlm.nih.gov/books/NBK22012/

Galov A (2007) Genetic diversity of bottlenose dolphin ( Tursiops truncatus ) and notes on genetic diversity of other cetacean species in the Adriatic Sea. Doctoral thesis, University of Zagreb-Faculty of Science Department of Biology, 105 pages

Gardner EJ (1964) Principles of genetics, 2nd edn. Wiley, New York, 342 pages

GD (2018) Genetic diversity. http://www.biodiversity.ru/coastlearn/bio-eng/boxes/geneticdiv.html

Gilpin ME, Soule ME (1986) Minimum viable populations: processes of species extinction. In: Soule ME (ed) Conservation biology: the science of scarcity and diversity. Sinauer Associates, Sunderland

Govindaraj M, Vetriventhan M, Srinivasan M (2015) Importance of genetic diversity assessment in crop plants and its recent advances: an overview of its analytical perspectives. Genet Res Int 2015, 1–14, Article ID 431487. https://doi.org/10.1155/2015/431487 . Hindawi Publishing Corporation

Hamrick JL, Godt MJW (1996) Conservation genetics of endemic plant species. In: Avise JC, Hamrick JL (eds) Conservation genetics: case histories from nature. Chapman and Hall, New York, pp 281–304

Hartl DL, Ruvolo M (2012) Genetics: analysis of genetics and genomes. Jones & Bartlett, Burlington, 804 pages

Hattemer HH (1991) Measuring genetic variation. In: Müller-Starck G, Ziehe M (eds) Genetic variation in European populations of forest trees. Sauerlander’s Verlag, Frankfurt am Main, pp 2–20

Hughes AR, Inouye BD, Johnson MTJ, Underwood N, Vellend M (2008) Ecological consequences of genetic diversity. Ecol Lett 11:609–623

IPGRI and CU – International Plant Genetic Resources Institute and Cornell University (2003) Genetic diversity analysis with molecular marker data: learning module. Measures of Genetic Diversity, IPGRI/Cornell University, Maccarese/Ithaca

Isajev V, Šijačić-Nikolić M (2003) Praktikum iz genetike sa oplemenjivanjem biljaka (Auxiliary University textbook: “Practicum in genetics with plant breeding”). Faculty of Forestry-University of Belgrade, Belgrade-Serbia and Faculty of Forestry-University of Banja Luka, Banja Luka-Bosnia and Herzegovina, pp 1–240

Kirkpatrick M, Ryan MJ (1991) The evolution of mating preferences and the paradox of the lek. Nature 350:33–38

Levin SA (2001) Encyclopedia of biodiversity. Academic, San Diego

Maehr DS, Crowley P, Cox JJ, Lacki MJ, Larkin JL, Hoctor TS, Harris LD, Hall PM (2006) Of cats and haruspices: genetic intervention in the Florida panther. Anim Conserv 9:127–132. https://doi.org/10.1111/j.1469-1795.2005.00019.x . Response to Pimm et al

Markert JA, Champlin DM, Gutjahr-Gobell R, Grear JS, Kuhn A, McGreevy TJ, Roth A, Bagley MJ, Nacci DE (2010) Population genetic diversity and fitness in multiple environments. BMC Evol Biol 10:205

Milligan BG, Leebens-Mack J, Strand AE (1994) Conservation genetics: beyond the maintenance of marker diversity. Mol Ecol 12:844–855

Morić M (2016) Genetic diversity of pedunculated oak ( Quercus robur L.) in field trials whit progeny from selected seeds stands. Doctoral thesis, University of Zagreb – Faculty of Forestry, pp 14–19

Nonić M (2016) Improving mass production of leaf-ornamental beech cultivars by grafting. Doctoral thesis, University of Belgrade – Faculty of Forestry, 280 pages

NRC – National Research Council (1993) Livestock. The National Academies Press, Washington, DC. https://doi.org/10.17226/1584

Nunney L (1993) The influence of mating system and overlapping generations on effective population-size. Evolution 47:1329–1341

Oliver TH (2018) Biodiversity generation and loss. Environ Sci. https://doi.org/10.1093/acrefore/9780199389414.013.96 . http://environmentalscience.oxfordre.com/view/10.1093/acrefore/9780199389414.001.0001/acrefore-9780199389414-e-96

Panhuis TM, Butlin R, Zuk M, Tregenza T (2001) Sexual selection and speciation. Trends Ecol Evol 16(7):364–372

Pérez-González J, Costa V, Santos P, Slate J, Carranza J, Fernández-Llario P, Zsolnia A, Monteiro NM, Anton I, Buzgo J, Varga G, Beja-Pereira A (2014) Males and females contribute unequally to offspring genetic diversity in the polygynandrous mating system of wild boar. PLoS One 9(12):e115394

Primack RB, Milić D, Radenković S, Obreht D, Bjelić-Čabrilo O, Vujić A (2015) An introduction to conservation biology (Uvod u konzervacionu biologiju). Faculty of Natural Sciences and Mathematics, Novi Sad, 372 pages

Schaal BA, Leverich WJ, Rogstad SH (1991) Comparison of methods for assessing genetic variation in plant conservation biology. In: Falk DA, Holsinger KE (eds) Genetics and conservation of rare plants. Oxford University Press, New York, pp 123–134

Shanshan L, Chiang TY, Gong X (2006) High genetic diversity vs. low genetic differentiation in Nouelia insignis (Asteraceae), a narrowly distributed and endemic species in China, revealed by ISSR fingerprinting. Ann Bot 98:583–589. https://doi.org/10.1093/aob/mcl129

Šijačić-Nikolić M, Milovanović J (2010) Conservation and directed utilization of forest genetic resources. Faculty of Forestry – University of Belgrade, Planeta Print, Belgrade, pp 1–200

Šijačić-Nikolić M, Milovanović J, Nonić M (2014) Conservation of forest genetic resources. In: Ahuja MR, Ramawat KG (eds) Biotechnology and biodiversity. Series: Sustainable development and biodiversity, vol 4. Springer International Publishing Switzerland, pp 103–129

Todd PM, Miller GF (1997) Biodiversity through sexual selection. In: Langton CG, Shimohara K (eds) Artificial life V – proceedings of the fifth international workshop on the synthesis and simulation of living systems. MIT Press/Bradford Books, Cambridge, MA, pp 289–299

Tucović A (1990) Genetika sa oplemenjivanjem biljaka (Genetics and plant breeding). Naučna knjiga, Beograd (Scientific Book – Belgrade), pp 1–596

Westemeier R, Brawn J, Simpson S, Esker T, Jansen R, Walk J, Kershner E, Bouzat J, Paige K (1998) Tracking the long-term decline and recovery of an isolated population. Science 282:1695–1698. https://doi.org/10.1126/science.282.5394.1695

White TL, Adams WT, Neale DB (2007) Forest genetics. CAB International, Wallingford, 682 pages

Author information

Authors and affiliations.

Faculty of Forestry, University of Belgrade, Belgrade, Serbia

Marina Nonić & Mirjana Šijačić-Nikolić

Corresponding author

Correspondence to Marina Nonić .

Editor information

Editors and affiliations.

FTZ-ALS, Hamburg University of Applied Sciences FTZ-ALS, Hamburg, Germany

Walter Leal Filho

Ctr Neurosci & Cell Biology, 1 Floor, Univ Coimbra, Edf Fac Medicina, Coimbra, Portugal

Anabela Marisa Azul

Faculty of Engineering and Architecture, Passo Fundo University Faculty of Engineering and Architecture, Passo Fundo, Brazil

Luciana Brandli

Istinye University, Istanbul, Turkey

Pinar Gökcin Özuyar

International Centre for Thriving, University of Chester, Chester, UK

Section Editor information

No affiliation provided

Adriana Consorte McCrea

Copyright information

© 2019 Springer Nature Switzerland AG

About this entry

Cite this entry.

Nonić, M., Šijačić-Nikolić, M. (2019). Genetic Diversity: Sources, Threats, and Conservation. In: Leal Filho, W., Azul, A., Brandli, L., Özuyar, P., Wall, T. (eds) Life on Land. Encyclopedia of the UN Sustainable Development Goals. Springer, Cham. https://doi.org/10.1007/978-3-319-71065-5_53-1

DOI : https://doi.org/10.1007/978-3-319-71065-5_53-1

Received : 11 September 2018

Accepted : 06 January 2019

Published : 25 June 2019

Publisher Name : Springer, Cham

Print ISBN : 978-3-319-71065-5

Online ISBN : 978-3-319-71065-5

eBook Packages : Springer Reference Earth and Environm. Science Reference Module Physical and Materials Science Reference Module Earth and Environmental Sciences

Genetic Disorders

What to know.

Genetic disorders are health problems that happen because of some type of abnormality in a person's genetic material. There are several types of genetic disorders. Some disorders are caused by a genetic change (mutation) in a single gene; some are caused by an abnormality in one of the chromosomes; and some are complex, involving numerous genes and influences from environmental factors.

Genetic disorders are health problems that happen because of some type of abnormality in a person's genetic material. There are several types of genetic disorders. In some cases, a genetic change in a single gene can cause someone to have a disease or condition. In other cases, the gene does not have a genetic change, but a person has more or fewer copies of the gene than most people, and this causes a disease or condition. Some diseases or conditions occur when a person does not have the same number of chromosomes as most people or has part of a chromosome that is missing, extra, or not in the right place.

Most genetic disorders happen due to the combination of many genetic changes acting together with a person's behaviors and environment. These are sometimes called complex conditions.

Single gene disorders

DNA contains the instructions for making your body work. DNA is made up of two strands that wind around each other. Each DNA strand includes chemicals called nitrogen bases—T (thymine), A (adenine), C (cytosine), and G (guanine)—that make up the DNA code. Genes are specific sections of DNA that have instructions for making proteins. Proteins make up most of the parts of your body and make your body work the right way.

Some diseases and conditions happen when a person has a genetic change (sometimes called a mutation) in one of their genes. These types of diseases are called single gene disorders. Sometimes, what happens is that one of the DNA bases is changed. For example, part of a gene that usually has the sequence TAC is changed to the sequence TTC. This can change the way the gene works, for example, by changing the protein that is made. In other cases, one or more of the bases in the DNA sequence are missing altogether, or there are extra bases.
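As a rough illustration of the single-base change described above, the sketch below compares a usual sequence with a changed one and reports each position where the bases differ. The function name `find_substitutions` and the example call are assumptions made for illustration only. The sketch handles sequences of equal length, so missing or extra bases would need an alignment step that is not shown here.

```python
def find_substitutions(reference: str, observed: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, observed_base) for each mismatch.

    Positions are 1-based. Only single-base substitutions are detected;
    deletions and insertions change the length and are rejected.
    """
    if len(reference) != len(observed):
        raise ValueError("sequences must have the same length")
    pairs = zip(reference.upper(), observed.upper())
    return [
        (position, ref_base, obs_base)
        for position, (ref_base, obs_base) in enumerate(pairs, start=1)
        if ref_base != obs_base
    ]


# The example from the text: TAC changed to TTC.
print(find_substitutions("TAC", "TTC"))  # [(2, 'A', 'T')]
```

Here the second base has changed from A to T, which is the kind of change that can alter the protein a gene makes.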

Genetic changes can be passed down to a child from their parents. When this happens, the disease or condition is called hereditary or inherited. Or the changes can happen for the first time in the process of making the sperm or egg or early in development, so the child will have the genetic change but the parents will not.

Single gene disorders that affect a gene on one of the 22 autosomal chromosome pairs are called autosomal disorders. Disorders that affect a gene on the X chromosome are called X-linked disorders. Disorders are further described according to whether the genetic change involved is dominant or recessive.

For some diseases and conditions, everyone who inherits the genetic change will have the disease or condition, but how serious it is can vary from person to person. In other cases, people who have the genetic change will be more likely to develop the disease or condition, but some of them will never develop it.

Autosomal dominant

With autosomal dominant diseases or conditions, a person only needs a genetic change in one copy of the gene to have the disease. If one parent has an autosomal dominant disease or condition, each child has a 50% (1 in 2) chance of inheriting the genetic change that causes the condition.

Examples of autosomal dominant conditions include hereditary breast and ovarian cancer caused by genetic changes (mutations) to the BRCA1 and BRCA2 genes; Lynch syndrome; and familial hypercholesterolemia.

Autosomal recessive

With autosomal recessive diseases or conditions, a person needs a genetic change in both copies of the gene to have the disease or condition. While a person with a genetic change in only one copy of the gene will not have the disease or condition, they can still pass the genetic change down to their children. These parents are sometimes called "carriers" of the disease because they "carry" the genetic change that causes the disease or condition but do not have the disease themselves.

A parent who is a carrier of a disease has a 50% (1 in 2) chance of passing the gene with the genetic change on to each of their children. If both parents are carriers of the disease, each child has a 25% (1 in 4) chance of inheriting two genes with the genetic change and thus of having the disease. Carrier screening looks for autosomal recessive genetic changes in parents to see if they could have a child with the disease or condition.

Examples of autosomal recessive disorders are sickle cell disease and cystic fibrosis.
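The 50% and 25% figures for autosomal conditions can be checked by listing every combination of one gene copy from each parent. The sketch below does this under simplifying assumptions: each parent passes on either copy with equal chance, "A" stands for the usual copy and "a" for the copy with the genetic change, and the function name `offspring_genotypes` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Probability of each child genotype when one copy comes from each parent.

    Genotypes are two-letter strings such as "Aa". The labels are illustrative:
    "A" is the usual copy and "a" is the copy with the genetic change.
    """
    # Sorting the pair makes "aA" and "Aa" count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Recessive case: both parents are carriers, so "aa" children have the condition.
print(offspring_genotypes("Aa", "Aa")["aa"])  # 1/4

# One parent with a single changed copy: half the children inherit it.
# For a dominant condition these children are affected; for a recessive
# condition they are carriers.
print(offspring_genotypes("Aa", "AA")["Aa"])  # 1/2
```

The same enumeration covers both inheritance patterns; only the reading of the genotypes changes, depending on whether one changed copy is enough to cause the condition.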

X-linked

Females have two X chromosomes, and males have one X chromosome and one Y chromosome. Each daughter gets an X from her mother and an X from her father. Each son gets an X from his mother and a Y from his father.

Some diseases or conditions happen when a gene on the X chromosome has a genetic change. Because males only have one copy of all the genes on the X chromosome, they are much more likely to be affected by X-linked genetic disorders than females. A female with a genetic change on only one of her two X chromosomes may not have the disease or condition at all. However, in some cases, females with the genetic change on one of their X chromosomes can have the disease or condition, but it is often a milder form of the disease than usually occurs in males.

Because males inherit an X chromosome from their mother, a female with a genetic change on one copy of the gene has a 50% (1 in 2) chance of passing the genetic change on to each of her sons. Her sons could have the disease or condition even though she does not.

Examples of X-linked conditions include fragile X syndrome, Duchenne muscular dystrophy, and hereditary hemophilia.
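The X-linked pattern can be sketched the same way, this time keeping sons and daughters separate because the father's contribution decides the child's sex. In this illustration, "X" is an X chromosome with the usual gene copy, "x" is an X chromosome with the genetic change, and "Y" is the Y chromosome. These labels and the function name `x_linked_offspring` are assumptions for the example, and each chromosome is assumed to be passed on with equal chance.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def x_linked_offspring(mother: tuple[str, str], father: tuple[str, str]) -> dict[str, dict[str, Fraction]]:
    """Genotype probabilities for sons and daughters, computed separately.

    "X" = X chromosome with the usual copy, "x" = X chromosome with the
    genetic change, "Y" = Y chromosome (illustrative labels).
    """
    counts = {"son": Counter(), "daughter": Counter()}
    for from_mother, from_father in product(mother, father):
        # A child who receives the father's Y is a son; otherwise a daughter.
        sex = "son" if from_father == "Y" else "daughter"
        counts[sex][from_mother + from_father] += 1
    return {
        sex: {genotype: Fraction(n, sum(tally.values())) for genotype, n in tally.items()}
        for sex, tally in counts.items()
    }


# Mother has the change on one X chromosome; father does not have it.
result = x_linked_offspring(("X", "x"), ("X", "Y"))
print(result["son"]["xY"])       # 1/2 of sons inherit the changed X
print(result["daughter"]["xX"])  # 1/2 of daughters inherit one changed X
```

Sons who inherit the changed X have no second X chromosome to compensate, which is why they are more likely to be affected than daughters with the same change.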

Chromosomal abnormalities

Different number of chromosomes

People usually have 23 pairs of chromosomes. But sometimes a person is born with a different number. Having an extra chromosome is called trisomy. Missing a chromosome is called monosomy.

For example, people with Down syndrome have an extra copy of chromosome 21. This extra copy changes the body's and brain's normal development and causes intellectual and physical problems for the person. Some disorders are caused by having a different number of sex chromosomes. For example, people with Turner syndrome usually have only one sex chromosome, an X. Women with Turner syndrome can have problems with growth and heart defects.

Changes in chromosomes

Sometimes chromosomes are incomplete or shaped differently than usual. Missing a small part of a chromosome is called a deletion. A translocation is when part of one chromosome has moved to another chromosome. An inversion is when part of a chromosome has been flipped over.

For example, people with Williams syndrome are missing a small part of chromosome 7. This deletion can result in intellectual disability and a distinctive facial appearance and personality.

Complex conditions

Complex disorders are caused by genetic changes in many different genes working together with environmental factors. Environmental factors include exposures and behaviors such as air pollution, smoking, alcohol use, the amount of exercise a person gets, or the foods they eat. Having a family health history of a complex condition can make you more likely to have that condition yourself. However, genetic testing would not be recommended because there is not a single genetic change causing the condition that could be found by genetic testing.

Most chronic diseases, such as most cases of heart disease, cancer, diabetes, osteoporosis, and asthma, are complex disorders. So are most cases of developmental disabilities, such as autism spectrum disorder and attention deficit/hyperactivity disorder (ADHD), and mental health conditions, such as depression and schizophrenia.

The vast amount of genetic information available has allowed researchers to develop methods to study which types of genetic changes are found more often in people with a given disease or condition. This allows researchers to estimate a person's risk for a particular disorder based on which genetic changes they have. This estimate is known as the polygenic risk score.
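The text does not say how a polygenic risk score is calculated. One common approach, shown here only as a minimal sketch, is a weighted sum: for each genetic change, the number of copies a person carries (0, 1, or 2) is multiplied by a weight estimated in a population study, and the products are added up. The variant names, weights, and the function name `polygenic_risk_score` below are all made up for illustration and do not come from any real study.

```python
def polygenic_risk_score(allele_counts: dict[str, int], effect_weights: dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    score = 0.0
    for variant, weight in effect_weights.items():
        # Simplification: a variant missing from the person's data counts as 0 copies.
        count = allele_counts.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {variant} must be 0, 1, or 2")
        score += weight * count
    return score


# Hypothetical weights from a population study and one person's genotype.
weights = {"variant_1": 0.5, "variant_2": 0.25, "variant_3": -0.125}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(polygenic_risk_score(person, weights))  # 1.25
```

Because the weights come from a particular study population, the same genotype can produce a different score with weights from a different study.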

Some important issues need to be considered before polygenic risk scores can be routinely used in health care and public health. Studies are looking at how useful polygenic risk scores are in real-life clinical practice. Information on how each gene change affects disease risk comes from population-level genetic studies. Addressing diversity in development of polygenic risk scores is important, because polygenic risk scores developed from studies in one population (for example, people of Northern European ancestry) might not work as well for other populations (for example, people of West African ancestry). Also, how each gene change affects the polygenic risk score varies from study to study.

Once polygenic risk scores are ready to be used routinely in clinical practice, public health efforts will be needed to address issues such as access, insurance coverage, and sharing of results across health systems.

Medical College of Wisconsin

Driven Toward Diversity: MCW Student Works to Increase Representation in Field of Genetic Counseling

May 16, 2024

Florida native and Medical College of Wisconsin (MCW) graduate student Christopher Estrella, MS, is on a mission to break the mold in genetic counseling. His identities are underrepresented in the field of genetic counseling, where five percent of genetic counselors identify as male, according to a 2023 study brief published by the National Society of Genetic Counselors, and only three percent as Hispanic.

“I’m male and I’m a Spanish-speaking individual of Spanish descent,” says Estrella, who will graduate from MCW’s Master of Science in Genetic Counseling Program soon. “It’s nice when you have a provider who speaks your language. It’s nice to have a provider who maybe looks like you. I just hope to bring diversity into the field.”

In terms of what interests him about genetic counseling, Estrella, who enjoys traveling and exploring nature with his wife, says it’s the way it combines different fields of information.

“It brings together all the scientific knowledge of genetic conditions, inheritance, DNA and treatment for these conditions, and it meshes the psychological understanding of human connection, responsiveness, the psychology of interacting with people, and it brings it together into one field,” he says. “That’s pretty unique in the health care world.”

The path Estrella took to where he is today is pretty unique as well.

Road to Genetic Counseling

Estrella has assumed many roles through the years, each of which he was passionate about.

As a young teenager, he began working as a private tutor, teaching everything from basic reading to college-level math and physics over the years. He earned a scholarship to Florida International University, completing an undergraduate and master’s degree in biomedical engineering. Simultaneously, he became fluent in French, earning a second undergraduate degree.

His experience in college taught him that he enjoyed interacting with people more than research and development. With three degrees but no clear direction for the future, Estrella reached a standstill before learning of a teaching position from a friend. Leaning back on his experience as a tutor, he took the chance, teaching math to 260 students a year as the COVID-19 pandemic broke out. He calls it a tough time for educators but an experience that ultimately helped him.

“I feel that the only way you grow is by placing yourself in an uncomfortable position. It really pushes you outside of your boundaries and teaches you about yourself,” Estrella says.

Still, he was uneasy about his career path, not exactly sure what his future held. Since college, he knew he wanted to be in science and health care, but not as a doctor. Despite all his achievements in his 20s, Estrella once again felt lost.

A video he came across online changed all that.

“I came across a TED Talk from a genetic counselor on YouTube,” Estrella says. “I was enthralled. I automatically knew this was a field that I see myself in.”

Genetic counselors assess a patient's risk for inherited conditions, helping them make health care decisions by reviewing family histories, personal risk factors and genetic testing options.

After watching the video, Estrella dug deeper into the profession, ordering books on the subject, reading information online about the training and even reaching out to 30 genetic counselors across the US to see what a day in their profession looks like.

“I really wanted to make sure that if I transitioned to this, it was going to be the right fit and not something impulsive,” he says.

Initially, he was hesitant to apply, knowing how competitive genetic counseling schools are. In Florida, where he lived at the time, there was only one program, and it admitted just five students each year.

Excelling at MCW

Jennifer Geurts, MS, CGC, and Christopher Estrella

“[Geurts] really took the time to know me as a person and less as a number on an application. All of the leadership team was invested in who I was as a person,” Estrella says. “I loved it.”

He and his wife, who was still his fiancée at the time, made the 20-hour move to Wisconsin to begin a new life.

“Experiencing new places, new cities, new careers and new people helps me to learn as much about myself as I learn in a classroom,” he says.

There were many highs and some lows during his time at MCW, Estrella says. But he greatly appreciates the intimate setting that allowed him to build strong relationships with colleagues and professors, the access that came with MCW’s connections to the nearby hospital systems at Children’s Wisconsin and Froedtert Hospital, and intangibles like getting tips on how to learn about the city from mentors such as Erin Syverson, MGC, CGC, assistant professor of genetic counseling at MCW.

He also appreciates the facilities at MCW, and most importantly, the school’s dedication to diversity.

“Even in MCW’s values that are plastered around the school, they really want to push for diversity in health care, which I think is so needed,” Estrella says. “I see MCW putting active effort into representation in the community and making sure they are putting some actions behind their words.”

As for his future, Estrella will soon begin a position with a genetic counseling program at the University of South Carolina. It’s the latest adventure for an individual who's full of them. Eventually, he says, he wants to be a program director and push diversity in the field further.

“My aspiration is to be the best director for future genetic counselors in training and help them also find their guiding voice the way that I was guided along my journey,” Estrella says. “I couldn’t see myself anywhere else.”

COMMENTS

  1. Determinants of genetic diversity

    Genetic diversity also seems to be predictable from the life history of a species. ... through N e, is an important challenge for future research in this field.

  2. Global genetic diversity status and trends: towards a suite of

    Genetic diversity is the foundation of the three levels of biodiversity, supporting and complementing species and ecosystems diversity. Genetic diversity provides resilience against abrupt changes and allows species and ecosystems to adapt to changing environments, climates, and other challenges (including diseases).

  3. (PDF) Genetic Diversity: Its Importance and Measurements.

    Genetic diversity helps to adapt to environmental variability. Organisms live in complex environments that vary in spatial and temporal scale and are characterized by several factors such as ...

  4. Genetic Diversity: Sources, Threats, and Conservation

    Genetic diversity is a fundamental source of biodiversity which has been defined by different authors as "any measure that quantifies the magnitude of genetic variability within a population" (Hughes et al. 2008) or "the very makeup of the variation of organisms and species on Earth" (Elliott 2002). According to Ennos et al., genetic diversity presents "the range and sum of genetic ...

  5. Genetic diversity goals and targets have improved, but remain

    Genetic diversity and adaptive potential within populations of all [wild and domestic] species is safeguarded, and all genetically distinct populations are maintained by 2030, and at least 99% of genetic diversity within populations is maintained by 2050 ... Taft HR, McCoskey DN, Miller JM, et al. Research-management partnerships: an ...

  6. Genetic diversity loss in the Anthropocene

    Although genetic diversity is a key dimension of biodiversity, it has been overlooked in international conservation initiatives. Only in 2021 did the United Nations (UN) Convention on Biological Diversity propose to preserve at least 90% of all species' genetic diversity (10, 11). Recent meta-analyses of animal populations with genetic marker samples have been used as proxies to quantify ...

  7. Embracing Genetic Diversity to Improve Black Health

    Embracing Genetic Diversity to Improve Black Health. As researchers whose work is largely focused on genetics and who self-identify as Black men, arguably one of the most disadvantaged groups in ...

  8. Insights into human genetic variation and population history ...

    Abstract. Genome sequences from diverse human groups are needed to understand the structure of genetic variation in our species and the history of, and relationships between, different populations. We present 929 high-coverage genome sequences from 54 diverse human populations, 26 of which are physically phased using linked-read sequencing ...

  9. Genetic diversity

    Genetic diversity is the total number of genetic characteristics in the genetic makeup of a species; it ranges widely from the number of species to differences within species and can be attributed to the span of survival for a species. ... In a review of current research, Teixeira and Huber (2021), ...

  10. Diversity and inclusion in genomic research: why the uneven progress?

    Increasing diversity in genomic research is critically important to the global efforts to develop a comprehensive catalog of genetic variations to address fundamental questions about human origin, peopling of the world, genetic admixture, and adaptive capabilities to different environmental conditions, all of which have played a role in shaping ...

  11. Genetic Diversity, Conservation, and Utilization of Plant Genetic

    Genetic diversity within and between plant species allows plant breeders to select superior genotypes, which can then be used for the development of genetic stock for hybridization programs or the release of a crop variety. ... Digitized molecular data are vital to numerous aspects of scientific research and genetic resource use. Substantial ...

  12. What Is Genetic Diversity and Why Does it Matter?

    Genetic Rescue: A conservation strategy in which new individuals are moved into a population to increase genetic diversity and improve population health. Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

  13. Genetic diversity goals and targets have improved, but remain

    Genetic diversity among and within populations of all species is necessary for people and nature to survive and thrive in a changing world. Over the past three years, commitments for conserving genetic diversity have become more ambitious and specific under the Convention on Biological Diversity's (CBD) draft post-2020 global biodiversity framework (GBF). This Perspective article comments on ...

  14. What Is Genetic Diversity and Why Does it Matter?

    Genetic diversity has become a hot research topic due to its importance in the health of species and ecosystems. Similarly, gene polymorphism is an essential topic in research because it ...

  15. Diversity in Genomic Research

    NHGRI is dedicated to increasing diversity of the genomics workforce. In addition, NHGRI supports projects that work to increase the diversity of people participating in genomics research, including: The 1,000 Genomes Project (2008 - 2015), the most extensive public catalog of human variation and genomic data, with over 2,000 genomic samples ...

  16. Genetic diversity, phylogeography, and maternal origin of yak (Bos

    Yaks have high genetic diversity and yak populations have experienced population expansion and lack obvious phylogeographic structure. ... Research progress on molecular genetic diversity of yaks. Genetics. 2013;35(02):151-60. Ma ZJ, Zhong JC, Han JL, Xu JT, Dou QL, Chang HP. Genetic diversity of mtDNA D-Loop region in wild ...

  17. Genetic Research Boosts Black-footed Ferret Conservation Efforts

    DENVER - Black-footed ferret recovery efforts aimed at increased genetic diversity and disease resistance took a bold step forward Dec. 10, 2020, with the birth of "Elizabeth Ann," created from the frozen cells of "Willa," a black-footed ferret that lived more than 30 years ago. The groundbreaking effort to explore solutions to help recover this endangered species results from an ...

  18. High genetic diversity discovered in South African leopards

    Researchers say the discovery of very high genetic diversity in leopards found in the Highveld region of South Africa has increased the need for conservation efforts to protect leopards in the ...

  19. AFRICAN GENETIC DIVERSITY: Implications for Human Demographic History

    The high level of genetic diversity in African populations is also consistent with a larger long-term effective population size (N e) compared to non-Africans ... In addition to obtaining research permits from the local African governments and informed consent from individual participants (including benefits and risks involved in use of samples ...

  20. Genetic Diversity: Sources, Threats, and Conservation

    Genetic diversity is a fundamental source of biodiversity which has been defined by different authors as "any measure that quantifies the magnitude of genetic variability within a population" (Hughes et al. 2008) or "the very makeup of the variation of organisms and species on Earth" (Elliott 2002). According to Ennos et al., genetic diversity presents "the range and sum of genetic ...

  21. Animals

    The analysis of the genetic diversity and historical dynamics of endemic endangered goose breeds structure has attracted great interest. Although various aspects of the goose breed structure have been elucidated, there is still insufficient research on the genetic basis of endemic endangered Chinese goose breeds. In this study, we collected blood samples from Lingxiang White (LX), Yan (YE ...

  22. Full article: Genetic diversity and population structure of Canarian

    Previous reports and research on this population of hens from Fuerteventura Island indicate that scientists are trying to recover the autochthonous breed of hen ... studying the genetic diversity, population structure of indigenous chicken ecotypes in KwaZulu-Natal (Nxumalo et al. 2020) and two indigenous chicken ecotypes from Pakistan ...

  23. Genetic Disorders

    Genetic changes can be passed down to a child from their parents. When this happens, the disease or condition is called hereditary or inherited. ... Information on how each gene change affects disease risk comes from population-level genetic studies. Addressing diversity in development of polygenic risk scores is important, because polygenic ...

  24. Driven Toward Diversity: MCW Student Works to Increase Representation

    Through research, health care education, workforce development, and community health initiatives, AHW drives change. ... Driven Toward Diversity: MCW Student Works to Increase Representation in Field of Genetic Counseling May 16, 2024 ... Genetic counselors assess a patient's risk for inherited conditions, helping them make health care ...