The distribution of transposable elements (TEs) in a genome reflects a balance between insertion rate and selection against new insertions. Understanding the distribution of TEs therefore provides insights into the forces shaping the organization of genomes. Past research has shown that TEs tend to accumulate in genomic regions with low gene density and low recombination rate. However, little is known about the factors modulating insertion rates across the genome and their evolutionary significance. One candidate factor is gene expression, which has been suggested to increase local insertion rate by rendering DNA more accessible. We test this hypothesis by comparing the TE density around germline- and soma-expressed genes in the euchromatin of Drosophila melanogaster. Because only insertions that occur in the germline are transmitted to the next generation, we predicted a higher density of TEs around germline-expressed genes than soma-expressed genes. We show that the rate of TE insertions is greater near germline- than soma-expressed genes. However, this effect is partly offset by stronger selection for genome compactness (against excess noncoding DNA) on germline-expressed genes. We also demonstrate that the local genome organization in clusters of coexpressed genes plays a fundamental role in the genomic distribution of TEs. Our analysis shows that, in addition to recombination rate, the distribution of TEs is shaped by the interaction of gene expression and genome organization. The important role of selection for compactness sheds new light on the role of TEs in genome evolution. Instead of making genomes grow passively, TEs are controlled by the forces shaping genome compactness, most likely linked to the efficiency of gene expression or its complexity and possibly their interaction with mechanisms of TE silencing.

Volkman, SK, PC Sabeti, D DeCaprio, DE Neafsey, SF Schaffner, DA Milner Jr, JP Daily, et al. 2007. “A genome-wide map of diversity in Plasmodium falciparum.” Nat Genet 39: 113-9.

Genetic variation allows the malaria parasite Plasmodium falciparum to overcome chemotherapeutic agents, vaccines and vector control strategies and remain a leading cause of global morbidity and mortality. Here we describe an initial survey of genetic variation across the P. falciparum genome. We performed extensive sequencing of 16 geographically diverse parasites and identified 46,937 SNPs, demonstrating rich diversity among P. falciparum parasites (pi = 1.16 × 10^-3) and strong correlation with gene function. We identified multiple regions with signatures of selective sweeps in drug-resistant parasites, including a previously unidentified 160-kb region with extremely low polymorphism in pyrimethamine-resistant parasites. We further characterized 54 worldwide isolates by genotyping SNPs across 20 genomic regions. These data begin to define population structure among African, Asian and American groups and illustrate the degree of linkage disequilibrium, which extends over relatively short distances in African parasites but over longer distances in Asian parasites. We provide an initial map of genetic diversity in P. falciparum and demonstrate its potential utility in identifying genes subject to recent natural selection and in understanding the population genetics of this parasite.
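The diversity statistic pi quoted above is conventionally the average number of pairwise nucleotide differences per site. The following is a minimal sketch of that calculation, assuming gapless aligned sequences of equal length; the function name `nucleotide_diversity` and the toy sequences are hypothetical and are not taken from the study, whose actual pipeline is not described here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pair_count = 0
    total_diffs = 0
    # Compare every unordered pair of sequences site by site.
    for x, y in combinations(seqs, 2):
        pair_count += 1
        total_diffs += sum(a != b for a, b in zip(x, y))

    # Mean differences per pair, normalized by the number of sites.
    return total_diffs / (pair_count * length)


# Toy alignment (hypothetical data): three sequences, four sites.
toy = ["ACGT", "ACGA", "ACGT"]
toy_pi = nucleotide_diversity(toy)
```

In the toy alignment, two of the three pairs differ at one site each, so the estimate is 2 differences over 3 pairs and 4 sites. Real data would also require handling of gaps, missing calls, and unequal coverage, which this sketch leaves out.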

Volkman, SK, E Lozovsky, AE Barry, T Bedford, L Bethke, A Myrick, KP Day, DL Hartl, DF Wirth, and SA Sawyer. 2007. “Genomic heterogeneity in the density of noncoding single-nucleotide and microsatellite polymorphisms in Plasmodium falciparum.” Gene 387: 1-6.

The density and distribution of single-nucleotide polymorphisms (SNPs) across the genome has important implications for linkage disequilibrium mapping and association studies, and the level of simple-sequence microsatellite polymorphisms has important implications for the use of oligonucleotide hybridization methods to genotype SNPs. To assess the density of these types of polymorphisms in P. falciparum, we sampled introns and noncoding DNA upstream and downstream of coding regions among a variety of geographically diverse parasites. Across 36,229 base pairs of noncoding sequence representing 41 genetic loci, a total of 307 polymorphisms including 248 polymorphic microsatellites and 59 SNPs were identified. We found a significant excess of microsatellite polymorphisms having a repeat unit length of one or two, compared to those with longer repeat lengths, as well as a nonrandom distribution of SNP polymorphisms. Almost half of the SNPs localized to only three of the 41 genetic loci sampled. Furthermore, we find significant differences in the frequency of polymorphisms across the two chromosomes (2 and 3) examined most extensively, with an excess of SNPs and a surplus of polymorphic microsatellites on chromosome 3 as compared to chromosome 2 (P=0.0001). Furthermore, at some individual genetic loci we also find a nonrandom distribution of polymorphisms between coding and flanking noncoding sequences, where completely monomorphic regions may flank highly polymorphic genes. These data, combined with our previous findings of nonrandom distribution of SNPs across chromosome 2, suggest that the Plasmodium falciparum genome may be a mosaic with regard to genetic diversity, containing chromosomal regions that are highly polymorphic interspersed with regions that are much less polymorphic.

Chookajorn, T, MS Costanzo, DL Hartl, and KW Deitsch. 2007. “Malaria: a peek at the var variorum.” Trends Parasitol 23: 563-5.

Geneticists encountering the diversity of the malaria parasite's var gene family for the first time often complain that its complexity is a nightmare. A new article by Barry et al. presents the latest and most systematic attempt to date to decipher the var variorum. This important work, combined with other recent articles on var global variation such as that by Kraemer et al., suggests that only the tip of the var diversity iceberg is currently in view. In this article, we discuss these recent results and provide an overview of current understanding of var diversity.

Hartl, DL, and DJ Fairbanks. 2007. “Mud sticks: on the alleged falsification of Mendel's data.” Genetics 175: 975-9.

Depristo, MA, DL Hartl, and DM Weinreich. 2007. “Mutational reversions during adaptive protein evolution.” Mol Biol Evol 24: 1608-10.

Adaptation is often regarded as the sequential fixation of individually, intrinsically beneficial mutations. Contrary to this expectation, we find a surprisingly large number of evolutionary trajectories on which natural selection first favors a mutation, then favors its removal, and later still favors its ultimate restoration during the course of antibiotic resistance evolution. The existence of reversion trajectories implies that natural selection may not follow the most parsimonious path separating two alleles, even during adaptation. Altogether, this discovery highlights the unusual and potentially circuitous routes natural selection can follow during adaptation.

We examined patterns and putative mechanisms of sequence diversification in the merozoite surface protein-2 (MSP-2) of Plasmodium falciparum, a major dimorphic malaria vaccine candidate antigen, by analyzing 448 msp-2 alleles from all continents. We describe several nucleotide replacements, insertion and deletion events, frameshift mutations, and proliferations of repeat units that generate the extraordinary diversity found in msp-2 alleles. We discuss the role of positive selection exerted by naturally acquired type- and variant-specific immunity in maintaining the observed levels of polymorphism and suggest that this is the most likely explanation for the significant excess of nonsynonymous nucleotide replacements found in dimorphic msp-2 domains. Hybrid sequences created by meiotic recombination between alleles of different dimorphic types were observed in few (3.1%) isolates, mostly from Africa. We found no evidence for an extremely ancient origin of allelic dimorphism at the msp-2 locus, predating P. falciparum speciation, in contrast with recent findings for other surface malarial antigens.

Ferreira, MU, ND Karunaweera, M da Silva-Nunes, NS da Silva, DF Wirth, and DL Hartl. 2007. “Population structure and transmission dynamics of Plasmodium vivax in rural Amazonia.” J Infect Dis 195: 1218-26.

Understanding the genetic structure of malaria parasites is essential to predict how fast some phenotypes of interest originate and spread in populations. In the present study, we used highly polymorphic microsatellite markers to analyze 74 Plasmodium vivax isolates, which we collected in cross-sectional and longitudinal surveys performed in an area of low malaria endemicity in Brazilian Amazonia, and to explore the transmission dynamics of genetically diverse haplotypes or strains. P. vivax populations are more diverse and more frequently comprise multiple-clone infections than do sympatric Plasmodium falciparum isolates, but these features paradoxically coexist with high levels of inbreeding, leading to significant multilocus linkage disequilibrium. Moreover, the high rates of microsatellite haplotype replacement that we found during 15 months of follow-up most likely do not result from strong diversifying selection. We conclude that the small-area genetic diversity in P. vivax populations under low-level transmission is not severely constrained by the low rates of effective meiotic recombination, with clear public health implications.

Dopman, EB, and DL Hartl. 2007. “A portrait of copy-number polymorphism in Drosophila melanogaster.” Proc Natl Acad Sci U S A 104: 19920-5.

Thomas Hunt Morgan and colleagues identified variation in gene copy number in Drosophila in the 1920s and 1930s and linked such variation to phenotypic differences [Bridges CB (1936) Science 83:210]. Yet the extent of variation in the number of chromosomes, chromosomal regions, or gene copies, and the importance of this variation within species, remain poorly understood. Here, we focus on copy-number variation in Drosophila melanogaster. We characterize copy-number polymorphism (CNP) across genomic regions, and we contrast patterns to infer the evolutionary processes acting on this variation. Copy-number variation in D. melanogaster is nonrandomly distributed, presumably because of a mutational bias produced by tandem repeats or other mechanisms. Comparisons of coding and noncoding CNPs, however, reveal a strong effect of purifying selection in the removal of structural variation from functionally constrained regions. Most patterns of CNP in D. melanogaster suggest that negative selection and mutational biases are the primary agents responsible for shaping structural variation.

Sawyer, SA, J Parsch, Z Zhang, and DL Hartl. 2007. “Prevalence of positive selection among nearly neutral amino acid replacements in Drosophila.” Proc Natl Acad Sci U S A 104: 6504-10.

We have estimated the selective effects of amino acid replacements in natural populations by comparing levels of polymorphism in 91 genes in African populations of Drosophila melanogaster with their divergence from Drosophila simulans. The genes include about equal numbers whose level of expression in adults is greater in males, greater in females, or approximately equal in the sexes. Markov chain Monte Carlo methods were used to sample key parameters in the stationary distribution of polymorphism and divergence in a model in which the selective effect of each nonsynonymous mutation is regarded as a random sample from some underlying normal distribution whose mean may differ from one gene to the next. Our analysis suggests that approximately 95% of all nonsynonymous mutations that could contribute to polymorphism or divergence are deleterious, and that the average proportion of deleterious amino acid polymorphisms in samples is approximately 70%. On the other hand, approximately 95% of fixed differences between species are positively selected, although the scaled selection coefficient (N_e s) is very small. We estimate that approximately 46% of amino acid replacements have N_e s < 2, approximately 84% have N_e s < 4, and approximately 99% have N_e s < 7. Although positive selection among amino acid differences between species seems pervasive, most of the selective effects could be regarded as nearly neutral. There are significant differences in selection between sex-biased and unbiased genes, which relate primarily to the mean of the distributions of mutational effects and the fraction of slightly deleterious and weakly beneficial mutations that are fixed.

Tao, Y, JP Masly, L Araripe, Y Ke, and DL Hartl. 2007. “A sex-ratio meiotic drive system in Drosophila simulans. I: an autosomal suppressor.” PLoS Biol 5: e292.

Sex ratio distortion (sex-ratio for short) has been reported in numerous species such as Drosophila, where distortion can readily be detected in experimental crosses, but the molecular mechanisms remain elusive. Here we characterize an autosomal sex-ratio suppressor from D. simulans that we designate as not much yang (nmy, polytene chromosome position 87F3). Nmy suppresses an X-linked sex-ratio distorter, contains a pair of near-perfect inverted repeats of 345 bp, and evidently originated through retrotransposition from the distorter itself. The suppression is likely mediated by sequence homology between the suppressor and distorter. The strength of sex-ratio is greatly enhanced by lower temperature. This temperature sensitivity was used to assign the sex-ratio etiology to the maturation process of the Y-bearing sperm, a hypothesis corroborated by both light microscope observations and ultrastructural studies. It has long been suggested that an X-linked sex-ratio distorter can evolve by exploiting loopholes in the meiotic machinery for its own transmission advantage, which may be offset by other changes in the genome that control the selfish distorter. Data obtained in this study help to understand this evolutionary mechanism in molecular detail and provide insight regarding its evolutionary impact on genomic architecture and speciation.

Tao, Y, L Araripe, SB Kingan, Y Ke, H Xiao, and DL Hartl. 2007. “A sex-ratio meiotic drive system in Drosophila simulans. II: an X-linked distorter.” PLoS Biol 5: e293.

The evolution of heteromorphic sex chromosomes creates a genetic condition favoring the invasion of sex-ratio meiotic drive elements, resulting in the biased transmission of one sex chromosome over the other, in violation of Mendel's first law. The molecular mechanisms of sex-ratio meiotic drive may therefore help us to understand the evolutionary forces shaping the meiotic behavior of the sex chromosomes. Here we characterize a sex-ratio distorter on the X chromosome (Dox) in Drosophila simulans by genetic and molecular means. Intriguingly, Dox has very limited coding capacity. It evolved from another X-linked gene, which also evolved de novo. Through retrotransposition, Dox also gave rise to an autosomal suppressor, not much yang (Nmy). An RNA interference mechanism seems to be involved in the suppression of the Dox distorter by the Nmy suppressor. Double mutant males of the genotype dox; nmy are normal for both sex-ratio and spermatogenesis. We postulate that recurrent bouts of sex-ratio meiotic drive and its subsequent suppression might underlie several common features observed in the heterogametic sex, including meiotic sex chromosome inactivation and achiasmy.

Landry, CR, CI Castillo-Davis, A Ogura, JS Liu, and DL Hartl. 2007. “Systems-level analysis and evolution of the phototransduction network in Drosophila.” Proc Natl Acad Sci U S A 104: 3283-8.

Networks of interacting genes are responsible for generating life's complexity and for mediating how organisms respond to their environment. Thus, a basic understanding of genetic variation in gene networks in natural populations is important for elucidating how changes at the genetic level map to higher levels of biological organization. Here, using the well-characterized phototransduction network in Drosophila, we analyze variation in gene expression within and between two closely related species, Drosophila melanogaster and Drosophila simulans, under different environmental conditions. Gene expression levels in the pathway are largely conserved between these two sibling species. For most genes in the network, differences in level of gene expression between species are correlated with degree of polymorphism within species. However, one gene encoding the light-induced ion channel TRPL (transient receptor potential-like) shows an excess of expression divergence relative to polymorphism, suggesting a possible role for natural selection in shaping this expression difference between species. Finally, this difference in TRPL expression likely has significant functional consequences, because it is known that a high level of rhabdomeral TRPL leads to increased sensitivity to dim background light and an increased response to a wider range of light intensities. These results provide a preliminary quantification of variation and divergence of gene expression between species in a known gene network and provide a foundation for a system-level understanding of functional and evolutionary change.


Protein sequences frequently contain regions composed of a reduced number of amino acids. Despite their presence in about half of all proteins and their unusual prevalence in the malaria parasite Plasmodium falciparum, the function and evolution of such low-complexity regions (LCRs) remain unclear. Here we show that LCR abundance and amino acid composition depend largely, but not exclusively, on genomic A+T content and obey power-law growth dynamics. Further, our results indicate that LCRs are analogous to microsatellites in that DNA replication slippage and unequal crossover recombination are important molecular mechanisms for LCR expansion. We support this hypothesis by demonstrating that the size of LCR insertions/deletions among orthologous genes depends upon length. Moreover, we show that LCRs enable intra-exonic recombination in a key family of cell-surface antigens in P. falciparum and thus likely facilitate the generation of antigenic diversity. We conclude with a mechanistic model for LCR evolution that links the pattern of LCRs within P. falciparum to its high genomic A+T content and recombination rate.

Weinreich, DM, NF Delaney, MA Depristo, and DL Hartl. 2006. “Darwinian evolution can follow only very few mutational paths to fitter proteins.” Science 312: 111-4.

Five point mutations in a particular beta-lactamase allele jointly increase bacterial resistance to a clinically important antibiotic by a factor of approximately 100,000. In principle, evolution to this high-resistance beta-lactamase might follow any of the 120 mutational trajectories linking these alleles. However, we demonstrate that 102 trajectories are inaccessible to Darwinian selection and that many of the remaining trajectories have negligible probabilities of realization, because four of these five mutations fail to increase drug resistance in some combinations. Pervasive biophysical pleiotropy within the beta-lactamase seems to be responsible, and because such pleiotropy appears to be a general property of missense mutations, we conclude that much protein evolution will be similarly constrained. This implies that the protein tape of life may be largely reproducible and even predictable.
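The figure of 120 trajectories is the number of orderings of five mutations (5! = 120). The sketch below shows one way such trajectories could be enumerated and screened, assuming that a trajectory counts as accessible only if every single-mutation step strictly increases fitness. That criterion is a simplification, and the two fitness functions are hypothetical toy landscapes, not the measured beta-lactamase resistance data.

```python
from itertools import permutations


def count_accessible_paths(n_sites, fitness):
    """Return (total, accessible) orderings of n_sites mutations.

    A path is treated as accessible when each added mutation strictly
    increases the value returned by `fitness` (a simplifying assumption).
    """
    total = 0
    accessible = 0
    for order in permutations(range(n_sites)):
        total += 1
        genotype = frozenset()
        increasing = True
        for site in order:
            next_genotype = genotype | {site}
            if fitness(next_genotype) <= fitness(genotype):
                increasing = False
                break
            genotype = next_genotype
        if increasing:
            accessible += 1
    return total, accessible


def additive_landscape(genotype):
    """Hypothetical landscape: every mutation helps in every background."""
    return len(genotype)


def epistatic_landscape(genotype):
    """Hypothetical landscape: mutation 0 is harmful unless mutation 1 is present."""
    score = len(genotype)
    if 0 in genotype and 1 not in genotype:
        score -= 2
    return score


additive_result = count_accessible_paths(5, additive_landscape)
epistatic_result = count_accessible_paths(5, epistatic_landscape)
```

With the additive toy landscape all 120 orderings pass; with a single sign-epistatic interaction, every ordering that introduces mutation 0 before mutation 1 is excluded, which removes half of the paths. This illustrates in miniature how mutations that fail to help in some combinations can close off trajectories.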

Bethke, LL, M Zilversmit, K Nielsen, J Daily, SK Volkman, D Ndiaye, ER Lozovsky, DL Hartl, and DF Wirth. 2006. “Duplication, gene conversion, and genetic diversity in the species-specific acyl-CoA synthetase gene family of Plasmodium falciparum.” Mol Biochem Parasitol 150: 10-24.

While genes encoding antigens and other highly polymorphic proteins are commonly found in subtelomeres, it is unusual to find a small family of housekeeping genes in these regions. We found that in the species Plasmodium falciparum only, a non-subtelomeric acyl-CoA synthetase (ACS) gene has expanded into a family of duplicated genes mainly located in the subtelomeres of the genome. We identified the putative parent of the duplicated family by analysis of synteny and phylogeny relative to other Plasmodium ACS genes. All ten ACS paralogs are transcribed in erythrocytic stages of laboratory and field isolates. We identified and confirmed a recent double gene conversion event involving ACS genes on three different chromosomes of isolate 3D7, resulting in the creation of a new hybrid gene. Southern hybridization analysis of geographically diverse P. falciparum isolates provides evidence for the strikingly global conservation of the ACS gene family, but also for some chromosomal events, including deletion and recombination, involving the duplicated paralogs. We found a dramatically higher rate of non-synonymous substitutions per non-synonymous site than synonymous substitutions per synonymous site in the closely related ACS paralogs we sequenced, suggesting that these genes are under a form of selection that favors change in the state of the protein. We also found that the gene encoding acyl-CoA binding protein has expanded and diversified in P. falciparum. We have described a new class of subtelomeric gene family with a unique capacity for diversity in P. falciparum.

Landry, CR, JP Townsend, DL Hartl, and D Cavalieri. 2006. “Ecological and evolutionary genomics of Saccharomyces cerevisiae.” Mol Ecol 15: 575-91.

Saccharomyces cerevisiae, the budding yeast, is the most thoroughly studied eukaryote at the cellular, molecular, and genetic levels. Yet, until recently, we knew very little about its ecology or population and evolutionary genetics. In recent years, it has been recognized that S. cerevisiae occupies numerous habitats and that populations harbour important genetic variation. There is therefore an increasing interest in understanding the evolutionary forces acting on the yeast genome. Several researchers have used the tools of functional genomics to study natural isolates of this unicellular fungus. Here, we review some of these studies, and show not only that budding yeast is a prime model system to address fundamental molecular and cellular biology questions, but also that it is becoming a powerful model species for ecological and evolutionary genomics studies as well.

Ponce, R, and DL Hartl. 2006. “The evolution of the novel Sdic gene cluster in Drosophila melanogaster.” Gene 376: 174-83.

The origin of new genes and of new functions for existing genes are fundamental processes in molecular evolution. Sdic is a newly evolved gene that arose recently in the D. melanogaster lineage. The gene encodes a novel sperm motility protein. It is a chimeric gene formed by duplication of two other genes followed by multiple deletions and other sequence rearrangements. The Sdic gene exists in several copies in the X chromosome, and is presumed to have undergone several duplications to form a tandemly arrayed gene cluster. Given the very recent origin of the gene and the gene cluster, the analysis of the composition of this gene cluster represents an excellent opportunity to study the origin and evolution of new gene functions and the fate of gene duplications. We have analyzed the nucleotide sequence of this region and reconstructed the evolutionary history of this gene cluster. We found that the cluster is composed of four tandem copies of Sdic; these duplicates are very similar but can be distinguished by the unique pattern of insertions, deletions, and point mutations in each copy. The oldest gene copy in the array has a 3' exon that has undergone accelerated diversification, and also shows divergent regulatory sequences. Moreover, there is evidence that this might be the only gene copy in the tandem array that is transcribed at a significant level, expressing a novel sperm-specific protein. There is also a retrotransposon located at the 3' end of each Sdic gene copy. We argue that this gene cluster was formed in the last two million years by at least three tandem duplications and one retrotransposition event.

One of the most important aspects of the evolution of development and physiology is the interplay between gene expression and the environment, by which traits become altered in response to environmental triggers. This feature is known as phenotypic plasticity. When different genotypes show different levels of plasticity for a trait, then they show genotype-by-environment interaction, or GEI. It is now clear that gene expression plays an important role in organismic-level phenotypic plasticity, but we know very little about whether gene expression itself is subject to genetic variation for phenotypic plasticity (GEI). Given that gene regulation is likely to have evolved to respond to environmental changes, it is of central importance to understand how environmental and genetic variation interact to produce variation in gene expression. Here we investigate genetic variation for phenotypic plasticity in the yeast transcriptome for the whole genome. Six strains of Saccharomyces cerevisiae were grown in four different environments representing a continuum of rich and poor natural conditions. Using DNA-microarray data and an ANOVA analysis with a stringent criterion of significance, we found significant genetic variation for transcriptional plasticity (GEI) among strains for approximately 5% of the genes in the genome. There are about twice as many genes that show genetic variation for phenotypic plasticity as show genetic variation in transcription level independent of the environment. We also found that genes with genetic variation for plasticity were less likely to be essential and were significantly biased towards genes that have paralogs.

Chookajorn, T, and DL Hartl. 2006. “Position-specific polymorphism of Plasmodium falciparum Stuttering motif in a PHISTc PFI1780w.” Exp Parasitol 114: 126-8.

Several genes of Plasmodium falciparum are positively selected due to pressure from the host immune system. This is a pattern completely opposite to that found in most housekeeping genes, which have few synonymous mutations. The discrepancy is an important topic in Plasmodium biology. We searched for unique polymorphism patterns in P. falciparum and identified a repetitive Stuttering motif in PFI1780w, which was recently grouped as a gene in the PHIST family. The repeat has a position-specific polymorphism pattern in the otherwise highly conserved gene. Its mutations are limited to only one small region, and they are not consistent with the replication slippage or gene conversion commonly found in low-complexity regions. The repeat variation was analyzed in different strains of P. falciparum. The PFI1780w Stuttering motif can serve as a model to study gene diversification and can be used as a tool for strain typing.