Barry, AE, A Leliwa-Sytek, K Man, JM Kasper, DL Hartl, and KP Day. 2006. “Variable SNP density in aspartyl-protease genes of the malaria parasite Plasmodium falciparum.” Gene 376: 163-73. Abstract

An analysis of the diversity of the aspartyl proteases of Plasmodium falciparum, known as plasmepsins (PMs), was completed in view of their possible role as drug targets. DNA sequence polymorphisms were identified in nine pm genes including their non-coding (introns and 5' flanking) sequences. All genes contained at least one single nucleotide polymorphism (SNP). Extensive microsatellite diversity was observed predominantly in non-coding sequences. All but one non-synonymous polymorphism (a conservative substitution) were mapped to the surface of the predicted protein, contradicting a possible role in enzymatic activity. The distribution of SNPs was found to be non-random among pm genes, with pm6 and pm10 having significantly higher SNP densities, suggesting they were under selection. For pm6 the majority of the SNPs were in introns and some of these may contribute to splice site variation. SNPs were found at a high density in both the coding and non-coding sequences of pm10. Recombination was important in generating additional diversity at this locus. Although direct selection for pm10 mutations could not be ruled out, the presence of balancing selection and a high density of SNPs in non-coding sequence led us to propose that another gene under selection may be influencing the diversity in the region. By sequencing short DNA tags in a 200 kb region flanking pm10 we show that a cluster of antigen genes, known to be under diversifying selection, may contribute to the observed diversity. We discuss the importance of diversity and local selection effects when choosing drug targets for intervention strategies.

We compared intron positions in conserved regions of 3479 orthologous gene pairs from Plasmodium falciparum and Plasmodium yoelii, which likely diverged ≥100 million years ago (Mya). Only 27 out of 2212 positions were specific to one of the two species. Intron presence in related species shows that at least 19 and possibly 26 of the changes are due to intron loss, depending on phylogeny. The implied intron loss and gain rates are much lower than previously estimated for nematodes, arthropods, fungi, and plants, and are comparable only with the rates in vertebrates. That all observed changes were exact, occurring without loss or gain of flanking coding sequence, suggests intron loss via an mRNA intermediate, as does a nonsignificant trend toward loss of introns at adjacent positions. Many of the intron changes occurred in genes encoding proteins involved in nucleic acid-related processes, as previously found for intron gains in nematodes. Two changes occurred in the chloroquine resistance transporter, suggesting a role for positive selection in intron loss in Plasmodium. The dearth of intron loss and gain could be explained by the lack of known transposable elements in Plasmodium, since transposable elements and/or reverse transcriptase are thought to be necessary for both processes. The observed pattern suggests that the availability of stochastic intron loss and gain mutations can be a major determinant of changes in intron number.
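
The counts quoted in this abstract can be turned into proportions with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the "not attributed to loss" range is simply the remainder of the 27 changes, which the abstract does not break down further.

```python
# Counts quoted in the abstract above; variable names are illustrative.
total_positions = 2212    # intron positions compared in conserved regions
species_specific = 27     # positions present in only one of the two species
losses_min = 19           # changes attributed to intron loss (lower bound)
losses_max = 26           # changes attributed to intron loss (upper bound)

# Fraction of compared positions that differ between the two species.
fraction_changed = species_specific / total_positions

# Share of the changes attributed to loss, under each phylogeny-dependent bound.
loss_share_min = losses_min / species_specific
loss_share_max = losses_max / species_specific

# Remaining changes (not attributed to loss), as a range.
other_min = species_specific - losses_max
other_max = species_specific - losses_min

print(f"changed positions: {fraction_changed:.2%}")
print(f"share attributed to loss: {loss_share_min:.0%} to {loss_share_max:.0%}")
print(f"changes not attributed to loss: {other_min} to {other_max}")
```

On these numbers, roughly 1.2% of compared positions differ between the species, and between about 70% and 96% of those differences are attributed to loss.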

Landry, CR, PJ Wittkopp, CH Taubes, JM Ranz, AG Clark, and DL Hartl. 2005. “Compensatory cis-trans evolution and the dysregulation of gene expression in interspecific hybrids of Drosophila.” Genetics 171: 1813-22. Abstract

Hybrids between species are often characterized by novel gene-expression patterns. A recent study on allele-specific gene expression in hybrids between species of Drosophila revealed cases in which cis- and trans-regulatory elements within species had coevolved in such a way that changes in cis-regulatory elements are compensated by changes in trans-regulatory elements. We hypothesized that such coevolution should often lead to gene misexpression in the hybrid. To test this hypothesis, we estimated allele-specific expression and overall expression levels for 31 genes in D. melanogaster, D. simulans, and their F1 hybrid. We found that 13 genes with cis-trans compensatory evolution are in fact misexpressed in the hybrid. These represent candidate genes whose dysregulation might be the consequence of coevolution of cis- and trans-regulatory elements within species. Using a mathematical model for the regulation of gene expression, we explored the conditions under which cis-trans compensatory evolution can lead to misexpression in interspecific hybrids.

We describe the complete opsin gene families from the sequenced fugu and Tetraodon pufferfish genomes. We report the convergent loss of function of an anciently duplicated, functionally divergent RH2 or "green-sensitive" opsin gene in both pufferfish lineages, designated RH2-2. In fugu, RH2-2 apparently ceased to function very recently following a transposon-induced deletion that truncated the N-terminal 115 amino acids from the translated protein. Although a lack of frameshift or nonsense mutations in the fugu RH2-2 pseudogene suggests that the gene was lost very recently in this lineage, we were unable to detect any evidence of a selective sweep associated with the fixation of the truncated allele from population data. Interspecific comparison of the remaining fugu RH2-2 coding sequence paradoxically indicates that the gene was under strong purifying selection until the truncation occurred.

Blumenstiel, JP, and DL Hartl. 2005. “Evidence for maternally transmitted small interfering RNA in the repression of transposition in Drosophila virilis.” Proc Natl Acad Sci U S A 102: 15965-70. Abstract

Hybrid dysgenesis in Drosophila is a syndrome of gonadal atrophy, sterility, and male recombination, and it occurs in the progeny of crosses between males that harbor certain transposable elements (TEs) and females that lack them. Known examples of hybrid dysgenesis in Drosophila melanogaster result from mobilization of individual families of TEs, such as the P element, the I element, or hobo. An example of hybrid dysgenesis in Drosophila virilis is unique in that multiple, unrelated families of TEs become mobilized, but a TE designated Penelope appears to play a major role. In all known examples of hybrid dysgenesis, the paternal germ line transmits the TEs in an active state, whereas the female germ line maintains repression of the TEs. The mechanism of maternal maintenance of repression is not known. Recent evidence suggests that the molecular machinery of RNA interference may function as an important host defense against TEs. This protection is mediated by the action of endogenous small interfering RNAs (siRNAs) composed of dsRNA molecules of 21-25 nt that can target complementary transcripts for destruction. In this paper, we demonstrate that endogenous siRNA derived from the Penelope element is maternally loaded in embryos through the female germ line in D. virilis. We also present evidence that the maternal inheritance of these endogenous siRNAs may contribute to maternal repression of Penelope.

Neafsey, DE, DL Hartl, and M Berriman. 2005. “Evolution of noncoding and silent coding sites in the Plasmodium falciparum and Plasmodium reichenowi genomes.” Mol Biol Evol 22: 1621-6. Abstract

We compared levels of sequence divergence between fourfold synonymous coding sites and noncoding sites from the intergenic and intronic regions of the Plasmodium falciparum and Plasmodium reichenowi genomes. We observed significant differences in the level of divergence between these classes of silent sites. Fourfold synonymous coding sites exhibited the highest level of sequence divergence, followed by introns, and then intergenic sequences. This pattern of relative divergence rates has been observed in primate genomes but was unexpected in Plasmodium due to a paucity of variation at silent sites in P. falciparum and the corollary hypothesis that silent sites in this genome may be subject to atypical selective constraints. Exclusion of hypermutable CpG dinucleotides reduces the divergence level of synonymous coding sites to that of intergenic sites but does not diminish the significantly higher divergence level of introns relative to intergenic sites. A greater than expected incidence of CpG dinucleotides in intergenic regions less than 500 bp from genes may indicate selective maintenance of regulatory motifs containing CpGs. Divergence rates of different classes of silent sites in these Plasmodium genomes are determined by a combination of mutational and selective pressures.

Organismic evolution requires that variation at distinct hierarchical levels and attributes be coherently integrated, often in the face of disparate environmental and genetic pressures. A central part of the evolutionary analysis of biological systems remains to decipher the causal connections between organism-wide (or genome-wide) attributes (e.g., mRNA abundance, protein length, codon bias, recombination rate, genomic position, mutation rate, etc.) as well as their role, together with mutation, selection, and genetic drift, in shaping patterns of evolutionary variation in any of the attributes themselves. Here we combine genome-wide evolutionary analysis of protein and gene expression data to highlight fundamental relationships among genomic attributes and their associations with the evolution of both protein sequences and gene expression levels. Our results show that protein divergence is positively coupled with both gene expression polymorphism and divergence. We show moreover that although the number of protein-protein interactions in Drosophila is negatively associated with protein divergence as well as gene expression polymorphism and divergence, protein-protein interactions cannot account for the observed coupling between regulatory and structural evolution. Furthermore, we show that proteins with higher rates of amino acid substitutions tend to have larger sizes and tend to be expressed at lower mRNA abundances, whereas genes with higher levels of gene expression divergence and polymorphism tend to have shorter sizes and tend to be expressed at higher mRNA abundances. Finally, we show that protein length is negatively associated with both number of protein-protein interactions and mRNA abundance and that interacting proteins in Drosophila show similar amounts of divergence. 
We suggest that protein sequences and gene expression are subjected to similar evolutionary dynamics, possibly because of similarity in the fitness effect (i.e., strength of stabilizing selection) of disruptions in a gene's protein sequence or its mRNA expression. We conclude that, as more and better data accumulate, understanding the causal connections among biological traits and how they are integrated over time to constrain or promote structural and regulatory evolution may finally become possible.

Kulathinal, RJ, and DL Hartl. 2005. “The latest buzz in comparative genomics.” Genome Biol 6: 201. Abstract

A second species of fruit fly has just been added to the growing list of organisms with complete and annotated genome sequences. The publication of the Drosophila pseudoobscura sequence provides a snapshot of how genomes have changed over tens of millions of years and sets the stage for the analysis of more fly genomes.

Depristo, MA, DM Weinreich, and DL Hartl. 2005. “Missense meanderings in sequence space: a biophysical view of protein evolution.” Nat Rev Genet 6: 678-87. Abstract

Proteins are finicky molecules; they are barely stable and are prone to aggregate, but they must function in a crowded environment that is full of degradative enzymes bent on their destruction. It is no surprise that many common diseases are due to missense mutations that affect protein stability and aggregation. Here we review the literature on biophysics as it relates to molecular evolution, focusing on how protein stability and aggregation affect organismal fitness. We then advance a biophysical model of protein evolution that helps us to understand phenomena that range from the dynamics of molecular adaptation to the clock-like rate of protein evolution.

The extent to which natural selection shapes phenotypic variation has long been a matter of debate among those studying organic evolution. We studied the patterns of gene expression polymorphism and divergence in several datasets that ranged from comparisons between two very closely related laboratory strains of mice to comparisons across a considerably longer time scale, such as between humans and chimpanzees, two species of mice, and two species of Drosophila. The results were analyzed and interpreted in view of neutral models of phenotypic evolution. Our analyses used a number of metrics to show that most mRNA levels are evolutionarily stable, changing little across the range of taxonomic distances compared. This implies that, overall, widespread stabilizing selection on transcription levels has prevented greater evolutionary changes in mRNA levels. Nevertheless, the range of rates of divergence is large, with highly significant differences in the rate and patterns of transcription divergence across functional classes defined on the basis of the gene ontology annotation (primate and mouse datasets) or on the basis of the pattern of sex-biased gene expression (Drosophila). Moreover, rates of divergence of sex-biased genes in the contrast between Drosophila species show a distinct pattern from that observed in the contrast between populations of D. melanogaster. Hence, we discuss the time scale of the changes observed and its consequences for the relationship between variation in gene expression within and between species. Finally, we argue that differences in mRNA levels of the magnitudes observed herein could be explained by a remarkably small number of generations of directional selection.

Castillo-Davis, CI, TB Bedford, and DL Hartl. 2004. “Accelerated rates of intron gain/loss and protein evolution in duplicate genes in human and mouse malaria parasites.” Mol Biol Evol 21: 1422-7. Abstract

Very little is known about molecular evolution in the human malaria parasite Plasmodium falciparum. Given the potentially important role that introns play in directing transcription and the posttranscriptional control of gene expression, we compare rates of intron gain/loss and intronic substitution in P. falciparum and the rodent malaria P. y. yoelii in both orthologous and duplicate genes. Specifically, we test the hypothesis that intron gain/loss and protein evolution are accelerated in duplicate genes versus orthologous genes in both parasites using the genome sequence of both species. We find that duplicate genes in both P. falciparum and P. y. yoelii exhibit a dramatic acceleration of both intron gain/loss and protein evolution in comparison with orthologs, suggesting increased directional and/or relaxed selection in duplicate genes. Further, we find that rates of intron gain/loss and protein evolution are weakly coupled in orthologs but not paralogs, supporting the hypothesis that selection acts on genes as functionally integrated units after speciation but not necessarily after gene duplication. In contrast, we find that rates of nucleotide substitution do not differ significantly between intronic sites and synonymous sites among duplicate genes, implying that a large fraction of intronic sites in Plasmodium evolve under little or no selective constraint.

Ranz, JM, K Namgyal, G Gibson, and DL Hartl. 2004. “Anomalies in the expression profile of interspecific hybrids of Drosophila melanogaster and Drosophila simulans.” Genome Res 14: 373-9. Abstract

When females of Drosophila melanogaster and males of Drosophila simulans are mated, the male progeny are inviable, whereas the female progeny display manifold malformations and are sterile. These abnormalities result from genetic incompatibilities accumulated since the time the lineages of the species diverged, and may have their origin in aberrant gene transcription. Because compensatory changes within species may obscure differences at the regulatory level in conventional comparisons of the expression profile between species, we have compared the gene-expression profile of hybrid females with those of females of the parental species in order to identify regulatory incompatibilities. In the hybrid females, we find abnormal levels of messenger RNA for a large fraction of the Drosophila transcriptome. These include a gross underexpression of genes preferentially expressed in females, accompanying gonadal atrophy. The hybrid females also show significant overexpression of male-biased genes, which we attribute to incompatibilities in the regulatory mechanisms that normally act to control the expression of these genes in females. The net result of the multiple incompatibilities is that the gene-expression profiles of the parental females are more similar to each other than either is to that of the hybrid.

Castillo-Davis, CI, DL Hartl, and G Achaz. 2004. “cis-Regulatory and protein evolution in orthologous and duplicate genes.” Genome Res 14: 1530-6. Abstract

The relationship between protein and regulatory sequence evolution is a central question in molecular evolution. It is currently not known to what extent changes in gene expression are coupled with the evolution of protein coding sequences, or whether these changes differ among orthologs (species homologs) and paralogs (duplicate genes). Here, we develop a method to measure the extent of functionally relevant cis-regulatory sequence change in homologous genes, and validate it using microarray data and experimentally verified regulatory elements in different eukaryotic species. By comparing the genomes of Caenorhabditis elegans and C. briggsae, we found that protein and regulatory evolution is weakly coupled in orthologs but not paralogs, suggesting that selective pressure on gene expression and protein evolution is quite similar and persists for a significant amount of time following speciation but not gene duplication. Additionally, duplicates of both species exhibit a dramatic acceleration of both regulatory and protein evolution compared to orthologs, suggesting increased directional selection and/or relaxed selection on both gene expression patterns and protein function in duplicate genes.

Kulathinal, RJ, BR Bettencourt, and DL Hartl. 2004. “Compensated deleterious mutations in insect genomes.” Science 306: 1553-4. Abstract

Relatively little is known about the importance of amino acid interactions in protein and phenotypic evolution. Here we examine whether mutations that are pathogenic in Drosophila melanogaster become fixed via epistasis in other Dipteran genomes. Overall divergence at pathogenic amino acid sites is reduced. However, approximately 10% of the substitutions at these sites carry the exact same pathogenic amino acid found in D. melanogaster mutants. Hence compensatory mutation(s) must have evolved. Surprisingly, this fraction of about 10% is not affected by phylogenetic distance. These results support a selection-driven process that allows compensated amino acid substitutions to become rapidly fixed in taxa with large populations.

Neafsey, DE, JP Blumenstiel, and DL Hartl. 2004. “Different regulatory mechanisms underlie similar transposable element profiles in pufferfish and fruitflies.” Mol Biol Evol 21: 2310-8. Abstract

Comparative analysis of recently sequenced eukaryotic genomes has uncovered extensive variation in transposable element (TE) abundance, diversity, and distribution. The TE profile in the sequenced pufferfish genomes is more similar to that of Drosophila melanogaster than to human or mouse, in that pufferfish TEs exhibit low overall abundance, high family diversity, and localization in the heterochromatin. It has been suggested that selection against the deleterious effects of ectopic recombination between TEs has structured the TE profile in Drosophila and pufferfish but not in humans. We test this hypothesis by measuring the sample frequency of 48 euchromatic TE insertions in the genome of the green spotted pufferfish (Tetraodon nigroviridis). We estimate the strength of selection acting on recent insertions by analyzing the site frequency spectrum using a maximum-likelihood approach. We show that in contrast to Drosophila, euchromatic TE insertions in Tetraodon are selectively neutral and that the low copy number and compartmentalized distribution of TEs in the Tetraodon genome must be caused by regulation by means other than purifying selection acting on recent insertions. Inference of regulatory processes governing TE profiles should take into account factors such as effective population size, incidence of inbreeding/outcrossing, and other species-specific traits.
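
As a rough illustration of what a site frequency spectrum is, the sketch below tabulates how many insertions are carried by exactly i of n sampled chromosomes and compares the shape with the standard neutral expectation, in which the expected share of variants at count i is proportional to 1/i. It is not the maximum-likelihood procedure used in the paper. The function names and the toy counts are hypothetical, and the sample size of 6 is chosen only to keep the example small.

```python
from collections import Counter

def site_frequency_spectrum(carrier_counts, n):
    """Return a list whose entry i-1 is the number of insertions found in
    exactly i of the n sampled chromosomes, for i = 1 .. n-1.
    Insertions absent from or fixed in the sample are ignored."""
    tally = Counter(carrier_counts)
    return [tally.get(i, 0) for i in range(1, n)]

def neutral_expected_proportions(n):
    """Expected share of segregating variants at each count i = 1 .. n-1
    under the standard neutral model (proportional to 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: number of sampled chromosomes carrying each insertion.
example_counts = [1, 1, 2, 1, 3, 5, 1, 2]
example_n = 6

observed = site_frequency_spectrum(example_counts, example_n)
expected = neutral_expected_proportions(example_n)

for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"count {i}: observed {obs}, neutral share {exp:.3f}")
```

An excess of rare insertions relative to the neutral shares would be the kind of skew expected under purifying selection; a spectrum close to the neutral shape is what the abstract reports for Tetraodon.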

Castillo-Davis, CI, FA Kondrashov, DL Hartl, and RJ Kulathinal. 2004. “The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint.” Genome Res 14: 802-11. Abstract

We compare the functional spectrum of protein evolution in two separate animal lineages with respect to two hypotheses: (1) rates of divergence are distributed similarly among functional classes within both lineages, indicating that selective pressure on the proteome is largely independent of organismic-level biological requirements; and (2) rates of divergence are distributed differently among functional classes within each lineage, indicating species-specific selective regimes impact genome-wide substitutional patterns. Integrating comparative genome sequence with data from tissue-specific expressed-sequence-tag (EST) libraries and detailed database annotations, we find a functional genomic signature of rapid evolution and selective constraint shared between mammalian and nematode lineages despite their extensive morphological and ecological differences and distant common ancestry. In both phyla, we find evidence of accelerated evolution among components of molecular systems involved in coevolutionary change. In mammals, lineage-specific fast evolving genes include those involved in reproduction, immunity, and possibly, maternal-fetal conflict. Likelihood ratio tests provide evidence for positive selection in these rapidly evolving functional categories in mammals. In contrast, slowly evolving genes, in terms of amino acid or insertion/deletion (indel) change, in both phyla are involved in core molecular processes such as transcription, translation, and protein transport. Thus, strong purifying selection appears to act on the same core cellular processes in both mammalian and nematode lineages, whereas positive and/or relaxed selection acts on different biological processes in each lineage.

The recent action of positive selection is expected to influence patterns of intraspecific DNA sequence variation in chromosomal regions linked to the selected locus. These effects include decreased polymorphism, increased linkage disequilibrium, and an increased frequency of derived variants. These effects are all expected to dissipate with distance from the selected locus due to recombination. Therefore, in regions of high recombination, it should be possible to localize a target of selection to a relatively small interval. Previously described patterns of intraspecific variation in three tandemly arranged, testes-expressed genes (janusA, janusB, and ocnus) in Drosophila simulans included all three of these features. Here we expand the original sample and also survey nucleotide polymorphism at three neighboring loci. On the basis of recombination events between derived and ancestral alleles, we localize the target of selection to a 1.5-kb region surrounding janusB. A composite-likelihood-ratio test based on the spatial distribution and frequency of derived polymorphic variants corroborates this result and provides an estimate of the strength of selection. However, the data are difficult to reconcile with the simplest model of positive selection, whereas a new composite-likelihood method suggests that the data are better described by a model in which the selected allele has not yet gone to fixation.
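
Two of the signatures named in this abstract, reduced polymorphism and a shift toward high-frequency derived variants, can be computed from aligned sequences as in the minimal sketch below. It is not the composite-likelihood method of the paper. The function names, the toy alignment, and the assumed ancestral sequence are all hypothetical; in practice the ancestral state would be inferred from an outgroup.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned,
    equal-length sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def derived_allele_counts(seqs, ancestral):
    """For each polymorphic site, the number of sequences carrying a base
    that differs from the assumed ancestral base."""
    counts = []
    for pos, anc in enumerate(ancestral):
        derived = sum(s[pos] != anc for s in seqs)
        if 0 < derived < len(seqs):
            counts.append(derived)
    return counts

# Hypothetical toy alignment of four sampled alleles.
toy_seqs = ["ACGTAC", "ACGTAT", "ACCTAT", "ACCTAT"]
toy_ancestral = "ACGTAC"  # assumed ancestral sequence

pi = nucleotide_diversity(toy_seqs)
derived = derived_allele_counts(toy_seqs, toy_ancestral)

print(f"nucleotide diversity: {pi:.4f}")
print(f"derived allele counts at polymorphic sites: {derived}")
```

Computed in windows along a chromosome, a local dip in diversity together with derived counts close to the sample size would be consistent with a nearby target of recent selection.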

Hartl, DL. 2004. “The origin of malaria: mixed messages from genetic diversity.” Nat Rev Microbiol 2: 15-22. Abstract

Over the past 35 years, the incidence of malaria has increased 2-3-fold. At present, it affects 300-500 million people and causes about 1 million deaths, primarily in Africa. The continuing upsurge has come from a coincidence of drug-resistant parasites, insecticide-resistant mosquitoes, global climate change and continuing poverty and political instability. An analogous rapid increase in malaria might have taken place about 10,000 years ago. Patterns of genetic variation in mitochondrial DNA support this model, but variation in nuclear genes gives an ambiguous message. Resolving these discrepancies has implications for the evolution of drug resistance and vaccine evasion.

Lemos, B, CD Meiklejohn, and DL Hartl. 2004. “Regulatory evolution across the protein interaction network.” Nat Genet 36: 1059-60. Abstract

Protein-protein interactions may impose constraints on both structural and regulatory evolution. Here we show that protein-protein interactions are negatively associated with evolutionary variation in gene expression. Moreover, interacting proteins have similar levels of variation in expression, and their expression levels are positively correlated across strains. Our results suggest that interacting proteins undergo similar evolutionary dynamics, and that their expression levels are evolutionarily coupled. These patterns hold for organisms as diverse as budding yeast and fruit flies.

Sawyer, SA, RJ Kulathinal, CD Bustamante, and DL Hartl. 2003. “Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection.” J Mol Evol 57 Suppl 1: S154-64. Abstract

One of the principal goals of population genetics is to understand the processes by which genetic variation within species (polymorphism) becomes converted into genetic differences between species (divergence). In this transformation, selective neutrality, near neutrality, and positive selection may each play a role, differing from one gene to the next. Synonymous nucleotide sites are often used as a uniform standard of comparison across genes on the grounds that synonymous sites are subject to relatively weak selective constraints and so may, to a first approximation, be regarded as neutral. Synonymous sites are also interdigitated with nonsynonymous sites and so are affected equally by genomic context and demographic factors. Hence a comparison of levels of polymorphism and divergence between synonymous sites and amino acid replacement sites in a gene is potentially informative about the magnitude of selective forces associated with amino acid replacements. We have analyzed 56 genes in which polymorphism data from D. simulans are compared with divergence from a reference strain of D. melanogaster. The framework of the analysis is Bayesian and assumes that the distribution of selective effects (Malthusian fitnesses) is Gaussian with a mean that differs for each gene. In such a model, the average scaled selection intensity (γ = N_e s) of amino acid replacements eligible to become polymorphic or fixed is -7.31, and the standard deviation of selective effects within each locus is 6.79 (assuming homoscedasticity across loci). For newly arising mutations of this type that occur in autosomal or X-linked genes, the average proportion of beneficial mutations is 19.7%. Among the amino acid polymorphisms in the sample, the expected average proportion of beneficial mutations is 47.7%, and among amino acid replacements that become fixed the average proportion of beneficial mutations is 94.3%. The average scaled selection intensity of fixed mutations is +5.1. 
The presence of positive selection is pervasive with the single exception of kl-5, a Y-linked fertility gene. We find no evidence that a significant fraction of fixed amino acid replacements is neutral or nearly neutral or that positive selection drives amino acid replacements at only a subset of the loci. These results are model dependent and we discuss possible modifications of the model that might allow more neutral and nearly neutral amino acid replacements to be fixed.