Castillo-Davis, CI, and DL Hartl. 2002. “Genome evolution and developmental constraint in Caenorhabditis elegans.” Mol Biol Evol 19: 728-35.

It has been hypothesized that evolutionary changes will be more frequent in later ontogeny than early ontogeny because of developmental constraint. To test this hypothesis, a genomewide examination of molecular evolution through ontogeny was carried out using comparative genomic data in Caenorhabditis elegans and Caenorhabditis briggsae. We found that the mean rate of amino acid replacement is not significantly different between genes expressed during and after embryogenesis. However, synonymous substitution rates differed significantly between these two classes. A genomewide survey of correlation between codon bias and expression level found codon bias to be significantly correlated with mRNA expression (r_s = -0.30, P < 10^-131), but codon bias alone does not explain the differences in dS between classes. Surprisingly, it was found that genes expressed after embryogenesis have a significantly greater number of duplicates in both the C. elegans and C. briggsae genomes (P < 10^-20 and P < 10^-13, respectively) when compared with early-expressed and nonmodulated genes. A similarity in the distribution of duplicates of nonmodulated and early-expressed genes, as well as a disproportionately higher number of early pseudogenes, lend support to the hypothesis that this difference in duplicate number is caused by selection against gene duplicates of early-expressed genes, reflecting developmental constraint. Developmental constraint at the level of gene duplication may have important implications for macroevolutionary change.

We present a new likelihood method for detecting constrained evolution at synonymous sites and other forms of nonneutral evolution in putative pseudogenes. The model is applicable whenever the DNA sequence is available from a protein-coding functional gene, a pseudogene derived from the protein-coding gene, and an orthologous functional copy of the gene. Two nested likelihood ratio tests are developed to test the hypotheses that (1) the putative pseudogene has equal rates of silent and replacement substitutions; and (2) the rate of synonymous substitution in the functional gene equals the rate of substitution in the pseudogene. The method is applied to a data set containing 74 human processed-pseudogene loci, 25 mouse processed-pseudogene loci, and 22 rat processed-pseudogene loci. Using the informatics resources of the Human Genome Project, we localized 67 of the human-pseudogene pairs in the genome and estimated the GC content of a large surrounding genomic region for each. We find that, for pseudogenes deposited in GC regions similar to those of their paralogs, the assumption of equal rates of silent and replacement site evolution in the pseudogene is upheld; in these cases, the rate of silent site evolution in the functional genes is approximately 70% the rate of evolution in the pseudogene. On the other hand, for pseudogenes located in genomic regions of much lower GC than their functional gene, we see a sharp increase in the rate of silent site substitutions, leading to a large rate of rejection for the pseudogene equality likelihood ratio test.
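
The nested likelihood ratio tests described in this abstract follow a general recipe that can be sketched in a few lines. The example below is only an illustration of that generic recipe, not the authors' model: the function name `lrt_p_value_df1` and the log-likelihood values are hypothetical, and it assumes the two hypotheses differ by exactly one free parameter, so that the test statistic is compared with a chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def lrt_p_value_df1(lnL_null, lnL_alt):
    """Generic likelihood ratio test for two nested models.

    Assumption (not from the abstract): the alternative model has exactly
    one more free parameter than the null, so the statistic is referred
    to a chi-square distribution with 1 degree of freedom.
    """
    # Test statistic: twice the gain in maximized log-likelihood.
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    # Survival function of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    stat, p = lrt_p_value_df1(lnL_null=-100.0, lnL_alt=-98.0)
    print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value here would lead to rejecting the constrained (null) hypothesis, for example the hypothesis of equal silent and replacement rates in the pseudogene.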

Hartl, DL, SK Volkman, KM Nielsen, AE Barry, KP Day, DF Wirth, and EA Winzeler. 2002. “The paradoxical population genetics of Plasmodium falciparum.” Trends Parasitol 18: 266-72.

Among the leading causes of death in African children is cerebral malaria caused by the parasitic protozoan Plasmodium falciparum. Endemic forms of this disease are thought to have originated in central Africa 5000-10000 years ago, coincident with the innovation of slash-and-burn agriculture and the diversification of the Anopheles gambiae complex of mosquito vectors. Population genetic studies of P. falciparum have yielded conflicting results. Some evidence suggests that today's population includes multiple ancient lineages pre-dating human speciation. Other evidence suggests that today's population derives from only one, or a small number, of these ancient lineages. Resolution of this issue is important for the evaluation of the long-term efficacy of drug and immunological control strategies.

Grosu, P, JP Townsend, DL Hartl, and D Cavalieri. 2002. “Pathway Processor: a tool for integrating whole-genome expression results into metabolic networks.” Genome Res 12: 1121-6.

We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or more genes in a pathway would be significantly altered in a given experiment by chance alone. This method has been validated on diauxic shift experiments and reproduces well known effects of carbon source on yeast metabolism. The analysis is implemented with Pathway Analyzer, one of the tools of Pathway Processor, a new statistical package for the analysis of whole-genome expression data. Results from multiple experiments can be compared, reducing the analysis from the full set of individual genes to a limited number of pathways of interest. The pathways are visualized with OpenDX, an open-source visualization software package, and the relationship between genes in the pathways can be examined in detail using Expression Mapper, the second program of the package. This program features a graphical output displaying differences in expression on metabolic charts of the biochemical pathways to which the open reading frames are assigned.
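
The pathway score described in this abstract, the probability that as many or more genes in a pathway would be significantly altered by chance alone, corresponds to the upper tail of a hypergeometric distribution (a one-sided Fisher exact test). The sketch below is a minimal illustration under that reading; it is not code from Pathway Processor, and the function name, parameter names, and gene counts are all hypothetical.

```python
from math import comb


def pathway_p_value(total_genes, altered_genes, pathway_size, altered_in_pathway):
    """One-sided Fisher exact (hypergeometric tail) score for one pathway.

    Returns the probability of seeing `altered_in_pathway` or more altered
    genes in a pathway of `pathway_size` genes, given that `altered_genes`
    of the `total_genes` measured are significantly altered overall.
    """
    # Number of ways to choose the pathway's genes from the whole genome.
    denom = comb(total_genes, pathway_size)
    # The pathway cannot contain more altered genes than exist in either set.
    upper = min(pathway_size, altered_genes)
    tail = sum(
        comb(altered_genes, i) * comb(total_genes - altered_genes, pathway_size - i)
        for i in range(altered_in_pathway, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Hypothetical counts: 6000 genes measured, 300 altered overall,
    # and a 40-gene pathway containing 8 altered genes.
    print(pathway_p_value(6000, 300, 40, 8))
```

Ranking pathways by this value, smallest first, gives the kind of pathway-level summary the abstract describes, in place of a gene-by-gene list.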

Blumenstiel, JP, DL Hartl, and ER Lozovsky. 2002. “Patterns of insertion and deletion in contrasting chromatin domains.” Mol Biol Evol 19: 2211-25.

Transposable elements (TEs) play a fundamental role in the evolution of genomes. In Drosophila they are disproportionately represented in regions of low recombination, such as in heterochromatin. This pattern has been attributed to selection against repeated elements in regions of normal recombination, owing to either (1) the slightly deleterious position effects of TE insertions near or into genes, or (2) strong selection against chromosomal abnormalities arising from ectopic exchange between TE repeats. We have used defective non-long-terminal repeat (LTR) TEs that are "dead-on-arrival" (DOA) and unable to transpose in order to estimate spontaneous deletion rates in different constituents of chromatin. These elements have previously provided evidence for an extremely high rate of spontaneous deletion in Drosophila as compared with mammals, potentially explaining at least part of the differences in the genome sizes in these organisms. However, rates of deletion could be overestimated due to positive selection for a smaller likelihood of ectopic exchange. In this article, we show that rates of spontaneous deletion in DOA repeats are as high in heterochromatin and regions of euchromatin with low recombination as they are in regions of euchromatin with normal recombination. We have also examined the age distribution of five non-LTR families throughout the genome. We show that there is substantial variation in the historical pattern of transposition of these TEs. The overrepresentation of TEs in the heterochromatin is primarily due to their longer retention time in heterochromatin, as evidenced by the average time since insertion. Fragments inserted recently are much more evenly distributed in the genome. This contrast demonstrates that the accumulation of TEs in heterochromatin and in euchromatic regions of low recombination is due not to biased transposition but to greater probabilities of fixation in these regions relative to regions of normal recombination.

Castillo-Davis, CI, SL Mekhedov, DL Hartl, EV Koonin, and FA Kondrashov. 2002. “Selection for short introns in highly expressed genes.” Nat Genet 31: 415-8.

Transcription is a slow and expensive process: in eukaryotes, approximately 20 nucleotides can be transcribed per second at the expense of at least two ATP molecules per nucleotide. Thus, at least for highly expressed genes, transcription of long introns, which are particularly common in mammals, is costly. Using data on the expression of genes that encode proteins in Caenorhabditis elegans and Homo sapiens, we show that introns in highly expressed genes are substantially shorter than those in genes that are expressed at low levels. This difference is greater in humans, such that introns are, on average, 14 times shorter in highly expressed genes than in genes with low expression, whereas in C. elegans the difference in intron length is only twofold. In contrast, the density of introns in a gene does not strongly depend on the level of gene expression. Thus, natural selection appears to favor short introns in highly expressed genes to minimize the cost of transcription and other molecular processes, such as splicing.
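
The cost argument in this abstract can be made concrete with simple arithmetic based on the two figures it quotes: approximately 20 nucleotides transcribed per second and at least two ATP molecules per nucleotide. The sketch below is illustrative only; the function name and the intron lengths are hypothetical, and because the ATP figure is a lower bound, the result is a minimum cost.

```python
def transcription_cost(length_nt, rate_nt_per_s=20.0, atp_per_nt=2):
    """Return (seconds, minimum ATP molecules) to transcribe `length_nt` nucleotides.

    Defaults are the approximate figures quoted in the abstract; the ATP
    value is a lower bound, so the second element is a minimum cost.
    """
    seconds = length_nt / rate_nt_per_s
    min_atp = length_nt * atp_per_nt
    return seconds, min_atp


if __name__ == "__main__":
    # Hypothetical intron lengths, chosen only to show the scaling.
    for length in (500, 10_000):
        seconds, atp = transcription_cost(length)
        print(f"{length} nt intron: {seconds:.0f} s and at least {atp} ATP per transcript")
```

Both time and ATP scale linearly with intron length and are paid once per transcript, which is why the cost matters most for genes that are transcribed many times.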

Lozovsky, ER, D Nurminsky, EA Wimmer, and DL Hartl. 2002. “Unexpected stability of mariner transgenes in Drosophila.” Genetics 160: 527-35.

A number of mariner transformation vectors based on the mauritiana subfamily of transposable elements were introduced into the genome of Drosophila melanogaster and examined for their ability to be mobilized by the mariner transposase. Simple insertion vectors were constructed from single mariner elements into which exogenous DNA ranging in size from 1.3 to 4.5 kb had been inserted; composite vectors were constructed with partial or complete duplications of mariner flanking the exogenous DNA. All of the simple insertion vectors showed levels of somatic and germline excision that were at least 100-fold lower than the baseline level of uninterrupted mariner elements. Although composite vectors with inverted duplications were unable to be mobilized at detectable frequencies, vectors with large direct duplications of mariner could be mobilized. A vector consisting of two virtually complete elements flanking exogenous DNA yielded a frequency of somatic eye-color mosaicism of approximately 10% and a frequency of germline excision of 0.04%. These values are far smaller than those observed for uninterrupted elements. The results imply that efficient mobilization of mariner in vivo requires the presence and proper spacing of sequences internal to the element as well as the inverted repeats.

Nurminsky, D, DD Aguiar, CD Bustamante, and DL Hartl. 2001. “Chromosomal effects of rapid gene evolution in Drosophila melanogaster.” Science 291: 128-30.

Rapid adaptive fixation of a new favorable mutation is expected to affect neighboring genes along the chromosome. Evolutionary theory predicts that the chromosomal region would show a reduced level of genetic variation and an excess of rare alleles. We have confirmed these predictions in a region of the X chromosome of Drosophila melanogaster that contains a newly evolved gene for a component of the sperm axoneme. In D. simulans, where the novel gene does not exist, the pattern of genetic variation is consistent with selection against recurrent deleterious mutations. These findings imply that the pattern of genetic variation along a chromosome may be useful for inferring its evolutionary history and for revealing regions in which recent adaptive fixations have taken place.

Bustamante, CD, J Wakeley, S Sawyer, and DL Hartl. 2001. “Directional selection and the site-frequency spectrum.” Genetics 159: 1779-88.

In this article we explore statistical properties of the maximum-likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson random field population genetics model of directional selection at DNA sites. We derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRT) of neutrality for varying levels of mutation and selection as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. We also discuss the coverage of confidence intervals on the basis of two standard-likelihood methods. We find that the LRT has high power to detect deviations from neutrality and that the maximum-likelihood estimation performs very well when the ancestral states of all mutations in the sample are known. When the ancestral states are not known, the test has high power to detect deviations from neutrality for negative selection but not for positive selection. We also find that the LRT is not robust to deviations from the assumption of independence among sites.

Bensasson, D, DA Petrov, DX Zhang, DL Hartl, and GM Hewitt. 2001. “Genomic gigantism: DNA loss is slow in mountain grasshoppers.” Mol Biol Evol 18: 246-53.

Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.

Bensasson, D, D Zhang, DL Hartl, and GM Hewitt. 2001. “Mitochondrial pseudogenes: evolution's misplaced witnesses.” Trends Ecol Evol 16: 314-321.

Nuclear copies of mitochondrial DNA (mtDNA) have contaminated PCR-based mitochondrial studies of over 64 different animal species. Since the last review of these nuclear mitochondrial pseudogenes (Numts) in animals, Numts have been found in 53 of the species studied. The recent evidence suggests that Numts are not equally abundant in all species; for example, they are more common in plants than in animals, and more numerous in humans than in Drosophila. Methods for avoiding Numts have now been tested, and several recent studies demonstrate the potential utility of Numt DNA sequences in evolutionary studies. As relics of ancient mtDNA, these pseudogenes can be used to infer ancestral states or root mitochondrial phylogenies. Where they are numerous and selectively unconstrained, Numts are ideal for the study of spontaneous mutation in nuclear genomes.

Parsch, J, CD Meiklejohn, E Hauschteck-Jungen, P Hunziker, and DL Hartl. 2001. “Molecular evolution of the ocnus and janus genes in the Drosophila melanogaster species subgroup.” Mol Biol Evol 18: 801-11.

Genes involved in male fertility are potential targets for sexual selection, and their evolution may play a role in reproductive isolation and speciation. Here we describe a new Drosophila melanogaster gene, ocnus (ocn), that encodes a protein abundant in testes nuclear extracts. RT-PCR indicates that ocn transcription is limited to males and is specific to testes. ocn shares homology with another testis-specific gene, janusB (janB), and is located just distal to janB on chromosome 3. The two genes also share homology with the adjacent janusA (janA) gene, suggesting that multiple duplication events have occurred within this region of the genome. We cloned and sequenced these three genes from species of the D. melanogaster species subgroup. Phylogenetic analysis based on protein-encoding sequences predicts a duplication pattern of janA --> janA janB --> janA janB ocn, with the latter event occurring after the divergence of the D. melanogaster and Drosophila obscura species groups. We found significant heterogeneity in the rates of evolution among the three genes within the D. melanogaster species subgroup as measured by the ratio of nonsynonymous to synonymous substitutions, suggesting that diversification of gene function followed each duplication event and that each gene evolved under different selective constraints. All three genes showed faster rates of evolution than genes encoding proteins with metabolic function. These results are consistent with previous studies that have detected an increased rate of evolution in genes with reproductive function.

Levels of nucleotide polymorphism in three paralogous Drosophila simulans genes, janusA (janA), janusB (janB), and ocnus (ocn), were surveyed by DNA sequencing. The three genes lie in tandem within a 2.5-kb region of chromosome arm 3R. In a sample of eight alleles from a worldwide distribution we found a significant departure from neutrality by several statistical tests. The most striking feature of this sample was that in a 1.7-kb region containing the janA and janB genes, 30 out of 31 segregating sites contained variants present only once in the sample, and 29 of these unique variants were found in the same allele. A restriction survey of an additional 28 lines of D. simulans revealed strong linkage disequilibrium over the janA-janB region and identified six more alleles matching the rare haplotype. Among the rare alleles, the level of DNA sequence variation was typical for D. simulans autosomal genes and showed no departure from neutrality. In addition, the rare haplotype was more similar to the D. melanogaster sequence, indicating that it was the ancestral form. These results suggest that the derived haplotype has risen to high worldwide frequency relatively recently, most likely as a result of natural selection.

Volkman, SK, AE Barry, EJ Lyons, KM Nielsen, SM Thomas, M Choi, SS Thakore, KP Day, DF Wirth, and DL Hartl. 2001. “Recent origin of Plasmodium falciparum from a single progenitor.” Science 293: 482-4.

Genetic variability of Plasmodium falciparum underlies its transmission success and thwarts efforts to control disease caused by this parasite. Genetic variation in antigenic, drug resistance, and pathogenesis determinants is abundant, consistent with an ancient origin of P. falciparum, whereas DNA variation at silent (synonymous) sites in coding sequences appears virtually absent, consistent with a recent origin of the parasite. To resolve this paradox, we analyzed introns and demonstrated that these are deficient in single-nucleotide polymorphisms, as are synonymous sites in coding regions. These data establish the recent origin of P. falciparum and further provide an explanation for the abundant diversity observed in antigen and other selected genes.

Tao, Y, DL Hartl, and CC Laurie. 2001. “Sex-ratio segregation distortion associated with reproductive isolation in Drosophila.” Proc Natl Acad Sci U S A 98: 13183-8.

Sex-ratio distortion is the most common form of non-Mendelian segregation observed in natural populations. It may occur even more frequently than direct observations suggest, because the dysgenic population consequences of a biased sex ratio are expected to result in the rapid evolution of suppressors, resulting in suppressed or "cryptic" segregation distortion. Here we report evidence for cryptic sex-ratio distortion that was discovered by introgressing segments of the genome of Drosophila mauritiana into the genome of Drosophila simulans. The autosomal suppressor of sex-ratio distortion, which is also associated with a reduction in hybrid male fertility, has been genetically localized to a region smaller than 80 kb in chromosome 3.

Siegal, ML, and DL Hartl. 2000. “Application of Cre/loxP in Drosophila. Site-specific recombination and transgene coplacement.” Methods Mol Biol 136: 487-95.

In vivo levels of enzymatic activity may be increased through either structural or regulatory changes. Here we use Drosophila melanogaster alcohol dehydrogenase (ADH) in an experimental test for selective differences between these two mechanisms. The well-known ADH-Slow (S)/Fast (F) amino acid replacement leads to a twofold increase in activity by increasing the catalytic efficiency of the enzyme. Disruption of a highly conserved, negative regulatory element in the Adh 3' UTR also leads to a twofold increase in activity, although this is achieved by increasing in vivo Adh mRNA and protein concentrations. These two changes appear to be under different types of selection, with positive selection favoring the amino acid replacement and purifying selection maintaining the 3' UTR sequence. Using transgenic experiments we show that deletion of the conserved 3' UTR element increases adult and larval Adh expression in both the ADH-F and ADH-S genetic backgrounds. However, the 3' UTR deletion also leads to a significant increase in developmental time in both backgrounds. ADH allozyme type has no detectable effect on development. These results demonstrate a negative fitness effect associated with Adh overexpression. This provides a mechanism whereby natural selection can discriminate between alternative pathways of increasing enzymatic activity.

Petrov, DA, TA Sangster, JS Johnston, DL Hartl, and KL Shaw. 2000. “Evidence for DNA loss as a determinant of genome size.” Science 287: 1060-2.

Eukaryotic genome sizes range over five orders of magnitude. This variation cannot be explained by differences in organismic complexity (the C value paradox). To test the hypothesis that some variation in genome size can be attributed to differences in the patterns of insertion and deletion (indel) mutations among organisms, this study examines the indel spectrum in Laupala crickets, which have a genome size 11 times larger than that of Drosophila. Consistent with the hypothesis, DNA loss is more than 40 times slower in Laupala than in Drosophila.

Hartl, DL. 2000. “Fly meets shotgun: shotgun wins.” Nat Genet 24: 327-8.

Leung, JY, FE McKenzie, AM Uglialoro, PO Flores-Villanueva, BC Sorkin, EJ Yunis, DL Hartl, and AE Goldfeld. 2000. “Identification of phylogenetic footprints in primate tumor necrosis factor-alpha promoters.” Proc Natl Acad Sci U S A 97: 6614-8.

The human tumor necrosis factor-alpha (TNF-alpha) gene encodes a pleiotropic cytokine that plays a critical role in basic immunologic processes. To investigate the TNF-alpha regulatory region in the primate lineage, we isolated TNF-alpha promoters from representative great apes, Old World monkeys, and New World monkeys. We demonstrate that there is a nonuniform distribution of fixed human differences in the TNF-alpha promoter. We define a "fixed human difference" as a site that is not polymorphic in humans, but which differs in at least one of the seven primate sequences examined. Furthermore, we identify two human TNF-alpha promoter single nucleotide polymorphisms that are putative ancestral polymorphisms, because each of the human polymorphic nucleotides was found at the identical site in at least one of the other primate sequences. Strikingly, the largest conserved region among the primate species, a 69-nt "phylogenetic footprint," corresponds to a region of the human TNF-alpha promoter that forms the transcriptionally active nucleoprotein-DNA complex, essential for gene regulation. By contrast, other regions of the TNF-alpha promoter, which exhibit a high density of variable sites, are nonessential for gene expression, indicating that distinct TNF-alpha promoter regions have been subjected to different evolutionary constraints depending on their function. TNF-alpha is the first case in which a promoter region dissected by functional analyses can be correlated with nucleotide polymorphism and variability in primate lineages. The results suggest that patterns of polymorphism and divergence are likely to be useful in identifying candidate regions important for gene regulation in other immune-response genes.