Publications

2003
Castillo-Davis, CI, and DL Hartl. 2003. “Conservation, relocation and duplication in genome evolution.” Trends Genet 19: 593-7.
Barry, AE, A Leliwa, M Choi, KM Nielsen, DL Hartl, and KP Day. 2003. “DNA sequence artifacts and the estimation of time to the most recent common ancestor (TMRCA) of Plasmodium falciparum.” Mol Biochem Parasitol 130: 143-7.
Cavalieri, D, PE McGovern, DL Hartl, R Mortimer, and M Polsinelli. 2003. “Evidence for S. cerevisiae fermentation in ancient wine.” J Mol Evol 57 Suppl 1: S226-32.

Saccharomyces cerevisiae is the principal yeast used in modern fermentation processes, including winemaking, breadmaking, and brewing. From residue present inside one of the earliest known wine jars from Egypt, we have extracted, amplified, and sequenced ribosomal DNA from S. cerevisiae. These results indicate that this organism was probably responsible for wine fermentation by at least 3150 B.C. This inference has major implications for the evolution of bread and beer yeasts, since it suggests that S. cerevisiae yeast, which occurs naturally on the surface bloom of grapes, was also used as an inoculum to ferment cereal products.

Nielsen, KM, J Kasper, M Choi, T Bedford, K Kristiansen, DF Wirth, SK Volkman, ER Lozovsky, and DL Hartl. 2003. “Gene conversion as a source of nucleotide diversity in Plasmodium falciparum.” Mol Biol Evol 20: 726-34.

Examination of polymorphisms in the Plasmodium falciparum gene for falcipain 2 revealed that this gene is one of two paralogs separated by 10.8 kb in chromosome 11. We designate the annotated gene denoted chr11.gen_424 as encoding falcipain 2A and the annotated gene denoted chr11.gen_427 as encoding falcipain 2B. The paralogs are 96% identical at the nucleotide level and 93% identical at the amino acid level. The consensus sequences differ in 31/309 synonymous sites and 45/1140 nonsynonymous sites, including three amino acid replacements (V393I, A400P, and Q414E) that are near the catalytic site and that may affect substrate affinity or specificity. In six reference isolates, among 36 synonymous sites and 46 nonsynonymous sites that are polymorphic in the gene for falcipain 2A, falcipain 2B, or both, significant spatial clustering is observed. All but one of the polymorphisms appear to result from gene conversion between the paralogs. The estimated rate of gene conversion between the paralogs may be as many as 1,400 to 1,700 times greater than the rate of mutation. Owing to gene conversion, one of the falcipain 2A alleles is more similar to the falcipain 2B alleles than it is to other falcipain 2A alleles. Divergence among the synonymous sites suggests that the paralogous genes last shared a common ancestor 15.2 MYA, with a range of 8.8 to 20.6 MYA. During this period, the paralogs have acquired 0.10 synonymous substitutions per synonymous site in the coding region. The 5' and 3' flanking regions differ in 47.7% and 39.8% of the nucleotide sites, respectively. Hence synonymous sites and flanking regions are not conserved in sequence in spite of their high AT content and T skew.

Castillo-Davis, CI, and DL Hartl. 2003. “GeneMerge--post-genomic analysis, data mining, and hypothesis testing.” Bioinformatics 19: 891-2.

SUMMARY: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set. Functional or categorical data of all kinds can be analyzed with GeneMerge, facilitating regulatory and metabolic pathway analysis, tests of population genetic hypotheses, cross-experiment comparisons, and tests of chromosomal clustering, among others. GeneMerge can perform analyses on a wide variety of genomic data quickly and easily and facilitates both data mining and hypothesis testing. AVAILABILITY: GeneMerge is available free of charge for academic use over the web and for download from: http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html.
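
As an illustration of the kind of over-representation score described in this abstract, the following is a minimal sketch in Python. It assumes a hypergeometric upper-tail probability with a Bonferroni correction; the abstract itself does not name the statistic, and the function names and example numbers here are hypothetical, not taken from GeneMerge.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a study set.

    population: number of genes in the background set
    annotated:  background genes carrying the category of interest
    sample:     number of genes in the study set
    hits:       study genes carrying the category
    Assumes the study set is drawn without replacement from the background.
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Conservative correction for testing many categories at once."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Hypothetical numbers: 6000 background genes, 40 in the category,
    # 100 study genes of which 5 fall in the category, 200 categories tested.
    p = hypergeom_upper_tail(6000, 40, 100, 5)
    print(p, bonferroni(p, 200))
```

A small corrected score suggests the category appears in the study set more often than chance alone would predict under these assumptions.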

The genetic basis of Haldane's rule was investigated through estimating the accumulation of hybrid incompatibilities between Drosophila simulans and D. mauritiana by means of introgression. The accumulation of hybrid male sterility (HMS) is at least 10 times greater than that of hybrid female sterility (HFS) or hybrid lethality (HL). The degree of dominance for HMS and HL in a pure D. simulans background is estimated as 0.23-0.29 and 0.33-0.39, respectively; that for HL in an F1 background is unlikely to be very small. Evidence obtained here was used to test the Turelli-Orr model of Haldane's rule. Composite causes, especially, faster-male evolution and recessive hybrid incompatibilities, underlie Haldane's rule in heterogametic male taxa such as Drosophila (XY male and XX female). However, if faster-male evolution is driven by sexual selection, it contradicts Haldane's rule for sterility in heterogametic-female taxa such as Lepidoptera (ZW female and ZZ male). The hypothesis of a faster-heterogametic-sex evolution seems to fit the current data best. This hypothesis states that gametogenesis in the heterogametic sex, instead of in males per se, evolves much faster than in the homogametic sex, in part because of sex-ratio selection. This hypothesis not only explains Haldane's rule in a simple way, but also suggests that genomic conflicts play a major role in evolution and speciation.

The genetic basis of hybrid incompatibility in crosses between Drosophila mauritiana and D. simulans was investigated to gain insight into the evolutionary mechanisms of speciation. In this study, segments of the D. mauritiana third chromosome were introgressed into a D. simulans genetic background and tested as homozygotes for viability, male fertility, and female fertility. The entire third chromosome was covered with partially overlapping segments. Many segments were male sterile, while none were female sterile or lethal, confirming previous reports of the rapid evolution of hybrid male sterility (HMS). A statistical model was developed to quantify the HMS accumulation. In comparison with previous work on the X chromosome, we estimate that the X has approximately 2.5 times the density of HMS factors as the autosomes. We also estimate that the whole genome contains approximately 15 HMS "equivalents," i.e., 15 times the minimum number of incompatibility factors necessary to cause complete sterility. Although some caveats for the quantitative estimate of a 2.5-fold density difference are described, this study supports the notion that the X chromosome plays a special role in the evolution of reproductive isolation. Possible mechanisms of a "large X" effect include selective fixation of new mutations that are recessive or partially recessive and the evolution of sex-ratio distortion systems.

Hybrid male sterility (HMS) is a rapidly evolving mechanism of reproductive isolation in Drosophila. Here we report a genetic analysis of HMS in third-chromosome segments of Drosophila mauritiana that were introgressed into a D. simulans background. Qualitative genetic mapping was used to localize 10 loci on 3R and a quantitative trait locus (QTL) procedure (multiple-interval mapping) was used to identify 19 loci on the entire chromosome. These genetic incompatibilities often show dominance and complex patterns of epistasis. Most of the HMS loci have relatively small effects and generally at least two or three of them are required to produce complete sterility. Only one small region of the third chromosome of D. mauritiana by itself causes a high level of infertility when introgressed into D. simulans. By comparison with previous studies of the X chromosome, we infer that HMS loci are only approximately 40% as dense on this autosome as they are on the X chromosome. These results are consistent with the gradual evolution of hybrid incompatibilities as a by-product of genetic divergence in allopatric populations.

Winzeler, EA, CI Castillo-Davis, G Oshiro, D Liang, DR Richards, Y Zhou, and DL Hartl. 2003. “Genetic diversity in yeast assessed with whole-genome oligonucleotide arrays.” Genetics 163: 79-89.

The availability of a complete genome sequence allows the detailed study of intraspecies variability. Here we use high-density oligonucleotide arrays to discover 11,115 single-feature polymorphisms (SFPs) existing in one or more of 14 different yeast strains. We use these SFPs to define regions of genetic identity between common laboratory strains of yeast. We assess the genome-wide distribution of genetic variation on the basis of this yeast population. We find that genome variability is biased toward the ends of chromosomes and is more likely to be found in genes with roles in fermentation or in transport. This subtelomeric bias may arise through recombination between nonhomologous sequences because full-gene deletions are more common in these regions than in more central regions of the chromosome.

Townsend, JP, KM Nielsen, DS Fisher, and DL Hartl. 2003. “Horizontal acquisition of divergent chromosomal DNA in bacteria: effects of mutator phenotypes.” Genetics 164: 13-21.

We examine the potential beneficial effects of the expanded access to environmental DNA offered by mutators on the adaptive potential of bacterial populations. Using parameters from published studies of recombination in E. coli, we find that the presence of mutators has the potential to greatly enhance bacterial population adaptation when compared to populations without mutators. In one specific example, for which three specific amino acid substitutions are required for adaptation to occur in a 300-amino-acid protein, we found a 3500-fold increase in the rate of adaptation. The probability of a beneficial acquisition decreased if more amino acid changes, or integration of longer DNA fragments, were required for adaptation. The model also predicts that mutators are more likely than nonmutator phenotypes to acquire genetic variability from a more diverged set of donor bacteria. Bacterial populations harboring mutators in a sequence heterogeneous environment are predicted to acquire most of their DNA conferring adaptation in the range of 13-30% divergence, whereas nonmutator phenotypes become adapted after recombining with more homogeneous sequences of 7-21% divergence. We conclude that mutators can accelerate bacterial adaptation when desired genetic variability is present within DNA fragments of up to approximately 30% divergence.

Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte-Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for the analysis of how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

Ranz, JM, AR Ponce, DL Hartl, and D Nurminsky. 2003. “Origin and evolution of a new gene expressed in the Drosophila sperm axoneme.” Genetica 118: 233-44.

Sdic is a new gene that evolved recently in the lineage of Drosophila melanogaster. It was formed from a duplication and fusion of the gene AnnX, which encodes annexin X, and Cdic, which encodes the intermediate polypeptide chain of the cytoplasmic dynein. The fusion joins AnnX exon 4 with Cdic intron 3, which brings together three putative promoter elements for testes-specific expression of Sdic: the distal conserved element (DCE) and testes-specific element (TSE) are derived from AnnX, and the proximal conserved element (PCE) from Cdic intron 3. Sdic transcription initiates within the PCE, and translation is initiated within the sequence derived from Cdic intron 3, continuing through a 10 base pair insertion that creates a new splice donor site that enables the new coding sequence derived from intron 3 to be joined with the coding sequence of Cdic exon 4. A novel protein is created lacking 100 residues at the amino end that contain sequence motifs essential for the function of cytoplasmic dynein intermediate chains. Instead, the amino end is a hydrophobic region of 16 residues that resembles the amino end of axonemal dynein intermediate chains from other organisms. The downstream portion of Sdic features large deletions eliminating Cdic exons v2 and v3, as well as multiple frameshift deletions or insertions. The new protein becomes incorporated into the tail of the mature sperm and may function as an axonemal dynein intermediate chain. The new Sdic gene is present in about 10 tandem repeats between the wildtype Cdic and AnnX genes located near the base of the X chromosome. The implications of these findings are discussed relative to the origin of new gene functions and the process of speciation.

Volkman, SK, and DL Hartl. 2003. “Parasitology. A game of cat and mouth.” Science 299: 353-4.
Townsend, JP, D Cavalieri, and DL Hartl. 2003. “Population genetic variation in genome-wide gene expression.” Mol Biol Evol 20: 955-63.

Evolutionary biologists seek to understand which traits display variation, are heritable, and influence differential reproduction, because such traits respond to natural selection and underlie organic evolution. Selection acts upon individual differences within a population. Whether individual differences within a natural population include variation in gene expression levels has not yet been addressed on a genome-wide scale. Here we use DNA microarray technology for measuring comparative gene expression and a refined statistical analysis for the purpose of comparing gene expression levels in natural isolates of the wine yeast Saccharomyces cerevisiae. A method for the Bayesian analysis of gene expression levels is used to compare four natural isolates of S. cerevisiae from Montalcino, Italy. Widespread variation in amino acid metabolism, sulfur assimilation and processing, and protein degradation (primarily consisting of differences in expression level smaller than a factor of 2) is demonstrated. Genetic variation in gene expression among isolates from a natural population is present on a genomic scale. It remains to be determined what role differential gene expression may play in adaptation to new or changing environments.

Meiklejohn, CD, J Parsch, JM Ranz, and DL Hartl. 2003. “Rapid evolution of male-biased gene expression in Drosophila.” Proc Natl Acad Sci U S A 100: 9894-9.

A number of genes associated with sexual traits and reproduction evolve at the sequence level faster than the majority of genes coding for non-sex-related traits. Whole genome analyses allow this observation to be extended beyond the limited set of genes that have been studied thus far. We use cDNA microarrays to demonstrate that this pattern holds in Drosophila for the phenotype of gene expression as well, but in one sex only. Genes that are male-biased in their expression show more variation in relative expression levels between conspecific populations and two closely related species than do female-biased genes or genes with sexually monomorphic expression patterns. Additionally, elevated ratios of interspecific expression divergence to intraspecific expression variation among male-biased genes suggest that differences in rates of evolution may be due in part to natural selection. This finding has implications for our understanding of the importance of sexual dimorphism for speciation and rates of phenotypic evolution.

Ranz, JM, CI Castillo-Davis, CD Meiklejohn, and DL Hartl. 2003. “Sex-dependent gene expression and evolution of the Drosophila transcriptome.” Science 300: 1742-5.

Comparison of the gene-expression profiles between adults of Drosophila melanogaster and Drosophila simulans has uncovered the evolution of genes that exhibit sex-dependent regulation. Approximately half the genes showed differences in expression between the species, and among these, approximately 83% involved a gain, loss, increase, decrease, or reversal of sex-biased expression. Most of the interspecific differences in messenger RNA abundance affect male-biased genes. Genes that differ in expression between the species showed functional clustering only if they were sex-biased. Our results suggest that sex-dependent selection may drive changes in expression of many of the most rapidly evolving genes in the Drosophila transcriptome.

2002

BACKGROUND: Methods of microarray analysis that suit experimentalists using the technology are vital. Many methodologies discard the quantitative results inherent in cDNA microarray comparisons or cannot be flexibly applied to multifactorial experimental design. Here we present a flexible, quantitative Bayesian framework. This framework can be used to analyze normalized microarray data acquired by any replicated experimental design in which any number of treatments, genotypes, or developmental states are studied using a continuous chain of comparisons. RESULTS: We apply this method to Saccharomyces cerevisiae microarray datasets on the transcriptional response to ethanol shock, to SNF2 and SWI1 deletion in rich and minimal media, and to wild-type and zap1 expression in media with high, medium, and low levels of zinc. The method is highly robust to missing data, and yields estimates of the magnitude of expression differences and experimental error variances on a per-gene basis. It reveals genes of interest that are differentially expressed at below the twofold level, genes with high 'fold-change' that are not statistically significantly different, and genes differentially regulated in quantitatively unanticipated ways. CONCLUSIONS: Anyone with replicated normalized cDNA microarray ratio datasets can use the freely available MacOS and Windows software, which yields increased biological insight by taking advantage of replication to discern important changes in expression level both above and below a twofold threshold. Not only does the method have utility at the moment, but also, within the Bayesian framework, there will be considerable opportunity for future development.

Bustamante, CD, R Nielsen, SA Sawyer, KM Olsen, MD Purugganan, and DL Hartl. 2002. “The cost of inbreeding in Arabidopsis.” Nature 416: 531-4.

Population geneticists have long sought to estimate the distribution of selection intensities among genes of diverse function across the genome. Only recently have DNA sequencing and analytical techniques converged to make this possible. Important advances have come from comparing genetic variation within species (polymorphism) with fixed differences between species (divergence). These approaches have been used to examine individual genes for evidence of selection. Here we use the fact that the time since species divergence allows combination of data across genes. In a comparison of amino-acid replacements among species of the mustard weed Arabidopsis with those among species of the fruitfly Drosophila, we find evidence for predominantly beneficial gene substitutions in Drosophila but predominantly detrimental substitutions in Arabidopsis. We attribute this difference to the Arabidopsis mating system of partial self-fertilization, which corroborates a prediction of population genetics theory that species with a high frequency of inbreeding are less efficient in eliminating deleterious mutations owing to their reduced effective population size.

Lohe, AR, and DL Hartl. 2002. “Efficient mobilization of mariner in vivo requires multiple internal sequences.” Genetics 160: 519-26.

Aberrant products of mariner excision that have an impaired ability to be mobilized often include internal deletions that do not encroach on either of the inverted repeats. Analysis of 13 such deletions, as well as 7 additional internal deletions obtained by various methods, has revealed at least three internal regions whose integrity is necessary for efficient mariner mobilization. Within the 1286-bp element, the essential regions are contained in the intervals bounded by coordinates 229-586, 735-765, and 939-1066, numbering in base pairs from the extreme 5' end of the element. These regions may contain sequences that are necessary for transposase binding or that are needed to maintain proper spacing between binding sites. The isolation of excision-defective elements with point mutations at nucleotide positions 993 and 161/179 supports the hypothesis of sequence requirements, but the reduced mobility of transformation vectors with insertions into the SacI site at position 790 supports the hypothesis of spacing requirements. The finding of multiple internal regions that are essential for efficient mariner mobilization in vivo contrasts with reports that mini-elements with as little as 43 bp of DNA between the inverted repeats can transpose efficiently in vitro.

Volkman, SK, DL Hartl, DF Wirth, KM Nielsen, M Choi, S Batalov, Y Zhou, et al. 2002. “Excess polymorphisms in genes for membrane proteins in Plasmodium falciparum.” Science 298: 216-8.

The detection of single-nucleotide polymorphisms in pathogenic microorganisms has normally been carried out by trial and error. Here we show that DNA hybridization with high-density oligonucleotide arrays provides rapid and convenient detection of single-nucleotide polymorphisms in Plasmodium falciparum, despite its exceptionally high adenine-thymine (AT) content (82%). A disproportionate number of polymorphisms are found in genes encoding proteins associated with the cell membrane. These genes are targets for only 22% of the oligonucleotide probes but account for 69% of the polymorphisms. Genetic variation is also enriched in subtelomeric regions, which account for 22% of the chromosome but 76% of the polymorphisms.
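
The "disproportionate" shares quoted in this abstract can be read as fold enrichments. The sketch below only divides the percentages reported above; the helper name is hypothetical and the calculation is a reading aid, not the statistical analysis used in the paper.

```python
def fold_enrichment(share_of_polymorphisms, share_of_background):
    """Ratio of the observed share to the share expected from coverage alone."""
    return share_of_polymorphisms / share_of_background


# Membrane-associated genes: 22% of probes, 69% of polymorphisms.
membrane = fold_enrichment(0.69, 0.22)

# Subtelomeric regions: 22% of the chromosome, 76% of polymorphisms.
subtelomeric = fold_enrichment(0.76, 0.22)

if __name__ == "__main__":
    print(f"membrane genes: {membrane:.2f}-fold")
    print(f"subtelomeric regions: {subtelomeric:.2f}-fold")
```

Both ratios come out a little above threefold, which is the sense in which the polymorphisms are concentrated in these classes.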
