Townsend, JP, and DL Hartl. 2000. “The kinetics of transposable element autoregulation.” Genetica 108: 229-37.

Kinetic modeling of the self-regulatory mechanisms of transposable elements (TEs) involving interactions of one or a few gene products makes predictions that are often at odds with observed results. In particular, explanations of TE autorepression at high copy number that invoke a decrease in number of active monomers through dimerization, amyloidization, and protein-mRNA binding to create an inactive state are not supported by analysis of the corresponding kinetic models. This is also true for similar mRNA-mRNA binding models. Self-repression in mariner as well as other TEs can, however, be explained by a host-independent model in which inactive dimers compete with monomers for TE binding sites at the ends of the element. This model would also allow heterodimer poisoning to down-regulate transposition in the presence of divergent nonautonomous elements, since nondivergent monomers would be required at both TE ends for transposition.
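The dimerization point in the abstract above can be illustrated with a minimal equilibrium sketch. This is not the kinetic model from the paper. It assumes a single reversible step in which two active monomers form one inactive dimer, and the names `free_monomer`, `total` and `k_d` (a hypothetical dissociation constant) are introduced only for illustration. Under these assumptions the free monomer concentration keeps rising as total protein rises, so sequestration into dimers alone does not reduce the number of active monomers at high copy number.

```python
import math


def free_monomer(total, k_d):
    """Free monomer concentration M at monomer-dimer equilibrium.

    Assumed mass balance: total = M + 2*D, with dimer D = M**2 / k_d.
    This gives 2*M**2/k_d + M - total = 0; the positive root is returned.
    k_d is a hypothetical dissociation constant, not a value from the paper.
    """
    return k_d * (math.sqrt(1.0 + 8.0 * total / k_d) - 1.0) / 4.0


if __name__ == "__main__":
    k_d = 1.0  # arbitrary units, chosen only for illustration
    for total in (0.1, 1.0, 10.0, 100.0):
        m = free_monomer(total, k_d)
        # The monomer level grows more slowly than total protein, but never falls.
        print(f"total={total:>6}  free monomer={m:.4f}")
```

In this sketch the fraction of protein present as monomer falls with increasing total, but the absolute monomer level is monotonically increasing, which is the kind of result that makes simple sequestration an unsatisfying explanation for autorepression.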

Cavalieri, D, JP Townsend, and DL Hartl. 2000. “Manifold anomalies in gene expression in a vineyard isolate of Saccharomyces cerevisiae revealed by DNA microarray analysis.” Proc Natl Acad Sci U S A 97: 12369-74.

Genome-wide transcriptional profiling has important applications in evolutionary biology for assaying the extent of heterozygosity for alleles showing quantitative variation in gene expression in natural populations. We have used DNA microarray analysis to study the global pattern of transcription in a homothallic strain of Saccharomyces cerevisiae isolated from wine grapes in a Tuscan vineyard, along with the diploid progeny obtained after sporulation. The parental strain shows 2:2 segregation (heterozygosity) for three unlinked loci. One determines resistance to trifluoroleucine; another, resistance to copper sulfate; and the third is associated with a morphological phenotype observed as colonies with a ridged surface resembling a filigree. Global expression analysis of the progeny with the filigreed and smooth colony phenotypes revealed a greater than 2-fold difference in transcription for 378 genes (6% of the genome). A large number of the overexpressed genes function in pathways of amino acid biosynthesis (particularly methionine) and sulfur or nitrogen assimilation, whereas many of the underexpressed genes are amino acid permeases. These wholesale changes in amino acid metabolism segregate as a suite of traits resulting from a single gene or a small number of genes. We conclude that natural vineyard populations of S. cerevisiae can harbor alleles that cause massive alterations in the global patterns of gene expression. Hence, studies of expression variation in natural populations, without accompanying segregation analysis, may give a false picture of the number of segregating genes underlying the variation.

Hartl, DL. 2000. “Molecular melodies in high and low C.” Nat Rev Genet 1: 145-9.

For 50 years now, one of the enigmas of molecular evolution has been the C-value paradox, which refers to the often massive, counterintuitive and seemingly arbitrary differences in genome size observed among eukaryotic organisms. For example, the genome of the fruitfly Drosophila melanogaster is 180 megabases (Mb), whereas that of the European brown grasshopper Podisma pedestris is 18,000 Mb. The difference in genome size of a factor of 100 is difficult to explain in view of the apparently similar levels of evolutionary, developmental and behavioural complexity of these organisms.

Goldfeld, AE, JY Leung, SA Sawyer, and DL Hartl. 2000. “Post-genomics and the neutral theory: variation and conservation in the tumor necrosis factor-alpha promoter.” Gene 261: 19-25.

In the post-genomics era, molecular evolutionary geneticists have come to possess the molecular, statistical, and computational tools for estimating the relative importance of selection and random genetic drift in virtually any gene in almost any organism. We have examined single-nucleotide polymorphisms (SNPs) and nucleotide divergence across a region of approximately 1 kb in the promoter of the human tumor necrosis factor alpha (TNF-alpha) gene. TNF-alpha, which plays an important role in lymphocyte biology and in the pathogenesis of infectious and autoimmune diseases, is tightly regulated at the level of transcription through sequence-specific binding of transcription factors to cognate binding sites in a relatively small region of the 5' non-coding region of the gene. Analysis of the promoter region in 207 human chromosomes revealed nine SNPs, none of which were located in regions known to be important in transcriptional activation. Comparison with one promoter sequence in each of seven species of primates revealed 162 nucleotide sites occupied by a monomorphic nucleotide in the human sample but occupied by a different nucleotide in at least one of the primate sequences (a 'fixed human difference'). The fixed human differences were found outside the regions known to be important in transcriptional activation, and their large number suggests that they might be effectively neutral (Ns<<1). With regard to the human SNPs, although the hypothesis Ns approximately 0 cannot be rejected, the sample configurations suggest that the substitutions might be mildly deleterious. We emphasize the analytical insight to be gained from interspecific comparisons: through the interspecific comparisons, 3.1% of the total sequence information yielded 94.7% of the variable nucleotides. This combined approach, using interspecific comparisons and human polymorphism together with data from functional analyses, provides valuable insights into the evolutionary history and regulation of a key gene in the human immune response.
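The 94.7% figure quoted in the abstract above appears to follow from the two counts it reports, as the short check below shows. It assumes that the variable nucleotides are the nine human SNPs plus the 162 fixed human differences, with no site counted twice. The function name `interspecific_share` is introduced here for illustration, and the 3.1% figure is not reconstructed because the abstract does not give enough detail to do so.

```python
def interspecific_share(human_snps, fixed_human_differences):
    """Fraction of variable sites contributed by the interspecific comparison.

    Assumes the variable sites are simply the SNPs plus the fixed differences.
    """
    variable_sites = human_snps + fixed_human_differences
    return fixed_human_differences / variable_sites


if __name__ == "__main__":
    share = interspecific_share(human_snps=9, fixed_human_differences=162)
    print(f"{share:.1%}")  # prints 94.7%
```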

Petrov, DA, and DL Hartl. 2000. “Pseudogene evolution and natural selection for a compact genome.” J Hered 91: 221-7.

Pseudogenes are nonfunctional copies of protein-coding genes that are presumed to evolve without selective constraints on their coding function. They are of considerable utility in evolutionary genetics because, in the absence of selection, different types of mutations in pseudogenes should have equal probabilities of fixation. This theoretical inference justifies the estimation of patterns of spontaneous mutation from the analysis of patterns of substitutions in pseudogenes. Although it is possible to test whether pseudogene sequences evolve without constraints for their protein-coding function, it is much more difficult to ascertain whether pseudogenes may affect fitness in ways unrelated to their nucleotide sequence. Consider the possibility that a pseudogene affects fitness merely by increasing genome size. If a larger genome is deleterious--for example, because of increased energetic costs associated with genome replication and maintenance--then deletions, which decrease the length of a pseudogene, should be selectively advantageous relative to insertions or nucleotide substitutions. In this article we examine the implications of selection for genome size relative to small (1-400 bp) deletions, in light of empirical evidence pertaining to the size distribution of deletions observed in Drosophila and mammalian pseudogenes. There is a large difference in the deletion spectra between these organisms. We argue that this difference cannot easily be attributed to selection for overall genome size, since the magnitude of selection is unlikely to be strong enough to significantly affect the probability of fixation of small deletions in Drosophila.

Lohe, AR, C Timmons, I Beerman, ER Lozovskaya, and DL Hartl. 2000. “Self-inflicted wounds, template-directed gap repair and a recombination hotspot. Effects of the mariner transposase.” Genetics 154: 647-56.

Aberrant repair products of mariner transposition occur at a frequency of approximately 1/500 per target element per generation. Among 100 such mutations in the nonautonomous element peach, most had aberrations in the 5' end of peach (40 alleles), in the 3' end of peach (11 alleles), or a deletion of peach with or without deletion of flanking genomic DNA (29 alleles). Most mariner mutations can be explained by exonuclease "nibble" and host-mediated repair of the double-stranded gap created by the transposase, in contrast to analogous mutations in the P element. In mariner, mutations in the 5' inverted repeat are smaller and more frequent than those in the 3' inverted repeat, but secondary mutations in target elements with a 5' lesion usually had 3' lesions resembling those normally found at the 5' end. We suggest that the mariner transposase distinguishes between the 5' and 3' ends of the element, and that the 5' end is relatively more protected after strand scission. We also find: (1) that homolog-dependent gap repair is a frequent accompaniment to mariner excision, estimated as 30% of all excision events; and (2) that mariner is a hotspot of recombination in Drosophila females, but only in the presence of functional transposase.

Bustamante, CD, JP Townsend, and DL Hartl. 2000. “Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica.” Mol Biol Evol 17: 301-8.

The neutral theory of molecular evolution predicts that variation within species is inversely related to the strength of purifying selection, but the strength of purifying selection itself must be related to physical constraints imposed by protein folding and function. In this paper, we analyzed five enzymes for which polymorphic sequence variation within Escherichia coli and/or Salmonella enterica was available, along with a protein structure. Single and multivariate logistic regression models are presented that evaluate amino acid size, physicochemical properties, solvent accessibility, and secondary structure as predictors of polymorphism. A model that contains a positive coefficient of association between polymorphism and solvent accessibility and separate intercepts for each secondary-structure element is sufficient to explain the observed variation in polymorphism between sites. The model predicts an increase in the probability of amino acid polymorphism with increasing solvent accessibility for each protein regardless of physicochemical properties, secondary-structure element, or size of the amino acid. This result, when compared with the distribution of synonymous polymorphism, which shows no association with solvent accessibility, suggests a strong decrease in purifying selection with increasing solvent accessibility.
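The form of the model described in the abstract above, a logistic regression with one intercept per secondary-structure element and a positive coefficient on solvent accessibility, can be sketched as follows. All numeric values are hypothetical placeholders rather than estimates from the paper, and the names `INTERCEPTS`, `BETA_ACCESSIBILITY` and `polymorphism_probability` are introduced only for this illustration.

```python
import math

# Hypothetical intercepts, one per secondary-structure element (not fitted values).
INTERCEPTS = {"helix": -3.0, "strand": -3.5, "coil": -2.5}

# Hypothetical positive coefficient on relative solvent accessibility (0 to 1).
BETA_ACCESSIBILITY = 2.0


def polymorphism_probability(secondary_structure, accessibility):
    """Probability that a site is polymorphic under the sketched logistic model."""
    logit = INTERCEPTS[secondary_structure] + BETA_ACCESSIBILITY * accessibility
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    for element in INTERCEPTS:
        buried = polymorphism_probability(element, 0.0)
        exposed = polymorphism_probability(element, 1.0)
        # With a positive coefficient, exposed sites are more likely to vary.
        print(f"{element:>6}: buried={buried:.3f}  exposed={exposed:.3f}")
```

Because the coefficient is shared across elements, the predicted probability rises with accessibility within every secondary-structure class, while the intercepts shift the baseline for each class.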


The type 1 pilin encoded by fim is present in both Escherichia coli and Salmonella natural isolates, but several lines of evidence indicate that similarities at the fim locus may be an example of independent acquisition rather than common ancestry. For example, the fim gene cluster is found at different chromosomal locations and with distinct gene orders in these closely related species. In this work we examined the fim gene cluster of Salmonella, the genes of which show high nucleotide sequence divergence from their E. coli counterparts, as well as a different G+C content and codon usage. DNA hybridization analysis revealed that, among the salmonellae, the fim gene cluster is present in all isolates of S. enterica but is absent from S. bongori. Molecular phylogenetic analyses of the fimA and fimI genes yield an estimate of phylogeny that is in satisfactory congruence with housekeeping and other virulence genes examined in this species. In contrast, phylogenetic analyses of the fimZ, fimY, and fimW genes indicate that horizontal transfer of this region has occurred more than once. There is also size variation in the fimZ, fimY, and fimW intergenic regions in the 3' region, and these genes are absent in isolate S2983 of subspecies IIIa. Interestingly, the G+C contents of the fimZ, fimY, and fimW genes are less than 46%, which is considerably lower than those of the other six genes of the fim cluster. This study demonstrates that horizontal transmission of all or part of the same gene cluster can occur repeatedly, with the result that different regions of a single gene cluster may have different evolutionary histories.

Lozovskaya, ER, DI Nurminsky, DA Petrov, and DL Hartl. 1999. “Genome size as a mutation-selection-drift process.” Genes Genet Syst 74: 201-7.

A novel method for estimating neutral rates and patterns of DNA evolution in Drosophila takes advantage of the propensity of non-LTR retrotransposable elements to create nonfunctional, transpositionally inactive copies as a product of transposition. For many LINE elements, most copies present in a genome at any one time are nonfunctional "dead-on-arrival" (DOA) copies. Because these are off-shoots of active, transpositionally competent "master" lineages, in a gene tree of a LINE element from multiple samples from related species, the DOA lineages are expected to map to the terminal branches and the active lineages to the internal branches, the primary exceptions being when the sample includes DOA copies that are allelic or orthologous. Analysis of nucleotide substitutions and other changes along the terminal branches therefore allows estimation of the fixation process in the DOA copies, which are unconstrained with respect to protein coding; and under selective neutrality, the fixation process estimates the underlying mutational pattern. We have studied the retroelement Helena in Drosophila. An unexpectedly high rate of DNA loss was observed, yielding a half-life of unconstrained DNA sequences approximately 60-fold faster in Drosophila than in mammals. The high rate of DNA loss suggests a straightforward explanation of the seeming paradox that Drosophila has many fewer pseudogenes than found in mammalian species. Differential rates of deletion in different taxa might also contribute to the celebrated C-value paradox of why some closely related organisms can have very different DNA contents. New data presented here rule out the possibility that the transposition process itself is highly mutagenic, hence the observed linear relation between number of deletions and number of nucleotide substitutions is most easily explained by the hypothesis that both types of changes accumulate in unconstrained sequences over time.

The preference of Drosophila females to lay eggs on substrates that do or do not contain alcohol is an excellent system to study the evolutionary genetics of behavior, because (1) there is variation in this behavior within and among species, (2) the behavior is amenable to laboratory investigation, and (3) the behavior presumably has a direct relationship to reproductive fitness. Moreover, a key genetic component of the system, the Alcohol dehydrogenase (Adh) locus, is arguably the most well characterized gene known. However, because the Adh gene and its genetic background are inseparable in reproductively isolated species, it is difficult to establish its role in behavioral divergence. By transgene coplacement, we created pairs of strains of D. melanogaster expressing an Adh allele from either D. melanogaster or D. affinidisjuncta, a Hawaiian species with very low levels of ADH in adults. When raised on ethanol-containing medium, the affinidisjuncta-Adh strains experience high mortality relative to the melanogaster-Adh strains. However, affinidisjuncta-Adh females show the same preference for oviposition on ethanol-containing medium as melanogaster-Adh females. Thus, preference for ethanol in these strains is not determined primarily by Adh genotype.

Petrov, DA, and DL Hartl. 1999. “Patterns of nucleotide substitution in Drosophila and mammalian genomes.” Proc Natl Acad Sci U S A 96: 1475-9.

To estimate patterns of molecular evolution of unconstrained DNA sequences, we used maximum parsimony to separate phylogenetic trees of a non-long terminal repeat retrotransposable element into either internal branches, representing mainly the constrained evolution of active lineages, or into terminal branches, representing mainly nonfunctional "dead-on-arrival" copies that are unconstrained by selection and evolve as pseudogenes. The pattern of nucleotide substitutions in unconstrained sequences is expected to be congruent with the pattern of point mutation. We examined the retrotransposon Helena in the Drosophila virilis species group (subgenus Drosophila) and the Drosophila melanogaster species subgroup (subgenus Sophophora). The patterns of point mutation are indistinguishable, suggesting considerable stability over evolutionary time (40-60 million years). The relative frequencies of different point mutations are unequal, but the "transition bias" results largely from an approximately 2-fold excess of G.C to A.T substitutions. Spontaneous mutation is biased toward A.T base pairs, with an expected mutational equilibrium of approximately 65% A + T (quite similar to that of long introns). These data also enable the first detailed comparison of patterns of point mutations in Drosophila and mammals. Although the patterns are different, all of the statistical significance comes from a much greater rate of G.C to A.T substitution in mammals, probably because of methylated cytosine "hotspots." When the G.C to A.T substitutions are discounted, the remaining differences are considerably reduced and not statistically significant.
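The idea of a mutational equilibrium base composition can be illustrated with a two-state sketch. This is not the calculation used in the paper, which works from the full pattern of point mutations. The sketch assumes only two classes of base pair, a per-site rate `u` for G.C to A.T changes and a per-site rate `v` for A.T to G.C changes. Both symbols and both function names are introduced here for illustration.

```python
def equilibrium_at_fraction(u, v):
    """Equilibrium A+T fraction f in a two-state model.

    At equilibrium the flux out of G.C, u * (1 - f), equals the flux out of
    A.T, v * f, so f = u / (u + v).
    """
    return u / (u + v)


def rate_ratio_for_equilibrium(at_fraction):
    """Ratio u / v implied by a given equilibrium A+T fraction."""
    return at_fraction / (1.0 - at_fraction)


if __name__ == "__main__":
    ratio = rate_ratio_for_equilibrium(0.65)
    # Under this simplified model, 65% A+T corresponds to u/v of about 1.86.
    print(f"implied u/v = {ratio:.2f}")
    print(f"check: f = {equilibrium_at_fraction(ratio, 1.0):.2f}")
```

Under these assumptions, an equilibrium near 65% A + T corresponds to G.C to A.T changes occurring a little less than twice as often per site as the reverse.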

De Aguiar, D, and DL Hartl. 1999. “Regulatory potential of nonautonomous mariner elements and subfamily crosstalk.” Genetica 107: 79-85.

Two naturally occurring nonautonomous mariner elements were tested in vivo for their ability to down-regulate excision of a target element in the presence of functional mariner transposase. The tested elements were the peach element isolated from Drosophila mauritiana, which encodes a transposase that differs from the autonomous element Mos1 in four amino acid replacements, and the DTBZ1 element isolated from D. teissieri, which encodes a truncated protein consisting of the first 132 residues at the amino end of the normally 345-residue transposase. We provide evidence that the protein from the peach element does interact to down-regulate wildtype transposase, indicating that at least some nonautonomous elements in natural populations that retain their open reading frame may play a regulatory role. In contrast, our tests reveal at most a weak interaction between transposase from the autonomous Mos1 element and the truncated protein from DTBZ1, and none between Mos1 transposase and that from the distantly related mariner-like element Himar1 identified in the horn fly Haematobia irritans. Hence, the extent of regulatory crosstalk between mariner-like elements may be limited to closely related ones. The evolutionary implications of these results are discussed.


We studied the ancestry of virulence-associated genes in Escherichia coli by examining chromosomal regions specific to pathogenic isolates. The four virulence determinants examined were the alpha-hemolysin (hly) loci hlyI and hlyII, the type II capsule gene cluster kps, and the P (pap) and S (sfa) fimbria gene clusters. All four loci were shown previously to be associated with pathogenicity islands of uropathogenic E. coli isolates. The hly, kps, sfa, and pap regions each have an unexpected clustered distribution among the E. coli collection of reference (ECOR) strains, but all these regions were absent from a collection of diarrheagenic E. coli isolates. Strains in the ECOR subgroup B2 typically had a combination of at least three of the four loci, and all strains in subgroup D had a copy of the kps and pap clusters. In contrast, only four strains in subgroup A had either hly, kps, sfa, or pap, and no subgroup A strains had all four together. Strains of subgroup B1 were devoid of all four virulence regions, with the exception of one isolate that had a copy of the sfa gene cluster. This phylogenetic distribution of strain-specific sequences corresponds to the ECOR groups with the largest genome size, namely, B2 and D. We propose that the pathogenicity islands are ancestral to subgroups B2 and D and were acquired after speciation, with subsequent horizontal transfer into some group A, B1, and E lineages. These results suggest that the hly, kps, sfa, and pap pathogenicity determinants may play a role in the evolution of enteric bacteria quite apart from, and perhaps with precedence over, their ability to cause disease.

Nurminsky, DI, MV Nurminskaya, EV Benevolenskaya, YY Shevelyov, DL Hartl, and VA Gvozdev. 1998. “Cytoplasmic dynein intermediate-chain isoforms with different targeting properties created by tissue-specific alternative splicing.” Mol Cell Biol 18: 6816-25.

The intermediate chains (ICs) are the subunits of the cytoplasmic dynein that provide binding of the complex to cargo organelles through interaction of their N termini with dynactin. We present evidence that in Drosophila, the IC subunits are represented by at least 10 structural isoforms, created by the alternative splicing of transcripts from a unique Cdic gene. The splicing pattern is tissue specific. A constitutive set of four IC isoforms is expressed in all tissues tested; in addition, tissue-specific isoforms are found in the ovaries and nervous tissue. The structural variations between isoforms are limited to the N terminus of the IC molecule, where the interaction with dynactin takes place. This suggests differences in the dynactin-mediated organelle binding by IC isoforms. Accordingly, when transiently expressed in Drosophila Schneider-3 cells, the IC isoforms differ in their intracellular targeting properties from each other. A mechanism is proposed for the regulation of dynein binding to organelles through the changes in the content of the IC isoform pool.

Fimbriae or pili are essential adherence factors usually found in pathogenic bacteria to aid colonization of host cells. Three major structural pilin genes, fimA, sfaA, and papA, from Escherichia coli natural isolates were examined, and nucleotide sequence data revealed elevated levels of both synonymous and nonsynonymous site variation at these loci. Examination of synonymous site variation shows a fivefold increase in fimA sites relative to the housekeeping gene mdh; similarly, the sfaA and papA genes have increased synonymous site variation relative to fimA. Nonsynonymous site variation is also elevated at all three loci but, in particular, at the papA locus (kN = 0.44). The kN/kS ratios for the three genes are among the highest yet reported for E. coli genes. Regional variation in nucleotide polymorphism within each of the genes reveals hypervariable segments where nonsynonymous substitutions exceed synonymous substitutions. We propose that at the fimA, papA, and sfaA genes, diversifying selection has brought about the increased levels of polymorphism.

Siegal, ML, and DL Hartl. 1998. “An experimental test for lineage-specific position effects on alcohol dehydrogenase (Adh) genes in Drosophila.” Proc Natl Acad Sci U S A 95: 15513-8.

Independent transgene insertions differ in expression based on their location in the genome; these position effects are of interest because they reflect the influence of genome organization on gene regulation. Position effects also represent potentially insurmountable obstacles to the rigorous functional comparison of homologous genes from different species because (i) quantitative variation in expression of each gene across genomic positions (generalized position effects, or GPEs) may overwhelm differences between the genes of interest, or (ii) divergent genes may be differentially sensitive to position effects, reflecting unique interactions between each gene and its genomic milieu (lineage-specific position effects, or LSPEs). We have investigated both types of position-effect variation by applying our method of transgene coplacement, which allows comparisons of transgenes in the same position in the genome of Drosophila melanogaster. Here we report an experimental test for LSPE in Drosophila. The alcohol dehydrogenase (Adh) genes of D. melanogaster and Drosophila affinidisjuncta differ in both tissue distribution and amounts of ADH activity. Despite this striking regulatory divergence, we found a very high correlation in overall ADH activity between the genes of the two species when placed in the same genomic position as assayed in otherwise Adh-null adults and larvae. These results argue against the influence of LSPE for these sequences, although the effects of GPE are significant. Our new findings validate the coplacement approach and show that it greatly magnifies the power to detect differences in expression between transgenes. Transgene coplacement thus dramatically extends the range of functional and evolutionary questions that can be addressed by transgenic technology.

Vieira, J, CP Vieira, DL Hartl, and ER Lozovskaya. 1998. “Factors contributing to the hybrid dysgenesis syndrome in Drosophila virilis.” Genet Res 71: 109-17.

A hybrid dysgenesis syndrome in Drosophila virilis is associated with the mobilization of at least four unrelated transposable elements designated Helena, Paris, Penelope and Ulysses. We carried out 42 crosses between eight strains differing in transposable element copy number in order to assess their contributions to hybrid dysgenesis. Linear regression and stepwise regression analyses were performed to estimate the correlation between the difference in euchromatic transposable element number between the parental flies of different strains involved in the crosses and the percentage, in the progeny of these crosses, of males with atrophic gonads. Male gonadal atrophy is a typical manifestation of the D. virilis hybrid dysgenesis syndrome. About half the variability in the level of male gonadal atrophy can be attributed to Penelope and Paris/Helena. Other factors also seem to play a significant role in hybrid dysgenesis in D. virilis, including maternally transmitted host factors and/or uncontrolled environmental variation. In the course of this work a novel transposable element named Telemac was found. Telemac is also mobilized in hybrid dysgenesis but does not appear to play a major causative role.

Moriyama, EN, DA Petrov, and DL Hartl. 1998. “Genome size and intron size in Drosophila.” Mol Biol Evol 15: 770-3.

Petrov, DA, and DL Hartl. 1998. “High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups.” Mol Biol Evol 15: 293-302.

We recently proposed that patterns of evolution of non-LTR retrotransposable elements can be used to study patterns of spontaneous mutation. Transposition of non-LTR retrotransposable elements commonly results in creation of 5' truncated, "dead-on-arrival" copies. These inactive copies are effectively pseudogenes and, according to the neutral theory, their molecular evolution ought to reflect rates and patterns of spontaneous mutation. Maximum parsimony can be used to separate the evolution of active lineages of a non-LTR element from the fate of the "dead-on-arrival" insertions and to directly assess the relative frequencies of different types of spontaneous mutations. We applied this approach using a non-LTR element, Helena, in the Drosophila virilis group and have demonstrated a surprisingly high incidence of large deletions and the virtual absence of insertions. Based on these results, we suggested that Drosophila in general may exhibit a high rate of spontaneous large deletions and have hypothesized that such a high rate of DNA loss may help to explain the puzzling dearth of bona fide pseudogenes in Drosophila. We also speculated that variation in the rate of spontaneous deletion may contribute to the divergence of genome size in different taxa by affecting the amount of superfluous "junk" DNA such as, for example, pseudogenes or long introns. In this paper, we extend our analysis to the D. melanogaster subgroup, which last shared a common ancestor with the D. virilis group approximately 40 MYA. In a different region of the same transposable element, Helena, we demonstrate that inactive copies accumulate deletions in species of the D. melanogaster subgroup at a rate very similar to that of the D. virilis group. These results strongly suggest that the high rate of DNA loss is a general feature of Drosophila and not a peculiar property of a particular stretch of DNA in a particular species group.

Petrov, DA, YC Chao, EC Stephenson, and DL Hartl. 1998. “Pseudogene evolution in Drosophila suggests a high rate of DNA loss.” Mol Biol Evol 15: 1562-7.