Historically, duplicate genes have been regarded as a major source of novel genetic material. However, recent work suggests that chimeric genes formed through the fusion of pieces of different genes may also contribute to the evolution of novel functions. To compare the contribution of chimeric and duplicate genes to genome evolution, we measured their prevalence and persistence within Drosophila melanogaster. We find that approximately 80.4 duplicates form per million years, but most are rapidly eliminated from the genome, leaving only 4.1% to be preserved by natural selection. Chimeras form at a comparatively modest rate of approximately 11.4 per million years but follow a similar pattern of decay, with ultimately only 1.4% of chimeras preserved. We propose two mechanisms of chimeric gene formation, which rely entirely on local, DNA-based mutations to explain the structure and placement of the youngest chimeric genes observed. One involves imprecise excision of an unpaired duplication during large-loop mismatch repair, while the other invokes a process akin to replication slippage to form a chimeric gene in a single event. Our results paint a dynamic picture of both chimeras and duplicate genes within the genome and suggest that chimeric genes contribute substantially to genomic novelty.
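As a rough comparison, the formation rates and preserved fractions quoted above can be multiplied to give the expected number of new genes per million years that are ultimately retained. Treating "rate × preserved fraction" as a retained-per-Myr figure is an assumption of this sketch; the abstract does not report that quantity directly.

```python
# Back-of-envelope arithmetic using the rates and preserved fractions quoted above.
# ASSUMPTION: retained per Myr = formation rate x preserved fraction.

def preserved_per_myr(formation_rate_per_myr: float, preserved_fraction: float) -> float:
    """Expected number of genes formed per Myr that are ultimately preserved."""
    return formation_rate_per_myr * preserved_fraction

duplicates = preserved_per_myr(80.4, 0.041)  # roughly 3.30 per Myr
chimeras = preserved_per_myr(11.4, 0.014)    # roughly 0.16 per Myr

print(f"duplicates preserved per Myr: {duplicates:.2f}")
print(f"chimeras preserved per Myr:   {chimeras:.2f}")
```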
In the war against Plasmodium, humans have evolved to eliminate or modify proteins on the erythrocyte surface that serve as receptors for parasite invasion, such as the Duffy blood group, a receptor for Plasmodium vivax, and the Gerbich-negative modification of glycophorin C for Plasmodium falciparum. In turn, the parasite counters with expansion and diversification of ligand families. The high degree of polymorphism in glycophorin B found in malaria-endemic regions suggests that it, too, may be a receptor for Plasmodium, but to date no parasite ligand for it has been identified. We provide evidence from erythrocyte-binding assays that glycophorin B is a receptor for the P. falciparum protein EBL-1, a member of the Duffy-binding-like erythrocyte-binding protein (DBL-EBP) family. The erythrocyte-binding domain, region 2 of EBL-1, expressed on CHO-K1 cells, bound glycophorin B(+) but not glycophorin B-null erythrocytes. In addition, glycophorin B(+) but not glycophorin B-null erythrocytes adsorbed native EBL-1 from P. falciparum culture supernatants. Interestingly, the Efe pygmies of the Ituri forest in the Democratic Republic of the Congo have the highest gene frequency of glycophorin B-null in the world, raising the possibility that the DBL-EBP family may have expanded in response to the high frequency of glycophorin B-null in the population.
It is generally assumed that stabilizing selection promoting a phenotypic optimum acts to shape variation in quantitative traits across individuals and species. Although gene expression represents an intensively studied molecular phenotype, the extent to which stabilizing selection limits divergence in gene expression remains contentious. In this study, we present a theoretical framework for the study of stabilizing and directional selection using data from between-species divergence of continuous traits. This framework, based upon Brownian motion, is analytically tractable and can be used in maximum-likelihood or Bayesian parameter estimation. We apply this model to gene-expression levels in 7 species of Drosophila and find that gene-expression divergence is substantially curtailed by stabilizing selection. However, we estimate the selective effect, s, of gene-expression change to be very small, approximately equal to 1/N for a change of one standard deviation, where N is the effective population size (that is, Ns is of order 1). These findings highlight the power of natural selection to shape phenotype, even when the fitness effects of mutations are in the nearly neutral range.
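The abstract does not give the model's equations. A common way to contrast unconstrained divergence with stabilizing selection is to compare pure Brownian motion, whose variance grows linearly with time, with an Ornstein-Uhlenbeck (OU) process, in which a pull of strength alpha toward an optimum makes the variance level off. The OU process here is a swapped-in illustration, not necessarily the authors' framework; sigma2, alpha and t are hypothetical values.

```python
import math

def bm_variance(sigma2: float, t: float) -> float:
    """Variance of a trait around its ancestral value under Brownian motion."""
    return sigma2 * t

def ou_variance(sigma2: float, alpha: float, t: float) -> float:
    """Variance under an Ornstein-Uhlenbeck process started at the optimum.

    alpha > 0 is the pull toward the optimum (stabilizing selection); the
    variance saturates at sigma2 / (2 * alpha) instead of growing without bound.
    """
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))

# Hypothetical settings: sigma2 = 1, alpha = 0.5, so the OU plateau is 1.0
for t in (0.01, 1.0, 10.0, 100.0):
    print(f"t={t:>6}: BM={bm_variance(1.0, t):8.3f}  OU={ou_variance(1.0, 0.5, t):.3f}")
```

At short times the two processes are indistinguishable. Only over long divergence times does a variance plateau reveal stabilizing selection, which is why multi-species data are needed.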
Short-read sequencing techniques provide the opportunity to capture genome-wide sequence data in a single experiment. A current challenge is to identify questions that shallow-depth genomic data can address successfully and to develop corresponding analytical methods that are statistically sound. Here, we apply the Roche/454 platform to survey natural variation in strains of Drosophila melanogaster from an African (n = 3) and a North American (n = 6) population. Reads were aligned to the reference D. melanogaster genome assembly, single nucleotide polymorphisms were identified, and nucleotide variation was quantified genome-wide. Simulations and empirical results suggest that nucleotide diversity can be accurately estimated from sparse data with as little as 0.2x coverage per line. The unbiased genomic sampling provided by random short-read sequencing also offers insight into the distributions of transposable elements and copy-number polymorphisms within populations, demonstrating that short-read sequencing provides an efficient means to quantify variation in genome organization and content. Continued development of methods for statistical inference from shallow-depth genome-wide sequencing data will allow such sparse, partial data sets to become the norm in the emerging field of population genomics.
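To see why 0.2x coverage can still be informative, assume (as this sketch does) that reads land on the genome as a Poisson process. A line then covers a given site with probability 1 - e^(-c), and two lines jointly cover about (1 - e^(-0.2))^2, or roughly 3.3% of sites. Pairwise diversity can then be estimated only over jointly covered sites. The estimator below is a simple illustration, not the authors' method, and the toy base calls are hypothetical.

```python
import math
from itertools import combinations
from typing import Optional, Sequence

def frac_covered(coverage: float) -> float:
    """Probability that a site has at least one read under Poisson read placement."""
    return 1.0 - math.exp(-coverage)

def sparse_pairwise_pi(calls: Sequence[Sequence[Optional[str]]]) -> float:
    """Average pairwise nucleotide diversity from sparse per-line base calls.

    calls[i][j] is the base called in line i at site j, or None if uncovered.
    Each pair of lines is compared only at sites both cover; differences and
    comparable sites are pooled over all pairs.
    """
    diffs = compared = 0
    for a, b in combinations(calls, 2):
        for x, y in zip(a, b):
            if x is None or y is None:
                continue
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no site is covered by any pair of lines")
    return diffs / compared

print(f"joint coverage at 0.2x: {frac_covered(0.2) ** 2:.3f}")

# Hypothetical toy calls for three lines over five sites
lines = [
    ["A", None, "G", "T", None],
    ["A", "C", None, "T", "G"],
    [None, "C", "G", "A", "G"],
]
print(sparse_pairwise_pi(lines))  # 2 differences over 7 comparable site-pairs
```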
The spread of high-level pyrimethamine resistance in Africa threatens to curtail the therapeutic lifetime of antifolate antimalarials. We studied the possible evolutionary pathways to pyrimethamine resistance using an approach in which all possible mutational intermediates were created by site-directed mutagenesis and assayed for their level of drug resistance. The coding sequence for dihydrofolate reductase (DHFR) from the malaria parasite Plasmodium falciparum was mutagenized, and tests were carried out in Escherichia coli under conditions in which the endogenous bacterial enzyme was selectively inhibited. We studied 4 key amino acid replacements implicated in pyrimethamine resistance: N51I, C59R, S108N, and I164L. Using empirical estimates of the mutational spectrum in P. falciparum and probabilities of fixation based on the relative levels of resistance, we found that the predicted favored pathways of drug resistance are consistent with those reported in previous kinetic studies, as well as with DHFR polymorphisms observed in natural populations. We found that 3 pathways account for nearly 90% of the simulated realizations of the evolution of pyrimethamine resistance. The most frequent pathway (S108N, then C59R, N51I, and I164L) accounts for more than half of the simulated realizations. Our results also suggest an explanation for why I164L is detected in Southeast Asia and South America, but not at significant frequencies in Africa.
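The pathway analysis described above can be sketched as a walk through the 2^4 genotypes. At each step, the next replacement is chosen in proportion to its mutation rate times its fixation probability, and steps that lower resistance are inaccessible. Everything numeric here is hypothetical: the fold effects, the invented I164L/S108N epistasis, equal mutation rates, and the use of relative resistance gain as a stand-in selection coefficient. The output should not be read as the study's pathway ranking.

```python
import math
from itertools import permutations

MUTATIONS = ("N51I", "C59R", "S108N", "I164L")

# HYPOTHETICAL fold effects on resistance; not the study's measurements.
FOLD_EFFECT = {"N51I": 1.5, "C59R": 1.8, "S108N": 6.0, "I164L": 1.2}

def resistance(genotype: frozenset) -> float:
    """Hypothetical multiplicative resistance with one invented epistatic term:
    I164L is costly unless S108N is already present."""
    r = 1.0
    for m in genotype:
        r *= FOLD_EFFECT[m]
    if "I164L" in genotype and "S108N" not in genotype:
        r *= 0.5
    return r

def fixation_prob(s: float) -> float:
    """Large-population approximation for a beneficial mutation; non-beneficial
    steps are treated as inaccessible (probability 0)."""
    return 1.0 - math.exp(-2.0 * s) if s > 0 else 0.0

def pathway_probabilities(mut_rates: dict) -> dict:
    """Probability of each complete order of the four replacements."""
    probs = {}
    for order in permutations(MUTATIONS):
        p, current = 1.0, frozenset()
        for m in order:
            weights = {}
            for x in MUTATIONS:
                if x in current:
                    continue
                # relative resistance gain used as a stand-in selection coefficient
                s = resistance(current | {x}) / resistance(current) - 1.0
                weights[x] = mut_rates[x] * fixation_prob(s)
            total = sum(weights.values())
            if total == 0.0:  # walk is stuck: no beneficial step is available
                p = 0.0
                break
            p *= weights[m] / total
            current = current | {m}
        probs[order] = p
    return probs

rates = {m: 1.0 for m in MUTATIONS}  # hypothetical: equal mutation rates
probs = pathway_probabilities(rates)
for order, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(" -> ".join(order), f"{p:.3f}")
```

Replacing the hypothetical values with measured resistance levels and an empirical mutational spectrum is what turns this toy into the kind of analysis the abstract reports.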
Understanding the molecular details of the sequence of events in multistep evolutionary pathways can reveal the extent to which natural selection exploits regulatory mutations affecting expression, amino acid replacements affecting the active site, amino acid replacements affecting protein folding or stability, or variations affecting gene copy number. In experimentally exploring the adaptive landscape of the evolution of resistance to beta-lactam antibiotics in enteric bacteria, we noted that a regulatory mutation that increases beta-lactamase expression by about 2-fold has a very strong tendency to be fixed at or near the end of the evolutionary pathway. This pattern contrasts with previous experiments selecting for the utilization of novel substrates, in which regulatory mutations that increase expression are often fixed early in the process. To understand the basis of the difference, we carried out experiments in which the expression of beta-lactamase was under the control of a tunable arabinose promoter. We find that the fitness effect of an increase in gene expression is highly dependent on the catalytic activity of the coding sequence. An increase in expression of an inefficient enzyme has a negligible effect on drug resistance; however, the effect of an increase in expression of an efficient enzyme is very large. The contrast in the temporal incorporation of regulatory mutants between antibiotic resistance and the utilization of novel substrates is related to the nature of the function that relates enzyme activity to fitness. A mathematical model of beta-lactam resistance is examined in detail and shown to be consistent with the observed results.
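The dependence of an expression change's effect on catalytic efficiency follows from any activity-to-fitness map that is flat at low activity and steep in the middle. The sketch below uses a hypothetical sigmoidal (Hill-type) map, with activity taken as expression × efficiency, to show that doubling expression barely matters for an inefficient enzyme but matters a great deal for an efficient one. It is an illustration, not the mathematical model examined in the study; k, hill and the efficiency values are assumptions.

```python
def resistance(activity: float, k: float = 1.0, hill: float = 4.0) -> float:
    """HYPOTHETICAL sigmoidal map from total enzyme activity to resistance in [0, 1]."""
    return activity**hill / (k**hill + activity**hill)

def effect_of_doubling_expression(efficiency: float, expression: float = 1.0) -> float:
    """Change in resistance when expression doubles; activity = expression * efficiency."""
    before = resistance(expression * efficiency)
    after = resistance(2.0 * expression * efficiency)
    return after - before

weak = effect_of_doubling_expression(0.01)   # inefficient enzyme
strong = effect_of_doubling_expression(0.5)  # efficient enzyme
print(f"inefficient: {weak:.2e}  efficient: {strong:.3f}")
```

Under such a map, a 2-fold regulatory mutation pays off only after coding changes have raised efficiency. That is consistent with regulatory mutations being fixed late in the beta-lactamase pathways.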
Gene-expression variation in natural populations is widespread, and its phenotypic effects can be acted upon by natural selection. Only a few naturally segregating genetic differences associated with expression variation have been identified at the molecular level. We have identified a single nucleotide insertion in a vineyard isolate of Saccharomyces cerevisiae that has cascading effects through the gene-expression network. This allele is responsible for about 45% (103/230) of the genes that show differential gene expression among the homozygous diploid progeny produced by a vineyard isolate. Using isogenic laboratory strains, we confirm that this allele causes dramatic differences in gene-expression levels of key genes involved in amino acid biosynthesis. The mutation is a frameshift mutation in a mononucleotide run of eight consecutive T's in the coding region of the gene SSY1, which encodes a key component of a plasma-membrane sensor of extracellular amino acids. The potentially high rate of replication slippage of this mononucleotide repeat, combined with its relatively mild effects on growth rate in heterozygous genotypes, is sufficient to account for the persistence of this phenotype at low frequencies in natural populations.
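The persistence argument above matches the classic mutation-selection balance. A deleterious allele produced at rate mu, whose heterozygous fitness cost is h·s, settles near q ≈ mu/(h·s). The sketch below iterates a deterministic diploid recursion and compares the result with that approximation. The parameter values are hypothetical rather than estimates for SSY1, and back-mutation (slippage restoring the run) is ignored for simplicity.

```python
def equilibrium_by_iteration(mu: float, s: float, h: float, generations: int = 5000) -> float:
    """Deterministic diploid recursion: forward mutation A->a at rate mu, then selection.

    Genotype fitnesses: AA = 1, Aa = 1 - h*s, aa = 1 - s.
    Returns the frequency q of the deleterious allele after `generations`.
    """
    q = 0.0
    for _ in range(generations):
        q = q + (1.0 - q) * mu  # mutation (no back-mutation in this sketch)
        p = 1.0 - q
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        q = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar  # selection
    return q

mu, s, h = 1e-5, 0.02, 0.5  # hypothetical values, not estimates from the study
q_iter = equilibrium_by_iteration(mu, s, h)
q_approx = mu / (h * s)
print(f"iterated q: {q_iter:.3e}   mu/(h*s): {q_approx:.3e}")
```

A high slippage rate (large mu) and a mild heterozygous effect (small h·s) both raise the equilibrium frequency.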
Gene expression levels appear to be under pervasive stabilizing selection. Yet the genetic architecture underlying abundant gene expression diversity within and between populations remains elusive. Here, we investigated the role of dominance in the segregation of cis- and trans-regulation within and between populations. We used chromosome substitution lines of Drosophila melanogaster to show that (i) >70% of the genes that are differentially expressed between two homozygous lines are masked in the heterozygote, suggesting that one of the substituted chromosomes contains a recessive allele; (ii) such extensive masking is already obtained with heterozygous chromosomes originating from the same population, with the time of divergence between chromosomes in heterozygous lines making only a small but significant contribution to the masking of variation observed in homozygous lines; (iii) variation in gene expression due to trans-regulation is biased toward greater deviations from additivity because of recessive and dominant alleles, whereas variation due to cis-regulation shows higher additivity; and (iv) genetic divergence between second chromosomes is associated with increased cis-regulation, whereas the level of trans-regulation shows little increase over the time scale studied. Our results indicate that cis-acting alleles may be preferentially fixed by positive natural selection because of their higher additivity, and that the disruption of gene expression by recessive variation with pervasive trans-effects may be important for understanding gene expression variation within populations. We suggest that widespread regulatory effects of recessive low-frequency homozygous variation may provide a general mechanism mediating disease phenotypes and the genetic load of natural populations.
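A widely used way to separate cis from trans effects compares the expression ratio between two homozygous lines with the allele-specific ratio inside the heterozygote, where both alleles share one trans environment. Additivity can be summarized with a dominance index that compares the heterozygote with the midparent value. The abstract does not describe the exact procedure, so the sketch below shows only the general idea, with hypothetical expression values.

```python
import math

def cis_trans(parent1: float, parent2: float, het_allele1: float, het_allele2: float):
    """Partition a log2 expression difference into cis and trans components.

    parent1, parent2: expression in the two homozygous lines.
    het_allele1, het_allele2: allele-specific expression in the heterozygote.
    """
    total = math.log2(parent1 / parent2)
    cis = math.log2(het_allele1 / het_allele2)  # same trans environment for both alleles
    trans = total - cis
    return total, cis, trans

def dominance_index(parent1: float, parent2: float, heterozygote: float) -> float:
    """0 = additive; +1 or -1 = heterozygote matches one parent (the other allele is masked)."""
    midparent = (parent1 + parent2) / 2.0
    half_diff = (parent1 - parent2) / 2.0
    return (heterozygote - midparent) / half_diff

# Hypothetical values
print(cis_trans(400, 100, 160, 80))      # total 2, split equally into cis 1 and trans 1
print(dominance_index(400, 100, 100))    # heterozygote looks like parent 2
```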
Patterns of polymorphism and divergence in Drosophila protein-coding genes suggest that a considerable fraction of amino acid differences between species can be attributed to positive selection and that genes with sex-biased expression, that is, those expressed predominantly in one sex, have especially high rates of adaptive evolution. Previous studies, however, have been restricted to autosomal sex-biased genes and, thus, do not provide a complete picture of the evolutionary forces acting on sex-biased genes across the genome. To determine the effects of X-linkage on sex-biased gene evolution, we surveyed DNA sequence polymorphism and divergence in 45 X-linked genes, including 17 with male-biased expression, 13 with female-biased expression, and 15 with equal expression in the 2 sexes. Using both single- and multilocus tests for selection, we found evidence for adaptive evolution in both groups of sex-biased genes. The signal of adaptive evolution was particularly strong for X-linked male-biased genes. A comparison with data from 91 autosomal genes revealed a "fast-X" effect, in which the rate of adaptive evolution was greater for X-linked than for autosomal genes. This effect was strongest for male-biased genes but could be seen in the other groups as well. A genome-wide analysis of coding sequence divergence that accounted for sex-biased expression also uncovered a fast-X effect for male-biased and unbiased genes, suggesting that recessive beneficial mutations play an important role in adaptation.
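Single-locus tests for adaptive evolution of this kind are typically built on a McDonald-Kreitman table of fixed (D) and polymorphic (P) nonsynonymous (n) and synonymous (s) differences. A standard derived quantity is the proportion of adaptive substitutions, alpha = 1 - (Ds·Pn)/(Dn·Ps), equivalently 1 minus the neutrality index. The abstract does not specify which statistics were used, so this is an illustration of the standard estimator, and the counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive nonsynonymous substitutions from an MK table."""
    return 1.0 - (ds * pn) / (dn * ps)

def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds); NI < 1 points to an excess of nonsynonymous fixations."""
    return (pn / ps) / (dn / ds)

# Hypothetical counts for one gene (or summed over a group of genes)
alpha = mk_alpha(dn=20, ds=40, pn=5, ps=30)
ni = neutrality_index(dn=20, ds=40, pn=5, ps=30)
print(f"alpha = {alpha:.3f}, NI = {ni:.3f}")
```

Summing counts across genes is the simplest multilocus version. It is known to be biased when genes differ in size and polymorphism level, which is one reason dedicated multilocus methods exist.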
The extensive sequence variation in most surface antigens of Plasmodium falciparum is one of the main reasons that clinical immunity to malaria develops only after repeated infections with the same species over several years. For some P. falciparum surface antigens, all observed alleles clearly fall into two allelic classes, with divergence between classes dwarfing divergence within classes. We discuss the ways in which such allelic dimorphism deviates from the expected shape of the genealogy of genes under either neutral evolution or standard balancing selection, and present a simple test, based on coalescent theory, to detect this deviation in samples of DNA sequences. We review previous hypotheses for the origin and evolution of allelic dimorphism in malarial antigens and discuss the difficulties of explaining the available data under these proposals. We conclude by offering several possible classes of explanations for allelic dimorphism, which are worthy of further theoretical and empirical exploration.
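Under the standard neutral coalescent, the interval during which only two lineages remain (T_2) averages about half of the total tree height. Dimorphism corresponds to a tree in which that basal interval dominates. The Monte Carlo sketch below estimates how often a neutral genealogy has T_2 taking up at least a given fraction of its height. It is not the authors' test, and treating an observed between-class/within-class divergence pattern as a T_2 fraction is a simplifying assumption.

```python
import random

def coalescent_times(n: int, rng: random.Random) -> dict:
    """Waiting times T_k (k = n..2) under the neutral coalescent, in 2N-generation units."""
    return {k: rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)}

def deep_split_pvalue(n: int, observed_fraction: float, reps: int = 20000, seed: int = 1):
    """P(T_2 / T_MRCA >= observed_fraction) under neutrality, and the mean simulated T_MRCA."""
    rng = random.Random(seed)
    hits, total_height = 0, 0.0
    for _ in range(reps):
        times = coalescent_times(n, rng)
        height = sum(times.values())
        total_height += height
        if times[2] / height >= observed_fraction:
            hits += 1
    return hits / reps, total_height / reps

# Hypothetical example: 10 sampled alleles, basal split spanning 90% of tree height
p_value, mean_height = deep_split_pvalue(10, 0.9)
print(f"P = {p_value:.4f}, mean T_MRCA = {mean_height:.3f} (theory: {2 * (1 - 1 / 10):.3f})")
```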
The population structure of Plasmodium vivax remains elusive. The markers of choice for large-scale population genetic studies of eukaryotes, short tandem repeats known as microsatellites, have been recently reported to be less polymorphic in P. vivax. Here we investigate the microsatellite diversity and geographic structure in P. vivax, at both local and global levels, using 14 new markers consisting of tri- or tetranucleotide repeats. The local-level analysis, which involved 50 field isolates from Sri Lanka, revealed unexpectedly high diversity (average virtual heterozygosity [H(E)], 0.807) and significant multilocus linkage disequilibrium in this region of low malaria endemicity. Multiple-clone infections occurred in 60% of isolates sampled in 2005. The global-level analysis of field isolates or monkey-adapted strains identified 150 unique haplotypes among 164 parasites from four continents. Individual P. vivax isolates could not be unambiguously assigned to geographic populations. For example, we found relatively low divergence among parasites from Central America, Africa, Southeast Asia and Oceania, but substantial differentiation between parasites from the same continent (South Asia and Southeast Asia) or even from the same country (Brazil). Parasite relapses, which may extend the duration of P. vivax carriage in humans, are suggested to facilitate the spread of strains across continents, breaking down any pre-existing geographic structure.
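The abstract reports an average H_E across loci but not the formula. A common choice is Nei's unbiased estimator, H_E = n/(n - 1)·(1 - Σp_i²), computed per locus from one allele call per isolate and then averaged over loci. The sketch assumes that estimator and uses hypothetical repeat-length alleles.

```python
from collections import Counter
from typing import Hashable, Sequence

def expected_heterozygosity(alleles: Sequence[Hashable]) -> float:
    """Unbiased H_E = n/(n-1) * (1 - sum p_i^2) for one locus, one allele per isolate."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)

# Hypothetical repeat-count alleles at one locus in eight isolates
locus = [10, 10, 11, 12, 12, 12, 13, 14]
print(f"H_E = {expected_heterozygosity(locus):.3f}")
```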
BACKGROUND: The malaria parasite Plasmodium falciparum exhibits abundant genetic diversity, and this diversity is key to its success as a pathogen. Previous efforts to study genetic diversity in P. falciparum have begun to elucidate the demographic history of the species, as well as patterns of population structure and patterns of linkage disequilibrium within its genome. Such studies will be greatly enhanced by new genomic tools and recent large-scale efforts to map genomic variation. To that end, we have developed a high-throughput single nucleotide polymorphism (SNP) genotyping platform for P. falciparum. RESULTS: Using an Affymetrix 3,000 SNP assay array, we found that roughly half the assays (1,638) yielded high-quality, 100% accurate genotyping calls for both major and minor SNP alleles. Genotype data from 76 global isolates confirm significant genetic differentiation among continental populations and varying levels of SNP diversity and linkage disequilibrium according to geographic location and local epidemiological factors. We further discovered that nonsynonymous and silent (synonymous or noncoding) SNPs differ with respect to within-population diversity, inter-population differentiation, and the degree to which allele frequencies are correlated between populations. CONCLUSIONS: The distinct population profile of nonsynonymous variants indicates that natural selection has a significant influence on genomic diversity in P. falciparum, and that many of these changes may reflect functional variants deserving of follow-up study. Our analysis demonstrates the potential for new high-throughput genotyping technologies to enhance studies of population structure and natural selection and, ultimately, to enable genome-wide association studies in P. falciparum to find genes underlying key phenotypic traits.
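Differentiation among populations at a single biallelic SNP is often summarized as G_ST = (H_T - H_S)/H_T. Here H_S is the mean within-population heterozygosity and H_T is the heterozygosity of the pooled allele frequency. The abstract does not say which statistic was used. The sketch below is the simple equal-weight version without sample-size correction, and the frequencies are hypothetical.

```python
from typing import Sequence

def gst_biallelic(freqs: Sequence[float]) -> float:
    """Nei's G_ST for one biallelic SNP from per-population allele frequencies
    (equal population weights, no sample-size correction)."""
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                 # heterozygosity of the pooled frequency
    if h_t == 0:
        raise ValueError("SNP is monomorphic across populations")
    return (h_t - h_s) / h_t

print(gst_biallelic([0.1, 0.9]))  # strongly differentiated hypothetical SNP
print(gst_biallelic([0.3, 0.3]))  # identical frequencies, no differentiation
```

When combining many SNPs, taking the ratio of summed numerators to summed denominators is generally preferred over averaging per-SNP ratios.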
The mutation process ultimately defines the genetic features of all populations and, hence, has a bearing on a wide range of issues involving evolutionary genetics, inheritance, and genetic disorders, including the predisposition to cancer. Nevertheless, formidable technical barriers have constrained our understanding of the rate at which mutations arise and the molecular spectrum of their effects. Here, we report on the use of complete-genome sequencing in the characterization of spontaneously arising mutations in the yeast Saccharomyces cerevisiae. Our results confirm some findings previously obtained by indirect methods but also yield numerous unexpected findings, in particular a very high rate of point mutation and a skewed distribution of base-substitution types in the mitochondrion, a very high rate of segmental duplication and deletion in the nuclear genome, and substantial deviations in the mutational profile among various model organisms.
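With genome sequencing, a per-site mutation rate is typically obtained by dividing the number of detected mutations by the number of sites observed and the number of generations elapsed. The sketch assumes a mutation-accumulation design, which the abstract does not state, along with diploid lines and a genome of about 12 Mb. All inputs are hypothetical, not the study's counts.

```python
def per_site_mutation_rate(n_mutations: int, genome_size_bp: float, n_lines: int,
                           generations: float, ploidy: int = 2) -> float:
    """Mutations per site per generation, pooled over independent accumulation lines."""
    return n_mutations / (ploidy * genome_size_bp * n_lines * generations)

# Hypothetical inputs: 30 mutations, ~12 Mb genome, 10 lines, 1,000 generations each
rate = per_site_mutation_rate(30, 12e6, 10, 1000)
print(f"{rate:.2e} per site per generation")
```

Real analyses also correct the denominator for sites that could not be called reliably, which this sketch ignores.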
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation.
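The index of dispersion R = variance/mean equals 1 for a Poisson clock and exceeds 1 when the substitution rate itself varies. The simulation below contrasts a constant-rate Poisson process with one whose rate multiplier is drawn from a gamma distribution with mean 1 and variance 0.5. For that mixture, R is expected to be about 1 + mean·Var(rate) = 6. Gamma rate variation is an assumption of this sketch, not the paper's model, and real analyses compute R on a phylogeny rather than from independent replicates.

```python
import math
import random
import statistics

def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the modest means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def dispersion_index(counts) -> float:
    """R = sample variance / mean of substitution counts; R = 1 under a Poisson clock."""
    return statistics.variance(counts) / statistics.mean(counts)

rng = random.Random(0)
n_reps, mean_subs = 20000, 10.0  # hypothetical simulation settings

clock = [poisson(mean_subs, rng) for _ in range(n_reps)]
# Rate multiplier varies among replicates: gamma(shape=2, scale=0.5), mean 1, variance 0.5
variable = [poisson(mean_subs * rng.gammavariate(2.0, 0.5), rng) for _ in range(n_reps)]

r_clock = dispersion_index(clock)
r_variable = dispersion_index(variable)
print(f"R (constant rate) = {r_clock:.2f}, R (variable rate) = {r_variable:.2f}")
```

Because the excess variance scales with the expected number of substitutions, R grows linearly with the substitution rate when the rate varies. That matches the linear relationship described in the abstract.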
Although protein evolution can be approximated as a "molecular evolutionary clock," it is well known that sequence change departs from a clock-like Poisson expectation. Through studying the deviations from a molecular clock, insight can be gained into the forces shaping evolution at the level of proteins. Generally, substitution patterns that show greater variance than the Poisson expectation are said to be "overdispersed." Overdispersion of sequence change may result from temporal variation in the rate at which amino acid substitutions occur on a phylogeny. By comparing the genomes of four species of yeast, five species of Drosophila, and five species of mammals, we show that the extent of overdispersion shows a strong negative correlation with the effective population size of these organisms. Yeast proteins show very little overdispersion, while mammalian proteins show substantial overdispersion. Additionally, X-linked genes, which have reduced effective population size, have gene products that show increased overdispersion in both Drosophila and mammals. Our research suggests that mutational robustness is more pervasive in organisms with large population sizes and that robustness acts to stabilize the molecular evolutionary clock of sequence change.
Many surface antigens of the human malaria parasite Plasmodium falciparum show extraordinary diversity, with different alleles being so divergent as to be unalignable in some coding regions. To better understand the population history and modes of selection on such loci, we sequenced genomic regions flanking the highly polymorphic genes merozoite surface protein-1, merozoite surface protein-2, and circumsporozoite protein, from reference isolates of P. falciparum. Diversity was much lower in the genomic flanking regions than in the coding sequences. Average pairwise nucleotide diversity for these regions was 0.00088, similar to that of other genomic regions not thought to be evolving under balancing selection, which argues against balancing selection acting on the promoter regions of these genes. Most observed polymorphisms were singletons. We observed a higher ratio of SNPs to indels than previously reported for P. falciparum. An 11 bp repeat upstream of msp2 showed an intriguing pattern of polymorphism, possibly suggestive of purifying selection on total allele length.
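An excess of singletons pulls mean pairwise diversity (pi) below Watterson's estimator (S/a1), which makes Tajima's D negative. The abstract does not report D. The sketch below only shows how the statistic responds to a singleton-dominated sample, using hypothetical aligned sequences and treating every column as a simple substitution site (no indels).

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from equal-length aligned sequences (no gaps handled)."""
    n = len(seqs)
    length = len(seqs[0])
    s = sum(1 for j in range(length) if len({seq[j] for seq in seqs}) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    pairs = list(combinations(seqs, 2))
    # mean number of pairwise differences per pair (not per site)
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical sample in which every polymorphism is a singleton
sample = ["AAAAAA", "CAAAAA", "ACAAAA", "AACAAA", "AAACAA"]
print(f"Tajima's D = {tajimas_d(sample):.3f}")  # negative: excess of rare variants
```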
The paucity of polymorphisms in single-copy genes on the Y chromosome of Drosophila contrasts with data indicating that this chromosome has polymorphic phenotypic effects on sex ratio, temperature sensitivity, behavior, and fitness. We show that the Y chromosome of D. melanogaster harbors substantial genetic diversity in the form of polymorphisms for genetic elements that differentially affect the expression of hundreds of X-linked and autosomal genes. The affected genes are more highly expressed in males, more weakly expressed in females, and more divergent between species. Functionally, they affect microtubule stability, lipid and mitochondrial metabolism, and the thermal sensitivity of spermatogenesis. Our findings provide a mechanism for adaptive phenotypic variation associated with the Y chromosome.
During its red blood cell stage, the malaria parasite Plasmodium falciparum can switch its variant surface proteins (P. falciparum erythrocyte membrane protein 1) to evade the host immune response. The var gene family encodes P. falciparum erythrocyte membrane protein 1, different versions of which have unique binding specificities to various human endothelial surface molecules. Individual parasites each carry approximately 60 var genes at various chromosomal locations; however, different parasite isolates carry different complements of var genes, so the gene family as a whole is enormous, with a virtually unlimited number of members. Each parasite expresses a single var gene in a mutually exclusive manner. We report that control of var gene transcription and antigenic variation is associated with a chromatin memory that includes methylation of histone H3 at lysine 9 (H3K9) as an epigenetic mark. We also discuss how gene transcription memory may affect the mechanism of pathogenesis and immune evasion.
Identifying the properties of gene networks that influence their evolution is a fundamental research goal. However, modes of evolution cannot be inferred solely from the distribution of natural variation, because selection interacts with demography and mutation rates to shape polymorphism and divergence. We estimated the effects of naturally occurring mutations on gene expression while minimizing the effect of natural selection. We demonstrate that sensitivity of gene expression to mutations increases with both increasing trans-mutational target size and the presence of a TATA box. Genes with greater sensitivity to mutations are also more sensitive to systematic environmental perturbations and stochastic noise. These results provide a mechanistic basis for gene expression evolvability that can serve as a foundation for realistic models of regulatory evolution.
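Mutational effects on expression with minimal selection are commonly estimated from mutation-accumulation lines, although the abstract does not name its design, so that is an assumption here. The among-line variance component from a one-way ANOVA is divided by 2t, a standard approximation for inbred diploid lines after t generations, to give the mutational variance V_M. The study's "sensitivity" measure may be scaled differently, for example relative to mean expression. The toy replicate data are hypothetical.

```python
from statistics import mean

def among_line_variance(lines) -> float:
    """One-way ANOVA variance component among lines (balanced design only)."""
    a, n = len(lines), len(lines[0])
    if any(len(reps) != n for reps in lines):
        raise ValueError("this sketch assumes a balanced design")
    line_means = [mean(reps) for reps in lines]
    grand = mean(line_means)
    ms_between = n * sum((m - grand) ** 2 for m in line_means) / (a - 1)
    ms_within = sum((x - m) ** 2 for reps, m in zip(lines, line_means) for x in reps) / (a * (n - 1))
    return max(0.0, (ms_between - ms_within) / n)

def mutational_variance(sigma2_between: float, generations: float) -> float:
    """V_M approximated as sigma2_between / (2t) for inbred diploid MA lines."""
    return sigma2_between / (2.0 * generations)

# Hypothetical expression measurements: 3 lines x 2 replicates, after 100 generations
data = [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]]
sb = among_line_variance(data)
print(sb, mutational_variance(sb, 100))
```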
In interspecific hybrids, novel phenotypes often emerge from the interaction of two divergent genomes. Interactions between the two transcriptional networks are assumed to contribute to these unpredicted new phenotypes by inducing novel patterns of gene expression. Here we review the recent literature on the accumulation of regulatory incompatibilities, covering both specific examples reported at particular loci and genome-scale surveys of gene expression in interspecific hybrids. Finally, we preview novel technologies that could help decipher how divergent transcriptional networks interact in hybrids between species.