To identify mechanisms that influence the evolution of bacterial transposons, DNA sequence variation was evaluated among homologs of insertion sequences IS1, IS3 and IS30 from natural strains of Escherichia coli and related enteric bacteria. The nucleotide sequences within each class of IS were highly conserved among E. coli strains, over 99.7% similar to a consensus sequence. When compared to the range of nucleotide divergence among chromosomal genes, these data indicate high turnover and rapid movement of the transposons among clonal lineages of E. coli. In addition, length polymorphism among IS appears to be far less frequent than in eukaryotic transposons, indicating that nonfunctional elements comprise a smaller fraction of bacterial transposon populations than found in eukaryotes. IS present in other species of enteric bacteria are substantially divergent from E. coli elements, indicating that IS are mobilized among bacterial species at a reduced rate. However, homologs of IS1 and IS3 from diverse species provide evidence that recombination events and horizontal transfer of IS among species have both played major roles in the evolution of these elements. IS3 elements from E. coli and Shigella show multiple, nested, intragenic recombinations with a distantly related transposon, and IS1 homologs from diverse taxa reveal a mosaic structure indicative of multiple recombination and horizontal transfer events.
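The abstract summarizes within-class variation as similarity to a consensus sequence. The sketch below shows that calculation: build a majority-rule consensus from aligned copies, then score each copy against it. It assumes the copies are already aligned to a common length and does not handle gaps. The sequences are hypothetical toy strings, not IS1, IS3 or IS30 data.

```python
from collections import Counter

def consensus(seqs):
    """Majority-rule consensus of pre-aligned, equal-length sequences.
    Ties go to the base seen first in the column (Counter keeps insertion order)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def percent_identity(seq, ref):
    """Percentage of aligned positions at which seq matches ref."""
    return 100.0 * sum(a == b for a, b in zip(seq, ref)) / len(ref)

# Hypothetical toy copies of one insertion sequence (not real data).
copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACCTAC"]
ref = consensus(copies)
for copy in copies:
    print(f"{percent_identity(copy, ref):.1f}% identical to consensus")
```

With real elements, a copy at 99.7% identity to a roughly 1 kb consensus differs at only about three sites.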
The population biology and molecular evolution of the transposable element mariner have been studied in the eight species of the melanogaster subgroup of the Drosophila subgenus Sophophora. The element occurs in D. simulans, D. mauritiana, D. sechellia, D. teissieri, and D. yakuba, but is not found in D. melanogaster, D. erecta, or D. orena. Sequence comparisons suggest that the mariner element was present in the ancestor of the species subgroup and was lost in some of the lineages. Most species contain both active and inactive mariner elements. A deletion of most of the 3' end characterizes many elements in D. teissieri, but in other species the inactive elements differ from active ones only by simple nucleotide substitutions or small insertions or deletions. Active mariner elements from all species are quite similar in nucleotide sequence, although there are some species-specific differences. Many, but not all, of the inactive elements are also quite closely related. The genome of D. mauritiana contains 20-30 copies of mariner, that of D. simulans 0-10, and that of D. sechellia only two copies (at fixed positions in the genome). The low, fixed copy number of mariner in D. sechellia may reflect a reduced effective population size owing to the restricted geographical range of this species and its ecological specialization to the fruit of Morinda citrifolia.
Highly polymorphic segments of the human genome containing variable numbers of tandem repeats (VNTRs) have been widely used to establish DNA profiles of individuals for use in forensics. Methods of estimating the probability of occurrence of matching DNA profiles between two randomly selected individuals have been subject to extensive debate regarding the possibility of significant substructure occurring within the major races. We have sampled two Caucasian subpopulations, Finns and Italians, at four commonly used VNTR loci to determine the extent to which the subgroups differ from each other and from a mixed Caucasian database. The data were also analyzed for the occurrence of linkage disequilibrium among the loci. The allele frequency distributions of some loci were found to differ significantly among the subpopulations in a manner consistent with population substructure. Major differences were also found in the probability of occurrence of matching DNA profiles between two individuals chosen at random from the same subpopulation. With respect to the Finnish and Italian subpopulations, the conventional product rule for estimating the probability of a multilocus VNTR match using a mixed Caucasian database consistently yields estimates that are artificially small. Systematic errors of this type were not found using the interim ceiling principle recently advocated in the National Research Council's report [National Research Council (1992) DNA Technology in Forensic Science (Natl. Acad. Sci., Washington)]. The interim ceiling principle is based on currently available racial or ethnic databases and sets an arbitrary lower limit on each VNTR allele frequency. In the future the ceiling frequencies are expected to be established from more adequate data acquired for relevant VNTR loci from multiple subpopulations.
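To make the contrast between the two estimators concrete, here is a minimal sketch. It compares the product rule applied to a single mixed database with a simplified ceiling calculation. The simplifications are assumptions, not the NRC procedure: every locus is treated as heterozygous (genotype frequency 2pq), each allele's ceiling frequency is its largest frequency across the available databases with an assumed floor of 0.10, and confidence bounds and allele binning are omitted. All allele frequencies are hypothetical.

```python
from math import prod

# Hypothetical allele frequencies (NOT data from the study). Each locus lists the
# two alleles in the profile; each allele lists its frequency in
# (mixed database, subpopulation A, subpopulation B).
PROFILE = [
    ((0.05, 0.08, 0.04), (0.03, 0.02, 0.06)),  # locus 1
    ((0.12, 0.15, 0.10), (0.02, 0.01, 0.03)),  # locus 2
]

def product_rule(allele_pairs):
    """Multiply the heterozygote frequency 2pq across loci (all loci assumed heterozygous)."""
    return prod(2 * p * q for p, q in allele_pairs)

def ceiling_freq(freqs, floor=0.10):
    """Simplified ceiling: the largest frequency in any database, never below `floor`."""
    return max(max(freqs), floor)

mixed = [(a[0], b[0]) for a, b in PROFILE]
ceiling = [(ceiling_freq(a), ceiling_freq(b)) for a, b in PROFILE]

p_mixed = product_rule(mixed)      # 2(0.05)(0.03) x 2(0.12)(0.02) = 1.44e-5
p_ceiling = product_rule(ceiling)  # 2(0.10)(0.10) x 2(0.15)(0.10) = 6.0e-4
print(f"product rule, mixed database: {p_mixed:.2e}")
print(f"simplified interim ceiling:   {p_ceiling:.2e}")
```

Every ceiling frequency is at least as large as the corresponding mixed-database frequency, and 2pq increases with both p and q. A ceiling estimate built this way therefore can never be smaller than the mixed-database product-rule estimate, which is why it resists the downward bias described above.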
Inconsistencies in taxonomic relationships implicit in different sets of nucleic acid sequences may result from horizontal transfer of genetic material between genomes. A nonparametric method is proposed to determine whether such inconsistencies are statistically significant. A similarity coefficient is calculated from ranked pairwise identities and evaluated against a distribution of similarity coefficients generated from resampled data. Subsequent analyses of partial data sets, obtained by eliminating individual taxa, identify the particular taxa to which the significance may be attributed, and can sometimes help distinguish horizontal genetic transfer from inconsistencies due to convergent evolution or variation in evolutionary rate. The method was successfully applied to data sets whose inconsistencies were not found to be significant by existing methods based on comparisons of phylogenetic trees. The new statistical framework is also applicable to the inference of horizontal transfer from restriction fragment length polymorphism distributions and protein sequences.
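The abstract does not specify the exact coefficient or resampling scheme, so the sketch below fills both in with assumptions. The similarity coefficient is a Spearman-style rank correlation between the pairwise identities computed from two genes for the same taxa. The null distribution comes from bootstrapping alignment columns of gene A and correlating each replicate with the original. A small p-value means gene B ranks the taxon pairs less consistently with gene A than resampling noise alone would produce. The helper names and the toy alignments are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt

def pairwise_identities(aln):
    """Fraction of identical sites for every taxon pair (i < j), in a fixed order."""
    n_sites = len(aln[0])
    return [sum(a == b for a, b in zip(aln[i], aln[j])) / n_sites
            for i, j in combinations(range(len(aln)), 2)]

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def rank_similarity(x, y):
    """Spearman-style coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # all pairs tied: correlation undefined, treated here as no agreement
    return cov / sqrt(vx * vy)

def bootstrap_columns(aln, rng):
    """Resample alignment columns with replacement (same columns for every taxon)."""
    n_sites = len(aln[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in aln]

def inconsistency_test(aln_a, aln_b, n_boot=1000, seed=1):
    """Observed between-gene coefficient, plus the fraction of bootstrap
    replicates of gene A (compared with gene A itself) that agree no better."""
    rng = random.Random(seed)
    ref = pairwise_identities(aln_a)
    observed = rank_similarity(ref, pairwise_identities(aln_b))
    null = [rank_similarity(ref, pairwise_identities(bootstrap_columns(aln_a, rng)))
            for _ in range(n_boot)]
    p_value = sum(s <= observed for s in null) / n_boot
    return observed, p_value

# Hypothetical toy alignments for the same four taxa (not real data).
GENE_A = ["ACGTACGTACGT", "ACGTACGTACGA", "ACGAACTTACCA", "TCGAACTTAGCA"]
GENE_B = ["GGCATTCAGGTA", "GGCATTCAGCTT", "GGCATTCAGGTT", "TGCAATCTGCTT"]

observed, p = inconsistency_test(GENE_A, GENE_B)
print(f"rank similarity = {observed:.3f}, bootstrap p = {p:.3f}")
```

The taxon-elimination step described above would amount to rerunning `inconsistency_test` with one taxon dropped from both alignments, then checking whether the significance disappears. Four taxa give only six pairs, so real analyses need many more taxa before the rank correlation carries much information.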
Defective (nonautonomous) copies of transposable elements are relatively common in the genomes of eukaryotes but less common in the genomes of prokaryotes. With regard to transposable elements that exist exclusively in the form of DNA (nonretroviral transposable elements), nonautonomous elements may play a role in the regulation of transposition. In prokaryotes, plasmid-mediated horizontal transmission probably imposes selection against nonautonomous elements, since nonautonomous elements are incapable of mobilizing themselves. The lower relative frequency of nonautonomous elements in prokaryotes may also reflect the coupling of transcription and translation, which may favor cis activation of transposition. The cis bias we suggest need not be absolute to militate against the long-term maintenance of prokaryotic elements unable to transpose on their own. Furthermore, any cis bias in transposition would decrease the opportunity for trans repression of transposition by nonautonomous elements.
Population data suggest that many parasitic protozoa (e.g., Trypanosoma, Leishmania, Entamoeba and Giardia) reproduce clonally, but this hypothesis has been highly controversial for Plasmodium falciparum. Although reproduction is predominantly clonal in the enteric bacteria Escherichia coli and Salmonella, recombination affecting short (< 1 kb) regions of the chromosome is frequent enough that many genes are clearly mosaics of different ancestries. Transposable insertion sequences in E. coli are examples of selfish DNA whose short-term population dynamics are determined mainly by transposition and horizontal transmission among strains, balanced against copy-number-dependent regulation of transposition and negative effects on host fitness. Occasional advantageous effects of transposable elements have also been documented.
Frequencies of mutant sites are modeled as a Poisson random field in two species that share a sufficiently recent common ancestor. The selective effect of the new alleles can be favorable, neutral, or detrimental. The model is applied to the sample configurations of nucleotides in the alcohol dehydrogenase gene (Adh) in Drosophila simulans and Drosophila yakuba. Assuming a synonymous mutation rate of $1.5 \times 10^{-8}$ per site per year and 10 generations per year, we obtain estimates for the effective population size ($N_e = 6.5 \times 10^{6}$), the species divergence time ($t_{\mathrm{div}} = 3.74$ million years), and an average selection coefficient ($\sigma = 1.53 \times 10^{-6}$ per generation for advantageous or mildly detrimental replacements), although it is conceivable that only two of the amino acid replacements were selected and the rest were neutral. The analysis, which includes a sampling theory for the independent infinite-sites model with selection, also suggests that the number of amino acids in the enzyme that are susceptible to favorable mutation at any one time is in the range 2-23. The approach provides a theoretical basis for using a 2 × 2 contingency table to compare fixed differences and polymorphic sites with silent sites and amino acid replacements.
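The 2 × 2 table mentioned at the end of the abstract (fixed versus polymorphic sites, replacement versus synonymous) is commonly tested with Fisher's exact test, as in the McDonald-Kreitman test. The sketch below shows that test only; the Poisson random field model itself is not implemented. Fisher's exact test is our choice of test, not something the abstract specifies, and the counts are hypothetical rather than the Adh data.

```python
from math import comb

def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals `a`, given fixed table margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test for [[a, b], [c, d]]: total probability of
    all tables with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    # A small relative tolerance keeps tables tied with the observed one from
    # being dropped because of floating-point rounding.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

#   columns: replacement, synonymous  (hypothetical counts, not the Adh data)
table = [[6, 14],   # fixed differences between the species
         [2, 40]]   # polymorphic sites within the species
print(f"two-sided Fisher exact p = {fisher_exact_two_sided(table):.4f}")
```

Under strict neutrality, the replacement-to-synonymous ratio should be similar for fixed and polymorphic sites. An excess of fixed replacements, as in this toy table, is the pattern the selection model above is designed to quantify.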
He-T sequences are a complex repetitive family of DNA sequences in Drosophila that are associated with telomeric regions, pericentromeric heterochromatin, and the Y chromosome. A component of the He-T family containing open reading frames (ORFs) is described. These ORF-containing elements within the He-T family are designated T-elements, since hybridization in situ with the polytene salivary gland chromosomes results in detectable signal exclusively at the chromosome tips. One T-element that has been sequenced includes ORFs of 1,428 and 1,614 bp. The ORFs are overlapping but one nucleotide out of frame with respect to each other. The longer ORF contains cysteine-histidine motifs strongly resembling nucleic acid binding domains of gag-like proteins, and the overall organization of the T-element ORFs is reminiscent of LINE elements. The T-elements are transcribed and appear to be conserved in Drosophila species related to D. melanogaster. The results suggest that T-elements may play a role in the structure and/or function of telomeres.
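"Overlapping but one nucleotide out of frame" means the two ORFs share sequence but are read in adjacent reading frames. A minimal three-frame scan makes this concrete. It makes several simplifying assumptions: only the forward strand is scanned, an ORF is defined as ATG through the first in-frame stop codon, and unterminated reading frames are ignored. The toy sequence is hypothetical and was built to contain two short overlapping ORFs; the real T-element ORFs are 1,428 and 1,614 bp.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Forward-strand ORFs, defined here as ATG ... stop, in each of the three frames.
    Returns (frame, start, end) tuples; `end` is exclusive and includes the stop codon.
    An ATG with no downstream in-frame stop is not reported."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ATG in this frame
    return orfs

# Hypothetical toy sequence built to contain two overlapping ORFs in adjacent frames.
toy = "ATGCATGAAACCTAGCTAA"
for frame, start, end in find_orfs(toy):
    print(f"frame {frame}: {start}-{end} ({(end - start) // 3} codons)")
```

Here the frame 0 ORF spans positions 0-15 and the frame 1 ORF spans 4-19. They overlap, and their codon boundaries are offset by one nucleotide, which is the same relationship described for the T-element ORFs.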
Active and inactive mariner elements from natural and laboratory populations of Drosophila simulans were isolated and sequenced in order to assess their nucleotide variability and to compare them with previously isolated mariner elements from the sibling species Drosophila mauritiana and Drosophila sechellia. The active elements of D. simulans are very similar among themselves (average 99.7% nucleotide identity), suggesting that the level of mariner expression in different natural populations is largely determined by position effects, dosage effects and perhaps other factors. Furthermore, the D. simulans elements exhibit nucleotide identities of 98% or greater when compared with mariner elements from the sibling species. Parsimony analysis of mariner elements places active elements from the three species into separate groups and suggests that D. simulans is the species from which mariner elements in D. mauritiana and D. sechellia are most likely derived. This result strongly suggests that the ancestral form of mariner among these species was an active element. The two inactive mariner elements sequenced from D. simulans are very similar to the inactive peach element from D. mauritiana. The similarity may result from introgression between D. simulans and D. mauritiana or from selective constraints imposed by regulatory effects of inactive elements.
A physical map of the genome of Drosophila melanogaster has been created using 965 yeast artificial chromosome (YAC) clones assigned to locations in the cytogenetic map by in situ hybridization with the polytene salivary gland chromosomes. Clones with insert sizes averaging about 200 kb, totaling 1.7 genome equivalents, have been mapped. More than 80% of the euchromatic genome is included in the mapped clones, and 75% of the euchromatic genome is included in 161 cytological contigs ranging in size up to 2.5 Mb (average size 510 kb). On the other hand, YAC coverage of the one-third of the genome constituting the heterochromatin is incomplete, and clones containing long tracts of highly repetitive simple satellite DNA sequences have not been recovered.
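The reported figures can be cross-checked with simple arithmetic. The sketch uses only numbers from the abstract. The coverage expectation relies on the Lander-Waterman random-clone model, which the abstract does not invoke. YAC libraries are not truly random (clones rich in satellite DNA, for example, were not recovered), so the expectation is an idealization.

```python
from math import exp

# Figures reported in the abstract.
n_clones = 965
mean_insert_kb = 200
coverage = 1.7        # genome equivalents
n_contigs, mean_contig_kb = 161, 510

# Genome size implied by clone count x insert size / coverage (approximate: inserts vary).
implied_genome_mb = n_clones * mean_insert_kb / coverage / 1000   # about 113.5 Mb

# Lander-Waterman expectation if clones were placed uniformly at random:
# fraction of the target covered by at least one clone = 1 - e^(-c).
expected_fraction = 1 - exp(-coverage)                           # about 0.817

# Total length spanned by the cytological contigs.
contig_total_mb = n_contigs * mean_contig_kb / 1000              # 82.11 Mb

print(f"implied genome size: {implied_genome_mb:.1f} Mb")
print(f"expected fraction covered at {coverage}x: {expected_fraction:.3f}")
print(f"total contig length: {contig_total_mb:.2f} Mb")
```

The idealized expectation of about 82% agrees closely with the reported coverage of "more than 80%" of the euchromatic genome. The 161 contigs together span about 82 Mb.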
DNA sequences and chromosomal locations of four Drosophila pseudoobscura opsin genes were compared with those from Drosophila melanogaster, to determine factors that influence the evolution of multigene families. Although the opsin proteins perform the same primary functions, the comparisons reveal a wide range of evolutionary rates. Amino acid identities for the opsins range from 90% for Rh2 to more than 95% for Rh1 and Rh4. Variation in the rate of synonymous site substitution is especially striking: the major opsin, encoded by the Rh1 locus, differs at only 26.1% of synonymous sites between D. pseudoobscura and D. melanogaster, while the other opsin loci differ by as much as 39.2% at synonymous sites. Rh3 and Rh4 have similar levels of synonymous nucleotide substitution but significantly different amounts of amino acid replacement. This decoupling of nucleotide substitution and amino acid replacement suggests that different selective pressures are acting on these similar genes. There is significant heterogeneity in base composition and codon usage bias among the opsin genes in both species, but there are no consistent relationships between these factors and the rate of evolution of the opsins. In addition to exhibiting variation in evolutionary rates, the opsin loci in these species reveal rearrangements of chromosome elements.
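Raw proportions of differing synonymous sites understate the true number of substitutions once sites begin to change more than once. The sketch applies the Jukes-Cantor correction to the two reported synonymous-site proportions. Jukes-Cantor is our illustrative choice; the abstract does not say how its values were computed. The correction also assumes equal base frequencies, which the heterogeneous base composition noted above would violate, so the corrected values are only indicative.

```python
from math import log

def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion p of differing sites (requires p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)

# Synonymous-site proportions reported in the abstract.
for locus, p in [("Rh1", 0.261), ("most divergent opsin locus", 0.392)]:
    print(f"{locus}: p = {p:.3f}, JC-corrected d = {jukes_cantor(p):.3f}")
```

The corrected distances are about 0.32 and 0.55. Their ratio (about 1.7) is larger than the ratio of the raw proportions (about 1.5), so the contrast in synonymous divergence between Rh1 and the other opsin loci widens after correction.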