Dataset details
United States
Interkingdom gene fusions
Background: Genome comparisons have revealed major lateral gene transfer between the three primary kingdoms of life - Bacteria, Archaea, and Eukarya. Another important evolutionary phenomenon involves the evolutionary mobility of protein domains that form versatile multidomain architectures. We were interested in investigating the possibility of a combination of these phenomena, with an invading gene merging with a pre-existing gene in the recipient genome. Results: Complete genomes of fifteen bacteria, four archaea and one eukaryote were searched for interkingdom gene fusions (IKFs); that is, genes coding for proteins that apparently consist of domains originating from different primary kingdoms. Phylogenetic analysis supported 37 cases of IKF, each of which includes a 'native' domain and a horizontally acquired 'alien' domain. IKFs could have evolved via lateral transfer of a gene coding for the alien domain (or a larger protein containing this domain) followed by recombination with a native gene. For several IKFs, this scenario is supported by the presence of a gene coding for a second, stand-alone version of the alien domain in the recipient genome. Among the genomes investigated, the greatest number of IKFs has been detected in Mycobacterium tuberculosis, where they are almost always accompanied by a stand-alone alien domain. For most of the IKF cases detected in other genomes, the stand-alone counterpart is missing. Conclusions: The results of comparative genome analysis show that IKF formation is a real, but relatively rare, evolutionary phenomenon. We hypothesize that IKFs are formed primarily via the proposed two-stage mechanism, but other than in the Actinomycetes, in which IKF generation seems to be an active, ongoing process, most of the stand-alone intermediates have been eliminated, perhaps because of functional redundancy.
Data information
Related data
On the species of origin: diagnosing the source of symbiotic transcripts
Public Data Portal
Background Most organisms have developed ways to recognize and interact with other species. Symbiotic interactions range from pathogenic to mutualistic. Some molecular mechanisms of interspecific interaction are well understood, but many remain to be discovered. Expressed sequence tags (ESTs) from cultures of interacting symbionts can help identify transcripts that regulate symbiosis, but present a unique challenge for functional analysis. Given a sequence expressed in an interaction between two symbionts, the challenge is to determine from which organism the transcript originated. For high-throughput sequencing from interaction cultures, a reliable computational approach is needed. Previous investigations into GC nucleotide content and comparative similarity searching provide provisional solutions, but a comparative lexical analysis, which uses a likelihood-ratio test of hexamer counts, is more powerful. Results Validation with genes whose origin and function are known yielded 94% accuracy. Microbial (non-plant) transcripts comprised 75% of a Phytophthora sojae-infected soybean (Glycine max cv Harasoy) library, contrasted with 15% or less in root tissue libraries of Medicago truncatula from axenic, Phytophthora medicaginis-infected, mycorrhizal, and rhizobacterial treatments. Mycorrhizal libraries contained about 23% microbial transcripts; an axenic plant library contained a similar proportion of putative microbial transcripts. Conclusions Comparative lexical analysis offers numerous advantages over alternative approaches. Many of the transcripts isolated from mixed cultures were of unknown function, suggesting specificity to symbiotic metabolism and therefore candidates likely to be interesting for further functional investigation. Future investigations will determine whether the abundance of non-plant transcripts in a pure plant library indicates procedural artifacts, horizontally transferred genes, or other phenomena.
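The comparative lexical analysis described above can be illustrated with a minimal sketch, assuming a simple model in which hexamer frequencies are estimated from two training sets (for example, known plant and known microbial sequences) and a query transcript is scored by the summed log-likelihood ratio of its hexamers. The function names, the pseudocount, and the treatment of ambiguous bases are assumptions made for illustration; the abstract does not specify the authors' implementation.

```python
import math
from collections import Counter

K = 6  # word length: hexamers, as named in the abstract
ALPHABET = set("ACGT")


def hexamers(seq):
    """Yield every overlapping hexamer made only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if set(word) <= ALPHABET:  # skip words with ambiguous bases (assumption)
            yield word


def train(seqs):
    """Count hexamers over a collection of training sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(hexamers(s))
    return counts


def log_likelihood_ratio(seq, counts_a, counts_b, pseudocount=1.0):
    """Sum of log(P(word | A) / P(word | B)) over the hexamers of seq.

    A positive score favours origin A, a negative score favours origin B.
    The pseudocount (an assumption) avoids zero probabilities.
    """
    n_words = 4 ** K
    total_a = sum(counts_a.values()) + pseudocount * n_words
    total_b = sum(counts_b.values()) + pseudocount * n_words
    score = 0.0
    for word in hexamers(seq):
        p_a = (counts_a[word] + pseudocount) / total_a
        p_b = (counts_b[word] + pseudocount) / total_b
        score += math.log(p_a / p_b)
    return score
```

In practice the training sets would be large collections of sequences of known origin, and a score threshold would be calibrated on held-out genes, as the reported validation step suggests.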
The society of genes: networks of functional links between genes from comparative genomics
Public Data Portal
Comparative genomics provides at least three methods for identifying functional links between genes: examination of phylogenetic distributions, analysis of conserved proximity and observations of fusions of genes into a multidomain gene in another organism. We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods.
Genomic comparisons among
Public Data Portal
Background Insertion Sequence (IS) elements are mobile genetic elements widely distributed among bacteria. Their activities cause mutations, promoting genetic diversity and sometimes adaptation. Previous studies have examined their copy number and distribution in Escherichia coli K-12 and natural isolates. Here, we map most of the IS elements in E. coli B and compare their locations with the published genomes of K-12 and O157:H7. Results The genomic locations of IS elements reveal numerous differences between B, K-12, and O157:H7. IS elements occur in hok-sok loci (homologous to plasmid stabilization systems) in both B and K-12, whereas these same loci lack IS elements in O157:H7. IS elements in B and K-12 are often found in locations corresponding to O157:H7-specific sequences, which suggests IS involvement in chromosomal rearrangements including the incorporation of foreign DNA. Some sequences specific to B are identified, as reported previously for O157:H7. The extent of nucleotide sequence divergence between B and K-12 is <2% for most sequences adjacent to IS elements. By contrast, B and K-12 share only a few IS locations besides those in hok-sok loci. Several phenotypic features of B are explained by IS elements, including differential porin expression from K-12. Conclusions These data reveal a high level of IS activity since E. coli B, K-12, and O157:H7 diverged from a common ancestor, including IS association with deletions and incorporation of horizontally acquired genes as well as transpositions. These findings indicate the important role of IS elements in genome plasticity and divergence.
Major genomic mitochondrial lineages delineate early human expansions
Public Data Portal
Background The phylogeographic distribution of human mitochondrial DNA variations allows a genetic approach to the study of modern Homo sapiens dispersals throughout the world from a female perspective. As a new contribution to this study we have phylogenetically analysed complete mitochondrial DNA (mtDNA) sequences from 42 human lineages, representing major clades with known geographic assignation. Results We show the relative relationships among the 42 lineages and present more accurate temporal calibrations than have been previously possible, giving new perspectives on how modern humans spread in the Old World. Conclusions The first detectable expansion occurred around 59,000–69,000 years ago from Africa, independently colonizing western Asia and India and, following this southern route, swiftly reaching east Asia. Within Africa, this expansion did not replace but mixed with older lineages detectable today only in Africa. Around 39,000–52,000 years ago, the western Asian branch spread radially, bringing Caucasians to North Africa and Europe, also reaching India, and expanding to north and east Asia. More recent migrations have entangled but not completely erased these primitive footprints of modern human expansions.
Conservation of long-range synteny and microsynteny between the genomes of two distantly related nematodes
Public Data Portal
To assess whether the pattern of high rates of genome rearrangement, with a bias towards within-chromosome events, is true of nematodes in general, genome sequence was used to compare the model Caenorhabditis elegans and the filarial parasite Brugia malayi. It is suggested that intrachromosomal rearrangement is a major force driving chromosomal organization in nematodes.
A tandem repeats database for bacterial genomes: application to the genotyping of
Public Data Portal
Background Some pathogenic bacteria are genetically very homogeneous, making strain discrimination difficult. In the last few years, tandem repeats have been increasingly recognized as markers of choice for genotyping a number of pathogens. The rapid evolution of these structures appears to contribute to the phenotypic flexibility of pathogens. The availability of whole-genome sequences has opened the way to the systematic evaluation of tandem repeat diversity and its application to epidemiological studies. Results This report presents a database of tandem repeats from publicly available bacterial genomes which facilitates the identification and selection of tandem repeats. We illustrate the use of this database by the characterization of minisatellites from two important human pathogens, Yersinia pestis and Bacillus anthracis. In order to avoid simple sequence contingency loci which may be of limited value as epidemiological markers, and to provide genotyping tools amenable to ordinary agarose gel electrophoresis, only tandem repeats with repeat units at least 9 bp long were evaluated. Yersinia pestis contains 64 such minisatellites in which the unit is repeated at least 7 times. An additional collection of 12 loci with at least 6 units and high internal conservation was also evaluated. Forty-nine are polymorphic among five Yersinia strains (twenty-five among three Y. pestis strains). Bacillus anthracis contains 30 comparable structures in which the unit is repeated at least 10 times. Half of these tandem repeats show polymorphism among the strains tested. Conclusions Analysis of the currently available bacterial genome sequences classifies Bacillus anthracis and Yersinia pestis as having an average (approximately 30 per Mb) density of tandem repeat arrays longer than 100 bp when compared to the other bacterial genomes analysed to date. In both cases, testing a fraction of these sequences for polymorphism was sufficient to quickly develop a set of more than fifteen informative markers, some of which show a very high degree of polymorphism. In one instance, the polymorphism information content index reaches 0.82 with allele length covering a wide size range (600-1950 bp), and nine alleles resolved in the small number of independent Bacillus anthracis strains typed here.
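As an illustration of how a polymorphism information content index can be computed for one locus, here is a minimal sketch. It assumes the simple haploid form, 1 minus the sum of squared allele frequencies; the abstract does not state which formula was used, so both the formula and the function name are assumptions.

```python
from collections import Counter


def polymorphism_index(alleles):
    """Return 1 - sum(p_i^2) for the alleles observed at one locus.

    `alleles` holds one allele label (for example, an amplicon length in bp)
    per typed strain. This haploid diversity form is an assumption; the
    source does not give its exact formula.
    """
    if not alleles:
        raise ValueError("at least one typed strain is required")
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())
```

Under this form, a locus where every strain carries the same allele scores 0, and the index approaches 1 as more alleles occur at similar frequencies.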
DNA loops and semicatenated DNA junctions
Public Data Portal
Background Alternative DNA conformations are of particular interest as potential signals to mark important sites on the genome. The structural variability of CA microsatellites is particularly pronounced; these are repetitive poly(CA) · poly(TG) DNA sequences spread in all eukaryotic genomes as tracts up to 60 base pairs long. Many in vitro studies have shown that the structure of poly(CA) · poly(TG) can vary markedly from the classical right-handed DNA double helix and adopt diverse alternative conformations. Here we have studied the mechanism of formation and the structure of an alternative DNA structure, named Form X, which was observed previously by polyacrylamide gel electrophoresis of DNA fragments containing a tract of the CA microsatellite poly(CA) · poly(TG) but had not yet been characterized. Results Formation of Form X was found to occur upon reassociation of the strands of a DNA fragment containing a tract of poly(CA) · poly(TG), in a process strongly stimulated by the nuclear proteins HMG1 and HMG2. By inserting Form X into DNA minicircles, we show that the DNA strands do not run fully side by side but instead form a DNA knot. When present in a closed DNA molecule, Form X becomes resistant to heating to 100°C and to alkaline pH. Conclusions Our data strongly support a model of Form X consisting of a DNA loop at the base of which the two DNA duplexes cross, with one of the strands of one duplex passing between the strands of the other duplex, and reciprocally, to form a semicatenated DNA junction also called a DNA hemicatenane.
High correlation between the turnover of nucleotides under mutational pressure and the DNA composition
Public Data Portal
Background Any DNA sequence is a result of compromise between the selection and mutation pressures exerted on it during evolution. It is difficult to estimate the relative influence of each of these pressures on the rate of accumulation of substitutions. However, it is important to discriminate between the effect of mutations and the effect of selection when studying the phylogenetic relations between taxa. Results We have tested in computer simulations, and analytically, the available substitution matrices for many genomes, and we have found that DNA strands in equilibrium under mutational pressure have a unique feature: the fraction of each type of nucleotide is linearly dependent on the time needed for substitution of half of the nucleotides of a given type, with a correlation coefficient close to 1. Substitution matrices found for sequences under selection pressure do not have this property. A substitution matrix for the leading strand of the Borrelia burgdorferi genome, having reached equilibrium in computer simulation, gives a DNA sequence with nucleotide composition and asymmetry corresponding precisely to the third positions in codons of protein coding genes located on the leading strand. Conclusions Parameters of mutational pressure allow us to calculate the DNA composition in equilibrium with this mutational pressure. By comparing any real DNA sequence with the sequence in equilibrium, it is possible to estimate the distance between these sequences, which could be used as a measure of the selection pressure. Furthermore, the parameters of the mutational pressure enable direct estimation of the relative mutation rates in any DNA sequence in the studied genome.
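The two quantities related in this abstract, the equilibrium nucleotide composition and the time needed to substitute half of the nucleotides of a given type, can be sketched as follows. The sketch assumes a row-stochastic 4×4 matrix `M` in which `M[i][j]` is the per-step probability that nucleotide `i` becomes nucleotide `j`, and it takes the half-substitution time to be ln 2 / (−ln M[i][i]); the matrix convention, this definition of the half-time, and the function names are all assumptions rather than the authors' procedure.

```python
import math

NUCLEOTIDES = "ACGT"


def equilibrium_composition(M, max_iters=100000, tol=1e-12):
    """Iterate pi <- pi * M from a uniform start until pi stops changing."""
    pi = [0.25] * 4
    for _ in range(max_iters):
        nxt = [sum(pi[i] * M[i][j] for i in range(4)) for j in range(4)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def half_substitution_times(M):
    """Steps after which half of the nucleotides of each type have been
    substituted at least once, given per-step retention M[i][i] (assumed
    definition)."""
    return [math.log(2) / -math.log(M[i][i]) for i in range(4)]
```

Given a matrix estimated for a real genome, plotting the entries of `equilibrium_composition(M)` against those of `half_substitution_times(M)` would be one way to inspect the linear dependence reported above.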
Reinforcement of genetic coherence in a two-locus model
Public Data Portal
Background In order to maintain populations as units of reproduction and thus enable anagenetic evolution, genetic factors must exist which prevent continuing reproductive separation or enhance reproductive contact. This evolutionary principle is called genetic coherence and it marks the often ignored counterpart of cladistic evolution. Possibilities of the evolution of genetic coherence are studied with the help of a two-locus model with two alleles at each locus. The locus at which viability selection takes place is also the one that controls the fusion of gametes. The second locus acts on the first by modifying the control of the fusion probabilities. It thus acts as a mating modifier whereas the first locus plays the role of the object of selection and mating. Genetic coherence is enhanced by modifications which confer higher probabilities of fusion to heterotypic gametic combinations (resulting in heterozygous zygotes) at the object locus. Results It is shown that mutants at the mating modifier locus, which increase heterotypic fusions but do not lower the homotypic fusions relative to the resident allele at the object locus, generally replace the resident allele. Since heterozygote advantage at the object locus is a necessary condition for this result to hold true, reinforcement of genetic coherence can be claimed for this case. If the homotypic fusions are lowered, complex situations may arise which may favor or disfavor the mutant depending on initial frequencies and recombination rates. To allow for a generalized analysis including alternative models of genetic coherence as well as the estimation of its degrees in real populations, an operational concept for the measurement of this degree is developed. The resulting index is applied to the interpretation of data from crossing experiments in Alnus species designed to detect incompatibility relations.
A genomic timescale for the origin of eukaryotes
Public Data Portal
Background Genomic sequence analyses have shown that horizontal gene transfer occurred during the origin of eukaryotes as a consequence of symbiosis. However, details of the timing and number of symbiotic events are unclear. A timescale for the early evolution of eukaryotes would help to better understand the relationship between these biological events and changes in Earth's environment, such as the rise in oxygen. We used refined methods of sequence alignment, site selection, and time estimation to address these questions with protein sequences from complete genomes of prokaryotes and eukaryotes. Results Eukaryotes were found to evolve faster than prokaryotes, with those eukaryotes derived from eubacteria evolving faster than those derived from archaebacteria. We found an early time of divergence (~4 billion years ago, Ga) for archaebacteria and the archaebacterial genes in eukaryotes. Our analyses support at least two horizontal gene transfer events in the origin of eukaryotes, at 2.7 Ga and 1.8 Ga. Time estimates for the origin of cyanobacteria (2.6 Ga) and the divergence of an early-branching eukaryote that lacks mitochondria (Giardia) (2.2 Ga) fall between those two events. Conclusions We find support for two symbiotic events in the origin of eukaryotes: one premitochondrial and a later mitochondrial event. The appearance of cyanobacteria immediately prior to the earliest undisputed evidence for the presence of oxygen (2.4–2.2 Ga) suggests that the innovation of oxygenic photosynthesis had a relatively rapid impact on the environment as it set the stage for further evolution of the eukaryotic cell.