Dataset details
United States
Comparison of complete nuclear receptor sets from the human,
Background: The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily.
Results: Our analysis of the human genome identified two novel NR sequences. Both contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily.
Conclusions: This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.
Data information
Related data
An algorithm for mapping positively selected members of quasispecies-type viruses
Public Data Portal
Background: Many RNA viruses do not have a single, representative genome but instead form a set of related variants that has been called a quasispecies. The sequence variability of such viruses presents a significant bioinformatics challenge. In order for the sequence information to be understood, the complete mutational spectrum needs to be distilled to a biologically relevant and analyzable representation.
Results: Here, we develop a "selection mapping" algorithm, QUASI, that identifies the positively selected variants of viral proteins. The key to the selection mapping algorithm is the identification of particular replacement mutations that are overabundant relative to silent mutations at each codon (e.g., threonine at hemagglutinin position 262). Selection mapping identifies such replacement mutations as positively selected. Conversely, selection mapping recognizes negatively selected variants as mutational "noise" (e.g., serine at hemagglutinin position 262).
Conclusion: Selection mapping is a fundamental improvement over earlier methods (e.g., dN/dS) that identify positive selection at codons but do not identify which amino acids at these codons confer selective advantage. Using QUASI's selection maps, we characterize the selected mutational landscapes of influenza A H3 hemagglutinin, HIV-1 reverse transcriptase, and HIV-1 gp120.
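The idea of comparing replacement and silent mutations codon by codon can be illustrated with a minimal Python sketch. This is not the QUASI implementation: the function name `selection_map`, the toy sequences, and the simple rule "replacement count greater than silent count" are assumptions made for illustration, and the sketch ignores the statistical correction a real method would apply for the differing numbers of silent and replacement sites.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def selection_map(reference, variants):
    """Hypothetical helper: for each codon, list the replacement amino acids
    that occur more often among the variants than silent mutations do."""
    result = {}
    for i in range(0, len(reference) - 2, 3):
        ref_codon = reference[i:i + 3]
        ref_aa = CODON_TABLE[ref_codon]
        silent = 0                 # codon changed, amino acid unchanged
        replacements = Counter()   # codon changed, amino acid changed
        for seq in variants:
            codon = seq[i:i + 3]
            if codon == ref_codon:
                continue
            aa = CODON_TABLE[codon]
            if aa == ref_aa:
                silent += 1
            else:
                replacements[aa] += 1
        # Assumed rule: a replacement is "overabundant" if it outnumbers
        # the silent mutations at the same codon.
        selected = sorted(aa for aa, n in replacements.items() if n > silent)
        if selected:
            result[i // 3] = selected
    return result


# Toy data: reference encodes Ser-Gly; variants are made up for illustration.
reference = "AGCGGT"
variants = ["ACCGGT", "ACCGGC", "ACTGGT", "AGTGGT", "AGCGAT"]
print(selection_map(reference, variants))  # {0: ['T']}
```

In the toy data, threonine replaces serine at codon 0 three times against one silent change, so it is reported; the single aspartate replacement at codon 1 does not outnumber the one silent change there and is treated as noise.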
Noncoding RNA gene detection using comparative sequence analysis
Public Data Portal
Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive.
Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context-free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class.
Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
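The final classification step, choosing among the RNA, coding, and null models by posterior probability, can be sketched in Python as follows. This is only an illustration of Bayes' rule over three competing models, not QRNA's code: the function name `classify_alignment`, the uniform priors, and the log-likelihood values are assumptions; in practice the log-likelihoods would come from scoring the alignment under each pair-grammar.

```python
import math


def classify_alignment(log_likelihoods, priors=None):
    """Hypothetical helper: return the most probable class and the posterior
    of every class, given log P(alignment | model) for each model."""
    classes = list(log_likelihoods)
    if priors is None:
        # Assumption: equal prior probability for every model.
        priors = {c: 1.0 / len(classes) for c in classes}
    joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in classes}
    # Log-sum-exp keeps the normalisation stable for very negative scores.
    m = max(joint.values())
    log_norm = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    posteriors = {c: math.exp(joint[c] - log_norm) for c in classes}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Made-up log-likelihoods for one alignment under the three models.
scores = {"RNA": -100.0, "coding": -105.0, "null": -110.0}
best, posteriors = classify_alignment(scores)
print(best)  # RNA
```

Working in log space matters here because alignment likelihoods are typically far too small to exponentiate directly.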
Isolation and characterization of new
Public Data Portal
Background: Nuclear pore complexes (NPCs) are essential for facilitated, directional nuclear transport; however, the mechanism by which ~30 different nucleoporins (nups) are assembled into NPCs is unknown. We combined a genetic strategy in Saccharomyces cerevisiae with green fluorescent protein (GFP) technology to identify mutants in NPC structure, assembly, and localization. To identify such mutants, a bank of temperature-sensitive strains was generated and examined by fluorescence microscopy for mislocalization of GFP-tagged nups at the non-permissive temperature.
Results: A total of 121 mutant strains were isolated, with most showing GFP-Nic96 and Nup170-GFP mislocalized to discrete, cytoplasmic foci. By electron microscopy, several mutants also displayed an expansion of the endoplasmic reticulum (ER). Complementation analysis identified several mutant groups with defects in components required for ER/Golgi trafficking (sec13, sec23, sec27, and bet3). By directed testing, we found that mutant alleles of all COPII components resulted in altered GFP-Nup localization. Finally, at least nine unknown complementation groups were identified that lack secretion defects.
Conclusion: The isolation of sec mutants in the screen could reflect a direct role for vesicle fusion or the COPII coat during NPC assembly; however, only those sec mutants that altered ER structure affected Nup localization. This suggests that the GFP-Nup mislocalization phenotypes observed in these mutants were the indirect result of overproliferation of the ER and connected outer nuclear envelope. The identification of potentially novel mutants with no secretory defects suggests the distinct GFP-Nup localization defects in other mutants in the collection will provide insights into NPC structure and assembly.
DNA loops and semicatenated DNA junctions
Public Data Portal
Background: Alternative DNA conformations are of particular interest as potential signals to mark important sites on the genome. The structural variability of CA microsatellites is particularly pronounced; these are repetitive poly(CA) · poly(TG) DNA sequences spread in all eukaryotic genomes as tracts up to 60 base pairs long. Many in vitro studies have shown that the structure of poly(CA) · poly(TG) can vary markedly from the classical right-handed DNA double helix and adopt diverse alternative conformations. Here we have studied the mechanism of formation and the structure of an alternative DNA structure, named Form X, which was observed previously by polyacrylamide gel electrophoresis of DNA fragments containing a tract of the CA microsatellite poly(CA) · poly(TG) but had not yet been characterized.
Results: Formation of Form X was found to occur upon reassociation of the strands of a DNA fragment containing a tract of poly(CA) · poly(TG), in a process strongly stimulated by the nuclear proteins HMG1 and HMG2. By inserting Form X into DNA minicircles, we show that the DNA strands do not run fully side by side but instead form a DNA knot. When present in a closed DNA molecule, Form X becomes resistant to heating to 100°C and to alkaline pH.
Conclusions: Our data strongly support a model of Form X consisting of a DNA loop at the base of which the two DNA duplexes cross, with one of the strands of one duplex passing between the strands of the other duplex, and reciprocally, to form a semicatenated DNA junction, also called a DNA hemicatenane.
Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios
Public Data Portal
Background: The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets.
Results: RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility.
Conclusions: Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.
Human members of the eukaryotic protein kinase family
Public Data Portal
Publicly available genetic sequence data were searched for human sequences that potentially represent protein kinases, important players in virtually every signaling pathway. After removal of duplicates, splice variants and pseudogenes, this search yielded 510 sequences with recognizable similarity to eukaryotic protein kinases.
Improved analytical methods for microarray-based genome-composition analysis
Public Data Portal
Genome-composition analysis using microarrays can be used to categorize genes into 'present' and 'divergent' categories. This involves selecting a signal value that is used as a cutoff to discriminate present and divergent genes, but this can result in the misclassification of many genes. A method is described that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; this is corrected by our algorithm.
GENOSIM
Public Data Portal
The genosim package simulates genotypes, breeding values, and phenotypes; simulates DNA sequence read depth (numbers of A and B alleles); and resolves SNP conflicts between parent and offspring genotypes.
NIST test dataset for assessing baseline nucleic acid sequence screening
Public Data Portal
This repository contains the dataset used in the manuscript "Inter-tool analysis of a NIST dataset for assessing baseline nucleic acid sequence screening". NIST constructed the test dataset based on the current screening recommendations from HHS. The dataset is a FASTA formatted file with blinded numerical sequence headers. The dataset was sent to sequence screening tool developers for initial testing and to obtain feedback about its utility for assessing baseline sequence screening. An additional metadata file provides the NIST-assigned label for each sequence, along with a more detailed description derived from the source database.
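A minimal Python sketch of how such a dataset might be loaded: parse the FASTA records keyed by their numerical headers, then attach the label from the metadata file. The metadata layout is not described here, so the CSV format, the column names `id` and `label`, the helper names, and the sample records below are all assumptions for illustration only.

```python
import csv
import io


def read_fasta(handle):
    """Hypothetical helper: return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].split()[0]  # keep only the identifier token
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def attach_labels(records, metadata_handle):
    """Hypothetical helper: pair each sequence with its label.
    Assumes a CSV with 'id' and 'label' columns, which is a guess."""
    labels = {row["id"]: row["label"] for row in csv.DictReader(metadata_handle)}
    return {sid: (labels.get(sid), seq) for sid, seq in records.items()}


# Made-up stand-ins for the FASTA file and the metadata file.
fasta_text = ">1\nACGT\nACGT\n>2\nGGCC\n"
metadata_text = "id,label\n1,example_label_a\n2,example_label_b\n"

records = read_fasta(io.StringIO(fasta_text))
labelled = attach_labels(records, io.StringIO(metadata_text))
print(labelled["1"])  # ('example_label_a', 'ACGTACGT')
```

Keeping the labels in a separate lookup mirrors the blinded design: a screening tool sees only the numerical headers, and the labels are joined afterwards for scoring.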
A draft annotation and overview of the human genome
Public Data Portal
Background: The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena.
Results: We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome.
Conclusions: We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.