Education Data Utilization and Support Service

Dataset Details

United States

Noncoding RNA gene detection using comparative sequence analysis

Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context-free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
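The final classification step described in the abstract can be illustrated with a minimal sketch. It assumes the three models (RNA, coding, null) have each already produced a log-likelihood for the alignment, and it applies Bayes' rule to obtain the posterior probability of each class. The function name `classify_alignment`, the uniform default priors, and the example scores are all hypothetical; this is not QRNA's actual code or scoring scheme.

```python
import math

def classify_alignment(log_likelihoods, priors=None):
    """Pick the class with the highest posterior probability.

    log_likelihoods: dict mapping class name -> log P(alignment | model).
    priors: dict mapping class name -> prior probability (uniform if omitted;
            the uniform prior is an assumption of this sketch).
    """
    classes = list(log_likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}

    # log P(alignment, model) = log-likelihood + log prior
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in classes}

    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_joint.values())
    unnormalized = {c: math.exp(v - peak) for c, v in log_joint.items()}
    total = sum(unnormalized.values())

    posteriors = {c: u / total for c, u in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Made-up log-likelihoods for one alignment, for illustration only.
scores = {"RNA": -1040.0, "coding": -1046.0, "null": -1050.0}
best, post = classify_alignment(scores)
print(best, post)
```

Working in log space and subtracting the maximum keeps the computation stable when the log-likelihoods are large negative numbers, as they typically are for long alignments.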

Data Information

Data portal
United States
META URL
https://catalog.data.gov/dataset/noncoding-rna-gene-detection-using-comparative-sequence-analysis
License
notspecified
Cost
Provider
U.S. Department of Health & Human Services
Managing department
Data
- Official Government Data Source
- Landing page

Related Data

Comparison of complete nuclear receptor sets from the human,

Public Data Portal

Background The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily. Results Our analysis of the human genome identified two novel NR sequences. Both these contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily. Conclusions This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.

An algorithm for mapping positively selected members of quasispecies-type viruses

Public Data Portal

Background Many RNA viruses do not have a single, representative genome but instead form a set of related variants that has been called a quasispecies. The sequence variability of such viruses presents a significant bioinformatics challenge. In order for the sequence information to be understood, the complete mutational spectrum needs to be distilled to a biologically relevant and analyzable representation. Results Here, we develop a "selection mapping" algorithm--QUASI--that identifies the positively selected variants of viral proteins. The key to the selection mapping algorithm is the identification of particular replacement mutations that are overabundant relative to silent mutations at each codon (e.g., threonine at hemagglutinin position 262). Selection mapping identifies such replacement mutations as positively selected. Conversely, selection mapping recognizes negatively selected variants as mutational "noise" (e.g., serine at hemagglutinin position 262). Conclusion Selection mapping is a fundamental improvement over earlier methods (e.g., dN/dS) that identify positive selection at codons but do not identify which amino acids at these codons confer selective advantage. Using QUASI's selection maps, we characterize the selected mutational landscapes of influenza A H3 hemagglutinin, HIV-1 reverse transcriptase, and HIV-1 gp120.

A search for reverse transcriptase-coding sequences reveals new non-LTR retrotransposons in the genome of

Public Data Portal

Background: Non-long terminal repeat (non-LTR) retrotransposons are eukaryotic mobile genetic elements that transpose by reverse transcription of an RNA intermediate. We have performed a systematic search for sequences matching the characteristic reverse transcriptase domain of non-LTR retrotransposons in the sequenced regions of the Drosophila melanogaster genome. Results: In addition to previously characterized BS, Doc, F, G, I and Jockey elements, we have identified new non-LTR retrotransposons: Waldo, You and JuanDm. Waldo elements are related to mosquito RTI elements, You to the Drosophila I factor, and JuanDm to mosquito Juan-A and Juan-C. Interestingly, all JuanDm elements are highly homogeneous in sequence, suggesting that they are recent components of the Drosophila genome. Conclusions: The genome of D. melanogaster contains at least ten families of non-site-specific non-LTR retrotransposons representing three distinct clades. Many of these families contain potentially active members. Fine evolutionary analyses must await the more accurate sequences that are expected in the near future.

Screening for sequence-specific RNA-BPs by comprehensive UV crosslinking

Public Data Portal

Background: Specific cis-elements and the associated trans-acting factors have been implicated in the post-transcriptional regulation of gene expression. In the era of genome-wide analyses, identifying novel trans-acting factors and cis-regulatory elements is a step towards understanding coordinated gene expression. UV-crosslink analysis is a standard method used to identify RNA-binding proteins. Uridine is traditionally used to radiolabel substrate RNAs; however, proteins binding to cis-elements that are particularly uridine-poor will be detected weakly or not at all. We evaluate here the possibility of using UV-crosslinking with RNA substrates radiolabeled with each of the four ribonucleotides as an approach for screening for novel sequence-specific RNA-binding proteins. Results: The radiolabeled RNA substrates were derived from the 3'UTRs of the cloned Eg and c-mos Xenopus laevis maternal mRNAs. Specific, but not identical, UV-crosslinking signals were obtained, some of which corresponded to already identified proteins. A signal for a novel 90 kDa protein was observed with the c-mos 3'UTR radiolabeled with both CTP and GTP but not with UTP. The binding site of the 90 kDa RNA-binding protein was localised to a 59-nucleotide portion of the c-mos 3'UTR. Conclusion: That the 90 kDa signal was detected with RNAs radiolabeled with CTP or GTP but not UTP illustrates the advantage of radiolabeling all four nucleotides in a UV-crosslink-based screen. This method can be used for both long and short RNAs and does not require knowledge of the cis-acting sequence. It should be amenable to high-throughput screening for RNA-binding proteins.

Phenotypic silencing of cytoplasmic genes using sequence-specific double-stranded short interfering RNA and its application in the reverse genetics of wild type negative-strand RNA viruses

Public Data Portal

Background Post-transcriptional gene silencing (PTGS) by short interfering RNA has opened up new directions in the phenotypic mutation of cellular genes. However, its efficacy on non-nuclear genes and its effect on the interferon pathway remain unexplored. Since directed mutation of RNA genomes is not possible through conventional mutagenesis, we have tested sequence-specific 21-nucleotide long double-stranded RNAs (dsRNAs) for their ability to silence cytoplasmic RNA genomes. Results Short dsRNAs were generated against specific mRNAs of respiratory syncytial virus, a nonsegmented negative-stranded RNA virus with a cytoplasmic life cycle. At nanomolar concentrations, the dsRNAs specifically abrogated expression of the corresponding viral proteins, and produced the expected mutant phenotype ex vivo. The dsRNAs did not induce an interferon response, and did not inhibit cellular gene expression. The ablation of the viral proteins correlated with the loss of the specific mRNAs. In contrast, viral genomic and antigenomic RNA, which are encapsidated, were not directly affected. Conclusions Synthetic inhibitory dsRNAs are effective in specific silencing of RNA genomes that are exclusively cytoplasmic and transcribed by RNA-dependent RNA polymerases. RNA-directed RNA gene silencing does not require cloning, expression, and mutagenesis of viral cDNA, and thus, will allow the generation of phenotypic null mutants of specific RNA viral genes under normal infection conditions and at any point in the infection cycle. This will, for the first time, permit functional genomic studies, attenuated infections, reverse genetic analysis, and studies of host-virus signaling pathways using a wild type RNA virus, unencumbered by any superinfecting virus.

Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios

Public Data Portal

Background The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets. Results RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility. Conclusions Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.

Discontinuous and non-discontinuous subgenomic RNA transcription in a nidovirus

Public Data Portal

Arteri-, corona-, toro- and roniviruses are evolutionarily related positive-strand RNA viruses, united in the order Nidovirales. The best studied nidoviruses, the corona- and arteriviruses, employ a unique transcription mechanism, which involves discontinuous RNA synthesis, a process resembling similarity-assisted copy-choice RNA recombination. During infection, multiple subgenomic (sg) mRNAs are transcribed from a mirror set of sg negative-strand RNA templates. The sg mRNAs all possess a short 5′ common leader sequence, derived from the 5′ end of the genomic RNA. The joining of the non-contiguous ‘leader’ and ‘body’ sequences presumably occurs during minus-strand synthesis. To study whether toroviruses use a similar transcription mechanism, we characterized the 5′ termini of the genome and the four sg mRNAs of Berne virus (BEV). We show that BEV mRNAs 3–5 lack a leader sequence. Surprisingly, however, RNA 2 does contain a leader, identical to the 5′-terminal 18 residues of the genome. Apparently, BEV combines discontinuous and non-discontinuous RNA synthesis to produce its sg mRNAs. Our findings have important implications for the understanding of the mechanism and evolution of nidovirus transcription.

Evaluation of thresholds for the detection of binding sites for regulatory proteins in

Public Data Portal

Background Sites in DNA that bind regulatory proteins can be detected computationally in various ways. Pattern discovery methods analyze collections of genes suspected to be co-regulated on the evidence, for example, of clustering of transcriptome data. Pattern searching methods use sequences with known binding sites to find other genes regulated by a given protein. Such computational methods are important strategies in the discovery and elaboration of regulatory networks and can provide the experimental biologist with a precise prediction of a binding site or identify a gene as a member of a set of co-regulated genes (a regulon). As more variations on such methods are published, however, thorough evaluation is necessary, as performance may differ depending on the conditions of use. Detailed evaluation also helps to improve and understand the behavior of the different methods and computational strategies. Results We used a collection of 86 regulons from Escherichia coli as datasets to evaluate two methods for pattern discovery and pattern searching: dyad analysis/dyad sweeping using the program Dyad-analysis, and multiple alignment using the programs Consensus/Patser. Clearly defined statistical parameters are used to evaluate the two methods in different situations. We placed particular emphasis on minimizing the rate of false positives. Conclusions As a general rule, sensors obtained from experimentally reported binding sites in DNA frequently locate true sites as the highest-scoring sequences within a given upstream region, especially using Consensus/Patser. Pattern discovery is still an unsolved problem, although in the cases where Dyad-analysis finds significant dyads (around 50%), these frequently correspond to true binding sites. With more robust methods, regulatory predictions could help identify the function of unknown genes.
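As a rough illustration of pattern searching, the sketch below builds a log-odds weight matrix from a few aligned binding sites and reports the highest-scoring window in an upstream region. It is a generic position-weight-matrix scan, not the Consensus/Patser or Dyad-analysis implementation. The function names, the uniform background frequency, the pseudocount, and the example sequences are all assumptions made up for this example.

```python
import math

def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a per-position log-odds matrix from equal-length aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        # Start from pseudocounts so unseen bases do not score minus infinity.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background) for base in "ACGT"}
        )
    return matrix

def best_window(matrix, region):
    """Return (start, score) of the highest-scoring window in the region."""
    width = len(matrix)
    best_pos, best_score = -1, float("-inf")
    for start in range(len(region) - width + 1):
        score = sum(matrix[i][region[start + i]] for i in range(width))
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score

# Made-up example sites and upstream region, for illustration only.
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
matrix = build_log_odds(sites)
pos, score = best_window(matrix, "ACGGTTGACAGGCT")
print(pos, round(score, 2))
```

In practice a score threshold would decide whether the top window counts as a predicted site at all, which is the kind of choice the evaluation above is concerned with.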

Mouse ribonuclease III. cDNA structure, expression analysis, and chromosomal location

Public Data Portal

Background Members of the ribonuclease III superfamily of double-stranded(ds)-RNA-specific endoribonucleases participate in diverse RNA maturation and decay pathways in eukaryotic and prokaryotic cells. A human RNase III orthologue has been implicated in ribosomal RNA maturation. To better understand the structure and mechanism of mammalian RNase III and its involvement in RNA metabolism we determined the cDNA structure, chromosomal location, and expression patterns of mouse RNase III. Results The predicted mouse RNase III polypeptide contains 1373 amino acids (~160 kDa). The polypeptide exhibits a single C-terminal dsRNA-binding motif (dsRBM), tandem catalytic domains, a proline-rich region (PRR) and an RS domain. Northern analysis and RT-PCR reveal that the transcript (4487 nt) is expressed in all tissues examined, including extraembryonic tissues and the midgestation embryo. Northern analysis indicates the presence of an additional, shorter form of the transcript in testicular tissue. Fluorescent in situ hybridization demonstrates that the mouse RNase III gene maps to chromosome 15, region B, and that the human RNase III gene maps to a syntenic location on chromosome 5p13-p14. Conclusions The broad transcript expression pattern indicates a conserved cellular role(s) for mouse RNase III. The putative polypeptide is highly similar to human RNase III (99% amino acid sequence identity for the two catalytic domains and dsRBM), but is distinct from other eukaryotic orthologues, including Dicer, which is involved in RNA interference. The mouse RNase III gene has a chromosomal location distinct from the Dicer gene.

DNA loops and semicatenated DNA junctions

Public Data Portal

Background: Alternative DNA conformations are of particular interest as potential signals to mark important sites on the genome. The structural variability of CA microsatellites is particularly pronounced; these are repetitive poly(CA) · poly(TG) DNA sequences spread in all eukaryotic genomes as tracts up to 60 base pairs long. Many in vitro studies have shown that the structure of poly(CA) · poly(TG) can vary markedly from the classical right-handed DNA double helix and adopt diverse alternative conformations. Here we have studied the mechanism of formation and the structure of an alternative DNA structure, named Form X, which was observed previously by polyacrylamide gel electrophoresis of DNA fragments containing a tract of the CA microsatellite poly(CA) · poly(TG) but had not yet been characterized. Results: Formation of Form X was found to occur upon reassociation of the strands of a DNA fragment containing a tract of poly(CA) · poly(TG), in a process strongly stimulated by the nuclear proteins HMG1 and HMG2. By inserting Form X into DNA minicircles, we show that the DNA strands do not run fully side by side but instead form a DNA knot. When present in a closed DNA molecule, Form X becomes resistant to heating to 100°C and to alkaline pH. Conclusions: Our data strongly support a model of Form X consisting of a DNA loop at the base of which the two DNA duplexes cross, with one of the strands of one duplex passing between the strands of the other duplex, and reciprocally, to form a semicatenated DNA junction, also called a DNA hemicatenane.
