Dataset details
United States
Screening for sequence-specific RNA-BPs by comprehensive UV crosslinking
Background Specific cis-elements and their associated trans-acting factors have been implicated in the post-transcriptional regulation of gene expression. In the era of genome-wide analyses, identifying novel trans-acting factors and cis-regulatory elements is a step towards understanding coordinated gene expression. UV-crosslink analysis is a standard method for identifying RNA-binding proteins. Uridine is traditionally used to radiolabel substrate RNAs; however, proteins binding to cis-elements, particularly uridine-poor ones, will be detected only weakly or not at all. We evaluate here the possibility of using UV-crosslinking with RNA substrates radiolabeled with each of the four ribonucleotides as an approach for screening for novel sequence-specific RNA-binding proteins. Results The radiolabeled RNA substrates were derived from the 3'UTRs of the cloned Eg and c-mos Xenopus laevis maternal mRNAs. Specific, but not identical, UV-crosslinking signals were obtained, some of which corresponded to already identified proteins. A signal for a novel 90 kDa protein was observed with the c-mos 3'UTR radiolabeled with either CTP or GTP, but not with UTP. The binding site of the 90 kDa RNA-binding protein was localised to a 59-nucleotide portion of the c-mos 3'UTR. Conclusion That the 90 kDa signal was detected with RNAs radiolabeled with CTP or GTP but not UTP illustrates the advantage of radiolabeling with all four nucleotides in a UV-crosslink-based screen. This method can be used for both long and short RNAs and does not require knowledge of the cis-acting sequence. It should be amenable to high-throughput screening for RNA-binding proteins.
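Why a uridine-poor element can escape detection follows from simple composition arithmetic: in a typical UV-crosslink/label-transfer assay, only the labeled nucleotides left in the RNase-protected, protein-bound fragment end up on the protein. The sketch below is a rough illustration, assuming each radiolabeled NTP is incorporated in proportion to its base count; the example fragment is hypothetical and not taken from the Eg or c-mos 3'UTRs.

```python
from collections import Counter

def label_fractions(rna: str) -> dict[str, float]:
    """Fraction of positions that would carry label for each radiolabeled NTP.

    Assumption: the labeled NTP is incorporated at every position of its base,
    so relative signal scales with that base's count in the protected fragment.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input
    counts = Counter(seq)  # missing bases count as 0
    return {base: counts[base] / len(seq) for base in "ACGU"}

# Hypothetical uridine-poor protein-binding fragment (illustration only).
fragment = "GCCGCAGGCACCGGAGCC"
for base, fraction in label_fractions(fragment).items():
    print(f"{base}TP label: {fraction:.2f} of positions")
```

With UTP labeling this fragment carries no label at all, whereas CTP or GTP labeling marks most of its positions, which is the situation the four-nucleotide screen is meant to catch.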
Data information
Related data
Evaluation of thresholds for the detection of binding sites for regulatory proteins in
Public Data Portal
Background Sites in DNA that bind regulatory proteins can be detected computationally in various ways. Pattern discovery methods analyze collections of genes suspected to be co-regulated, for example on the evidence of clustered transcriptome data. Pattern searching methods use sequences with known binding sites to find other genes regulated by a given protein. Such computational methods are important strategies in the discovery and elaboration of regulatory networks and can provide the experimental biologist with a precise prediction of a binding site or identify a gene as a member of a set of co-regulated genes (a regulon). As more variations on such methods are published, however, thorough evaluation becomes necessary, because performance may differ depending on the conditions of use. Detailed evaluation also helps to understand and improve the behavior of the different methods and computational strategies. Results We used a collection of 86 regulons from Escherichia coli as datasets to evaluate two methods for pattern discovery and pattern searching: dyad analysis/dyad sweeping using the program Dyad-analysis, and multiple alignment using the programs Consensus/Patser. Clearly defined statistical parameters are used to evaluate the two methods in different situations. We placed particular emphasis on minimizing the rate of false positives. Conclusions As a general rule, sensors obtained from experimentally reported binding sites in DNA frequently locate true sites as the highest-scoring sequences within a given upstream region, especially with Consensus/Patser. Pattern discovery is still an unsolved problem, although in the cases where Dyad-analysis finds significant dyads (around 50%), these frequently correspond to true binding sites. With more robust methods, regulatory predictions could help identify the function of unknown genes.
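Pattern searching of the Consensus/Patser kind represents a collection of known sites as a weight matrix and scores every window of an upstream region against it. The sketch below is a simplified stand-in, not Patser's implementation: it assumes log2 odds against a uniform background, a pseudocount of 1, and scanning of one strand only. The sites and upstream sequence are hypothetical, not drawn from the 86 E. coli regulons.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log2-odds weight matrix from aligned, equal-length binding sites."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def score(pwm, window):
    """Sum of per-position weights for one window (uppercase ACGT only)."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the matrix along one strand; return (position, window, score) of the top hit."""
    width = len(pwm)
    hits = [(i, sequence[i:i + width], score(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[2])

# Hypothetical known sites and upstream region (illustration only).
known_sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
pwm = build_pwm(known_sites)
position, window, top_score = best_hit(pwm, "GCGCATTGACAGCGCGATAT")
print(position, window, round(top_score, 2))
```

Reporting only the highest-scoring window mirrors the evaluation criterion in the Conclusions (whether the true site is the top-scoring sequence in its upstream region); a real scan would also cover the reverse complement.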
NIST test dataset for assessing baseline nucleic acid sequence screening
Public Data Portal
This repository contains the dataset used in the manuscript "Inter-tool analysis of a NIST dataset for assessing baseline nucleic acid sequence screening". NIST constructed the test dataset based on the current screening recommendations from HHS. The dataset is a FASTA formatted file with blinded numerical sequence headers. The dataset was sent to sequence screening tool developers for initial testing and to obtain feedback about its utility for assessing baseline sequence screening. An additional metadata file provides the NIST-assigned label for each sequence, along with a more detailed description derived from the source database.
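Using the dataset involves a simple join: the blinded numerical headers in the FASTA file are matched to the NIST-assigned labels in the metadata file. A minimal sketch, assuming the metadata is a tab-separated table with hypothetical `id` and `label` columns; the actual layout of the metadata file is not given here, and the toy inputs stand in for the real files.

```python
import csv
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)

def attach_labels(fasta_handle, metadata_handle, id_column="id", label_column="label"):
    """Join blinded FASTA headers to labels from a tab-separated metadata table.

    The column names are hypothetical; adjust them to the actual metadata file.
    """
    labels = {row[id_column]: row[label_column]
              for row in csv.DictReader(metadata_handle, delimiter="\t")}
    return [(seq_id, labels.get(seq_id), seq) for seq_id, seq in read_fasta(fasta_handle)]

# Toy in-memory inputs standing in for the real files.
fasta = io.StringIO(">1\nACGTAC\nGGA\n>2\nTTTT\n")
metadata = io.StringIO("id\tlabel\n1\tlabel_A\n2\tlabel_B\n")
for record in attach_labels(fasta, metadata):
    print(record)
```

Keeping the labels in a separate lookup preserves the blinding: screening tools see only the FASTA file, and the join happens afterwards.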
An efficient method to successively introduce transgenes into a given genomic locus in the mouse
Public Data Portal
Background Expression of transgenes in mice requires transcriptional regulatory elements that direct expression in a chosen cell type. Unfortunately, the availability of well-characterized promoters that direct bona fide expression of transgenes in transgenic mice is limited. Here we describe a method that allows highly efficient targeting of transgenes to a preselected locus in ES cells. Results A pgk-LoxP-Neo cassette was introduced into a desired genomic locus by homologous recombination in ES cells. The pgk promoter was then removed from the targeted ES cells by Cre recombinase, thereby restoring the ES cells' sensitivity to G418. We demonstrated that transgenes could be efficiently introduced into this genomic locus by reconstituting a functional Neo gene. Conclusion This approach is simple and extremely efficient in facilitating the introduction of single-copy transgenes into defined genomic loci. The availability of such an approach greatly enhances the ease of using endogenous regulatory elements to control transgene expression and, in turn, expands the repertoire of elements available for transgene expression.
Noncoding RNA gene detection using comparative sequence analysis
Public Data Portal
Background Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability.
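The final classification step, choosing among the coding, RNA, and null classes by posterior probability, is Bayes' rule over the three models' likelihoods. The sketch below shows only that step: the log-likelihood values are hypothetical placeholders for what the three pair-grammars would compute, the grammars themselves are not implemented, and uniform priors are an assumption.

```python
import math

def classify_alignment(log_likelihoods, log_priors=None):
    """Pick the class with the highest posterior probability.

    log_likelihoods: dict class -> log P(alignment | model). Uniform priors by default.
    """
    classes = list(log_likelihoods)
    if log_priors is None:
        log_priors = {c: -math.log(len(classes)) for c in classes}
    joint = {c: log_likelihoods[c] + log_priors[c] for c in classes}
    peak = max(joint.values())  # subtract the max for numerical stability
    norm = sum(math.exp(v - peak) for v in joint.values())
    posteriors = {c: math.exp(joint[c] - peak) / norm for c in classes}
    return max(posteriors, key=posteriors.get), posteriors

# Hypothetical log-likelihoods standing in for the three pair-grammar scores.
label, posteriors = classify_alignment({"coding": -120.0, "RNA": -112.5, "null": -118.0})
print(label, {c: round(p, 4) for c, p in posteriors.items()})
```

Working in log space matters here because likelihoods of long alignments underflow ordinary floating point; only differences between the class scores affect the posteriors.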
SNP analysis of the inter-alpha-trypsin inhibitor family heavy chain-related protein (IHRP) gene by a fluorescence-adapted SSCP method
Public Data Portal
Background Single-nucleotide polymorphisms (SNPs) are considered useful polymorphic markers for genetic studies of polygenic traits. Single-stranded conformational polymorphism (SSCP) analysis has been widely applied to detect SNPs, including point mutations in cancer and congenital diseases. In this study, we describe a novel SSCP method in which PCR fragments are fluorescently labeled using a fluorescence-adapted primer. Methods SNPs of the inter-alpha-trypsin inhibitor family heavy chain-related protein (IHRP) gene were analyzed using a fluorescence-adapted SSCP method. The method consists of two procedures: 1) a fluorescent labeling reaction of PCR fragments using fluorescence-adapted primers in a single tube, and 2) electrophoresis on a non-denaturing polyacrylamide gel. Results For detecting the labeled fragments, this method was more economical and convenient than previously reported SSCP methods. Eight SNPs of the IHRP gene were detected by fluorescence-adapted SSCP. One of them was a new SNP resulting in an amino acid substitution, while the others had already been reported in public databases. Six SNPs of the IHRP gene were associated with two haplotypes. Conclusions Fluorescence-adapted SSCP was useful for detecting and genotyping SNPs.
Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios
Public Data Portal
Background The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets. Results RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility. Conclusions Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.
Transcriptional transactivation by selected short random peptides attached to lexA-GFP fusion proteins
Public Data Portal
Background Transcriptional transactivation is a process with remarkable tolerance for sequence diversity and structural geometry. In studies of the features that constitute transactivating functions, acidity has remained one of the most common characteristics observed among native activation domains and activator peptides. Results We performed a deliberate search of random peptide libraries for peptides capable of conferring transcriptional transactivation on the lexA DNA binding domain. Two libraries, one composed of C-terminal fusions, the other of peptide insertions within the green fluorescent protein structure, were used. We show that (i) peptide sequences other than C-terminal fusions can confer transactivation; (ii) though acidic activator peptides are more common, charge neutral and basic peptides can function as activators; and (iii) peptides as short as 11 amino acids behave in a modular fashion. Conclusions These results support the recruitment model of transcriptional activation and, combined with other studies, suggest the possibility of using activator peptides in a variety of applications, including drug development work.
Efficient Identification of Multiple Pathways: RNA-Seq Analysis of Livers from 56Fe Ion Irradiated Mice
Public Data Portal
Background: mRNA interactions with each other and with other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular, gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many different analytical pipelines for identifying these interaction networks have been proposed, with the aim of finding an optimal partition of the network in which the individual modules are neither too small to support any general inference nor too large to be biologically interpretable. Results: In this study we propose a new pipeline for gene co-expression network analysis. The proposed pipeline combines WGCNA, a widely used software package for gene co-expression network analysis, with a modularity maximization algorithm to analyze novel RNA-Seq data on the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, together with experimental validation, show that combining WGCNA with modularity maximization yields a more biologically interpretable network in our dataset. Our pipeline outperformed the existing clustering algorithm in WGCNA in finding modules and identified a module of mitochondrial subunits that is supported by a mitochondrial complex assay. Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets, which may assist future research in discovering novel mRNA interactions and their downstream molecular effects. C57BL/6 male mice were placed into 2 treatment groups and received one of the following irradiation treatments at Brookhaven National Laboratory (Long Island, NY): 600 MeV/n 56Fe (0.2 Gy) or no irradiation.
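Modularity maximization chooses the partition of a network with the highest modularity Q. The sketch below only evaluates Q for a given partition of a weighted, undirected network (the kind WGCNA builds from co-expression); it does not perform the search, assumes no self-loops, and is not the specific algorithm used by the authors. The toy graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) with u != v; community: dict node -> module id.
    Q = sum over modules c of [L_c / m - (d_c / 2m)^2], where L_c is the edge weight
    inside c, d_c the summed degree of nodes in c, and m the total edge weight.
    """
    degree = defaultdict(float)
    inside = defaultdict(float)
    m = 0.0
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        m += w
        if community[u] == community[v]:
            inside[community[u]] += w
    module_degree = defaultdict(float)
    for node, k in degree.items():
        module_degree[community[node]] += k
    return sum(inside[c] / m - (module_degree[c] / (2 * m)) ** 2 for c in module_degree)

# Two triangles joined by a single edge (toy graph, unit weights).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Splitting the toy graph at the bridge gives Q = 5/14, while a single module gives Q = 0; a maximization search prefers the split, which is how modularity sidesteps choosing a module-size parameter by hand.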
Left liver lobes were collected at 30, 60, 120, 270, and 360 days post-irradiation, flash frozen, and stored at -80 °C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron-thick slices using a cryotome at -20 °C, which allowed multiple sampling of the tissue without subjecting it to repeated freeze/thaw cycles. Total RNA was isolated from the liver slices using the RNAqueous™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA), and rRNA was removed with the Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50-base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using STAR version 2.5.3a with the recommended ENCODE options. The --quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. The total number of reads used in the analysis ranged from 23 to 35 million.
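With `--quantMode GeneCounts`, STAR writes a `ReadsPerGene.out.tab` file whose first four rows are summary counters (N_unmapped, N_multimapping, N_noFeature, N_ambiguous), followed by one row per gene with three count columns: unstranded counts, counts for the first read strand aligned with the RNA, and counts for the second read strand aligned with the RNA. A minimal parser, assuming that layout; the source does not state the library strandedness, so which column applies is an assumption to verify before use. Gene IDs and counts in the toy table are hypothetical.

```python
import io

# Count column (0-based) in ReadsPerGene.out.tab for each library type,
# keyed like htseq-count's -s option: "no", "yes", "reverse".
COUNT_COLUMN = {"no": 1, "yes": 2, "reverse": 3}

def read_star_gene_counts(handle, stranded="no"):
    """Split a ReadsPerGene.out.tab stream into summary rows (N_*) and per-gene counts."""
    column = COUNT_COLUMN[stranded]
    summary, genes = {}, {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        target = summary if fields[0].startswith("N_") else genes
        target[fields[0]] = int(fields[column])
    return summary, genes

# Hypothetical table in the ReadsPerGene.out.tab layout.
TOY_TABLE = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t4\n"
    "N_ambiguous\t2\t0\t1\n"
    "gene_A\t100\t2\t98\n"
    "gene_B\t7\t0\t7\n"
)
summary, genes = read_star_gene_counts(io.StringIO(TOY_TABLE), stranded="no")
print(summary["N_unmapped"], genes)
```

A quick check for strandedness is to compare the N_noFeature row across the three columns: for a stranded library, the column matching the wrong strand shows a much larger N_noFeature count.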
Research Article: BMC Medical Genetics
Public Data Portal
Background SFHR (small fragment homologous replacement)-mediated targeting is a process that has been used to correct specific mutations in mammalian cells. This process involves both chemical and cellular factors that are not yet defined. To evaluate the potential of this technique for gene therapy, it is necessary to characterize gene transfer efficacy in terms of the transfection vehicle, the genetic target, and the cellular processing of the DNA and the DNA-vehicle complex. Methods In this study, small fragments of genomic cystic fibrosis (CF) transmembrane conductance regulator (CFTR) DNA, which comprise the wild-type and ΔF508 sequences, were transfected into immortalized CF and normal airway epithelial cells, respectively. Homologous replacement was evaluated using PCR and sequence-based analyses of cellular DNA and RNA. Individual stages of cationic lipid-facilitated SFHR in cultured cell lines were also examined using transmission electron microscopy (TEM). Results We demonstrated that the lipid/DNA (+/-) ratio influences the mode of entry into the cell and therefore affects the efficacy of SFHR-mediated gene targeting. Lipid/DNA complexes with more negative ratios entered the cell via a plasma membrane fusion pathway. Transfer of the DNA via an endocytic pathway appeared more effective at mediating SFHR. It was also clear that the optimal lipid/DNA ratio depends on the specific cell line transfected. Conclusions These studies provide new insights into factors that underlie SFHR-mediated gene targeting efficacy and into the parameters that can be modulated for its optimization.
Research Article: BMC Genetics
Public Data Portal
Background Blood group genotyping is increasingly used for prenatal diagnosis and after recent transfusions, but still lacks the specificity of serology. In whites, the presence of antigen D is predicted if two or more properly selected RHD-specific polymorphisms are detected. This prediction fails if an antigen D-negative, RHD-positive allele is encountered. Apart from RHDψ and CdeS, which are frequent only in individuals of African descent, most of these alleles are unknown, and the population frequency of any such allele has not been determined. Methods We screened 8,442 antigen D-negative blood donations by RHD PCR-SSP. RHD PCR-positive samples were further characterized by RHD exon-specific PCR-SSP or sequencing. The phenotype of the identified alleles was checked and their frequencies in Germans were determined. Results We detected 50 RHD-positive samples. Fifteen samples harbored one of three new Del alleles. Thirty samples were due to 14 different D-negative alleles, only 5 of which were previously known. Nine of the 14 alleles may have been generated by gene conversion in cis, for which we propose a mechanism triggered by hairpin formation of chromosomal DNA. The cumulative population frequency of the 14 D-negative alleles was 1:1,500. Five samples represented a D+/- chimera, a weak D, and three partial D, which had been missed by routine serology; two recipients transfused with blood from the D+/- chimera donor became anti-D immunized. Conclusion The results of this study allowed us to devise an improved RHD genotyping strategy with a false-positive rate lower than 1:10,000. The number of characterized RHD-positive, antigen D-negative and Del alleles was more than doubled, and their population frequencies in Europe were defined.
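The prediction rule stated in the Background (antigen D is predicted when two or more selected RHD-specific polymorphisms are detected) is a simple counting rule. A minimal sketch, with hypothetical marker names and the threshold of two taken from the text; it is not the improved genotyping strategy the authors devised.

```python
def predict_antigen_d(detected_markers, min_markers=2):
    """Simplified rule: predict antigen D if >= min_markers distinct RHD-specific markers are detected.

    Marker names are hypothetical. An antigen D-negative allele that still carries
    the targeted RHD sequences passes this rule and is wrongly called D+, which is
    the limitation noted in the abstract.
    """
    return "D+" if len(set(detected_markers)) >= min_markers else "D- / unresolved"

print(predict_antigen_d({"RHD_marker_1", "RHD_marker_2"}))  # hypothetical markers
print(predict_antigen_d({"RHD_marker_1"}))
```

Because the rule only checks for the presence of RHD sequence, lowering its false-positive rate requires knowing which D-negative alleles carry those sequences, which is what the population screen characterized.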