Dataset details
United States
VecScreen
Quickly identifying segments of a nucleic acid sequence that may be of vector origin.
Data information
Related data
Efficient Identification of Multiple Pathways: RNA-Seq Analysis of Livers from 56Fe Ion Irradiated Mice
Public Data Portal
Background: mRNA interactions with each other and with other signaling molecules define different biological pathways and functions. Researchers have been investigating various tools to analyze these types of interactions. In particular, gene co-expression network methods have proved useful in finding and analyzing these molecular interactions. Many different analytical pipelines to identify these interaction networks have been proposed, with the aim of identifying an optimal partition of the network in which the individual modules are neither too small to support any general inference nor too large to be biologically interpretable. Results: In this study, we propose a new pipeline to perform gene co-expression network analysis. The proposed pipeline uses WGCNA, a widely used software package that performs different aspects of gene co-expression network analysis, together with a modularity maximization algorithm, to analyze novel RNA-Seq data and understand the effects of low-dose 56Fe ion irradiation on the formation of hepatocellular carcinoma in mice. The network results, along with experimental validation, show that using WGCNA combined with Modularity provides a more biologically interpretable network in our dataset. Our pipeline performed better than the existing clustering algorithm in WGCNA at finding modules, and it identified a module with mitochondrial subunits that are supported by a mitochondrial complex assay. Conclusions: We present a pipeline that can reduce the problem of parameter selection with the existing algorithm in WGCNA for comparable RNA-Seq datasets, which may assist future research to discover novel mRNA interactions and their downstream molecular effects. C57BL/6 males were placed into 2 treatment groups and received the following irradiation treatments at Brookhaven National Laboratories (Long Island, NY): 600 MeV/n 56Fe (0.2 Gy) and no irradiation.
Left liver lobes were collected at 30, 60, 120, 270, and 360 days post-irradiation, flash frozen, and stored at -80 °C until they could be processed for RNA-Seq. Livers were sampled by taking two 40-micron-thick slices using a cryotome at -20 °C. This allowed multiple samplings of the tissue without the tissue going through multiple freeze/thaw cycles. Total RNA was isolated from the liver slices using the RNAqueous™ Total RNA Isolation Kit (ThermoFisher Scientific, Waltham, MA), and rRNA was removed with the Ribo-Zero™ rRNA Removal Kit (Illumina, San Diego, CA) prior to library preparation with the Illumina TruSeq RNA Library kit. Samples were sequenced in a paired-end 50-base format on an Illumina HiSeq 1500. Reads were aligned to the mouse GRCm38 reference genome using the STAR alignment program, version 2.5.3a, with the recommended ENCODE options. The --quantMode GeneCounts option was used to obtain read counts per gene based on the Gencode release M14 annotation file. The total number of reads used in the analysis varies between 23 and 35 million per sample.
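As a rough illustration of how the per-gene counts described above can be consumed, the sketch below reads the tab-separated table that STAR writes when --quantMode GeneCounts is set (ReadsPerGene.out.tab: four summary rows prefixed with N_, then one row per gene with unstranded, forward-stranded, and reverse-stranded counts). The function name, the in-memory example, and the gene identifiers geneA and geneB are hypothetical placeholders, not values from this dataset.

```python
import csv
import io


def read_star_gene_counts(handle, column=1):
    """Split a STAR ReadsPerGene.out.tab stream into summary rows and gene counts.

    column selects which count column to keep:
    1 = unstranded, 2 = forward-stranded, 3 = reverse-stranded.
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        name, value = row[0], int(row[column])
        # The four summary rows (N_unmapped, N_multimapping, ...) start with "N_".
        if name.startswith("N_"):
            summary[name] = value
        else:
            counts[name] = value
    return summary, counts


# Hypothetical miniature file; real files have one row per annotated gene.
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t3\t3\t3\n"
    "N_noFeature\t7\t20\t8\n"
    "N_ambiguous\t1\t0\t0\n"
    "geneA\t5\t1\t4\n"
    "geneB\t0\t0\t0\n"
)
summary, counts = read_star_gene_counts(example)
print(summary)
print(counts)
```

Which count column is appropriate depends on the strandedness of the library protocol; the default of 1 (unstranded) here is only an assumption for the sketch.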
Genome Sequence Data Set01
Public Data Portal
The FASTA files (Genome_Set01.zip) contain the reference-assisted de novo assemblies (as contigs) of seven Legionella pneumophila subsp. pneumophila isolates. The table contains rows as isolates (yellow) and columns as attributes (green) for each individual genome. This dataset is associated with the following publication: Gomez-Alvarez, V., L. Boczek, D. King, A. Pemberton, S. Pfaller, M. Rodgers, J. Santodomingo, and R. Revetta. Draft Genome Sequences of Seven Legionella pneumophila Isolates from a Hot Water System of a Large Building. Microbiology Resource Announcements. American Society for Microbiology, Washington, DC, USA, 8(18): e00384-19, (2019).
Genome Sequence Data Set01
Public Data Portal
The fasta files (Genome_Set01.zip) contain the reference-assisted de novo assemblies (as contigs) of three Escherichia coli isolates. The table contains rows as isolates (yellow) and columns as attributes (green) for each individual genome. This dataset is associated with the following publication: Gomez-Alvarez, V., and J. Hoelle-Schwalbach. Draft Genome Sequences of Antibiotic-Resistant Escherichia coli Isolates from U.S. Wastewater Treatment Plants. Microbiology Resource Announcements. American Society for Microbiology, Washington, DC, USA, 8(23): e00351-19, (2019).
FastGroup: A program to dereplicate libraries of 16S rDNA sequences
Public Data Portal
Background: Ribosomal 16S DNA sequences are an essential tool for identifying and classifying microbes. High-throughput DNA sequencing now makes it economically possible to produce very large datasets of 16S rDNA sequences in short time periods, necessitating new computer tools for analyses. Here we describe FastGroup, a Java program designed to dereplicate libraries of 16S rDNA sequences. By dereplication we mean to: 1) compare all the sequences in a data set to each other, 2) group similar sequences together, and 3) output a representative sequence from each group. In this way, duplicate sequences are removed from a library. Results: FastGroup was tested using a library of single-pass, bacterial 16S rDNA sequences cloned from coral-associated bacteria. We found that the optimal strategy for dereplicating these sequences was to: 1) trim ambiguous bases from the 5' end of the sequences and all sequence 3' of the conserved Bact517 site, 2) match the sequences from the 3' end, and 3) group sequences >=97% identical to each other. Conclusions: The FastGroup program simplifies the dereplication of 16S rDNA sequence libraries and prepares the raw sequences for subsequent analyses.
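The three dereplication steps named above (compare, group, output a representative) can be sketched in a few lines. This is not FastGroup's Java implementation: it is a simplified, gap-free illustration that assumes sequences are already trimmed, compares them base by base from the 3' end over the shorter length, and greedily assigns each sequence to the first group whose representative it matches at 97% identity or more. All function names and the example sequences are hypothetical.

```python
def identity_from_3prime(a, b):
    """Fraction of matching bases when a and b are compared position by
    position from their 3' ends, over the length of the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(reversed(a), reversed(b)))
    return matches / n


def dereplicate(seqs, threshold=0.97):
    """Greedy grouping of (name, sequence) pairs.

    Each sequence joins the first group whose representative it matches at
    or above the threshold; otherwise it becomes a new representative.
    Returns a list of ((rep_name, rep_seq), [member names]) tuples.
    """
    groups = []
    for name, seq in seqs:
        for rep, members in groups:
            if identity_from_3prime(seq, rep[1]) >= threshold:
                members.append(name)
                break
        else:
            groups.append(((name, seq), [name]))
    return groups


# Hypothetical 100-base reads: s2 differs from s1 at one 5' base, s3 is unrelated.
s1 = "ACGT" * 25
s2 = "T" + s1[1:]
s3 = "T" * 100
groups = dereplicate([("s1", s1), ("s2", s2), ("s3", s3)])
for (rep_name, _), members in groups:
    print(rep_name, members)
```

Greedy assignment makes the result depend on input order, which is one reason real tools document their grouping rule explicitly.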
Jana Sperschneider - Melampsora lini genome assembly and RNA-seq data
Public Data Portal
The Melampsora lini genome was sequenced to improve its genome assembly. PacBio HiFi and Hi-C data were generated, as well as RNA-seq data.
NIST test dataset for assessing baseline nucleic acid sequence screening
Public Data Portal
This repository contains the dataset used in the manuscript "Inter-tool analysis of a NIST dataset for assessing baseline nucleic acid sequence screening". NIST constructed the test dataset based on the current screening recommendations from HHS. The dataset is a FASTA formatted file with blinded numerical sequence headers. The dataset was sent to sequence screening tool developers for initial testing and to obtain feedback about its utility for assessing baseline sequence screening. An additional metadata file provides the NIST-assigned label for each sequence, along with a more detailed description derived from the source database.
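To make the described layout concrete, here is a minimal sketch of reading a FASTA file with numeric headers and pairing each sequence with a label from a separate metadata table. The metadata file's actual format is not given here, so the labels are modeled as a plain dictionary; the header values, sequences, and label strings are hypothetical placeholders, not entries from the NIST dataset.

```python
import io


def read_fasta(handle):
    """Return {header: sequence} for a FASTA stream, joining wrapped lines.
    Only the first whitespace-delimited token of each header is kept."""
    records = {}
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record file with blinded numeric headers.
fasta_text = ">1\nACGT\nACGT\n>2\nTTTT\n"
records = read_fasta(io.StringIO(fasta_text))

# Hypothetical stand-in for the separate metadata file.
labels = {"1": "placeholder_label_a", "2": "placeholder_label_b"}
labelled = {h: (labels.get(h), seq) for h, seq in records.items()}
print(labelled)
```

Keeping labels in a separate table, as the dataset does, lets a screening tool be run on the sequences without seeing the expected answer.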
CottonGen Sequence Retrieval
Public Data Portal
Sequence Retrieval allows users to download nucleotide and protein sequences, including chromosomes, scaffolds, genes, mRNAs, transcript coding sequences, proteins, reftrans contigs, and unigene contigs. For sequences that are aligned to larger sequences, such as genes, mRNAs, and transcript coding sequences, a numeric value specifying the number of upstream and downstream bases to include can be entered. Video and text tutorials are provided for additional help.
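The upstream/downstream option can be illustrated with a short sketch. This is not CottonGen's code: it assumes a feature located by 1-based inclusive coordinates on the forward strand of a parent sequence, and it clamps the flanks at the ends of that parent. The function name and the example scaffold are hypothetical.

```python
def extract_with_flanks(sequence, start, end, upstream=0, downstream=0):
    """Return the feature at 1-based inclusive [start, end] plus the requested
    number of flanking bases, clamped to the bounds of the parent sequence.
    Forward strand only (an assumption of this sketch)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("feature coordinates fall outside the sequence")
    if upstream < 0 or downstream < 0:
        raise ValueError("flank sizes must be non-negative")
    left = max(start - 1 - upstream, 0)
    right = min(end + downstream, len(sequence))
    return sequence[left:right]


# Hypothetical scaffold; the feature is the CCC at positions 4 to 6.
scaffold = "AAACCCGGGTTT"
print(extract_with_flanks(scaffold, 4, 6))
print(extract_with_flanks(scaffold, 4, 6, upstream=2, downstream=1))
print(extract_with_flanks(scaffold, 4, 6, upstream=10))
```

A reverse-strand feature would additionally need its flanks swapped and the result reverse-complemented, which the sketch leaves out.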
Datasets associated with journal article 'Combining phenotypic profiling and targeted RNA-Seq reveals linkages between transcriptional perturbations and chemical effects on cell morphology: Retinoic acid as an example' by Nyffeler, J, et al.
Public Data Portal
We evaluated the Templated Oligo with Sequencing Readout (TempO-Seq) assay for high-throughput transcriptomic screening of a small set of environmental chemicals. This assay yields sequencing reads of exactly 50 base pairs that can be rapidly aligned to generate gene counts, and it is compatible with cell lysates prepared in multiwell format. The version of the TempO-Seq assay we evaluated provides nearly whole-transcriptome coverage (>20,000 genes). This study encompasses 2 replicates of a 384-well plate design. The majority of wells contain U-2 OS cells exposed to 11 test chemicals at 7 different concentrations (two replicates per test chemical, concentration, and plate), along with additional reference samples and controls. Controls include DMSO vehicle treatments (22 per plate). Reference samples include bulk lysate MCF7 samples (2 DMSO-treated and 2 TSA-treated samples per plate), reference chemical treatments (3 chemicals at a single concentration each, per plate), and vendor-provided reference RNA mixtures (UHRR and HBRR). This dataset is associated with the following publication: Nyffeler, J., C. Willis, D. Harris, L. Taylor, R. Judson, L. Everett, and J. Harrill. Combining phenotypic profiles and targeted RNA-Seq reveals linkages between transcriptional perturbations and chemical effects on cell morphology: retinoic acid as an example. Toxicology and Applied Pharmacology. Academic Press Incorporated, Orlando, FL, USA, 444: 116032, (2022).
Jana Sperschneider - Rhizoctonia AG8-1 and AG8-3 genome data
Public Data Portal
Genome assemblies and sequencing data for two Rhizoctonia isolates, AG8-1 and AG8-3.
A simple method for generating full length cDNA from low abundance partial genomic clones
Public Data Portal
Background: PCR amplification of target molecules involves sequence-specific primers that flank the region to be amplified. While this technique is generally routine, it may not be sufficient to generate a desired target molecule from two separate regions involving intron/exon boundaries. In these situations, which arise for the family of low-abundance genes, it becomes necessary to generate full-length complementary DNAs from two partial genomic clones. Results: The first approach we used for the isolation of full-length cDNA from two known genomic clones of Hox genes was based on fusion PCR. Here we describe a simple and efficient method of amplification for homeobox D13 (HOXD13) full-length cDNA from two partial genomic clones. Specific 5' and 3' untranslated region (UTR) primer pairs and a website program (primer3_www.cgv0.2) were key to this process. Conclusions: We have devised a simple, rapid, and easy method for generating a cDNA clone from genomic sequences. The full-length HOXD13 clone (1.1 kb) generated with this technique was confirmed by sequence analysis. This simple approach can be used to generate full-length cDNA clones from available partial genomic sequences.