Education Data Utilization and Support Service

Dataset Details

United States

FastGroup: A program to dereplicate libraries of 16S rDNA sequences

Background: Ribosomal 16S DNA sequences are an essential tool for identifying and classifying microbes. High-throughput DNA sequencing now makes it economically possible to produce very large datasets of 16S rDNA sequences in short time periods, necessitating new computer tools for analyses. Here we describe FastGroup, a Java program designed to dereplicate libraries of 16S rDNA sequences. By dereplication we mean to: 1) compare all the sequences in a data set to each other, 2) group similar sequences together, and 3) output a representative sequence from each group. In this way, duplicate sequences are removed from a library.

Results: FastGroup was tested using a library of single-pass, bacterial 16S rDNA sequences cloned from coral-associated bacteria. We found that the optimal strategy for dereplicating these sequences was to: 1) trim ambiguous bases from the 5' end of the sequences and all sequence 3' of the conserved Bact517 site, 2) match the sequences from the 3' end, and 3) group sequences >=97% identical to each other.

Conclusions: The FastGroup program simplifies the dereplication of 16S rDNA sequence libraries and prepares the raw sequences for subsequent analyses.
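The dereplication steps described above (trim, compare, group, output a representative) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not FastGroup's actual implementation (FastGroup itself is a Java program). The function names are hypothetical, the trimming motif is a caller-supplied placeholder rather than the real Bact517 sequence, only leading `N` characters are treated as ambiguous, and identity is scored position by position from the 3' end without any alignment.

```python
from typing import Dict, List


def trim(seq: str, site: str) -> str:
    """Strip leading N's (5' end) and drop everything 3' of `site`.

    Assumption: `site` is a placeholder motif supplied by the caller; the
    motif itself is kept. Sequences lacking the motif are returned untrimmed
    at the 3' end.
    """
    seq = seq.upper().lstrip("N")
    start = seq.find(site)
    return seq if start == -1 else seq[: start + len(site)]


def identity_from_3prime(a: str, b: str) -> float:
    """Fraction of matching bases when two reads are paired from the 3' end.

    No alignment is performed: bases are compared starting at the last
    position, and only the overlap (the shorter length) is scored.
    """
    overlap = min(len(a), len(b))
    if overlap == 0:
        return 0.0
    matches = sum(1 for x, y in zip(reversed(a), reversed(b)) if x == y)
    return matches / overlap


def dereplicate(seqs: Dict[str, str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Greedy grouping keyed by representative name.

    Each sequence joins the first group whose representative it matches at
    >= threshold; otherwise it becomes the representative of a new group.
    """
    groups: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in groups:
            if identity_from_3prime(seqs[rep], seq) >= threshold:
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups
```

The keys of the returned dictionary are the representative sequences, one per group, so writing out `seqs[rep]` for each key yields the dereplicated library. A greedy pass like this depends on input order; an all-against-all comparison followed by clustering would be a different design choice.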

Data Information

Data portal
United States
META URL
https://catalog.data.gov/dataset/fastgroup-a-program-to-dereplicate-libraries-of-16s-rdna-sequences
License
notspecified
Cost
Provider
U.S. Department of Health & Human Services
Managing department
Data
- Official Government Data Source
- Landing page

Related Data

Bacterial discrimination by means of a universal array approach mediated by LDR (ligase detection reaction)

Public Data Portal

Background: PCR amplification of bacterial 16S rRNA genes provides the most comprehensive and flexible means of sampling bacterial communities. Sequence analysis of these cloned fragments can provide qualitative and quantitative insight into the microbial population under scrutiny, although this approach is not suited to large-scale screenings. Other methods, such as denaturing gradient gel electrophoresis, heteroduplex analysis, or terminal restriction fragment analysis, are rapid and therefore amenable to field-scale experiments. A very recent addition to these analytical tools is microarray technology.

Results: Here we present our results using a Universal DNA Microarray approach as an analytical tool for bacterial discrimination. The proposed procedure is based on the properties of the DNA ligation reaction and requires the design of two probes specific for each target sequence. One oligo carries a fluorescent label and the other a unique sequence (cZipCode or complementary ZipCode) which identifies a ligation product. Ligated fragments, obtained in the presence of a proper template (a PCR-amplified fragment of the 16S rRNA gene), contain either the fluorescent label or the unique sequence and therefore are addressed to the location on the microarray where the ZipCode sequence has been spotted. Such an array is therefore "Universal", being unrelated to a specific molecular analysis. Here we present the design of probes specific for some groups of bacteria and their application to bacterial diagnostics.

Conclusions: The combined use of selective probes, ligation reaction and the Universal Array approach yielded an analytical procedure with a good power of discrimination among bacteria.

Bacterial community on biofilms from MAIFAS reactors

Public Data Portal

Sequence reads (16S rDNA- and 16S rRNA-based) were processed and analyzed using Mothur software. The results are presented in the attached Excel file. The accompanying MS Word file includes taxonomic summary tables for bacterial communities on biofilms from the MAIFAS reactor as well as a detailed description of the Materials & Methods. This dataset is associated with the following publication: Church, J., H. Ryu, A. Sadmani, A. Randall, J. Santodomingo, and W.H. Lee. Multiscale investigation of a symbiotic microalgal-integrated fixed film activated sludge (MAIFAS) process for nutrient removal and photo-oxygenation. Bioresource Technology. Elsevier Online, New York, NY, USA, 268: 128-138, (2018).

Sequencing Data for Hospital Metagenomes

Public Data Portal

FASTA files containing the sequence data for assembled contigs (FastA), predicted genes (FastA), and predicted proteins (FastA), together with gene predictions (GFF v2). This dataset is not publicly accessible because the sequences have already been deposited in publicly available databases, so replicating them here is unnecessary. The data are also quite large, with numerous files associated with these entries, which are included in the links below. It can be accessed through the following web links:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA299404
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP065069
http://enve-omics.ce.gatech.edu/data/showerheads
Format: The data represent genome sequencing and assembly of 180 different contigs. This dataset is associated with the following publication: Soto-Giron, M.J., L. Rodriguez, C. Luo, M. Elk, H. Ryu, J. Santodomingo, and K. Konstantinidis. Biofilms on Hospital Shower Hoses: Characterization and Implications for Nosocomial Infections. APPLIED AND ENVIRONMENTAL MICROBIOLOGY. American Society for Microbiology, Washington, DC, USA, 82(9): 2872-2883, (2016).

Data from: Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

Public Data Portal

Amplicon sequencing utilizing next-generation platforms has significantly transformed how research is conducted, specifically in microbial ecology. However, primer and sequencing platform biases can confound or change the way scientists interpret these data. The Pacific Biosciences RSII instrument may also preferentially load smaller fragments, which may also be a function of PCR product exhaustion during sequencing. To further examine these biases, data are provided from 16S rRNA rumen community analyses. Specifically, data on the relative phylum-level abundances for the ruminal bacterial community are provided to determine between-sample variability. Direct sequencing of metagenomic DNA was conducted to circumvent primer-associated biases in 16S rRNA reads, and rarefaction curves were generated to demonstrate adequate coverage of each amplicon. PCR products were also subjected to reduced amplification and pooling to reduce the likelihood of PCR product exhaustion during sequencing on the Pacific Biosciences platform. The taxonomic profiles for the relative phylum-level and genus-level abundance of rumen microbiota as a function of PCR pooling for sequencing on the Pacific Biosciences RSII platform are provided.

Data are within this article, and raw ruminal MiSeq sequence data are available from the NCBI Sequence Read Archive (SRA Accession SRP047292). Additional descriptive information is associated with NCBI BioProject PRJNA261425. http://www.ncbi.nlm.nih.gov/bioproject/PRJNA261425/
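As a rough illustration of what "relative phylum-level abundance" and "between-sample variability" mean computationally, the sketch below converts per-sample read counts into proportions and then reports the spread of each phylum across samples. It is a hedged example only: the function names are hypothetical, the population standard deviation is an assumed choice of variability measure, and nothing here reflects the pipeline actually used for the rumen dataset.

```python
from statistics import pstdev
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts per phylum into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {phylum: n / total for phylum, n in counts.items()}


def between_sample_spread(samples: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Population standard deviation of each phylum's proportion across samples.

    Assumption: a phylum absent from a sample counts as proportion 0.0.
    """
    profiles = [relative_abundance(counts) for counts in samples.values()]
    phyla = sorted({phylum for profile in profiles for phylum in profile})
    return {
        phylum: pstdev([profile.get(phylum, 0.0) for profile in profiles])
        for phylum in phyla
    }
```

Working in proportions rather than raw counts keeps samples with different sequencing depths comparable, which is why the normalization happens before any spread is computed.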

Data from: Agile Genetics: Single gene resolution without the fuss

Public Data Portal

These files are 250bp Illumina MiSeq paired-end sequencing reads in fastq format. Libraries were prepared from DNA fragments amplified from tomato bulk (heterogeneous) samples around the fruit weight locus.

Diatom DNA sequence data

Public Data Portal

The raw data consist of demultiplexed fastq file pairs (R1.fastq and R2.fastq) per sample, accessible on the NCBI Sequence Read Archive (SRA) under the BioProject accession numbers PRJNA1187555 for experiments E1 and E3 and PRJNA1187576 for E2 and E4. This dataset is associated with the following publication: Valentin, V., S. Rivera, E. Acs, S. Almeida, K. Andree, L. Apothéloz-Perret-Gentil, B. Bailet, A. Baričević, K. Beentjes, J. Bettig, A. Bouchez, C. Camilla, C. Chardon, M. Duleba, T. Elersek, C. Genthon, M. Jablonska, L. Jacas, M. Kahlert, M. Kelly, J. Macher, F. Mauri, M. Moletta-Denat, A. Mortágua, J. Pawlowski, J. Pérez-Burillo, M. Pfannkuchen, E. Pilgrim, P. Panayiota, F. Rimet, K. Stanic, K. Tapolczai, S. Theroux, R. Trobajo, B. Van der Hoorn, M. Vasquez, M. Vidal, D. Wanless, J. Warren, J. Zimmermann, and B. Paix. Proficiency testing and cross-laboratory method comparison to support standardisation of diatom DNA metabarcoding for freshwater biomonitoring. Metabarcoding and Metagenomics. Pensoft Publishers, Sofia, BULGARIA, e133264, (2025).
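For readers unfamiliar with the paired R1/R2 layout mentioned above, the following sketch shows one way to walk two FASTQ streams in lockstep. It is an assumption-laden example rather than the method used for this dataset: the function names are hypothetical, records are assumed to be the common four-line form, and mates are assumed to share an identical read ID before the first whitespace in the header (some older files use `/1` and `/2` suffixes instead, which this sketch does not handle).

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or an incomplete trailing record)
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Read ID is the header text after '@', up to the first whitespace.
        yield header[1:].split(None, 1)[0], seq, qual


def paired_reads(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, forward_seq, reverse_seq), checking that mates line up."""
    for rec1, rec2 in zip_longest(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if rec1[0] != rec2[0]:
            raise ValueError(f"read ID mismatch: {rec1[0]!r} vs {rec2[0]!r}")
        yield rec1[0], rec1[1], rec2[1]
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `paired_reads(open("R1.fastq"), open("R2.fastq"))`.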

ITSxpress: Software to rapidly trim internal transcribed spacer sequences with quality scores for amplicon sequencing

Public Data Portal

The ribosomal RNA (rRNA) internal transcribed spacer (ITS) regions are commonly used to identify fungi and other eukaryotic taxa in amplicon sequencing. The highly conserved rRNA regions flanking the ITS need to be trimmed before being used for taxonomic assignment. The Python software package ITSxpress rapidly trims single-end or paired-end sequences in FASTQ format for use in amplicon sequence variant clustering methods like DADA2. This new major release of ITSxpress improves the paired-end merging method, simplifies installation of the QIIME 2 ITSxpress plugin, removes major dependencies, adds use cases, and is compatible with newer compression formats. This paper discusses the modifications to ITSxpress that improve the output and user experience, leading to a major version increase.

Public Data Portal

UniVec and UniVec_Core databases in FASTA format.

Public Data Portal

System for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.

Microbial monitoring in the ISS-Kibo

Public Data Portal

Continuous monitoring of bacterial community structure in the ISS-Kibo. This data contains numerous sequence reads of 16S rRNA gene fragments.
