Dataset Details
United States
Whole Genome Shotgun Submissions
Whole Genome Shotgun (WGS) projects are assemblies of incomplete genomes or incomplete chromosomes of prokaryotes or eukaryotes, generally sequenced by a whole genome shotgun strategy. WGS projects may be annotated, but annotation is not required. NCBI offers a Prokaryotic Genomes Annotation Pipeline, which may be requested when the genome files are submitted to GenBank. The pipeline generates a submission-ready annotated file that is returned to the submitter for review and that the submitter can edit before data release. Public WGS projects are listed at https://www.ncbi.nlm.nih.gov/Traces/wgs/
Data Information
Related Data
Sequence Set Browser
Public Data Portal
This site is for browsing WGS (Whole Genome Shotgun) genomes, TSA (Transcriptome Shotgun Assemblies) sets, and TLS (Targeted Locus Study) sets. WGS sequences are incomplete genomes that have been sequenced by a whole genome shotgun strategy. TSA sequences are transcript sequences that have been computationally assembled from primary RNA sequence data. TLS sequences come from large-scale marker gene sequencing studies. For more details, see the WGS Submission page (https://www.ncbi.nlm.nih.gov/genbank/wgs) and the TSA Submission page (https://www.ncbi.nlm.nih.gov/genbank/tsa).
Transcriptome Shotgun Assembly (TSA) Sequence Database and Submissions
Public Data Portal
TSA is an archive of computationally assembled transcript sequences from primary data such as ESTs and next-generation sequencing reads. The overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. The primary sequence data used in the assemblies must have been experimentally determined by the same submitter. TSA sequence records differ from GenBank records because there are no physical counterparts to the assemblies.
Phenotype-Genotype Integrator (PheGenI)
Public Data Portal
Supports finding human phenotype/genotype relationships with queries by phenotype, chromosome location, gene, and SNP identifiers. Currently includes information from dbGaP, the National Human Genome Research Institute (NHGRI) genome-wide association study (GWAS) Catalog, and Genotype-Tissue Expression (GTEx).
Genome In A Bottle - v2.0 Genome Stratifications (Deprecated)
Public Data Portal
These v2.0 stratification BED files from the Global Alliance for Genomics and Health (GA4GH) Benchmarking Team and the Genome in a Bottle Consortium are intended as a standard resource of BED files for use in stratifying true positive, false positive, and false negative variant calls. The v2.0 stratifications have been deprecated and replaced by the v3.0 genome stratifications.
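As a rough illustration of what stratifying variant calls by a BED file means, the following minimal Python sketch counts true positive (TP), false positive (FP), and false negative (FN) calls inside and outside a set of BED regions. It is not the GA4GH benchmarking tooling. The function names (`load_bed_regions`, `in_regions`, `stratify_calls`), the input layout, and the example coordinates are all hypothetical and chosen only for illustration. It assumes BED intervals are 0-based and half-open and that call positions are 1-based, as in VCF.

```python
from bisect import bisect_right
from collections import Counter


def load_bed_regions(lines):
    """Parse BED lines into {chrom: (starts, ends)} with merged, sorted intervals.

    Assumes standard BED coordinates: 0-based start, exclusive end.
    """
    raw = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))

    regions = {}
    for chrom, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return regions


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position `pos` lies inside any region."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    pos0 = pos - 1  # convert to 0-based to match BED coordinates
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def stratify_calls(calls, regions):
    """Count calls by (inside/outside stratum, TP/FP/FN label).

    `calls` is an iterable of (chrom, pos, label) tuples (hypothetical layout).
    """
    counts = Counter()
    for chrom, pos, label in calls:
        where = "inside" if in_regions(regions, chrom, pos) else "outside"
        counts[(where, label)] += 1
    return counts


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    bed = ["chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t10"]
    calls = [("chr1", 101, "TP"), ("chr1", 100, "FP"), ("chr1", 301, "FN")]
    for key, n in sorted(stratify_calls(calls, load_bed_regions(bed)).items()):
        print(key, n)
```

Merging overlapping intervals before lookup keeps the binary search correct even when a BED file contains overlapping regions.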
Database of Genotype and Phenotype (dbGaP)
Public Data Portal
The Database of Genotype and Phenotype (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
Sequence Read Archive (SRA)
Public Data Portal
The Sequence Read Archive (SRA) stores sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.
RefSeq: NCBI Reference Sequence Database
Public Data Portal
A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.
Database of Short Genetic Variations (dbSNP)
Public Data Portal
Database of Short Genetic Variations (dbSNP) contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.