Dataset details
United States
Agrilus planipennis community manual annotations
Manual annotation at the i5k Workspace@NAL (https://i5k.nal.usda.gov) is the review and improvement of gene models derived from computational gene prediction. Community curators compare an existing gene model to evidence such as RNA-Seq or protein alignments from the same or closely related species and modify the structure or function of the gene accordingly, typically following the i5k Workspace@NAL manual annotation guidelines (https://i5k.nal.usda.gov/content/rules-web-apollo-annotation-i5k-pilot-project). If a gene model is missing, the annotator can also use this evidence to create a new gene model. Because manual annotation, by definition, improves or creates gene models where computational methods have failed, it can be a powerful tool to improve computational gene sets, which often serve as foundational datasets to facilitate research on a species.
Here, community curators used manual annotation at the i5k Workspace@NAL to improve computational gene predictions from the dataset Agrilus planipennis genome annotations v0.5.3. The i5k Workspace@NAL set up the Apollo v1 manual annotation software and multiple evidence tracks to facilitate manual annotation. From 2014-10-20 to 2018-07-12, five community curators updated 263 genes, including developmental genes; cytochrome P450s; cathepsin peptidases; cuticle proteins; glycoside hydrolases; and polysaccharide lyases. For this dataset, we used the program LiftOff v1.6.3 to map the manual annotations to the genome assembly GCF_000699045.2. We computed overlaps with annotations from the RefSeq database using gff3_merge from the GFF3toolkit software v2.1.0. FASTA sequences were generated using gff3_to_fasta from the same toolkit. These improvements should facilitate continued research on Agrilus planipennis, or emerald ash borer (EAB), which is an invasive insect pest.
While these manual annotations will not be integrated with other computational gene sets, they are available to view at the i5k Workspace@NAL (https://i5k.nal.usda.gov) to enhance future research on Agrilus planipennis.
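The overlap step mentioned above can be illustrated with a minimal sketch. This is not how gff3_merge from GFF3toolkit works internally; it only shows the basic idea of finding gene features from two GFF3 files that share a sequence, a strand, and at least one base. The sequence name, coordinates, gene IDs, and function names below are hypothetical.

```python
from collections import defaultdict


def parse_gff3_genes(text):
    """Collect 'gene' features from GFF3 text as {seqid: [(start, end, strand, id)]}."""
    genes = defaultdict(list)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # only well-formed gene rows are considered here
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", "")))
    return genes


def overlapping_pairs(manual, reference):
    """Return (manual_id, reference_id) pairs on the same strand sharing >= 1 base."""
    pairs = []
    for seqid, manual_genes in manual.items():
        for m_start, m_end, m_strand, m_id in manual_genes:
            for r_start, r_end, r_strand, r_id in reference.get(seqid, []):
                # GFF3 coordinates are 1-based and inclusive on both ends
                if m_strand == r_strand and m_start <= r_end and r_start <= m_end:
                    pairs.append((m_id, r_id))
    return pairs


def gff3_row(seqid, start, end, strand, gene_id):
    """Build one tab-separated GFF3 gene row (hypothetical example data)."""
    return "\t".join([seqid, "example", "gene", str(start), str(end), ".", strand, ".", "ID=" + gene_id])


# Hypothetical example records, not taken from the actual dataset
manual_gff3 = "\n".join(["##gff-version 3", gff3_row("scaffold_1", 100, 500, "+", "manual_gene1")])
reference_gff3 = "\n".join([
    "##gff-version 3",
    gff3_row("scaffold_1", 450, 900, "+", "ref_geneA"),
    gff3_row("scaffold_1", 450, 900, "-", "ref_geneB"),
    gff3_row("scaffold_1", 1000, 1200, "+", "ref_geneC"),
])

pairs = overlapping_pairs(parse_gff3_genes(manual_gff3), parse_gff3_genes(reference_gff3))
print(pairs)
```

The nested loop is quadratic per sequence, which is acceptable for a few hundred manual annotations; a sorted sweep or an interval tree would be the usual choice for whole gene sets.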
Data information
Related data
Homalodisca vitripennis genome annotations v0.5.3
Public Data Portal
The Homalodisca vitripennis genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine.
The glassy-winged sharpshooter (GWSS; Homalodisca vitripennis) [Hemiptera: Cicadellidae] occurs naturally within the southern United States. Once restricted to the southeastern states, it was accidentally spread across the south into California. The GWSS is a voracious feeder and can fly long distances, preferring to feed upon cultivated crops such as grapevine and fruit trees and, in the nymphal stages, many weeds and grasses. The GWSS is a serious threat to the viticulture industry as the primary vector of the plant-infecting bacterium Xylella fastidiosa (Xf). The GWSS feeds on a diverse range of plants, during which the bacteria can infect many tree fruit, nut, vine, and woody ornamental crops. Glassy-winged sharpshooter adults are ½ inch (13 mm) long, which is fairly large for the sharpshooter leafhopper family of insects. Sharpshooters use an ovipositor to lay eggs inside the underside of leaves. The sharpshooter will lay its eggs on almost any plant, including cactus. The egg masses are usually composed of 10-20 eggs, but can contain more, or as few as 1. Most of the egg masses have a waxy coating of brochosomes around the eggs for protection. The nymphs (5 instars) do not have wings, but develop wing pads in the 5th instar, and are generally smaller than the adults, ranging in size from 0.07 inches (2 mm) to nearly ½ inch (13 mm) long. The nymphs have very distinct red eyes. The sharpshooter can consume about 300 times its own weight in fluids from the xylem vessels of the plants upon which it feeds, thus producing copious amounts of excreta fluid.
This dataset presents the Homalodisca vitripennis genome v1.0. This assembly version is the pre-release version, prior to filtering and quality control by the National Center for Biotechnology Information's GenBank resource (https://www.ncbi.nlm.nih.gov/assembly/GCA_000696855.1). Assembly method details will be available in a forthcoming publication.
NOTE: This gene set is an unstable pre-release (v0.5.3), and was provided to facilitate manual curation and analyses before the official gene set is released. Gene identifiers from this gene set will likely not be maintained.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Leptinotarsa decemlineata genome assembly 1.0
Public Data Portal
The Baylor College of Medicine recently sequenced and annotated the Leptinotarsa decemlineata genome as part of the i5k pilot project. This dataset presents the Leptinotarsa decemlineata genome v1.0. This assembly version is the pre-release version, prior to filtering and quality control by the National Center for Biotechnology Information's GenBank resource. Assembly method details will be available in a forthcoming publication.
The Colorado potato beetle is considered the most economically significant defoliator of potato in northern latitudes worldwide. The range of this insect is continuing to expand, and it is likely to eventually colonize all potato-producing areas with a temperate climate. Within its native habitat, the beetle feeds on native solanaceous plants: Solanum angustifolium, S. elaeagnifolium, and buffalo bur, S. rostratum. However, it has adapted to potatoes and other solanaceous crops after its range expansion.
Due to the lack of any natural enemies that have been able to evolve seasonal adaptations, the cornerstone of Colorado potato beetle management has been the use of insecticides. However, the beetle has shown a remarkable ability to develop resistance to most insecticides used for its control. The mechanism(s) of insecticide resistance are not yet known, and genomic sequencing will lead to major advances in managing this pest in commercial plant production.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Oncopeltus fasciatus genome annotations v0.5.3
Public Data Portal
The Oncopeltus fasciatus genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine. The O. fasciatus research community has manually reviewed and curated the computational gene predictions and generated an official gene set, OGSv1.1.
Oncopeltus fasciatus has been an established lab organism for over 60 years, and has been used for a wide range of studies from physiology to development and evolution. As a relatively conservative and generalized species, it affords a baseline against which other species can be compared.
For example, this species has the same piercing and sucking type mouthparts as its less benign relatives, including the blood-sucking kissing bug, Rhodnius prolixus, and the brown marmorated stink bug, Halyomorpha halys, which are disease vector and agricultural pest species, respectively. Unlike the pest species, the benign, seed-feeding Oncopeltus can be functionally investigated in the lab by RNA interference (RNAi). Comparing the genomes, and conducting experimental lab work in Oncopeltus, will help to identify unique features of the pest species, and thus inform management strategies for them.
More generally, Oncopeltus is a key species for comparisons across the insects. It is one of the few experimentally tractable hemimetabolous species that can ground comparisons with the completely metamorphosing species of the Holometabola (e.g., flies, beetles, wasps). Topics investigated in this framework include reproductive biology and development of the legs, wings, body segments, extraembryonic membranes, and overall establishment of the body plan.
This dataset presents the Oncopeltus fasciatus gene set BCM_v_0.5.3, which was generated computationally. RNA-Seq data was used with additional protein homology data for a MAKER automated annotation of the Oncopeltus fasciatus genome assembly 1.0. Further annotation method details will be available in a forthcoming publication.
NOTE: This gene set is an unstable pre-release (v0.5.3), and was provided to facilitate manual curation and analyses before the official gene set is released. Gene identifiers from this gene set will likely not be maintained.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Leptinotarsa decemlineata Official Gene set v1.2
Public Data Portal
The Leptinotarsa decemlineata genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine. The L. decemlineata research community has manually reviewed and curated the computational gene predictions and generated an official gene set, OGSv1.2. Its predecessor, OGSv1.1, is an integration of automatic gene predictions from MAKER (performed by Dan Hughes at Baylor College of Medicine) with manual annotations by the research community (done via the Apollo manual annotation software). To generate OGSv1.2, the coordinates of OGSv1.1 were converted to the latest genome assembly, GCF_000500325.1, using coordinates_conversion and remap-gff3.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Greater sage-grouse genetic warning system, western United States
Public Data Portal
Genetic variation is a well-known indicator of population fitness yet is not typically included in monitoring programs for sensitive species. Additionally, most programs monitor populations at one scale, which can lead to potential mismatches with ecological processes critical to species’ conservation. Recently developed methods generating hierarchically nested population units (i.e., clusters of varying scales) for greater sage-grouse (Centrocercus urophasianus) have identified population trend declines across spatiotemporal scales to help managers target areas for conservation. The same clusters used as a proxy for spatial scale can alert managers to local units (i.e., fine-scale) with low genetic diversity relative to regional units (i.e., coarse-scale), further facilitating identification of management targets. We developed a genetic warning system utilizing previously developed hierarchical population units to identify management-relevant areas with low genetic diversity within the greater sage-grouse range. Within this warning system we characterized conservation concern thresholds based on values of genetic diversity for hierarchically nested populations. We developed a spatial data set to display genetic diversity values and conservation concern information from a Genetic Warning System (GWS) for population monitoring of greater sage-grouse, as described in Zimmerman et al. (2022). Here we added the genetic diversity estimates (allelic richness and expected heterozygosity) and GWS information as attributes to the relevant fine-scale (level 2) and coarse-scale (level 13) previously developed hierarchically nested population clusters (O’Donnell et al. 2019, O’Donnell et al. 2022). The GWS incorporates population trend decline watches and warnings from the Targeted Annual Warning System (TAWS) for greater sage-grouse as reported in Coates et al. (2021) to further refine degree of conservation concern.
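As a rough illustration of one of the two diversity measures named above, expected heterozygosity at a single locus is commonly computed as one minus the sum of squared allele frequencies. The sketch below uses that textbook formula. It is not the procedure from Zimmerman et al. (2022), it omits sample-size corrections, and the raw allele count stands in for allelic richness, which is normally rarefied to a common sample size. The allele labels and counts are hypothetical.

```python
def expected_heterozygosity(allele_counts):
    """Expected heterozygosity at one locus: 1 - sum(p_i ** 2) over allele frequencies p_i."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this locus")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


def observed_allele_count(allele_counts):
    """Number of distinct alleles seen (a simplification; no rarefaction applied)."""
    return sum(1 for count in allele_counts.values() if count > 0)


# Hypothetical allele counts for one locus in a fine-scale unit and its coarse-scale parent
fine_scale_unit = {"allele_1": 9, "allele_2": 1}
coarse_scale_unit = {"allele_1": 5, "allele_2": 3, "allele_3": 2}

he_fine = expected_heterozygosity(fine_scale_unit)
he_coarse = expected_heterozygosity(coarse_scale_unit)
print(he_fine, he_coarse, observed_allele_count(fine_scale_unit))
```

In practice these values would be averaged over many loci before a fine-scale unit is compared with the coarse-scale unit that contains it.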
Salmonella enterica pangenome graph and variant call data for 539,283 genomes
Public Data Portal
Salmonella enterica causes human disease and decreases agricultural production. The overall goal of this project is to generate a large database of S. enterica variants with 539,283 samples and 236,069 features for applications in machine learning and genomics. We transformed single nucleotide polymorphism (SNP) data into reduced dimensional representations which are tolerant of missing data based on disentangled variational autoencoders. TFRecord files were made with custom Python scripts that parsed the variant call format (VCF) files into sparse tensors and combined them with the Salmonella In Silico Typing Resource (SISTR) serotype data.
The data directory contains:
Google BigQuery was used to download metadata for the SRA accessions from the National Institutes of Health (NIH).
Files were processed into batches of ~10,000 and named Sra_completed_XX.csv (00-53).
SCINet users: The data folder can be accessed/retrieved with a valid SCINet account at this location: /LTS/ADCdatastorage/NAL/published/node28083194/
See the SCINet File Transfer guide for more information on moving large files: https://scinet.usda.gov/guides/data/datatransfer
Globus users: The files can also be accessed through Globus by following this data link. The user will need to log in to Globus in order to access this data. User accounts are free of charge with several options for signing on. Instructions for creating an account are on the login page.
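The sparse encoding described above can be sketched in plain Python. This is not the project's custom script and does not write TFRecord files; it only shows how the variant rows of one sample's VCF might be mapped to column indices in a fixed feature list, and how the 54 batch file names follow from the stated pattern. The feature index, chromosome name, positions, and function name are hypothetical.

```python
def vcf_to_sparse(vcf_text, feature_index):
    """Return (indices, values) marking which known variant features occur in one VCF.

    feature_index maps (chrom, pos, ref, alt) to a column number; rows describing
    variants that are not in the index are ignored.
    """
    indices = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _variant_id, ref, alt = line.split("\t")[:5]
        column = feature_index.get((chrom, int(pos), ref, alt))
        if column is not None:
            indices.append(column)
    indices.sort()
    return indices, [1] * len(indices)


# Hypothetical feature index and single-sample VCF body
feature_index = {
    ("chr_example", 101, "A", "G"): 0,
    ("chr_example", 250, "C", "T"): 1,
    ("chr_example", 999, "G", "A"): 2,
}
vcf_text = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chr_example", "999", ".", "G", "A", ".", "PASS", "."]),
    "\t".join(["chr_example", "101", ".", "A", "G", ".", "PASS", "."]),
    "\t".join(["chr_example", "500", ".", "T", "C", ".", "PASS", "."]),
])

sparse_indices, sparse_values = vcf_to_sparse(vcf_text, feature_index)

# Batch metadata file names following the Sra_completed_XX.csv pattern, 00 through 53
batch_files = ["Sra_completed_{:02d}.csv".format(i) for i in range(54)]
print(sparse_indices, sparse_values, batch_files[0], batch_files[-1])
```

Storing only the indices of observed variants keeps each sample small when most of the 236,069 features are absent from any single genome.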
GrainGenes- A Global Data Repository for Small Grains
Public Data Portal
GrainGenes is an international, centralized crop database and information portal for peer-reviewed small grains data that serves the small grains research and breeding communities (wheat, barley, oat, and rye). The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new, more nutritious, disease- and pest-resistant, high-yielding cultivars. As a digital platform, GrainGenes houses peer-reviewed and curated genetic, genomic, and protein data. It has been hard-funded by the U.S. Department of Agriculture-Agricultural Research Service to ensure long-term data sustainability through a functional and integrated web interface for wheat, barley, oat, and rye.
Gene Data Tables from: Pangenomics Links Boll Weevil Divergence with Ancient Mesoamerican Cotton Cultivation
Public Data Portal
These supplementary data tables represent gene annotations, gene ontology (GO) terms, and gene coordinates from the boll weevil (Anthonomus grandis grandis, Agg) reference genome that appear in selectively swept regions and structural variations between the subspecies Agg and Agt (Anthonomus grandis thurberia). These genes were identified with two programs for selection analysis: RAiSD and PCAdapt. The gene tables are organized by shared/unique genes per subspecies and by shared/unique genes residing on structural variations (SVs) per subspecies, based on >80% sequence identity. These data provide supplementary details for the enrichment tests that were carried out to associate gene function with subspecies biology, described in more detail in the manuscript: Pangenomics Links Boll Weevil Divergence with Ancient Mesoamerican Cotton Cultivation. This work is relevant to cotton managers throughout United States cotton growing regions, our international collaborators, and genome biologists investigating structural changes in genome architecture and their association with adaptation and trait plasticity.
Manual annotations of Rhyzopertha dominica genome assembly RdoDt3 Drdd8 decomES
Public Data Portal
This dataset contains manual annotations from Rhyzopertha dominica community curators, based on genome assembly RdoDt3_Drdd8_decomES.fasta.gz. These annotations are direct exports from Apollo 2.6 (https://doi.org/10.5281/zenodo.5015109), hosted by the i5k Workspace@NAL (https://i5k.nal.usda.gov/). Manual annotations are temporary and will be reviewed by the i5k Workspace@NAL and submitted to NCBI's GenBank database after review.