Leptinotarsa decemlineata genome assembly 1.0
Public Data Portal
The Baylor College of Medicine recently sequenced and annotated the Leptinotarsa decemlineata genome as part of the i5k pilot project. This dataset presents the Leptinotarsa decemlineata genome v1.0. This assembly is the pre-release version, prior to filtering and quality control by the National Center for Biotechnology Information's GenBank resource. Assembly method details will be available in a forthcoming publication.
The Colorado potato beetle is considered the most economically significant defoliator of potato in northern latitudes worldwide. Its range is continuing to expand, and it is likely to eventually colonize all potato-producing areas with a temperate climate. Within its native habitat, the beetle feeds on native solanaceous plants: Solanum angustifolium, S. elaeagnifolium, and buffalo bur (S. rostratum). However, it has adapted to potatoes and other solanaceous crops following its range expansion.
Because no natural enemies have been able to evolve seasonal adaptations, the cornerstone of Colorado potato beetle management has been the use of insecticides. However, the beetle has shown a remarkable ability to develop resistance to most insecticides used for its control. The mechanism(s) of insecticide resistance are not yet known, and genomic sequencing is expected to lead to major advances in managing this pest in commercial plant production.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Leptinotarsa decemlineata genome annotations v0.5.3
Public Data Portal
The Leptinotarsa decemlineata genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine. This dataset presents the Leptinotarsa decemlineata gene set BCM_v_0.5.3, which was generated computationally. RNA-Seq data were used with additional protein homology data for a MAKER automated annotation of the Leptinotarsa decemlineata genome assembly 1.0. Further annotation method details will be available in a forthcoming publication.
NOTE: This gene set is an unstable pre-release (v0.5.3) and was provided to facilitate manual curation and analyses before the official gene set is released. Gene identifiers from this gene set will likely not be maintained.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Agrilus planipennis community manual annotations
Public Data Portal
Manual annotation at the i5k Workspace@NAL (https://i5k.nal.usda.gov) is the review and improvement of gene models derived from computational gene prediction. Community curators compare an existing gene model to evidence such as RNA-Seq or protein alignments from the same or closely related species and modify the structure or function of the gene accordingly, typically following the i5k Workspace@NAL manual annotation guidelines (https://i5k.nal.usda.gov/content/rules-web-apollo-annotation-i5k-pilot-project). If a gene model is missing, the annotator can also use this evidence to create a new gene model. Because manual annotation, by definition, improves or creates gene models where computational methods have failed, it can be a powerful tool to improve computational gene sets, which often serve as foundational datasets to facilitate research on a species.
Here, community curators used manual annotation at the i5k Workspace@NAL to improve computational gene predictions from the dataset Agrilus planipennis genome annotations v0.5.3. The i5k Workspace@NAL set up the Apollo v1 manual annotation software and multiple evidence tracks to facilitate manual annotation. From 2014-10-20 to 2018-07-12, five community curators updated 263 genes, including developmental genes, cytochrome P450s, cathepsin peptidases, cuticle proteins, glycoside hydrolases, and polysaccharide lyases. For this dataset, we used the program LiftOff v1.6.3 to map the manual annotations to the genome assembly GCF_000699045.2. We computed overlaps with annotations from the RefSeq database using gff3_merge from the GFF3toolkit software v2.1.0. FASTA sequences were generated using gff3_to_fasta from the same toolkit. These improvements should facilitate continued research on Agrilus planipennis, or emerald ash borer (EAB), which is an invasive insect pest.
While these manual annotations will not be integrated with other computational gene sets, they are available to view at the i5k Workspace@NAL (https://i5k.nal.usda.gov) to enhance future research on Agrilus planipennis.
HoloBee Database v2016.1
Public Data Portal
Organisms living in honey bees and honey bee colonies form large associative holobiont communities that are integral to bee biology. High-throughput sequencing approaches to characterize these holobiont communities from honey bees in various states of health and disease are now commonplace, producing large amounts of nucleotide sequence data that must be accurately and consistently analyzed in order to produce reliable and comparable reports. In addition, new species designations and revisions are actively being made from honey bee holobiont communities, complicating nomenclature in larger databases where taxonomic descriptions associated with archived sequences can quickly become outdated and misleading.
To improve the accuracy and consistency of honey bee holobiont research, we have developed HoloBee: a curated database of publicly accessioned nucleotide sequences from the honey bee holobiont community. With rare, noted exceptions made by curators, sequences used in HoloBee were obtained from, or in association with, Apis mellifera (Western honey bee) as well as other honey bee species where available (e.g. Apis cerana, Apis dorsata, Apis laboriosa, Apis koschevnikovi, Apis florea, Apis andreniformis and Apis nigrocincta). Sources include the interior or surface of honey bees (adults, pupae, larvae, eggs), corbicular pollen, bee bread, royal jelly, honey, comb, hive surfaces (e.g. bottom board debris, frames, landing platforms), and isolates of microbes, parasites and pathogens from honey bees. HoloBee contains two non-overlapping sets of sequence data, HoloBee-Barcode and HoloBee-Mop, each of which has a distinct intended use.
HoloBee-Barcode is a non-redundant database of taxonomically informative barcoding loci for all viruses, bacteria, fungi, protozoans and metazoans associated with honey bees (Apis spp.). It was created from an exhaustive master sequence archive of all valid holobiont sequences. Redundancy was removed from this master archive using a clustering algorithm that grouped sequences with ≥ 99% identity and retained the longest sequence from each cluster as the representative accession for that sequence type (“centroid”). These centroid sequences were concatenated into a FASTA-formatted file to create the HoloBee-Barcode database. Associated taxonomy for each centroid, including Superkingdom through Species and Strain/Isolate, was individually reviewed and corrected when necessary by a curator. Cross-reference tables (separated according to 5 major taxonomic groups) provide a user-friendly outline of information for each centroid accession within HoloBee-Barcode, including taxonomy, gene/product name, sequence length, the unaltered NCBI definition line, the number and identity of redundant sequences clustered within each centroid, and any additional information provided by the curator. HoloBee-Barcode centroid counts are: Viruses = 86; Bacteria = 496; Fungi = 41; Protozoa = 4; Metazoa = 60.
HoloBee-Barcode is intended to improve and standardize quantitative and qualitative metagenomic descriptions of holobiont communities associated with honey bees by providing a curated set of barcode sequences. The goal of genetic barcoding is to associate a nucleotide sequence sample with a taxonomically valid species. Genomic regions targeted for barcoding vary by taxonomic group. The small subunit (SSU) ribosomal RNA, or 16S rRNA, is the most commonly used barcode for bacteria and is used in HB-Barcode. These 16S rRNA sequences will support the analysis of data generated with the widely used approach of amplicon-based 16S rRNA deep sequencing to study microbiota communities. Although barcode markers for fungi are less definitive than those for bacteria, HB-Barcode defaults to the ribosomal RNA internal transcribed spacer region (ITS), which typically includes ITS-1, 5.8S, and ITS-2. For some clades that cannot be resolved by this region, other barcode markers were selected. The majority of barcodes for
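The redundancy-removal step described for HoloBee-Barcode (group sequences at ≥ 99% identity, keep the longest member of each group as the centroid) can be illustrated with a minimal Python sketch. The description does not name the clustering tool, so this is not the curators' method: it is a simple greedy, length-sorted clustering, and the `identity` function is a toy stand-in (matched characters divided by the shorter length, via `difflib`) for a real sequence aligner. The function names and example accessions are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity (assumption): matched characters / length of the shorter sequence."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def cluster_centroids(seqs: Dict[str, str], threshold: float = 0.99) -> Dict[str, List[str]]:
    """Greedy clustering: visit sequences longest first, so each centroid
    is the longest member of its cluster. Returns centroid -> redundant members."""
    clusters: Dict[str, List[str]] = {}
    for acc in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        for centroid in clusters:
            if identity(seqs[acc], seqs[centroid]) >= threshold:
                clusters[centroid].append(acc)  # redundant with an existing centroid
                break
        else:
            clusters[acc] = []  # no match found: this sequence starts a new cluster
    return clusters


if __name__ == "__main__":
    # Hypothetical accessions: B is a fragment of A, C is unrelated.
    example = {"A": "ACGT" * 50, "B": ("ACGT" * 50)[:150], "C": "TTGACA" * 30}
    print(cluster_centroids(example))
```

In this toy input, the fragment is absorbed into the cluster of the longer sequence, while the unrelated sequence becomes its own centroid. The cross-reference tables described above record the same kind of centroid-to-members mapping.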
Oncopeltus fasciatus genome annotations v0.5.3
Public Data Portal
The Oncopeltus fasciatus genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine. The O. fasciatus research community has manually reviewed and curated the computational gene predictions and generated an official gene set, OGSv1.1.
Oncopeltus fasciatus has been an established lab organism for over 60 years and has been used for a wide range of studies, from physiology to development and evolution. As a relatively conservative and generalized species, it affords a baseline against which other species can be compared.
For example, this species has the same piercing and sucking type mouthparts as its less benign relatives, including the blood-sucking kissing bug, Rhodnius prolixus, and the brown marmorated stink bug, Halyomorpha halys, which are disease vector and agricultural pest species, respectively. Unlike the pest species, the benign, seed-feeding Oncopeltus can be functionally investigated in the lab by RNA interference (RNAi). Comparing the genomes, and conducting experimental lab work in Oncopeltus, will help to identify unique features of the pest species and thus inform management strategies for them.
More generally, Oncopeltus is a key species for comparisons across the insects. It is one of the few experimentally tractable hemimetabolous species that can ground comparisons with the completely metamorphosing species of the Holometabola (e.g., flies, beetles, wasps). Topics investigated in this framework include reproductive biology and development of the legs, wings, body segments, extraembryonic membranes, and overall establishment of the body plan.
This dataset presents the Oncopeltus fasciatus gene set BCM_v_0.5.3, which was generated computationally. RNA-Seq data were used with additional protein homology data for a MAKER automated annotation of the Oncopeltus fasciatus genome assembly 1.0. Further annotation method details will be available in a forthcoming publication.
NOTE: This gene set is an unstable pre-release (v0.5.3) and was provided to facilitate manual curation and analyses before the official gene set is released. Gene identifiers from this gene set will likely not be maintained.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
Leptinotarsa decemlineata Official Gene set v1.2
Public Data Portal
The Leptinotarsa decemlineata genome was recently sequenced and annotated as part of the i5k pilot project by the Baylor College of Medicine. The L. decemlineata research community has manually reviewed and curated the computational gene predictions and generated an official gene set, OGSv1.2. OGSv1.1 is an integration of automatic gene predictions from MAKER (performed by Dan Hughes at Baylor College of Medicine) with manual annotations by the research community (done via the Apollo manual annotation software). The coordinates of OGSv1.1 were converted to the latest genome assembly, GCF_000500325.1, using coordinates_conversion and remap-gff3, to generate OGSv1.2.
If you wish to use this dataset, please follow the Baylor College of Medicine's conditions for data use: https://www.hgsc.bcm.edu/bcm-hgsc-conditions-use
In-house annotated gene set for the pecan weevil, Curculio caryae
Public Data Portal
This in-house annotated gene set was created using the following methods.
RNA was isolated from the head and thorax segments of one adult male and one adult female pecan weevil using the NucleoMag RNA Kit (Macherey-Nagel, Düren, Germany, 744350.1) according to kit protocols. Isolated RNA was processed into PacBio Kinnex sequencing libraries using the Iso-Seq express 2.0 kit (Pacific Biosciences, Menlo Park, CA, USA, 103-071-500) and Kinnex full-length RNA kit (Pacific Biosciences, Menlo Park, CA, USA, 103-072-000). The prepared library was bound and sequenced at the USDA-ARS Veterinary Pest Genetics Research Unit in Kerrville, Texas, on two Pacific Biosciences SMRT cell trays with a Revio system (Pacific Biosciences, Menlo Park, CA, USA, 102-202-200), beginning with a 2-h pre-extension followed by a 30-h movie collection time. After sequencing, circular consensus sequences from the PacBio Sequel Revio subreads were obtained using the SMRTLink v13.0 software. Reads were subsequently mapped to the repeat-masked genome assembly using minimap2 with arguments for spliced nucleotide sequences (-ax splice:hq) to generate SAM mapping files. These were then converted to BAM files using samtools view -bS and used as input for gene model prediction with BRAKER v3.0.8 (https://github.com/Gaius-Augustus/BRAKER), generating 72,879 gene models. These gene models and amino acid protein predictions were further curated and annotated with gene ontologies and protein domains using InterProScan-5.73-104.0 with the PANTHER-19.0 and Pfam-37.2 databases (https://github.com/ebi-pf-team/interproscan), resulting in 19,508 InterProScan results.
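The mapping and prediction steps above can be sketched as a small Python helper that assembles the command lines without running them. The minimap2 preset (`-ax splice:hq`) and `samtools view -bS` come from the description. The file names, the use of `-o` for output files, and the `braker.pl --genome/--bam` invocation are assumptions made for illustration, since the exact commands used for this dataset are not given.

```python
import shlex
from typing import List


def build_commands(genome: str, reads: str, prefix: str) -> List[List[str]]:
    """Assemble (but do not run) the three pipeline steps as argument lists.

    genome: repeat-masked assembly FASTA (hypothetical file name)
    reads:  consensus reads in FASTQ/FASTA (hypothetical file name)
    prefix: basename for intermediate SAM/BAM files (hypothetical)
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        # Spliced long-read alignment, preset taken from the description.
        ["minimap2", "-ax", "splice:hq", "-o", sam, genome, reads],
        # SAM -> BAM conversion, flags taken from the description.
        ["samtools", "view", "-bS", "-o", bam, sam],
        # Gene prediction from the alignments; option choice is an assumption.
        ["braker.pl", f"--genome={genome}", f"--bam={bam}"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("genome.masked.fa", "hifi_reads.fastq", "weevil"):
        print(shlex.join(cmd))
```

Keeping each step as an argument list makes it straightforward to pass to `subprocess.run` later and avoids shell quoting problems with unusual file names.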