Dataset Details
United States
Data from: Assessing metabolomic and chemical diversity of a soybean lineage representing 35 years of breeding
Information on crop genotype- and phenotype-metabolite associations can be of value to trait development as well as to food security and safety. The unique study presented here assessed seed metabolomic and ionomic diversity in a soybean (Glycine max) lineage representing ~35 years of breeding (launch years 1972–2008) and increasing yield potential. Selected varieties included six conventional and three genetically modified (GM) glyphosate-tolerant lines. A metabolomics approach utilizing capillary electrophoresis (CE)-time-of-flight-mass spectrometry (TOF-MS), gas chromatography (GC)-TOF-MS and liquid chromatography (LC)-quadrupole (q)-TOF-MS resulted in measurement of a total of 732 annotated peaks. Ionomics through inductively-coupled plasma (ICP)-MS profiled twenty mineral elements. Orthogonal partial least squares-discriminant analysis (OPLS-DA) of the seed data successfully differentiated newer higher-yielding soybean from earlier lower-yielding accessions at both field sites. This result reflected genetic fingerprinting data that demonstrated a similar distinction between the newer and older soybean. Correlation analysis also revealed associations between yield data and specific metabolites. There were no clear metabolic differences between the conventional and GM lines. Overall, observations of metabolic and genetic differences between older and newer soybean varieties provided novel and significant information on the impact of varietal development on biochemical variability. Proposed applications of omics in food and feed safety assessments will need to consider that GM is not a major source of metabolite variability and that trait development in crops will, of necessity, be associated with biochemical variation.
Related Data
Data from: Genetic variation among 481 diverse soybean accessions
Public Data Portal
This data is from the manuscript titled "Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing". SNP calls were obtained from resequencing 481 diverse soybean lines comprising 52 wild (Glycine soja) and 429 cultivated (Glycine max) accessions. This dataset contains 6 gzipped VCF (Variant Call Format) files with variant calls for all 481 USB accessions, all G. max accessions, G. soja accessions, accessions sequenced at 15x coverage, accessions sequenced at 40x coverage, and 106 accessions re-sequenced from a previous study (Valliyodan et al. 2016). SNPs were called using the HaplotypeCaller algorithm from the Genome Analysis Toolkit (GATK), version gatk-2.5-2-gf57256b. A total of 7.8 million SNPs were identified among the 481 re-sequenced accessions. SNPs were assigned IDs using the script "assign_name.awk", available at https://github.com/soybase/SoySNP-Names. SNP effects were predicted using SnpEff 3.0.
Dataset also available at https://soybase.org/data/v2/Glycine/max/diversity/Wm82.gnm2.div.Valliyodan_Brown_2021/
Funding support provided by the United Soybean Board for the large-scale sequencing of soybean genomes (project #1320-532-5615), Bayer (previously Monsanto and Bayer), and Corteva (previously Dow AgroSciences), with in-kind support for analysis from USDA Agricultural Research Service project 5030-21000-069-00-D.
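The gzipped VCF files described above can be inspected without specialized tools. The following is a minimal sketch, using only the Python standard library, that counts SNP records per chromosome. The function names and the sample chromosome labels are illustrative assumptions, not part of the dataset; a record is counted as a SNP only when REF and every ALT allele are a single base.

```python
import gzip
from collections import Counter
from typing import Iterable

BASES = {"A", "C", "G", "T"}


def count_snps_per_chromosome(lines: Iterable[str]) -> Counter:
    """Count SNP records per chromosome from an iterable of VCF text lines."""
    counts: Counter = Counter()
    for line in lines:
        # Skip meta-information lines (##), the column header (#CHROM), and blanks.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt = fields[0], fields[3], fields[4]
        # Keep the record only if REF and all ALT alleles are single bases.
        if ref in BASES and all(allele in BASES for allele in alt.split(",")):
            counts[chrom] += 1
    return counts


def count_snps_in_file(path: str) -> Counter:
    """Open a gzipped VCF file (hypothetical path) and count SNPs per chromosome."""
    with gzip.open(path, "rt") as handle:
        return count_snps_per_chromosome(handle)
```

Separating the line parser from the file opener keeps the counting logic usable on any text stream, which helps when checking a few records by hand before running it on files with millions of variants.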
Genomic regions that underlie soybean seed isoflavone content
Public Data Portal
Soy products contain isoflavones (genistein, daidzein, and glycitein) that display biological effects when ingested by humans and animals; these effects are species-, dose-, and age-dependent. Therefore, the content and quality of isoflavones in soybeans is key to their biological effect. Our objective was to identify loci that underlie isoflavone content in soybean seeds. The study involved 100 recombinant inbred lines (RIL) from the cross of 'Essex' by 'Forrest', two cultivars that contrast for isoflavone content. Isoflavone content of seeds from each RIL was determined by high performance liquid chromatography (HPLC). The distribution of isoflavone content was continuous and unimodal. The heritability estimates on a line mean basis were 79% for daidzein, 22% for genistein, and 88% for glycitein. Isoflavone content of soybean seeds was compared against 150 polymorphic DNA markers in a one-way analysis of variance. Four genomic regions were found to be significantly associated with the isoflavone content of soybean seeds across both locations and years. Molecular linkage group B1 contained a major QTL underlying glycitein content (P = 0.0001, R² = 50.2%), linkage group N contained a QTL for glycitein (P = 0.0033, R² = 11.1%) and a QTL for daidzein (P = 0.0023, R² = 10.3%), and linkage group A1 contained a QTL for daidzein (P = 0.0081, R² = 9.6%). Selection for these chromosomal regions in a marker-assisted selection program will allow for the manipulation of the amounts and profiles of isoflavones (genistein, daidzein, and glycitein) in soybean seeds. In addition, tightly linked markers can be used in map-based cloning of genes associated with isoflavone content.
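The single-marker test described above, a one-way analysis of variance of trait values grouped by marker genotype, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name is hypothetical, and R² is taken here as the between-genotype sum of squares divided by the total sum of squares. A P value would additionally require the F distribution, which is omitted to keep the example free of third-party libraries.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def one_way_anova(genotypes: Sequence[str], values: Sequence[float]) -> Tuple[float, float]:
    """Return (F statistic, R^2) for trait values grouped by marker genotype."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        groups[genotype].append(value)

    n = len(values)   # number of lines scored
    k = len(groups)   # number of genotype classes at the marker
    grand_mean = sum(values) / n

    # Between-group sum of squares: variation explained by marker genotype.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    # Total sum of squares around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / ss_total
    return f_stat, r_squared
```

For example, six lines with alleles `["E", "E", "E", "F", "F", "F"]` and trait values `[1, 2, 3, 4, 5, 6]` give a between-group sum of squares of 13.5 out of a total of 17.5, so the marker explains about 77% of the variation in this toy data.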
Data from: Development of a versatile resource from 1500 diverse genomes for post-genomics research
Public Data Portal
This data set contains 32 million annotated SNPs having an average SNP density of 30 SNPs per kb and 12 non-synonymous SNPs per gene model. These SNPs were identified from a genetically diverse, worldwide collection of soybean germplasm representing wild, landrace, and improved cultivars. A combination of new and publicly available re-sequencing data was used in this analysis. The accession genotypes and their annotations are described in the manuscript titled: "Analysis and characterization of 1500 diverse genome sequences as a versatile resource for post-genomics research".
Data from: Phased Genotyping-by-Sequencing Enhances Analysis of Genetic Diversity and Reveals Divergent Copy Number Variants in Maize
Public Data Portal
High-throughput sequencing (HTS) of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken from heterogeneous populations of heterozygous individuals. This requires that a number of issues encountered with GBS be considered, including the sequencing of nonoverlapping sets of loci across multiple GBS libraries, a common missing data problem that results in low call rates for markers per individual, and a tendency for applicability only in inbred line samples with sufficient linkage disequilibrium for accurate imputation. We addressed these issues while developing and validating a new, comprehensive platform for GBS. This study supports the notion that GBS can be tailored to particular aims, and using Zea mays our results indicate that large samples of unknown pedigree can be genotyped to obtain complete and accurate GBS data. Optimizing size selection to sequence a high proportion of shared loci among individuals in different libraries and using simple in silico filters, a GBS procedure was established that produces high call rates per marker (>85%) with accuracy exceeding 99.4%. Furthermore, by capitalizing on the sequence-read structure of GBS data (stacks of reads), a new tool for resolving local haplotypes and scoring phased genotypes was developed, a feature that is not available in many GBS pipelines. Using local haplotypes reduces the marker dimensionality of the genotype matrix while increasing the informativeness of the data. Phased GBS in maize also revealed the existence of reproducibly inaccurate (apparent accuracy) genotypes that were due to divergent copy number variants (CNVs) unobservable in the underlying single nucleotide polymorphism (SNP) data.
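The per-marker call rate mentioned above is the fraction of individuals with a non-missing genotype at a marker. The sketch below shows one simple in silico filter of that kind. It is only an illustration: the data layout (a dictionary of marker name to genotype calls, with `None` for a missing call), the function names, and the use of 0.85 as a filtering threshold are assumptions, and the study's actual filters are not described in this summary.

```python
from typing import Dict, List, Optional

Genotype = Optional[str]  # None marks a missing call for one individual


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one marker."""
    return sum(call is not None for call in calls) / len(calls)


def filter_markers(
    matrix: Dict[str, List[Genotype]], min_call_rate: float = 0.85
) -> Dict[str, List[Genotype]]:
    """Keep only markers whose call rate exceeds the threshold."""
    return {
        marker: calls
        for marker, calls in matrix.items()
        if call_rate(calls) > min_call_rate
    }
```

With ten individuals, a marker missing one call (rate 0.9) passes this filter, while a marker missing two calls (rate 0.8) is dropped.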
Data from: Genome-wide Association and Genomic Prediction Identifies Soybean Cyst Nematode Resistance in Common Bean Including a Syntenic Region to Soybean Rhg1 Locus
Public Data Portal
A panel of single nucleotide polymorphisms (SNPs) for 363 common bean accessions was generated. A genome-wide association study (GWAS) was applied to detect SNPs significantly associated with resistance to Heterodera glycines (HG), also known as the soybean cyst nematode (SCN), in the core collection of common bean, Phaseolus vulgaris. There were 84,416 SNPs identified in 363 common bean accessions.
Dataset for "Cover crop inclusion and residue retention improves soybean production and physiology in drought conditions"
Public Data Portal
Data and code for "Cover crop inclusion and residue retention improves soybean production and physiology in drought conditions".
CONTEXT: Soybean (Glycine max (L.) Merr.) planting has increased in central and western North Dakota despite frequent drought occurrences that limit productivity. Soybean plants need high photosynthetic and transpiration rates to be productive, but they also need high water use efficiency when water is limited. Retaining crop residues and including cover crops in crop rotations are management strategies that could improve soybean drought resilience in the northern Great Plains.
OBJECTIVE: We aimed to examine how a management practice that included cover crops and residue retention impacts agronomic, ecosystem water and carbon dioxide flux, and canopy-scale physiological attributes of soybeans in the northern Great Plains under drought conditions.
METHODS: We compared two soybean fields over two years with business-as-usual and aspirational management that included residue retention and cover crops during a drought year. This comparison was based on yield, aboveground biomass, Phenocam images, and fluxes from eddy covariance and ancillary measurements. These measurements were used to derive meteorological, physical, and physiological attributes with the 'big leaf' framework.
RESULTS: Soybean yields were 29% higher under drought conditions in the field managed in a system that included cover crops and residue retention. This yield increase was caused by extending the maturity phenophase by 5 days, increasing agronomic and intrinsic water use efficiency by 27% and 33%, respectively, increasing water uptake, and increasing the rubisco-limited photosynthetic capacity (Vcmax25) by 42%.
CONCLUSIONS: The inclusion of cover crops and residue retention into a cropping system improved soybean productivity because of differences in water use, phenology timing, and photosynthetic capacity.
IMPLICATIONS: These results suggest that farmers can improve soybean productivity and yield stability by incorporating cover crops and residue retention into their management practices because these practices allow soybean plants to shift to a more aggressive water uptake strategy.
Data
Half_Hourly.csv: Half-hourly data from eddy covariance towers.
Management.csv: Data about field management.
Phenocamdata.csv: The output of the 1_phenocam.Rmd code.
Predicted_Height_LAI.csv: The output of 3_Inferring_LAI_and_Height.Rmd.
Vegetation.csv: Biomass and yield data.
Code
1_phenocam.rmd: Code to download Phenocam data and identify phenophase transition dates.
2_Daily_CO2_Water_Fluxes.Rmd: Code to analyze daily carbon and water fluxes (Figures 1, 2, 3 and Table 2).
3_Inferring_LAI_and_Height.Rmd: Code to calculate the predicted LAI and height for each day. The output is used in the big-leaf framework.
4_Big_Leaf.Rmd: Code for the big-leaf ecophysiology estimates (Figures 4, 5 and 6; Tables 3 and 4).
4_Data_Dictionary_Variables: Code to identify the data dictionary variables.
SoyBase and the Soybean Breeder's Toolbox
Public Data Portal
SoyBase is a repository for genetics, genomics and related data resources for soybean. It contains current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits.
The SoyBase database was established in the 1990s as the USDA Soybean Genetics Database. Originally, it contained only genetic information about soybeans such as genetic maps and information about the Mendelian genetics of soybean. In time SoyBase was expanded to include molecular data regarding soybean genes and sequences as they became available. In 2010, the soybean genome sequence was published and it and supporting gene sequences have been integrated into the SoyBase sequence browser. SoyBase genetic maps were used in the assembly of both the Williams 82 2010 assembly (Wm82.a1.v1) and the newest genome assembly (Wm82.a2.v1).
SoyBase also incorporates information about mutant and other soybean genetic stocks and serves as a contact point for ordering strains from those populations. As association analyses continue due to various re-sequencing efforts, SoyBase will also incorporate those data into the soybean genome browser as they become available. Gene expression patterns are also available at SoyBase through the SoyBase expression pages and the Soybean Gene Atlas. Other expression/transcriptome/methylomic data sets also have been and continue to be incorporated into the SoyBase genome browser.
Project No: 3625-21000-062-00D; Accession No: 0425040
A genome‑wide association and meta‑analysis: Candidate genes
Public Data Portal
Seed size is an important trait for yield and commercial value in dry-grain cowpea. Seed size varies widely among different cowpea accessions, and the genetic basis of such variation is not yet well understood. To better decipher the genetic basis of seed size, a genome-wide association study (GWAS) and meta-analysis were conducted on a panel of 368 diverse cowpea accessions from 51 countries. Four traits, including seed weight, length, width and density, were evaluated across three locations. Using 51,128 single nucleotide polymorphisms covering the cowpea genome, 17 loci were identified for these traits. One locus was common to weight, width and length, suggesting pleiotropy. By integrating synteny-based analysis with common bean, six candidate genes (Vigun05g036000, Vigun05g039600, Vigun05g204200, Vigun08g217000, Vigun11g187000, and Vigun11g191300), which are implicated in multiple functional categories related to seed size such as endosperm development, embryo development, and cell elongation, were identified. These results suggest that a combination of GWAS meta-analysis with synteny comparison in a related plant is an efficient approach to identify candidate gene(s) for complex traits in cowpea. The identified loci and candidate genes provide useful information for improving cowpea varieties and for molecular investigation of seed size.
Weighing Lysimeter Data for The Bushland, Texas, Soybean Datasets
Public Data Portal
This dataset consists of five years of weighing lysimeter data for soybean [Glycine max (L.) Merr.] grown at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL) in 1995, 2003, 2004, 2010 and 2019. In 1995, 2003, 2004, and 2010, soybean was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. In 2019, soybean was grown on four large, precision weighing lysimeters, each in the center of a 4.4-ha square field. The weighing lysimeters were used to measure mass, which was converted to relative soil water storage with 0.05 mm accuracy at 5-minute intervals, and the 5-minute change in soil water storage was used along with precipitation and irrigation amounts to calculate crop evapotranspiration (ET), which is reported at 15-minute intervals. Although a quality control process was used, the ET data in this dataset are considered raw data. Advanced algorithms for detection of precipitation, dew and frost were applied in a separate process to determine ET values that are reported in files in a dataset entitled "Evapotranspiration and Water Balance Data for The Bushland, Texas Soybean Datasets". Those files have "water-balance" in their names. Each lysimeter was equipped with a suite of instruments to sense wind speed, air temperature and relative humidity, components of the radiation balance (e.g., net radiation, incoming and reflected shortwave, photosynthetically active radiation (PAR), incoming and reflected longwave, thermal infrared emitted by the plant/soil surface), soil heat flux, soil temperature, and soil volumetric water content at certain depths. Not all properties were sensed in every year, and the instruments used changed from season to season, which is why subsidiary datasets and data dictionaries are required for each season. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on soybean ET, crop coefficients, crop water productivity, and simulation modeling of crop growth, water use, and yield. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield and have been used by both USDA and university researchers.
See the README for descriptions of each data file.
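The ET calculation described above, in which 5-minute changes in soil water storage are combined with precipitation and irrigation and reported at 15-minute intervals, can be sketched as a simple water balance. This is a simplified illustration, not the dataset's processing code: it assumes drainage and runoff are zero, so that ET = precipitation + irrigation − change in storage, and the function and argument names are hypothetical.

```python
from typing import List, Sequence


def et_15min(
    storage_mm: Sequence[float],
    precip_mm: Sequence[float],
    irrig_mm: Sequence[float],
) -> List[float]:
    """Estimate ET (mm) per 15-minute window from 5-minute lysimeter data.

    storage_mm holds relative soil water storage at each 5-minute reading;
    precip_mm and irrig_mm hold the amounts received during each 5-minute
    interval between consecutive readings, so they are one element shorter.
    """
    n_steps = len(storage_mm) - 1
    if len(precip_mm) != n_steps or len(irrig_mm) != n_steps:
        raise ValueError("inputs must cover the same 5-minute intervals")

    et = []
    # Walk through complete 15-minute windows (three 5-minute intervals each).
    for start in range(0, n_steps - n_steps % 3, 3):
        delta_storage = storage_mm[start + 3] - storage_mm[start]
        water_in = sum(precip_mm[start:start + 3]) + sum(irrig_mm[start:start + 3])
        # Water balance with drainage and runoff assumed to be zero.
        et.append(water_in - delta_storage)
    return et
```

In a dry window, a storage drop of 0.3 mm gives ET of 0.3 mm. In a window with 1.0 mm of rain and a storage gain of 0.7 mm, the balance likewise gives 0.3 mm, which shows why precipitation and irrigation amounts are needed alongside the mass record.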