Education Data Utilization and Support Service

Dataset Details

United States

GENOSIM

The genosim package simulates genotypes, breeding values, and phenotypes; simulates DNA sequence read depth (numbers of A and B alleles); and resolves SNP conflicts between parent and offspring genotypes.
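
The read-depth idea in the description can be illustrated with a small sketch. This is not genosim's code or algorithm: the function name, the symmetric error model, and the default error rate are assumptions made only for illustration. Each read is drawn as allele A or B according to the genotype (coded as the number of B alleles), with a small chance of reporting the wrong allele.

```python
import random


def simulate_read_counts(genotype, depth, error_rate=0.01, rng=None):
    """Return (n_A, n_B) read counts for one SNP (hypothetical helper).

    genotype:   number of B alleles carried (0, 1, or 2).
    depth:      total number of reads covering the SNP.
    error_rate: assumed chance that a read reports the wrong allele.
    """
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    rng = rng or random.Random()
    p_true_b = genotype / 2
    # Probability that a read is observed as B after a symmetric error.
    p_obs_b = p_true_b * (1 - error_rate) + (1 - p_true_b) * error_rate
    n_b = sum(rng.random() < p_obs_b for _ in range(depth))
    return depth - n_b, n_b


if __name__ == "__main__":
    rng = random.Random(42)
    for g in (0, 1, 2):
        print(g, simulate_read_counts(g, depth=12, rng=rng))
```

Low depth makes heterozygotes easy to miscall, because a genotype of 1 can yield reads of only one allele by chance.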

Data Information

Data portal: United States
META URL: https://catalog.data.gov/dataset/genosim-a0a48
License: cc-zero
Cost:
Provider: Department of Agriculture
Managing department:
Data:
- https://www.ars.usda.gov/research/software/download/?softwareid=496&modecode=80-42-05-30
- Landing page

Related Data

High-throughput genotyping of single nucleotide polymorphisms with rolling circle amplification

Public Data Portal

Background: Single nucleotide polymorphisms (SNPs) are the foundation of powerful complex trait and pharmacogenomic analyses. The availability of large SNP databases, however, has emphasized a need for inexpensive SNP genotyping methods of commensurate simplicity, robustness, and scalability. We describe a solution-based, microtiter plate method for SNP genotyping of human genomic DNA. The method is based upon allele discrimination by ligation of open circle probes followed by rolling circle amplification of the signal using fluorescent primers. Only the probe with a 3' base complementary to the SNP is circularized by ligation. Results: SNP scoring by ligation was optimized to a 100,000-fold discrimination against probe mismatched to the SNP. The assay was used to genotype 10 SNPs from a set of 192 genomic DNA samples in a high-throughput format. Assaying directly from genomic DNA eliminates the need to preamplify the target, as is done for many other genotyping methods. The sensitivity of the assay was demonstrated by genotyping from 1 ng of genomic DNA. We demonstrate that the assay can detect a single molecule of the circularized probe. Conclusions: Compatibility with homogeneous formats and the ability to assay small amounts of genomic DNA meet the exacting requirements of automated, high-throughput SNP scoring.

Jana Sperschneider - Melampsora lini genome assembly and RNA-seq data

Public Data Portal

The Melampsora lini genome was sequenced to improve its genome assembly. PacBio HiFi and Hi-C data were generated, as well as RNA-seq data.

FINDHAP

Public Data Portal

The findhap.f90 program finds haplotypes and imputes genotypes using multiple chip sets and sequence data. Program and download information can be found at the Animal Improvement Program (AIP) web site: http://aipl.arsusda.gov/software/findhap
Downloads:
- Version 4 program, example files, and executable (beta version: not quite ready for routine use on U.S. chip data, but performs better than version 3 for sequence data)
- Example data files for imputation study presented by VanRaden and Sun at the 2014 World Congress on Genetics Applied to Livestock Production

Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios

Public Data Portal

Background: The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets. Results: RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility. Conclusions: Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.

Comparison of complete nuclear receptor sets from the human,

Public Data Portal

Background: The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily. Results: Our analysis of the human genome identified two novel NR sequences. Both of these contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily. Conclusions: This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.

SNP analysis of the inter-alpha-trypsin inhibitor family heavy chain-related protein (IHRP) gene by a fluorescence-adapted SSCP method

Public Data Portal

Background: Single-nucleotide polymorphisms (SNPs) are considered to be useful polymorphic markers for genetic studies of polygenic traits. Single-stranded conformational polymorphism (SSCP) analysis has been widely applied to detect SNPs, including point mutations in cancer and congenital diseases. In this study, we describe a novel SSCP method in which PCR fragments are fluorescently labeled using a fluorescence-adapted primer. Methods: SNPs of the inter-alpha-trypsin inhibitor family heavy chain-related protein (IHRP) gene were analyzed using a fluorescence-adapted SSCP method. The method consists of two procedures: 1) a fluorescent labeling reaction of PCR fragments using fluorescence-adapted primers in a single tube, and 2) electrophoresis on a non-denaturing polyacrylamide gel. Results: For detecting the labeled fragments, this method was more economical and convenient than previously reported SSCP methods. In this study, eight SNPs of the IHRP gene were detected by the fluorescence-adapted SSCP. One of the SNPs was a new SNP resulting in an amino acid substitution, while the other SNPs have already been reported in the public databases. Six SNPs of the IHRP were associated with two haplotypes. Conclusions: The fluorescence-adapted SSCP was useful for detecting and genotyping SNPs.

Improved analytical methods for microarray-based genome-composition analysis

Public Data Portal

Genome-composition analysis using microarrays can be used to categorize genes into 'present' and 'divergent' categories. This involves selecting a signal value that is used as a cutoff to discriminate present and divergent genes, but this can result in the misclassification of many genes. A method is described that depends on the shape of the signal-ratio distribution and does not require empirical determination of a cutoff. Many genes previously classified as present using static methods are in fact divergent on the basis of microarray signal; this is corrected by our algorithm.

An algorithm for mapping positively selected members of quasispecies-type viruses

Public Data Portal

Background: Many RNA viruses do not have a single, representative genome but instead form a set of related variants that has been called a quasispecies. The sequence variability of such viruses presents a significant bioinformatics challenge. In order for the sequence information to be understood, the complete mutational spectrum needs to be distilled to a biologically relevant and analyzable representation. Results: Here, we develop a "selection mapping" algorithm, QUASI, that identifies the positively selected variants of viral proteins. The key to the selection mapping algorithm is the identification of particular replacement mutations that are overabundant relative to silent mutations at each codon (e.g., threonine at hemagglutinin position 262). Selection mapping identifies such replacement mutations as positively selected. Conversely, selection mapping recognizes negatively selected variants as mutational "noise" (e.g., serine at hemagglutinin position 262). Conclusion: Selection mapping is a fundamental improvement over earlier methods (e.g., dN/dS) that identify positive selection at codons but do not identify which amino acids at these codons confer selective advantage. Using QUASI's selection maps, we characterize the selected mutational landscapes of influenza A H3 hemagglutinin, HIV-1 reverse transcriptase, and HIV-1 gp120.
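
The counting step behind this idea (replacement mutations compared with silent mutations at each codon) can be sketched as follows. This is only a tally under simplifying assumptions, not the QUASI algorithm or its statistical test: the function name is hypothetical, variants are assumed to be aligned in-frame to a single reference, and the standard genetic code is used.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def tally_codon_changes(reference, variants):
    """Count silent and replacement changes per codon (hypothetical helper).

    Returns one dict per codon: {"silent": int, "replacement": Counter},
    where the Counter is keyed by the variant amino acid.
    """
    n_codons = len(reference) // 3
    tallies = [{"silent": 0, "replacement": Counter()} for _ in range(n_codons)]
    for seq in variants:
        for k in range(n_codons):
            ref_codon = reference[3 * k:3 * k + 3]
            var_codon = seq[3 * k:3 * k + 3]
            if var_codon == ref_codon:
                continue
            ref_aa = CODON_TABLE.get(ref_codon)
            var_aa = CODON_TABLE.get(var_codon)
            if ref_aa is None or var_aa is None:
                continue  # skip gaps, ambiguity codes, truncated codons
            if var_aa == ref_aa:
                tallies[k]["silent"] += 1
            else:
                tallies[k]["replacement"][var_aa] += 1
    return tallies


if __name__ == "__main__":
    ref = "ACTTCA"  # Thr, Ser
    print(tally_codon_changes(ref, ["ACCTCA", "GCTTCA", "ACTTCG"]))
```

An amino acid whose replacement count at a codon is large relative to that codon's silent count would be a candidate for positive selection; deciding what counts as "overabundant" requires a statistical test that this sketch leaves out.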

A hierarchical statistical model for estimating population properties of quantitative genes

Public Data Portal

Background: Earlier methods for detecting major genes responsible for a quantitative trait rely critically upon a well-structured pedigree in which the segregation pattern of genes exactly follows Mendelian inheritance laws. However, for many outcrossing species, such pedigrees are not available and genes also display population properties. Results: In this paper, a hierarchical statistical model is proposed to monitor the existence of a major gene based on its segregation and transmission across two successive generations. The model is implemented with an EM algorithm to provide maximum likelihood estimates for genetic parameters of the major locus. This new method is successfully applied to identify an additive gene having a large effect on stem height growth of aspen trees. The estimates of population genetic parameters for this major gene can be generalized to the original breeding population from which the parents were sampled. A simulation study is presented to evaluate finite sample properties of the model. Conclusions: A hierarchical model was derived for detecting major genes affecting a quantitative trait based on progeny tests of outcrossing species. The new model takes into account the population genetic properties of genes and is expected to enhance the accuracy, precision and power of gene detection.
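
The abstract mentions an EM algorithm for maximum likelihood estimation. The paper's hierarchical model is not reproduced here; as a rough illustration of the EM idea only, the sketch below fits a two-component normal mixture with a shared variance to phenotype values, a simplified stand-in for two genotype classes at a major locus. The function name, starting values, and data are hypothetical.

```python
import math


def em_two_component(values, n_iter=200):
    """Fit a two-component normal mixture with shared variance by EM.

    Returns (weight_of_component_1, mean_1, mean_2, variance).
    Hypothetical helper; a simplification, not the paper's model.
    """
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-12)
    mu1, mu2, w = min(values), max(values), 0.5
    for _ in range(n_iter):
        # E step: posterior probability that each value belongs to component 1.
        resp = []
        for v in values:
            a = w * math.exp(-((v - mu1) ** 2) / (2 * var))
            b = (1 - w) * math.exp(-((v - mu2) ** 2) / (2 * var))
            resp.append(a / (a + b))
        # M step: update the weight, the two means, and the shared variance.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * v for r, v in zip(resp, values)) / n1
        mu2 = sum((1 - r) * v for r, v in zip(resp, values)) / n2
        var = sum(r * (v - mu1) ** 2 + (1 - r) * (v - mu2) ** 2
                  for r, v in zip(resp, values)) / n
        var = max(var, 1e-12)  # guard against a collapsing variance
        w = n1 / n
    return w, mu1, mu2, var


if __name__ == "__main__":
    heights = [-1.0, 0.0, 1.0, 9.0, 10.0, 11.0]  # made-up phenotype values
    print(em_two_component(heights))
```

A large gap between the two fitted means relative to the shared standard deviation is the kind of signal that would suggest a major gene, although real analyses must also model family structure and transmission, as the abstract describes.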

High correlation between the turnover of nucleotides under mutational pressure and the DNA composition

Public Data Portal

Background: Any DNA sequence is a result of compromise between the selection and mutation pressures exerted on it during evolution. It is difficult to estimate the relative influence of each of these pressures on the rate of accumulation of substitutions. However, it is important to discriminate between the effect of mutations and the effect of selection when studying the phylogenetic relations between taxa. Results: We have tested in computer simulations, and analytically, the available substitution matrices for many genomes, and we have found that DNA strands in equilibrium under mutational pressure have a unique feature: the fraction of each type of nucleotide is linearly dependent on the time needed for substitution of half of the nucleotides of a given type, with a correlation coefficient close to 1. Substitution matrices found for sequences under selection pressure do not have this property. A substitution matrix for the leading strand of the Borrelia burgdorferi genome, having reached equilibrium in computer simulation, gives a DNA sequence with nucleotide composition and asymmetry corresponding precisely to the third positions in codons of protein coding genes located on the leading strand. Conclusions: Parameters of mutational pressure allow us to calculate the DNA composition in equilibrium with this mutational pressure. By comparing any real DNA sequence with the sequence in equilibrium, it is possible to estimate the distance between these sequences, which could be used as a measure of the selection pressure. Furthermore, the parameters of the mutational pressure enable direct estimation of the relative mutation rates in any DNA sequence in the studied genome.
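
The two quantities compared in this abstract, the equilibrium nucleotide composition and the time needed to substitute half of the nucleotides of a given type, can both be computed from a substitution matrix. The sketch below is a minimal illustration: the matrix values are made up (not taken from any genome), the function names are hypothetical, and the matrix is assumed to hold per-step substitution probabilities with rows summing to 1.

```python
import math

# Hypothetical per-step substitution probabilities; rows and columns
# are ordered A, T, G, C, and entry [i][j] is P(i is replaced by j).
ILLUSTRATIVE_MATRIX = [
    [0.97, 0.01, 0.015, 0.005],
    [0.01, 0.96, 0.01, 0.02],
    [0.03, 0.02, 0.94, 0.01],
    [0.01, 0.04, 0.01, 0.94],
]


def equilibrium_composition(matrix, tol=1e-12, max_iter=100_000):
    """Iterate freqs <- freqs * matrix until the composition stops changing."""
    n = len(matrix)
    freqs = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(freqs[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, freqs)) < tol:
            return nxt
        freqs = nxt
    raise RuntimeError("composition did not converge")


def half_substitution_times(matrix):
    """Steps until half of the original nucleotides of each type are substituted.

    The fraction of type i not yet substituted after t steps is
    matrix[i][i] ** t, so the half time is log(0.5) / log(matrix[i][i]).
    """
    return [math.log(0.5) / math.log(row[i]) for i, row in enumerate(matrix)]


if __name__ == "__main__":
    for name, f, t in zip("ATGC",
                          equilibrium_composition(ILLUSTRATIVE_MATRIX),
                          half_substitution_times(ILLUSTRATIVE_MATRIX)):
        print(f"{name}: equilibrium fraction {f:.3f}, half time {t:.1f} steps")
```

Plotting the equilibrium fractions against the half times for a real matrix is one way to check the linear relationship the abstract reports; the made-up matrix here is not expected to reproduce it.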