Dataset Details
United States
sequenceMiner algorithm
Detecting and describing anomalies in large repositories of discrete symbol sequences. **sequenceMiner has been open-sourced! Download the file below to try it out.** sequenceMiner was developed to address the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. It works by performing unsupervised clustering (grouping) of sequences, using the normalized longest common subsequence (LCS) as the similarity measure, followed by a detailed analysis of outliers to detect anomalies. sequenceMiner uses a new hybrid algorithm for computing the LCS that has been shown to outperform existing algorithms by a factor of five. It also includes new algorithms for outlier analysis that provide comprehensible indicators of why a particular sequence was deemed to be an outlier. This gives analysts a coherent description of the anomalies identified in a sequence and of how they differ from more “normal” sequences. sequenceMiner was developed with funding from the NASA Aviation Safety Program. In the commercial aviation domain, sequenceMiner can be used to discover atypical behavior in airline performance data that may have operational significance for safety analysts. Because the sequenceMiner approach is general and not restricted to any one domain, these algorithms can also be applied in other fields where anomaly detection and event mining would be useful.
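To make the similarity measure concrete, the following is a minimal sketch of a normalized LCS score between two symbol sequences. It uses the textbook dynamic-programming LCS, not sequenceMiner's hybrid algorithm, and it assumes one common normalization (LCS length divided by the geometric mean of the two sequence lengths). The function names `lcs_length` and `normalized_lcs` are illustrative and are not taken from the sequenceMiner source.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Textbook dynamic programming, O(len(a) * len(b)) time, keeping only
    one previous row of the table. This is NOT sequenceMiner's hybrid
    algorithm; it is the standard baseline.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """Similarity in [0, 1]: LCS length over sqrt(len(a) * len(b)).

    The normalization is an assumption made for this sketch.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

Under this sketch, identical sequences score 1.0 and sequences with no symbols in common score 0.0. A clustering step would then group sequences by this score and treat low-similarity members as candidate outliers.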
Data Information
Related Data
Stata code for analysis
Public Data Portal
Stata software code for the analysis of publicly available NHANES data.
Ceratocystis lukuohia spore dilution for probit analysis
Public Data Portal
Environmental DNA (eDNA) detection tools are becoming increasingly popular for documenting the occurrence and distribution of native and invasive species. These tools can allow early detection of new diseases and invasive species and provide critical information for land management. We designed two new samplers for monitoring airborne particulates, including fungal and fern spores and plant pollen, that rely on natural wind currents (Passive Environmental Sampler) or a battery-operated fan (Active Environmental Sampler). This dataset contains the results of an experiment designed to determine the probability of detecting known numbers of Ceratocystis lukuohia spores on individual slides in these samplers.
Vector Alignment Search Tool (VAST)
Public Data Portal
A computer algorithm that identifies similar protein 3-dimensional structures. Structure neighbors for every structure in MMDB are pre-computed and accessible via links on the MMDB Structure Summary pages.
Demonstration of the Sequence Alignment to Predict Across Species Susceptibility Tool for Rapid Assessment of Protein Conservation
Public Data Portal
Data file for "Vliet SMF, Hazemi M, Blatz D, Jensen M, Mayasich S, Transue TR, Simmons C, Wilkinson A, LaLone CA. Demonstration of the Sequence Alignment to Predict Across Species Susceptibility Tool for Rapid Assessment of Protein Conservation. J Vis Exp. 2023 Feb 10;(192). doi: 10.3791/63970. PMID: 36847398.". This dataset is associated with the following publication: Vliet, S., M. Hazemi, D. Blatz, M. Jensen, S. Mayasich, T. Transue, C. Simmons, A. Wilkinson, and C. Lalone. Demonstration of the Sequence Alignment to Predict Across Species Susceptibility Tool for Rapid Assessment of Protein Conservation. Journal of Visualized Experiments. JoVE, Somerville, MA, USA, 192, (2023).
Splign
Public Data Portal
Compute cDNA-to-Genomic sequence alignments.
Detection of Apis mellifera DNA in spiked flowers under laboratory and natural conditions, Chesterton Indiana, 2023-2024
Public Data Portal
The data being released were part of a project funded under Section 40804 (Ecosystem Restoration) of the Bipartisan Infrastructure Law (PL-117-58) in support of advancing a national revegetation effort. The data are from a series of DNA degradation experiments targeting the mitochondrial 16S rRNA gene of the European honeybee (Apis mellifera). This study sought to determine how various environmental conditions may affect eDNA left behind on flowers by bee visitation, and thus how they may affect monitoring bees via eDNA. The experiments occurred in the laboratory and in the natural environment in 2022 and 2023 using store-bought potted or natural flowers spiked with Apis mellifera DNA. Flower samples were processed to elute DNA, the DNA was extracted, and Apis mellifera DNA was quantified by qPCR. More information about the individual degradation experiments can be found in the Supplemental Information section.
YRC Baseline 9 Loci
Public Data Portal
TXT file of individual genotypes at 9 microsatellite loci. First row has the microsatellite loci names. Columns 1 and 2 are individual identifiers, followed by allele sizes for the 9 microsatellites, two alleles per locus.
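The layout described above can be read with a short parser. This is a sketch under stated assumptions: the file is taken to be whitespace-delimited (the description does not name the delimiter), the locus names are taken to be the last `n_loci` tokens of the header row, and allele sizes are kept as strings because the missing-data code is not documented. The function name `parse_genotypes` is hypothetical.

```python
def parse_genotypes(lines, n_loci=9):
    """Parse genotype rows: 2 identifier columns, then 2 alleles per locus.

    Assumptions (not stated in the dataset description): fields are
    whitespace-delimited, and the header's last n_loci tokens are the
    locus names. Allele sizes are left as strings.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    if not rows or len(rows[0]) < n_loci:
        raise ValueError("header row does not contain enough locus names")
    loci = rows[0][-n_loci:]

    records = []
    for fields in rows[1:]:
        if len(fields) != 2 + 2 * n_loci:
            raise ValueError(f"unexpected column count: {len(fields)}")
        ids = (fields[0], fields[1])
        alleles = fields[2:]
        # Pair consecutive allele columns and key each pair by its locus.
        genotype = {
            locus: (alleles[2 * i], alleles[2 * i + 1])
            for i, locus in enumerate(loci)
        }
        records.append((ids, genotype))
    return loci, records
```

For the actual file, the lines could be supplied by opening the TXT file and passing the file object directly, since iterating over it yields one line at a time.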