New feature subset selection procedures for classification of expression profiles
공공데이터포털
Background Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier. Results We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes. Conclusion When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.
Vector algebra in the analysis of genome-wide expression data
공공데이터포털
Background Data from thousands of transcription-profiling experiments in organisms ranging from yeast to humans are now publicly available. How best to analyze these data remains an important challenge. A variety of tools have been used for this purpose, including hierarchical clustering, self-organizing maps and principal components analysis. In particular, concepts from vector algebra have proven useful in the study of genome-wide expression data. Results Here we present a framework based on vector algebra for the analysis of transcription profiles that is geometrically intuitive and computationally efficient. Concepts in vector algebra such as angles, magnitudes, subspaces, singular value decomposition, bases and projections have natural and powerful interpretations in the analysis of microarray data. Angles in particular offer a rigorous method of defining 'similarity' and are useful in evaluating the claims of a microarray-based study. We present a sample analysis of cells treated with rapamycin, an immunosuppressant whose effects have been extensively studied with microarrays. In addition, the algebraic concept of a basis for a space affords the opportunity to simplify data analysis and uncover a limited number of expression vectors to span the transcriptional range of cell behavior. Conclusions This framework represents a compact, powerful and scalable construction for analysis and computation. As the amount of microarray data in the public domain grows, these vector-based methods are relevant in determining statistical significance. These approaches are also well suited to extract biologically meaningful information in the analysis of signaling networks.
Quantitative assessment of the use of modified nucleoside triphosphates in expression profiling: differential effects on signal intensities and impacts on expression ratios
공공데이터포털
Background The power of DNA microarrays derives from their ability to monitor the expression levels of many genes in parallel. One of the limitations of such powerful analytical tools is the inability to detect certain transcripts in the target sample because of artifacts caused by background noise or poor hybridization kinetics. The use of base-modified analogs of nucleoside triphosphates has been shown to increase complementary duplex stability in other applications, and here we attempted to enhance microarray hybridization signal across a wide range of sequences and expression levels by incorporating these nucleotides into labeled cRNA targets. Results RNA samples containing 2-aminoadenosine showed increases in signal intensity for a majority of the sequences. These results were similar, and additive, to those seen with an increase in the hybridization time. In contrast, 5-methyluridine and 5-methylcytidine decreased signal intensities. Hybridization specificity, as assessed by mismatch controls, was dependent on both target sequence and extent of substitution with the modified nucleotide. Concurrent incorporation of modified and unmodified ATP in a 1:1 ratio resulted in significantly greater numbers of above-threshold ratio calls across tissues, while preserving ratio integrity and reproducibility. Conclusions Incorporation of 2-aminoadenosine triphosphate into cRNA targets is a promising method for increasing signal detection in microarrays. Furthermore, this approach can be optimized to minimize impact on yield of amplified material and to increase the number of expression changes that can be detected.
DNA loops and semicatenated DNA junctions
공공데이터포털
Background Alternative DNA conformations are of particular interest as potential signals to mark important sites on the genome. The structural variability of CA microsatellites is particularly pronounced; these are repetitive poly(CA) · poly(TG) DNA sequences spread in all eukaryotic genomes as tracts of up to 60 base pairs long. Many in vitro studies have shown that the structure of poly(CA) · poly(TG) can vary markedly from the classical right handed DNA double helix and adopt diverse alternative conformations. Here we have studied the mechanism of formation and the structure of an alternative DNA structure, named Form X, which was observed previously by polyacrylamide gel electrophoresis of DNA fragments containing a tract of the CA microsatellite poly(CA) · poly(TG) but had not yet been characterized. Results Formation of Form X was found to occur upon reassociation of the strands of a DNA fragment containing a tract of poly(CA) · poly(TG), in a process strongly stimulated by the nuclear proteins HMG1 and HMG2. By inserting Form X into DNA minicircles, we show that the DNA strands do not run fully side by side but instead form a DNA knot. When present in a closed DNA molecule, Form X becomes resistant to heating to 100°C and to alkaline pH. Conclusions Our data strongly support a model of Form X consisting in a DNA loop at the base of which the two DNA duplexes cross, with one of the strands of one duplex passing between the strands of the other duplex, and reciprocally, to form a semicatenated DNA junction also called a DNA hemicatenane.
Differential expressed mRNA and microRNA from expression profiling by RNA and small RNA sequencing
공공데이터포털
data was from HepG2 cells treated with nano-silver particles using silver nitrate as negative controls. Differentially expressed messenger RNA and microRNA were obtained by RNA sequencing and data analysis. Differentially expressed RNA and microRNA lists were than uploaded to Ingenuity Pathway Analysis to find the pathways altered by the differentially expressed genes. This dataset is associated with the following publication: Thai, S., C. Jones, B. Robinette, H. Ren, B. Vallanat, A. Fisher, and K. Kitchin. Effects of Silver Nanoparticles and Silver Nitrate on mRNA and microRNA Expression in Human Hepatocellular Carcinoma Cells (HepG2). Journal of Nanoscience and Nanotechnology. American Scientific Publishers, VALENCIA, CA, USA, 21(11): 5414-5428, (2021).