Education Data Utilization and Support Service

Dataset details

United States

Human members of the eukaryotic protein kinase family

Publicly available genetic sequence data were searched for human sequences that potentially represent protein kinases, important players in virtually every signaling pathway. After removal of duplicates, splice variants and pseudogenes, this search yielded 510 sequences with recognizable similarity to eukaryotic protein kinases.

Data information

Data portal
United States
META URL
https://catalog.data.gov/dataset/human-members-of-the-eukaryotic-protein-kinase-family
License
notspecified
Cost
Provider
U.S. Department of Health & Human Services
Managing department
Data
- Official Government Data Source
- Landing page

Related data

Multigene family isoform profiling from blood cell lineages

Public Data Portal

Background: Analysis of cell-selective gene expression for families of proteins of therapeutic interest is crucial when deducing the influence of genes upon complex traits and disease susceptibility. Presently, there is no convenient tool for examining isoform-selective expression for large gene families. A multigene isoform profiling strategy was developed and used to investigate the inwardly rectifying K+ (Kir) channel family in human leukocytes. Comprised of seven subfamilies, Kir channels have important roles in setting the resting membrane potential in excitable and non-excitable cells. Results: Gene sequence alignment allowed determination of "islands" of amino acid homology, and sub-family "centred" priming permitted simultaneous co-amplification of each family member. Validation and cross-priming analysis was performed against a panel of cognate Kir channel clones. Radiolabelling and diagnostic restriction digestion of pooled PCR products enabled determination of distinct Kir gene expression profiles in pure populations of human neutrophils, eosinophils and lung mast cells, with conservation of Kir2.0 isoforms amongst the leukocyte subsets. We also identified a Kir2.0 channel product, which may potentially represent a novel family member. Conclusions: We have developed a novel, rapid and flexible strategy for the determination of gene family isoform composition in any cell type with the additional capacity to detect hitherto unidentified family members and verified its application in a study of Kir channel isoform expression in human leukocytes.

Evaluation of thresholds for the detection of binding sites for regulatory proteins in

Public Data Portal

Background: Sites in DNA that bind regulatory proteins can be detected computationally in various ways. Pattern discovery methods analyze collections of genes suspected to be co-regulated on the evidence, for example, of clustering of transcriptome data. Pattern searching methods use sequences with known binding sites to find other genes regulated by a given protein. Such computational methods are important strategies in the discovery and elaboration of regulatory networks and can provide the experimental biologist with a precise prediction of a binding site or identify a gene as a member of a set of co-regulated genes (a regulon). As more variations on such methods are published, however, thorough evaluation is necessary, as performance may differ depending on the conditions of use. Detailed evaluation also helps to improve and understand the behavior of the different methods and computational strategies. Results: We used a collection of 86 regulons from Escherichia coli as datasets to evaluate two methods for pattern discovery and pattern searching: dyad analysis/dyad sweeping using the program Dyad-analysis, and multiple alignment using the programs Consensus/Patser. Clearly defined statistical parameters are used to evaluate the two methods in different situations. We placed particular emphasis on minimizing the rate of false positives. Conclusions: As a general rule, sensors obtained from experimentally reported binding sites in DNA frequently locate true sites as the highest-scoring sequences within a given upstream region, especially using Consensus/Patser. Pattern discovery is still an unsolved problem, although in the cases where Dyad-analysis finds significant dyads (around 50%), these frequently correspond to true binding sites. With more robust methods, regulatory predictions could help identify the function of unknown genes.

Public Data Portal

NIH Genetic sequence database; an annotated collection of all publicly available DNA sequences.

Evidence for large domains of similarly expressed genes in the

Public Data Portal

Background: Transcriptional regulation in eukaryotes generally operates at the level of individual genes. Regulation of sets of adjacent genes by mechanisms operating at the level of chromosomal domains has been demonstrated in a number of cases, but the fraction of genes in the genome subject to regulation at this level is unknown. Results: Drosophila gene-expression profiles that were determined from over 80 experimental conditions using high-density oligonucleotide microarrays were searched for groups of adjacent genes that show similar expression profiles. We found about 200 groups of adjacent and similarly expressed genes, each having between 10 and 30 members; together these groups account for over 20% of assayed genes. Each group covers between 20 and 200 kilobase pairs of genomic sequence, with a mean group size of about 100 kilobase pairs. Groups do not appear to show any correlation with polytene banding patterns or other known chromosomal structures, nor were genes within groups functionally related to one another. Conclusions: Groups of adjacent and co-regulated genes that are not otherwise functionally related in any obvious way can be identified by expression profiling in Drosophila. The mechanism underlying this phenomenon is not yet known.

The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases

Public Data Portal

Background: Protein fold recognition using sequence profile searches frequently allows prediction of the structure and biochemical mechanisms of proteins with an important biological function but unknown biochemical activity. Here we describe such predictions resulting from an analysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that are widespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving the oxidation of an organic substrate using a dioxygen molecule. Results: We employ sequence profile analysis to show that the DNA repair protein AlkB, the extracellular matrix protein leprecan, the disease-resistance-related protein EGL-9 and several uncharacterized proteins define novel families of enzymes of the 2OG-Fe(II) oxygenase superfamily. The identification of AlkB as a member of the 2OG-Fe(II) oxygenase superfamily suggests that this protein catalyzes oxidative detoxification of alkylated bases. More distant homologs of AlkB were detected in eukaryotes and in plant RNA viruses, leading to the hypothesis that these proteins might be involved in RNA demethylation. The EGL-9 protein from Caenorhabditis elegans is necessary for normal muscle function and its inactivation results in resistance against paralysis induced by the Pseudomonas aeruginosa toxin. EGL-9 and leprecan are predicted to be novel protein hydroxylases that might be involved in the generation of substrates for protein glycosylation. Conclusions: Here, using sequence profile searches, we show that several previously undetected protein families contain 2OG-Fe(II) oxygenase fold. This allows us to predict the catalytic activity for a wide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and bacteria.

Expression profiling of

Public Data Portal

A combination of linear RNA amplification and DNA microarray hybridization has allowed the determination of expression profiles of individual imaginal discs and larval tissues and the identification of genes expressed in tissue-specific patterns.

Molecular genetics and structural genomics of the human protein kinase C gene module

Public Data Portal

Background: Protein kinase C (PKC) has become a major focus among cell biologists interested in second-messenger signal transduction and much has been learned about differences in the cellular localization and function of its different isotypes. In this study we systematically address the genomic locations and gene structures of the human PKC gene module. Results: We first carried out fine chromosomal mapping of all nine PKC genes by fluorescence in situ hybridization (FISH), using cosmid and BAC probes. The PKC genes are found to be dispersed throughout the genome, and in some positions distinct from those previously reported: PKCα is at 17q24, PKCβ at 16p12, PKCγ at 19q13.4, PKCδ at 3p21.2, PKCε at 2p21, PKCζ at 1p36.3, PKCη at 14q22-23, PKCθ at 10p15 and PKCι at 3q26. For PKCι, an additional FISH signal mapped on Xq21.3 revealed a pseudogene (derived by retrotransposition). PKCγ, ζ, and θ are found to map to the most distal positions on the chromosomes, potentially implicating telomere position effects in their expression. Using the complete human genome draft sequence and bioinformatics tools, we then carried out a systematic analysis of PKC gene structure, including determination of the occurrence of single-nucleotide polymorphisms corresponding to the PKC loci. Conclusion: This resource of genomic information now facilitates investigation of the PKC gene module in structural chromosomal abnormalities and human disease locus mapping studies.

The society of genes: networks of functional links between genes from comparative genomics

Public Data Portal

Comparative genomics provides at least three methods for identifying functional links between genes: examination of phylogenetic distributions, analysis of conserved proximity and observations of fusions of genes into a multidomain gene in another organism. We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods.

DNA loops and semicatenated DNA junctions

Public Data Portal

Background: Alternative DNA conformations are of particular interest as potential signals to mark important sites on the genome. The structural variability of CA microsatellites is particularly pronounced; these are repetitive poly(CA) · poly(TG) DNA sequences spread in all eukaryotic genomes as tracts of up to 60 base pairs long. Many in vitro studies have shown that the structure of poly(CA) · poly(TG) can vary markedly from the classical right handed DNA double helix and adopt diverse alternative conformations. Here we have studied the mechanism of formation and the structure of an alternative DNA structure, named Form X, which was observed previously by polyacrylamide gel electrophoresis of DNA fragments containing a tract of the CA microsatellite poly(CA) · poly(TG) but had not yet been characterized. Results: Formation of Form X was found to occur upon reassociation of the strands of a DNA fragment containing a tract of poly(CA) · poly(TG), in a process strongly stimulated by the nuclear proteins HMG1 and HMG2. By inserting Form X into DNA minicircles, we show that the DNA strands do not run fully side by side but instead form a DNA knot. When present in a closed DNA molecule, Form X becomes resistant to heating to 100°C and to alkaline pH. Conclusions: Our data strongly support a model of Form X consisting in a DNA loop at the base of which the two DNA duplexes cross, with one of the strands of one duplex passing between the strands of the other duplex, and reciprocally, to form a semicatenated DNA junction also called a DNA hemicatenane.

Comparison of complete nuclear receptor sets from the human,

Public Data Portal

Background: The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily. Results: Our analysis of the human genome identified two novel NR sequences. Both these contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily. Conclusions: This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.
