Dataset Details
United States
Key Real-World Applications of Classifier Ensembles
Broad classes of statistical classification algorithms have been developed and applied successfully to a wide range of real-world domains. In general, ensuring that the particular classification algorithm matches the properties of the data is crucial in providing results that meet the needs of the particular application domain. One way in which the impact of this algorithm/application match can be alleviated is by using ensembles of classifiers, where a variety of classifiers (either different types of classifiers or different instantiations of the same classifier) are pooled before a final classification decision is made. Intuitively, classifier ensembles allow the different needs of a difficult problem to be handled by classifiers suited to those particular needs. Mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real-world problems. In this paper, we survey select applications of ensemble methods to problems that have historically been most representative of the difficulties in classification. In particular, we survey applications of ensemble methods to remote sensing, person recognition, one vs. all recognition, and medicine.
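The pooling step described above can be illustrated with a minimal sketch of one common combination rule, majority voting. The three threshold classifiers, their names, and the labels "pos" and "neg" are hypothetical and chosen only for illustration; the surveyed paper does not prescribe this particular rule.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier here is any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], x: Sequence[float]) -> str:
    """Pool the predictions of all classifiers and return the most frequent label."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers, each looking at a different aspect of the input.
def by_first_feature(x: Sequence[float]) -> str:
    return "pos" if x[0] > 0.5 else "neg"


def by_second_feature(x: Sequence[float]) -> str:
    return "pos" if x[1] > 0.5 else "neg"


def by_sum(x: Sequence[float]) -> str:
    return "pos" if x[0] + x[1] > 1.0 else "neg"


ENSEMBLE = [by_first_feature, by_second_feature, by_sum]

if __name__ == "__main__":
    print(majority_vote(ENSEMBLE, [0.9, 0.2]))
    print(majority_vote(ENSEMBLE, [0.6, 0.3]))
```

With an odd number of binary classifiers a tie cannot occur; other ensemble schemes weight the votes or average predicted probabilities instead.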
Data Information
Related Data
Discriminative Mixed-Membership Models
Public Data Portal
Although mixed-membership models have achieved great success in unsupervised learning, they have not been widely applied to classification problems. In this paper, we propose a family of discriminative mixed-membership models for classification by combining unsupervised mixed-membership models with multi-class logistic regression. In particular, we propose two variants: one for text classification, based on latent Dirichlet allocation, and one for ordinary feature-vector classification, based on mixed-membership naive Bayes models. The proposed models allow the number of components in the mixed membership to differ from the number of classes. We propose two variational-inference-based algorithms for learning the models, including a fast variational inference that is substantially more efficient than mean-field variational approximation. Through extensive experiments on UCI and text classification benchmark datasets, we show that the models are competitive with the state of the art and can discover components not explicitly captured by the class labels.
Classification
Public Data Portal
A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs, and it can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output consists of one or more classes to which the corresponding input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. In this chapter, we explain several basic classification algorithms.
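The train-then-predict workflow described above can be sketched with a nearest-centroid classifier, one of the simplest classification algorithms. This is only an illustration under stated assumptions: the abstract does not say which algorithms the chapter covers, and the function names, the two-feature measurements, and the labels "A" and "B" are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (measurements, known class label)


def train_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    """Learn a model: the mean feature vector (centroid) of each class."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Classify an unseen input as the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training examples with known output values.
TRAINING = [
    ([1.0, 1.0], "A"),
    ([1.2, 0.8], "A"),
    ([4.0, 4.2], "B"),
    ([3.8, 4.0], "B"),
]

if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(predict(model, [1.0, 1.2]))
    print(predict(model, [4.1, 3.9]))
```

The dictionary returned by `train_centroids` plays the role of the model: it approximates the mapping from measurements to class and is applied to inputs that were not in the training set.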
Empirical Evaluation of Diagnostic Algorithm Performance Using a Generic Framework
Public Data Portal
A variety of rule-based, model-based, and data-driven techniques have been proposed for detection and isolation of faults in physical systems. However, there have been few efforts to comparatively analyze the performance of these approaches on the same system under identical conditions. One reason for this was the lack of a standard framework to perform this comparison. In this paper we introduce a framework, called DXF, that provides a common language to represent the system description, sensor data, and the fault diagnosis results; a run-time architecture to execute the diagnosis algorithms under identical conditions and collect the diagnosis results; and an evaluation component that can compute performance metrics from the diagnosis results to compare the algorithms. We have used DXF to perform an empirical evaluation of 13 diagnostic algorithms on a hardware testbed (ADAPT) at NASA Ames Research Center and on a set of synthetic circuits typically used as benchmarks in the model-based diagnosis community. Based on these empirical data we analyze the performance of each algorithm and suggest directions for future development.
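Gyeonggi-do Data Analysis: Analysis Dataset Classification Type Codes
Public Data Portal
This data is metadata that defines the analysis dataset classification type codes used in the Gyeonggi-do data analysis system. Its main fields are number (번호), dataset classification type code (데이터셋분류유형코드), dataset classification type name (데이터셋분류유형명), registrant ID (등록자ID), registration timestamp (등록일시), modifier ID (수정자ID), and modification timestamp (수정일시). The data identifies the classification scheme for datasets and supports code-based linkage with other datasets. Possible uses include systematizing data management at administrative agencies, easing topic-based data access for research institutions, and code mapping for private-sector data service development.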
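The code-based linkage mentioned above can be sketched as a simple lookup join. Everything concrete in this example is an assumption made for illustration: the English keys `type_code` and `type_name` stand in for 데이터셋분류유형코드 and 데이터셋분류유형명, and the code values and dataset names are hypothetical rather than taken from the published file.

```python
from typing import Dict, List, Optional

# Hypothetical rows of the classification type code table.
CODE_TABLE: List[Dict[str, str]] = [
    {"type_code": "C01", "type_name": "Population"},
    {"type_code": "C02", "type_name": "Transport"},
]

# Hypothetical datasets that carry only the classification type code.
DATASETS: List[Dict[str, str]] = [
    {"dataset": "dataset-a", "type_code": "C02"},
    {"dataset": "dataset-b", "type_code": "C01"},
    {"dataset": "dataset-c", "type_code": "C99"},  # code missing from the table
]


def attach_type_names(
    datasets: List[Dict[str, str]], code_table: List[Dict[str, str]]
) -> List[Dict[str, Optional[str]]]:
    """Add the type name to each dataset by looking up its type code."""
    names = {row["type_code"]: row["type_name"] for row in code_table}
    # Unknown codes map to None so they can be reviewed instead of silently dropped.
    return [{**d, "type_name": names.get(d["type_code"])} for d in datasets]


if __name__ == "__main__":
    for row in attach_type_names(DATASETS, CODE_TABLE):
        print(row)
```

Keeping unmatched codes as `None` is a design choice that makes gaps in the code table visible during data management.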
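Chungcheongnam-do Geumsan-gun Ginseng Cultivation Area Status Synthetic Data
Public Data Portal
This data is synthetic data produced by applying an AI generative model algorithm to the ginseng cultivation area status of Geumsan-gun. Synthetic data is virtual data whose patterns are statistically similar to those of the source data, and it thoroughly protects personal information through anonymization and statistical transformation techniques. The data contains 9,983 and 10,000 records for the CTGAN and GMM synthesis models, respectively. It can be used to identify trends in cultivation year and area for ginseng, a specialty crop of Geumsan-gun, in order to establish support policies and distribution plans. It can also be used to analyze spatial distribution and cultivation characteristics by land category and by town, township, or neighborhood, and, when combined with other synthetic data, to analyze relationships such as harvest yield by season and by date. Note: in the csv file, column D (cultivated area) and column E (actual area) are given in ㎡.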