Dataset details
Poland
Extreme Robotics Sp. z o.o. - Automatic Data Explorer
Industrial research: Task No. 1. Research on data refinement and feature engineering algorithms
Stage tasks:
Task 1: Development of algorithms for statistical analysis of attribute values for data cleaning. The aim was to develop an algorithm that identifies the kind of attribute (scalar, discrete) and, depending on its type (text, number, date, text label, etc.), deduces which values can be considered correct and which are incorrect and introduce noise into the dataset, which in turn degrades the quality of the ML model.
Task 2: Development of algorithms for statistical analysis of data attributes in terms of optimal coding of learning vectors. The aim was to develop an algorithm that proposes an optimal coding of the learning vector used in the ML process and performs the appropriate conversion, depending on the type (text, number, date, text label, etc.), for each kind of attribute (scalar, discrete), e.g. converting text to a word-occurrence matrix format. It was necessary to anticipate the several conversion scenarios most often used in practice, drawn from the heuristic knowledge of experts.
Task 3: Development of a prototype of an automatic data cleaning and coding environment, and testing of the solution on samples of production data.
Industrial research: Task No. 2. Research on the meta-learning algorithm
Task 1: Review of existing meta-learning concepts and selection of algorithms for further development. The aim was to analyze the state of knowledge on meta-learning with a view to reusing existing research results in the project (task carried out under subcontract by a scientific unit).
Task 2: Review and development of the most commonly used ML algorithms in terms of their susceptibility to hyperparameter meta-learning and the practical usefulness of the resulting models. The aim was to develop a pool of basic algorithms to be used as production algorithms, i.e. the ones performing the actual predictions. The hyperparameters of these algorithms were the subject of meta-learning, so it was necessary to develop a model of interaction between the main algorithm and the individual production algorithms (task carried out under subcontract by a scientific unit).
Task 3: Development of a meta-learning algorithm for selected types of ML models. The aim was to develop the main algorithm that optimizes the hyperparameters of the production models. Because the hyperparameters have a different structure for each production model, the appropriate solution in practice was to use a separate optimization algorithm for each model.
Task 4: Development of a prototype of the algorithm and testing of the resulting models on production data.
Experimental development work: Task No. 3. Research on the prototype of the architecture of the platform implementation environment
Task 1: Development of the architecture of the data acquisition and storage module. The aim was to develop an architecture for a scalable ETL (Extract, Transform, Load) solution that implements the source data acquisition process (data ingest) efficiently. Parsing algorithms and the standardization of encodings for data of various types (e.g. dates, numbers) were considered with a view to effective further processing.
Task 2: Development of a module for configuring and executing data processing pipelines in a distributed architecture. Because of the high complexity of the implemented algorithms, an architecture was needed that allows successive data processing steps to be pipelined across multiple machines, with the option of a distributed architecture in a cloud and/or virtual environment. Existing distributed-architecture concepts, such as MapReduce, were considered.
Task 3: Development of a user interface enabling intuitive control
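The attribute-typing and text-coding steps described under Task No. 1 can be illustrated with a minimal sketch. This is not the project's algorithm, which the description does not specify: the function names, the ISO-only date format, and the `max_labels` threshold are all assumptions made for illustration.

```python
from datetime import datetime


def infer_attribute_type(values, max_labels=10):
    """Guess an attribute type from its string values (hypothetical heuristic)."""
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    def is_date(v):
        try:
            datetime.strptime(v, "%Y-%m-%d")  # assumption: ISO dates only
            return True
        except ValueError:
            return False

    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    if all(is_number(v) for v in non_empty):
        return "number"
    if all(is_date(v) for v in non_empty):
        return "date"
    # Few distinct single-token values: treat as a discrete text label.
    if len(set(non_empty)) <= max_labels and all(" " not in v for v in non_empty):
        return "text label"
    return "text"


def word_occurrence_matrix(docs):
    """Convert texts to a word-count matrix over a sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.lower().split():
            row[index[w]] += 1
        matrix.append(row)
    return vocab, matrix
```

A column classified as "text" could then be passed to `word_occurrence_matrix`, while a "text label" column would more naturally be one-hot encoded. A real system would need more date formats, locale-aware number parsing, and a statistical rather than rule-based test.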
Related data
Workshop Data on Autonomous Methodologies for Accelerating X-ray Measurements
Public Data Portal
The National Institute of Standards and Technology and the International Centre for Diffraction Data co-hosted a workshop on 17-18 October 2023 to identify and prioritize the goals, challenges, and opportunities for critical and emerging technology needs within industry, with an emphasis on leveraging artificial intelligence, data-driven methodologies, and high-throughput and automated workflows for accelerating x-ray-based structural analysis for materials development and manufacturing. Participants, predominantly from industry, gathered in person at ICDD headquarters in Newtown Square, Pennsylvania. The data collected during this workshop is published in this data publication. This data is interpreted in the workshop report, which cites this dataset. Certain equipment, instruments, software, or materials, commercial or non-commercial, are identified in this dataset. Such identification does not imply recommendation or endorsement of any product or service by NIST, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
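Electronics and Telecommunications Research Institute (ETRI) infrastructure 2D dynamic object detection training data
Public Data Portal
An AI training dataset for detecting surrounding dynamic objects as 2D bounding boxes at the infrastructure edge. Detailed information is available at the link below, where the full dataset can also be downloaded. https://nanum.etri.re.kr/share/teslasystem/Infra2DObjectDetection?lang=ko_KR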
Trojan Detection Software Challenge - image-classification-jun2020-train
Public Data Portal
Round 1 Training Dataset. The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 1000 trained, human-level image classification AI models using the following architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present. Errata: A software bug in the trigger embedding code caused 4 models trained for this dataset to have a ground truth value of 'poisoned' even though no trigger was embedded. These models should not be used. Models without a trigger embedded: id-00000184, id-00000599, id-00000858, id-00001088. Google Drive mirror: https://drive.google.com/open?id=1uwVt3UCRL2fCX9Xvi2tLoz_z-DwbU6Ce
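One way to follow the errata is to drop the four affected models before training or evaluating a detector. The sketch below uses the four IDs listed in the errata. The helper name is hypothetical, and how the list of model IDs is obtained (for example, from directory names) is an assumption about the local copy of the data.

```python
# Model IDs the errata lists as labelled 'poisoned' with no trigger embedded.
ERRATA_IDS = {"id-00000184", "id-00000599", "id-00000858", "id-00001088"}


def usable_models(model_ids):
    """Return model IDs in their original order, skipping the errata models."""
    return [m for m in model_ids if m not in ERRATA_IDS]


# Example with a hypothetical list of IDs:
print(usable_models(["id-00000183", "id-00000184", "id-00000185"]))
```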
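Morphing I Co., Ltd. (㈜모핑아이) - Internal and external water supply pipe data collected with an AI-equipped biomimetic robot
Public Data Portal
- To detect anomalies in water supply pipelines without damaging them, a soft-skinned biomimetic mobile robot is deployed inside the pipe, video and acoustic information is collected through various sensors and equipment, and a dataset is built for AI-based big data analysis that determines and predicts the presence of anomalies. <Data limitations> The external acoustic data was collected on the expectation that the sound would differ at anomalous sections inside the water pipe, but the differences between types of anomaly turned out to be small.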
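Builton (빌트온) - [E-commerce] Daily e-commerce robot vacuum cleaner review information, December 2020
Public Data Portal
※ Data introduction - Purchase review information for robot vacuum cleaner products sold on 17 Korean online marketplaces - Enables analysis of purchase review status by manufacturer, model, channel, and seller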
Trojan Detection Software Challenge - nlp-sentiment-classification-apr2021-train part2
Public Data Portal
Round 6 Train Dataset, part 2. This is the training data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform text sentiment classification on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 96 sentiment classification AI models using a small set of model architectures. The models were trained on text data drawn from product reviews. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.
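Yonsei University Wonju Industry-Academic Cooperation Foundation - [AI training dataset] Data for learning the object detection problem (image coordinates)
Public Data Portal
[Overview] An AI training dataset for learning the object detection problem [Learning objective] Extracting the location of food items on a meal tray and classifying the food [Provided fields] - Position coordinates of each menu item on the whole tray (X coordinate maximum/minimum, Y coordinate maximum/minimum)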
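Many object detection tools expect a box as a corner plus width and height rather than as minimum/maximum coordinates. A minimal conversion sketch follows. The function and parameter names are hypothetical, since the listing does not give the dataset's actual column names.

```python
def to_xywh(x_min, x_max, y_min, y_max):
    """Convert min/max box coordinates to (x, y, width, height)."""
    if x_max < x_min or y_max < y_min:
        raise ValueError("maximum coordinate is smaller than minimum")
    return (x_min, y_min, x_max - x_min, y_max - y_min)


# Example with made-up coordinates for one menu item on a tray image:
print(to_xywh(10, 50, 20, 80))
```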
Trojan Detection Software Challenge - nlp-sentiment-classification-apr2021-train
Public Data Portal
Round 6 Train Dataset. This is the training data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AIs trained to perform text sentiment classification on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 48 adversarially trained sentiment classification AI models using a small set of model architectures. The models were trained on text data drawn from movie and product reviews. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.
Trojan Detection Software Challenge - image-classification-sep2022-train
Public Data Portal
Round 11 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of image classification AIs trained on synthetic image data built from Cityscapes. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 288 AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.
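Electronics and Telecommunications Research Institute (ETRI) object composite state recognition training data
Public Data Portal
An AI training dataset for autonomous vehicles that represents the position of dynamic and static objects ahead as 2D bounding boxes and classifies each object's Class, Location, and Action. Detailed information is available at the link below, and the full dataset can be downloaded after completing an agreement form. https://nanum.etri.re.kr/share/kimjy/ObjectStateDetection?lang=ko_KR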