Dataset Details
Data Safe Zone
KAIST - Document-Based Malware PDF Model Dataset
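Threat indicators found inside PDF files, for analyzing and classifying PDF malware.
[Overview]
o Threat indicators found inside PDF files, for analyzing and classifying PDF malware
[Features]
o Input files can be preprocessed and used as training data for AI/machine-learning models
o Preprocessing functions are built with external libraries and additional processing, and apply to malicious and benign files alike
[Use Cases]
o Research and education data
o Training data for AI/machine-learning models that detect and classify malicious PDFs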
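As a minimal sketch of the kind of preprocessing described above, the snippet below turns raw PDF bytes into a small feature dictionary by counting name tokens that are often inspected during malware triage. The keyword list, the function name `extract_pdf_features`, and the byte-level counting approach are all illustrative assumptions; the dataset's actual threat indicators and preprocessing functions are not specified on this page.

```python
from typing import Dict

# Hypothetical keyword list (an assumption for illustration only,
# not the dataset's real feature set).
KEYWORDS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def extract_pdf_features(data: bytes) -> Dict[str, int]:
    """Count raw occurrences of each keyword in the PDF byte stream.

    This is a naive substring count: it does not decode compressed
    streams or handle obfuscated (hex-escaped) names.
    """
    features = {kw.decode("ascii"): data.count(kw) for kw in KEYWORDS}
    features["size_bytes"] = len(data)
    return features


# Tiny hand-written byte string standing in for a real PDF file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"%%EOF\n"
)
features = extract_pdf_features(sample)
```

The same function is applied to every file regardless of label, which matches the note that preprocessing does not distinguish malicious from benign inputs.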
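Data Information
Related Data
KAIST - Document-Based Malware MS-Office Model Dataset
Public Data Portal
Threat indicators found inside PDF files, for analyzing and classifying MS-Office malware.
[Overview]
o Threat indicators found inside PDF files, for analyzing and classifying MS-Office malware
[Features]
o Input files can be preprocessed and used as training data for AI/machine-learning models
o Preprocessing functions are built with external libraries and additional processing, and apply to malicious and benign files alike
[Use Cases]
o Research and education data
o Training data for AI/machine-learning models that detect and classify malicious MS-Office documents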
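Korea Internet & Security Agency (KISA) - Original Malware Binaries
Public Data Portal
Malware samples that run in a PC environment. This dataset includes the date each sample was collected, the hash value of each malware binary, and the malware binary samples themselves.
● Encryption algorithm: AES-256
● Password: Kisa@infecteD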
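Because the dataset ships a hash value alongside each binary, a sample can be checked against its listed hash after extraction. The sketch below assumes the hash is SHA-256 in hexadecimal form; the page does not state which algorithm is used, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def matches_listed_hash(data: bytes, listed_hash: str) -> bool:
    """Compare the computed digest with a listed hash value.

    Assumption: the listed value is a SHA-256 hex string. Whitespace
    and letter case in the listed value are ignored.
    """
    return sha256_hex(data) == listed_hash.strip().lower()
```

If the published hashes turn out to use a different algorithm, only `sha256_hex` would need to change.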
Trojan Detection Software Challenge - image-classification-dec2020-holdout
Public Data Portal
Round 3 Holdout Dataset. The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform image classification. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 288 adversarially trained, human-level, image classification AI models using a variety of model architectures. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Counterintelligence Case Files
Public Data Portal
Documentation of activities designed to identify and prevent potential threats within all DHS components. These files may also contain executive summaries written by Division personnel used to brief the Secretary and Executive Secretariat. In addition, other Federal agencies may send copies of counterintelligence reports for reference to current DHS cases or the case file may contain derivative memos that describe a threat assessment compiled by outside sources.
Trojan Detection Software Challenge - image-classification-aug2020-train
Public Data Portal
Round 2 Training Dataset. The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform image classification. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 1104 trained, human-level, image classification AI models using a variety of model architectures. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Trojan Detection Software Challenge - image-classification-jun2020-train
Public Data Portal
Round 1 Training Dataset. The data being generated and disseminated is the training data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 1000 trained, human-level, image classification AI models using the following architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Errata: A software bug in the trigger embedding code caused 4 models in this dataset to carry a ground truth value of 'poisoned' even though no trigger was embedded. These models should not be used.
Models Without a Trigger Embedded: id-00000184, id-00000599, id-00000858, id-00001088
Google Drive Mirror: https://drive.google.com/open?id=1uwVt3UCRL2fCX9Xvi2tLoz_z-DwbU6Ce
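The errata above can be applied mechanically before training or evaluation. The following is a minimal sketch that drops the four affected model IDs from a list of IDs; the function name `drop_errata_models` and the idea that models are addressed by ID strings like these are assumptions for illustration, not part of the official tooling.

```python
from typing import Iterable, List

# Model IDs named in the Round 1 errata: labeled 'poisoned' in the
# ground truth, but no trigger was actually embedded.
ERRATA_IDS = frozenset(
    {"id-00000184", "id-00000599", "id-00000858", "id-00001088"}
)


def drop_errata_models(model_ids: Iterable[str]) -> List[str]:
    """Return the model IDs in their original order, minus the errata IDs."""
    return [m for m in model_ids if m not in ERRATA_IDS]


# Hypothetical slice of model IDs for demonstration.
example = ["id-00000183", "id-00000184", "id-00000185"]
kept = drop_errata_models(example)
```

Filtering by ID rather than editing the ground truth keeps the original labels untouched, so the exclusion is easy to audit or undo.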
Trojan Detection Software Challenge - image-classification-jun2020-holdout
Public Data Portal
Round 1 Holdout Dataset. The data being generated and disseminated is the holdout data used to evaluate trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 1000 trained, human-level, image classification AI models using the following architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Trojan Detection Software Challenge - image-classification-sep2022-train
Public Data Portal
Round 11 Train Dataset. This is the training data used to create and evaluate trojan detection software solutions. This data, generated at NIST, consists of image classification AIs trained on synthetic image data built from Cityscapes. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 288 AI models using a small set of model architectures. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.
Trojan Detection Software Challenge - image-classification-aug2020-holdout
Public Data Portal
Round 2 Holdout Dataset. The data being generated and disseminated is the holdout data used to evaluate trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 144 trained, human-level, image classification AI models using a variety of architectures. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.
Trojan Detection Software Challenge - image-classification-feb2021-test
Public Data Portal
Round 4 Test Dataset. The data being generated and disseminated is the test data used to construct trojan detection software solutions. This data, generated at NIST, consists of human-level AIs trained to perform image classification. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 288 adversarially trained, human-level, image classification AI models using a variety of model architectures. The models were trained on synthetically created image data of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the images when the trigger is present.