Dataset details
United States
Dataset: An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models
The open dataset, software, and other files accompanying the manuscript "An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models," submitted for publication to Integrated Materials and Manufacturing Innovations. Machine learning and autonomy are increasingly prevalent in materials science, but existing models are often trained or tuned using idealized data as absolute ground truths. In actual materials science, "ground truth" is often a matter of interpretation and is more readily determined by consensus. Here we present the data, software, and other files for a study using as-obtained diffraction data as a test case for evaluating the performance of machine learning models in the presence of differing expert opinions. We demonstrate that experts with similar backgrounds can disagree greatly even for something as intuitive as using diffraction to identify the start and end of a phase transformation. We then use a logarithmic likelihood method to evaluate the performance of machine learning models in relation to the consensus expert labels and their variance. We further illustrate this method's efficacy in ranking a number of state-of-the-art phase mapping algorithms. We propose a materials data challenge centered around the problem of evaluating models based on consensus with uncertainty. The data, labels, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models or to propose alternative methods for evaluating algorithmic performance.
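The idea of scoring a model against consensus expert labels and their variance can be illustrated with a minimal sketch. It assumes the expert labels for a single quantity are summarized by a normal distribution. That distributional choice, the function names, and the example numbers are all assumptions made for illustration; they are not taken from the manuscript or its code.

```python
import math

def consensus_stats(labels):
    """Return the mean and sample variance of expert labels for one quantity."""
    n = len(labels)
    mean = sum(labels) / n
    var = sum((x - mean) ** 2 for x in labels) / (n - 1)
    return mean, var

def gaussian_log_likelihood(prediction, mean, var):
    """Log of the normal density with the given mean and variance, evaluated at prediction.

    Assumption: expert disagreement is modeled as Gaussian.
    """
    return -0.5 * (math.log(2 * math.pi * var) + (prediction - mean) ** 2 / var)

# Hypothetical expert labels for the start of a phase transformation
# (arbitrary units; these values are made up for the example).
expert_labels = [410.0, 420.0, 430.0, 440.0]
mean, var = consensus_stats(expert_labels)

# A prediction near the consensus scores higher than one far from it.
score_close = gaussian_log_likelihood(425.0, mean, var)
score_far = gaussian_log_likelihood(480.0, mean, var)
print(score_close > score_far)
```

Under this kind of score, a prediction is penalized less where the experts themselves disagree more (larger variance), which is the sense in which the evaluation accounts for label uncertainty.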
Related data
Workshop Data on Autonomous Methodologies for Accelerating X-ray Measurements
Public Data Portal
The National Institute of Standards and Technology and the International Centre for Diffraction Data co-hosted a workshop on 17-18 October 2023 to identify and prioritize the goals, challenges, and opportunities for critical and emerging technology needs within industry, with an emphasis on leveraging artificial intelligence, data-driven methodologies, and high-throughput and automated workflows for accelerating x-ray-based structural analysis for materials development and manufacturing. Participants, predominantly from industry, gathered in person at ICDD headquarters in Newtown Square, Pennsylvania. The data collected during this workshop is published in this data publication. This data is interpreted in the workshop report, which cites this dataset. Certain equipment, instruments, software, or materials, commercial or non-commercial, are identified in this dataset. Such identification does not imply recommendation or endorsement of any product or service by NIST, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
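Wysiwyg Studios Co., Ltd. - Facial Landmark Data
Public Data Portal
The goal is to acquire source data for "facial landmark data" for building AI training datasets, then refine and process it into AI training data and release it publicly.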
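Wysiwyg Studios Co., Ltd. - 3D Human-to-Human Interaction Data (3 or More People)
Public Data Portal
The goal is to acquire source data for "human-to-human interaction" data for building AI training datasets, then refine and process it into AI training data and release it publicly.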
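Wert Intelligence Co., Ltd. - Intellectual Property Law LLM Pre-training and Instruction Tuning Data
Public Data Portal
To support training of hyperscale AI for the judicial domain, sentences from each field are extracted and processed from raw intellectual property law data (statutes, court rulings, trial decision precedents, trial decision documents, and authoritative interpretations) to build instruction tuning data for hyperscale AI training in question answering and summarization.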
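Wysiwyg Studios Co., Ltd. - 3D Human-to-Human Interaction Data (2 People)
Public Data Portal
The goal is to acquire source data for "human-to-human interaction" data for building AI training datasets, then refine and process it into AI training data and release it publicly.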
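Wysiwyg Studios Co., Ltd. - Scenario-Based Facial Expression 3D Data
Public Data Portal
The goal is to acquire source data for "scenario-based facial expression 3D data" for building AI training datasets, then refine and process it into AI training data and release it publicly.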
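Raw Data for AI Training - Good Morning MBN
Public Data Portal
Provides raw data and metadata sets drawn from Maekyung Media Group's MBN broadcast videos for use in AI training and in research and development. (Video pricing and the delivery protocol are subject to negotiation.)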