Dataset details
United States
Machine Learning Modeling of Water Quality Based Risk Assessment
These are the geospatial and hydroclimate input data used to develop data-driven machine learning (ML) models, together with the model-estimated water-quality-based risk metrics and the watershed health composite measure, for three river basins in the Midwest. Model outputs used to construct the figures in the paper are provided in the Excel file, along with definitions of the data reported in each datasheet. The directory of the GIS data used to construct the inputs and the spatially distributed risk metrics at the HUC-10 level is listed here and in the Scientific Data Management Plan. Portions of this dataset are inaccessible because of the large size of the GIS database. They can be accessed through the following means: C:\Users\MHantush\OneDrive - Environmental Protection Agency (EPA)\ScienceHUB\WH ML Modeling. Format: Generic GIS database.
Data information
Related data
Data to support Leveraging machine learning to automate regression model evaluations for large multi-site water-quality trend studies
Public Data Portal
This data release contains one dataset and one model archive in support of the journal article "Leveraging machine learning to automate regression model evaluations for large multi-site water-quality trend studies" by Jennifer C. Murphy and Jeffrey G. Chanat. The model archive contains scripts (run in R) to reproduce the four machine learning models (logistic regression, linear and quadratic discriminant analysis, and k-nearest neighbors) trained and tested as part of the journal article. The dataset contains the estimated probabilities for each of these models when applied to a training and test dataset.
Machine learning model that estimates total monthly and annual per capita public-supply water use (version 2.0)
Public Data Portal
This child item describes a machine learning model that was developed to estimate public-supply water use by water service area (WSA) boundary and 12-digit hydrologic unit code (HUC12) for the conterminous United States. This model was used to develop an annual and monthly reanalysis of public supply water use for the period 2000-2020. This data release contains model input feature datasets, python codes used to develop and train the water use machine learning model, and output water use predictions by HUC12 and WSA. Public supply water use estimates and statistics files for HUC12s are available on this child item landing page. Public supply water use estimates and statistics for WSAs are available in public_water_use_model.zip.
This page includes the following files:
PS_HUC12_Tot_2000_2020.csv - a csv file with estimated monthly public supply total water use from 2000-2020 by HUC12, in million gallons per day
PS_HUC12_GW_2000_2020.csv - a csv file with estimated monthly public supply groundwater use for 2000-2020 by HUC12, in million gallons per day
PS_HUC12_SW_2000_2020.csv - a csv file with estimated monthly public supply surface water use for 2000-2020 by HUC12, in million gallons per day
Note:
1) Groundwater and surface water fractions were determined using source counts as described in the 'R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units' child item.
2) Some HUC12s have estimated water use of zero because no public-supply water service areas were modeled within the HUC.
STAT_PS_HUC12_Tot_2000_2020.csv - a csv file with statistics by HUC12 for the estimated monthly public supply total water use from 2000-2020
STAT_PS_HUC12_GW_2000_2020.csv - a csv file with statistics by HUC12 for the estimated monthly public supply groundwater use for 2000-2020
STAT_PS_HUC12_SW_2000_2020.csv - a csv file with statistics by HUC12 for the estimated monthly public supply surface water use for 2000-2020
public_water_use_model.zip - a zip file containing input datasets, scripts, and output datasets for the public supply water use machine learning model
version_history_MLmodel.txt - a txt file describing changes in this version
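Because the monthly estimates are reported in million gallons per day, turning them into monthly volumes or an annual mean requires weighting by the number of days in each month. The sketch below illustrates that arithmetic only. It assumes each monthly value is a mean daily rate for that month, which the description does not state explicitly, and the function names are hypothetical rather than taken from the data release.

```python
import calendar


def monthly_volume_mgal(rate_mgd, year, month):
    """Convert a mean daily rate (million gallons per day) into a
    monthly volume (million gallons).

    Assumption: rate_mgd is the mean daily rate for the given month.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return rate_mgd * days_in_month


def annual_mean_mgd(monthly_rates_mgd, year):
    """Day-weighted annual mean rate from 12 monthly mean rates."""
    if len(monthly_rates_mgd) != 12:
        raise ValueError("expected 12 monthly values (January to December)")
    total_volume = sum(
        monthly_volume_mgal(rate, year, month)
        for month, rate in enumerate(monthly_rates_mgd, start=1)
    )
    days_in_year = 366 if calendar.isleap(year) else 365
    return total_volume / days_in_year
```

A plain average of the 12 monthly values would weight February the same as July; the day-weighted form avoids that small bias.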
Machine-learning model predictions and groundwater-quality rasters of chloride in aquifers of the Mississippi Embayment
Public Data Portal
Groundwater is a vital resource in the Mississippi embayment of the central United States. An innovative approach using machine learning (ML) was employed to predict groundwater salinity, including specific conductance (SC), total dissolved solids (TDS), and chloride (Cl) concentrations, across three drinking-water aquifers of the Mississippi embayment. An ML approach was used because it accommodates a large and diverse set of explanatory variables, does not assume monotonic relations between predictors and response data, and yields results that can be extrapolated to areas of the aquifer not sampled. These aspects of ML allowed potential drivers and sources of high-salinity water that have been hypothesized in other studies to be included as explanatory variables. The ML approach integrated output from a groundwater-flow model and water-quality data to predict salinity, and the approach can be applied to other aquifers to provide context for the long-term availability of groundwater resources. The Mississippi embayment includes two principal regional aquifer systems: the surficial aquifer system, dominated by the Quaternary Mississippi River Valley Alluvial aquifer (MRVA), and the Mississippi embayment aquifer system, which includes deeper Tertiary aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling focused on the MRVA, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ). Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were developed to predict SC and Cl to 1-kilometer (km) raster grid cells of the National Hydrologic Grid (Clark and others, 2018) for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework of Hart and others (2008). TDS maps were created using the correlation between SC and TDS.
Explanatory variables for the BRT models included attributes associated with well location and construction, surficial variables (such as soils and land use), and variables extracted from a MODFLOW groundwater-flow model for the Mississippi embayment (Haugh and others, 2020a; Haugh and others, 2020b). Prediction intervals were calculated for SC and Cl by bootstrapping raster-cell predictions following methods from Ransom and others (2017). For a full description of the modeling workflow and final model selection, see Knierim and others (2020).
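The general idea of bootstrapping a prediction for one location can be sketched as follows. This is a generic percentile bootstrap, not the procedure of Ransom and others (2017), whose details are not given here. A least-squares line stands in for the boosted regression tree model, and all function names, parameters, and defaults are illustrative assumptions. As written, the interval reflects only the variability of the refitted model and does not add residual noise.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line (slope, intercept).

    Stand-in model for illustration only; the source used boosted
    regression trees.
    """
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_interval(xs, ys, x_new, n_boot=500, alpha=0.10, seed=0):
    """Percentile bootstrap interval for the fitted prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_boot:
        # Resample the training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # a line cannot be fitted to a single distinct x
        boot_y = [ys[i] for i in idx]
        slope, intercept = fit_line(boot_x, boot_y)
        predictions.append(intercept + slope * x_new)
    predictions.sort()
    lower = predictions[int((alpha / 2) * (n_boot - 1))]
    upper = predictions[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper
```

Applied to a raster, the same resample-refit-predict loop would be repeated for every grid cell, with the lower and upper percentiles stored as two additional rasters.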
Supporting data for analysis of general water-quality conditions, long-term trends, and network analysis at selected sites within the Missouri Ambient Water-Quality Monitoring Network, water years 1993–2017
Public Data Portal
The U.S. Geological Survey (USGS), in cooperation with the Missouri Department of Natural Resources (MDNR), collects data pertaining to the surface-water resources of Missouri. These data are collected as part of the Missouri Ambient Water-Quality Monitoring Network (AWQMN) and are stored and maintained in the USGS National Water Information System (NWIS) database. These data constitute a valuable source of reliable, impartial, and timely information for developing an improved understanding of the water resources of the State. Water-quality data collected between water years 1993 and 2017 were analyzed for long-term trends, and the network was investigated to identify data gaps or redundant data to help MDNR optimize the network in the future. This is a companion data release product to the Scientific Investigations Report: Richards, J.M., and Barr, M.N., 2021, General water-quality conditions, long-term trends, and network analysis at selected sites within the Ambient Water-Quality Monitoring Network in Missouri, water years 1993–2017: U.S. Geological Survey Scientific Investigations Report 2021–5079, 75 p., https://doi.org/10.3133/sir20215079.
The following selected tables are included in this data release in compressed (.zip) format:
AWQMN_EGRET_data.xlsx -- Data retrieved from the USGS National Water Information System database that was quality assured and conditioned for network analysis of the Missouri Ambient Water-Quality Monitoring Network
AWQMN_R-QWTREND_data.xlsx -- Data retrieved from the USGS National Water Information System database that was quality assured and conditioned for analysis of flow-weighted trends for selected sites in the Missouri Ambient Water-Quality Monitoring Network
AWQMN_R-QWTREND_outliers.xlsx -- Data flagged as outliers during analysis of flow-weighted trends for selected sites in the Missouri Ambient Water-Quality Monitoring Network
AWQMN_R-QWTREND_outliers_quarterly.xlsx -- Data flagged as outliers during analysis of flow-weighted trends using a simulated quarterly sampling frequency dataset for selected sites in the Missouri Ambient Water-Quality Monitoring Network
AWQMN_descriptive_statistics_WY1993-2017.xlsx -- Descriptive statistics for selected water-quality parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network
The following selected graphics are included in this data release in .pdf format. Also included in this data release are web pages accessible for people with disabilities provided in compressed .zip format. The web pages present the same information as the .pdf files:
Annual and seasonal discharge trends.pdf -- Graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report.
Annual_and_seasonal_discharge_trends_htm.zip -- Compressed web page presenting graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report.
Graphics of simulated quarterly sampling frequency trends.pdf -- Graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report.
Graphics_of_simulated_quarterly_sampling_frequency_trends_htm.zip -- Compressed web page presenting graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network. Graphics provided to support the interpretations in the Scientific Investigations Report.
Graphics of median parameter values.pdf -- Graphics of median values for selected parameters at selected
Lake Erie HABs Modeling Dataset
Public Data Portal
The dataset includes hydroclimate and ambient environmental data as the inputs and the Cyanobacterial HABs Index (CI), calculated from satellite imagery, as the output. Together these were used to train and validate three data-driven (machine learning) models and their Ensemble Average (AE) to predict HABs cell count in southwest Lake Erie. The data also include HABs volumetric and areal concentrations obtained from the literature; these were used with the satellite-derived CI to develop statistical regression models that convert model-predicted CI values (cell counts) to volumetric or areal concentrations of HABs.
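An ensemble average of several models' predictions can be sketched as below. The description does not say whether the three models were weighted, so this example assumes a simple unweighted mean at each time step; the function name and the sample numbers are hypothetical.

```python
def ensemble_average(*model_predictions):
    """Element-wise unweighted mean across the models' prediction series.

    Assumption: every model predicts the same sequence of time steps,
    and all models are weighted equally.
    """
    if not model_predictions:
        raise ValueError("at least one prediction series is required")
    lengths = {len(series) for series in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all prediction series must have the same length")
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical CI predictions from three models for two time steps.
averaged = ensemble_average([1.0, 2.0], [2.0, 4.0], [3.0, 6.0])
```

Averaging tends to damp the errors of any single model, which is a common reason for reporting an ensemble alongside the individual models.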