Dataset details
United States
Data for Elevated Manganese Concentrations in United States Groundwater, Role of Land Surface-Soil-Aquifer Connections
Chemical data from 43,334 wells were used to examine the role of land surface-soil-aquifer connections in producing elevated manganese concentrations (>300 µg/L) in United States (U.S.) groundwater. Elevated manganese and dissolved organic carbon (DOC) concentrations were associated with shallow water tables and organic-carbon rich soils, suggesting soil-derived DOC supported manganese reduction. Manganese and DOC concentrations were higher near rivers than farther from rivers, suggesting river-derived DOC also supported manganese reduction. Anthropogenic nitrogen may also affect manganese concentrations in groundwater. In parts of the northeastern U.S. containing poorly buffered soils, ~40% of the samples with elevated manganese concentrations had pH values <6 and elevated concentrations of dissolved oxygen and nitrate relative to samples with pH ≥6, suggesting acidic recharge produced by the oxidation of ammonium in fertilizer helped mobilize manganese. An estimated 2.6 million people potentially consume groundwater with elevated manganese concentrations, the highest densities of which occur near rivers and in areas with organic-carbon rich soil. Results from this study indicate land surface-soil-aquifer connections play an important role in producing elevated manganese concentrations in groundwater used for human consumption.
Data information
Related data
Arsenic, manganese, and pH groundwater quality data, selected well construction characteristics, and aquifer assignments for wells in the conterminous U.S.
Public Data Portal
This data release contains groundwater-quality data for three parameters of interest (arsenic, manganese, and pH) and well information for sample sites for aquifers in the conterminous U.S. Water-quality data and well information were derived from a dataset compiled from three sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS), the U.S. Environmental Protection Agency (USEPA) Safe Drinking Water Information System (SDWIS), and numerous agencies and organizations at the state, regional, and local level. The data compilation of the National Water Quality Program’s groundwater assessment team is an internal dataset informally referred to as the National Groundwater Aggregation (NGA). The current study of groundwater quality in the conterminous U.S. augments data compiled by others globally. Only geochemical parameters of interest (arsenic, manganese, pH) from wells in the National Groundwater Aggregation are presented; data from springs were not used. A table of site information includes attributes for each well, such as the state, water use code, depth, open interval (if available), and aquifer (if available). The provider of the water-quality data and well information is also in this table.
Data used to model and map manganese in the Northern Atlantic Coastal Plain aquifer system, eastern USA
Public Data Portal
Data used to model and map manganese concentrations in groundwater in the Northern Atlantic Coastal Plain (NACP) aquifer system, eastern USA, are documented in this data release. The model predicts manganese concentration within four classes and is based on concentration data from 4,492 wells. The well data were compiled from U.S. Geological Survey, U.S. Environmental Protection Agency, Suffolk County Water Authority (Suffolk County, New York), and state agency sources. The four concentration classes are based on guidelines for drinking water quality: below detection (class 1, less than 10 micrograms per liter (ug/L)); detected but less than the aesthetic guideline of 50 ug/L (class 2); greater than the aesthetic guideline but less than the health guideline of 300 ug/L (class 3); and greater than the health guideline of 300 ug/L (class 4). The thresholds of 50 ug/L and 300 ug/L are a Secondary Maximum Contaminant Level and a lifetime health advisory, respectively, from the U.S. Environmental Protection Agency for public water supplies. The model is built with the XGBoost machine learning method. Explanatory variables (predictors) include well depth, soil characteristics, hydrologic variables, groundwater residence time, and predicted values of pH and of the probability of low dissolved oxygen from previous machine learning models of the aquifer system. The data are provided in data tables, raster files, and model files, organized as follows. One data table describes the 27 explanatory variables used in the model (NACP_Mn_explanatory_variables.csv). A second data table holds the well data used to develop the models, which includes the manganese concentrations, concentration classes, regional aquifer, explanatory variables, and predicted concentration class for the wells (NACP_Mn_well_data.csv). 
There is a compressed group (zip file) of 10 files (one for each regional aquifer) for explanatory variable data used to make predictions for the regional aquifers (NACP_Mn_prediction_input_aquifers.zip). There are two zip files providing model output, one for predictions made for each aquifer in text format and one for tif-format rasters of predictions for each aquifer. The data release also contains a tif-format raster file of the prediction grid and a zip file with the model object file (R data format) and a script that can be used to run the model to produce the predictions provided in this data release. Filenames for prediction input and for model output are distinguished by codes abbreviating the aquifer name and position in the vertical stack of 19 regional aquifers and confining units, as follows: Surficial aquifer, 1surf; Upper Chesapeake aquifer, 3upch; Lower Chesapeake aquifer, 5loch; Piney Point aquifer, 7pipt; Aquia aquifer, 9aqia; Monmouth - Mt. Laurel aquifer, 11moml; Matawan aquifer, 13mtwn; Magothy aquifer, 15mgty; Potomac-Patapsco aquifer, 17popt; Potomac-Patuxent aquifer, 19popx. The nine confining units are not represented in the model or predictions.
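The four concentration classes described above can be sketched as a small lookup. This is only an illustration, not code from the data release: the function name is hypothetical, and it assumes that a value exactly equal to a threshold falls in the higher class, which the description does not specify for 50 and 300 ug/L.

```python
from bisect import bisect_right

# Class boundaries in micrograms per liter, taken from the description:
# 10 (detection), 50 (aesthetic guideline), 300 (health guideline).
THRESHOLDS_UG_L = (10.0, 50.0, 300.0)


def manganese_class(conc_ug_l: float) -> int:
    """Return the concentration class (1-4) for a manganese value in ug/L.

    Assumption: a value exactly on a threshold is assigned to the higher
    class; the data release may treat boundary values differently.
    """
    if conc_ug_l < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_right counts how many thresholds are <= the value.
    return bisect_right(THRESHOLDS_UG_L, conc_ug_l) + 1


# Made-up concentrations, one per class.
example_classes = [manganese_class(c) for c in (4.0, 35.0, 120.0, 410.0)]
```

With the made-up values above, `example_classes` holds one entry for each of the four classes in ascending order.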
Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California
Public Data Portal
The ASCII grids represent regional probabilities that groundwater at a particular location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions or will have dissolved manganese (Mn) concentrations greater than selected threshold values representing secondary drinking water-quality contaminant levels (SMCL) and health-based screening levels (HBSL) for water quality. The probability models were constrained by the alluvial boundary of the Central Valley to a depth of approximately 300 meters (m). We used boosted regression trees (BRT) with a Bernoulli error distribution, within a statistical learning framework implemented in R (http://www.r-project.org/), to produce two-dimensional probability grids at selected depths throughout the modeling domain. The statistical learning framework seeks to maximize the predictive performance of machine learning methods through model tuning by cross validation. Models were constructed from measured dissolved oxygen and manganese concentrations sampled from 2,767 wells within the alluvial boundary of the Central Valley and from more than 60 predictor variables from 7 sources (see metadata), and they incorporate regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Previously developed Central Valley model outputs of textures (Central Valley Textural Model, CVTM; Faunt and others, 2010) and MODFLOW-simulated vertical water fluxes and predicted depth to water table (Central Valley Hydrologic Model, CVHM; Faunt, 2009) were used to represent aquifer textures and groundwater hydraulics, respectively. Predictor variable values were attributed to the wells used in the BRT models in ArcGIS using a 500-m buffer. 
The response variable data consisted of measured DO and Mn concentrations from 2,767 wells within the alluvial boundary of the Central Valley. The data were compiled from two sources: U.S. Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS at http://waterdata.usgs.gov/ca/nwis/nwis) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB at http://geotracker.waterboards.ca.gov/gama/). Only wells with well depth data were selected, and for wells with multiple records, only the most recent sample in the period 1993–2014 that had the required water-quality data was used. Data were available for 932 wells from the NWIS dataset and 1,835 wells from the SWRCB-DDW dataset. Models were trained on the 932 NWIS wells and evaluated on the 1,835 SWRCB-DDW wells as an independent hold-out dataset. We used cross-validation to assess the predictive performance of models of varying complexity as a basis for selecting the final models used to create the prediction grids. Trained models were applied to cross-validation testing data and a separate hold-out dataset to evaluate model predictive performance by emphasizing three model metrics of fit: Kappa, accuracy, and the area under the receiver operating characteristic (ROC) curve. The final trained models were used for mapping predictions at discrete depths to a depth of approximately 300 m. Trained DO and Mn models had accuracies of 86–100 percent, Kappa values of 0.69–0.99, and ROC values of 0.92–1.0. Model accuracies for cross-validation testing datasets were 82–95 percent, and ROC values were 0.87–0.91, indicating good predictive performance. Kappa values for the cross-validation testing dataset were 0.30–0.69, indicating fair to substantial agreement between testing observations and model predictions. 
Hold-out data were available for the manganese model only and indicated accuracies of 89–97 percent, ROC values of 0.73–0.75, and Kappa values of 0.06–0.30.
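Accuracy and Kappa are both computed from the same confusion matrix, yet they can diverge sharply when one class is rare. The sketch below computes both for binary labels using the standard Cohen's kappa formula. The function name and the sample data are made up for illustration; it is not taken from the data release and is not offered as an explanation of the values reported above.

```python
def accuracy_and_kappa(observed, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length 0/1 sequences."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)

    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    if p_expected == 1.0:
        return p_observed, float("nan")  # kappa is undefined in this case
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Made-up, imbalanced example: 5 exceedances among 100 wells.
observed = [1] * 5 + [0] * 95
predicted = [1] + [0] * 4 + [1] + [0] * 94
acc, kappa = accuracy_and_kappa(observed, predicted)
```

In this made-up case the accuracy is 0.95 while Kappa is roughly 0.26, because most of the agreement comes from the abundant negative class and would be expected by chance.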
Data and Model Archive for Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York
Public Data Portal
Data and preliminary machine-learning models used to predict manganese and 1,4-dioxane in groundwater on Long Island are documented in this data release. Concentration data used to develop the models were from 910 wells for manganese and 553 wells for 1,4-dioxane, primarily public supply wells, from U.S. Geological Survey, U.S. Environmental Protection Agency (USEPA), and Suffolk County Water Authority sources. Thirty-two explanatory variables describe depth, groundwater flow, land use, soil properties, and other features of the aquifer system. The models use XGBoost, an ensemble tree machine learning method. Four models are documented for manganese, predicting the probability of concentrations relative to four thresholds: 10 micrograms per liter (detection), 50 micrograms per liter (the USEPA Secondary Maximum Contaminant Level), 150 micrograms per liter, and 300 micrograms per liter (the USEPA lifetime health advisory). One model is documented for 1,4-dioxane, predicting the probability of concentrations relative to 0.07 micrograms per liter (detection). The models were used to predict concentrations in two layers of the upper glacial aquifer and three layers of the Magothy aquifer. Predictions were made at a 500-square-foot resolution across the entire island for manganese and across Suffolk County, which occupies the eastern two-thirds of Long Island, for 1,4-dioxane. The data are provided in data tables, raster files, and model files. One data table describes the 32 explanatory variables (LI_mn_14dx_exp_vars.txt). One data table describes the well data and includes the manganese and 1,4-dioxane concentrations, explanatory variables, and predictions for the wells (LI_mn_14dx_well_data.txt). 
There is a compressed group (zip file) of five files providing the explanatory variable data used to make predictions for the five aquifer layers (LI_mn_14dx_predinput_griddata.zip) and a zip file of 25 files providing model predictions for each model and aquifer layer (LI_mn_14dx_predoutput_rasters.zip). The data release also contains a tif-format raster file of the prediction grid (LI_mn_14dx_prediction_grid.tif). The models are documented in a zip file (LI_mn_14dx_models.zip) that contains the model object files (R data format) and scripts that can be used to run the models to produce the predictions provided in this data release. Filenames for prediction input and for model output are distinguished by names and numbers as follows: 1_upper_glacial, top layer of the upper glacial aquifer; 3_upper_glacial, bottom layer of the upper glacial aquifer; 5_Magothy, top layer of the Magothy aquifer; 14_Magothy, middle layer of the Magothy aquifer; and 23_Magothy, bottom layer of the Magothy aquifer.
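Each of the four manganese models described above predicts a probability relative to one threshold, so each needs its own binary response variable. The sketch below shows one way such responses could be derived from measured concentrations. The function name and the sample values are hypothetical, and it assumes an exceedance means strictly greater than the threshold, which the description does not state.

```python
# Manganese thresholds in micrograms per liter, as listed in the description:
# detection, USEPA SMCL, an intermediate level, USEPA lifetime health advisory.
MN_THRESHOLDS_UG_L = (10.0, 50.0, 150.0, 300.0)


def binary_responses(concentrations_ug_l):
    """Map each threshold to a list of 0/1 flags, one flag per well.

    Assumption: a well is flagged 1 when its concentration is strictly
    greater than the threshold.
    """
    return {
        threshold: [1 if c > threshold else 0 for c in concentrations_ug_l]
        for threshold in MN_THRESHOLDS_UG_L
    }


# Made-up concentrations for four wells.
wells_ug_l = [4.0, 35.0, 120.0, 410.0]
responses = binary_responses(wells_ug_l)
```

Each entry of `responses` could then serve as the target for one binary classifier, giving four separate models from the same set of wells.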
Groundwater data, predictor variables, and rasters used for predicting the probability of high arsenic and high manganese in the Glacial Aquifer System, northern continental United States
Public Data Portal
This data release contains input data used in model development and TIF raster files used to predict the probability of high arsenic (As) and high manganese (Mn) in groundwater within the glacial aquifer system in the northern United States. Input data include measured As and Mn concentrations at groundwater wells, and associated predictor variable data. The probability of high As and high Mn was predicted using boosted regression tree methods using the gbm package in R version 4.0.0. The response variables for individual models were the occurrence of: (1) As >10 µg/L, and (2) Mn >300 µg/L. Water-quality data were compiled from three sources, as described in Wilson and others (2019): a compilation of data from numerous agencies and organizations at the state, regional, and local level; the U.S. Geological Survey National Water Information System; and the U.S. Environmental Protection Agency Safe Drinking Water Information System. The resultant dataset consisted of 10,001 As and 14,565 Mn measurements across the study area. A total of 108 predictor variables were originally considered for model development which included well characteristics, soil properties, aquifer properties, predicted nitrate, hydrologic position on the landscape, groundwater age, predicted pH, and predicted anoxic conditions. After model refinement, a total of 79 and 55 predictor variables were used for predicting the probability of high As and high Mn, respectively. The probability of high As and high Mn was predicted at two depths representative of public and domestic drinking water supply depths at a resolution of 1 km across the glacial aquifer.
American River At Rainbow Bridge Manganese ug/L Time Series Data
Public Data Portal
Measurements of Manganese collected at American River At Rainbow Bridge. Currently collected twice a year, previously collected quarterly. Access further information for this data set by contacting Bureau of Reclamation, California-Great Basin Region, Environmental Affairs Division (CGB-157). See ResultAttributes for STAFF_GAUGE, SMPL_DEPTH, SMPL_CATEGORY_NAME, METHOD_CODE, RESULT_RL, RESULT_RL-UNIT_STD_NAME, RESULT_MDL, RESULT_MDL-UNIT_STD_NAME, USBR_QA_SUBTYPE_NAME, USBR_QULFR_DESCRIPTION. STAFF_GAUGE is the water height in decimal feet measured by gauge (e.g., 15.2). SMPL_DEPTH is the vertical depth at which sample is collected (e.g., 0 - 15 cm). For water samples: depth below water/air interface. For sediment and soil samples: depth below water/solid or air/solid interface. SMPL_CATEGORY_NAME is the category type of sample (e.g., Composite). METHOD_CODE is the name of method used to obtain result (e.g., EPA 200.8). RESULT_RL is the result reporting limit (accounting for dilution) (e.g., 0.02). RESULT_RL-UNIT_STD_NAME is the unit associated with RESULT_RL (e.g., mg/L). RESULT_MDL is the result method detection limit (e.g., 0.007). RESULT_MDL-UNIT_STD_NAME is the unit associated with RESULT_MDL (e.g., mg/L). USBR_QA_SUBTYPE_NAME is the quality control type of the sample (e.g., USBR_BLANK_SPIKE). USBR_QULFR_DESCRIPTION is the quality assurance description (if any) (e.g., Result may have a high bias.).
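The series is reported in ug/L, while the example units given for RESULT_RL and RESULT_MDL are mg/L, so limits may need converting before they are compared with results. The sketch below is a minimal illustration, assuming that only "ug/L" and "mg/L" occur. The helper name and the dictionary layout of the record are hypothetical; only the field names and example values come from the description.

```python
from typing import Optional

# Conversion factors to micrograms per liter. Only the two units assumed
# here are covered; any other unit raises an error.
UG_PER_L_FACTOR = {"ug/L": 1.0, "mg/L": 1000.0}


def to_ug_per_l(value: Optional[float], unit: Optional[str]) -> Optional[float]:
    """Convert a limit to ug/L, or return None when value or unit is missing."""
    if value is None or unit is None:
        return None
    if unit not in UG_PER_L_FACTOR:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * UG_PER_L_FACTOR[unit]


# Hypothetical record built from the example values quoted in the description.
record = {
    "RESULT_RL": 0.02,
    "RESULT_RL-UNIT_STD_NAME": "mg/L",
    "RESULT_MDL": 0.007,
    "RESULT_MDL-UNIT_STD_NAME": "mg/L",
}
rl_ug_l = to_ug_per_l(record["RESULT_RL"], record["RESULT_RL-UNIT_STD_NAME"])
mdl_ug_l = to_ug_per_l(record["RESULT_MDL"], record["RESULT_MDL-UNIT_STD_NAME"])
```

With the example values, the reporting limit works out to about 20 ug/L and the method detection limit to about 7 ug/L.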