Probability distribution grids of dissolved oxygen and dissolved manganese concentrations at selected thresholds in drinking water depth zones, Central Valley, California
Public Data Portal
The ASCII grids represent regional probabilities that groundwater at a particular location will have dissolved oxygen (DO) concentrations less than selected threshold values representing anoxic groundwater conditions, or dissolved manganese (Mn) concentrations greater than selected threshold values representing secondary drinking water-quality contaminant levels (SMCL) and health-based screening levels (HBSL) for water quality. The probability models were constrained by the alluvial boundary of the Central Valley to a depth of approximately 300 meters (m). We used boosted regression trees (BRT) with a Bernoulli error distribution, applied within a statistical learning framework in the R computing environment (http://www.r-project.org/), to produce two-dimensional probability grids at selected depths throughout the modeling domain. The statistical learning framework seeks to maximize the predictive performance of machine learning methods through model tuning by cross-validation. Models were constructed using measured DO and Mn concentrations sampled from 2,767 wells within the alluvial boundary of the Central Valley and more than 60 predictor variables from 7 sources (see metadata), which together represent regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Previously developed Central Valley model outputs of textures (Central Valley Textural Model, CVTM; Faunt and others, 2010) and MODFLOW-simulated vertical water fluxes and predicted depth to water table (Central Valley Hydrologic Model, CVHM; Faunt, 2009) were used to represent aquifer textures and groundwater hydraulics, respectively. Predictor variable values were attributed to the wells used in the BRT models in ArcGIS using a 500-m buffer.
The response variable data consisted of measured DO and Mn concentrations from 2,767 wells within the alluvial boundary of the Central Valley. The data were compiled from two sources: the U.S. Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS at http://waterdata.usgs.gov/ca/nwis/nwis) and the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB at http://geotracker.waterboards.ca.gov/gama/). Only wells with well depth data were selected, and for wells with multiple records, only the most recent sample in the period 1993–2014 that had the required water-quality data was used. Data were available for 932 wells from the NWIS dataset and 1,835 wells from the SWRCB-DDW dataset. Models were trained on the NWIS dataset and evaluated on the SWRCB-DDW dataset, which served as an independent hold-out dataset. We used cross-validation to assess the predictive performance of models of varying complexity as a basis for selecting the final models used to create the prediction grids. Trained models were applied to cross-validation testing data and to the separate hold-out dataset to evaluate predictive performance, emphasizing three metrics of model fit: Kappa, accuracy, and the area under the receiver operating characteristic (ROC) curve. The final trained models were used for mapping predictions at discrete depths to a depth of approximately 300 m. Trained DO and Mn models had accuracies of 86–100 percent, Kappa values of 0.69–0.99, and ROC values of 0.92–1.0. Model accuracies for cross-validation testing datasets were 82–95 percent, and ROC values were 0.87–0.91, indicating good predictive performance. Kappa values for the cross-validation testing dataset were 0.30–0.69, indicating fair to substantial agreement between testing observations and model predictions.
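The three metrics named above can be illustrated with a minimal Python sketch. It is not the code used in the data release (which was built in R); the function names and the eight example wells are hypothetical, and the 0.5 probability cutoff used to turn probabilities into classes is an assumption made only for illustration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predicted classes that match the observed classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement between observed and predicted classes, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class counts, summed over classes.
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)

def roc_auc(y_true, scores):
    """Probability that a random positive case scores higher than a random negative case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical wells: 1 = concentration beyond the threshold, 0 = not.
observed = [1, 0, 0, 1, 0, 0, 1, 0]
probability = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.3, 0.2]
# Assumed cutoff of 0.5 to convert predicted probabilities into classes.
predicted = [1 if p >= 0.5 else 0 for p in probability]

print(accuracy(observed, predicted))
print(cohen_kappa(observed, predicted))
print(roc_auc(observed, probability))
```

Kappa discounts the agreement expected by chance, so it can be much lower than accuracy when one class dominates the dataset.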
Hold-out data were available for the manganese model only and indicated accuracies of 89–97 percent, ROC values of 0.73–0.75, and Kappa values of 0.06–0.30. The
Data for Elevated Manganese Concentrations in United States Groundwater, Role of Land Surface-Soil-Aquifer Connections
Public Data Portal
Chemical data from 43,334 wells were used to examine the role of land surface-soil-aquifer connections in producing elevated manganese concentrations (>300 µg/L) in United States (U.S.) groundwater. Elevated manganese and dissolved organic carbon (DOC) concentrations were associated with shallow water tables and organic-carbon rich soils, suggesting soil-derived DOC supported manganese reduction. Manganese and DOC concentrations were higher near rivers than farther from rivers, suggesting river-derived DOC also supported manganese reduction. Anthropogenic nitrogen may also affect manganese concentrations in groundwater. In parts of the northeastern U.S. containing poorly buffered soils, ~40% of the samples with elevated manganese concentrations had pH values <6 and elevated concentrations of dissolved oxygen and nitrate relative to samples with pH ≥6, suggesting acidic recharge produced by the oxidation of ammonium in fertilizer helped mobilize manganese. An estimated 2.6 million people potentially consume groundwater with elevated manganese concentrations, the highest densities of which occur near rivers and in areas with organic-carbon rich soil. Results from this study indicate land surface-soil-aquifer connections play an important role in producing elevated manganese concentrations in groundwater used for human consumption.
Groundwater data, predictor variables, and rasters used for predicting the probability of high arsenic and high manganese in the Glacial Aquifer System, northern continental United States
Public Data Portal
This data release contains input data used in model development and TIF raster files used to predict the probability of high arsenic (As) and high manganese (Mn) in groundwater within the glacial aquifer system in the northern United States. Input data include measured As and Mn concentrations at groundwater wells, and associated predictor variable data. The probability of high As and high Mn was predicted using boosted regression tree methods using the gbm package in R version 4.0.0. The response variables for individual models were the occurrence of: (1) As >10 µg/L, and (2) Mn >300 µg/L. Water-quality data were compiled from three sources, as described in Wilson and others (2019): a compilation of data from numerous agencies and organizations at the state, regional, and local level; the U.S. Geological Survey National Water Information System; and the U.S. Environmental Protection Agency Safe Drinking Water Information System. The resultant dataset consisted of 10,001 As and 14,565 Mn measurements across the study area. A total of 108 predictor variables were originally considered for model development which included well characteristics, soil properties, aquifer properties, predicted nitrate, hydrologic position on the landscape, groundwater age, predicted pH, and predicted anoxic conditions. After model refinement, a total of 79 and 55 predictor variables were used for predicting the probability of high As and high Mn, respectively. The probability of high As and high Mn was predicted at two depths representative of public and domestic drinking water supply depths at a resolution of 1 km across the glacial aquifer.
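The binary response variables described above can be sketched in Python as follows. The thresholds come from the text; the function name, well identifiers, and concentrations are hypothetical, and this is not the code from the data release (which used the gbm package in R).

```python
# Thresholds stated in the text, in micrograms per liter.
THRESHOLDS_UG_L = {"As": 10.0, "Mn": 300.0}

def exceeds_threshold(element, concentration_ug_l):
    """Return 1 if the concentration is strictly above the element's threshold, else 0."""
    return 1 if concentration_ug_l > THRESHOLDS_UG_L[element] else 0

# Hypothetical well records: (well id, element, concentration in ug/L).
samples = [
    ("well_a", "As", 12.5),
    ("well_b", "As", 10.0),   # exactly at the threshold, so not an exceedance
    ("well_c", "Mn", 450.0),
    ("well_d", "Mn", 35.0),
]

# One 0/1 response per well; a classifier would be trained on values like these.
responses = {well: exceeds_threshold(element, conc) for well, element, conc in samples}
print(responses)
```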
The Measurement of Water Column Methylmercury Production Potential Rates in Four Oxygenated Reservoirs of the San Francisco Bay Watershed
Public Data Portal
Valley Water (formerly Santa Clara Valley Water District) provides stream stewardship, wholesale water supply, and flood protection for Santa Clara County, California, in the southern San Francisco Bay Area. A number of the reservoirs owned by the agency sit in the eastern foothills of the Santa Cruz Mountains above the city of San Jose and in the watershed below the former New Almaden Mining District, which was historically the largest mercury (Hg) mining district in North America. Aquatic biota in a number of these reservoirs have elevated levels of Hg, presumably as a result of their downstream location relative to the former mining district. The form of Hg that most readily accumulates in aquatic food webs is methylmercury (MeHg), which is produced from inorganic Hg in aquatic systems under oxygen-limited and anoxic conditions by specific microbial groups. Reservoirs have been shown to be particularly prone to elevated levels of MeHg and subsequent bioaccumulation due to their propensity for water column stratification and low-oxygen conditions in bottom waters and surface sediment. Water column aeration/oxygenation has been used in a number of reservoirs to mitigate in situ MeHg formation by changing the oxidation-reduction status in both bottom waters and surface sediment from conditions that are more conducive to microbial MeHg production (reducing conditions) to those that are less conducive (less reducing to oxidized conditions). As such, Valley Water installed hypolimnetic oxygenation systems in the downstream portions of three reservoirs (Almaden, Calero, and Guadalupe) contaminated with Hg associated with the New Almaden Mining District and in one reservoir (Stevens Creek) that is located in a watershed with no historical Hg mining.
Most studies of MeHg production over the last 45 years have focused on the shallow benthic zone, where this process was first identified, where it has been studied intensively, and where rates of MeHg formation are often readily measurable. More recently, however, a few studies have suggested, both by indirect evidence and direct measurement, that under certain conditions the formation of MeHg can also occur within the water column. Compared to MeHg production in surface sediment (measured directly using, for example, radiotracer or stable isotope approaches), MeHg formation in the water column is generally far more technically challenging to measure directly because the concentrations of both organic particulates and the bacteria involved in the process are low relative to their concentrations in surface sediment. In coordination with Valley Water, the U.S. Geological Survey investigated the potential for water column MeHg production in the four reservoirs noted above. Measurements were made at five depths (ranging from just below the water surface to just above the sediment surface) using a 24-hour stable isotope (200Hg(II)) amendment / bottle incubation approach with freshly collected water samples. Precise sampling depths were determined on site after profiling the complete water column with a water quality sonde and considering the profiles and inflection points associated with temperature, pH, specific conductance, and dissolved oxygen. Two sets of experiments were conducted. The first was during May 2019, just prior to the initiation of the seasonal water column aeration process. The second was during August 2019, after the aeration units had been in operation for approximately 3.5 months (May through August). In each case, two experimental treatments were applied to each set of water samples. The first treatment involved raw (unfiltered) water samples.
The second treatment involved the same water, but amended with additional suspended particulate material (SPM) collected from the same water collection site using either a 10 micron or 64 micron plankton net (composited vertical tows from the full water column). This second treatment was included to increase the abundance in the incubation
Multivariate regression model for predicting oxygen reduction rates in groundwater for the State of Wisconsin
Public Data Portal
A multivariate regression model was developed to predict zero-order oxygen reduction rates (mg/L/yr) in aquifers across the State of Wisconsin. The model used a combination of dissolved oxygen concentrations and mean groundwater ages estimated with sampled age tracers from wells in the U.S. Geological Survey National Water Information System and previously published project reports from state agencies and universities. The multivariate regression model was solved using the Microsoft Excel solver, with 461 wells used for training and 46 wells held out for validation. Of the 56 predictor variables tested, 31 were used for model development, including basic well characteristics, soil properties, aquifer properties, hydrologic position on the landscape, recharge and evapotranspiration rates, and land use characteristics. Model results indicate that the mean oxygen reduction rate for the training wells is 0.15 mg/L/yr (range 0.07 to 0.59 mg/L/yr), with a root mean weighted square error of 3.13 mg/L/yr and a coefficient of determination (r^2) of 0.49 for the holdout validation data. This data release includes the Microsoft Excel file that represents the final solved regression model, as well as an Excel file that describes all of the predictor variables that were tested with the model.
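As a rough illustration of what a zero-order rate means, the Python sketch below assumes dissolved oxygen declines linearly with groundwater age, C(t) = C0 - k*t, so that k = (C0 - C) / t. The initial concentration C0, the function names, and all numbers are assumptions for illustration only; the data release does not state how initial concentrations were handled, and its regression model was solved in Microsoft Excel, not with this code.

```python
def zero_order_rate(do_initial_mg_l, do_measured_mg_l, age_years):
    """Zero-order oxygen reduction rate k (mg/L/yr) from k = (C0 - C) / t."""
    if age_years <= 0:
        raise ValueError("groundwater age must be positive")
    return (do_initial_mg_l - do_measured_mg_l) / age_years

def predicted_do(do_initial_mg_l, rate_mg_l_yr, age_years):
    """Dissolved oxygen after age_years of zero-order decline, floored at zero."""
    return max(0.0, do_initial_mg_l - rate_mg_l_yr * age_years)

# Hypothetical well: assumed initial DO of 10 mg/L, measured DO of 4 mg/L,
# and a mean groundwater age of 40 years.
rate = zero_order_rate(10.0, 4.0, 40.0)
print(rate)                              # rate in mg/L/yr
print(predicted_do(10.0, rate, 100.0))   # oxygen is exhausted before 100 years
```

Under these assumptions, a constant rate implies the water becomes anoxic once its age exceeds C0 / k.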
Concentration Data for 12 Elements of Concern Used in the Development of Surrogate Models for Estimating Elemental Concentrations in Surface Water of Three Hydrologic Basins (Delaware River, Illinois River and Upper Colorado River)
Public Data Portal
The release of elements of concern (EoC) to surface water can involve both natural and anthropogenic sources. Elevated EoC concentrations can pose a risk to human health, wildlife, and ecosystem health, with the modes of toxicity and extent of risk varying as a function of the specific element, its chemical form, and the matrix with which it is associated (for example, dissolved versus particulate). As part of the U.S. Geological Survey (USGS) Water Mission Area (WMA) Water Quality Processes Program, the Proxies (Surrogate) Project was created, in part, to develop models that can be used to estimate the concentration of EoC in riverine surface water at spatial scales ranging from (sub)basin to multi-basin. Three hydrologic basins were selected for EoC proxy model development: the Delaware River Basin (DRB), the Illinois River Basin (ILRB), and the Upper Colorado (UCOL) River Basin. These basins are part of the USGS network of integrated water science basins identified by the USGS WMA Integrated Water Availability Assessments (IWAAs) program (www.usgs.gov/mission-areas/water-resources/science/integrated-water-availability-assessments-iwaas) and the Next Generation Water Observing System (NGWOS) program (www.usgs.gov/mission-areas/water-resources/science/next-generation-water-observing-system-ngwos). Based on a survey of NGWOS basin coordinators and others familiar with these three hydrologic units, 12 EoC were identified as contaminants of concern and targeted for potential detailed modeling by the Proxies project. These 12 EoC include seven transition metals (Cd, Cu, Cr, Fe, Hg, Mn, Zn), two post-transition metals (Al and Pb), one metalloid (As), one reactive nonmetal (Se), and one actinoid (U).
This data release contains surface water concentration data for the above 12 EoC and co-collected water quality parameter data (pH, temperature, specific conductance, turbidity, total suspended solids, and dissolved oxygen) from the DRB, ILRB, and UCOL, as obtained from the Water Quality Portal (WQP, www.waterqualitydata.us/). With over 1,500,000 observations, the primary EoC database includes both historic and contemporary measurements (collected between 1900 and 2022) across the three NGWOS basins. The initial data retrieval results were screened to exclude any aqueous data that were not from true surface water (groundwater, well water, sewer, and atmospheric measurements were excluded). A separate WQP data retrieval associated with additional sample location information was also conducted and merged with the initial EoC data retrieval results. The data were further harmonized (coded) for three water matrix fractions (filtered, particulate, and unfiltered), data source (USGS, EPA, and other), monitoring location type, analytical method, and the type of data censoring (if any). Elemental concentrations were also harmonized to common units, such that those associated with both the dissolved and whole water fractions are reported in micrograms per liter (µg/L) and those associated with the particulate fraction are reported on either a volumetric basis as µg/L or on a gravimetric basis as milligrams per kilogram (mg/kg), depending on the information retrieved. The various WQP data retrieval steps were automated using R scripts (R version 4.1.0), as were all the subsequent data conversion, data merging, data harmonization, and file clean-up steps. The R code is publicly available as a provisional software release (https://zenodo.org/record/6986087#.Y02Dh3bMJPb). This product comprises six data files in machine-readable (comma-separated value, *.csv) format.
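The unit harmonization step for dissolved and whole-water concentrations might look something like the Python sketch below. This is an illustration only and is not the R code from the software release; the set of input unit strings and the function name are assumptions, and real WQP retrievals may report units spelled in other ways.

```python
# Conversion factors to micrograms per liter (ug/L).
# The input unit spellings here are assumed for illustration.
TO_UG_PER_L = {
    "ug/l": 1.0,
    "mg/l": 1000.0,
    "ng/l": 0.001,
}

def to_ug_per_l(value, unit):
    """Convert a volumetric concentration to ug/L, or raise for unknown units."""
    key = unit.strip().lower()
    if key not in TO_UG_PER_L:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return value * TO_UG_PER_L[key]

# Hypothetical records: (element, reported value, reported unit).
records = [("Mn", 0.05, "mg/L"), ("Hg", 12.0, "ng/L"), ("Zn", 8.0, "ug/L")]
harmonized = [(element, to_ug_per_l(value, unit)) for element, value, unit in records]
print(harmonized)
```

Raising an error for unrecognized units, instead of passing values through unchanged, keeps unconverted values from being mixed into the harmonized column unnoticed.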
The primary EoC data results, along with the additional sampling site information, are provided as three individual files organized by NGWOS basin, which include: EoC_DRB.csv, EoC_ILRB.csv, and EoC_UCOL.csv. Similarly, the co-collected discrete ancillary water quality data are provided as three individual files organized by NGWOS basin, and include: ancillary_DRB.csv,