Dataset details
United States
7Q10 records and basin characteristics for 224 basins in South Carolina, Georgia, and Alabama (2015)
This data release provides the data and R scripts used for the 2018 publication titled "Improving predictions of hydrological low-flow indices in ungaged basins using machine learning", Environmental Modelling & Software, https://doi.org/10.1016/j.envsoft.2017.12.021. The release includes two .csv files and 14 R scripts. The lowflow_sc_ga_al_gagesII_2015.csv file contains the annual minimum seven-day mean streamflow with an annual exceedance probability of 90% (7Q10) for 224 basins in South Carolina, Georgia, and Alabama, along with 231 basin characteristics from the Gages II dataset (https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011). The all_preds.csv file contains the leave-one-out cross-validated predictions for all the models. The associated paper compares the ability of eight machine-learning models (elastic net, gradient boosting, kernel-k-nearest neighbors, two variants of support vector machines, M5-cubist, random forest, and a meta-learning ensemble M5-cubist model) and four baseline models (ordinary kriging, a unit-area discharge model, and two variants of censored regression) to generate estimates of the 7Q10 at 224 unregulated sites in South Carolina, Georgia, and Alabama.
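The leave-one-out cross-validation mentioned above can be illustrated with a minimal Python sketch. It is not taken from the release's R scripts: the function name, the made-up flows and drainage areas, and the particular form of the unit-area model (mean yield of the remaining basins times the held-out basin's drainage area) are all assumptions chosen for illustration.

```python
from statistics import mean


def loo_unit_area_predictions(q7q10, drainage_area):
    """Leave-one-out predictions from a simple unit-area discharge model.

    For each basin i, the mean yield (flow per unit drainage area) of all
    OTHER basins is multiplied by basin i's drainage area, so basin i's own
    observation never informs its prediction.
    """
    yields = [q / a for q, a in zip(q7q10, drainage_area)]
    preds = []
    for i, area in enumerate(drainage_area):
        others = yields[:i] + yields[i + 1:]  # hold out basin i
        preds.append(mean(others) * area)
    return preds


# Made-up values for illustration only (not from the data release).
flows = [2.0, 6.0, 3.0, 8.0]      # hypothetical 7Q10 values
areas = [10.0, 20.0, 30.0, 40.0]  # hypothetical drainage areas
preds = loo_unit_area_predictions(flows, areas)
for obs, pred in zip(flows, preds):
    print(f"observed={obs:.2f}  predicted={pred:.2f}")
```

Each row of a file such as all_preds.csv would then hold one held-out prediction per basin and model, though the actual column layout should be checked against the release itself.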
Data information
Related data
7Q10 Records and Basin Characteristics for 173 basins in Arkansas, Iowa, Kansas, Missouri, Nebraska, and Oklahoma (2017)
Public Data Portal
This data release replicates the methods detailed in the 2017 publication titled "Improving predictions of hydrological low-flow indices in ungaged basins using machine learning" for a different data set. The original data set and the associated readme file for the model archive can be viewed here: https://doi.org/10.5066/F7CR5S4T. The original data set contained streamflow data for sites located in South Carolina, Georgia, and Alabama. The data set used in this data release is for 6 states in the Southern Midwest U.S.A. The datafile contains the annual minimum seven-day mean streamflow with an annual exceedance probability of 90% (7Q10) for 173 basins in Arkansas (AR), Iowa (IA), Kansas (KS), Missouri (MO), Nebraska (NE), and Oklahoma (OK). The datafile also contains 231 basin characteristics from the Gages II dataset (https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011).
Summary of basin characteristics for National Hydrography Dataset, version 2 catchments in the southeastern United States, 1950 - 2010 at USGS streamflow-gaging stations
Public Data Portal
This dataset provides numerical and categorical descriptions of 48 basin characteristics for 956 basins with observed streamflow information at U.S. Geological Survey (USGS) streamflow-gaging stations. Characteristics are indexed by National Hydrography Dataset (NHD) version 2 COMID (an integer that uniquely identifies each feature in the NHD) and by USGS streamflow-gaging station number. The variables represent mutable and immutable basin characteristics and are organized by characteristic type: physical (5), hydrologic (6), categorical (12), climate (6), landscape alteration (7), and land cover (12). Mutable characteristics such as climate, land cover, and landscape alteration variables are reported in decadal increments (for example, average percent forest for the decades 1950-1959, 1960-1969, and so on). The majority of basin characteristics in this dataset were calculated using divergence-routing methods and are often referred to as “network-accumulated”. This method uses a modified routing database to navigate the NHDPlus reach network and aggregate (accumulate) the values derived at the reach catchment scale (Schwarz, G.E., and Wieczorek, M.E., 2018, Database of modified routing for NHDPlus version 2.1 flowlines: ENHDPlusV2_us: U.S. Geological Survey data release, https://doi.org/10.5066/P9PA63SM). In four instances, values are also provided for the entire catchment above a site and are designated using the “CAT_” prefix.
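The idea of network accumulation can be sketched in a few lines of Python. This is a simplified illustration, not the divergence-routing method used for the dataset: it assumes every reach drains to a single downstream reach (no divergences), and the reach identifiers, areas, and percent-forest values are invented.

```python
from collections import defaultdict


def accumulate(local_area, local_value, downstream):
    """Area-weighted accumulation of a catchment value through a reach network.

    local_area  : reach id -> area of that reach's own catchment
    local_value : reach id -> value for that catchment (e.g. percent forest)
    downstream  : reach id -> id of the next reach downstream, or None at an outlet
    Returns reach id -> area-weighted mean over the reach and everything upstream.
    """
    upstream = defaultdict(list)
    for reach, down in downstream.items():
        if down is not None:
            upstream[down].append(reach)

    total_area = {}
    weighted_sum = {}

    def visit(reach):
        if reach in total_area:  # already accumulated
            return
        area = local_area[reach]
        wsum = local_area[reach] * local_value[reach]
        for up in upstream[reach]:
            visit(up)
            area += total_area[up]
            wsum += weighted_sum[up]
        total_area[reach] = area
        weighted_sum[reach] = wsum

    for reach in local_area:
        visit(reach)
    return {r: weighted_sum[r] / total_area[r] for r in local_area}


# Hypothetical three-reach network: reaches 1 and 2 both drain into reach 3.
areas = {1: 10.0, 2: 30.0, 3: 10.0}
forest = {1: 80.0, 2: 40.0, 3: 0.0}
downstream = {1: 3, 2: 3, 3: None}
accumulated = accumulate(areas, forest, downstream)
print(accumulated)
```

Handling divergences would additionally require a routing fraction on each link, which is the kind of information a modified routing database such as ENHDPlusV2_us is described as providing.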
Machine-learning model predictions and groundwater-quality rasters of specific conductance, total dissolved solids, and chloride in aquifers of the Mississippi Embayment
Public Data Portal
Groundwater is a vital resource in the Mississippi embayment of the central United States. An innovative approach using machine learning (ML) was employed to predict groundwater salinity, including specific conductance (SC), total dissolved solids (TDS), and chloride (Cl) concentrations, across three drinking-water aquifers of the Mississippi embayment. An ML approach was used because it accommodates a large and diverse set of explanatory variables, does not assume monotonic relations between predictors and response data, and produces results that can be extrapolated to areas of the aquifer not sampled. These aspects of ML allowed potential drivers and sources of high-salinity water that have been hypothesized in other studies to be included as explanatory variables. The ML approach integrated output from a groundwater-flow model and water-quality data to predict salinity, and the approach can be applied to other aquifers to provide context for the long-term availability of groundwater resources. The Mississippi embayment includes two principal regional aquifer systems: the surficial aquifer system, dominated by the Quaternary Mississippi River Valley Alluvial aquifer (MRVA), and the Mississippi embayment aquifer system, which includes deeper Tertiary aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling focused on the MRVA, middle Claiborne aquifer (MCAQ), and lower Claiborne aquifer (LCAQ). Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were developed to predict SC and Cl to 1-kilometer (km) raster grid cells of the National Hydrologic Grid (Clark and others, 2018) for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework of Hart and others (2008). TDS maps were created using the correlation between SC and TDS.
Explanatory variables for the BRT models included attributes associated with well location and construction, surficial variables (such as soils and land use), and variables extracted from a MODFLOW groundwater flow model for the Mississippi embayment (Haugh and others, 2020a; Haugh and others, 2020b). Prediction intervals were calculated for SC and Cl by bootstrapping raster-cell predictions following methods from Ransom and others (2017). For a full description of modeling workflow and final model selection see Knierim and others (2020).
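As a rough illustration of bootstrapping a prediction, the Python sketch below resamples training data with replacement, refits a model each time, and takes percentiles of the resulting predictions at one location. It does not reproduce the procedure of Ransom and others (2017): a least-squares line stands in for the BRT model, and the function names, data values, and 90-percent level are assumptions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bootstrap_interval(x, y, x_new, n_boot=500, alpha=0.10, seed=0):
    """Percentile bootstrap interval for the fitted prediction at x_new."""
    rng = random.Random(seed)
    n = len(x)
    preds = []
    while len(preds) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:  # a line cannot be fitted to one distinct x
            continue
        slope, intercept = fit_line(xs, ys)
        preds.append(intercept + slope * x_new)
    preds.sort()
    lower = preds[int((alpha / 2) * (n_boot - 1))]
    upper = preds[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Made-up predictor and response values for illustration only.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
lower, upper = bootstrap_interval(x_obs, y_obs, x_new=9.0)
print(f"90% bootstrap interval at x=9: [{lower:.2f}, {upper:.2f}]")
```

An interval built this way reflects variability from refitting the model only; the published rasters may account for uncertainty differently, so Knierim and others (2020) remains the reference for the actual workflow.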
Summary of basin characteristics for National Hydrography Dataset, version 2 catchments in the southeastern United States, 1950 - 2010 at 12-digit hydrologic unit code (HUC12) pour points
Public Data Portal
This dataset provides numerical and categorical descriptions of 48 basin characteristics for 9,314 ungaged basins coinciding with 12-digit hydrologic unit code (HUC12) pour points that drain to the Gulf of Mexico. Characteristics are indexed by National Hydrography Dataset (NHD) version 2 COMID (an integer that uniquely identifies each feature in the NHD) and HUC12 identifying number. The variables represent mutable and immutable basin characteristics and are organized by characteristic type: physical (5), hydrologic (6), categorical (12), climate (6), landscape alteration (7), and land cover (12). Mutable characteristics such as climate, land cover, and landscape alteration variables are reported in decadal increments (for example, average percent forest for the decades 1950-1959, 1960-1969, and so on). The majority of basin characteristics in this dataset were calculated using divergence-routing methods and are often referred to as “network-accumulated”. This method uses a modified routing database to navigate the NHDPlus reach network and aggregate (accumulate) the values derived at the reach catchment scale (Schwarz, G.E., and Wieczorek, M.E., 2018, Database of modified routing for NHDPlus version 2.1 flowlines: ENHDPlusV2_us: U.S. Geological Survey data release, https://doi.org/10.5066/P9PA63SM). In four instances, values are also provided for the entire catchment above a site and are designated using the “CAT_” prefix.
Prediction grids of pH for the Mississippi River Valley Alluvial and Claiborne Aquifers
Public Data Portal
Groundwater is a vital resource to the Mississippi embayment region of the central United States. Regional and integrated assessments of water availability that link physical flow models and water quality in principal aquifer systems provide context for the long-term availability of these water resources. An innovative approach using machine learning was employed to predict groundwater pH across drinking-water aquifers of the Mississippi embayment. The region includes two principal regional aquifer systems: the Mississippi River Valley alluvial (MRVA) aquifer and the Mississippi embayment aquifer system, which includes several regional aquifers and confining units. Based on the distribution of groundwater use for drinking water, the modeling effort focused on the MRVA, Middle Claiborne aquifer (MCAQ), and Lower Claiborne aquifer (LCAQ) of the Mississippi embayment aquifer system. Boosted regression tree (BRT) models (Elith and others, 2008; Kuhn and Johnson, 2013) were used to predict pH to 1-km raster grid cells of the National Hydrologic Grid (Clark and others, 2018). Predictions were made for 7 aquifer layers (1 MRVA, 4 MCAQ, 2 LCAQ) following the hydrogeologic framework used in a regional groundwater flow model (Hart and others, 2008). Explanatory variables for the BRT models included attributes associated with well position and construction, surficial variables, and variables extracted from a MODFLOW groundwater flow model for the Mississippi embayment (Haugh and others, 2020a,b). For a full description of the modeling workflow, see Knierim and others (2020).