Dataset Details
United States
Example Groundwater-Level Datasets and Benchmarking Results for the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) Software Package
This data release provides two example groundwater-level datasets used to benchmark the Automated Regional Correlation Analysis for Hydrologic Record Imputation (ARCHI) software package (Levy and others, 2024). The first dataset contains groundwater-level records and site metadata for wells located on Long Island, New York (NY) and some surrounding mainland sites in New York and Connecticut. The second dataset contains groundwater-level records and site metadata for wells located in the southeastern San Joaquin Valley of the Central Valley, California (CA). For ease of exposition these are referred to as NY and CA datasets, respectively. Both datasets are formatted with column headers that can be read by the ARCHI software package within the R computing environment.

These datasets were used to benchmark the imputation accuracy of three ARCHI model settings (OLS, ridge, and MOVE.1) against the widely used imputation program missForest (Stekhoven and Bühlmann, 2012). The ARCHI program was used to process the NY and CA datasets on monthly and annual timesteps, respectively, filter out sites with insufficient data for imputation, and create 200 test datasets from each of the example datasets with 5 percent of observations removed at random (herein, referred to as "holdouts"). Imputation accuracy for test datasets was assessed using normalized root mean square error (NRMSE), which is the root mean square error divided by the standard deviation of the observed holdout values. ARCHI produces prediction intervals (PIs) using a non-parametric bootstrapping routine, which were assessed by computing a coverage rate (CR) defined as the proportion of holdout observations falling within the estimated PI. The multiple regression models included with the ARCHI package (OLS and ridge) were further tested on all test datasets at eleven different levels of the p_per_n input parameter, which limits the maximum ratio of regression model predictors (p) per observations (n) as a decimal fraction greater than zero and less than or equal to one.

This data release contains ten tables formatted as tab-delimited text files.
The “CA_data.txt” and “NY_data.txt” tables contain 243,094 and 89,997 depth-to-groundwater measurement values (value, in feet below land surface) indexed by site identifier (site_no) and measurement date (date) for CA and NY datasets, respectively.
The “CA_sites.txt” and “NY_sites.txt” tables contain site metadata for the 4,380 and 476 unique sites included in the CA and NY datasets, respectively.
The “CA_NRMSE.txt” and “NY_NRMSE.txt” tables contain NRMSE values computed by imputing 200 test datasets with 5 percent random holdouts to assess imputation accuracy for three different ARCHI model settings and missForest using CA and NY datasets, respectively.
The “CA_CR.txt” and “NY_CR.txt” tables contain CR values used to evaluate non-parametric PIs generated by bootstrapping regressions with three different ARCHI model settings using the CA and NY test datasets, respectively.
The “CA_p_per_n.txt” and “NY_p_per_n.txt” tables contain mean NRMSE values computed for 200 test datasets with 5 percent random holdouts at 11 different levels of p_per_n for OLS and ridge models compared to training error for the same models on the entire CA and NY datasets, respectively.

References Cited
Levy, Z.F., Stagnitta, T.J., and Glas, R.L., 2024, ARCHI: Automated Regional Correlation Analysis for Hydrologic Record Imputation, v1.0.0: U.S. Geological Survey software release, https://doi.org/10.5066/P1VVHWKE.
Stekhoven, D.J., and Bühlmann, P., 2012, MissForest—non-parametric missing value imputation for mixed-type data: Bioinformatics 28(1), 112-118, https://doi.org/10.1093/bioinformatics/btr597.
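The two accuracy measures defined in the description (NRMSE and CR) can be illustrated with a minimal Python sketch. This is not the ARCHI implementation, which runs in R. The function names, the sample values, the use of the sample standard deviation (n - 1 denominator), and the treatment of PI bounds as inclusive are all assumptions made for illustration.

```python
import math
import statistics


def nrmse(observed, imputed):
    """RMSE of imputed holdouts divided by the standard deviation of the observed holdouts."""
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, imputed)) / len(observed)
    # Assumption: sample standard deviation (n - 1 denominator); the source does not specify.
    return math.sqrt(mse) / statistics.stdev(observed)


def coverage_rate(observed, lower, upper):
    """Proportion of holdout observations that fall within their prediction interval."""
    # Assumption: interval bounds are treated as inclusive.
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Made-up holdout values (feet below land surface), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
imputed = [11.0, 12.0, 13.0, 16.0]
pi_lower = [9.0, 11.0, 14.5, 15.0]
pi_upper = [11.0, 13.0, 15.5, 17.0]

print(nrmse(observed, imputed))
print(coverage_rate(observed, pi_lower, pi_upper))
```

Under these definitions, an NRMSE near zero means the imputation error is small relative to the spread of the held-out values, and a CR close to the nominal PI level suggests the intervals are well calibrated.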
Data Information
Related Data
Datasets for Integrating stream gage data and Landsat imagery to complete time-series of surface water extents in Central Valley, California
Public Data Portal
This data release comprises the data files and code necessary to perform all analyses presented in the associated publication. The *.csv data files are aggregations of water extent on the basis of the European Commission's Joint Research Centre (JRC) Monthly Water History database (v1.0) and the Dynamic Surface Water Extent (DSWE) algorithm. The shapefile dataset contains the study area 8-digit hydrologic unit code (HUC) regions used as the basis for analysis. HTML files provide an overview of the study workflow and integrated R notebooks (in .Rmd format) for recreating all project results and plots. The R notebooks ingest the necessary data files from their online locations. These data support the following publication: Walker JJ, Soulard CE, Petrakis RE. In press. Integrating stream gage data and Landsat imagery to complete time-series of surface water extents in Central Valley, California. International Journal of Applied Earth Observation and Geoinformation, http://dx.doi.org/xx.xxxxx/
Time-series water level and water quality data to accompany Scientific Investigations Report 2018-5040
Public Data Portal
This Data Release serves as a repository for a set of time-series data used in Scientific Investigations Report 2018-5040. The data represent continuous measurements of specific conductance, water temperature, and/or water level (stage), recorded by a variety of types of data loggers during three multi-day interference tests conducted on the Virgin River at Pah Tempe Springs during November 2013, February 2014, and November 2014. The data presented are the raw data downloaded from the data loggers and are organized according to the date of the test and the type and name of the observation site. The Data Release contains 3 items:
1. An explanatory table, "PahTempe_table1.xlsx", which indicates which parameters were collected and on what instrument at each site during a given test.
2. The data, "PahTempe_data.zip"; this zipped file contains the raw data logger files in comma-separated values (CSV) format, organized into folders according to the date of the interference pumping test.
3. The metadata document, "PahTempe_metadata.xml".
Because these data were collected during multi-day interference pumping tests, they do not represent natural hydrologic conditions in the river, springs, or shallow groundwater system. Users of these data are advised to refer to the larger work citation for proper use and interpretation of the data.
i08 GroundwaterElevationSeasonal Points
Public Data Portal
This dataset depicts groundwater level (expressed as elevation in feet above mean sea level, amsl) at selected monitoring locations (wells), by season and year. Other information on the monitoring location is also included. Water level monitoring locations and measurements used are selected based on measurement date and well construction information, where available, and approximate groundwater levels in the unconfined to uppermost semi-confined aquifers. This dataset was created to assist state and local agencies in assessing the status of groundwater levels statewide for recent seasons/years and in relation to other periods. For more information on this service, please contact gis@water.ca.gov.
Anza-Terwilliger study wells in Riverside County, California
Public Data Portal
This digital data set contains the locations, water-level altitudes, and water-level differences of 70 wells selected to document water-level changes between fall 2004 and spring 2005 in the Anza-Terwilliger area of Riverside County, California. The winter of 2005 was one of the wettest periods on record. Links to the U.S. Geological Survey National Water Information System website (NWISWeb) have been established so that recent water-level information can be viewed interactively by clicking on a specific well.
Hydrogeologic Data from the Cahuilla Valley and Terwilliger Valley Groundwater Basins, Riverside County, California, 2022 (ver. 2.0, August 2025)
Public Data Portal
The U.S. Geological Survey (USGS) entered into a cooperative study with the California Department of Water Resources and the Ramona Band of Cahuilla to characterize the hydrogeology of the Cahuilla Valley and Terwilliger Valley groundwater basins and surrounding water-bearing units, with the ultimate goal of developing a calibrated integrated hydrologic model to manage the groundwater supplies on a sustainable basis. A three-dimensional geologic framework model (GFM) was developed to quantify the structural geometry and distribution of water-bearing units in the groundwater basins, using borehole lithology and hydraulic information, geologic maps, and gravity-derived depth-to-basement information. This dataset includes (1) tabular data of selected boreholes with their location and construction information, (2) borehole lithology information, (3) a geographic information systems (GIS) shapefile of a cellular array containing interpolated elevations and thicknesses of modeled geologic units from the GFM in the format of a polygon feature class, and (4) a table of summary textural classes for the alluvial fill unit from borehole logs and summary textural classes used in the geologic framework model.
C2VSimFG Groundwater Head Observations
Public Data Portal