Modeled daily salinity derived from multiple machine learning methodologies for 91 salinity monitoring sites in the northern Gulf of Mexico, 1980–2021
Public Data Portal
This data release consists of statistical predictions of daily salinity time series generated by the makESTUSAL software repository described by Asquith and others (2023b). The statistical methods included multiple machine learning methods, which produced the daily salinity predictions and the attendant credible uncertainties included in the data release. The geographic scope covers predictions for 91 locations within bays and estuaries of the Gulf of Mexico, United States. The 91 locations are organized into 15 salinity groups, which are reflected in the organizational structure of this data release. The input data files of imputed salinity (observations, response variable) and covariates (predictor variables) for the makESTUSAL software were created with companion software (covESTUSAL) (Asquith and others, 2023a). These input data are provided by Banks and others (2024).
Modeled daily salinity derived from multiple machine learning methodologies and generalized additive models for three salinity monitoring sites in Mobile Bay, northern Gulf of Mexico, 1980–2021
Public Data Portal
This data release reports results from generalized additive models (GAM), random forest models (RFM), and cubist models (CUB) for three salinity sites in Mobile Bay operated by the Dauphin Island Sea Lab (DIS). The sites are Meaher Park (DIS:MHPA1), Middle Bay Lighthouse (DIS:MBLA1), and Dauphin Island (DIS:DPIA1). The models predicted a 42-year daily salinity record from 1980 to 2021 at each site, based on incomplete imputed salinity records and several explanatory variables. The explanatory variables included daily streamflow from 8 United States Geological Survey (USGS) streamgages; daily minimum and maximum temperature; precipitation; vapor pressure; wind speed; wind direction; horizontal and vertical wind speed lagged from 0 to 7 days; altitude and azimuth of the sun and moon; and the positive and negative slopes of streamflow change over the previous 7 days. For each site, two models of each type (GAM, RFM, and CUB) were developed, one using even-year holdout and one using odd-year holdout. The final predicted salinity time series for each model type were derived by inverse-error-weighted pooling of the even- and odd-year model results. A similar methodology was used to pool the even- and odd-year models from the three model types into a time series of daily salinity predictions from the ensemble of models. Prediction interval estimates for the GAM, RFM, and CUB models, together with the pooled predictions of the model ensemble, were determined by applying model tests, as shown in the model input. The even- and odd-year models in the model input were used to determine the pooled predictions and prediction intervals. Variable importance is reported for the RFM and CUB models, and variable significance is reported for the GAM models. Predicted salinities differ from measured values, and some maximum predicted salinities may exceed the natural conditions expected in Mobile Bay.
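The inverse-error-weighted pooling described above can be illustrated with a minimal sketch. It is not taken from the data release or its software. The function names, the use of a single root-mean-square error per model as the "error", and all numeric values are assumptions made for illustration. The release may define its weights differently.

```python
from typing import List, Sequence


def inverse_error_weights(errors: Sequence[float]) -> List[float]:
    """Return weights proportional to 1/error, normalized to sum to 1.

    Assumption: each model is summarized by one positive error value
    (for example, an RMSE computed on its held-out years).
    """
    if not errors or any(e <= 0 for e in errors):
        raise ValueError("errors must be a non-empty sequence of positive values")
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def pool_predictions(
    predictions: Sequence[Sequence[float]], errors: Sequence[float]
) -> List[float]:
    """Pool per-model daily predictions with inverse-error weights."""
    if len(predictions) != len(errors):
        raise ValueError("need exactly one error value per model")
    n_days = len(predictions[0])
    if any(len(p) != n_days for p in predictions):
        raise ValueError("all models must predict the same days")
    weights = inverse_error_weights(errors)
    return [
        sum(w * p[day] for w, p in zip(weights, predictions))
        for day in range(n_days)
    ]


# Hypothetical daily salinity predictions from the two holdout models
# of a single model type (values are made up for illustration).
even_holdout_model = [10.0, 12.0, 20.0]
odd_holdout_model = [14.0, 12.0, 16.0]
holdout_errors = [1.0, 3.0]  # hypothetical errors; lower error, larger weight

pooled = pool_predictions([even_holdout_model, odd_holdout_model], holdout_errors)
print(pooled)
```

With these made-up errors the weights are about 0.75 and 0.25, so the pooled series stays closer to the model with the smaller error. Under the same assumptions, the function could also be applied to all six models (two per model type) to form an ensemble series.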
Geospatial representations of salinity monitoring site and bay and estuary group boundaries in the Gulf of Mexico
Public Data Portal
The polygon datasets were created to assist in visualizing the results of salinity modeling in Gulf of Mexico estuaries and bays. Statistical algorithms (Asquith and others, 2023) were developed to predict daily salinities for 91 salinity monitoring sites (Rodgers and Swarzenski, 2019) operated by 7 agencies in near-coastal United States waters of the Gulf of Mexico. These monitoring sites are assigned to 15 salinity groups that roughly correspond to distinct bays and estuaries. The statistical algorithms facilitate the study of trends and drivers of salinity in near-coastal waters. The groups polygon dataset consists of 15 polygons, each representing the outer boundary, or hull, of one of the 15 salinity groups. The site polygons dataset consists of 91 polygons, one per salinity monitoring site. The polygons were created using the Watershed Boundary Dataset, the National Hydrography Dataset, and aerial imagery. A detailed description of the polygon creation method is given in the metadata processing steps. Creation of the polygons was motivated by a need to construct visual cues (maps and map animations) for testing the veracity of the statistical algorithms.
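As a rough illustration of what a "hull" around a group of sites can mean, the sketch below computes a convex hull from point coordinates. This is a swapped-in technique and not the method used for the release, whose polygons were drawn from the Watershed Boundary Dataset, the National Hydrography Dataset, and aerial imagery. The coordinates are hypothetical, and treating longitude and latitude as planar x and y is a simplifying assumption.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Hypothetical site coordinates for one salinity group (not real sites).
group_sites = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
hull = convex_hull(group_sites)
print(hull)
```

The interior point (1.0, 1.0) is dropped and the four corner points remain. A real group boundary would also need to follow shorelines, which a convex hull does not do.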