Dataset Details
United States
Supplementary material for Lee et al. 2019 Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets
Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008–2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variation explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. 
Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove analyst signal, this work provides a method to minimize analyst signal and improve detection of diatom association with TP in large datasets involving multiple analysts. Examination of variation in assemblage data explained by analyst and taxonomic harmonization may be necessary steps for improving data quality and the utility of diatoms as indicators of environmental variables. This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. ECOLOGICAL INDICATORS. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019).
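The harmonization steps described in the abstract (merging mismatched identifications into slash groups, elevating selected genera to the genus level, and omitting taxa in a difficult complex) can be illustrated with a minimal Python sketch. The rule tables and the sample counts below are hypothetical placeholders, not values from the dataset; only the three genus names and the example omitted species come from the abstract. The function names are likewise assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical slash-group rules: placeholder names, not rules from the dataset.
SLASH_GROUPS = {
    "Taxon A1": "Taxon A1/A2",
    "Taxon A2": "Taxon A1/A2",
}
# Genera the abstract says were elevated to the genus level.
GENUS_LEVEL = {"Nitzschia", "Diploneis", "Tryblionella"}
# One example member of the omitted small naviculoid complex (from the abstract).
OMITTED = {"Sellaphora atomoides"}


def harmonize_name(name):
    """Return the harmonized taxon name, or None if the taxon is omitted."""
    if name in OMITTED:
        return None
    if name in SLASH_GROUPS:
        return SLASH_GROUPS[name]
    genus = name.split()[0]
    if genus in GENUS_LEVEL:
        return genus
    return name


def harmonize_sample(counts):
    """Merge counts under harmonized names and return relative abundances."""
    merged = defaultdict(int)
    for name, n in counts.items():
        new_name = harmonize_name(name)
        if new_name is not None:
            merged[new_name] += n
    total = sum(merged.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in merged.items()}


# Hypothetical sample counts, for illustration only.
sample = {
    "Taxon A1": 10,
    "Taxon A2": 5,
    "Nitzschia palea": 20,
    "Nitzschia inconspicua": 5,
    "Sellaphora atomoides": 8,
    "Navicula lanceolata": 10,
}
relative = harmonize_sample(sample)
```

In this sketch relative abundances are recomputed after omission, so they sum to 1 over the retained taxa; whether the authors renormalized in the same way is not stated in the abstract.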
Related Data
Diatom and Environmental Data
Public Data Portal
Raw data associated with this research. This dataset is associated with the following publication: Yuan, L., R. Mitchell, E. Pilgrim, and N. Smucker. Inferences based on diatom compositions improve estimates of nutrient concentrations in streams. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 952: 176032, (2024).
Supplementary material for Lee et al. in review: Harmonization and Revision of a National Diatom Dataset for Use in the Development of Water Quality Indicators
Public Data Portal
ABSTRACT Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008-2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variability explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. 
Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove the analyst signal, this work clarifies the extent of the problem and provides a method to minimize analyst signal. Resolution of these taxonomic issues makes large datasets such as the NRSA more suitable for the development of diatom-based water quality indicators. This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. ECOLOGICAL INDICATORS. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019). NOTE: This dataset has been removed from public access due to revocation. Please refer inquiries regarding this dataset to the listed contact person.
Diatom and environmental data
Public Data Portal
These raw data associated with this research were collected as part of the United States Environmental Protection Agency's 2018-2019 National Rivers and Streams Assessment (NRSA). The worksheet "Water chemistry data" includes environmental variables examined in this study including total phosphorus (TP), total nitrogen (TN), conductivity, pH, and ecoregion. The worksheet "Diatom ASVs" includes relative abundances of gene sequence reads for each amplicon sequence variant (ASV), which are referred to as taxa. This dataset is associated with the following publication: Smucker, N., E. Pilgrim, C. Nietch, L. Gains-Germain, C. Carpenter, J. Darling, L. Yuan, R. Mitchell, and A. Pollard. Using DNA metabarcoding to characterize national scale diatom-environment relationships and to develop indicators in streams and rivers of the United States. SCIENCE OF THE TOTAL ENVIRONMENT. Elsevier BV, AMSTERDAM, NETHERLANDS, 939: 173502, (2024).
NARS Lake and Stream Predictor Dataset, V1
Public Data Portal
This dataset allows the user to explore the potential impacts of various environmental and anthropogenic drivers on observed growing season total phosphorus concentrations in lakes and streams across the United States.
Datasets to evaluate the effects of reducing the standard count size on multimetric diatom indices of ecological condition for U.S. rivers and streams
Public Data Portal
Diatom datasets from the 2008-2009 and 2013-2014 National Rivers and Streams Assessment, with associated site information. This dataset is associated with the following publication: Riato, L., J. Stoddard, A. Herlihy, and K. Blocksom. Reduced count size can provide a robust and more efficient diatom assessment of environmental conditions. Journal of Applied Ecology. Blackwell Publishing, Malden, MA, USA, 61(9): 2308-2320, (2024).
National Aquatic Resource Survey data
Public Data Portal
Surface water monitoring data from national aquatic surveys (lakes, streams, rivers). This dataset is associated with the following publication: Stoddard, J., J. Van Sickle, A. Herlihy, J. Brahney, S. Paulsen, D. Peck, R. Mitchell, and A. Pollard. Continental-scale increase in stream and lake phosphorus: Are oligotrophic systems disappearing in the U.S.? ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 50(7): 3409-3415, (2016).
Harmonization of sediment diatoms from hundreds of lakes in the northeastern United States
Public Data Portal
Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process which included updating synonyms, tracking the identity of inconsistently identified taxa and grouping those that could not be resolved taxonomically. The Dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files. The Voucher Flora documents the morphological species concepts in the dataset using diatom images compiled into plates (NE_Lakes_Voucher_Flora_102421.pdf) and the translation scheme of the OTU codes to diatom scientific or provisional names with identification sources, references, and notes (VoucherFloraTranslation_102421.xlsx). The file Slide_accession_numbers_102421.xlsx has slide accession numbers in the ANS Diatom Herbarium. The “DiatomHarmonization_032222_files for R.zip” archive contains four Excel input data files, the R code, and a subfolder “OUTPUT” with five .csv files. The file Counts_original_long_102421.xlsx contains original diatom count data in long format. The file Harmonization_102421.xlsx is the taxonomic harmonization scheme with notes and references. The file SiteInfo_031922.xlsx contains sampling site- and sample-level information. WaterQualityData_021822.xlsx is a supplementary file with water quality data. R code (DiatomHarmonization_032222.R) was used to apply the harmonization scheme to the original diatom counts to produce the output files. 
The five output files comprise four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and a summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code (DiatomHarmonization_032222.R) provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
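The general idea of applying a harmonization scheme to long-format counts and producing a wide-format table can be sketched as follows. This is a Python illustration under stated assumptions, not the published R script: the record layout (sample ID, original name, count), the mapping dictionary, and the function names are all hypothetical, and the actual column names in the Excel files may differ.

```python
from collections import defaultdict


def apply_scheme(long_rows, scheme):
    """Sum counts per sample under harmonized names.

    long_rows: iterable of (sample_id, original_name, count) tuples (assumed layout).
    scheme: dict mapping original names to harmonized names; names that are
            not in the scheme are kept unchanged.
    Returns {sample_id: {harmonized_name: summed_count}}.
    """
    wide = defaultdict(lambda: defaultdict(int))
    for sample_id, name, count in long_rows:
        wide[sample_id][scheme.get(name, name)] += count
    return {sample: dict(taxa) for sample, taxa in wide.items()}


def to_table(wide):
    """Build a wide table: one row per sample, one column per taxon, 0 if absent."""
    taxa = sorted({taxon for counts in wide.values() for taxon in counts})
    header = ["sample_id"] + taxa
    rows = [[sample] + [wide[sample].get(taxon, 0) for taxon in taxa]
            for sample in sorted(wide)]
    return [header] + rows


# Hypothetical long-format records and scheme, for illustration only.
long_rows = [("S1", "a", 3), ("S1", "b", 2), ("S2", "c", 4)]
scheme = {"a": "ab", "b": "ab"}
table = to_table(apply_scheme(long_rows, scheme))
```

Keeping the scheme as a separate mapping mirrors the point made in the description: editing the scheme and rerunning the code yields a new version of the harmonized counts without touching the original data.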
Total phosphorus loads estimated from tributaries and direct drainages to the Great Lakes during 2012–2018 using the model load ratio approach and the unit area load approach
Public Data Portal
In this data release, we provide the data used to compute total annual phosphorus loads from tributaries and direct drainages to the Great Lakes during 2012–18 using the model load ratio approach and the unit area load approach, along with the resulting annual loads for 2012–18. This data release consists of: (1) measured loads at 24 sites that were monitored as part of the Great Lakes Restoration Initiative project, computed using the surrogate regression approach (Robertson et al., 2018; Koltun, 2020); (2) estimated annual loads at point source facilities throughout the Great Lakes Basin obtained from the U.S. Environmental Protection Agency and state agencies; (3) loads subdivided into nonpoint and point source contributions; (4) extrapolation factors for each basin (nonpoint model load ratios, nonpoint yields, point source delivery factors); and (5) estimated annual loads for each tributary and direct drainage area to the Great Lakes during 2012–18. The data are provided in three tab-delimited files: (1) ReferenceSite.txt, which contains all of the information (site data and phosphorus loads from SPARROW (Robertson and Saad, 2019) and measured during 2012–18) for the 24 reference (extrapolation) sites used in this study; (2) PointSource.txt, which contains all of the measured or extrapolated point source information for wastewater treatment plants in the Great Lakes Basin for 2012–18; and (3) ExtrapolatedLoads.txt, which contains all of the nonpoint model load ratios, point source delivery ratios, measured wastewater treatment plant loads, and resulting annual loads for each tributary and direct drainage area in the Great Lakes Basin for the long-term average and 2012–18.
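As a rough illustration of how the three extrapolation factors named above (nonpoint model load ratios, nonpoint yields, point source delivery factors) might enter a load estimate, the following Python sketch shows one plausible arithmetic. The formulas, function names, units, and numbers are assumptions made for illustration; the description does not state the exact equations, so the cited reports should be consulted for the actual method.

```python
def unit_area_load(nonpoint_yield, drainage_area):
    """Unit area load idea (assumed form): yield (load per unit area) times area."""
    return nonpoint_yield * drainage_area


def model_load_ratio(measured_nonpoint_load, modeled_nonpoint_load):
    """Assumed form: ratio of measured to modeled nonpoint load at a reference site."""
    return measured_nonpoint_load / modeled_nonpoint_load


def total_load(nonpoint_load, point_source_load, delivery_factor):
    """Assumed form: nonpoint load plus the delivered share of point source load."""
    return nonpoint_load + point_source_load * delivery_factor


# Hypothetical numbers, for illustration only (units are arbitrary but consistent).
ratio = model_load_ratio(measured_nonpoint_load=120.0, modeled_nonpoint_load=100.0)
nonpoint_by_ratio = ratio * 200.0  # scale a modeled load for an unmonitored basin
nonpoint_by_area = unit_area_load(nonpoint_yield=0.5, drainage_area=1000.0)
annual_total = total_load(nonpoint_by_ratio, point_source_load=50.0, delivery_factor=0.8)
```

The two approaches differ in what is transferred from the reference sites: a ratio that rescales a modeled load, or a yield that is multiplied by drainage area.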