Harmonization of sediment diatoms from hundreds of lakes in the northeastern United States
Public Data Portal
Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process that included updating synonyms, tracking the identity of inconsistently identified taxa, and grouping those that could not be resolved taxonomically. The dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files. The Voucher Flora documents the morphological species concepts in the dataset using diatom images compiled into plates (NE_Lakes_Voucher_Flora_102421.pdf) and the translation scheme of the OTU codes to diatom scientific or provisional names with identification sources, references, and notes (VoucherFloraTranslation_102421.xlsx). The file Slide_accession_numbers_102421.xlsx lists slide accession numbers in the ANS Diatom Herbarium. The “DiatomHarmonization_032222_files for R.zip” archive contains four Excel input data files, the R code, and a subfolder “OUTPUT” with five .csv files. The file Counts_original_long_102421.xlsx contains original diatom count data in long format. The file Harmonization_102421.xlsx is the taxonomic harmonization scheme with notes and references. The file SiteInfo_031922.xlsx contains sampling site- and sample-level information. WaterQualityData_021822.xlsx is a supplementary file with water quality data. R code (DiatomHarmonization_032222.R) was used to apply the harmonization scheme to the original diatom counts to produce the output files.
The five output files comprise four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and a summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code (DiatomHarmonization_032222.R) provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
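As a rough illustration of what applying a harmonization scheme to long-format counts involves, the Python sketch below maps original taxon names to harmonized names, sums counts that collapse onto the same name, and pivots the result to wide format. It is not the DiatomHarmonization_032222.R script: the tuple layout, the sample IDs, the counts, and the `harmonize_counts` helper are all assumptions made for this example.

```python
from collections import defaultdict


def harmonize_counts(long_rows, scheme):
    """Apply a name-translation scheme to long-format counts and pivot to wide.

    long_rows: iterable of (sample_id, original_name, count) tuples
               (hypothetical layout, not the layout of the .xlsx files).
    scheme:    dict mapping original_name -> harmonized_name.
               Names missing from the scheme are kept unchanged.
    Returns (sorted taxon list, {sample_id: counts aligned to that list}).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, name, count in long_rows:
        harmonized = scheme.get(name, name)
        # Counts of names that map to the same harmonized name are summed.
        totals[sample_id][harmonized] += count

    taxa = sorted({t for per_sample in totals.values() for t in per_sample})
    wide = {
        sample_id: [per_sample.get(t, 0) for t in taxa]
        for sample_id, per_sample in totals.items()
    }
    return taxa, wide


# Made-up counts for two hypothetical samples.
rows = [
    ("S1", "Achnanthes minutissima", 10),
    ("S1", "Achnanthidium minutissimum", 5),
    ("S2", "Achnanthes minutissima", 3),
    ("S2", "Tabellaria flocculosa", 7),
]
# One-entry scheme: an older synonym mapped to the currently used name.
scheme = {"Achnanthes minutissima": "Achnanthidium minutissimum"}

taxa, wide = harmonize_counts(rows, scheme)
for sample_id, counts in wide.items():
    print(sample_id, dict(zip(taxa, counts)))
```

Keeping the scheme as data separate from the code mirrors the versioning idea in the description: editing the lookup and rerunning regenerates the wide tables without touching the original counts.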
Supplementary material for Lee et al. in review: Harmonization and Revision of a National Diatom Dataset for Use in the Development of Water Quality Indicators
Public Data Portal
Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008–2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variability explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP.
Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove the analyst signal, this work clarifies the extent of the problem and provides a method to minimize analyst signal. Resolution of these taxonomic issues makes large datasets such as the NRSA more suitable for the development of diatom-based water quality indicators. This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. Ecological Indicators. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019). NOTE: This dataset has been removed from public access due to revocation. Please refer inquiries regarding this dataset to the listed contact person.
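The three revision actions named in the abstract above (merging taxa into slash groups, elevating selected genera to genus level, and omitting taxa in a difficult complex) can be sketched as a renaming step applied to one sample's counts. This is a minimal Python sketch under stated assumptions, not the authors' code: the helper names, the counts, and the placeholder taxa "Examplea prima" and "Examplea secunda" are hypothetical, and the single omitted taxon is only the example species the abstract mentions.

```python
def revise_name(name, slash_groups, genus_level, omitted):
    """Return the revised name for a taxon, or None if the taxon is omitted."""
    if name in omitted:
        return None
    genus = name.split()[0]
    if genus in genus_level:
        return genus  # elevate every species of this genus to genus level
    return slash_groups.get(name, name)  # merge into a slash group if listed


def revise_sample(counts, slash_groups, genus_level, omitted):
    """Apply revise_name to one sample's {taxon: count} dict, summing merged taxa."""
    revised = {}
    for name, n in counts.items():
        new_name = revise_name(name, slash_groups, genus_level, omitted)
        if new_name is None:
            continue
        revised[new_name] = revised.get(new_name, 0) + n
    return revised


# Genera elevated to genus level, as listed in the abstract.
genus_level = {"Nitzschia", "Diploneis", "Tryblionella"}
# Hypothetical slash group built from two placeholder taxa.
slash_groups = {
    "Examplea prima": "Examplea prima/secunda",
    "Examplea secunda": "Examplea prima/secunda",
}
# Illustrative omission; the full list of 7 omitted taxa is not given here.
omitted = {"Sellaphora atomoides"}

sample = {
    "Nitzschia palea": 12,
    "Nitzschia amphibia": 4,
    "Examplea prima": 6,
    "Examplea secunda": 2,
    "Sellaphora atomoides": 9,
    "Tabellaria flocculosa": 5,
}
revised = revise_sample(sample, slash_groups, genus_level, omitted)
print(revised)
```

Omission is checked first in this sketch so that an omitted taxon is dropped even if its genus is also on the genus-level list; the source does not state an order, so that precedence is an assumption.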
Datasets to develop and validate the genus-level, trait-based multimetric diatom indices for assessing the ecological condition of rivers and streams across the conterminous United States
Public Data Portal
Data are from the National Aquatic Resource Surveys. This dataset is associated with the following publication: Riato, L., R. Hill, A. Herlihy, D. Peck, P. Kaufmann, J. Stoddard, and S. Paulsen. Genus-level, trait-based multimetric diatom indices for assessing the ecological condition of rivers and streams across the conterminous United States. Ecological Indicators. Elsevier Science Ltd, New York, NY, USA, 141: 109131, (2022).
Supplementary material for Lee et al. 2019 Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets
Public Data Portal
Diatom data have been collected in large-scale biological assessments in the United States, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). However, the effectiveness of diatoms as indicators may suffer if inconsistent taxon identifications across different analysts obscure the relationships between assemblage composition and environmental variables. To reduce these inconsistencies, we harmonized the 2008–2009 NRSA data from nine analysts by updating names to current synonyms and by statistically identifying taxa with high analyst signal (taxa with more variation in relative abundance explained by the analyst factor, relative to environmental variables). We then screened a subset of samples with QA/QC data and combined taxa with mismatching identifications by the primary and secondary analysts. When these combined “slash groups” did not reduce analyst signal, we elevated taxa to the genus level or omitted taxa in difficult species complexes. We examined the variation explained by analyst in the original and revised datasets. Further, we examined how revising the datasets to reduce analyst signal can reduce inconsistency, thereby uncovering the variation in assemblage composition explained by total phosphorus (TP), an environmental variable of high priority for water managers. To produce a revised dataset with the greatest taxonomic consistency, we ultimately made 124 slash groups, omitted 7 taxa in the small naviculoid (e.g., Sellaphora atomoides) species complex, and elevated Nitzschia, Diploneis, and Tryblionella taxa to the genus level. Relative to the original dataset, the revised dataset had more overlap among samples grouped by analyst in ordination space, less variation explained by the analyst factor, and more than double the variation in assemblage composition explained by TP. 
Elevating all taxa to the genus level did not eliminate analyst signal completely, and analyst remained the most important predictor for the genera Sellaphora, Mayamaea, and Psammodictyon, indicating that these taxa present the greatest obstacle to consistent identification in this dataset. Although our process did not completely remove analyst signal, this work provides a method to minimize analyst signal and improve detection of diatom association with TP in large datasets involving multiple analysts. Examination of variation in assemblage data explained by analyst and taxonomic harmonization may be necessary steps for improving data quality and the utility of diatoms as indicators of environmental variables. This dataset is associated with the following publication: Lee, S., I. Bishop, S. Spaulding, R. Mitchell, and L. Yuan. Taxonomic harmonization may reveal a stronger association between diatom assemblages and total phosphorus in large datasets. Ecological Indicators. Elsevier Science Ltd, New York, NY, USA, 102: 166-174, (2019).
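The idea of "analyst signal" (variation in a taxon's relative abundance explained by the analyst factor, compared with an environmental variable) can be illustrated with a simple statistic. The Python sketch below uses one-way ANOVA eta-squared as a stand-in. The abstract does not specify the statistical method used in the paper, so this is an assumption chosen for brevity, and TP is binned into two classes only to keep the example short. All values are made up.

```python
def eta_squared(values, groups):
    """Fraction of the total sum of squares in `values` explained by `groups`.

    This is one-way ANOVA eta-squared: SS_between / SS_total.
    Returns 0.0 when the values do not vary at all.
    """
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0

    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Made-up relative abundances of one taxon in six samples.
rel_abundance = [0.30, 0.28, 0.32, 0.02, 0.00, 0.04]
analyst = ["A", "A", "A", "B", "B", "B"]                   # who counted each sample
tp_class = ["low", "high", "low", "high", "low", "high"]   # binned TP (assumption)

analyst_signal = eta_squared(rel_abundance, analyst)
tp_signal = eta_squared(rel_abundance, tp_class)

# A taxon is flagged when analyst explains more variation than the environment.
high_analyst_signal = analyst_signal > tp_signal
print(round(analyst_signal, 3), round(tp_signal, 3), high_analyst_signal)
```

In this toy case the taxon is abundant only in samples counted by analyst A, so the analyst factor explains far more of its variation than the TP class does, which is the pattern the harmonization steps are meant to reduce.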
Data Release for: A Web-Based Tool for Assessing the Condition of Benthic Diatom Assemblages in Streams and Rivers of the Conterminous United States
Public Data Portal
Benthic diatom assemblages are known to be indicative of water quality but have yet to be widely adopted in biological assessments in the United States due to several limitations. Our goal was to address some of these limitations by developing regional multi-metric indices (MMIs) that are robust to inter-laboratory taxonomic inconsistency, adjusted for natural covariates, and sensitive to a wide range of anthropogenic stressors. We aggregated bioassessment data from two national-scale federal programs and used a data-driven analysis in which all possible combinations of 2–7 metrics were compared on three measures of performance. The datasets in this release support the Carlisle et al. (2022) report cited herein. The report provides full details of data aggregation, model development, and application.
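The exhaustive search over metric combinations described above can be sketched as follows. This Python example only shows the enumeration of all 2 to 7 member subsets and a selection step; the metric names, the per-metric values, and the single toy `score` function are hypothetical placeholders, whereas the source compares candidates on three performance measures that are not detailed here.

```python
from itertools import combinations


def candidate_metric_sets(metrics, min_size=2, max_size=7):
    """Yield every combination of `metrics` with min_size..max_size members."""
    upper = min(max_size, len(metrics))
    for k in range(min_size, upper + 1):
        yield from combinations(metrics, k)


def best_metric_set(metrics, score):
    """Return the candidate combination with the highest user-supplied score."""
    return max(candidate_metric_sets(metrics), key=score)


# Hypothetical metrics with a made-up "sensitivity" value each.
sensitivity = {"metric_a": 0.9, "metric_b": 0.5, "metric_c": 0.2, "metric_d": 0.1}
metrics = list(sensitivity)


def score(metric_set):
    """Toy score: total sensitivity minus a fixed penalty per metric used."""
    return sum(sensitivity[m] for m in metric_set) - 0.3 * len(metric_set)


n_candidates = sum(1 for _ in candidate_metric_sets(metrics))
best = best_metric_set(metrics, score)
print(n_candidates, best)
```

The number of candidates grows quickly with the size of the metric pool (for example, 30 candidate metrics already give over 2.8 million combinations of 2 to 7), so a real search would likely need a cheap scoring function or some pre-filtering of metrics.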
Southeast Regional Stream Quality Assessment Ecological Data
Public Data Portal
Aquatic ecological surveys are valuable for understanding the interaction between the biotic and abiotic components in rivers and streams. However, large-scale assessments of the water chemistry, geomorphology, and ecological community are usually not feasible due to limited resources. Beginning in 2013, the Regional Stream Quality Assessment Project of the US Geological Survey’s National Water Quality Program began sampling 89–120 streams in each of 5 regions across the conterminous United States: the Midwest (2013), Southeast (2014), Pacific Northwest (2015), Northeast (2016), and California (2017). Sampling included water and streambed sediment chemistry, stage, and temperature (Journey and others, 2015). The abiotic data are available from the National Water Information System (nwis.waterdata.usgs.gov). Geospatial data for the Southeastern U.S. study sites are available from Qi and others (2017). Ecological data collected included benthic algae, macroinvertebrates, and fish communities, in addition to in-stream habitat and geomorphology measurements for each reach. Ecological and habitat data for the Southeastern United States are summarized in this data release.
Foy Lake paleodiatom data
Public Data Portal
Percent abundance of 109 diatom species collected from a Foy Lake (Montana, USA) sediment core that was sampled every ∼5–20 years, yielding a ∼7 kyr record over 800 time-steps. This dataset is associated with the following publication: Angeler, D., T. Eason, A. Garmestani, T. Spanbauer, and C. Allen. Assessing cross-scale patterns and the composition of ecological communities of alternative lake regimes. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 01, (2018).