Show simple item record

dc.contributor.authorSun, Peng
dc.contributor.authorHolden, Philip B.
dc.contributor.authorBirks, Harry John Betteley
dc.date.accessioned2025-01-17T12:45:39Z
dc.date.available2025-01-17T12:45:39Z
dc.date.created2024-11-11T14:49:06Z
dc.date.issued2024
dc.identifier.issn1814-9324
dc.identifier.urihttps://hdl.handle.net/11250/3173196
dc.description.abstractClassical palaeoenvironmental reconstruction models often incorporate biological ideas and commonly assume that the taxa comprising a fossil assemblage exhibit unimodal response functions of the environmental variable of interest. In contrast, machine-learning approaches do not rely upon any biological assumptions but instead need training with large data sets to extract some understanding of the relationships between biological assemblages and their environment. To explore the relative merits of these two approaches, we have developed a two-layered machine-learning reconstruction model MEMLM (Multi Ensemble Machine Learning Model). The first layer applies three different ensemble machine-learning models (random forests, extra random trees, and LightGBM), trained on the modern taxon assemblage and associated environmental data to make reconstructions based on the three different models, while the second layer uses multiple linear regression to integrate these three reconstructions into a consensus reconstruction. We considered three versions of the model: (1) a standard version of MEMLM, which uses only taxon abundance data; (2) MEMLMe, which uses only dimensionally reduced assemblage information, using a natural language-processing model (GloVe), to detect associations between taxa across the training data set; and (3) MEMLMc which incorporates both raw taxon abundance and dimensionally reduced summary (GloVe) data. We trained these MEMLM model variants with three high-quality diatom and pollen training sets and compared their reconstruction performance with three weighted-averaging (WA) approaches (WA-Cla for classical deshrinking, WA-Inv for inverse deshrinking, and WA-PLS for partial least squares). In general, the MEMLM approaches, even when trained on only dimensionally reduced assemblage data, performed substantially better than the WA approaches in the larger training sets, as judged by cross-validatory prediction error. When applied to fossil data, MEMLM variants sometimes generated qualitatively different palaeoenvironmental reconstructions from each other and from reconstructions based on WA approaches. We applied a statistical significance test to all the reconstructions. This successfully identified each incidence for which the reconstruction is not robust with respect to the model choice. We found that machine-learning approaches could outperform classical approaches but could sometimes fail badly in the reconstruction, despite showing high performance under cross-validation, likely indicating problems when extrapolation occurs. We found that the classical approaches are generally more robust, although they could also generate reconstructions which have modest statistical significance and therefore may be unreliable. Given these conclusions, we consider that cross-validation is not a sufficient measure of transfer function performance, and we recommend that the results of statistical significance tests are provided alongside the downcore reconstructions based on fossil assemblages.en_US
dc.language.isoengen_US
dc.publisherCopernicus Publicationsen_US
dc.rightsNavngivelse 4.0 Internasjonal*
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/deed.no*
dc.titleCan machine-learning algorithms improve upon classical palaeoenvironmental reconstruction models?en_US
dc.typeJournal articleen_US
dc.typePeer revieweden_US
dc.description.versionpublishedVersionen_US
dc.rights.holderCopyright 2024 The Author(s)en_US
cristin.ispublishedtrue
cristin.fulltextoriginal
cristin.qualitycode1
dc.identifier.doi10.5194/cp-20-2373-2024
dc.identifier.cristin2319598
dc.source.journalClimate of the Pasten_US
dc.source.pagenumber2373-2398en_US
dc.identifier.citationClimate of the Past. 2024, 20 (10), 2373-2398.en_US
dc.source.volume20en_US
dc.source.issue10en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record

Navngivelse 4.0 Internasjonal
Except where otherwise noted, this item's license is described as Navngivelse 4.0 Internasjonal