
dc.contributor.author: Kvalheim, Olav Martin
dc.contributor.author: Grung, Bjørn
dc.contributor.author: Rajalahti, Tarja
dc.date.accessioned: 2020-03-27T12:36:29Z
dc.date.available: 2020-03-27T12:36:29Z
dc.date.issued: 2019
dc.Published: Kvalheim OM, Grung B, Rajalahti T. Number of components and prediction error in partial least squares regression determined by Monte Carlo resampling strategies. Chemometrics and Intelligent Laboratory Systems. 2019;188:79-86 [eng]
dc.identifier.issn: 1873-3239 [en_US]
dc.identifier.issn: 0169-7439 [en_US]
dc.identifier.uri: https://hdl.handle.net/1956/21613
dc.description.abstract: Using a metabolomics data set with 1057 serum samples, we designed and assessed different procedures based on Monte Carlo resampling schemes to determine the optimal number of components to be included in partial least squares (PLS) regression models. Corresponding estimates of prediction error were calculated and compared in a single algorithm comprising i) a single loop Monte Carlo approach repeatedly and randomly splitting samples into calibration and validation samples, ii) a double loop validation splitting samples into calibration/validation and prediction sets, and, iii) independent sample sets in a third loop. In order to mimic the common situation with only a moderate number of samples available for building the model, only a fraction of the 1057 samples analyzed was randomly selected from the total sample set and used in the algorithm. The results show that if the samples available for modelling are representative for the future samples to be predicted from the model, the single loop Monte Carlo procedure consistently provides the same estimates of prediction errors as double loop resampling procedures and for 75% of the cases these estimates are the same as for independent prediction sets. This has important implications for optimal use of a training set for component selection and estimation of prediction error. Two methods were developed and compared for selecting the optimal number of PLS components defined as the number where no statistically significant improvement in prediction error is observed when additional components are included in the model. Both methods determine a probability measure and provide similar results for model selection in this application. [en_US]
dc.language.iso: eng [eng]
dc.publisher: Elsevier [en_US]
dc.rights: Attribution CC BY [eng]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [eng]
dc.title: Number of components and prediction error in partial least squares regression determined by Monte Carlo resampling strategies [en_US]
dc.type: Peer reviewed
dc.type: Journal article
dc.date.updated: 2020-01-22T14:01:10Z
dc.description.version: publishedVersion [en_US]
dc.rights.holder: Copyright 2019 The Author(s) [en_US]
dc.identifier.doi: https://doi.org/10.1016/j.chemolab.2019.03.006
dc.identifier.cristin: 1700093
dc.source.journal: Chemometrics and Intelligent Laboratory Systems
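
The single loop Monte Carlo scheme described in the abstract can be illustrated with a short sketch. This is not the authors' algorithm: the function names (fit_pls1, predict_pls1, monte_carlo_rmsep, select_components), the 50/50 calibration/validation split, the number of repetitions, and the stopping rule with threshold alpha are all assumptions made for illustration, and the PLS fit is a plain single-response NIPALS implementation in pure Python. The selection rule is one possible "probability measure": the fraction of repetitions in which adding a component fails to lower the validation error.

```python
import random
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_pls1(X, y, max_comp):
    """Single-response PLS by NIPALS with deflation (illustrative implementation)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(max_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to model: store an empty component
            comps.append(([0.0] * m, [0.0] * m, 0.0))
            continue
        w = [v / norm for v in w]                  # weight vector
        t = [dot(row, w) for row in E]             # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def predict_pls1(model, x):
    """Predictions for one sample using 1, 2, ..., max_comp components."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat, out = y_mean, []
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
        out.append(y_hat)
    return out

def monte_carlo_rmsep(X, y, max_comp, n_rep=100, val_frac=0.5, seed=0):
    """Single loop: repeated random calibration/validation splits.

    Returns errors[r][a - 1], the validation RMSEP in repetition r
    for a model with a components.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_val = max(1, int(round(val_frac * len(X))))
    errors = []
    for _ in range(n_rep):
        rng.shuffle(idx)
        val, cal = idx[:n_val], idx[n_val:]
        model = fit_pls1([X[i] for i in cal], [y[i] for i in cal], max_comp)
        sq = [0.0] * max_comp
        for i in val:
            for a, y_hat in enumerate(predict_pls1(model, X[i])):
                sq[a] += (y[i] - y_hat) ** 2
        errors.append([sqrt(s / n_val) for s in sq])
    return errors

def select_components(errors, alpha=0.05):
    """Assumed rule: stop at the first A where adding a component fails to
    lower RMSEP in more than a fraction alpha of the repetitions."""
    n_rep, max_comp = len(errors), len(errors[0])
    for a in range(max_comp - 1):
        p_no_gain = sum(e[a + 1] >= e[a] for e in errors) / n_rep
        if p_no_gain > alpha:
            return a + 1
    return max_comp
```

Calling monte_carlo_rmsep(X, y, max_comp=4) on a list of sample rows X and responses y, then passing the result to select_components, gives a component count together with the per-repetition error distribution. A double loop variant would hold out a further prediction set before running this procedure on the remaining samples.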



Except where otherwise noted, this item's license is described as Attribution CC BY