dc.contributor.author: Papoutsoglou, Georgios
dc.contributor.author: Tarazona, Sonia
dc.contributor.author: Lopes, Marta B.
dc.contributor.author: Klammsteiner, Thomas
dc.contributor.author: Ibrahimi, Eliana
dc.contributor.author: Eckenberger, Julia
dc.contributor.author: Novielli, Pierfrancesco
dc.contributor.author: Tonda, Alberto
dc.contributor.author: Simeon, Andrea
dc.contributor.author: Shigdel, Rajesh
dc.contributor.author: Béreux, Stéphane
dc.contributor.author: Vitali, Giacomo
dc.contributor.author: Tangaro, Sabina
dc.contributor.author: Lahti, Leo
dc.contributor.author: Temko, Andriy
dc.contributor.author: Claesson, Marcus J.
dc.contributor.author: Berland, Magali
dc.date.accessioned: 2024-05-13T11:52:52Z
dc.date.available: 2024-05-13T11:52:52Z
dc.date.created: 2023-11-09T08:11:21Z
dc.date.issued: 2023
dc.identifier.issn: 1664-302X
dc.identifier.uri: https://hdl.handle.net/11250/3130137
dc.description.abstract: Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. It is demonstrated that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we showed how linear modeling by logistic regression coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Frontiers [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.title: Machine learning approaches in microbiome research: challenges and best practices [en_US]
dc.type: Journal article [en_US]
dc.type: Peer reviewed [en_US]
dc.description.version: publishedVersion [en_US]
dc.rights.holder: Copyright 2023 The Author(s) [en_US]
dc.source.articlenumber: 1261889 [en_US]
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 2
dc.identifier.doi: 10.3389/fmicb.2023.1261889
dc.identifier.cristin: 2194305
dc.source.journal: Frontiers in Microbiology [en_US]
dc.identifier.citation: Frontiers in Microbiology. 2023, 14, 1261889. [en_US]
dc.source.volume: 14 [en_US]



Unless otherwise stated, this item is licensed under Attribution 4.0 International.