Show simple item record

dc.contributor.author: Wood, Leslie Romelia Euceda [eng]
dc.date.accessioned: 2013-09-09T12:00:58Z
dc.date.available: 2013-09-09T12:00:58Z
dc.date.issued: 2013-05-21 [eng]
dc.date.submitted: 2013-05-21 [eng]
dc.identifier.uri: https://hdl.handle.net/1956/7080
dc.description.abstract: Variable selection is an important step in multivariate calibration, in which the number of variables in the independent variable matrix is reduced by eliminating those that are not related to the response. Many methods based on different criteria have been developed for this purpose. These include competitive adaptive reweighted sampling (CARS), subwindow permutation analysis (SPA) and random forest (RF), which can be applied prior to the construction of both regression and classification models. When applied to metabolomics datasets, variable selection can aid in the discovery of potential biomarkers for a particular disorder. In this study, the mechanisms of these three methods, as described in the literature, were investigated and compared. Their performance when applied to three different metabolomics datasets for multivariate classification was also studied. Although the most favorable method varied from dataset to dataset, model prediction performance was found to improve when variable selection was carried out with any of the methods. However, because the parameters of the methods were left at their default settings for this comparison, optimizing them is recommended to obtain a more appropriate comparison. In an attempt to optimize the variable selection stage in the creation of classification models for the three metabolomics datasets of interest, the original CARS algorithm was modified to optimize three different parameters simultaneously. Although promising results were obtained with this modification, a discrepancy was detected in the validation process embedded in the algorithm. A new variable selection method based on the separate optimization of the identity and the number of informative variables was developed.
However, its implementation did not increase model prediction performance compared with the results obtained using the original or modified CARS, or using all variables in the original dataset. Some of the aspects identified as possible ways to improve the method's performance were tested, only to be discarded. Further study of the remaining untested options is needed to improve this method. [en_US]
dc.format.extent: 2310346 bytes [eng]
dc.format.mimetype: application/pdf [eng]
dc.language.iso: eng [eng]
dc.publisher: The University of Bergen and Central South University, PR China [en_US]
dc.title: Variable selection optimization for multivariate classification of metabolomics data [en_US]
dc.type: Master thesis
dc.rights.holder: Copyright the author. All rights reserved [en_US]
dc.description.localcode: JMAMN-QAL
dc.description.localcode: QAL399
dc.subject.nus: 752299 [eng]
fs.subjectcode: QAL399

