dc.contributor.author: Andreassen, Lloyd
dc.date.accessioned: 2021-09-09T23:54:49Z
dc.date.available: 2021-09-09T23:54:49Z
dc.date.issued: 2021-06-01
dc.date.submitted: 2021-09-09T22:00:10Z
dc.identifier.uri: https://hdl.handle.net/11250/2775026
dc.description.abstract: This study attempts to find good predictive biomarkers for recurrence in colon cancer across two data sources, each providing both mRNA and miRNA expression from frozen tumor samples. In total, four datasets (two data sources and two data types) were examined: mRNA TCGA (n=446), miRNA TCGA (n=416), mRNA HDS (n=79), and miRNA HDS (n=128). The intersection of the feature spaces of the two data sources was used in the analysis so that models trained on one data source could be tested on the other. A set of wrapper and filter methods was applied to each dataset separately to perform feature selection, and from each model the k best features were selected, where k was taken from a fixed list of values between 2 and 250. A randomized grid search was used to optimize four classifiers over their hyperparameter spaces, with the feature selection method treated as an additional hyperparameter. All models were trained with cross-validation and tested on the other data source to assess generalization. Most models failed to generalize to the other data source, showing clear signs of overfitting. Furthermore, there was next to no overlap between the features selected from one data source and those selected from the other, indicating that the underlying feature distributions differed between the two sources, which is shown to be the case in a few examples. The best-generalizing models were based on clinical information, and the second-best were based on the combined feature space of mRNA and miRNA data.
dc.language.iso: eng
dc.publisher: The University of Bergen
dc.rights: Copyright the Author. All rights reserved
dc.subject: wrapper methods
dc.subject: biomarkers
dc.subject: feature selection
dc.subject: colon cancer
dc.subject: machine learning
dc.subject: filter methods
dc.title: Feature Selection for Identification of Transcriptome and Clinical Biomarkers for Relapse in Colon Cancer
dc.type: Master thesis
dc.date.updated: 2021-09-09T22:00:10Z
dc.rights.holder: Copyright the Author. All rights reserved
dc.description.degree: Master's Thesis in Informatics
dc.description.localcode: INF399
dc.description.localcode: MAMN-PROG
dc.description.localcode: MAMN-INF
dc.subject.nus: 754199
fs.subjectcode: INF399
fs.unitcode: 12-12-0
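
The cross-source evaluation described in the abstract (restrict both sources to their shared features, pick the k best features on one source, train there, test on the other) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not name the thesis's filter methods or classifiers, so a simple mean-difference filter score and a nearest-centroid classifier stand in for them, and the feature names (`g1` to `g4`) and values are hypothetical toy data, not data from TCGA or HDS.

```python
from statistics import mean, pstdev


def shared_features(samples_a, samples_b):
    """Feature names present in both data sources (their intersection)."""
    return sorted(set(samples_a[0]) & set(samples_b[0]))


def filter_score(values, labels):
    """Stand-in filter score: class-mean difference over the overall spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid dividing by zero on constant features
    return abs(mean(pos) - mean(neg)) / spread


def select_k_best(samples, labels, features, k):
    """Rank features by filter score on the training source; keep the top k."""
    scores = {f: filter_score([s[f] for s in samples], labels) for f in features}
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]


def fit_centroids(samples, labels, features):
    """Stand-in classifier: one mean vector (centroid) per class."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        centroids[cls] = [mean(r[f] for r in rows) for f in features]
    return centroids


def predict(centroids, sample, features):
    """Assign the class whose centroid is closest in squared distance."""
    point = [sample[f] for f in features]
    return min(
        centroids,
        key=lambda cls: sum((p - q) ** 2 for p, q in zip(point, centroids[cls])),
    )


def accuracy(centroids, samples, labels, features):
    hits = sum(predict(centroids, s, features) == y for s, y in zip(samples, labels))
    return hits / len(labels)


# Hypothetical toy data: source A is used for training, source B for testing.
train_x = [
    {"g1": 5.0, "g2": 1.0, "g3": 0.2},
    {"g1": 5.2, "g2": 0.9, "g3": 0.9},
    {"g1": 1.0, "g2": 1.1, "g3": 0.5},
    {"g1": 0.8, "g2": 1.0, "g3": 0.1},
]
train_y = [1, 1, 0, 0]  # 1 = recurrence, 0 = no recurrence
test_x = [
    {"g1": 4.8, "g2": 1.0, "g4": 3.0},
    {"g1": 1.2, "g2": 0.9, "g4": 2.0},
]
test_y = [1, 0]

common = shared_features(train_x, test_x)              # features in both sources
selected = select_k_best(train_x, train_y, common, k=1)  # chosen on source A only
model = fit_centroids(train_x, train_y, selected)
cross_source_accuracy = accuracy(model, test_x, test_y, selected)
print(common, selected, cross_source_accuracy)
```

Selecting features using only the training source mirrors the abstract's setup, in which the other source is held out to measure generalization. In the thesis itself, k and the feature selection method were additionally tuned by randomized search with cross-validation, which this sketch omits.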

