Modern Variable Selection Methods with Empirical Analysis
Master thesis
Permanent link
https://hdl.handle.net/11250/3074215
Publication date
2023-06-01
Collections
- Master theses [130]
Abstract
When modeling big data, including high-dimensional datasets, the challenge lies in extracting the most relevant information while avoiding overfitting, especially when the goal is prediction from the given dataset. This thesis focuses on sparse methods, in particular sparse Bayesian learning methods, for constructing supervised learning models that mitigate the risk of overfitting by using only the most crucial aspects of the data. With these well-developed techniques, the most informative observations or variables can be extracted, both to reveal the systematic pattern in the dataset and to support further prediction. Six methods are examined: the well-known LASSO, Ridge Regression, Bayesian Lasso, and relevance vector machine (RVM), as well as two recently developed methods, $\text{RVM}_{BLS}$ and $\text{RVM}_{BLSX}$. The latter, $\text{RVM}_{BLSX}$, is proposed by the author of this thesis.
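To make the idea of variable selection concrete, the following is a minimal sketch contrasting two of the methods named above, LASSO and Ridge Regression, on synthetic data. It is not the thesis's implementation: the data, the penalty value `lam`, and the helper names `soft_threshold` and `coordinate_descent` are all assumptions made for illustration, and the Bayesian Lasso and RVM variants are not covered.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def coordinate_descent(X, y, lam, penalty, n_iter=100):
    """Hypothetical helper: cyclic coordinate descent without an intercept.

    lasso: minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1
    ridge: minimizes (1/2n)||y - Xb||^2 + (lam/2) * ||b||_2^2
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the updates.
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            if penalty == "lasso":
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
            else:
                beta[j] = rho / (col_sq[j] + lam)
    return beta

# Synthetic data (an assumption): only variables 0 and 3 carry signal.
random.seed(0)
n, true_beta = 100, [3.0, 0.0, 0.0, -2.0, 0.0]
X = [[random.gauss(0, 1) for _ in true_beta] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 0.1) for row in X]

lasso_beta = coordinate_descent(X, y, lam=0.5, penalty="lasso")
ridge_beta = coordinate_descent(X, y, lam=0.5, penalty="ridge")
print("lasso:", [round(b, 3) for b in lasso_beta])
print("ridge:", [round(b, 3) for b in ridge_beta])
```

In this setup the LASSO penalty is expected to set the coefficients of the uninformative variables exactly to zero, whereas Ridge only shrinks all coefficients toward zero without removing any. That difference is what makes LASSO usable as a variable selection method.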