Data Mining in Norwegian Level-of-Living Survey Data

Vestby, Magnus Lyngseth

Master thesis

Permanent link

https://hdl.handle.net/1956/23346

Publication date

2020-07-11

Collections

Master theses [247]

Abstract

This thesis analyses how level-of-living survey data can be explored using data mining techniques and how well the resulting patterns can be visualized to inform non-experts. The project used the design science research framework for its structure and methodology, and the knowledge discovery in databases (KDD) methodology for developing the models and visualizations. To answer the research questions, several machine learning methods were tested on a data set with selected variables describing education, disability, health, age, and marital status over a period of 50 years (1973-2017). The machine learning models were implemented with Scikit-learn. Ridge regression was found to be the optimal model for the goals of this thesis. The patterns found by the Ridge regression were visualized in graphs and bar charts. The visualizations were then evaluated using semi-structured interviews, tasks, and a visualizations usability scale. The results show that visualizations based on the patterns found during data mining were informative and interesting to the participants in the evaluation. The visualizations scored highly on the visualizations usability scale, with an average score of 87.5, which indicates that the group had little to no problems interpreting the graphs and figures. The participants were surprised by some of the discovered patterns regarding inequalities related to gender and level of education. This thesis represents a proof by construction: it shows that interesting patterns in the Norwegian level-of-living surveys can be found with the use of data mining techniques, and that these patterns can be visualized so that non-experts can retrieve information.
The model developed here can be reused for similar projects and data mining tasks, but future developers need to pay attention to all steps of the KDD process, including data cleaning. A proper user interface should be designed to help different kinds of user groups.
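
As a rough illustration of the modelling step described in the abstract, the following minimal sketch compares an ordinary linear regression with Ridge regression in Scikit-learn using cross-validation, then reads off the Ridge coefficients that a bar chart could display. It is not the thesis code: the data are synthetic, and the variable names (`age`, `education_years`, `survey_year`), the target, the effect sizes, and `alpha=1.0` are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the survey data are not reproduced here.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(16, 80, size=n)
education_years = rng.integers(7, 20, size=n)
survey_year = rng.integers(1973, 2018, size=n)
feature_names = ["age", "education_years", "survey_year"]
X = np.column_stack([age, education_years, survey_year]).astype(float)

# Hypothetical target with made-up effects, used only to have something to fit.
y = 50 - 0.2 * age + 1.5 * education_years + rng.normal(0, 5, size=n)

# Candidate models; features are standardized so coefficients are comparable.
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
}

# Mean 5-fold cross-validated R^2 for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# Fit Ridge on all data and collect one coefficient per feature.
ridge = candidates["ridge"].fit(X, y)
coefs = dict(zip(feature_names, ridge.named_steps["ridge"].coef_))

print(scores)
print(coefs)
```

The standardized coefficients in `coefs` are the kind of per-variable pattern that could be plotted as a bar chart; in a real project the regularization strength would be tuned rather than fixed.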