Machine learning vs logistic regression in credit scoring: A trade-off between accuracy and interpretability?

Hovdenakk, Arne Hesjedal

Master thesis

Permanent link

https://hdl.handle.net/11250/2762661

Publication date

2021-06-15

Collections

Master theses [118]

Abstract

In this thesis, I compare logistic regression with the machine learning models k-nearest neighbors, decision trees, random forest, and gradient boosting by building credit scoring models. Using data on consumer loan borrowers from an anonymous Norwegian bank, I compare the models both when continuous variables are binned into intervals using weight of evidence and when they are kept in their raw form. With the area under the receiver operating characteristic curve (AUROC) and the Brier score as performance measures, I find that logistic regression and gradient boosting are the most accurate models for this dataset, and I recommend logistic regression because of its interpretability.
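The three quantities named in the abstract (weight of evidence, AUROC, and the Brier score) can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the toy data, the sign convention for weight of evidence (log of non-defaulter share over defaulter share), and the 0.5 smoothing term are all assumptions made for illustration.

```python
import math


def weight_of_evidence(bins, defaults):
    """Map each bin label to ln(share of non-defaulters / share of defaulters).

    Assumed convention: defaults[i] == 1 marks a defaulted loan.
    The 0.5 added to each count is an illustrative smoothing choice
    that avoids log(0) for bins with no defaulters or no non-defaulters.
    Both classes must be present in the data.
    """
    goods, bads = {}, {}
    for b, d in zip(bins, defaults):
        target = bads if d == 1 else goods
        target[b] = target.get(b, 0) + 1
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for b in set(bins):
        good_share = (goods.get(b, 0) + 0.5) / total_good
        bad_share = (bads.get(b, 0) + 0.5) / total_bad
        woe[b] = math.log(good_share / bad_share)
    return woe


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(n^2) and meant
    only to show the definition, not to be efficient.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy data (hypothetical, not from the thesis dataset).
income_bin = ["low", "low", "low", "high", "high", "high"]
defaulted = [1, 1, 0, 0, 0, 1]
woe_by_bin = weight_of_evidence(income_bin, defaulted)

labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
auc_value = auroc(labels, predicted)
brier_value = brier_score(labels, predicted)
```

A higher AUROC indicates better ranking of defaulters above non-defaulters, while a lower Brier score indicates better calibrated probabilities, which is why the two measures complement each other when comparing models.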

Publisher

The University of Bergen