A Simulation Study of Goodness-of-Fit Tests for Binary Regression with Applications to Norwegian Intensive Care Registry Data

Nygaard, Ellisif

Master thesis

Permanent link

https://hdl.handle.net/1956/19047

Publication date

2019-01-31

Collections

Department of Mathematics [939]

Abstract

When using statistical methods to fit a model, the consensus is that it is possible to represent a complex reality in the form of a simpler model. It is helpful to systematically measure a model’s ability to capture the underlying system which controls the data generation in the population being examined. One of the possible tools we can apply to evaluate model adequacy is goodness-of-fit (GOF) tests. Summary GOF statistics are computed for a specific fitted model, then assigned an asymptotic distribution, and finally the null hypothesis that the model fits the data adequately is tested. A great challenge, when the model is a binary regression model with one or several continuous covariates, is to verify which asymptotic distributions the GOF statistics in fact have (Hosmer et al., 1997). In this thesis, we evaluate the validity of the distributions of some established GOF test statistics mentioned in the literature. We have chosen so-called global GOF tests, where user input is not necessary. Tests demanding user input, such as the Hosmer-Lemeshow test, have been shown to have some considerable disadvantages. Hosmer et al. (1997) state that the number of groups (which is determined at the user’s discretion) can influence whether the GOF test rejects the model fit or not. Binary regression models present a specific set of challenges with regard to GOF measures, especially in situations where at least one covariate is continuous. There appears to be no broad general agreement on which GOF statistics are reliable options when fitting such models. This thesis aims to extend the current knowledge in this area. A modified version of one of the statistics is introduced. The GOF tests studied are later applied in a data analysis on a real data set from the Norwegian Intensive Care Registry (NIR).
An exploration was performed in an attempt to suggest a suitable tool for evaluating the discrepancies between the estimated logistic probabilities and the outcome variable, and to examine how different GOF tests behave for different categories of discrepancies.
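
The dependence of the Hosmer-Lemeshow test on the user-chosen number of groups, mentioned above, can be illustrated with a minimal sketch. It is not taken from the thesis: the helper `hosmer_lemeshow`, the simulated covariate, the coefficients 0.5 and 1.2, and the group counts are all assumptions made for illustration, and the true outcome probabilities stand in for fitted values so that no model fitting is needed.

```python
import math
import random


def hosmer_lemeshow(y, p, groups):
    """Hosmer-Lemeshow-type statistic over `groups` roughly equal-size groups.

    Hypothetical helper for illustration; y holds 0/1 outcomes and p the
    corresponding estimated probabilities (strictly between 0 and 1).
    """
    # Sort observations by estimated probability, then cut into groups.
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        n_k = len(chunk)
        expected = sum(prob for prob, _ in chunk)  # expected number of events
        observed = sum(out for _, out in chunk)    # observed number of events
        mean_p = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * mean_p * (1.0 - mean_p))
    return stat


random.seed(1)
# Simulated data (assumption): one continuous covariate, logistic probabilities.
x = [random.gauss(0.0, 1.0) for _ in range(500)]
p_fit = [1.0 / (1.0 + math.exp(-(0.5 + 1.2 * xi))) for xi in x]
y_obs = [1 if random.random() < pi else 0 for pi in p_fit]

# Same outcomes and probabilities, different numbers of groups.
results = {g: hosmer_lemeshow(y_obs, p_fit, g) for g in (6, 10, 14)}
for g, value in results.items():
    print(g, round(value, 3))
```

Because both the statistic and its reference chi-square distribution change with the number of groups, the same data can lead to different test decisions, which is the kind of user-dependence that global GOF tests avoid.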

Publisher

The University of Bergen