Multiple Imputation in Predictive Modeling of Arthroplasty Database
Master's thesis
This thesis presents a method for imputing missing values, a simple data mining tool, and an investigation of whether such imputed data can be used to predict failures of hip prostheses in small databases. The data set is based on prostheses explanted during total hip arthroplasty revision surgeries. The database is at an early stage, and the data set is rather small with many missing values. Multiple imputation was used to estimate the missing values in an attempt to build a more complete data set for predictive modelling. Simple and multiple linear regression were used, together with a prediction function for linear models. Although the initial imputation results looked promising, comparisons between the original and the imputed data did not show much improvement.
The data was also used in a prototype data mining application that lets users input their data, select the variables for analysis, and view a plot and summary of the fitted model. The application, which is the artefact of this research, is fully functional but simple. Building larger, more general applications in R can become complex, and other technologies might be more suitable; R is, however, a very powerful tool for specialised statistical tasks and modelling.
Data mining was used to explore the potential for making predictions from the data. Linear regression on the original and the imputed data gave similar results overall, but with some significant differences. The validation methods indicated that, although the model was not strong, something could still be gained from it. Predictions made with a multiple linear regression model on both data sets showed some differences, but not enough to draw conclusions about the effect and contribution of the imputation. The methods will need further refinement, preferably in collaboration with experts.
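The abstract does not spell out how imputation and regression fit together, so the following is an illustration only: a minimal Python sketch (the thesis itself worked in R) of one common multiple-imputation workflow on synthetic data. A covariate `x2` is partly missing; each of `m` completed data sets is created by stochastic regression imputation refitted on a bootstrap sample; a multiple linear regression is fitted to each; and the fits are pooled with Rubin's rules and compared with a complete-case fit. The variable names, the synthetic data, the bootstrap-based imputation model and `m = 20` are all assumptions for the example, not the procedure or data used in the thesis.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns the coefficients, their estimated covariance matrix and the
    residual variance."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    return beta, cov, sigma2


def impute_once(z, x, rng):
    """Fill NaNs in x by stochastic regression on the predictors z.

    The imputation model is refitted on a bootstrap sample of the observed
    rows, so each completed data set also reflects parameter uncertainty."""
    obs = ~np.isnan(x)
    rows = np.flatnonzero(obs)
    boot = rng.choice(rows, size=rows.size, replace=True)
    beta, _, sigma2 = ols(z[boot], x[boot])
    miss = ~obs
    zm = np.column_stack([np.ones(miss.sum()), z[miss]])
    filled = x.copy()
    filled[miss] = zm @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return filled


def pool(estimates, covs):
    """Combine m regression fits with Rubin's rules."""
    q = np.asarray(estimates)
    m = q.shape[0]
    within = np.mean([np.diag(c) for c in covs], axis=0)  # mean within-fit variance
    between = q.var(axis=0, ddof=1)  # variance across imputations
    total = within + (1 + 1 / m) * between
    return q.mean(axis=0), np.sqrt(total)


# Synthetic stand-in for a small database: x1 always recorded, x2 partly missing.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
x2_obs = np.where(rng.random(n) < 0.4, np.nan, x2)  # ~40% missing completely at random

# Complete-case analysis: rows with missing x2 are dropped.
cc = ~np.isnan(x2_obs)
beta_cc, cov_cc, _ = ols(np.column_stack([x1[cc], x2_obs[cc]]), y[cc])
se_cc = np.sqrt(np.diag(cov_cc))

# Multiple imputation: m completed data sets, one fit per set, then pooling.
# The outcome y is included in the imputation model, as is usually advised for MI.
m = 20
z = np.column_stack([x1, y])
fits = [ols(np.column_stack([x1, impute_once(z, x2_obs, rng)]), y) for _ in range(m)]
beta_mi, se_mi = pool([f[0] for f in fits], [f[1] for f in fits])

# Predictions for new cases from the pooled multiple linear regression model.
x_new = np.array([[0.5, -0.2], [1.0, 0.3]])
y_hat = np.column_stack([np.ones(len(x_new)), x_new]) @ beta_mi

print("complete cases:", cc.sum(), "of", n)
print("CC coefficients:", beta_cc.round(2), "SE:", se_cc.round(2))
print("MI coefficients:", beta_mi.round(2), "SE:", se_mi.round(2))
print("predictions:", y_hat.round(2))
```

The sketch prints the complete-case and pooled estimates side by side, which mirrors the kind of original-versus-imputed comparison described above. With so few rows, the two can easily differ by less than their standard errors. In R, the same workflow is commonly carried out with the mice package (`mice()`, `with()` and `pool()`).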
The application can be expanded, but a different approach should be considered depending on the scope of future research. Even when applied to limited data sets, data mining shows potential, which encourages applying data mining methods from the early stages of research, as soon as data collection begins.