Using Natural Language Processing with Deep Learning to Explore Clinical Notes
Master's thesis
Permanent link
https://hdl.handle.net/11250/2770432
Date issued
2021-06-02
Collections
- Department of Informatics [1013]
Abstract
In recent years, the deep learning community and technology have grown substantially, both in terms of research and applications. However, some application areas have lagged behind. The medical domain is an example of a field with a lot of untapped potential, partly because of complex issues related to privacy and ethics. Still, deep learning is a powerful tool for making use of structured and unstructured data, and could help save lives. In this thesis, we use natural language processing to interpret clinical notes and predict the mortality of subjects. We explore whether language models trained on a specific domain perform better, and we compare them to language models trained on an intermediate data set. We found that our language model trained on an intermediate data set that had some resemblance to our target data set performed slightly better than its counterpart. We found that text classifiers built on top of the language models were capable of correctly predicting whether a subject would die. Furthermore, we extracted the free-text features from the text classifiers and combined them, using stacking, with heterogeneous data in an attempt to increase the efficacy of the classifiers and to explore the relative performance boost gained by including free-text features. We found a correlation between the quality of the text classifiers that produced the text features and the performance of the stacking classifiers. The classifier trained on a data set without text features performed the worst, and the classifier trained on a data set with the best text features performed the best. We also discuss the central concerns that come with applying deep learning in the medical domain with regard to privacy and ethics. It is our intention that this thesis serves as a contribution to the advancement of deep learning within the medical domain, and as a testament to what can be achieved with today's technology.
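The stacking step described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the keyword scorer below is a hypothetical stand-in for a neural text classifier, and the records, the `RISK_TERMS` vocabulary, the single structured feature (age), and all function names are assumptions made up for illustration. The sketch only shows the general idea of feeding a text model's output, together with structured data, into a second-level classifier.

```python
import math

# Hypothetical risk vocabulary; stands in for a trained neural text classifier.
RISK_TERMS = {"sepsis", "intubated", "arrest"}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def text_probability(note):
    """Base model (toy): map a clinical note to a pseudo-probability of death."""
    tokens = [t.strip(".,;:") for t in note.lower().split()]
    hits = sum(t in RISK_TERMS for t in tokens)
    return sigmoid(2 * hits - 1)


def stack_features(note, age):
    """Combine the text model's output with a structured feature and a bias term."""
    return [text_probability(note), age / 100.0, 1.0]


def log_loss(weights, rows, labels):
    """Average binary cross-entropy of a logistic model on the given rows."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def train_meta(rows, labels, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad[j] += err * xi / len(rows)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Made-up records for illustration only: (note, age, died)
records = [
    ("patient intubated after cardiac arrest", 81, 1),
    ("sepsis suspected, started antibiotics", 74, 1),
    ("routine follow-up, stable vitals", 45, 0),
    ("discharged home in good condition", 60, 0),
    ("intubated for airway protection", 52, 1),
    ("mild cough, no fever", 79, 0),
]

rows = [stack_features(note, age) for note, age, _ in records]
labels = [died for _, _, died in records]

initial_loss = log_loss([0.0, 0.0, 0.0], rows, labels)
weights = train_meta(rows, labels)
final_loss = log_loss(weights, rows, labels)
```

In a realistic setup, the text model's predictions used to train the second-level classifier would normally be produced out-of-fold, so that the meta-learner does not see outputs for notes the text model was trained on.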