Using Natural Language Processing with Deep Learning to Explore Clinical Notes

Grinde, Anders Benjamin; Johansen, Bendik Mathias

Master thesis

Permanent link

https://hdl.handle.net/11250/2770432

Publication date

2021-06-02

Collections

Department of Informatics

Abstract

In recent years, deep learning has grown substantially, both in research and in applications. However, some application areas have lagged behind. The medical domain is one such field with a lot of untapped potential, partly because of complex issues related to privacy and ethics. Still, deep learning is a powerful tool for utilizing structured and unstructured data, and could help save lives. In this thesis, we use natural language processing to interpret clinical notes and predict the mortality of subjects. We explore whether language models trained on a specific domain perform better, and we compare them to language models trained on an intermediate data set. We found that our language model trained on an intermediate data set that had some resemblance to our target data set performed slightly better than its counterpart language model. We found that text classifiers built on top of the language models were capable of correctly predicting whether a subject would die. Furthermore, we extracted the free-text features from the text classifiers and combined them, using stacking, with heterogeneous data in an attempt to increase the efficacy of the classifiers and to explore the relative performance boost gained by including free-text features. We found a correlation between the quality of the text classifiers that produced the text features and the performance of the stacking classifiers. The classifier trained on a data set without text features performed the worst, and the classifier trained on a data set with the best text features performed the best. We also discuss the central privacy and ethics concerns that come with applying deep learning in the medical domain. It is our intention that this thesis serves as a contribution to the advancement of deep learning within the medical domain, and as a testament to what can be achieved with today's technology.

Publisher

The University of Bergen