Natural Language Processing of fiscal yearly reports for use in risk assessment
Master thesis
Date
2019-12-09
Collections
- Master theses [246]
Abstract
The stability and accuracy of products in the financial sector are maintained through various measures within each organisation, in the field in which it operates. After a meeting with DNB Livsforsikring, which offers insurance products, it was identified that the current risk assessment processes applied in this context could benefit from language processing technologies. This could in turn lead to profit optimisation for the company, decreased costs of human labour, and potentially a reduction in errors, depending on the accuracy of the implemented technology. This research was conducted in cooperation with DNB with the aim of developing an application that utilises existing libraries for Natural Language Processing (NLP) to perform text extraction and topic modelling on the fiscal reports provided by DNB. Design science research has been used to create an artifact that uses text extraction for the analysis of fiscal yearly reports. Other textual visualisations are also implemented, such as word clouds and Latent Dirichlet Allocation (LDA). The implementation utilises a variety of technologies, including the NLTK library and other common data science libraries such as scikit-learn. The main functionalities of the resulting artifact are text extraction and the visualisation of topic modelling, TF-IDF, word cloud generation and frequency distribution, all of which were fully functional as separate components. As part of the development process, a number of subject-specific methods have been used, such as agile development and the minimum viable product approach. The evaluation of the prototype has shown perceived usefulness, relevance to the intended application, understandability, practicality and the ability to produce some relevant results.
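To illustrate two of the analyses named in the abstract, frequency distribution and TF-IDF, the following is a minimal sketch using only the Python standard library. It is not the thesis implementation, which is stated to rely on NLTK and scikit-learn. The sample texts and the names `reports`, `tokenize`, `term_frequencies` and `tf_idf` are assumptions made for illustration, and the weighting used is the classic `tf * log(N / df)` form, which may differ from the smoothed variants that libraries apply by default.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in texts; the actual DNB reports are not reproduced here.
reports = {
    "report_a": "Premium income increased while insurance claims decreased.",
    "report_b": "Market risk increased and interest rate risk remained high.",
    "report_c": "Insurance claims and market risk affected premium income.",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(tokens):
    """Count how often each token occurs (a simple frequency distribution)."""
    return Counter(tokens)


def tf_idf(docs):
    """Return {doc_id: {term: score}} with score = tf * log(N / df)."""
    counts = {doc_id: term_frequencies(tokenize(text))
              for doc_id, text in docs.items()}
    n_docs = len(counts)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc_counts in counts.values():
        doc_freq.update(doc_counts.keys())

    scores = {}
    for doc_id, doc_counts in counts.items():
        total = sum(doc_counts.values())
        scores[doc_id] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in doc_counts.items()
        }
    return scores


scores = tf_idf(reports)

# The three highest-scoring terms per document.
top_terms = {
    doc_id: sorted(doc_scores, key=doc_scores.get, reverse=True)[:3]
    for doc_id, doc_scores in scores.items()
}
```

Terms that occur in only one report receive the highest inverse document frequency, so `top_terms` tends to surface vocabulary that distinguishes one report from the others. The same counts could also feed a word cloud or a frequency plot.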