Binary domain classification for Norwegian language in task-oriented dialogue systems
Master thesis
Date
2022-09-01
Abstract
Dialogue systems have gained increasing attention in recent years and have been called “the new app”. This is largely due to advances in deep learning, and more precisely in Natural Language Processing (NLP). Another factor in their growing popularity is that task-oriented dialogue systems can now be integrated with social media platforms. The original purpose of this thesis was to take the first steps in developing such a task-oriented dialogue system. One crucial component of a task-oriented dialogue system is the Natural Language Understanding (NLU) component. The NLU component aims to capture a semantic representation of a user’s utterance. It does so by classifying the domain and intent of the utterance and by extracting any slots it contains. This thesis focused on the domain and intent classification performed by the NLU component. We were given a collection of utterances sent to a driving school via its social media account. Due to the condition of the dataset we received, we simplified the domain and intent classification problem to a binary domain classification: determining whether an utterance should be handled by a human or by the dialogue system. We trained a selection of binary classification models, combining different sentence representations with different machine learning models. We explored the sentence representations Bag-of-Words (BoW), Word2vec, Doc2vec and embeddings created with Bidirectional Encoder Representations from Transformers (BERT), in combination with the machine learning algorithms logistic regression, random forest, Feedforward Neural Network (FFNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). The models were evaluated using accuracy.
Given the poor results on the binary classification task, we did not proceed with the development of the NLU component, but instead shifted our focus towards understanding the reasons behind these results. We observed that increasing the complexity of the model gave better results for the binary classification problem, while changing the sentence representation had little impact, except for BERT’s embeddings. The best-performing model was an FFNN using BERT’s classification token. However, none of the models achieved remarkable results. We concluded that the main reasons for this were the lack of data and the unsatisfactory quality of the data labeling. In addition, the utterances in the dataset were quite long and not narrowed down to specific intents, which made them harder to classify. In summary, we found that the data played a major part in holding back the performance of the machine learning models. This shows the importance of both good-quality data and proper labeling in the development of a well-functioning dialogue system.
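
As an illustration of the simplest pairing mentioned in the abstract (a Bag-of-Words representation with logistic regression, scored by accuracy), the following is a minimal sketch in plain Python. It is not the thesis code. The utterances are invented English examples rather than the Norwegian driving-school data, the label convention (1 = handled by a human, 0 = handled by the dialogue system) is an assumption, and all function names and hyperparameters are hypothetical. Accuracy is computed on the training utterances only to keep the sketch short; it says nothing about how such a model would perform on real data.

```python
import math
from collections import Counter

def build_vocab(utterances):
    """Map each distinct lowercased token to a column index."""
    tokens = sorted({tok for u in utterances for tok in u.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(utterance, vocab):
    """Bag-of-Words: count how often each vocabulary token occurs."""
    counts = Counter(utterance.lower().split())
    return [float(counts.get(tok, 0)) for tok in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=300, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented toy utterances (assumption): 1 = human, 0 = dialogue system.
utterances = [
    "what does a driving lesson cost",
    "when is the next theory course",
    "i want to complain about my instructor",
    "i need a refund for a cancelled lesson",
]
labels = [0, 0, 1, 1]

vocab = build_vocab(utterances)
X = [bow_vector(u, vocab) for u in utterances]
w, b = train_logreg(X, labels)
predictions = [predict(x, w, b) for x in X]
train_accuracy = accuracy(labels, predictions)
print(f"training accuracy: {train_accuracy:.2f}")
```

In practice the representation and the classifier are independent stages, so the `bow_vector` step could be replaced by any other sentence representation that yields a fixed-length vector while the rest of the sketch stays the same.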