Named Entity Recognition in Speech-to-Text Transcripts
Master thesis

View/ Open
Date
2023-06-02Metadata
Show full item recordCollections
- Master theses [254]
Abstract
Traditionally, named entity recognition (NER) research use properly capitalized data for training and testing give little insight to how these models may perform in scenarios where proper capitalization is not in place. In this thesis, I explore the capabilities of five fine-tuning BERT based models for NER in all lowercase text. Furthermore, I aim to measure the performance for classifying named entity types correctly, as well as just simply detecting that a named entity is present, so that capitalization errors may be corrected. The performance is assessed using all lowercase data from the NorNE dataset, and the Norwegian Parliamentary Speech Corpus. Findings suggest that the fine-tuned BERT models are highly capable of detecting non-capitalized named entities, but do not perform as well as traditional NER models that are trained and tested on properly capitalized text.