dc.contributor.author | Aarnes, Peter Røysland | |
dc.date.accessioned | 2023-06-22T00:03:43Z | |
dc.date.available | 2023-06-22T00:03:43Z | |
dc.date.issued | 2023-06-02 | |
dc.date.submitted | 2023-06-21T22:01:20Z | |
dc.identifier.uri | https://hdl.handle.net/11250/3072564 | |
dc.description.abstract | Traditionally, named entity recognition (NER) research uses properly capitalized data for training and testing, which gives little insight into how these models perform in scenarios where proper capitalization is not in place. In this thesis, I explore the capabilities of five fine-tuned BERT-based models for NER in all-lowercase text. Furthermore, I aim to measure performance both on classifying named entity types correctly and on simply detecting that a named entity is present, so that capitalization errors may be corrected. The performance is assessed using all-lowercase data from the NorNE dataset and the Norwegian Parliamentary Speech Corpus. Findings suggest that the fine-tuned BERT models are highly capable of detecting non-capitalized named entities, but do not perform as well as traditional NER models that are trained and tested on properly capitalized text. | |
dc.language.iso | nob | |
dc.publisher | The University of Bergen | |
dc.rights | Copyright the Author. All rights reserved | |
dc.subject | Natural Language Processing | |
dc.title | Named Entity Recognition in Speech-to-Text Transcripts | |
dc.type | Master thesis | |
dc.date.updated | 2023-06-21T22:01:20Z | |
dc.rights.holder | Copyright the Author. All rights reserved | |
dc.description.degree | Master's thesis in Information Science | |
dc.description.localcode | INFO390 | |
dc.description.localcode | MASV-INFO | |
dc.subject.nus | 735115 | |
fs.subjectcode | INFO390 | |
fs.unitcode | 15-17-0 | |