Show simple item record

dc.contributor.author: Aarnes, Peter Røysland
dc.date.accessioned: 2023-06-22T00:03:43Z
dc.date.available: 2023-06-22T00:03:43Z
dc.date.issued: 2023-06-02
dc.date.submitted: 2023-06-21T22:01:20Z
dc.identifier.uri: https://hdl.handle.net/11250/3072564
dc.description.abstract: Traditionally, named entity recognition (NER) research uses properly capitalized data for training and testing, which gives little insight into how these models may perform in scenarios where proper capitalization is not in place. In this thesis, I explore the capabilities of five fine-tuned BERT-based models for NER in all-lowercase text. Furthermore, I aim to measure performance both at classifying named entity types correctly and at simply detecting that a named entity is present, so that capitalization errors may be corrected. The performance is assessed using all-lowercase data from the NorNE dataset and the Norwegian Parliamentary Speech Corpus. Findings suggest that the fine-tuned BERT models are highly capable of detecting non-capitalized named entities, but do not perform as well as traditional NER models that are trained and tested on properly capitalized text.
dc.language.iso: nob
dc.publisher: The University of Bergen
dc.rights: Copyright the Author. All rights reserved
dc.subject: Natural Language Processing
dc.title: Named Entity Recognition in Speech-to-Text Transcripts
dc.type: Master thesis
dc.date.updated: 2023-06-21T22:01:20Z
dc.rights.holder: Copyright the Author. All rights reserved
dc.description.degree: Master's thesis in Information Science
dc.description.localcode: INFO390
dc.description.localcode: MASV-INFO
dc.subject.nus: 735115
fs.subjectcode: INFO390
fs.unitcode: 15-17-0


Associated file(s)

This item appears in the following collection(s)
