Named Entity Recognition in Speech-to-Text Transcripts

Aarnes, Peter Røysland

dc.contributor.author	Aarnes, Peter Røysland
dc.date.accessioned	2023-06-22T00:03:43Z
dc.date.available	2023-06-22T00:03:43Z
dc.date.issued	2023-06-02
dc.date.submitted	2023-06-21T22:01:20Z
dc.identifier.uri	https://hdl.handle.net/11250/3072564
dc.description.abstract	Traditionally, named entity recognition (NER) research uses properly capitalized data for training and testing, which gives little insight into how these models perform when proper capitalization is absent. In this thesis, I explore the capabilities of five fine-tuned BERT-based models for NER in all-lowercase text. Furthermore, I aim to measure performance both at classifying named entity types correctly and at simply detecting that a named entity is present, so that capitalization errors may be corrected. Performance is assessed using all-lowercase data from the NorNE dataset and the Norwegian Parliamentary Speech Corpus. Findings suggest that the fine-tuned BERT models are highly capable of detecting non-capitalized named entities, but do not perform as well as traditional NER models that are trained and tested on properly capitalized text.
dc.language.iso	nob
dc.publisher	The University of Bergen
dc.rights	Copyright the Author. All rights reserved
dc.subject	Natural Language Processing
dc.title	Named Entity Recognition in Speech-to-Text Transcripts
dc.type	Master thesis
dc.date.updated	2023-06-21T22:01:20Z
dc.rights.holder	Copyright the Author. All rights reserved
dc.description.degree	Master's thesis in Information Science
dc.description.localcode	INFO390
dc.description.localcode	MASV-INFO
dc.subject.nus	735115
fs.subjectcode	INFO390
fs.unitcode	15-17-0

Associated file(s)

Filename: Named_Entity_Recognition_in_Sp ...
Size: 2.381Mb
Format: PDF
Description: master thesis

This item appears in the following collection(s)

Master theses [247]