Semantics driven anaphora resolution

Skaugen, Håvar

dc.contributor.author	Skaugen, Håvar
dc.date.accessioned	2016-01-05T12:10:13Z
dc.date.available	2016-01-05T12:10:13Z
dc.date.issued	2015-11-20
dc.date.submitted	2015-11-20	eng
dc.identifier.uri	https://hdl.handle.net/1956/10867
dc.description.abstract	This thesis describes a method for generating semantically motivated antecedent candidates for use in pronominal anaphora resolution. Predicate-argument structures are extracted from a large corpus of text parsed by the NorGram grammar and used as the basis for a fuzzy classification model. Given a pronominal anaphor, the model generates antecedent candidates ranked by the frequency with which they co-occur in the same lexical context as the anaphor. This set of candidates is intersected with the set of nouns gathered from the anaphor's recent context. A selection of basic heuristics is then introduced to the model in a permutational fashion to gauge their individual and combined effect on the model's accuracy. The model reached an accuracy of 56.22% correct predictions. Additionally, in a slightly modified model, the correct antecedent was found within the antecedent candidate list for 87.12% of the anaphors.	en_US
dc.description.abstract	In this thesis I describe a method for generating semantically motivated antecedent candidates for use in anaphora resolution. Predicate-argument structures are extracted from a large corpus of text tagged with the NorGram grammar and used as the basis of a "fuzzy" classification model. The model generates antecedent candidates for pronominal anaphors, ranked by the frequency with which they occur in the same lexical context as the anaphor. This set of candidates is intersected with the set of nouns in the anaphor's preceding context. A selection of simple heuristics is added to the model in different permutations to measure their combined and individual effect on the model's accuracy. The model reached an accuracy of 56.22% correctly classified antecedents. For a partially modified version of the model, the correct antecedent is found among the antecedent candidates in 87.12% of cases.	en_US
dc.format.extent	710198 bytes	eng
dc.format.mimetype	application/pdf	eng
dc.language.iso	eng	eng
dc.publisher	The University of Bergen	eng
dc.subject	anaphora resolution	eng
dc.subject	semantics	eng
dc.subject	real-world knowledge	eng
dc.subject	anaphora	eng
dc.subject	antecedents	eng
dc.title	Semantics driven anaphora resolution	eng
dc.type	Master thesis
dc.rights.holder	Copyright the author. All rights reserved	eng
dc.description.degree	Master i Datalingvistikk og språkteknologi
dc.description.localcode	MAHF-DASP
dc.description.localcode	DASP350
dc.subject.nus	711726	eng
fs.subjectcode	DASP350

Associated file(s)

Filename: 141259970.pdf
Size: 693.5Kb
Format: PDF

This item appears in the following collection(s)

Department of Linguistics, Literary and Aesthetic Studies [984]