Show simple item record

dc.contributor.author: Barnes, Jeremy Claude
dc.contributor.author: Touileb, Samia
dc.contributor.author: Mæhlum, Petter
dc.contributor.author: Lison, Pierre
dc.date.accessioned: 2023-08-14T09:25:21Z
dc.date.available: 2023-08-14T09:25:21Z
dc.date.created: 2023-06-27T13:16:25Z
dc.date.issued: 2023
dc.identifier.isbn: 978-99-1621-999-7
dc.identifier.uri: https://hdl.handle.net/11250/3083739
dc.description.abstract: Dialectal variation is present in many human languages and is attracting a growing interest in NLP. Most previous work concentrated on either (1) classifying dialectal varieties at the document or sentence level or (2) performing standard NLP tasks on dialectal data. In this paper, we propose the novel task of token-level dialectal feature prediction. We present a set of fine-grained annotation guidelines for Norwegian dialects, expand a corpus of dialectal tweets, and manually annotate them using the introduced guidelines. Furthermore, to evaluate the learnability of our task, we conduct labeling experiments using a collection of baselines, weakly supervised and supervised sequence labeling models. The obtained results show that, despite the difficulty of the task and the scarcity of training data, many dialectal features can be predicted with reasonably high accuracy. (en_US)
dc.language.iso: eng (en_US)
dc.relation.ispartof: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
dc.rights: Attribution 4.0 International (*)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no (*)
dc.title: Identifying Token-Level Dialectal Features in Social Media (en_US)
dc.type: Chapter (en_US)
dc.description.version: publishedVersion (en_US)
cristin.ispublished: true
cristin.fulltext: original
dc.identifier.cristin: 2158637
dc.source.pagenumber: 146-158 (en_US)
dc.relation.project: Norges forskningsråd: 309834 (en_US)
dc.subject.nsi: VDP::Datateknologi: 551 (en_US)
dc.subject.nsi: VDP::Computer technology: 551 (en_US)
dc.identifier.citation: In: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 146–158. (en_US)


Unless otherwise stated, this item is licensed under Attribution 4.0 International.