Automated Moderation: Detecting Irony in a Norwegian Facebook Comment Section using a Longformer Transformer Model with a Context Encoded Dataset
Abstract
Irony is a complex phenomenon of human communication, and because of its contextual nature it has been notoriously difficult for machine learning algorithms to detect. This work establishes a practical definition of irony grounded in the environment of Facebook comment sections. The definition is used together with a Norwegian-language pre-trained BERT model, converted to a long version that supports longer text inputs, and a Norwegian Facebook comment dataset that includes the contextual article and reply-comment text. The long BERT model trained on the context-included dataset outperformed the short BERT models trained on datasets with the same number of comments, and with more comments, but without the contextual information encoded.