Show simple item record

dc.contributor.author: Jahr, Oskar Emilius Buserud
dc.date.accessioned: 2023-06-22T00:00:05Z
dc.date.available: 2023-06-22T00:00:05Z
dc.date.issued: 2023-06-01
dc.date.submitted: 2023-06-21T22:01:16Z
dc.identifier.uri: https://hdl.handle.net/11250/3072548
dc.description.abstract: GDELT is a project with a large-scale, continuously updated database that provides a real-time picture of the global news landscape, published as files that anyone can download and use. However, this data is of low granularity, and each source of data provides little information on its own. This thesis attempts to leverage the large amount of available data by using a hierarchical agglomerative clustering method to identify news articles that report on the same real-life event. To do this, the thesis also explores whether the GDELT data is granular enough to be used without extensive preprocessing, and whether a distance metric for the clustering algorithm can be created. The findings show promising results when assessed qualitatively, but the quantitative measures are not yet optimized. Inherent flaws in GDELT and in clustering algorithms are hurdles to overcome before the real potential of GDELT’s data can be realized; this thesis explores some of these difficulties and makes recommendations for how to circumvent them in future work.
dc.language.iso: eng
dc.publisher: The University of Bergen
dc.rights: Copyright the Author. All rights reserved
dc.subject: clustering
dc.subject: hierarchical agglomerative clustering
dc.subject: python
dc.subject: machine learning
dc.subject: GDELT
dc.title: Creating an Agglomerative Clustering Approach Using GDELT
dc.type: Master thesis
dc.date.updated: 2023-06-21T22:01:16Z
dc.rights.holder: Copyright the Author. All rights reserved
dc.description.degree: Master's thesis in Information Science
dc.description.localcode: INFO390
dc.description.localcode: MASV-INFO
dc.subject.nus: 735115
fs.subjectcode: INFO390
fs.unitcode: 15-17-0