dc.contributor.author | Jahr, Oskar Emilius Buserud | |
dc.date.accessioned | 2023-06-22T00:00:05Z | |
dc.date.available | 2023-06-22T00:00:05Z | |
dc.date.issued | 2023-06-01 | |
dc.date.submitted | 2023-06-21T22:01:16Z | |
dc.identifier.uri | https://hdl.handle.net/11250/3072548 | |
dc.description.abstract | GDELT is a large-scale, continuously updated database that provides a real-time picture of global news coverage, published as files that anyone can download and use. However, the data is of low granularity, and each data source provides little information on its own. This thesis attempts to leverage the large volume of available data by using a hierarchical agglomerative clustering method to identify news articles that report on the same real-life event. To do this, the thesis also explores whether the GDELT data is granular enough to be used without extensive preprocessing, and whether a distance metric for the clustering algorithm can be created. The findings show promising results when assessed qualitatively, but the quantitative measures are not yet optimized. Inherent flaws in GDELT and in clustering algorithms are hurdles to overcome before the real potential of GDELT’s data can be realized; this thesis explores some of these difficulties and makes recommendations for how to circumvent them in future work. | |
dc.language.iso | eng | |
dc.publisher | The University of Bergen | |
dc.rights | Copyright the Author. All rights reserved | |
dc.subject | clustering | |
dc.subject | hierarchical agglomerative clustering | |
dc.subject | python | |
dc.subject | machine learning | |
dc.subject | GDELT | |
dc.title | Creating an Agglomerative Clustering Approach Using GDELT | |
dc.type | Master thesis | |
dc.date.updated | 2023-06-21T22:01:16Z | |
dc.rights.holder | Copyright the Author. All rights reserved | |
dc.description.degree | Master's thesis in Information Science |
dc.description.localcode | INFO390 | |
dc.description.localcode | MASV-INFO | |
dc.subject.nus | 735115 | |
fs.subjectcode | INFO390 | |
fs.unitcode | 15-17-0 | |