Detecting Out-of-Context Image-Caption Pairs in News: A Counter-Intuitive Method

Moholdt, Eivind

Master thesis

Permanent link

https://hdl.handle.net/11250/3072574

Publication date

2023-06-02

Abstract

The growth of misinformation and re-contextualized media on social media and in the news creates an increasing need for fact-checking methods. At the same time, recent advances in generative models enable a new way of spreading misinformation through AI-manipulated images and videos. With the advent of deep-learning technologies used to create ‘deepfakes’ and generative models that can produce near-realistic images from text prompts, fabricated media is becoming increasingly hard to detect. This project covers re-contextualized media referred to as ‘cheapfakes’, which consist of an image and its associated captions. For example, the same image might appear in different online sources with different caption claims. The field of generative models is currently growing rapidly, and while text-to-image generative models can potentially be misused to create ‘deepfakes’ and spread misinformation, this thesis presents a positive application: a novel approach that uses generative image models to our advantage for cheapfake detection. We present two new datasets with a total of 6,800 images generated using two different generative models: (1) DALL-E 2 and (2) Stable Diffusion. We propose that text-to-image generative models can be employed to detect out-of-context media: the similarity or dissimilarity between the images generated from the captions and the original image may serve as an indicator of opposing or misleading captions in out-of-context news. We evaluate and employ several methods for computing image similarity. Our cheapfake detection model is the first of its kind to utilize generative models, and it achieves an accuracy of 68% for cheapfake detection, showing that image generation models can be used to detect cheapfake media efficiently. Our model's similarity measures also align with human perception of image similarity. Moreover, we outline several opportunities for optimization.
We are confident that the method proposed in this thesis can advance research on generative models in the field of cheapfake detection, and that the resulting datasets can be used to train and evaluate new models aimed at detecting cheapfake media.
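The core idea described in the abstract, comparing images generated from the captions with the original image, can be illustrated with a minimal Python sketch. It assumes each image (the original and one generated image per caption) has already been turned into an embedding vector by some image encoder, which is not shown. Cosine similarity stands in as one possible similarity measure, and a pair is flagged as out-of-context when either caption's generated image is too dissimilar to the original. The decision rule, the function names, and the 0.5 threshold are hypothetical illustrations, not the thesis's actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def is_out_of_context(original, generated_1, generated_2, threshold=0.5):
    """Hypothetical rule: flag the image-caption pair as out-of-context if the
    image generated from either caption is dissimilar to the original image.

    `original`, `generated_1` and `generated_2` are assumed to be embeddings
    from the same (unspecified) image encoder; `threshold` is illustrative.
    """
    sim_1 = cosine_similarity(original, generated_1)
    sim_2 = cosine_similarity(original, generated_2)
    return min(sim_1, sim_2) < threshold


def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty inputs")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    original = [1.0, 0.0, 0.0]
    close = [0.9, 0.1, 0.0]   # generated image resembling the original
    far = [0.0, 1.0, 0.0]     # generated image unlike the original
    print(is_out_of_context(original, close, far))    # True
    print(is_out_of_context(original, close, close))  # False
```

In practice the threshold would be tuned on labelled data, and the accuracy helper shows how a figure such as the reported 68% is computed: the share of pairs whose predicted label matches the ground truth.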

Publisher

The University of Bergen