Investigating Biases in Rules Extracted from Language Models

Blum, Sophie Martina

dc.contributor.author	Blum, Sophie Martina
dc.date.accessioned	2023-08-29T23:39:53Z
dc.date.available	2023-08-29T23:39:53Z
dc.date.issued	2023-08-01
dc.date.submitted	2023-08-29T22:00:23Z
dc.identifier.uri	https://hdl.handle.net/11250/3086287
dc.description.abstract	We investigate an approach for extracting occupational gender bias, in the form of logical rules, from Large Language Models (LLMs). The approach is based on Angluin's exact learning model with membership and equivalence queries to an oracle. In our approach, the oracle is an LLM, and we show the changes that are necessary to use Angluin's algorithm with such an oracle. In our experiments, we use the adapted algorithm to extract occupational gender bias from BERT and RoBERTa models and compare our results to an established bias extraction method, template-based probing. Our goal is a new method that combines multiple attributes in a template sentence and studies their relationship to the gender in the sentence. We achieve this by applying our rule extraction approach to a variable template containing multiple attributes. The extracted rules show a bias similar to that found by previous bias extraction methods, but also give insight into more complex relationships between attributes.
dc.language.iso	eng
dc.publisher	The University of Bergen
dc.rights	Copyright the Author. All rights reserved
dc.subject	exact learning
dc.subject	natural language processing
dc.subject	machine learning
dc.title	Investigating Biases in Rules Extracted from Language Models
dc.type	Master thesis
dc.date.updated	2023-08-29T22:00:23Z
dc.rights.holder	Copyright the Author. All rights reserved
dc.description.degree	Master's Thesis in Informatics
dc.description.localcode	INF399
dc.description.localcode	MAMN-PROG
dc.description.localcode	MAMN-INF
dc.subject.nus	754199
fs.subjectcode	INF399
fs.unitcode	12-12-0

Associated file(s)

Filename: Master_Thesis_Sophie_Blum.pdf
Size: 1.249Mb
Format: PDF
Description: master thesis

This item appears in the following collection(s)

Master theses [197]
