Investigating Biases in Rules Extracted from Language Models

Blum, Sophie Martina

Master thesis

Open

master thesis (1.249Mb)

Permanent link

https://hdl.handle.net/11250/3086287

Date issued

2023-08-01

Collections

Master theses [205]

Abstract

We investigate an approach for extracting occupational gender bias, in the form of logical rules, from Large Language Models (LLMs). The approach is based on Angluin's exact learning model with membership and equivalence queries to an oracle. In our approach, the oracle is an LLM, and we show the changes that are necessary to use Angluin's algorithm with such an oracle. In our experiments, we use the adapted algorithm to extract occupational gender bias from BERT and RoBERTa models and compare our results with an established bias extraction method, template-based probing. Our goal is a new method that combines multiple attributes in a template sentence and studies their relationship to the gender expressed in that sentence. We achieve this by applying our rule extraction approach to a variable template containing multiple attributes. The extracted rules show biases similar to those found by previous bias extraction methods, but also give insight into more complex relationships between attributes.
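
As a rough illustration of the two query types named in the abstract, the following minimal sketch shows what membership and equivalence queries against an oracle could look like. It is not the thesis's implementation: the attribute vocabulary, the function names, and the `toy_oracle` stand-in (used here in place of a real BERT or RoBERTa model) are all hypothetical, and approximating the equivalence query by random sampling is an assumption made for this sketch only.

```python
import random

# Hypothetical attribute vocabulary for a variable template (not from the thesis).
ATTRIBUTES = {
    "occupation": ["nurse", "engineer"],
    "age": ["young", "old"],
}

def toy_oracle(assignment):
    """Stand-in for a masked language model: returns the pronoun it 'prefers'.
    A real setup would instead fill a masked pronoun slot with BERT or RoBERTa."""
    return "she" if assignment["occupation"] == "nurse" else "he"

def membership_query(assignment, pronoun):
    """Ask the oracle whether it pairs this attribute assignment with the pronoun."""
    return toy_oracle(assignment) == pronoun

def equivalence_query(hypothesis, samples=50, seed=0):
    """Approximate an equivalence query by random sampling (an assumption):
    return an assignment on which hypothesis and oracle disagree, else None."""
    rng = random.Random(seed)
    for _ in range(samples):
        assignment = {name: rng.choice(values) for name, values in ATTRIBUTES.items()}
        if hypothesis(assignment) != toy_oracle(assignment):
            return assignment
    return None

# Usage: a hypothesis that always predicts "he" is refuted by a counterexample.
counterexample = equivalence_query(lambda assignment: "he")
```

In this toy setting, any counterexample returned for the always-"he" hypothesis has the occupation "nurse", which is the kind of disagreement a learner would use to refine its rules.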

Publisher

The University of Bergen