Investigating Biases in Rules Extracted from Language Models
- Master theses 
We investigate an approach for extracting occupational gender bias from Large Language Models (LLMs) in the form of logical rules, based on Angluin's exact learning model with membership and equivalence queries to an oracle. In our approach, the oracle is an LLM, and we show the changes that are necessary to use Angluin's algorithm with such an oracle. In our experiments, we use the adapted algorithm to extract occupational gender bias from BERT and RoBERTa models and compare our results to an established bias extraction method, template-based probing. Our goal is a new method that combines multiple attributes in a template sentence and studies their relationship to the gender expressed in the sentence. We achieve this by applying our rule extraction approach to a variable template containing multiple attributes. The extracted rules show a bias similar to that found by previous bias extraction methods, but also give insight into more complex relationships between attributes.
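
To make the query setup more concrete, the following is a minimal sketch of how membership and equivalence queries might be posed against a model through a template with several attributes. It is not the thesis's implementation: the attribute values, the template wording, the function names, and the `toy_oracle` stand-in (which replaces an actual BERT or RoBERTa call) are all hypothetical. Approximating the equivalence query by random sampling is likewise an assumption made here, chosen because a language model cannot answer an equivalence query directly.

```python
import random

# Hypothetical attributes and values; the real template vocabulary is not given here.
ATTRIBUTES = {
    "occupation": ["nurse", "engineer"],
    "age": ["young", "old"],
}
GENDERS = ["she", "he"]


def fill_template(assignment):
    """Render an attribute assignment as a masked template sentence."""
    return (
        f"The {assignment['age']} {assignment['occupation']} "
        "said that [MASK] was tired."
    )


def toy_oracle(sentence):
    """Stand-in for a masked language model (assumption, for illustration only).

    A real setup would ask the model which pronoun it prefers for the mask;
    this invented rule simply makes the sketch runnable without a model.
    """
    return "she" if "nurse" in sentence else "he"


def membership_query(assignment, gender):
    """Return True if the oracle's preferred pronoun matches `gender`."""
    return toy_oracle(fill_template(assignment)) == gender


def sampled_equivalence_query(hypothesis, n_samples=50, seed=0):
    """Approximate an equivalence query by sampling random assignments.

    Returns a counterexample (assignment, gender) on which the hypothesis and
    the oracle disagree, or None if no disagreement is found in the sample.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        assignment = {
            name: rng.choice(values) for name, values in ATTRIBUTES.items()
        }
        gender = rng.choice(GENDERS)
        if hypothesis(assignment, gender) != membership_query(assignment, gender):
            return assignment, gender
    return None
```

A learner in this style would call `membership_query` on individual attribute combinations and use the counterexamples returned by `sampled_equivalence_query` to refine its current set of rules until no disagreement is found.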