Learning Horn envelopes via queries from language models

Blum, Sophie; Koudijs, Raoul; Ozaki, Ana; Touileb, Samia

dc.contributor.author	Blum, Sophie
dc.contributor.author	Koudijs, Raoul
dc.contributor.author	Ozaki, Ana
dc.contributor.author	Touileb, Samia
dc.date.accessioned	2024-08-09T08:07:01Z
dc.date.available	2024-08-09T08:07:01Z
dc.date.created	2023-09-21T16:20:19Z
dc.date.issued	2024
dc.identifier.issn	0888-613X
dc.identifier.uri	https://hdl.handle.net/11250/3145503
dc.description.abstract	We present an approach for systematically probing a trained neural network to extract a symbolic abstraction of it, represented as a Boolean formula. We formulate this task within Angluin's exact learning framework, where a learner attempts to extract information from an oracle (in our work, the neural network) by posing membership and equivalence queries. We adapt Angluin's algorithm for Horn formulas to the case where the examples are labelled w.r.t. an arbitrary Boolean formula in CNF (rather than a Horn formula). In this setting, the goal is to learn the smallest representation of all the Horn clauses implied by a Boolean formula, called its Horn envelope, which in our case corresponds to the rules obeyed by the network. Our algorithm terminates in exponential time in the worst case and in polynomial time if the target Boolean formula can be closely approximated by its envelope. We also show that extracting Horn envelopes in polynomial time is as hard as learning CNFs in polynomial time. To showcase the applicability of the approach, we perform experiments on BERT-based language models and extract Horn envelopes that expose occupation-based gender biases.	en_US
dc.language.iso	eng	en_US
dc.publisher	Elsevier	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Learning Horn envelopes via queries from language models	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	Copyright 2023 The Author(s)	en_US
dc.source.articlenumber	109026	en_US
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2
dc.identifier.doi	10.1016/j.ijar.2023.109026
dc.identifier.cristin	2177673
dc.source.journal	International Journal of Approximate Reasoning	en_US
dc.relation.project	Norges forskningsråd: 309339	en_US
dc.relation.project	Norges forskningsråd: 316022	en_US
dc.identifier.citation	International Journal of Approximate Reasoning. 2024, 171, 109026.	en_US
dc.source.volume	171	en_US
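
The Horn envelope mentioned in the abstract can be illustrated on a toy formula. The sketch below is not the query-based algorithm from the article; it is a brute-force illustration that relies on the standard fact that the models of a Horn formula are closed under intersection, so the models of the Horn envelope are the intersection closure of the models of the original formula. The helper names `models` and `intersection_closure`, the integer clause encoding, and the example CNF are all assumptions made for this illustration.

```python
from itertools import product

def models(cnf, n):
    """Return all satisfying assignments of a CNF over variables 1..n.

    Hypothetical encoding: a clause is a list of nonzero ints,
    k meaning variable k is true and -k meaning it is false.
    """
    found = set()
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in cnf):
            found.add(bits)
    return found

def intersection_closure(assignments):
    """Close a set of assignments under component-wise AND."""
    closed = set(assignments)
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                c = tuple(x and y for x, y in zip(a, b))
                if c not in closed:
                    closed.add(c)
                    changed = True
    return closed

# Toy target: (x1 or x2) and (not x1 or x3) and (not x2 or x3).
# The first clause is not Horn, so the formula differs from its envelope.
cnf = [[1, 2], [-1, 3], [-2, 3]]
target_models = models(cnf, 3)
envelope_models = intersection_closure(target_models)

# The Horn clause "x3" holds in every envelope model ...
x3_always_true = all(m[2] for m in envelope_models)
# ... while the non-Horn clause "x1 or x2" is lost in the envelope.
x1_or_x2_lost = any(not (m[0] or m[1]) for m in envelope_models)
```

In this toy case the envelope has one extra model (all of `x1`, `x2` false, `x3` true), which is why the unit clause `x3` survives as an implied Horn rule while `x1 or x2` does not. Enumerating all assignments is exponential in the number of variables, so this only serves to make the definition concrete.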

Files in this item

Name: 1-s2.0-S0888613X23001573-main.pdf
Size: 901.8Kb
Format: PDF
Description: PDF

This item appears in the following Collection(s)

Department of Informatics [978]
Registrations from Cristin [10237]

Except where otherwise noted, this item's license is described as Attribution 4.0 International