
dc.contributor.author | Blum, Sophie
dc.contributor.author | Koudijs, Raoul
dc.contributor.author | Ozaki, Ana
dc.contributor.author | Touileb, Samia
dc.date.accessioned | 2024-08-09T08:07:01Z
dc.date.available | 2024-08-09T08:07:01Z
dc.date.created | 2023-09-21T16:20:19Z
dc.date.issued | 2024
dc.identifier.issn | 0888-613X
dc.identifier.uri | https://hdl.handle.net/11250/3145503
dc.description.abstract | We present an approach for systematically probing a trained neural network to extract a symbolic abstraction of it, represented as a Boolean formula. We formulate this task within Angluin's exact learning framework, where a learner attempts to extract information from an oracle (in our work, the neural network) by posing membership and equivalence queries. We adapt Angluin's algorithm for Horn formulas to the case where the examples are labelled w.r.t. an arbitrary Boolean formula in CNF (rather than a Horn formula). In this setting, the goal is to learn the smallest representation of all the Horn clauses implied by a Boolean formula, called its Horn envelope, which in our case corresponds to the rules obeyed by the network. Our algorithm terminates in exponential time in the worst case and in polynomial time if the target Boolean formula can be closely approximated by its envelope. We also show that extracting Horn envelopes in polynomial time is as hard as learning CNFs in polynomial time. To showcase the applicability of the approach, we perform experiments on BERT-based language models and extract Horn envelopes that expose occupation-based gender biases. | en_US
dc.language.iso | eng | en_US
dc.publisher | Elsevier | en_US
dc.rights | Attribution 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/deed.no | *
dc.title | Learning Horn envelopes via queries from language models | en_US
dc.type | Journal article | en_US
dc.type | Peer reviewed | en_US
dc.description.version | publishedVersion | en_US
dc.rights.holder | Copyright 2023 The Author(s) | en_US
dc.source.articlenumber | 109026 | en_US
cristin.ispublished | true
cristin.fulltext | original
cristin.qualitycode | 2
dc.identifier.doi | 10.1016/j.ijar.2023.109026
dc.identifier.cristin | 2177673
dc.source.journal | International Journal of Approximate Reasoning | en_US
dc.relation.project | Norges forskningsråd: 309339 | en_US
dc.relation.project | Norges forskningsråd: 316022 | en_US
dc.identifier.citation | International Journal of Approximate Reasoning. 2024, 171, 109026. | en_US
dc.source.volume | 171 | en_US
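The abstract describes learning a Horn envelope through membership and equivalence queries. As a minimal sketch of the two underlying ideas, assuming a toy target small enough to enumerate, the Python code below (1) computes the Horn envelope of a CNF semantically, using the standard fact that a Boolean function is Horn exactly when its set of models is closed under bitwise AND, and (2) runs the classic Horn-learning algorithm of Angluin, Frazier and Pitt (AFP) against that envelope. This is not the authors' adapted algorithm: here both queries are answered with respect to the envelope itself (so the target is Horn), and the equivalence oracle is simulated by brute force over all 2^N assignments. The bitmask encoding and all function names (`cnf_value`, `horn_envelope_models`, `learn_horn`, and so on) are hypothetical, chosen only for illustration.

```python
# Hypothetical toy sketch: Horn envelope of a CNF, then AFP learning of it.
# Assignments are bitmasks: bit v is set iff variable x_v is true.
# A Horn clause is (antecedent_mask, head); head is a variable index or None (False).

N = 3  # number of variables; brute force is exponential in N


def cnf_value(cnf, x):
    # cnf: list of clauses; a clause is a list of literals (v, polarity),
    # where polarity 1 means x_v and polarity 0 means "not x_v".
    return all(any(((x >> v) & 1) == pol for v, pol in clause) for clause in cnf)


def horn_envelope_models(cnf, n=N):
    # Envelope models = closure of the CNF's models under bitwise AND.
    closed = {x for x in range(2 ** n) if cnf_value(cnf, x)}
    frontier = list(closed)
    while frontier:
        a = frontier.pop()
        for b in list(closed):
            c = a & b
            if c not in closed:
                closed.add(c)
                frontier.append(c)
    return closed


def clause_holds(clause, x):
    antecedent, head = clause
    if (x & antecedent) != antecedent:
        return True  # antecedent false, so the clause is vacuously true
    return head is not None and ((x >> head) & 1) == 1


def hypothesis_value(h, x):
    return all(clause_holds(c, x) for c in h)


def clauses_for(s, n=N):
    # AFP: antecedent true(s), every head outside true(s), plus the head False.
    heads = [v for v in range(n) if ((s >> v) & 1) == 0] + [None]
    return [(s, head) for head in heads]


def learn_horn(member, n=N):
    # Equivalence query simulated by brute force: first disagreeing assignment.
    def equivalent(h):
        for x in range(2 ** n):
            if hypothesis_value(h, x) != member(x):
                return x
        return None

    negatives, h = [], []
    while (x := equivalent(h)) is not None:
        if member(x):
            # Positive counterexample: drop every clause it violates.
            h = [c for c in h if clause_holds(c, x)]
        else:
            # Negative counterexample: shrink the first stored negative s
            # whose intersection with x is a strict subset and still negative.
            for i, s in enumerate(negatives):
                z = s & x
                if z != s and not member(z):
                    negatives[i] = z
                    break
            else:
                negatives.append(x)
            h = [c for s in negatives for c in clauses_for(s, n)]
    return h


# f = (x0 or x1) and (not x2 or x0); the first clause is not Horn.
cnf = [[(0, 1), (1, 1)], [(2, 0), (0, 1)]]
envelope = horn_envelope_models(cnf)
learned = learn_horn(lambda x: x in envelope)
print(sorted(envelope))  # [0, 1, 2, 3, 5, 7]
print(learned)           # [(4, 0)], i.e. the Horn clause x2 -> x0
```

The all-false assignment 0 falsifies f but satisfies its envelope x2 -> x0, which shows how the envelope over-approximates a non-Horn formula. Brute-force enumeration is only practical for checking tiny examples like this one.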



Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC BY 4.0).