Assessing the Quality of Counterfactual Explanations with Large Language Models

Name
Julius Välja
Abstract
With the accelerating spread of machine learning models, the complexity and lack of transparency of current models have become a major source of concern. The field of Explainable AI focuses on finding methods that can uncover the inner logic of these models. One such method is counterfactual explanations, which seek to answer the question "How would the original situation need to be different to achieve a different prediction from the model?". However, the qualities that make a counterfactual explanation good are not fully understood and are difficult to quantify. In this thesis, a survey was used to gather a dataset of human-evaluated counterfactual explanations, with an array of qualities defined based on previous literature. This dataset was used to explore the ability of Large Language Models (LLMs) to evaluate subjective qualities of counterfactual explanations with and without fine-tuning. The results showed that large LLMs achieve 70% to 95% accuracy at this task, depending on the specific model and testing dataset. While smaller LLMs could be fine-tuned to achieve acceptable accuracy, they were generally significantly less capable. In addition, the effect of correlations between metrics was tested, and experiments were performed to assess the feasibility of predicting user satisfaction as well as modelling individual preferences. These results pave the way for future research on the automatic evaluation of counterfactual explanations and the development of new search algorithms.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Marharyta Domnich, Raul Vicente, Eduard Barbu
Defence year
2024
PDF