Measuring Human Preferences in Counterfactual Explanations

Name
Rasmus Moorits Veski
Abstract
With modern machine learning models’ decision-making processes growing increasingly complex, the reasoning behind their decisions becomes more opaque. An effective method to understand why a model made a specific choice is through counterfactual explanations. However, this raises another challenge: how to produce explanations that are most useful for humans. One possible approach is to incrementally integrate human cognitive biases into counterfactual search algorithms. To investigate which biases are relevant, this thesis conducts a survey in which respondents rate counterfactual explanations based on overall satisfaction and adherence to seven explanatory criteria. The measured biases provided insights into the role of sub-criteria in assessing the subjective measure of overall satisfaction. However, data analysis of the responses indicated that the biases are strongly interwoven, with fewer underlying factors possibly accounting for them. Overall, humans seemed to place the most emphasis on the feasibility of the explanation. The findings of this thesis and the dataset generated from the questionnaire help pave the way towards developing more human-like explainable systems.
Graduation Thesis language
English
Graduation Thesis type
Bachelor - Computer Science
Supervisor(s)
Marharyta Domnich, Kadi Tulver, Raul Vicente
Defence year
2024