Summary
With explainable artificial intelligence in its infancy, the field has limited documentation for those less familiar with it. This article aims to help nonspecialists develop a high-level understanding of the field. The summary also contains links to and descriptions of important papers, providing direction to those entering the research space.
What is Explainable Artificial Intelligence (XAI)?
At a high level, XAI is defined by IBM as "a set of processes and methods that allow humans to comprehend and trust the results and output created by machine learning algorithms" [1]. It is an important step toward overcoming the limitations imposed by the black-box nature of machine learning. Without understanding the underlying mechanisms that lead to these models' decisions, we, as a population, become apprehensive (and rightfully so) about relying on them for matters of health, security, and safety. According to Derek Doran et al. [2], machine learning models can be divided into four tiers of explainability: opaque, interpretable, comprehensible, and explainable.
Opaque Systems
These are models whose inner workings are entirely invisible to the user. In [2], the authors humorously describe this category of models as "an oracle that makes predictions over an input, without indicating how and why predictions are made." The category also covers models whose learned weights and biases are visible but carry no translatable meaning that improves interpretability. The majority of neural networks fall into this group.
Interpretable Systems
This category describes models that not only expose their weights and biases, but whose inputs have an understood and measurable importance to the output. The grouping spans rudimentary ML models such as linear regression, support vector machines, and decision trees, but also much more complex models, including NNs equipped with post-hoc explainers (SHAP, LIME, saliency maps) [3-6]. Because of this transparency, interpretable and opaque systems are mutually exclusive.
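As a minimal sketch of why a linear model counts as interpretable, consider a one-feature least-squares regression fit in closed form (pure Python; the housing numbers are invented for illustration). Each learned coefficient directly states how much the prediction changes per unit of input, which is exactly the "understood and measurable importance" described above.

```python
# A linear model is interpretable because each learned weight has a
# direct, measurable meaning for the output.

def fit_simple_regression(xs, ys):
    """Closed-form least squares for y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Toy data: house size (m^2) vs. price (k$) -- invented numbers.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

w, b = fit_simple_regression(sizes, prices)
# w directly answers: "how much does price rise per extra square metre?"
print(f"price = {w:.1f} * size + {b:.1f}")
```

An opaque model could fit the same data just as well, but would offer no single number with this kind of plain-language reading.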
Comprehensible Systems
Comprehensible systems, in contrast to the post-hoc explainers described previously, are models that were designed with explainability in mind. These models often emit a method of comprehension alongside the desired output, often in the form of words or images. These extra outputs, termed symbols, allow the user to draw connections between the properties of the input and its corresponding output [7]. For example, imagine a comprehensible image classifier model. Not only would it classify an image of a dog as such, but it would indicate the relevant features of the input that led to that decision: ears, tail, collar, etc. One last thing to note: comprehensible systems, despite outputting symbols to aid interpretation, are not held to the same transparency requirements as interpretable systems, and can therefore be either opaque or interpretable.
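To make the dog-classifier example concrete, here is a hypothetical sketch (all feature names and rules are invented for illustration, not taken from any real system) of a model that returns symbols alongside its label:

```python
# Hypothetical comprehensible classifier: alongside its label it emits
# "symbols" -- human-readable features that supported the decision.

def classify_animal(features):
    """Toy rule-based classifier over a set of detected features.

    Returns (label, symbols), where symbols are the detected features
    that drove the decision -- the extra output a comprehensible
    system adds on top of the prediction itself.
    """
    dog_cues = {"floppy ears", "tail wag", "collar"}
    cat_cues = {"whiskers", "slit pupils", "retractable claws"}

    dog_symbols = features & dog_cues
    cat_symbols = features & cat_cues
    if len(dog_symbols) >= len(cat_symbols):
        return "dog", sorted(dog_symbols)
    return "cat", sorted(cat_symbols)

label, symbols = classify_animal({"tail wag", "collar", "whiskers"})
print(label, symbols)  # the symbols justify the label to the user
```

The user still does the interpretive work of deciding whether "collar" and "tail wag" are a satisfying justification, which is exactly the gap the next tier addresses.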
Explainable Systems
While comprehensible systems leave interpretation of the symbols up to the user, explainable systems take it a step further and introduce a "reasoner" model, which explains the exact reasoning of the model given those symbols. The reasoner comes equipped with a knowledge base similar to a human evaluator's, standardizing the evaluation process so it no longer depends on how informed the human is. It draws connections directly from the input to the output, leaving no gray area in interpretation. Note that the human element of this type of model still exists: a person checks the interpretation produced by the reasoner.
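A hypothetical sketch of the reasoner idea (the knowledge base entries and wording are invented for illustration): a fixed knowledge base turns the comprehensible system's symbols into explicit statements, so the explanation no longer depends on what the individual reader happens to know.

```python
# Hypothetical "reasoner": converts symbols from a comprehensible
# model into explicit explanations using a fixed knowledge base, so
# the interpretation no longer depends on the human evaluator.

KNOWLEDGE_BASE = {
    "tail wag": "tail wagging is characteristic of dogs",
    "collar": "collars are typically worn by domesticated dogs",
    "whiskers": "prominent whiskers are common to many mammals",
}

def reason(label, symbols):
    """Map each symbol to a statement from the knowledge base."""
    lines = [f"Predicted '{label}' because:"]
    for s in symbols:
        fact = KNOWLEDGE_BASE.get(s, "no entry in knowledge base")
        lines.append(f"- {s}: {fact}")
    return "\n".join(lines)

print(reason("dog", ["tail wag", "collar"]))
```

A human reviewer's remaining job is only to check that these generated statements are sound, rather than to construct the interpretation from scratch.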
The Human Element
All of these systems share one (generally welcome) bottleneck: they leave "trustworthiness" up to the user. It is the user's responsibility to evaluate whether the internal weights of an interpretable model, or the symbols provided by a comprehensible model, are reasonable and morally justifiable explanations for the output. This human element acts as a final check on the model, resulting in a system that is (in theory) as trustworthy as the person evaluating it.
How Do We Improve Explainability?
XAI is the field of study concerned with moving models upward through these tiers. With that objective established, we can examine the strategies researchers employ to accomplish it.
Citations
[1] https://www.ibm.com/think/topics/explainable-ai
[2] https://arxiv.org/pdf/1710.00794
[3] https://arxiv.org/abs/2303.08806
[4] https://dl.acm.org/doi/10.5555/3295222.3295230
[5] https://dl.acm.org/doi/10.1145/2939672.2939778
[6] https://arxiv.org/abs/1312.6034
[7] Michie, D.: Machine learning in the next five years. In: Proc. of the Third European Working Session on Learning. pp. 107–122. Pitman (1988)