ECCCos from the Black Box

Faithful Model Explanations through Energy-Constrained Conformal Counterfactuals

Delft University of Technology

Patrick Altmeyer
Mojtaba Farmanbar
Arie van Deursen
Cynthia C. S. Liem

May 9, 2024

Pick your Poison

All of these counterfactuals are valid explanations for the model’s prediction.

Which one would you pick?

Figure 1: Turning a 9 into a 7: Counterfactual explanations for an image classifier produced using Wachter (Wachter, Mittelstadt, and Russell 2017), Schut (Schut et al. 2021) and REVISE (Joshi et al. 2019).

Faithfulness first, plausibility second.

We propose ECCCo: a new way to generate faithful model explanations that are as plausible as the underlying model permits.

Summary

  • Idea: generate counterfactuals that are consistent with what the model has learned about the data.
  • Method: constrain the model’s energy and predictive uncertainty for the counterfactual.
  • Result: faithful counterfactuals that are as plausible as the model permits.
  • Benefit: enables us to distinguish trustworthy models from unreliable ones.

Counterfactual Explanations

\[ \begin{aligned} \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda {\text{cost}(f(\mathbf{Z}^\prime)) } \} \end{aligned} \]

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs.

Figure 2: Gradient-based counterfactual search.
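
A minimal, hedged sketch of this gradient-based search in Julia, taking \(f\) to be the identity (i.e. searching directly in input space). `M` is assumed to be a differentiable classifier returning logits; all names are illustrative rather than the API of any particular package.

```julia
# Hedged sketch of gradient-based counterfactual search for the objective
# above, with f the identity. `M` is assumed to be a differentiable
# classifier returning logits; all names are illustrative.
using Zygote
using Flux: softmax

function counterfactual_search(M, x, y⁺; λ=0.1, η=0.01, steps=500)
    x′ = copy(x)                              # initialise at the factual input
    for _ in 1:steps
        g, = gradient(x′) do z
            yloss = -log(softmax(M(z))[y⁺])   # push the prediction towards y⁺
            cost = sum(abs2, z .- x)          # penalise distance to the factual
            yloss + λ * cost
        end
        x′ = x′ .- η .* g                     # plain gradient-descent step
    end
    return x′
end
```

Wachter et al. (2017) optimise an objective of exactly this form; REVISE (Joshi et al. 2019) instead runs the search in the latent space of a generative model, which is what the decoder \(f(\mathbf{Z}^\prime)\) in the formula above allows for.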

Reconciling Faithfulness and Plausibility

Plausibility

Definition 1 (Plausible Counterfactuals) Let \(\mathcal{X}|\mathbf{y}^+= p(\mathbf{x}|\mathbf{y}^+)\) denote the true conditional distribution of samples in the target class \(\mathbf{y}^+\). Then for \(\mathbf{x}^{\prime}\) to be considered a plausible counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}|\mathbf{y}^+\).

Why Plausibility?

Plausibility is positively associated with actionability, robustness (Artelt et al. 2021) and causal validity (Mahajan, Tan, and Sharma 2020).

Figure 3: Kernel density estimate (KDE) for the conditional distribution, \(p(\mathbf{x}|\mathbf{y}^+)\), based on observed data. Counterfactual path as in Figure 2.
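
To make the plausibility criterion operational, here is a minimal, hedged sketch of scoring a counterfactual by a Gaussian KDE of \(p(\mathbf{x}|\mathbf{y}^+)\) fitted on observed target-class samples; the bandwidth \(h\) and all names are illustrative.

```julia
# Hedged sketch: estimate p(x | y⁺) with a Gaussian kernel density estimate
# over observed samples from the target class and score a counterfactual by
# its density. Bandwidth h and all names are illustrative.
gauss(u) = exp(-sum(abs2, u) / 2)

# X⁺ is a d × n matrix of observed samples in class y⁺
function kde_density(x, X⁺; h=0.5)
    d, n = size(X⁺)
    kernels = sum(gauss((x .- X⁺[:, i]) ./ h) for i in 1:n)
    return kernels / (n * h^d * (2π)^(d / 2))
end

# a counterfactual x′ counts as more plausible when kde_density(x′, X⁺) is high
```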

Faithfulness

Definition 2 (Faithful Counterfactuals) Let \(\mathcal{X}_{\theta}|\mathbf{y}^+ = p_{\theta}(\mathbf{x}|\mathbf{y}^+)\) denote the conditional distribution of \(\mathbf{x}\) in the target class \(\mathbf{y}^+\), where \(\theta\) denotes the parameters of model \(M_{\theta}\). Then for \(\mathbf{x}^{\prime}\) to be considered a faithful counterfactual, we need: \(\mathbf{x}^{\prime} \sim \mathcal{X}_{\theta}|\mathbf{y}^+\).

Trustworthy Models

If the learned conditional distribution approximates the true conditional distribution (\(p_{\theta}(\mathbf{x}|\mathbf{y}^+) \rightarrow p(\mathbf{x}|\mathbf{y}^+)\)), faithful counterfactuals are also plausible.

Figure 4: KDE for the learned conditional distribution, \(p_{\theta}(\mathbf{x}|\mathbf{y}^+)\). Yellow stars indicate conditional samples generated through stochastic gradient Langevin dynamics (SGLD) for a joint energy model (JEM).
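
Samples like the yellow stars in Figure 4 can be drawn, in spirit, with a few lines of SGLD. A hedged sketch, assuming the class-conditional energy is the negative target logit (as in Grathwohl et al. 2020); step size, initialisation and names are illustrative.

```julia
# Hedged sketch of stochastic gradient Langevin dynamics (SGLD) for drawing
# conditional samples from p_θ(x|y⁺), using the negative target logit as the
# energy (Grathwohl et al. 2020). `M` returns logits; names are illustrative.
using Zygote, Random

function sgld_sample(M, y⁺, d; ϵ=0.01, steps=1000, rng=Random.default_rng())
    x = randn(rng, d)                     # initialise from a standard Gaussian
    E(z) = -M(z)[y⁺]                      # energy of z conditional on y⁺
    for _ in 1:steps
        g, = gradient(E, x)
        x = x .- (ϵ / 2) .* g .+ sqrt(ϵ) .* randn(rng, d)   # Langevin update
    end
    return x
end
```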

ECCCo

Key Idea

Use the hybrid objective of joint energy models (JEM; Grathwohl et al. 2020) and a model-agnostic, conformal-prediction-based penalty for predictive uncertainty (Stutz et al. 2022): Energy-Constrained (\(\mathcal{E}_{\theta}\)) Conformal (\(\Omega\)) Counterfactuals (ECCCo).

ECCCo objective:

\[ \begin{aligned} & \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {L_{\text{clf}}(f(\mathbf{Z}^\prime);M_{\theta},\mathbf{y}^+)}+ \lambda_1 {\text{cost}(f(\mathbf{Z}^\prime)) } \\ &+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)|\mathbf{y}^+) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \} \end{aligned} \]

Figure 5: Gradient fields and counterfactual paths for different generators.
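
A hedged sketch of how the penalties in the ECCCo objective might be assembled (again with \(f\) the identity). The smooth set-size term follows the spirit of Stutz et al. (2022), counting labels whose softmax output exceeds a calibrated threshold `q̂`; the temperature `T`, weights and names are illustrative.

```julia
# Hedged sketch of the ECCCo penalties (f = identity). `M` returns logits,
# q̂ is a calibrated softmax threshold from conformal calibration, and σ is
# the logistic function, giving a smooth prediction-set size in the spirit
# of Stutz et al. (2022). Weights, temperature T and names are illustrative.
using Flux: softmax, σ

energy(M, z, y⁺) = -M(z)[y⁺]                      # E_θ(z | y⁺)

# smooth count of labels in the conformal prediction set C_θ(z; α)
set_size(M, z, q̂; T=0.1) = sum(σ.((softmax(M(z)) .- q̂) ./ T))

function eccco_loss(M, z, x, y⁺, q̂; λ₁=0.1, λ₂=0.5, λ₃=0.5)
    p = softmax(M(z))
    -log(p[y⁺]) +                     # L_clf: move the prediction to y⁺
        λ₁ * sum(abs2, z .- x) +      # cost: proximity to the factual x
        λ₂ * energy(M, z, y⁺) +       # energy constraint E_θ(z | y⁺)
        λ₃ * set_size(M, z, q̂)        # set-size penalty Ω(C_θ(z; α))
end
```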

Results

Visual Evidence

Figure 6: Turning a 9 into a 7. ECCCo applied to MLP (a), Ensemble (b), JEM (c), JEM Ensemble (d).

ECCCo generates counterfactuals that

  • faithfully represent model quality (Figure 6).
  • achieve state-of-the-art plausibility (Figure 7).

Figure 7: Results for different generators (from 3 to 5).

The Numbers

  • Large-scale benchmarks across many models and datasets from various domains.
  • ECCCo achieves state-of-the-art faithfulness across models and datasets and approaches state-of-the-art plausibility for more trustworthy models.

Questions?

With thanks to my co-authors Mojtaba Farmanbar, Arie van Deursen and Cynthia C. S. Liem.

Code

The code used to run the analysis for this work is built on top of CounterfactualExplanations.jl.

There is also a corresponding paper, Explaining Black-Box Models through Counterfactuals, which has been published in JuliaCon Proceedings.
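
For orientation, a minimal usage sketch following the style of the CounterfactualExplanations.jl documentation; treat the exact names and signatures as assumptions and consult the package docs for the current API.

```julia
# Hedged usage sketch in the style of CounterfactualExplanations.jl; the
# names and signatures below are assumptions, not a verified API.
using CounterfactualExplanations

generator = GenericGenerator()   # Wachter-style gradient-based generator
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```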

Trustworthy AI in Julia: github.com/JuliaTrustworthyAI

References

Artelt, André, Valerie Vaquet, Riza Velioglu, Fabian Hinder, Johannes Brinkrolf, Malte Schilling, and Barbara Hammer. 2021. “Evaluating Robustness of Counterfactual Explanations.” In 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 01–09. IEEE.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Mahajan, Divyat, Chenhao Tan, and Amit Sharma. 2020. “Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers.” https://arxiv.org/abs/1912.03277.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Stutz, David, Krishnamurthy Dvijotham, Ali Taylan Cemgil, and Arnaud Doucet. 2022. “Learning Optimal Conformal Classifiers.” https://arxiv.org/abs/2110.09192.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.