Literature

This is a collection of interesting papers and thoughts around Trustworthy AI that I have been gradually compiling during the early stages of my PhD. Descriptions of papers are very brief. If you’d like to have access to more detailed handwritten notes, please just drop me a line. A list of all references linked here can be found at the bottom.

Explainability

Surrogate Explainers

  • Ribeiro, Singh, and Guestrin (2016) propose Local Interpretable Model-Agnostic Explanations (LIME): the approach involves generating local perturbations in the input space, deriving predictions from the original classifier and then fitting a white box model (e.g. linear regression) on this synthetic data set (a minimal sketch of this perturb-predict-fit loop follows below).
  • Lundberg and Lee (2017) propose SHAP as a provably unified approach to additive feature attribution methods (including LIME) that satisfies certain desiderata. In contrast to LIME, the approach attributes a prediction to individual features by checking how the model output changes when a feature is added to different coalitions (permutations) of the remaining features.
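
To make the LIME recipe above concrete, below is a minimal sketch of the perturb-predict-fit loop, assuming standardised tabular inputs and a black box exposed through a hypothetical `predict_proba` function; the kernel width and the Ridge surrogate are illustrative placeholder choices, not the reference implementation from the `lime` package.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(x, predict_proba, n_samples=5000, kernel_width=0.75):
    """Minimal LIME-style surrogate for a single (standardised) tabular instance x.

    predict_proba: black box mapping an (n, d) array to class-1 probabilities.
    Returns the coefficients of a locally weighted linear surrogate model.
    """
    d = len(x)
    # 1) Perturb the instance by sampling around it in input space.
    X_pert = x + np.random.normal(scale=1.0, size=(n_samples, d))
    # 2) Query the black box on the perturbed samples.
    y_pert = predict_proba(X_pert)
    # 3) Weight samples by proximity to the original instance (exponential kernel).
    dist = np.linalg.norm(X_pert - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 4) Fit an interpretable (linear) model on the weighted synthetic data set.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X_pert, y_pert, sample_weight=weights)
    return surrogate.coef_  # local feature attributions
```

The resulting coefficients serve as local feature attributions; the official implementation additionally selects a sparse subset of features and handles categorical data.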

Criticism (Surrogate Explainers)

“Explanatory models by definition do not produce 100% reliable explanations, because they are approximations. This means explanations can’t be fully trusted, and so neither can the original model.” – causaLens, 2021

  • Mittelstadt, Russell, and Wachter (2019) point out that there is a gap between how computer scientists and explanation scientists (social scientists, cognitive scientists, psychologists, …) understand explanations. Current methods produce at best locally reliable explanations. There needs to be a shift towards interactive explanations.
  • Rudin (2019) argues that instead of bothering with explanations for black box models we should focus on designing inherently interpretable models. In her view the trade-off between (intrinsic) explainability and performance is not as clear-cut as people claim.
  • Lakkaraju and Bastani (2020) show how misleading black box explanations can manipulate users into trusting an untrustworthy model.
  • Slack et al. (2020) demonstrate that both LIME and SHAP are not reliable: their reliance on feature perturbations makes them susceptible to adversarial attacks.
  • Can we quantify robustness of surrogate explainers?
  • Comparison of different complexity measures.
  • Design surrogate explainer that incorporates causality.
  • Surrogate explainers by definition are approximations of black box models, so can never be 100% trusted.
  • Design adversarially robust surrogate explainers by minimizing the divergence between observed and perturbed data (Slack et al. 2020).

Counterfactual Explanations (CE)

  • Wachter, Mittelstadt, and Russell (2017) were among the first to propose counterfactual explanations that do not require knowledge about the inner workings of a black box model (a bare-bones sketch of this style of counterfactual search follows at the end of this list).
  • Ustun, Spangher, and Liu (2019) propose a framework for actionable recourse in the context of linear classifiers.
  • Joshi et al. (2019) extend the framework of Ustun, Spangher, and Liu (2019). Their proposed REVISE method is applicable to a broader class of models including black box classifiers and structural causal models. For a summary see here and for a set of slides see here.
  • Poyiadzi et al. (2020) propose FACE: feasible and actionable counterfactual explanations. The premise is that the counterfactual closest to the decision boundary may not be a desirable one.
  • Schut et al. (2021) introduce Bayesian modelling to the context of CE: their approach implicitly minimizes aleatoric and epistemic uncertainty to generate a CE that is unambiguous and realistic, respectively.
  • Test what counterfactual explanations are most desirable through a user study at ING
  • Design counterfactual explanations that incorporate causality
    • There are in fact several very recent papers (including Joshi et al. (2019)) that link CGMs to counterfactual explanations (see below)
  • Time series: what is the link to counterfactual analysis in multivariate time series? (e.g. chapter 4 in Kilian and Lütkepohl (2017))
  • What about continuous outcome variables? (e.g. a target inflation rate, … ING cases?)
  • How do counterfactual explainers fare where LIME/SHAP fail? (Slack et al. 2020)
  • Can counterfactual explainers be fooled much like LIME/SHAP?
  • Can we establish a link between counterfactual and surrogate explainers? Important attributes identified by LIME/SHAP should play a prominent role in counterfactuals.
  • Can counterfactual explainers be used to detect adversarial examples?
  • Limiting behaviour: what happens if all individuals with negative outcomes move across the original decision boundary?
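
As a point of reference for the methods above, the following is a bare-bones sketch of the kind of objective introduced by Wachter, Mittelstadt, and Russell (2017): gradient descent over the input, trading off a loss that pulls the prediction towards the target class against a distance penalty that keeps the counterfactual close to the factual instance. It assumes, for simplicity, a differentiable PyTorch classifier returning the probability of the positive class; the plain \(\ell_1\) penalty and the fixed \(\lambda\) are simplifications (the original weights the distance by feature-wise median absolute deviation and tunes \(\lambda\)).

```python
import torch

def wachter_counterfactual(model, x, y_target=1.0, lam=1.0, lr=0.05, steps=500):
    """Bare-bones counterfactual search in the spirit of Wachter et al. (2017).

    model: differentiable classifier mapping a (1, d) tensor to the probability
    of the positive class. Minimises lam * (f(x') - y_target)^2 + ||x' - x||_1.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        y_hat = model(x_cf.unsqueeze(0)).squeeze()
        prediction_loss = lam * (y_hat - y_target) ** 2  # push towards target class
        distance_loss = (x_cf - x).abs().sum()           # stay close to the factual x
        (prediction_loss + distance_loss).backward()
        optimizer.step()
    return x_cf.detach()  # candidate counterfactual explanation
```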

Bayesian Deep Learning

Background

  • Jospin et al. (2020) provide a detailed and hands-on introduction to Bayesian Deep Learning.
  • Murphy (2022) is a text book that treats machine learning from a probabilistic perspective. It includes sections dedicated to deep learning.

Interpretability

  • Ish-Horowicz et al. (2019) propose an entropy-based measure for interpreting Bayesian Neural Networks. For a summary see here.

Uncertainty quantification and applications

  • Gal and Ghahramani (2016) demonstrate that a dropout neural network is equivalent to approximate inference in Bayesian modelling of deep Gaussian processes. This makes it straightforward to quantify uncertainty in deep learning through simple Monte Carlo methods (see the sketch after this list).
  • Gal, Islam, and Ghahramani (2017) propose a way towards deep active Bayesian learning that plays with the ideas of aleatoric and epistemic uncertainty: a structured approach to human-in-the-loop deep learning that can work with small data sets.
    • Kirsch, Van Amersfoort, and Gal (2019) extend these ideas.
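
The practical upshot of Gal and Ghahramani (2016) is easy to implement: keep dropout active at prediction time and average over multiple stochastic forward passes. Below is a minimal sketch, assuming a generic PyTorch model that contains dropout layers; the number of samples is an arbitrary choice.

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Monte Carlo dropout (Gal and Ghahramani 2016): approximate the predictive
    mean and uncertainty by sampling stochastic forward passes at test time.

    model: a torch.nn.Module containing Dropout layers; x: a batch of inputs.
    """
    model.train()  # keep dropout stochastic (no weights are updated here)
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    model.eval()
    # Predictive mean and spread across the stochastic forward passes.
    return preds.mean(dim=0), preds.std(dim=0)
```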

Computational efficiency

  • Quantum computing is likely to make probabilistic modelling more computationally efficient. Kehoe et al. (2021) propose a Bayesian approach to DL using quantum processors that promises to be more robust than conventional DNNs.
  • Using simple concentration inequalities, Maxim Panov proposes a measure for total uncertainty of Deep Neural Networks (no numerical methods needed) – a paper reference is still missing here.
  • Compare explainability in the Bayesian setting (e.g. RATE (Ish-Horowicz et al. 2019)) to surrogate (and counterfactual) explainers? (ING models)
  • Link to AFR track on quantum ML.
  • Link to uncertainty quantification for Deep Vector Autoregression (Altmeyer, Agusti, and Vidal-Quadras Costa 2021).

Causal AI

Background

  • There is an emerging view that current efforts towards interpretability and robustness are fruitless and that only an incorporation of causality can provide answers (Pearl and Mackenzie 2018).
  • Pearl (2019) argues that AI is currently stuck at the association level: models are limited to learning \(P(y|X)\) (“glorified curve fitting”). Starting from causal graphical models (CGM) improves transparency and domain adaptability.

Structure learning

  • Zheng et al. (2018) propose to cast the combinatorial problem of learning a CGM as a continuous problem that can be solved through standard non-convex constrained optimization for linear structural equation models (SEM); the smooth acyclicity constraint at the heart of the method is sketched after this list.
  • Lachapelle et al. (2019) extend this idea to the non-linear case.
  • Bussmann, Nys, and Latré (2020) propose Neural Additive Vector Autoregression (NAVAR) for (Granger) causal discovery in the time series setting. The model can be seen as a Generalised Additive Model and is therefore inherently (somewhat) interpretable. It rests on the assumption that contemporaneous dependencies between variables are linear and that only dependencies through time require a non-linear model.
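
The key trick in Zheng et al. (2018) is an algebraic characterisation of acyclicity: a weighted adjacency matrix \(W\) encodes a DAG if and only if \(h(W) = \operatorname{tr}\left(e^{W \circ W}\right) - d = 0\). Because \(h\) is smooth, structure learning becomes a continuous constrained optimization problem. A minimal sketch of the constraint (the example matrices are made up for illustration) is given below.

```python
import numpy as np
from scipy.linalg import expm

def notears_constraint(W):
    """Smooth acyclicity constraint from Zheng et al. (2018):
    h(W) = tr(exp(W ∘ W)) - d, which equals zero iff W encodes a DAG.
    """
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # elementwise square removes edge-weight signs

# Example: a 3-node chain (acyclic) versus a graph containing a 2-cycle.
W_dag = np.array([[0.0, 1.5, 0.0], [0.0, 0.0, -0.8], [0.0, 0.0, 0.0]])
W_cyc = np.array([[0.0, 1.0, 0.0], [0.7, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(notears_constraint(W_dag))  # ≈ 0: acyclic
print(notears_constraint(W_cyc))  # > 0: the cycle is penalised
```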

Link to CE and algorithmic recourse

  • Joshi et al. (2019) make an interesting link between CGMs and counterfactual explanations: they draw an analogy between hidden confounders in CGMs and the latent manifold which REVISE traverses to propose recourse. They run a single experiment on the TWIN dataset and show that the recommended recourse changes qualitatively as confounding is introduced.
  • Karimi et al. (2020) develop two probabilistic approaches to algorithmic recourse in the case of limited causal knowledge.
    • In essence, the probabilistic approach boils down to assuming a Gaussian Process prior for the causal mapping from parent nodes to node \(X\). This yields a posterior noise distribution, which in turn can be used to draw from a counterfactual distribution.
  • Karimi, Schölkopf, and Valera (2021) demonstrate how to go from counterfactuals to interventions in the case of complete knowledge of the CGM. They propose a paradigm shift from recourse via counterfactuals to recourse through minimal interventions (a toy illustration of acting through the causal graph follows at the end of this list).
  • Could explore the link between CGMs and CE further, perhaps in the context of a Bayesian classifier (Schut et al. 2021).
    • In particular, it might be possible to draw an analogy between Schut et al. (2021) (low epistemic + aleatoric uncertainty) and the counterfactual distribution proposed by Karimi et al. (2020). Could further try to account for hidden confounders as in Joshi et al. (2019).
  • Karimi, Schölkopf, and Valera (2021) can be solved by building on existing frameworks for generating nearest counterfactual explanations - could try to apply Schut et al. (2021)?
  • Applications at ING?
    • Apply to loan application decision system (if exists)
    • Apply to credit scoring (perhaps even Dutch government scandal)
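
To illustrate why acting through the causal graph matters for recourse, here is a toy abduction-action-prediction example in a two-variable additive-noise SCM. The structural equations are invented for illustration (they are not taken from any of the papers above): intervening on \(x_1\) propagates to its descendant \(x_2\), whereas a purely feature-based counterfactual would change \(x_1\) and leave \(x_2\) untouched.

```python
# Toy additive-noise SCM (illustrative only): x1 := u1, x2 := 0.8 * x1 + u2.
def scm_counterfactual(x1, x2, x1_new):
    """Abduction-action-prediction for a hard intervention do(x1 := x1_new)."""
    # Abduction: recover the exogenous noise consistent with the observed instance.
    u2 = x2 - 0.8 * x1
    # Action: intervene on x1.
    x1_cf = x1_new
    # Prediction: push the intervention through the structural equation for x2.
    x2_cf = 0.8 * x1_cf + u2
    return x1_cf, x2_cf

# Observed individual and a recourse action on x1 (e.g. increasing income).
print(scm_counterfactual(x1=1.0, x2=1.3, x1_new=2.0))  # x2 moves with x1: (2.0, 2.1)
```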

Robustness

Background

  • Szegedy et al. (2013) were the first to point out the existence of adversarial examples in the image classification domain.
  • Goodfellow, Shlens, and Szegedy (2014) argue that the existence of adversarial examples can be explained solely by the locally linear nature of artificial neural networks. They show how simple linear perturbations through their fast gradient sign method can consistently fool many state-of-the-art neural networks (a minimal sketch of the method follows this list). Adversarial training can improve robustness to some extent, but DNNs remain highly confident about misclassified labels.
  • Carlini and Wagner (2017) show that an initially promising method for robustifying DNNs, namely defensive distillation, is in fact insufficient. They argue that their adversarial attacks should serve as a benchmark for evaluating the robustness of DNNs.
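
For reference, the fast gradient sign method of Goodfellow, Shlens, and Szegedy (2014) amounts to a single gradient step: perturb the input in the direction of the sign of the loss gradient, \(x_{adv} = x + \epsilon \cdot \operatorname{sign}(\nabla_x L)\). A minimal PyTorch sketch follows; the model, loss function and \(\epsilon\) are placeholders, and for image data one would additionally clamp the result to the valid pixel range.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=0.01):
    """Fast gradient sign method (Goodfellow, Shlens, and Szegedy 2014):
    x_adv = x + epsilon * sign(grad_x L(f(x), y)).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()  # gradient of the loss with respect to the input
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.detach()
```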

Thoughts

  • Link to anomaly detection (ING)
  • Out-of-distribution detection for time series models (e.g. avoid Covid scenarios leading to model failures (Bholat, Gharbawi, and Thew 2020)).
  • If adversarial training affects the success of adversarial attacks, does it also affect the success of CE?
  • Can we penalize instability much like we penalize complexity in empirical risk minimization?

Applications

Credit and risk scoring

Financial Stability

  • AI robustness and the Covid crisis - the negative impact of the pandemic on ML model performance (Bholat, Gharbawi, and Thew 2020).
  • Potential for herding behaviour if large share of market participants uses off-the-shelf ML tools (OECD 2021).

Market Microstructure

  • ML model collusion hard to detect (OECD 2021).
  • Lack of explainability inhibits timely model adjustments (OECD 2021).
  • Intentional lack of transparency for proprietary trading (OECD 2021).

SupTech and RegTech

  • Market participants may start using AI to self-regulate in a transparent, trustworthy way (OECD 2021).
  • Similarly, financial regulators are already employing AI for the purpose of supervision.

Monetary policy and forecasting

  • In Altmeyer, Agusti, and Vidal-Quadras Costa (2021) we show how to incorporate deep learning in the context of Vector Autoregression for macroeconomic data.

References

Altmeyer, Patrick, Marc Agusti, and Ignacio Vidal-Quadras Costa. 2021. “Deep Vector Autoregression for Macroeconomic Data.” https://thevoice.bse.eu/wp-content/uploads/2021/07/ds21-project-agusti-et-al.pdf.
Bholat, D, M Gharbawi, and O Thew. 2020. “The Impact of Covid on Machine Learning and Data Science in UK Banking.” Bank of England Quarterly Bulletin, Q4.
Bussmann, Bart, Jannes Nys, and Steven Latré. 2020. “Neural Additive Vector Autoregression Models for Causal Discovery in Time Series Data.” arXiv Preprint arXiv:2010.09429.
Carlini, Nicholas, and David Wagner. 2017. “Towards Evaluating the Robustness of Neural Networks.” In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE.
Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In International Conference on Machine Learning, 1050–59. PMLR.
Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. 2017. “Deep Bayesian Active Learning with Image Data.” In International Conference on Machine Learning, 1183–92. PMLR.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” arXiv Preprint arXiv:1412.6572.
Ish-Horowicz, Jonathan, Dana Udwin, Seth Flaxman, Sarah Filippi, and Lorin Crawford. 2019. “Interpreting Deep Neural Networks Through Variable Importance.” arXiv Preprint arXiv:1901.09839.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” arXiv Preprint arXiv:1907.09615.
Jospin, Laurent Valentin, Wray Buntine, Farid Boussaid, Hamid Laga, and Mohammed Bennamoun. 2020. “Hands-on Bayesian Neural Networks–a Tutorial for Deep Learning Users.” arXiv Preprint arXiv:2007.06823.
Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. “Algorithmic Recourse: From Counterfactual Explanations to Interventions.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 353–62.
Karimi, Amir-Hossein, Julius Von Kügelgen, Bernhard Schölkopf, and Isabel Valera. 2020. “Algorithmic Recourse Under Imperfect Causal Knowledge: A Probabilistic Approach.” arXiv Preprint arXiv:2006.06831.
Kehoe, Aidan, Peter Wittek, Yanbo Xue, and Alejandro Pozas-Kerstjens. 2021. “Defence Against Adversarial Attacks Using Classical and Quantum-Enhanced Boltzmann Machines.” Machine Learning: Science and Technology.
Kilian, Lutz, and Helmut Lütkepohl. 2017. Structural Vector Autoregressive Analysis. Cambridge University Press.
Kirsch, Andreas, Joost Van Amersfoort, and Yarin Gal. 2019. “BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning.” Advances in Neural Information Processing Systems 32: 7026–37.
Kuiper, Ouren, Martin van den Berg, Joost van den Burgt, and Stefan Leijnen. 2021. “Exploring Explainable AI in the Financial Sector: Perspectives of Banks and Supervisory Authorities.” arXiv Preprint arXiv:2111.02244.
Lachapelle, Sébastien, Philippe Brouillard, Tristan Deleu, and Simon Lacoste-Julien. 2019. “Gradient-Based Neural DAG Learning.” arXiv Preprint arXiv:1906.02226.
Lakkaraju, Himabindu, and Osbert Bastani. 2020. “‘How Do I Fool You?’ Manipulating User Trust via Misleading Black Box Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 79–85.
Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–77.
Mittelstadt, Brent, Chris Russell, and Sandra Wachter. 2019. “Explaining Explanations in AI.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 279–88.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. MIT Press.
OECD. 2021. “Artificial Intelligence, Machine Learning and Big Data in Finance: Opportunities, Challenges and Implications for Policy Makers.” https://www.oecd.org/finance/financial-markets/Artificial-intelligence-machine-learning-big-data-in-finance.pdf.
Pearl, Judea. 2019. “The Seven Tools of Causal Inference, with Reflections on Machine Learning.” Communications of the ACM 62 (3): 54–60.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.
Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations by Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Slack, Dylan, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. “Fooling LIME and SHAP: Adversarial Attacks on Post Hoc Explanation Methods.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–86.
Szegedy, Christian, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. “Intriguing Properties of Neural Networks.” arXiv Preprint arXiv:1312.6199.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” arXiv Preprint arXiv:2102.13620.
Ustun, Berk, Alexander Spangher, and Yang Liu. 2019. “Actionable Recourse in Linear Classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 10–19.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841.
Zheng, Xun, Bryon Aragam, Pradeep Ravikumar, and Eric P Xing. 2018. “DAGs with NO TEARS: Continuous Optimization for Structure Learning.” arXiv Preprint arXiv:1803.01422.