[PhD topic] Study of explanation methodologies to promote fairness and transparency in deep neural environments

1. Motivation and context

Recent advances in Machine Learning (ML) are largely due to the success of deep learning methods in recognition and decision-aiding tasks. However, deep learning and other subsymbolic [12] ML approaches produce complex models whose outputs cannot be easily interpreted or explained to a layperson. This concern is particularly important since such subsymbolic models are increasingly used in complex tasks, such as decision making, with a strong impact on human users. This raises multiple concerns regarding user privacy, transparency, fairness, and the trustworthiness of these models, concerns that contributed to the European Union’s 2016 GDPR, which grants European citizens the right to basic knowledge of, and explanations about, the inner workings of automated decision models, and the right to question their results. Such explanations may be difficult to provide, and they may depend on the ML task being tackled. Hence an emerging research trend aims to provide explanations and to ensure fairness in ML models in order to promote transparency and trust in ML and AI.
In the Orpailleur team, we are particularly interested in neuro-symbolic approaches that combine complex subsymbolic models (numerical and/or statistical) with explainable symbolic models and utility metrics, in order to improve interpretability [18], explainability [3, 9], and fairness [1, 21]. In fact, these notions are often tightly related. For instance, we recently showed that algorithmic fairness can be addressed through explanations based on feature importance [1, 5]. This led to the FixOut tool, currently under development in the team, which uses “feature dropout” followed by “ensemble learning” to improve process fairness without compromising model performance. Despite promising preliminary results, FixOut only operates on tabular data and requires subjective human judgment for several subtasks, e.g., selecting sensitive features, choosing suitable explainers, and defining the discriminatory setting. There are many ways to assess algorithmic fairness, for instance through suitable fairness metrics [17]. This thesis aims to provide a unifying framework capable of identifying a suitable representation of explanations (based on feature importance or rules, either model specific or model agnostic) and of extending it to other settings such as textual and biomedical decision support applications.
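To fix ideas, the sketch below illustrates the feature-dropout-and-ensemble principle underlying FixOut on synthetic tabular data with scikit-learn. It is a minimal sketch, not the actual FixOut implementation: the dataset, the choice of sensitive columns, and the probability-averaging scheme are illustrative assumptions.

```python
# Minimal sketch of the "feature dropout + ensemble" idea (not the actual
# FixOut implementation): train one classifier per dropped sensitive feature,
# plus one with all sensitive features removed, and average their outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic tabular data; columns 0 and 1 play the role of "sensitive" features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
sensitive = [0, 1]  # assumed indices of sensitive features (illustrative)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def drop(X, cols):
    """Return a copy of X with the given columns removed."""
    keep = [j for j in range(X.shape[1]) if j not in cols]
    return X[:, keep]

# One model per single dropped sensitive feature, plus one dropping all of them.
drop_sets = [[s] for s in sensitive] + [sensitive]
models = []
for cols in drop_sets:
    clf = LogisticRegression(max_iter=1000).fit(drop(X_tr, cols), y_tr)
    models.append((cols, clf))

# Ensemble prediction: average the predicted probabilities of the pool.
proba = np.mean(
    [clf.predict_proba(drop(X_te, cols)) for cols, clf in models], axis=0
)
y_pred = proba.argmax(axis=1)
print("ensemble accuracy:", (y_pred == y_te).mean())
```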

2. Thesis work

We are interested in understanding the different ways of providing explanations and of facilitating the interpretation of the outputs of subsymbolic ML models [9, 14, 19]. More precisely, we will explore the generation of explanations of deep neural networks that are suited to state-of-the-art neural approaches while taking human understanding needs into account (e.g., what constitutes a good and meaningful explanation for a human). We will continue our efforts to combine symbolic and subsymbolic approaches, as in [15, 3], which rely on combining subsymbolic learning methods with Formal Concept Analysis [8], as well as other emerging “neural-symbolic learning and reasoning” combinations [2, 22]. Moreover, we will continue to pursue connections between explanations, algorithmic fairness, and transparency [10, 9, 5].
Accordingly, the PhD student will be asked to contribute to the development of the FixOut platform. Firstly, the candidate will study the state of the art on combining symbolic and subsymbolic learning models to provide explanations and then to ensure fairness based on the discovered explanations. The candidate will also take into account the trade-off between explanation types, model specificity, and the inherent complexity. Secondly, the candidate will carry out experiments on real-world data to empirically test new classifiers and combinations of classifiers, their explanatory power, and their potential for addressing fairness issues in various empirical tasks and data-driven scenarios (see, e.g., [7, 16, 4]).
The expected results of this thesis will be integrated into two ongoing projects: the IPL HyAIAI project (where we investigate different means of integrating knowledge and reasoning into learning processes) and the H2020 Tailor project (where we study AI systems that integrate safeguards making them reliable, transparent, trustworthy, and respectful of human rights and expectations).

3. The organization of the thesis work

Task 1. State of the art of existing explanation methods (both model agnostic and model specific) and of their potential for detecting bias, together with recent adversarial approaches and metrics for assessing fairness. Proposition of different use-case scenarios and formulation of the corresponding fairness and utilitarian criteria.
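As one concrete example of the fairness metrics mentioned above [17], the following sketch computes the statistical (demographic) parity difference of a classifier’s binary decisions with respect to a binary sensitive attribute; the toy decisions and group labels are illustrative assumptions.

```python
# Illustrative computation of one common group fairness metric:
# statistical (demographic) parity difference between two groups.
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(y_hat = 1 | group = 1) - P(y_hat = 1 | group = 0)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_1 = y_pred[group == 1].mean()
    rate_0 = y_pred[group == 0].mean()
    return rate_1 - rate_0

# Toy decisions for 10 individuals split into two groups.
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
group  = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(statistical_parity_difference(y_pred, group))  # ≈ 0.6 (0.8 - 0.2)
```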

Task 2. Consolidation of the developed explanation systems for different neural architectures and data types, based not only on feature importance but also on other local rule-based explanation methods such as anchors [20]. Exploration of their adaptability to other complex data types such as sequential and textual data, taking into account recurrent and attention-based models.
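To illustrate the feature-importance side, the sketch below extracts a local explanation with LIME [19] for a tabular classifier. It assumes the lime Python package is available; the dataset, the model, and the number of features shown are illustrative choices, not part of the thesis methodology.

```python
# Sketch of a local feature-importance explanation with LIME [19] for a
# tabular classifier (assumes the `lime` package is installed).
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
clf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single prediction: top-5 features with their local weights.
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=5)
for feature, weight in exp.as_list():
    print(f"{feature:40s} {weight:+.3f}")
```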

Task 3. Bias identification in classification models depending on the context, the data, and the explanation methodology. Proposition of a framework to ensure fairness through optimization- and feature-dropout-based methods.
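In the spirit of [1, 5], one simple way to flag potential process-fairness issues is to check whether sensitive features rank among the most important features of a trained model. The sketch below uses scikit-learn’s permutation importance as a stand-in for an explainer; the choice of k, of the importance method, and of the sensitive features are assumptions, not the FixOut procedure itself.

```python
# Illustrative bias check: flag the model as potentially unfair (in the
# "process fairness" sense) if any sensitive feature ranks among the top-k
# most important features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]
sensitive = {"f0", "f3"}  # assumed sensitive features (illustrative)
k = 3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
ranked = [feature_names[i] for i in np.argsort(imp.importances_mean)[::-1]]
top_k = ranked[:k]

print("top-k features:", top_k)
print("potentially unfair:", bool(sensitive & set(top_k)))
```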

Task 4. A transfer learning framework for model and process fairness in different empirical scenarios. We are particularly interested in seeing how certain subjective notions such as “bias” may transfer across different ML settings and tasks.

Task 5. Writing of the doctoral thesis and work dissemination (related research papers).

4. Environment

Team: Orpailleur (LORIA)

Supervision and contacts: Miguel Couceiro and Yannick Toussaint (co-directors), and Amedeo Napoli and Malika Smaïl-Tabbone (co-supervisors).

Keywords: Machine Learning, Data structure, Explainability, Bias, Fairness optimization

Skills and profile: A Master’s degree in computer science or equivalent. Background in machine learning and deep learning algorithms. Knowledge of explainers and multicriteria decision making is appreciated.

Links: https://members.loria.fr/mcouceiro/, https://fixout.loria.fr

References

[1] G. Alves, et al. Making ML models fairer through explanations: the case of LimeOut. In Proc. AIST 2020, Springer (to appear).
[2] A. S. d’Avila Garcez, et al. Neural-symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning. FLAP 6(4): 611-632 (2019)
[3] A. Bazin, et al. Explaining Multicriteria Decision Making with Formal Concept Analysis. In Proc. CLA 2020, vol. 2668 of CEUR Workshop Proc., pages 119–130.
[4] J. Bento, et al. TimeSHAP: Explaining Recurrent Models through Sequence Perturbations. CoRR abs/2012.00073, 2020.
[5] V. Bhargava, et al. LimeOut: An Ensemble Approach To Improve Process Fairness. In Proc. PKDD/ECML Workshop XKDD 2020 pages 475–491.
[6] M. Bilal Zafar, et al. Fairness Constraints: Mechanisms for Fair Classification. In Proc. AISTATS 2017, vol. 54 of PMLR, pages 962–970.
[7] B. Dimanov, et al. You Shouldn’t Trust Me: Learning Models Which Conceal Unfairness from Multiple Explanation Methods. In Proc. ECAI 2020, pages 2473–2480.
[8] B. Ganter and R. Wille. Formal Concept Analysis – Mathematical Foundations. Springer, 1999.
[9] D. Garreau, et al. Explaining the Explainer: A First Theoretical Analysis of LIME. In Proc. AISTATS 2020, vol. 108 of PMLR, pages 1287–1296.
[10] N. Grgic-Hlaca, et al. Beyond Distributive Fairness in Algorithmic Decision Making: Feature Selection for Procedurally Fair Learning. In Proc. AAAI 2018, pages 51–60.
[11] N. Grgic-Hlaca, et al. The case for process fairness in learning: Feature selection for fair decision making, In Proc. NIPS Symposium on Machine Learning and the Law, 2016.
[12] E. Ilkou, et al. Symbolic Vs Sub-symbolic AI Methods: Friends or Enemies? In Proc. CIKM 2020 Workshops, volume 2699 of CEUR Workshop Proceedings, 2020.
[13] I. V. D. Linden, et al. Global Aggregations of Local Explanations for Black Box models, CoRR abs/1907.03039, 2019.
[14] S. M. Lundberg, et al. A Unified Approach to Interpreting Model Predictions, In Proc. NIPS 2017, pages 4765–4774.
[15] D. Grissa, et al. A Hybrid Knowledge Discovery Approach for Mining Predictive Biomarkers in Metabolomic Data. In Proc. ECML-PKDD 2016 (Part I), pages 572–587. Springer, 2016.
[16] S. M. Jesus, et al. How can I choose an explainer?: An Application-grounded Evaluation of Post-hoc Explanations. In Proc. FAccT 2021, pages 805–815.
[17] K. Makhlouf et al., On the Applicability of ML Fairness Notions, CoRR abs/2006.16745, 2020.
[18] Ch. Molnar. Interpretable machine learning: A guide for making black box models explainable (2018). https://christophm.github.io/interpretable-ml-book/
[19] M. T. Ribeiro, et al. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proc. ACM SIGKDD 2016, pages 1135–1144.
[20] M. T. Ribeiro, et al. Anchors: High-precision model-agnostic explanations. In Proc. AAAI 2018, pages 1527–1535.
[21] T. Speicher, et al. A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices. In Proc. SIGKDD 2018, pages 2239–2248.
[22] S. N. Tran and A. S. d’Avila Garcez. Deep logic networks: Inserting and extracting knowledge from deep belief networks. IEEE Trans. Neural Netw. Learning Syst., 29(2):246–258, 2018.
