A View on Vulnerabilities: The Security Challenges of XAI (Academic Track)

Authors: Elisabeth Pachl, Fabian Langer, Thora Markert, and Jeanette Miriam Lorenz




File

OASIcs.SAIA.2024.12.pdf
  • Filesize: 0.97 MB
  • 23 pages

Document Identifiers
  • DOI: 10.4230/OASIcs.SAIA.2024.12

Author Details

Elisabeth Pachl
  • Fraunhofer Institute for Cognitive Systems, Munich, Germany
Fabian Langer
  • TÜV Informationstechnik GmbH, Artificial Intelligence, Essen, Germany
Thora Markert
  • TÜV Informationstechnik GmbH, Artificial Intelligence, Essen, Germany
Jeanette Miriam Lorenz
  • Fraunhofer Institute for Cognitive Systems, Munich, Germany

Cite As

Elisabeth Pachl, Fabian Langer, Thora Markert, and Jeanette Miriam Lorenz. A View on Vulnerabilities: The Security Challenges of XAI (Academic Track). In Symposium on Scaling AI Assessments (SAIA 2024). Open Access Series in Informatics (OASIcs), Volume 126, pp. 12:1-12:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025). https://doi.org/10.4230/OASIcs.SAIA.2024.12

Abstract

Modern deep learning methods have long been considered black boxes due to their opaque decision-making processes. Explainable Artificial Intelligence (XAI), however, has turned the tables: it provides insight into how these models work, promoting transparency that is crucial for accountability. Yet, recent developments in adversarial machine learning have highlighted vulnerabilities in XAI methods, raising concerns about security, reliability, and trustworthiness, particularly in sensitive areas like healthcare and autonomous systems. Awareness of the potential risks associated with XAI is needed as its adoption increases, driven in part by the need to enhance compliance with regulations. This survey provides a holistic perspective on the security and safety landscape surrounding XAI, categorizing research on adversarial attacks against XAI and on the misuse of explainability to enhance attacks on AI systems, such as evasion and privacy breaches. Our contribution includes identifying current insecurities in XAI and outlining future research directions in adversarial XAI. This work serves as an accessible foundation and outlook for recognizing potential research gaps and defining future directions. It identifies data modalities, such as time-series or graph data, and XAI methods that current research has not yet extensively investigated for vulnerabilities.
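
To make the kind of vulnerability discussed above concrete, the following is a minimal illustrative sketch (not code from the paper) of an explanation-manipulation attack: a small, bounded perturbation is optimized so that a plain input-gradient saliency map drifts toward an arbitrary decoy pattern while the classifier's prediction is preserved. It assumes PyTorch; the classifier `model`, the decoy construction, and all hyperparameters are hypothetical choices made only for illustration.

```python
# Illustrative sketch only, not taken from the surveyed works: a prediction-preserving
# input perturbation that drags a gradient-based saliency explanation toward a "decoy".
# `model` is a hypothetical differentiable image classifier over inputs in [0, 1].
import torch
import torch.nn.functional as F


def gradient_saliency(model, x):
    """Input-gradient saliency for the predicted class, kept differentiable in x."""
    logits = model(x)
    pred = logits.argmax(dim=1)
    score = logits.gather(1, pred.unsqueeze(1)).sum()
    # create_graph=True so the saliency map itself can be optimized against.
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs().sum(dim=1), logits, pred


def explanation_attack(model, x_clean, steps=100, eps=8 / 255, lr=1e-2):
    """Find a bounded perturbation that keeps the original class but pushes the
    saliency map toward a spatially shifted decoy (one common attack objective)."""
    x0 = x_clean.clone().requires_grad_(True)
    base_sal, _, base_pred = gradient_saliency(model, x0)
    # Decoy target: the clean saliency rolled to a different image region.
    decoy = torch.roll(base_sal.detach(), shifts=base_sal.shape[-1] // 2, dims=-1)

    delta = torch.zeros_like(x_clean, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x_clean + delta).clamp(0.0, 1.0)
        sal, logits, _ = gradient_saliency(model, x_adv)
        # Match the decoy saliency while keeping the original class score high.
        keep_class = logits.gather(1, base_pred.unsqueeze(1)).mean()
        loss = F.mse_loss(sal, decoy) - 0.1 * keep_class
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation visually small
    return (x_clean + delta).clamp(0.0, 1.0).detach()
```

Attacks of this shape are why explanation robustness needs to be evaluated alongside prediction robustness: the model's output can remain unchanged while the explanation shown to a user or auditor is steered elsewhere.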

Subject Classification

ACM Subject Classification
  • Computing methodologies → Machine learning
Keywords
  • Explainability
  • XAI
  • Transparency
  • Adversarial Machine Learning
  • Security
  • Vulnerabilities

