Human-Centered Approaches for Provenance in Automated Data Science (Dagstuhl Seminar 23372)

Authors Anamaria Crisan, Lars Kotthoff, Marc Streit, Kai Xu and all authors of the abstracts in this report

  • Filesize: 5.66 MB
  • 21 pages

Author Details

Anamaria Crisan
  • Tableau Software - Seattle, US
Lars Kotthoff
  • University of Wyoming - Laramie, US
Marc Streit
  • Johannes Kepler Universität Linz, AT
Kai Xu
  • University of Nottingham, GB
and all authors of the abstracts in this report

Cite As

Anamaria Crisan, Lars Kotthoff, Marc Streit, and Kai Xu. Human-Centered Approaches for Provenance in Automated Data Science (Dagstuhl Seminar 23372). In Dagstuhl Reports, Volume 13, Issue 9, pp. 116-136, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


The scope of automated machine learning (AutoML) technology has extended beyond its initial boundaries of model selection and hyperparameter tuning toward end-to-end development and refinement of data science pipelines. These advances, both theoretical and realized, make the tools of data science more readily available to domain experts who rely on low- or no-code tooling options to analyze and make sense of their data. To ensure that automated data science technologies are applied both effectively and responsibly, it becomes increasingly urgent to carefully audit the decisions made both automatically and with guidance from humans. This Dagstuhl Seminar examines human-centered approaches for provenance in automated data science. While prior research concerning provenance and machine learning exists, it does not address the expanded scope of automated approaches and the consequences of applying such techniques at scale to the population of domain experts. In addition, most previous work focuses on the automated part of this process, leaving a gap in support for the sensemaking tasks users need to perform, such as selecting datasets and candidate models and identifying potential causes of poor performance. The seminar brought together experts from across provenance, information visualization, visual analytics, machine learning, and human-computer interaction to articulate the user challenges posed by AutoML and automated data science, discuss the current state of the art, and propose directions for new research.
More specifically, this seminar:
  • articulates the state of the art in AutoML and automated data science for supporting the provenance of decision making,
  • describes the challenges that data scientists and domain experts face when interfacing with automated approaches to make sense of an automated decision,
  • examines the interface between data-centric, model-centric, and user-centric models of provenance and how they interact with automated techniques, and
  • encourages exploration of human-centered approaches; for example, leveraging visualization.

Subject Classification

ACM Subject Classification
  • Human-centered computing → Visualization
  • Information systems → Data provenance
  • Theory of computation → Mathematical optimization
  • Human-centered computing → Human computer interaction (HCI)
  • Computing methodologies → Machine learning
  • Computing methodologies → Search methodologies

Keywords
  • Dagstuhl Seminar
  • Provenance
  • AutoML
  • Data Science
  • Information Visualisation
  • Visual Analytics
  • Machine Learning
  • Human-Computer Interaction

