Human-Centered Approaches for Provenance in Automated Data Science (Dagstuhl Seminar 23372)

Crisan, Anamaria; Kotthoff, Lars; Streit, Marc; Xu, Kai

doi:10.4230/DagRep.13.9.116

File: DagRep.13.9.116.pdf (5.66 MB, 21 pages)

Author Details

Anamaria Crisan

Tableau Software - Seattle, US

Lars Kotthoff

University of Wyoming - Laramie, US

Marc Streit

Johannes Kepler Universität Linz, AT

Kai Xu

University of Nottingham, GB

and all authors of the abstracts in this report

Cite As

Anamaria Crisan, Lars Kotthoff, Marc Streit, and Kai Xu. Human-Centered Approaches for Provenance in Automated Data Science (Dagstuhl Seminar 23372). In Dagstuhl Reports, Volume 13, Issue 9, pp. 116-136, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/DagRep.13.9.116

Abstract

The scope of automated machine learning (AutoML) technology has extended beyond its initial boundaries of model selection and hyperparameter tuning toward the end-to-end development and refinement of data science pipelines. These advances, both theoretical and realized, make the tools of data science more readily available to domain experts who rely on low- or no-code tools to analyze and make sense of their data. To ensure that automated data science technologies are applied both effectively and responsibly, it is increasingly urgent to carefully audit the decisions made both automatically and with guidance from humans. This Dagstuhl Seminar examines human-centered approaches for provenance in automated data science. While prior research on provenance and machine learning exists, it does not address the expanded scope of automated approaches or the consequences of applying such techniques at scale across the population of domain experts. In addition, most previous work focuses on the automated part of this process, leaving a gap in support for the sensemaking tasks users need to perform, such as selecting datasets and candidate models and identifying potential causes of poor performance. The seminar brought together experts from provenance, information visualization, visual analytics, machine learning, and human-computer interaction to articulate the user challenges posed by AutoML and automated data science, discuss the current state of the art, and propose directions for new research.
More specifically, this seminar:
- articulates the state of the art in AutoML and automated data science for supporting the provenance of decision making,
- describes the challenges that data scientists and domain experts face when interfacing with automated approaches to make sense of an automated decision,
- examines the interface between data-centric, model-centric, and user-centric models of provenance and how they interact with automated techniques, and
- encourages the exploration of human-centered approaches, for example, by leveraging visualization.

Subject Classification

ACM Subject Classification

Human-centered computing → Visualization
Information systems → Data provenance
Theory of computation → Mathematical optimization
Human-centered computing → Human computer interaction (HCI)
Computing methodologies → Machine learning
Computing methodologies → Search methodologies

Keywords

Dagstuhl Seminar
Provenance
AutoML
Data Science
Information Visualisation
Visual Analytics
Machine Learning
Human-Computer Interaction
