Abstract 1 Executive Summary 2 Table of Contents 3 Overview of Talks 4 Working groups 5 Open problems 6 Participants

Artificial Intelligence and Formal Methods Join Forces for Reliable Autonomy

Report from Dagstuhl Seminar 24361
Nils Jansen111Editor / Organizer Ruhr-Universität Bochum, DE Mykel Kochenderfer222Editor / Organizer Stanford University, US Jan Kretinsky333Editor / Organizer Masaryk University – Brno, CZ
Jana Tumova444Editor / Organizer
KTH Royal Institute of Technology – Stockholm, SE
Maris Galesloot555Editorial Assistant / Collector Radboud University – Nijmegen, NL
Abstract

This report documents the program and outcomes of Dagstuhl Seminar 24361, “Artificial Intelligence and Formal Methods Join Forces for Reliable Autonomy.” AI is a disruptive force with growing applications in everyday life. Therefore, AI systems require serious safety, correctness, and reliability considerations. Recently, the field of safety in AI has triggered a vast amount of research. This seminar brought together experts from the fields of artificial intelligence, formal methods, and robotics. Via a diverse program with ample space for open yet guided discussion, a common understanding of problems was developed. Consequently, the seminar provided a means to identify key challenges and open problems in the research areas that underpin reliable autonomy.

Keywords and phrases:
artificial intelligence, autonomous systems, formal verification, machine learning, robotics
Seminar:
September 1–6, 2024 – https://www.dagstuhl.de/24361
2012 ACM Subject Classification:
Computing methodologies Artificial intelligence
Funding:
Partially supported by the ERC Starting Grant 101077178 (DEUCE).
Copyright and License:
[Uncaptioned image] Except where otherwise noted, content of this report is licensed under a Creative Commons BY 4.0 International license

1 Executive Summary

Nils Jansen (Ruhr-Universität Bochum – DE)
Mykel Kochenderfer (Stanford University, US)
Jan Kretinsky (Masaryk University – Brno, CZ)
Jana Tumova (KTH Royal Institute of Technology – Stockholm, SE)
Maris Galesloot (Radboud University – Nijmegen, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Nils Jansen, Mykel Kochenderfer, Jan Kretinsky, Jana Tumova, and Maris Galesloot

Introduction

Artificial intelligence (AI) is a disruptive force. With growing applications in fields like healthcare, transportation, game playing, finance, and robotics, AI systems and methods are entering our everyday lives. Such tight interaction with AI requires serious safety, correctness, and reliability considerations. Recently, the field of safety in AI has triggered a vast amount of research.

The area of formal methods (FM) offers structured and rigorous ways to reason about the correctness of a system. Techniques range from model learning, over testing to formal verification. As an example of the application of verification in AI, solving techniques like SAT or SMT solving help to assess the robustness of neural networks. Model checking is a prominent verification technique that proves the system’s correctness with respect to formal specifications. Therefore, it provides ample grounds to address the safety concerns of and provide guarantees in AI systems.

Via a diverse program with ample space for open yet guided discussion, this Dagstuhl Seminar brought together experts from the fields of artificial intelligence, formal methods, and robotics. The predecessor was the Dagstuhl Seminar 18121 Machine Learning and Model Checking Join Forces, on the interaction of AI (specifically machine learning) and FM (specifically model checking), mainly from the theoretical perspective. In contrast, this seminar broadened the scope methodologically, including AI and FM generally, but focused more narrowly on the application of reliable robotics in real-world robotics, which then drives the concrete questions towards increased applicability of combined research in AI and FM.

Formal Methods, AI and Robotics

Formal methods (FM) and AI differ not only in the type of problems they tackle but also in their focus on the desirable properties of the methods. While AI focuses primarily on effectiveness and efficiency, formal methods focus on reliability and safety. Obviously, in both cases, the other properties are not negligible for actual real applications. Consequently, over the last decade, achieving the best of both worlds has become even more important. On the one hand, ML experienced a huge boom accompanied by challenges orthogonal to the tradition, such as provable reliability or explainability; on the other hand, FM was transformed by this process, with AI becoming both an object of FM and a useful technique to be used within FM.

In particular, the problem of controlling complex systems is so fundamental that many approaches have evolved independently in several communities, depending on the particular setting and type of the system, e.g., in artificial intelligence, formal methods, control theory, or robotics. Recently, many of the specifics such as uncertainty, safety-criticality, cyber-physical nature, or complex dynamics (combining continuous and discrete aspects) are present all at once, such as in robots. This rich combination of challenges calls for combining the techniques of the respective disciplines in fundamentally new ways so that real complex systems can be efficiently and reliably controlled and deployed in society.

In addition to the challenges related to integrating AI and formal methods in general, several application-specific challenges arise, reflecting that robots are physical systems: Robot models are highly complex, and often, it is not even possible to derive or use an analytic model (e.g., in contact-rich manipulation). Consequently, the dimensionality and the degree of uncertainty in these systems are very high. These challenges were the inspiration for the topics discussed in this seminar.

Seminar Structure

Tutorials

The first half of the week aimed at creating a mutual theoretical understanding and an interdisciplinary view on the topics of the seminar. To this end, the seminar had five plenary tutorials, each focusing on a different aspect of the subject areas relevant to the seminar. In particular, the following topics were presented:

  • Model-Based Reinforcement Learning, by Wendelin Böhmer,

  • An Overview of Probabilistic Verification Techniques, by Sebastian Junges,

  • Formal Methods & Verification in Robotics & Autonomous Systems Engineering @ Bosch, by Michaela Klauck,

  • Robot Learning, by Georgia Chalvatzaki, and,

  • Conformal Prediction, by Lars Lindemann.

Working Groups

In the latter half of the week, there was ample time for focused discussions among participants in breakout sessions. These sessions had various aims, such as further increasing interdisciplinary understanding of the relevant concepts, discussing new research ideas, and identifying open problems. The working groups (breakout rooms) consisted of the following topics, from which a subset is elaborated in more detail in Section 4:

  • Human Models: Modelling Humans Towards AI Decision-Making,

  • Models and Programmatic Reinforcement Learning,

  • Partially Observable Markov Decision Processes (POMDPs),

  • Robust Markov Decision Processes (Robust MDPs),

  • Hierarchies of Specifications,

  • Runtime Monitoring (with a particular focus on latent verification),

  • Reliability,

  • Explainability.

Talks

In addition to the tutorials and working groups, many participants presented their lines of work, blue-sky ideas, open problems, and big challenges in dedicated presentation slots. The abstracts can be found in the overview section of the talks.

The seminar provided a platform for early-career researchers among the participants to present their research in so-called spotlight talks throughout the week. In these talks, the participants presented recent work and their general research direction to the seminar audience. Thanks to the spotlights, participants got an impression of the research directions of the other participants.

In addition to the spotlight talks, the seminar participants disseminated open problems, blue-sky ideas, and big challenges. Many of these talks resulted in topics for further discussion.

2 Table of Contents

Executive Summary

Nils Jansen, Mykel Kochenderfer, Jan Kretinsky, Jana Tumova, and Maris Galesloot

Overview of Talks

Teaching Robots To “Get It Right”

Joydeep Biswas

Model-based RL: Approaches and Challenges

Wendelin Böhmer

Decision Diagrams for Scalability and Explainability

Clemens Dubslaff

Lifelong, autonomous learning of planning strategies by abstracting individual experiences

Khen Elimelech

Robust Policies for Single-Agent POMDPs and Scalability in Multi-Agent POMDPs

Maris Galesloot

Risk-Aware Planning in Constrained Multi-Agent Systems

Anna Gautier

A brief tutorial on Robot Learning

Georgia Chalvatzaki

Fuzzy Path Logic

Kush Grover

Data-Driven Model-Based Reasoning for Long-Term Robot Deployments

Nick Hawes and Bruno Lacerda

Recent Trends in Probabilistic Verification

Sebastian Junges

Incorporating Logic in Learning for Safe Personalization of Autonomous Vehicles

Ruya Karagulle, Nikos Arechiga, Andrew Best, Jonathan DeCastro, and Necmiye Ozay

Algorithms for Validation

Sydney Katz, Anthony Corso, Mykel Kochenderfer, and Robert Moss

Formal Methods & Verification in Robotics & Autonomous Systems Engineering @ Bosch: CONVINCE-ing Robust Robot Deliberation & More

Michaela Klauck

Scalable POMDPs: Challenges & Current Capabilities

Hanna Kurniawati

Learning in optimization-based control

Johannes Köhler

Formal Verification and Control with Conformal Prediction

Lars Lindemann

How to learn provably correct concepts from raw data?

Sara Magliacane

Learning the How and Why from Experience: Combining Interpretable and Explainable Methods in Robot Decision-Making

Karinne Ramirez Amaro

Using AI Methods to Make Robots More Versatile

Philipp Schillinger

New Safe Practices in Reinforcement Learning

Thiago D. Simão

Trading off safety/risk for performance

Stephen Smith

Formal Analysis of Autonomous Systems: A Runtime Assurance Perspective

Hazem Torfah

Reinforcement Learning with LTL Specifications

Abhinav Verma

Towards Safe and Generalizable Autonomy under Uncertainty

Esen Yel

Working groups

Verifying Sensori-Motor Controllers Via Latent Dynamics

Joydeep Biswas, Jonathan DeCastro, Khen Elimelech, Sydney Katz, Lars Lindemann, Sara Magliacane, Dimitra Panagou, Christian Schilling, Hazem Torfah, Cristian Ioan Vasile, and Esen Yel

Hierarchies of specifications

Jan Kretinsky

Open problems

Correctness, Safety, and Alignment in Human-Centric Applications

Jonathan DeCastro

Safe Machine Learning-Based Perception and Open Problems in Scaling Formal Methods

Sydney Katz, Anthony Corso, Chris Strong, and Mykel Kochenderfer

Participants

3 Overview of Talks

3.1 Teaching Robots To “Get It Right”

Joydeep Biswas (University of Texas – Austin, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Joydeep Biswas

We are interested in building and deploying service mobile robots to assist with arbitrary end-user tasks in everyday environments. In such open-world settings, how can we ensure that robots 1) are able to perceive relevant details of the world; 2) robustly anticipate and overcome errors; and 3) correctly complete the tasks expected of them? In this talk, I survey these technical challenges and present several promising directions to address them. To “get it right”, robots will have to glean appropriate contextual information from perception to understand how to navigate in the world; reason about unexpected sources of failures in the real world and learn to overcome them; and infer what correct task execution actually entails.

3.2 Model-based RL: Approaches and Challenges

Wendelin Böhmer (TU Delft, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Wendelin Böhmer

While model-free reinforcement learning (RL) remains the method of choice for most practitioners, model-based RL has recently shown promising results. I give an overview on current deep model-based RL approaches and discuss why they do not surpass model-free RL approaches when applied naively. This includes some thoughts on which properties might change that, and what else one could do with the learned models.

3.3 Decision Diagrams for Scalability and Explainability

Clemens Dubslaff (TU Eindhoven, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Clemens Dubslaff

Joint work of: Clemens Dubslaff, Christel Baier, Jan Kretínsky, Verena Klös, Juliane Päßler, Kallistos Weis, Sven Apel, Kai Ding, Andrey Morozov, Klaus Janschek, Debraj Chakraborty, Christoph Weinhuber, Sudeep Kanav

Decision diagrams are a versatile data structure for representing functions, most prominently binary decision diagrams (BDDs) that concisely represent boolean functions and have many desirable algebraic properties. In this talk, I illustrate how decision diagrams can be exploited for both, scalable formal verification through (probabilistic) model checking and explainable variants of control strategies. Scalability is showcased by verifying the reliability of more than 1010 variants of an aircraft velocity controller with redundant components [1]. With applying several advanced techniques from symbolic model checking, BDDs enable a verification within less than two hours on a standard computer. Towards explainable formal methods, decision diagrams can be used to concisely represent controllers where current state-of-the-art explanations are provided as decision trees. By dropping the tree constraint, merging common decision making, and encapsulating sub-diagrams in so-called templates, we can obtain much smaller controller representations by means of template decision diagrams [2]. Following Occam’s Razor, the best explanation is the simplest amongst all possible explanations. Hence, all the presented size reductions also improve explainability of control strategies. Connecting both, scalability and explainability through decision diagrams, I show how causal reasoning can be used to pinpoint reasons for verification results [3] and close the talk by sketching future challenges in the field.

References

  • [1] C. Dubslaff. Quantitative Analysis of Configurable and Reconfigurable Systems. PhD thesis, TU Dresden, Institute for Theoretical Computer Science, 2021.
  • [2] C. Dubslaff, V. Klös, and J. Päßler. Template decision diagrams for meta control and explainability. In Proceedings of the 2nd World Conference on eXplainable Artificial Intelligence (XAI), pages 219–242, Cham, 2024. Springer Nature Switzerland.
  • [3] C. Dubslaff, K. Weis, C. Baier, and S. Apel. Feature causality. Journal of Systems and Software (JSS), 209:111915, 2024.

3.4 Lifelong, autonomous learning of planning strategies by abstracting individual experiences

Khen Elimelech (Rice University – Houston, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Khen Elimelech

Joint work of: Moshe Vardi, Lydia Kavraki

To perform autonomously tasks such as object rearrangement, manipulation, and navigation, robots must be able to plan their actions over long horizons. Such planning is usually computationally challenging, especially considering complex robots and task specifications, or large planning domains, with many irrelevant objects and distractions. Hence, to support long-lived, multi-purpose planning agents, it is important to allow them to continually improve their planning capabilities from successful planning experience. To this end, our recent work, which will be covered in this talk, introduced a novel computational framework for online, automatic encoding of individual planning experiences as generalizable “abstract strategies;” such a strategy can later be adapted for and reused in new contexts, to accelerate the solution of new planning problems. This domain-independent approach is categorically different from standard techniques for “skill learning” or “learning from demonstration”, which consider offline policy learning from numerous demonstrations. In contrast, our suggested technique allows us to perform strategy extraction in real time from only a single experience, without human intervention.

3.5 Robust Policies for Single-Agent POMDPs and Scalability in Multi-Agent POMDPs

Maris Galesloot (Radboud University Nijmegen, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Maris Galesloot

Joint work of: Marnix Suilen, Thiago D. Simão, Sebastian Junges, Steven Carr, Matthijs Spaan, Ufuk Topcu, Nils Jansen

Partially observable Markov decision processes (POMDPs) capture sequential decision-making in uncertain, partially observable environments. Since state observations are not Markovian, high-quality policies require reasoning about the history of the system’s observations, i.e., they require memory. In this talk, we highlight recent advances in computing policies for robust and multi-agent extensions of POMDPs.

Firstly, we presented recent work on robust POMDPs. A policy for a robust POMDP is called robust when it accounts for the worst-case among an infinite set of POMDPs encapsulated by the so-called uncertainty set of the robust POMDP. We introduce the pessimistic iterative planning (PIP) framework to compute robust policies by iteratively selecting the worst-case POMDP for the current policy from the uncertainty set [2]. We implement PIP in rFSCNet, an algorithm that optimizes a recurrent neural network to compute such robust policies. Empirical results show that the approach is promising and can outperform various baselines and a state-of-the-art planner for robust POMDPs.

Finally, we summarized recent work on computing policies for centralized multi-agent POMDPs with many agents. Multi-agent POMDPs are typically large POMDPs with an agent-wise factorization in their action and observation spaces. We consider the case where the agents share their belief distribution over the state and value function, but interactions between the agents adhere to or can be approximated by a sparse graphical structure. We utilize the sparse interactions among agents in sampled-based online planning approaches, exploiting it in the Monte Carlo estimates of both the value function and the belief distributions maintained by the agents. Thereby, we extend existing sample-based planners to operate in settings where many agents are involved [1].

References

  • [1] Galesloot, M. F. L., Simão, T. D., Junges, S., & Jansen, N. (2024). Factored Online Planning in Many-Agent POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17407-17415. https://doi.org/10.1609/aaai.v38i16.29689
  • [2] Galesloot, M. F. L., Suilen, M., Simão, T. D., Carr, S., Spaan, M. T. J., Topcu, U., & Jansen, N. (2024). Pessimistic Iterative Planning for Robust POMDPs. arXiv preprint arXiv:2408.08770.

3.6 Risk-Aware Planning in Constrained Multi-Agent Systems

Anna Gautier (KTH Royal Institute of Technology – Stockholm, SE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Anna Gautier

This talk addresses the challenges that arise in multi-agent systems that are faced both with uncertainty and constraints. We first discuss the problem of resource allocation in factored, multi-agent Markov Decision Processes by enumerating a few different risk-constrained optimization objectives. We then propose a solution to one such objective, chance-constrained optimization, via an auction mechanism that allocates both resources and uncertainty. Finally, we consider how humans in these systems might perceive the risk-taking carried out by AI agents. We present a user study that tests humans’ tolerance for risk in a collaborative block stacking task.

3.7 A brief tutorial on Robot Learning

Georgia Chalvatzaki

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Georgia Chalvatzaki

This tutorial provides an in-depth exploration of robot learning, focusing on how machine learning techniques enable robots to learn and adapt through interaction with their environments. It covers core methods in imitation learning (IL), such as behavior cloning and addressing distribution shifts, and extends to advanced topics like generative models and privileged teachers. Additionally, the tutorial delves into reinforcement learning (RL) strategies, highlighting approaches such as Deep Q-learning, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). The ultimate goal discussed is what are the techniques that we currently investigate in the field of robot learning for equipping robots with general-purpose embodied intelligence, empowering them to autonomously handle dynamic, real-world tasks through continuous learning.

3.8 Fuzzy Path Logic

Kush Grover (TU München, DE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Kush Grover

Joint work of: Jan Kretinsky, Kush Grover

Defining soft preferences for motion planning (MP) in Signal Temporal Logic works to some extent by using robustness. However, undesired paths may still be considered optimal in terms of robustness. To address this, we introduce a new family of temporal logics specifically designed for soft preferences. This logic naturally expresses path segments and handles degrees of satisfaction more flexibly. Built on fuzzy, time-varying signal constraints, it offers greater expressivity, making it (i) more intuitive for human-given specifications in MP, and (ii) more suitable for learning specifications from demonstrations compared to other logics.

3.9 Data-Driven Model-Based Reasoning for Long-Term Robot Deployments

Nick Hawes (University of Oxford, GB) and Bruno Lacerda (University of Oxford, GB)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Nick Hawes and Bruno Lacerda

Mission planning for long-term robot deployments is an extremely challenging problem that requires close integration between machine learning and model-based reasoning techniques. We propose explicitly considering the long-term learning and the mission-level behaviour planning and execution as two tightly coupled components. Each of these components has specific roles in enabling intelligent and adaptive robots. We describe a framework where the long-term autonomy component uses gathered data to learn possible models of the environment dynamics, and the mission-planning component synthesises plans that explicitly consider that the learnt models are underspecified and uncertain. These plans should be able to use online observations to refine their estimates over the underlying true model, thus adapting to deployment-time conditions. We describe recent research related to each of these components, and discuss how frameworks that encompass both in an integrated fashion can be developed.

3.10 Recent Trends in Probabilistic Verification

Sebastian Junges (Radboud University Nijmegen, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Sebastian Junges

In this tutorial talk, we consider probabilistic verification of Markov decision processes (MDPs). In short, this means that consider approaches that take an MDP in a suitable format and a temporal specification on that MDP and ask whether the specification holds on the MDP. In this talk, we cover three areas.

(1) Probabilistic model checking (PMC). PMC has emerged as a toolbox with algorithmic support for a wide variety of specifications. Under the hood, PMC relies on solving undiscounted Bellman equations. The PMC community provides a variety of mutually compatible tools, most prominently Storm and PRISM. (2) Novel methods in probabilistic verification. We discuss more recent ideas whose methods have different performance characteristics. In particular, we consider (a) inductive synthesis for inductive invariants in probabilistic models that search for a proof that a property holds, (b) methods from probabilistic inference, in particular using BDD-based model counting, and (c) compositional reasoning in MDPs. (3) Uncertainty on top of probabilistic models, such as parametric MDPs and their application in Bayesian reasoning as well as sensitivity analysis as well as the simultaneous verification of many MDPs with the tool PAYNT.

3.11 Incorporating Logic in Learning for Safe Personalization of Autonomous Vehicles

Ruya Karagulle (University of Michigan – Ann Arbor, US), Nikos Arechiga, Andrew Best, Jonathan DeCastro (Toyota Research Institute – Cambridge, US), and Necmiye Ozay

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Ruya Karagulle, Nikos Arechiga, Andrew Best, Jonathan DeCastro, and Necmiye Ozay

This talk introduces an offline and an online preference learning method that ensures adherence to given specifications, with an application to autonomous vehicles. Our approach incorporates the priority ordering of Signal Temporal Logic (STL) formulas describing traffic rules into a learning framework. By leveraging Weighted Signal Temporal Logic (WSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons and propose two approaches to solve this learning problem. The learned valuations lead to a WSTL formula that can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with two pilot human subject studies in different simulated driving scenarios.

3.12 Algorithms for Validation

Sydney Katz (Stanford University, US), Anthony Corso, Mykel Kochenderfer (Stanford University, US), and Robert Moss

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Sydney Katz, Anthony Corso, Mykel Kochenderfer, and Robert Moss

This talk outlines the content of a new textbook called Algorithms for Validation. This book provides a broad introduction to algorithms for validating safety-critical systems. We cover a wide variety of topics related to validation, introducing the underlying mathematical problem formulations and the algorithms for solving them.

3.13 Formal Methods & Verification in Robotics & Autonomous Systems Engineering @ Bosch: CONVINCE-ing Robust Robot Deliberation & More

Michaela Klauck (Robert Bosch GmbH – Stuttgart, DE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Michaela Klauck

Joint work of: Michaela Klauck, Ralph Lange, Christian Heinzemann

The first part of this talk is an experience report on how Bosch brings formal methods to autonomous systems engineering. We strive at making formal methods efficiently applicable in industry, especially in autonomous driving and robotics. We report on our experiences and findings gained in industrial research projects in which we developed and analyzed formal methods tooling for the application in industrial case studies. These works provided insights how formal methods can be made attractive for industry. At the same time, they raised challenging research questions to overcome recurring problems which often still prevent practical applicability of formal methods in industry. In more detail, we report on how we applied search-based testing in automated driving control applications in several use cases, how we developed a formally verifiable DSL, called vTSL, for the specification of robot tasks by end-users, and how we automatically generate a formal model from the system code of a behavior planner of an autonomous car to apply model checking during development.

The second part of the talk is about the EU Horizon project CONVINCE, where we show that formal methods are also of great help to achieve robust deliberation capabilities of autonomous robots. We discuss the challenges we are currently facing, ideas on how to even better integrate formal verification, learning, and robotics, and potential topics for collaborations.

3.14 Scalable POMDPs: Challenges & Current Capabilities

Hanna Kurniawati (Australian National University – Canberra, AU)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Hanna Kurniawati

This talk presents some of the main challenges in making POMDPs practical for robotics and a brief overview of our work in alleviating those challenges. I hope this talk elucidates what we can now do with POMDPs.

3.15 Learning in optimization-based control

Johannes Köhler (ETH Zürich, CH)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Johannes Köhler

Autonomous systems require reliable controllers that address complex specification and complex system dynamics. Optimization-based control methods, such as model predictive control, provide a systematic method that can ensure safe operation for such challenging environments. In this talk, I show how machine learning methods can address the following challenges: uncertain environments; uncertain dynamics; and efficient implementation. All methods leverage non-parametric function regression, corresponding error bounds, and guarantee safe operation (with high probability) for the resulting closed-loop system. In a second part, I discuss large open questions, such as fragility of online learning in control.

3.16 Formal Verification and Control with Conformal Prediction

Lars Lindemann

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Lars Lindemann

Learning-enabled autonomous systems promise to enable many future technologies such as autonomous driving, intelligent transportation, and robotics. Accelerated by algorithmic and computational advances in machine learning and AI, there has been tremendous success in the design of learning-enabled autonomous systems. However, these exciting developments are accompanied by new fundamental challenges that arise regarding the safety and reliability of these increasingly complex control systems in which sophisticated algorithms interact with unknown dynamic environments. Imperfect learning and unknowns in the environment require control techniques to rigorously account for such uncertainties. I advocate for the use of conformal prediction (CP) – a statistical tool for uncertainty quantification – due to its simplicity, generality, and efficiency as opposed to existing optimization techniques that are either conservative or subject to scalability issues. I first provide an accessible introduction to CP for the non-expert. My goal is then to show how we can use CP to design probabilistically safe motion planning algorithms in dynamic environments. Specifically, we will design a model predictive controller in conjunction with (i) learning-enabled trajectory predictors to obtain predictions of the environment, and (ii) conformal prediction regions quantifying uncertainty of these predictions. We will also discuss how to deal with distribution shifts that arise when the deployed learning-enabled system changes. While existing approaches quantify uncertainty heuristically, we quantify uncertainty in a distribution-free manner with probabilistic safety guarantees. Finally, we provide an extension that enables the consideration of high-level formal system specifications via mixed integer linear programing.

3.17 How to learn provably correct concepts from raw data?

Sara Magliacane (University of Amsterdam, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Sara Magliacane

In this talk, I will present a blue-sky idea that builds towards learning provably correct concepts from raw data based on causal representation learning. In many settings, we might have datasets that have spurious correlations between the concept we are interested in learning and other irrelevant features, which might result in a machine learning method showing perfect performance on the dataset, despite learning the wrong concept, which is often referred to as shortcut learning. Marconato et al. [1] show that this issue can be even worse in neurosymbolic predictors, when we do not have access to the labels of the concepts, but we only have access to a label of a constraint on the concepts, e.g. a safety constraint. In safety-critical settings learning these concepts wrongly could potentially lead to misplaced confidence in the safety of neurosymbolic systems. As one of the mitigation strategies in this case Marconato et al. [1] propose the use of disentangled representation learning methods, which are based on the assumptions that the underlying concepts are independent in the underlying data.

Inspired by this work, in this talk I will describe a few similar ideas, focusing first on the simpler case of learning concepts when we have labels for the individual concepts, but allowing for concepts that might have (spurious) correlations in the data. In particular, I will describe a potential pipeline that builds on the recent advances in causal representation learning in interactive temporal settings. Causal representation learning (CRL) aims at identifying high level causal variables (e.g. concepts) from high-dimensional raw data, e.g. images, up to a class of transformations. BISCUIT [2] is a promising CRL approach that leverages actions in an environment to learn a disentangled representation of the causal variables, even when they are correlated with each other. In this representation, each set of latent variables that is assigned to an individual causal variable or concept is guaranteed to have no information about any other causal variable, allowing us to manipulate the reconstructed causal variables as independent factors of variation.

Assuming that the concepts we want to learn are represented by the causal variables in this disentangled representation, we can now split the problem in two phases: a first phase with unlabelled data in which we learn a causal representation with disentanglement guarantees, and a second phase in which we use a small number of labeled data to learn simple alignment functions for each individual concept. This allows us to avoid learning shortcuts in which we wrongly use an irrelevant concept instead of the correct one. Additionally, since for each concept we are learning simpler functions in low-dimensional spaces, instead of learning a potentially complex function directly from the high-dimensional data, this requires less labeled data and it might ideally simplify verifying the correctness of each of these functions or reliably characterize the uncertainty in the prediction.

Further interesting directions include devising active learning strategies to select the most informative actions in the environment to learn these alignment functions, as well as selecting actions that maximally improve the joint learning of causal representations and the alignment functions, progressively learning more and more disentangled representations. Finally, in this active learning setting, we also consider the challenging case in which we do not have (all) concept labels, but only labels on a downstream task involving the concepts, e.g. a constraint.

References

  • [1] Marconato, E., Teso, S., Vergari, A., & Passerini, A. (2024). Not all neuro-symbolic concepts are created equal: Analysis and mitigation of reasoning shortcuts. Advances in Neural Information Processing Systems, 36.
  • [2] Lippe, P., Magliacane, S., Löwe, S., Asano, Y. M., Cohen, T., & Gavves, E. (2023, July). Biscuit: Causal representation learning from binary interactions. In Uncertainty in Artificial Intelligence (pp. 1263-1273). PMLR.

3.18 Learning the How and Why from Experience: Combining Interpretable and Explainable Methods in Robot Decision-Making

Karinne Ramirez Amaro (Chalmers University of Technology – Göteborg, SE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Karinne Ramirez Amaro

Autonomous robots should efficiently and reliably learn new skills while reusing experiences. What is limiting the advancement of robotic autonomy? Autonomy has rapidly increased with the development of Interpretable and Explainable methods. Interpretable methods focus on understanding how the learned model reaches decisions by examining its structure and relationships. Explainable methods reveal why a model made specific decisions without requiring an understanding of the model itself. Combining these methods in robotic systems enhances the transparency of decision-making processes. While challenging, Interpretable and Explainable capabilities are crucial for deploying Robots in real and dynamic environments. In this talk, I will first introduce our interpretable AI methods that generate compact and general semantic models to infer human activities, enabling robots to gain a high-level understanding of human movements. Next, I will present our causal-based approach, which empowers robots to rapidly predict and prevent both immediate and future failures. This method helps robots understand why failures occurred, allowing them to learn from their mistakes, thus improving their future performances. Finally, I will discuss strategies for combining these methods into a single framework by integrating symbolic planning with hierarchical Reinforcement Learning. This integration allows us to learn flexible and reusable robot policies for manipulation tasks, creating holistic sequences of actions that can be executed independently. Interpretable and Explainable AI are key to developing general-purpose robots. These approaches enable robots to make complex decisions in dynamic and unpredictable environments by learning “how” and “why”, ultimately improving robotic autonomy.

3.19 Using AI Methods to Make Robots More Versatile

Philipp Schillinger (Bosch Center for AI – Renningen, DE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Philipp Schillinger

Developments in AI methods continuously motivate to advance existing approaches in robotics towards increased versatility. This talk looks into different concrete examples at Bosch from recent years, ranging from learning-based task allocation in multi-agent systems over motion skill optimization and sequencing in robotic assembly, model-free grasping and life-long learning in robotic bin picking, up to language-based planning agents with active perception.

3.20 New Safe Practices in Reinforcement Learning

Thiago D. Simão (TU Eindhoven, NL)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Thiago D. Simão

Joint work of: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan

We confront the limitations inherent in specifying the behavior expected from a reinforcement learning agent solely via a reward function. We introduce a model that mitigates this issue using constraints, and we discuss two novel ways to learn an optimal policy while minimizing the safety constraint violation. The first is a transfer approach that can efficiently and safely explore the environment [1]. The second is a safe curriculum generation framework, which reduces the safety violations of a constrained agent by gradually increasing the difficulty of the underlying tasks.

References

  • [1] Q. Yang, T. D. Simão, N. Jansen, S. H. Tindemans, and M. T. J. Spaan, “Reinforcement learning by guided safe exploration,” in ECAI, 2023, pp. 2858–2865.

3.21 Trading off safety/risk for performance

Stephen Smith (University of Waterloo, CA)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Stephen Smith

In risk-aware motion planning for autonomous systems, it is common to pose an optimization problem that captures both system performance and safety/risk. A typical formulation is to maximize system performance subject to safety constraints. However, typically, one cannot guarantee absolute safety, and thus the constraints contain user-specified thresholds or parameters. In this talk, we will discuss whether this is the “right” way to solve risk-aware planning, or if tools from multi-objective optimization might offer alternatives that allow us better explore the trade-off between performance and risk.

3.22 Formal Analysis of Autonomous Systems: A Runtime Assurance Perspective

Hazem Torfah (Chalmers University of Technology – Göteborg, SE)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Hazem Torfah

In recent years, there has been an increase in autonomous systems operating in complex environments and relying on machine learning (ML) components to perform challenging decision-making tasks. However, ML models are brittle and susceptible to failures that can compromise the safety of autonomous systems. There is a pressing need for a systematic methodology that can identify the conditions in which components in the autonomy pipeline can fail at design time and detect such failures at runtime.

This talk introduces safety assurance approach for autonomous systems based on runtime verification. Runtime verification is an analysis technique based on extracting information from a system at runtime and evaluating this information to determine whether an execution of the system satisfies or violates a given property. Specifically, we will report on recent results, presenting runtime verification methods for capturing the operational design domains, i.e., the conditions under which the system or a component thereof is designed to operate safely, and for evaluating the safety of a system in noisy and unpredictable environments.

3.23 Reinforcement Learning with LTL Specifications

Abhinav Verma (Pennsylvania State University – University Park, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Abhinav Verma

Joint work of: Cameron Voloshin, Abhinav Verma, Yisong Yue

Specifications in linear temporal logic (LTL) offer a simplified way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, the standard RL frameworks can be too myopic to find maximally satisfying policies. In this talk we will discuss eventual discounting, a value-function based proxy under which one can find policies that satisfy a specification with highest achievable probability. To improve the efficiency of learning from specifications we combine eventual discounting with LTL-guided Counterfactual Experience Replay, a method for generating off-policy data from on-policy rollouts via counterfactual reasoning. Finally, we will discuss a mechanism for exploiting the compositionality of a specification to provide formal guarantees on the behavior of learnt policies for reach-avoid tasks.

3.24 Towards Safe and Generalizable Autonomy under Uncertainty

Esen Yel (Rensselaer Polytechnic Institute, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Esen Yel

Joint work of: Esen Yel, Nicholas Rober, Sydney Katz, Chelsea Sidrane, Michael Everett, Mykel Kochenderfer, Anil Yildiz, Anthony Corso, Kyle Wray, Stefan Witwicki, Taylor Carpenter, Carmelo di Franco, Radoslav Ivanov, Yiannis Kantaros, Insup Lee, James Weimer, Nicola Bezzo

With the increased use of autonomy in safety-critical applications, it is crucial to develop safe and adaptable systems. This talk first introduces an assured runtime safety monitoring technique that draws ideas from machine learning, reachability analysis, and verification techniques to assess the safety of an autonomous robot under disturbance. It then continues with a neural network controller verification approach that uses backward reachability to verify the closed-loop system safety of the system. The talk then touches on adaptability and introduces a technique to relate an unseen condition for an autonomous vehicle to previous experiences to generate policies for novel conditions. Finally, it concludes with a discussion of future research directions and challenges.

4 Working groups

4.1 Verifying Sensori-Motor Controllers Via Latent Dynamics

Joydeep Biswas (University of Texas – Austin, US), Jonathan DeCastro (Toyota Research Institute – Cambridge, US), Khen Elimelech (Rice University – Houston, US), Sydney Katz (Stanford University, US), Lars Lindemann (USC – Los Angeles, US), Sara Magliacane (University of Amsterdam, NL), Dimitra Panagou (University of Michigan – Ann Arbor, US), Christian Schilling (Aalborg University, DK), Hazem Torfah (Chalmers University of Technology – Göteborg, SE), Cristian Ioan Vasile (Lehigh University – Bethlehem, US), and Esen Yel (Rensselaer Polytechnic Institute, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Joydeep Biswas, Jonathan DeCastro, Khen Elimelech, Sydney Katz, Lars Lindemann, Sara Magliacane, Dimitra Panagou, Christian Schilling, Hazem Torfah, Cristian Ioan Vasile, and Esen Yel

There is significant interest in learning sensori-motor controllers with applications ranging from whole-body humanoid control, to tabletop manipulation, to autonomous driving. In all such applications, there is a critical need for formal guarantees on the behavior of the learned controllers. Unfortunately, verifying sensori-motor controllers is challenging for two key reasons: First, state of the art formal methods approaches are ill-equipped to handle high-dimensional inputs such as images, which are common inputs for such controllers. Second, the learning problem from such high-dimensional inputs is often ill-posed, due to visual artefacts and the difficulty of modeling the input distribution.

In this working group, we proposed a novel class of approaches to verifying sensori-motor controllers that leverages recent developments in representation learning, conformal prediction, and formal methods. Given either a black-box or grey-box controller distilled from a black-box version, we first learn an encoder from the high-dimensional sensor inputs to a low-dimensional disentangled latent representation space such that the latent space disentangles the controller-relevant states from perceptual nuisances (e.g., lighting conditions, occlusions or changes in texture). We use conformal prediction to quantify the uncertainty in the learned representation, specifically the uncertainty in the invariance to the nuisance factors. We then learn the closed-loop dynamics of the original sensori-motor controller as latent dynamics in this latent space, such that the dynamics of the original controller are reflected in the latent space through the encoder. Finally, we use established formal methods tools to verify the latent dynamics in the latent space, which is easier than in the original space.

Building on the proposed approach, we discussed challenges and opportunities, and presented a roadmap for future research in this promising new direction to enable formal guarantees for learned sensori-motor controllers.

4.2 Hierarchies of specifications

Jan Kretinsky (Masaryk University – Brno, CZ)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Jan Kretinsky

Complex relationships between specifications are omnipresent, yet not discussed explicitly in full detail. Examples include the following:

  • Midas problem: a robot should fetch a cup of coffee and to ensure satisfaction it eliminates human race so that nobody can switch it off and thus prevent it from fetching the coffee; indeed, the possibility to switch a robot off is implicitly present and of higher priority than the specification itself.

  • Implicatures: task of finding an empty lecture hall implicitly implies an even more important task of reporting on the result of the findings.

  • Context-specific preferences: being tasked to check Nick’s office whether he is ready for lunch and not asking when meeting him in a corridor having an important discussion can be both right or wrong. We discuss the possibility of a framework to structure the specifications into a hierarchy. Such explicit handling allows for posing clear questions such as:

  • What is the precise semantics (lexicographic preference or trade-offs)?

  • How and what specifications to learn from user’s preferences, which to hardcode by the engineers, which to set as a legally binding package checked by the certification authority?

  • How to approach the synthesis algorithmically in an efficient way? E.g., an off button is installed in the hardware to simplify the synthesis of the rest.

5 Open problems

5.1 Correctness, Safety, and Alignment in Human-Centric Applications

Jonathan DeCastro (Toyota Research Institute – Cambridge, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Jonathan DeCastro

Humans are involved in both specifying what they want an autonomous system to do, and how they want the system to interact with other human agents. In the first portion of my talk, I introduce work done in the human-centric areas of full autonomy and shared autonomy. I outline examples of progress in various applications of design and verification of systems for autonomous driving and human-AI teaming. In the second portion, I cover, at a high-level, some of the prevailing technical challenges in each of these fields.

5.2 Safe Machine Learning-Based Perception and Open Problems in Scaling Formal Methods

Sydney Katz (Stanford University, US), Anthony Corso, Chris Strong, and Mykel Kochenderfer (Stanford University, US)

License: [Uncaptioned image] Creative Commons BY 4.0 International license © Sydney Katz, Anthony Corso, Chris Strong, and Mykel Kochenderfer

This talk outlines an open problem in scaling formal methods used for neural network verification. While it is important to scale neural network verification techniques to handle larger networks, it is equally important to ensure that we can formally specify properties for these larger networks, which often have complex, high-dimensional inputs such as images or natural language. We outline some recent work in specifying properties for a computer vision system using generative modeling.

6 Participants

  • Eric Atkinson – Binghamton University, US

  • Joydeep Biswas – University of Texas – Austin, US

  • Wendelin Böhmer – TU Delft, NL

  • Jonathan DeCastro – Toyota Research Institute – Cambridge, US

  • Clemens Dubslaff – TU Eindhoven, NL

  • Khen Elimelech – Rice University – Houston, US

  • Georgios Fainekos – Toyota Motor North America, R&D – Ann Arbor, US

  • Maris Galesloot – Radboud University Nijmegen, NL

  • Anna Gautier – KTH Royal Institute of Technology – Stockholm, SE

  • Kush Grover – TU München, DE

  • Sofie Haesaert – TU Eindhoven, NL

  • Nick Hawes – University of Oxford, GB

  • Nils Jansen – Ruhr-Universität Bochum, DE

  • Sebastian Junges – Radboud University Nijmegen, NL

  • Ruya Karagulle – University of Michigan – Ann Arbor, US

  • Sydney Katz – Stanford University, US

  • Michaela Klauck – Robert Bosch GmbH – Stuttgart, DE

  • Mykel Kochenderfer – Stanford University, US

  • Johannes Köhler – ETH Zürich, CH

  • Jan Kretinsky – Masaryk University – Brno, CZ

  • Hanna Kurniawati – Australian National University – Canberra, AU

  • Bruno Lacerda – University of Oxford, GB

  • Lars Lindemann – USC – Los Angeles, US

  • Sara Magliacane – University of Amsterdam, NL

  • Dimitra Panagou – University of Michigan – Ann Arbor, US

  • David Parker – University of Oxford, GB

  • Karinne Ramirez Amaro – Chalmers University of Technology – Göteborg, SE

  • Christian Schilling – Aalborg University, DK

  • Philipp Schillinger – Bosch Center for AI – Renningen, DE

  • Thiago D. Simão – TU Eindhoven, NL

  • Stephen Smith – University of Waterloo, CA

  • Matthijs Spaan – TU Delft, NL

  • Hazem Torfah – Chalmers University of Technology – Göteborg, SE

  • Jana Tumova – KTH Royal Institute of Technology – Stockholm, SE

  • Cristian Ioan Vasile – Lehigh University – Bethlehem, US

  • Abhinav Verma – Pennsylvania State University – University Park, US

  • Esen Yel – Rensselaer Polytechnic Institute – Troy, US

[Uncaptioned image]