Strategic reasoning is essential to ensure stable multi-agent coordination in complex environments, as it enables synthesis of optimal (or near-optimal) agent strategies and equilibria that guarantee expected outcomes, even in adversarial scenarios. Partially-observable stochastic games (POSGs) are a natural model for real-world settings involving multiple agents, uncertainty and partial information, but lack practical algorithms for computing or approximating optimal values and strategies. Recently, progress has been made for one-sided POSGs, a subclass of two-agent, zero-sum POSGs where only one agent has partial information while the other agent is assumed to have full knowledge of the state, with heuristic search value iteration (HSVI) proposed for computing approximately optimal values and strategies in one-sided POSGs [Horák et al., 2023]. This model is well suited to safety-critical applications, when making worst-case assumptions about one agent; examples include the attacker in a security application, modelled, e.g., as a patrolling or pursuit-evasion game.
However, many realistic autonomous coordination scenarios involve agents perceiving continuous environments using data-driven observation functions, typically implemented as neural networks (NNs). Examples include autonomous vehicles using NNs to perform object recognition or to estimate pedestrian intention, or NN-enabled vision in an airborne pursuit-evasion scenario. Such perception mechanisms bring new challenges, notably continuous environments, which are inherently tied to NN-enabled perception because of standard training regimes. This means that naive discretisation is difficult, since decision boundaries obtained for data-driven perception are typically irregular and can be misaligned with gridding schemes for discretisation, affecting the precision of the computed strategies.
This invited paper will discuss progress with developing a model class and algorithms for one-sided POSGs with neural perception mechanisms [R. Yan et al., 2022; Yan et al., 2023] that work directly with their continuous state space. Building on continuous-state POMDPs with NN perception mechanisms [Yan et al., 2023], where the key idea is that ReLU neural network classifiers induce a finite decomposition of the continuous environment into polyhedra for each classification label, a piecewise constant representation for the value, reward and perception functions is developed that forms the basis for a variant of HSVI, a point-based solution method that computes a lower and upper bound on the value function from a given belief to compute an (approximately) optimal strategy. We extend these ideas from the single-agent (POMDP) setting [Yan et al., 2023] to zero-sum POSGs. In the game setting, this involves solving a normal form game at each stage and iteration, and goes significantly beyond HSVI for finite POSGs [Horák et al., 2023].