Algorithms and Complexity for Continuous Problems
Abstract
The Dagstuhl Seminar 23351 was held at the Leibniz Center for Informatics, Schloss Dagstuhl, from August 27 to September 1, 2023. This event was the 14th in a series of Dagstuhl Seminars, starting in 1991. During the seminar, researchers presented overview talks, recent research results, work in progress and open problems. The first section of this report describes the goal of the seminar, the main seminar topics, and the general structure of the seminar. The third section contains the abstracts of the talks given during the seminar and the forth section the problems presented at the problem session.
Keywords and phrases:
computational stochastics, infinite-variate problems, quasi-Monte Carlo, sampling, tractability analysisSeminar:
August 27 – September 1, 2023 – https://www.dagstuhl.de/233512012 ACM Subject Classification:
Mathematics of computing Approximation ; Mathematics of computing Numerical analysis ; Mathematics of computing Probabilistic algorithmsCopyright and License:
1 Executive Summary
Michael Gnewuch (Universität Osnabrück, DE)
Dmitriy Bilyk (University of Minnesota – Minneapolis, US)
Jan Vybíral (Czech Technical University – Prague, CZ)
Larisa Yaroslavtseva (Universität Graz, AT)
License:
Creative Commons BY 4.0 International license © Michael Gnewuch, Dmitriy Bilyk, Jan Vybíral, and Larisa Yaroslavtseva
The goal of the Dagstuhl Seminar was to bring together researchers that work in various fields united by a common theme of complexity of continuous problems. In this context, complexity refers to the computational effort required to solve the problem approximately, up to a given error. For these, often high- or even infinite-dimensional, problems it is desirable to have theoretical bounds on the complexity as well as explicit constructions of (nearly) optimal algorithms. Their applications range over natural sciences, economics, statistics, engineering, computer science (including computer graphics, global optimization, and machine learning) and other areas of interest. The focus of the seminar was on the following highly interrelated topics:
Tractability analysis of high-dimensional problems.
Tractability analysis has its roots in information-based complexity. It studies continuous problems, where the instances are typically -variate functions. The focus is on how the problem complexity depends on the error tolerance and on . Topics discussed during the seminar were the study of recent notions of tractability and of new problem settings.
Computational stochastics.
The focus was on strong approximation and quadrature of stochastic differential equations (SDEs) with non-globally Lipschitz continuous coefficients. Systematic investigations on the numerical approximation of such SDEs have started in the last decade and are still in their infancy.
Infinite-variate problems: algorithms and complexity.
Many applications rely on models with countably many variables. The complexity analysis of the resulting continuous problems may be viewed as the limit of tractability analysis for d-variate functions, where d tends to infinity. For many problems sharp complexity bounds have been proved with the help of generic types of algorithms, as multilevel algorithms and multivariate decomposition methods, but mainly in the case where the spaces of input functions are weighted reproducing kernel Hilbert spaces (RKHSs) based on product weights or RKHSs of increasing smoothness. A major question was how to tackle important function spaces, which exhibit a different underlying structure.
Well-distributed point configurations: discrepancy, quasi-Monte Carlo methods, dispersion.
Many problems in approximation theory and complexity rely on the existence and construction of good (random or deterministic) point distributions with respect to discrepancy, integration errors, strength of cubature formulas, dispersion, or discrete energy. Numerous questions in this area are still open: in various regimes sharp bounds for discrepancy and dispersion still remain unknown and explicit efficient constructions are still lacking. These problems were intensely discussed during the seminar.
Linear and standard information in approximation theory.
A classical problem of approximation theory, learning theory and data analysis is the approximation of an unknown multivariate function from incomplete information. The information can be given by a limited number of function evaluations (standard information) or by evaluating a finite number of arbitrary linear functionals (linear information). Even in the simplest setting of approximation in a norm of a Hilbert space, the relation between using standard information and linear information was understood only recently and there is still a number of challenging open questions in this area. In several seminar talks interesting ideas to tackle these questions and new results were presented.
During the seminar we had five overview talks of 60 minutes (incl. discussion; one for each of the main research topics) given by leading experts in the respective area to enable reseachers from different fields to follow the regular talks. The speakers and the titles of these lectures were (ordered according to the list of main research topics):
-
Peter Kritzer (RICAM Linz), Tractability analysis.
-
Thomas Müller-Gronbach (Unversität Passau), On the complexity of strong approximation of SDEs with a non-Lipschitz drift coefficient.
-
Klaus Ritter (RPTU Kaiserslautern-Landau), Infinite-variate integration and -approximation.
-
Stefan Steinerberger (University of Washington – Seattle), Well-distributed point configurations.
-
Mario Ullrich (JKU Linz), On optimal approximation based on random samples.
These talks were delivered at the first three days of the seminar. Additionally, we had 24 regular talks of 30 minutes (incl. discussion), many of them presented by young researchers. Due to the time format, there was plenty of time for discussions and collaborations in smaller groups.
Originally we planned an additional poster session for the participants who do not present a talk. Since finally the number of time slots for talks was sufficient to ensure that everyone who wanted to present a talk could actually do so, there was no need for a poster session anymore.
On Wednesday morning we had a problem session to share interesting open problems and conjectures and to initiate further discussions. Several researchers used the opportunity to present interesting and challenging open problems. One problem was even solved in a discussion of some participants directly after the problem session. The detailed problem formulations can be found at the end of this report.
2 Table of Contents
3 Overview of Talks
3.1 Constructing Low-Discrepancy Point Sets: Subset Sampling and Optimal Sets
François Clément (Sorbonne University – Paris, FR)
License:
Creative Commons BY 4.0 International license © François Clément
Joint work of: François Clément, Carola, Doerr, Kathrin Klamroth, Luís Paquete
Low-discrepancy sequences such as Sobol’ or Halton have been designed to have small discrepancy values asymptotically. However, for practical applications, only a finite number of points will be possible, often too small to reach the asymptotic regime. In this talk, I present two different approaches to construct point sets with low discrepancy values.
The first is via the Star Discrepancy Subset Selection Problem, which consists in selecting from a fixed point set of size the best subset of size , best being the subset with the smallest star discrepancy. We tackle this problem both via exact methods in dimensions 2 and 3 [1] and via heuristics in higher dimensions [2]. In particular, we observe that we are able to obtain sets with 10-50% smaller discrepancy for all tested dimensions and number of points.
The second is the construction of optimal point sets for the star discrepancy. This has been a rather overlooked problem, with the main previous work by White in 1977 solving for up to 6 points in dimension 2. We provide non-linear programming formulations to solve the problem for n smaller than 20 in dimension 2 and smaller than 8 in dimension 3 [3]. In particular, the structure of the local discrepancy values over for our optimal sets is very different to that of previously known sets such as Fibonacci or Sobol’.
References
- [1] François Clément; Carola Doerr; Luís Paquete, Star discrepancy subset selection: Problem formulation and efficient approaches for low dimensions, Journal of Complexity, 70:101645, 2022
- [2] François Clément; Carola Doerr; Luís Paquete, Heuristic approaches to obtain low-discrepancy point sets via subset selection, 2023
- [3] François Clément; Carola Doerr; Kathrin Klamroth, Luís Paquete, Constructing Optimal Star Discrepancy Sets, https://arxiv.org/abs/2311.17463, 2024
3.2 On the existence of optimal shallow networks
Steffen Dereich (Universität Münster, DE)
License:
Creative Commons BY 4.0 International license © Steffen Dereich
Joint work of: Steffen Dereic, Arnulf Jentzen, Sebastian Kassing
In this talk, we discuss existence of global minima in optimisation problems over shallow neural networks. More explicitly, the function class over which we minimise is the family of all functions that can be expressed as artificial residual or feedforward neural networks with one hidden layer featuring a specified number of neurons with ReLU (or Leaky ReLU) activation. We give existence results. Moreover, we provide counterexamples that illustrate the relevance of the assumptions imposed in the theorems.
3.3 Data compression using lattices for machine learning
Kumar Harsha (Universität Osnabrück, DE)
License:
Creative Commons BY 4.0 International license © Kumar Harsha
Joint work of: Kumar Harsha, Michael Gnewuch, Marcin Wnuk
The mean squared error is one of the standard loss functions in supervised machine learning. However, calculating this loss for enormous data sets can be computationally demanding. Modifying a earlier approach [1], we present algorithms to reduce extensive data sets to a smaller size using rank-1 lattice rules. With this compression strategy in the precomputation step, every lattice point gets a pair of weights depending on the original data and responses, representing its relative importance. As a result, the compressed data makes iterative loss calculations in optimization steps much faster. The proposed compression strategy is highly beneficial for regression problems without an analytical solution. Our derivation of the formulas for the weights assumes that input functions have a convergent Fourier series and that the relevant Fourier coefficients form a hyperbolic cross. Accordingly, we have analyzed our algorithms’ error for functions whose Fourier coefficients decay sufficiently fast such that they lie in Wiener algebras.
References
- [1] Josef Dick; Michael Feischl, A quasi-Monte Carlo data compression algorithm for machine learning, Journal of Complexity, 67:101587, 12 2021
3.4 Results and open questions around the adaption problem
Stefan Heinrich (RPTU – Kaiserslautern, DE)
License:
Creative Commons BY 4.0 International license © Stefan Heinrich
In the framework of Information-Based Complexity we consider the adaption problem for linear solution operators in the randomized setting. Along with surveying recent advances we discuss a number of remaining/arising/related open questions.
The results presented are contained in the following papers:
References
- [1] Stefan Heinrich, Randomized complexity of parametric integration and the role of adaption I. Finite dimensional case, arXiv:2306.13471
- [2] Stefan Heinrich, Randomized Complexity of Parametric Integration and the Role of Adaption II. Sobolev spaces, arXiv:2306.13499
- [3] Stefan Heinrich, Randomized Complexity of Vector-Valued Approximation, to appear in: A. Hinrichs, P. Kritzer, F. Pillichshammer (Eds.), Monte Carlo and Quasi-Monte Carlo Methods 2022, Springer-Verlag, see also arxiv:2306.13697
3.5 Instant Neural Graphics Primitives
Alexander Keller (NVIDIA – Berlin, DE)
License:
Creative Commons BY 4.0 International license © Alexander Keller
Joint work of: Alexander Keller, Thomas Müller, Alex Evans, Christoph Schied
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs.
3.6 Tractability in unweighted function classes
David Krieg (Johannes Kepler Universität Linz, AT)
License:
Creative Commons BY 4.0 International license © David Krieg
Main reference: David Krieg: “Tractability of sampling recovery on unweighted function classes”, CoRR, Vol. abs/2304.14169, 2023.
It is well-known that the integration problem as well as the problem of sampling recovery in the -norm suffer from the curse of dimensionality on many classical function classes. This includes unweighted Korobov spaces (Sobolev spaces with mixed smoothness) as well as classical smoothness classes such as Hölder classes or the classes . We show that those problems become (polynomially) tractable, even for approximation with , if we further impose the condition that the sum of Fourier coefficients is bounded.
The classes where we obtain tractability are unweighted (invariant under a reordering of the variables) and bigger than the corresponding weighted Korobov spaces. Tractability is achieved by the use of non-linear algorithms, while linear algorithms cannot do the job.
In fact, the tractability result is a relatively simple implication of powerful results by Rauhut and Ward [Appl. Comput. Harmon. Anal. 40 (2016), pp. 321-351] on -minimization. The approach is not limited to the Fourier system, but it remains to be seen whether any “interesting” classes of nonperiodic functions can be tackled.
3.7 Tractability analysis
Peter Kritzer (Österreichische Akademie der Wissenschaften – Linz, AT)
License:
Creative Commons BY 4.0 International license © Peter Kritzer
Joint work of: Peter Kritzer, Josef Dick, Adrian Ebert, Onyekachi Emenike, Fred J. Hickernell, Christian Irrgeher, Gunther Leobacher, Freidrich Pillichshammer, Henryk Wozniakowski
The information complexity of the problem of approximating an operator for normed spaces and is defined as the minimal number of information evaluations needed to find an -approximation. A large literature specifies conditions under which the information complexity for a sequence of numerical problems defined for dimensions grows at a moderate rate when the error threshold decreases and/or the dimension increases, i.e., the sequence of problems is tractable.
In this talk, we give an overview of basic ideas in tractability analysis, highlight classical results for linear problems defined on Hilbert spaces, and outline some more recent results on this subject.
3.8 Optimal confidence intervals for randomized quadrature
Robert J. Kunsch (RWTH Aachen, DE TU Chemnitz, DE)
License:
Creative Commons BY 4.0 International license © Robert J. Kunsch
Main reference: Robert J. Kunsch: “Linear Monte Carlo quadrature with optimal confidence intervals”, CoRR, Vol. abs/2309.09059, 2023.
We study the numerical integration of functions from isotropic Sobolev spaces using finitely many function evaluations within randomized algorithms, aiming for the smallest possible probabilistic error guarantee at confidence level . This error criterion is more challenging than the classical root-mean-squared error which is usually taken for Monte Carlo algorithms within information-based complexity and where only and are put in relation. For spaces consisting of continuous functions, non-linear Monte Carlo methods with optimal confidence properties have already been known, in few cases even linear methods that succeed in that regard.[1] In this talk we promote a method called stratified control variates (SCV), see [2], and by it show that already linear methods achieve optimal probabilistic error rates in the high smoothness regime without the need to adjust algorithmic parameters to the uncertainty . We also analysed a version of SCV in the low smoothness regime where may contain functions with singularities. Here, we observe a polynomial dependence of the error on which cannot be avoided for linear methods. This is worse than what is known to be possible using non-linear algorithms where only a logarithmic dependence on occurs if we tune in for a specific value of . This new way of looking at randomized integration still leaves many questions open, for instance, precise lower bounds for linear methods in the low smoothness regime, as well as universal methods or optimality in mixed-smoothness spaces.
References
- [1] Robert J. Kunsch; Daniel Rudolf, Optimal confidence for Monte Carlo integration of smooth functions, Advances in Computational Mathematics, 45:3095–3122, 2019
- [2] Robert J. Kunsch, Linear Monte Carlo quadrature with optimal confidence intervals, arXiv:2309.09059 [math.NA], 2023
3.9 High-dimensional approximation: Transforming periodic Approximations vs. Rando Fourier Features
Laura Lippert (TU Chemnitz, DE)
License:
Creative Commons BY 4.0 International license © Laura Lippert
We study the problem of scattered-data
approximation on , where we have given sample points and the corresponding function evaluations. We compare two approaches. In the first one we transform functions on to , apply an approximation based on hyperbolic wavelet regression and transform the resulting function back.
In fact, we transform the sample points to points on , evaluate the periodic basis functions at these points, create a matrix and solve the matrix equation with an LSQR-algorithm to get an approximation.
The other approach involves random Fourier features, where we draw frequencies at random and learn coefficients from the given data to construct the approximant, i.e.
We give error estimates for both cases, involving different function spaces. We truncate the ANOVA decomposition to approximate high-dimensional functions of low effective dimension.
References
- [1] Laura Lippert; Daniel Potts, Variable Transformations in combination with Wavelets and ANOVA for high-dimensional approximation, arXiv:2207.12826, 2022
- [2] Laura Lippert; Daniel Potts; Tino Ullrich, Fast Hyperbolic Wavelet Regression meets ANOVA, Numerische Mathematik 154, 155–207, 2023
3.10 On the minimal dispersion in the unit cube
Alexander Litvak (University of Alberta – Edmonton, CA)
License:
Creative Commons BY 4.0 International license © Alexander Litvak
We discuss ideas which lead to recent improvements of upper bounds for the minimal dispersion of a point set in the unit cube and its inverse. We also discuss sharpness of bounds. The talk is partially based on a joint work with G. Livshyts.
3.11 Fixed-radius spherical cap discrepancy
Michelle Mastrianni (University of Minnesota – Minneapolis, US)
License:
Creative Commons BY 4.0 International license © Michelle Mastrianni
Joint work of: Dmitriy Bilyk, Stefan Steinerberger
A seminal result of Beck shows that for any set of points on the -dimensional sphere, there always exists a spherical cap such that the number of points in the cap deviates from the expected value by at least . We refine the result by removing a layer of averaging: we show that, when is not , there exists a set of real numbers such that for each in the set one is always guaranteed to find a spherical cap with radius for which Beck’s result holds. The main ingredient is a generalization of the notion of badly approximable numbers to the setting of Gegenbauer polynomials, which we call Gegenbadly approximable numbers. These are fixed numbers such that the sequence of Gegenbauer polynomials evaluated at avoids being close to in a precise quantitative sense.
3.12 Complexity of composite linear problems
Peter Mathé (Weierstraß Institut – Berlin, DE)
License:
Creative Commons BY 4.0 International license © Peter Mathé
Joint work of: Peter Mathé, Bernd Hofmann
We consider compact composite linear operators in Hilbert space, where the composition is given by some compact operator followed by some non-compact one possessing a non-closed range. Focus is on the impact of the non-compact factor on the overall behavior of the decay rates of the singular values of the composition.
3.13 Comparison of Two Search Criteria for Lattice-based Kernel Approximation
Weiwen Mo (KU Leuven, BE)
License:
Creative Commons BY 4.0 International license © Weiwen Mo
Joint work of: Weiwen Mo, Frances Y. Kuo, Dirk Nuyens, Abirami Srikumar, Ian H. Sloan
The kernel interpolant in a reproducing kernel Hilbert space is optimal in the worst-case sense among all approximations of a function using the same set of function values. In this talk, we compare two search criteria to construct lattice point sets for use in lattice-based kernel approximation. The first candidate, , is based on the power function that appears in machine learning literature. The second candidate, , is a search criterion used for generating lattices for approximation using truncated Fourier series. We find that the empirical difference in error between the lattices constructed using the two search criteria is marginal. The criterion is preferred as it is computationally more efficient and has a bound with a superior convergence rate.
3.14 On the complexity of strong approximation of SDEs with a non-Lipschitz drift coefficient
Thomas Müller-Gronbach (Universität Passau, DE)
License:
Creative Commons BY 4.0 International license © Thomas Müller-Gronbach
We study the complexity of pathwise approximation in -th mean of the solution of a stochastic differential equation at the final time based on finitely many evaluations of the driving Brownian motion. First, we briefly review the case of equations with globally Lipschitz continuous coefficients, for which an error rate of at least in terms of the number of evaluations of the driving Brownian motion is always guaranteed by using the equidistant Euler-Maruyama scheme. Then we illustrate that giving up global Lipschitz continuity may lead to non-polynomial error rates for the Euler-Maruyama scheme or even for any method based on finitely many evaluations of the driving Brownian motion. Finally, we turn to recent complexity results in the case of equations with a drift coefficient that is not globally Lipschitz continuous. Here we focus on scalar equations with a Lipschitz continuous diffusion coefficient and a drift coefficient that satisfies piecewise smoothness assumptions or has fractional Sobolev regularity.
3.15 Training of DNNs with lattice rules
Dirk Nuyens (KU Leuven, BE)
License:
Creative Commons BY 4.0 International license © Dirk Nuyens
Joint work of: Dirk Nuyens, Alex Keller, Frances Kuo, Ian Sloan
Following Mishra, Rusch (2021) and Longo, Mishra, Rusch, Schwab (2021) we consider the so-called “generalization error” which describes how well a trained DNN can approximate a function on the whole domain. In fact this is just the L2 approximation error if measured in the L2-norm. We start from the assumption that the underlying function is periodic and in any Korobov space with smoothness alpha. This is motivated by the periodic model of the random diffusion field when doing uncertainty quantification.
3.16 Super-polynomial Accuracy of Median-of-means
Zexin Pan (Stanford University, US)
License:
Creative Commons BY 4.0 International license © Zexin Pan
Joint work of: Zexin Pan, Art Owen
Digital net is an important class of Quasi-Monte Carlo methods used for multidimensional integration. In the talk, I am going to show digital net randomized by linear scrambling and digital shift exhibits surprising concentration behavior. Taking the median of several digital net averages can exclude outliers and boost the convergence rate from nearly cubic to super-polynomial when the integrand is smooth. I will end with some discussions on the difficulties in building confidence intervals.
3.17 Sampling recovery in the uniform norm
Kateryna Pozharska (National Academy of Sciences of Ukraine – Kyiv, UA TU Chemnitz, DE)
License:
Creative Commons BY 4.0 International license © Kateryna Pozharska
Joint work of: David Krieg, Kateryna Pozharska, Mario Ullrich, Tino Ullrich
Main reference: David Krieg, Kateryna Pozharska, Mario Ullrich, Tino Ullrich: “Sampling recovery in the uniform norm”, CoRR, Vol. abs/2305.07539, 2023.
In the talk, I will present the results of our joint work with David Krieg, Mario Ullrich and Tino Ullrich [1] on the recovery of functions based on function evaluations.
The main emphasis is made on the uniform recovery, however we provide a method to transfer results for -approximation to rather general seminorms (including , ). The underlying (least squares) algorithm is based on a random construction and subsampling based on the solution of the Kadison-Singer problem.
Besides an explicit bound for the corresponding sampling widths, we also obtain some interesting inequalities between the sampling, Kolmogorov and Gelfand widths. Namely, we show that there is an absolute constant , such that for a compact topological space and a compact subset of the following holds
| (1) |
The bound in (1) is optimal up to constants, also if we consider only convex and symmetric and replace the Kolmogorov width by the Gelfand width on the right hand side. This means, that there are linear algorithms using samples that are as good as all algorithms using arbitrary linear information up to a factor of at most . A result that cannot be true without oversampling [2]. Moreover, our results imply that linear sampling algorithms are optimal up to a constant factor for many reproducing kernel Hilbert spaces.
Acknowledgements.
KP would like to acknowledge support by the Philipp Schwartz Fellowship of the Alexander von Humboldt Foundation.
References
- [1] David Krieg; Kateryna Pozharska; Mario Ullrich; Tino Ullrich, Sampling recovery in the uniform norm, arXiv: 2305.07539v2, 2023
- [2] Erich Novak, Deterministic and stochastic error bounds in numerical analysis, Lecture Notes in Mathematics 1349, Springer-Verlag, 1988
3.18 Randomized Euler algorithm for SDEs with drift in integral form and its connection with Perturbed SGD
Pawel Przybylowicz (AGH Univ. of Science Technology-Krakow, PL)
License:
Creative Commons BY 4.0 International license © Pawel Przybylowicz
We will present a version of randomized Euler algorithm for approximation of solutions of stochastic differential equations when the drift is in integral form. We will show upper estimates for the error and some lower bounds in the worst-case model. We will show some relationship of the randomized Euler algorithm with perturbed SGD used in machine learning.
3.19 Infinite-variate Integration and -Approximation
Klaus Ritter (RPTU – Kaiserslautern, DE)
License:
Creative Commons BY 4.0 International license © Klaus Ritter
In this survey we consider two basic computational problems, integration and -approximation, for functions of countably infinite real variables. For motivation we briefly refer to stochastic models based on iid-sequences of random variables, e.g., series expansions of stochastic processes and random fields.
Thereafter we discuss appropriate cost models (information cost) for infinite-variate problems as well as the general structure of the functions spaces (RKHS of tensor product form) that have been studied so far. Finally, we present results (decay of the minimal errors) and key ingredients of the proofs for tensor products of weighted spaces and of spaces of increasing smoothness.
3.20 Integration and -Approximation on Gaussian Spaces
Robin Rüßmann (RPTU – Kaiserslautern, DE)
License:
Creative Commons BY 4.0 International license © Robin Rüßmann
Joint work of: Robin Rüßmann, Michael Gnewuch, Klaus Ritter
We study integration and approximation on reproducing kernel Hilbert spaces with Gaussian kernels of tensor product form. We find new results in the infinite-variate case, and we improve some known results in the finite-variate case.
Our most important tool is a close relation between Gaussian spaces and Hermite spaces of infinite smoothness and tensor product form.
This relation allows us to transform any algorithm for integration or approximation on the Gaussian space into an algorithm for the same problem on the Hermite space and vice versa, preserving error and cost, which means that known upper and lower error bounds for one function space setting also apply to the other setting.
3.21 Large holes in large point sets
Mathias Sonnleitner (Universität Passau, DE)
License:
Creative Commons BY 4.0 International license © Mathias Sonnleitner
Dispersion is a measure of equidistribution of point sets. We give a brief introduction and survey known results with focus on a large number of points compared to the dimension. From the literature on coverings of Euclidean space and convex body approximation, we deduce implications for the spherical case and point to open problems.
3.22 Multi-level quasi-Monte Carlo methods for kernel interpolation in uncertainty quantification
Abirami Srikumar (UNSW Sydney, AU)
License:
Creative Commons BY 4.0 International license © Abirami Srikumar
Joint work of: Abirami Srikumar, Alexander D. Gilbert, Michael Giles, Frances Y. Kuo, Ian H. Sloan
As high-dimensional problems become increasingly prevalent in many applications, the effective evaluation of these problems within the limits of our current technology poses a great hurdle due to the exponential increase in computational cost as dimensionality increases. One class of strategies for evaluating such problems efficiently are quasi-Monte Carlo (QMC) methods.
In this talk, we explore the effectiveness of multi-level quasi-Monte Carlo methods for approximating solutions to elliptic partial differential equations with stochastic coefficients that depend periodically on the stochastic parameters. In particular, we are interested in fast approximation using kernel-based lattice point interpolation. We present some regularity results, theory on the convergence properties of errors of such approximations and the results of numerical experiments.
3.23 Well-distributed Point Configurations
Stefan Steinerberger (University of Washington – Seattle, US)
License:
Creative Commons BY 4.0 International license © Stefan Steinerberger
This is a survey talk to describe several different problems in the broad framework of well-distributed point configurations. Problems discussed include: low-discrepancy sets of points, minimizers of energy functionals, greedy sequences, problems on the sphere and minimizer of the logarithmic energy as well as the connection to the crystallization conjecture.
3.24 Improving diversity in StableDiffusion with genetic crossover and well distributed point configurations
Olivier Teytaud (Meta AI Research – Tournon-sur-Rhone, FR)
License:
Creative Commons BY 4.0 International license © Olivier Teytaud
Joint work of: Olivier Teytaud, Mariia Zameshina, Laurent Najman, Mathurin Videau
Latent diffusion models excel at producing high-quality images from text. Yet, concerns appear about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, spanning into richer realms, including color diversity. Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We generate multiple vectors in the latent space until we find a set of vectors that meet the desired distance requirements and the required batch size. To evaluate the effectiveness of our diversity methods, we conduct experiments examining various characteristics, including color diversity, LPIPS metric, and ethnicity/gender representation in images featuring humans. The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. Through the enhancement of image diversity, our approach contributes to the creation of more inclusive and representative AI-generated art.
We also present genetic crossovers for combining latent variables into a better latent variable.
After the seminar, we start a cool collaboration with Carola, Stefan, Dmitry, Laurent, for producing fantastic point configurations in high-dimensional point spaces.
Joint work ESIEE and Meta.
3.25 On optimal approximation based on random samples
Mario Ullrich (Johannes Kepler Universität Linz, AT)
License:
Creative Commons BY 4.0 International license © Mario Ullrich
Joint work of: Mario Ullrich, Matthieu Dolbeault, David Krieg, Katerina Pozharska, Mathias Sonnleitner, Tino Ullrich
In this talk we give an overview of some recent and not so recent developments in the area of approximation of functions based on function evaluations. The emphasis is on information-based complexity and the worst-case setting, i.e., we ask for the minimal number of information (aka measurements) needed by any algorithm to achieve a prescribed error for all inputs, basically ignoring implementation issues.
However, it turned out that in many cases, certain (unregularized) least squares methods based on “random” information, like function evaluations, can catch up with arbitrary algorithms based on arbitrary linear information, i.e., the best we can do theoretically.
3.26 Sampling numbers of smoothness classes via minimization
Tino Ullrich (TU Chemnitz, DE)
License:
Creative Commons BY 4.0 International license © Tino Ullrich
Joint work of: Thomas Jahn, Tino Ullrich, Felix Voigtlaender
Using techniques developed recently in the field of compressed sensing we show new upper bounds for general (nonlinear) sampling numbers of (quasi-)Banach smoothness spaces in . In particular, we show that in relevant cases such as mixed and isotropic weighted Wiener classes or Sobolev spaces with mixed smoothness, sampling numbers in can be upper bounded by best n-term trigonometric widths in . We describe a recovery procedure from m function values based on -minimization (basis pursuit denoising). With this method, a significant gain in the rate of convergence compared to recently developed linear recovery methods is achieved. In this deterministic worst-case setting we see an additional speed-up of (up to log factors) compared to linear methods in case of weighted Wiener spaces. For their quasi-Banach counterparts even arbitrary polynomial speed-up is possible. Surprisingly, our approach allows to recover mixed smoothness Sobolev functions belonging on the -torus with a logarithmically better rate of convergence than any linear method can achieve when and is large. This effect is not present for isotropic Sobolev spaces.
3.27 Some Computational Approaches to the Pair Correlation Statistic
Christian Weiß (Hochschule Ruhr-West – Mülheim, DE)
License:
Creative Commons BY 4.0 International license © Christian Weiß
The pair correlation statistic measures the behavior of gaps between the first elements of a sequence on a local scale. Although a generic independent, identically distributed random sequence drawn from uniform distribution has so-called Poissonian pair correlations, there are only few explicitly known such examples. The main reason why they are difficult to find is that it is in general hard to calculate the pair correlation statistic. In this talk, we therefore concentrate on computational aspects of the pair correlation and present some possible approaches. For instance, geometric properties like the gap structure, an underlying lattice structure of the sequence or the discrepancy turn out to be useful tools. We also present some recent combinatorial results.
3.28 The fixed vector randomised lattice algorithm for high-dimensional integration
Laurence Wilkes (KU Leuven, BE)
License:
Creative Commons BY 4.0 International license © Laurence Wilkes
Joint work of: Frances Y. Kuo, Dirk Nuyens, Laurence Wilkes
The fixed vector algorithm offers a very simple solution to the problem of producing a lattice-based randomised algorithm for numerical integration with the optimal randomised error. We shift the construction of the generating vector of the lattice to a precomputation so that the only randomised element is to choose the number of function evaluations from a suitable range. In the talk, we will look at the existence result for such a fixed generating vector, a method to construct the vector and finally look at how the algorithm can be generalised to work in the half-period cosine space and with a lower smoothness parameter.
3.29 Randomized approximation of summable sequences: adaptive and non-adaptive
Marcin Wnuk (Universität Osnabrück, DE)
License:
Creative Commons BY 4.0 International license © Marcin Wnuk
Joint work of: Marcin Wnuk, Erich Novak, Robert Kunsch
We are considering approximation of the identical embedding of finite-dimensional spaces by randomized algorithms. We establish a lower bound on the -th minimal error which is, roughly speaking, of the form: there exists an such that if for all
then
Here denotes the -th minimal error which can be obtained when using randomized non-adaptive algorithms. This has at least two interesting consequences:
-
1.
We are able to give an example of a sequence of linear problems for which the -th minimal error for randomized adaptive algorithms decreases much faster than the minimal error for randomized non-adaptive algorithms-a phenomenon which was only recently observed by S.Heinrich in the context of vector-valued mean computation.
-
2.
It enables us to show that the only operators which can be arbitrarily well approximated by randomized non-adaptive algorithms are compact operators- i.e. exactly the same operators which can be arbitrarily well approximated by deterministic algorithms.
4 Open problems
4.1 Low discrepancy sets in 2D
Dmitriy Bilyk (University of Minnesota – Minneapolis, US)
License:
Creative Commons BY 4.0 International license © Dmitriy Bilyk
Problem 1: Permutation sets
This problem is concerned with new constructions of two-dimensional low discrepancy sets. Two standard examples of sets achieving the lowest possible growth of the discrepancy are the van der Corput set and the Fibonacci lattice.
The digit-reversing van der Corput set is a set of points of the following form
where the coordinates are written as binary fractions of length , and the binary digits take values zero or one. Many other variations are known (digit shifts and digit scrambling, cyclic shifts, constructions in bases other than two etc).
The Fibonacci lattice consists of , where is the Fibonacci number, and has the form
This can be viewed as an approximation of the Kronecker lattice, i.e. points , where is the golden ratio. This generalizes to similar constructions using approximants to other badly approximable numbers.
One can easily observe that both of the examples presented above are of the form
where is a permutation of an -elements set . Therefore, natural questions arise:
-
Which other permutations produce low discrepancy sets?
-
Which structural and combinatorial properties of the permutation are responsible for low discrepancy of the arising point sets?
-
Do minimizers of the discrepancy always have a structure of such “permutation sets”? (at least, if restricted to the grid ?)
-
Are (or more generally, discrepancies) minimized by “permutation sets”?
-
Closely related question: For a set of points , where , consider the tensor-product interaction energy of the form
Under which conditions on the function are the minimizers of this energy “permutation sets”? And which permutations produce minimizers? (Through Warnock-type formulas, the periodic discrepancy can be rewritten exactly as such an energy, and other versions of the discrepancies are closely related to similar energies.)
-
Are there generalizations of permutation constructions to higher dimensions?
Problem 2: Projections of the vertices of the unit cube
The aforementioned van der Corput set can be realized as an image of the vertices of the -dimensional unit cube under a linear mapping given by the matrix
-
Can other mappings be constructed which similarly produce low discrepancy sets?
-
Can probabilistic results be proved for the discrepancy of random projections?
-
Can this approach be extended to construct well distributed sets in higher dimensions?
4.2 Standard information versus linear information for -approximation
David Krieg (Johannes Kepler Universität Linz, AT)
License:
Creative Commons BY 4.0 International license © David Krieg
Let be a compact domain of unit volume and let be the space of continous complex-valued functions on . Consider a compact, convex and symmetric subset of . For , we want to compare the rate of convergence of the numbers
and
for . These numbers describe the minimal worst case error for -approximation on if function values, respectively arbitrary continuous linear measurements are allowed. The rate of convergence of a non-negative and non-increasing sequence , denoted by , is the supremum of all such that .
How much can we loose in the rate when restricting from arbitrary linear information to function evaluation? That is, the task is to study the loss function
How does the loss function behave as a function of and ?
References
- [1] J. Creutzig, P. Wojtaszczyk. Linear vs. nonlinear algorithms for linear problems. Journal of Complexity 20 (2004), 807–820.
- [2] M. Dolbeault, D. Krieg, M. Ullrich. A sharp upper bound for sampling numbers in . Applied and Computational Harmonic Analysis 63 (2023), 113–134.
- [3] D. Krieg, K. Pozharska, M. Ullrich, T. Ullrich. Sampling recovery in the uniform and other norms, preprint, https://arxiv.org/abs/2305.07539.
- [4] E. Novak, H. Woźniakowski. Tractability of multivariate problems. Volume III: Standard information for operators. Volume 18 of EMS Tracts in Mathematics. EMS, Zürich, 2012.
4.3 Are linear algorithms non-adaptive?
Mathias Sonnleitner (Universität Passau, DE)
License:
Creative Commons BY 4.0 International license © Mathias Sonnleitner
Let be normed spaces (over ) and be linear and bounded. Suppose that for each , we have pieces of adaptive information
based on linear and continuous functionals on . Let be a linear algorithm with .
Can we find , where are linear and (linear) such that ?
This basic problem was solved completely using elementary means in a discussion with David Krieg, Robert Kunsch and Marcin Wnuk during the seminar.
4.4 4-regular graphs from the van der Corput sequence
Stefan Steinerberger (University of Washington – Seattle, US)
License:
Creative Commons BY 4.0 International license © Stefan Steinerberger
Consider the following test for pseudorandomness: to any distinct, real numbers , we associate a 4-regular graph as follows: using to denote the permutation ordering the elements, , we build a graph on by connecting and (cyclically) and and (cyclically).
One reason why these graphs are interesting is the following result for the second eigenvalue of the adjacency matrix of such a graph.
Theorem 1 (Friedman, Theorem 1.2. in [1]).
If are i.i.d. random variables chosen from an absolutely continuous distribution and is the 4-regular graph constructed from them as above, then for any there exists such that with likelihood at least , we have
As it turns out, this graph construction can indeed be effectively used for understanding the strength of a random number generator: if the numbers are truly random, the arising graphs are random in a “near-”optimal way [3].
One could now wonder what happens to these graphs when the underlying real numbers are far from “random” and two canonical examples are the Kronecker sequence and the van der Corput sequence. Fig. 2 shows the type of graph obtained from the van der Corput sequence in base 2
We observe the emergence of some strange topological features that are currently unexplained.
Problem.
What is this emerging manifold? How does it depend on the base of the van der Corput sequence?
Something similar happens to the Kronecker sequence but there it is perhaps slightly less mysterious (the continued fraction expansion can help to explain it). It has since become clear that this graph structure tends to exhibit all sorts of strange patterns for structured sequences and many examples are given in [2]. Very few of these cases are understood.
References
- [1] J. Friedman, A proof of Alon’s second eigenvalue conjecture and related problems, Mem. Amer. Math. Soc. 195 (2008), no. 910
- [2] D. Korssjoen, B. Li, S. Steinerberger, R. Tripathi, R. Zhang, Finding Structure in Sequences of Real Numbers via Graph Theory: a Problem List, Involve 15, 251-270 (2022).
- [3] S. Steinerberger, Using Expander Graphs to test whether samples are i.i.d, arXiv:2008.01153
4.5 On a problem of classes of optimal information
Mario Ullrich (Johannes Kepler Universität Linz, AT)
License:
Creative Commons BY 4.0 International license © Mario Ullrich
Joint work of: Mario Ullrich, Mathias Sonnleitner
Main reference: Mathias Sonnleitner, Mario Ullrich: “On the power of iid information for linear approximation”, CoRR, Vol. abs/2310.12740, 2023.
New open problem arise after recent results on the power of standard information (function values), and more general classes , indicate that already under mild assumptions on we might hope for optimal approximations, i.e., that the corresponding minimal errors are of the same order as minimal errors based on arbitrary information. Several of related open problems can be found in the survey paper [1], which was nearly finished during the Dagstuhl Seminar 23351. A (simple, and too optimistic) version of an open problem was presented at the seminar, and already disproved in a subsequent discussion with David Krieg and Erich Novak.
References
- [1] Mathias Sonnleitner, Mario Ullrich. On the power of iid information for linear approximation, arXiv:2310.12740, 2023.
5 Participants
-
Dmitriy Bilyk – University of Minnesota – Minneapolis, US
-
François Clément – Sorbonne University – Paris, FR
-
Ronald Cools – KU Leuven, BE
-
Steffen Dereich – Universität Münster, DE
-
Benjamin Doerr – Ecole Polytechnique – Palaiseau, FR
-
Carola Doerr – CNRS, Sorbonne University – Paris, FR
-
Michael Giles – University of Oxford, GB
-
Michael Gnewuch – Universität Osnabrück, DE
-
Kumar Harsha – Universität Osnabrück, DE
-
Stefan Heinrich – RPTU – Kaiserslautern, DE
-
Alexander Keller – NVIDIA, DE
-
David Krieg – Johannes Kepler Universität Linz, AT
-
Peter Kritzer – Österreichische Akademie der Wissenschaften – Linz, AT
-
Thomas Kühn – Universität Leipzig, DE
-
Robert J. Kunsch – RWTH Aachen, DE & TU Chemnitz, DE
-
Frances Y. Kuo – UNSW Sydney, AU
-
Laura Lippert – TU Chemnitz, DE
-
Alexander Litvak – University of Alberta – Edmonton, CA
-
Michelle Mastrianni – University of Minnesota – Minneapolis, US
-
Peter Mathé – Weierstraß Institut – Berlin, DE
-
Weiwen Mo – KU Leuven, BE
-
Thomas Müller-Gronbach – Universität Passau, DE
-
Erich Novak – Friedrich-Schiller-Universität Jena, DE
-
Dirk Nuyens – KU Leuven, BE
-
Zexin Pan – Stanford University, US
-
Kateryna Pozharska – National Academy of Sciences of Ukraine – Kyiv, UA & TU Chemnitz, DE
-
Pawel Przybylowicz – AGH Univ. of Science & Technology-Krakow, PL
-
Klaus Ritter – RPTU – Kaiserslautern, DE
-
Daniel Rudolf – Universität Passau, DE
-
Robin Rüßmann – RPTU – Kaiserslautern, DE
-
Mathias Sonnleitner – Universität Passau, DE
-
Abirami Srikumar – UNSW Sydney, AU
-
Stefan Steinerberger – University of Washington – Seattle, US
-
Olivier Teytaud – Meta AI Research – Tournon-sur-Rhone, FR
-
Mario Ullrich – Johannes Kepler Universität Linz, AT
-
Tino Ullrich – TU Chemnitz, DE
-
Jan Vybíral – Czech Technical University – Prague, CZ
-
Christian Weiß – Hochschule Ruhr-West – Mülheim, DE
-
Laurence Wilkes – KU Leuven, BE
-
Marcin Wnuk – Universität Osnabrück, DE