On the Instance Optimality of Detecting Collisions and Subgraphs
Abstract
Suppose you are given a function $f : [n] \to [n]$ via (black-box) query access to the function. You are looking to find something local, like a collision (a pair $x \neq y$ s.t. $f(x) = f(y)$). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is $g(n)$-instance optimal if it admits an algorithm $A$ satisfying that for any possible input, the (randomized) query complexity of $A$ is at most $g(n)$ times larger than the query complexity of any algorithm $A'$ which solves the same problem while holding an unlabeled copy of the input (i.e., any $A'$ that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions:
-
A few very simple properties have an $O(1)$-instance optimal algorithm.
-
Most properties of graphs and functions, with examples such as containing a fixed point or a $3$-collision in functions, or a triangle in graphs, are polynomially far from instance optimal: $n^{\varepsilon}$-far for some constant $\varepsilon > 0$.
-
The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are logarithmically far from instance optimal, and conjecture that this bound is tight. We provide evidence towards this conjecture, by proving that finding a claw in a graph is $O(\log n)$-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is at most $n^{\delta}$, for a sufficiently small constant $\delta > 0$.
Keywords and phrases: instance optimality, instance complexity, unlabeled certificate, subgraph detection, collision detection
Category: Track A: Algorithms, Complexity and Games
Funding: Omri Ben-Eliezer: Supported in part by a Taub Family Foundation "Leaders in Science & Technology" fellowship.
2012 ACM Subject Classification: Theory of computation → Streaming, sublinear and near linear time algorithms; Theory of computation → Computational complexity and cryptography; Theory of computation → Randomness, geometry and discrete structures
Acknowledgements: Research partially conducted while Omri Ben-Eliezer was at Weizmann Institute, MIT, and Simons Institute.
Editors: Keren Censor-Hillel, Fabrizio Grandoni, Joël Ouaknine, and Gabriele Puppis

1 Introduction
Efficient detection of small structures in complex data is a fundamental challenge across computer science. In this work, we explore to what extent prior knowledge on the input may help. Consider, for instance, the problem of detecting a collision in an unknown function $f : [n] \to [n]$ given query access to $f$. (Here, a collision in $f$ is a pair of distinct elements $x \neq y$ so that $f(x) = f(y)$.) We ask the following question.
How does an algorithm that knows nothing about $f$ in advance (aside from the domain size $n$) compare to an algorithm that has some prior knowledge on the structure of $f$?
The prior knowledge we consider in this work takes the form of an unlabeled copy of $f$ that the algorithm receives in advance, as in Grossman et al. [8]. That is, the algorithm receives a permutation of $f$ – the composed function $\pi \circ f \circ \pi^{-1}$ for some unknown permutation $\pi$ – as an "untrusted hint". We typically call this permutation of $f$ an unlabeled certificate; we require the algorithm to be correct with good probability regardless of whether the hint is correct (i.e., even if $f$ is not a permutation of the unlabeled certificate). However, the number of queries made by the algorithm is only measured if the true input is indeed a permutation of the unlabeled certificate.
In the worst case, clearly $\Omega(n)$ queries are necessary, whether we know anything about the structure of $f$ or not. But are there beyond-worst-case instances where holding additional structural information on $f$ may accelerate collision detection?
Definition 1 (instance optimality; informal).
A randomized Las Vegas¹ algorithm $A$ deciding if an unknown function $f : [n] \to [n]$ satisfies a property $P$ is instance optimal if there exists an absolute constant $c > 0$ satisfying the following. For every function $f$, and any randomized algorithm $A'$ for the same task, the following holds:
$\mathrm{cost}(A, f) \le c \cdot \max_{\pi \in S_n} \mathrm{cost}(A', \pi \circ f \circ \pi^{-1}),$
where the operator $\mathrm{cost}(\cdot, \cdot)$ refers to the expected number of queries that an algorithm makes on a certain input.
¹ For simplicity we consider in this paper Las Vegas randomized algorithms, but all of the results apply also to Monte Carlo type algorithms (that allow some error in the returned value).
Finally, we say that a property $P$ is instance optimal if there exists an instance optimal algorithm for it.
Note the order of the quantifiers in the definition: for every $f$, the algorithm $A$ has to compete with an algorithm $A'$ that "specializes" to functions of the form $\pi \circ f \circ \pi^{-1}$. In other words, an algorithm is instance optimal if it performs as well as every algorithm $A'$ that knows the structure of $f$, but not the actual labels. Note that the correctness of algorithm $A'$ is unconditional – that is, $A'$ must be correct even if the structure of $f$ doesn't match the certificate $A'$ receives.
The expression on the right-hand side of the definition is a "min max": the minimum over all algorithms $A'$, where the complexity of each such algorithm is measured on the worst-case input that is isomorphic to the certificate.
An algorithm being unlabeled instance optimal means it always performs as well (up to a constant factor) as the algorithm that knows the structure of the input. If there is no instance optimal algorithm, that means there exists some function where knowing the structure of the function is helpful. Thus, instance optimality is a strong requirement: if a property is instance optimal, that means that knowing the structure of the input function never helps. When a property is not instance optimal, it will sometimes be useful to discuss its "distance" from instance optimality.
Definition 2 (distance from instance optimality; informal).
Consider the setting of Definition 1. For a function $g = g(n)$ that grows to infinity as $n \to \infty$, we say that $P$ is $g$-far from instance optimality if for every $n_0$ and algorithm $A$ there exist a function $f : [n] \to [n]$ with $n \ge n_0$ and an algorithm $A'$ satisfying
$\mathrm{cost}(A, f) \ge g(n) \cdot \max_{\pi \in S_n} \mathrm{cost}(A', \pi \circ f \circ \pi^{-1}).$
Conversely, $P$ is $g$-close to instance optimality if there exists an algorithm $A$, such that for every function $f$ and every randomized algorithm $A'$ the following holds:
$\mathrm{cost}(A, f) \le g(n) \cdot \max_{\pi \in S_n} \mathrm{cost}(A', \pi \circ f \circ \pi^{-1}).$
We may now rephrase our initial question about collisions in the language of instance optimality. Is collision detection an instance optimal problem? I.e., is the property of containing a collision instance optimal? Is it far from instance optimality? Suppose that we have query access to a function $f : [n] \to [n]$ and are interested in finding a collision. There are two fundamental types of queries to $f$ that one can make: the first option is to query an element $y$ that we have already seen in the past, by which we mean that we have already queried some element $x$ satisfying $f(x) = y$. This option amounts to extending a "walk" on the (oriented) graph of $f$. The second option is to query a previously unseen element $y$, which amounts to starting a new walk. The question, then, is the following: is there a universal algorithm (which initially knows nothing about $f$) for choosing when to start new walks, and which walks to extend at any given time, that is competitive with algorithms that know the unlabeled structure of $f$?
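To make the two query types concrete, here is a minimal Python sketch (ours, not the paper's): it extends the current walk by querying its endpoint, and starts a new walk whenever the walk runs into already-queried territory. It always finds a collision when one exists, but carries none of the competitiveness guarantees discussed in this paper.

```python
import random

def find_collision(f, n, rng=random.Random(0)):
    """Naive collision search illustrating the two query types: extend
    the current walk by querying its endpoint, or start a new walk at a
    previously unseen element once the walk runs into known territory."""
    pred = {}                  # value y -> some element x with f(x) = y
    queried = set()
    walk = None                # endpoint of the walk being extended
    while len(queried) < n:
        if walk is None or walk in queried:
            # start a new walk at a fresh, previously unseen element
            walk = rng.choice([x for x in range(n) if x not in queried])
        y = f(walk)            # one query to f
        queried.add(walk)
        if y in pred and pred[y] != walk:
            return (pred[y], walk)   # two distinct preimages of y
        pred[y] = walk
        walk = y               # extending the walk means querying y next
    return None                # every element queried: f is collision-free
```

The interesting algorithmic question above is precisely the policy hidden in this sketch: when to abandon a walk and start a fresh one.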
Substructure detection problems
There are many other types of natural problems in computer science that involve small (i.e., constant-sized) substructure detection. A natural generalization of a collision is a $k$-collision (or multi-collision), where we are interested in finding $k$ different elements $x_1, \dots, x_k$ satisfying $f(x_1) = f(x_2) = \dots = f(x_k)$. Fixed points, i.e., values $x$ for which $f(x) = x$, are important in local search and optimization problems, in particular for the study of local maxima or minima in an optimization setting.
Subgraph detection in graphs is also a fundamental problem in the algorithmic literature. Motifs (small subgraphs) in networks play a central role in biology and the social sciences. In particular, detecting and counting motifs efficiently is a fundamental challenge in large complex networks, and a substantial part of the data mining literature is devoted to obtaining efficient algorithms for these tasks. It is thus natural to ask: is it essential to rely on specific properties of these networks in order to achieve efficiency? In other words, are subgraph detection and counting instance optimal problems?
Similarly, the problem of finding collisions is a fundamental one in cryptography. Many cryptographic primitives are built around the assumption that finding a collision for some function $f$ is hard (e.g., efficiently signing large documents, commitments with little communication, and of course distributed ledgers such as blockchains). If one wants to break such a cryptographic system, should one spend resources studying the structure of $f$? If finding collisions is instance optimal, that would mean that any attempt to find collisions by studying the structure of a function is destined to be futile.
In this work we focus on the instance optimality of constant-size substructure detection problems in graphs and functions. Before stating our results, let us briefly discuss these data models.
Models
We consider two different types of data access in our work. The first type is that of functions. In this case the input is some function $f : [n] \to [n]$, and the goal is to determine whether $f$ satisfies a certain property (e.g., whether it contains a collision or a fixed point). In this case the goal of an instance optimal algorithm is to perform as well as an algorithm that receives, as an untrusted hint, the unlabeled structure of the function without the actual assignment of labels. Here the complexity is measured as the number of queries an algorithm makes, where each query takes an input $x \in [n]$ and returns $f(x)$.
The second type of data is that of graphs. Here the goal is to find a constant-sized subgraph. An instance optimal algorithm should perform as well as an algorithm that is given an isomorphic copy of the graph as an "untrusted hint". For simplicity, we focus on the standard adjacency list model (e.g., [7]). Here the algorithm knows the vertex set in advance, and can query the identity of the $i$-th neighbor of a vertex $v$ (for $v$ and $i$ of its choice, and according to some arbitrary ordering of the neighbors), or the degree of $v$. We note that all of the results also hold in other popular graph access models, including the adjacency matrix model and the neighborhood query model.
Interestingly, graphs and functions seem closely related in our context. Specifically, the problem of finding a claw in a graph (a star with three edges) is very similar to that of finding a collision in a function, and the results we obtain for these problems are for the most part analogous.
1.1 Main Results and Discussion
Our main result in this paper characterizes which substructure detection problems in functions and graphs are instance optimal. Let us start with the setting of functions.
A structure is an oriented graph $H$ in which each vertex has outdegree at most one, and we say that $f$ contains $H$ as a substructure if there exist distinct values $(x_v)_{v \in V(H)}$ such that $f(x_u) = x_v$ if and only if the edge $(u, v)$ exists in $H$. (For example, a collision corresponds to the structure with vertices $u_1, u_2, v$ and the two edges $(u_1, v)$ and $(u_2, v)$.) Finally, the property $P_H$ includes all functions containing the structure $H$. Our first theorem constitutes a partial characterization for instance optimality in functions.
Theorem 3 (Instance optimality of substructure detection in functions).
Let $H$ be a connected, constant-sized oriented graph with maximum outdegree one, and consider the function property $P_H$ of containing $H$ as a substructure. Then $P_H$ is:
1. Instance optimal if $H$ is a simple oriented path of length $k$ (for any constant $k$);
2. Polynomially far ($n^{\varepsilon}$-far for some constant $\varepsilon > 0$) from instance optimal for any $H$ that contains a fixed point, two edge-disjoint collisions, or a $3$-collision;
3. Logarithmically far from instance optimal for any $H$ that contains a collision.
Similarly, in graphs we denote by $P_H$ the property of containing $H$ as a (non-induced) subgraph. Our next theorem provides a characterization for the instance optimality of subgraph detection.
Theorem 4 (Instance optimality of subgraph detection in graphs).
Let $H$ be a connected, constant-sized graph with at least one edge. Then $P_H$ is:
1. Instance optimal if $H$ is an edge or a wedge (path of length 2);
2. Polynomially far ($n^{\varepsilon}$-far for some constant $\varepsilon > 0$) from instance optimal if $H$ is any graph other than an edge, a wedge, or a claw (a star with three edges);
3. Logarithmically far from instance optimal when $H$ is a claw.
Almost instance optimality of claws and collisions
While we provide a full characterization of those substructures (or subgraphs) $H$ for which $P_H$ is instance optimal, there remains a notable open problem: is the problem of containing a collision (in functions) or a claw (in graphs) "almost instance optimal", e.g., is it $O(\log n)$-close to instance optimality?
The problems of finding a collision in a function and detecting a claw in a graph are closely related and seem to be similar in nature. We conjecture that both of these problems are close to being instance optimal.
Conjecture 5.
There exists an algorithm for collision detection (in functions $f : [n] \to [n]$) that is $O(\log n)$-close to instance optimality.
Conjecture 6.
Determining if a graph contains a claw is $O(\log n)$-close to instance optimality.
While we are not yet able to prove the conjectures in full generality, we provide initial evidence toward their correctness, at least in the graph case. Specifically, we prove Conjecture 6 for graphs in which claw detection is "easy" with a certificate, that is, can be done in at most $n^{\delta}$ queries for a sufficiently small constant $\delta > 0$.
Theorem 7 (Informal).
The graph property of containing a claw is $O(\log n)$-instance optimal when restricted to inputs that require at most $n^{\delta}$ queries in expectation (for a sufficiently small constant $\delta > 0$) for an algorithm with an unlabeled certificate.
While the result was phrased for undirected graphs, it carries over also to collision detection in functions $f : [n] \to [n]$, in the case where the algorithm is allowed to go both "forward" (i.e., for an element $x$ to retrieve $f(x)$) and "backward" (i.e., for a value $y$ to retrieve elements of the inverse set $f^{-1}(y)$). See the paragraph below on model robustness for further discussion.
We conjecture that the same algorithm $A^*$ we use to show near instance optimality in the regime of Theorem 7 is also near instance optimal in the general regime. The algorithm is roughly defined as follows. $A^*$ maintains $\log n$ parallel "walks" $W_1, \dots, W_{\log n}$ at different "scales", where in each round (consisting of a total of $\log n$ queries) $A^*$ adds one step to each of the walks. We try to extend each $W_j$ until it reaches length $2^j$ or until it has to end (either because of finding a collision/claw or due to reaching the end of a path or closing a cycle). In the case that $W_j$ reaches length $2^j$, we "forget" it and restart $W_j$ at a fresh random starting point.
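As an illustration, here is a hypothetical Python sketch of this multi-scale strategy for functions. The function name, the caller-supplied predicate `found_collision`, the cycle check (only through the walk's own starting point), and the cap on the number of rounds are our choices, not the paper's:

```python
import math
import random

def multi_scale_walks(f, n, found_collision, rng=random.Random(1)):
    """Sketch of the multi-scale strategy: maintain about log2(n)
    walks, where walk j is restarted once it exceeds length 2**j or
    closes a cycle through its own starting point.
    `found_collision(history)` decides whether the queries made so far
    already reveal the sought structure."""
    k = max(1, int(math.log2(n)))
    starts = [rng.randrange(n) for _ in range(k)]
    pos = list(starts)               # current endpoint of each walk
    length = [0] * k
    history = []                     # all (x, f(x)) pairs queried so far
    for _ in range(4 * n):           # cap on the number of rounds
        for j in range(k):           # one round: one step at every scale
            y = f(pos[j])            # a single query to f
            history.append((pos[j], y))
            if found_collision(history):
                return history
            length[j] += 1
            if length[j] >= 2 ** j or y == starts[j]:
                # walk j is too long or closed a cycle: forget and restart
                starts[j] = rng.randrange(n)
                pos[j] = starts[j]
                length[j] = 0
            else:
                pos[j] = y
    return None
```

The key design point is that every round spends exactly one query per scale, so no single scale can consume more than a $1/\log n$ fraction of the budget.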
The challenge of merging walks
The only barrier to proving the above two conjectures in the general case seems to be our current inability to deal with "merging" walks in the algorithm. Any algorithm for collision detection in functions, or claw detection in graphs, can be viewed as maintaining a set of walks. In each step we either choose to start a new walk by querying a previously unseen vertex, or extend an existing walk by querying its endpoint (or one of its two endpoints, in the graph case). The event of merging corresponds to the case that two disjoint walks $W$ and $W'$ meet, resulting in the creation of a single longer walk. Our proof of Theorem 7 shows the instance optimality of claw detection in the regime where merging is unlikely to happen during the algorithm's run.
Model robustness
Throughout the paper we chose to focus on specific models for convenience. However, all our results are model robust and apply in many "natural" models. In particular, in the case of functions we chose to work in the model where an algorithm can only go forward. That is, an algorithm can query $f$ in a black-box manner, and doesn't have the capability to make inverse/backward ($f^{-1}$) queries. Characterization results similar to the graph case also apply if an algorithm can walk backwards; in fact, the model where walking backward is allowed seems to serve as a middle ground between our models for graphs and functions, in the sense that we deal with directed graph properties but are allowed to move in the graph as if it were undirected.
For convenience we wrote all our results for Las Vegas randomized algorithms. All the results in this paper also apply if we require the algorithm to be a Monte Carlo randomized algorithm, i.e., one that is allowed to err with constant probability.
In graphs, we use the popular adjacency list model (which allows sampling random vertices, querying a single neighbor, or querying the degree of a vertex) for data access. The same characterization results also apply under other types of data access, such as the adjacency matrix model or the neighborhood query model (where querying a node retrieves all of its neighbors at once).
1.2 Technical Overview: Collisions and Fixed Points
In this section we give an overview of our main ideas and techniques. Many of the ideas are shared between functions and graphs; we chose to present the main ideas for a few canonical problems, such as fixed point and collision detection in functions, and claw detection in graphs.
Showing the polynomial separation for most graph and function properties amounts, roughly speaking, to providing constructions where a certain substructure is hidden, but where certain hints are planted along the way to help the algorithm holding a certificate to navigate within the graph. Given the constructions, which are themselves interesting and non-trivial, it is not hard to prove the separation. As an example of a polynomial separation construction and result, we discuss the case of a fixed point in functions. For more general statements and proofs regarding these separations, please refer to Sections 4 (for functions) and 5 (for graphs) in the full paper on arxiv: https://doi.org/10.48550/arXiv.2312.10196.
The logarithmic separation for claws and collisions is the most technically involved "lower bound" type contribution of this paper. Unlike the polynomial separation results, where the core idea revolves around describing the "right" way to hide information, here the construction is more complicated (roughly speaking): the trick of planting hints that allow the algorithm to navigate does not work well, and our arguments rely on the observation that it is sometimes essential for an algorithm without a certificate to keep track of multiple different scales in which relevant phenomena may emerge, compared to an algorithm with a certificate that knows in advance which of the scales is relevant. The proof is also more challenging, requiring us to closely track counts of intermediate substructures of interest. For the sake of the current discussion, we focus on collision detection, but the proof (and construction) for claws is very similar.
Before diving into the ideas behind fixed point and collision detection, let us briefly mention the simplest component in the characterization: an instance optimal algorithm for finding a path of length $k$. The algorithm chooses a random value $x$ and evaluates the function $k$ times on the successive values $x, f(x), f(f(x)), \dots$ to see if a path of length $k$ emerges (and not a shorter cycle, or a shorter path ending in a fixed point). This is repeated until a path is found or all values have been exhausted. It is instance optimal, since knowing the structure of the function does not help; stopping after fewer than $k$ steps is meaningless, since it only saves us a constant fraction of the queries.
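A deterministic-order Python rendering of this simple procedure (illustrative; the paper's version samples starting values at random):

```python
def contains_path_of_length(f, n, k):
    """Search for a simple oriented path of length k (k edges on k+1
    distinct values) in the functional graph of f, by following each
    starting value for up to k steps."""
    for x in range(n):
        seen = [x]
        for _ in range(k):
            y = f(seen[-1])          # one query
            if y in seen:            # hit a fixed point or closed a cycle
                break
            seen.append(y)
        if len(seen) == k + 1:       # k edges traversed, all values distinct
            return seen
    return None
```

For instance, on $f(x) = x + 1 \bmod 10$ with $k = 3$, the very first starting value already yields the path $0, 1, 2, 3$.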
1.2.1 Fixed point detection: Polynomially far from instance optimality
We give an overview of the proof that finding a fixed point is polynomially far from instance optimality. Small variations of the construction can be used to show that the same is true for any structure containing a fixed point, a $3$-collision, or two edge-disjoint collisions.
In order to obtain such a result we provide a distribution of functions with a hidden fixed point and a secret parameter, so that an algorithm with a certificate (knowing the parameter in this case) can find the fixed point in $n^{1-\varepsilon}$ queries for some constant $\varepsilon > 0$, while any algorithm that does not know the secret parameter (i.e., without a certificate) requires $\Omega(n)$ queries to find a fixed point.
The idea is to construct a function consisting of many cycles of roughly equal size, where one random value in one of the cycles is turned into a fixed point (which effectively turns the said cycle into a path ending at the fixed point). It is quite clear that for such a distribution, finding the fixed point takes $\Omega(n)$ time. But we want to add some information or hint that will allow a certificate holder to find out which is the "correct" cycle.
To give such a hint we add to each cycle many short paths entering it. The distance (along the cycle) between two consecutive paths entering the $i$-th cycle is some prime $p_i$, unique to that cycle, so that the spacing of the entering paths encodes the cycle's identity. See Figure 1 for a drawing of this construction.
The hint is the prime $p_{i^*}$ associated with the unique cycle that ends with a fixed point. The algorithm (with the hint) we propose checks many "short" paths and sees where they collide with another set of "long" paths that are supposed to lie on the cycles. Once our algorithm finds three paths entering the same cycle at pairwise distances that are all multiples of $p_{i^*}$, the algorithm concludes that this is the unique path at whose end the fixed point resides, and continues along it. On the other hand, for any algorithm that does not know which of the $p_i$'s is the chosen one, and hence which cycle ends in a fixed point, every cycle is equally likely to contain the fixed point, and thus the algorithm requires $\Omega(n)$ queries in expectation.
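The following toy Python constructor (ours; the parameters, entry-path lengths, and hardcoded primes are illustrative and much smaller than the paper's) builds a function of this flavor: prime-spaced entry paths tag each cycle, and one secret cycle is cut open so that it ends in a fixed point.

```python
import random

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def build_fixed_point_instance(num_cycles, cycle_len, rng=random.Random(2)):
    """Toy version of the hard distribution for fixed-point detection:
    several cycles, the i-th tagged by the distinct prime PRIMES[i] via
    the spacing of its entry paths, and one secret cycle cut open so it
    ends in a fixed point."""
    f = {}
    fresh = [0]
    def new_node():
        fresh[0] += 1
        return fresh[0] - 1
    secret = rng.randrange(num_cycles)
    hint = None
    for i in range(num_cycles):
        p = PRIMES[i]
        cyc = [new_node() for _ in range(cycle_len)]
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            f[a] = b                       # the cycle itself
        for offset in range(0, cycle_len, p):
            u = new_node()                 # an entry path of length 1;
            f[u] = cyc[offset]             # the spacing p encodes the tag
        if i == secret:
            f[cyc[-1]] = cyc[-1]           # cut the cycle: a fixed point
            hint = p
    return f, hint

def find_fixed_point(f):
    """Brute force: the no-certificate baseline needs ~|domain| queries."""
    return next(x for x in f if f[x] == x)
```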
1.2.2 Finding Collisions: Logarithmically far from instance optimality
The distribution constructed above will not work for collision detection, since functions generated according to this distribution inherently have many collisions. Below we describe a substantially different construction demonstrating that collision detection is (at least) logarithmically far from instance optimality. We note that the same proof outline and construction idea can also be used to show that finding a claw in a graph is not instance optimal.
In order to obtain such a result we provide a distribution of functions that have several collisions, again with a secret parameter, so that an algorithm with a certificate (knowing the parameter) can find a collision with logarithmically fewer queries than any algorithm that does not know the secret parameter (i.e., without a certificate).
The hard distribution is as follows: there are $\log n$ length scales. For scale $i \in \{1, \dots, \log n\}$ we have a collection of cycles, each of length $2^i$ (note that the total number of nodes in all cycles, over all scales, is $n$). For a uniformly randomly chosen scale $i^*$ we turn some of the cycles of length $2^{i^*}$ into paths, each ending in a small loop (the entry point of the loop is a collision).
The secret parameter is the value of $i^*$. The algorithm with a certificate simply picks a value at random and follows the path it defines for $2 \cdot 2^{i^*}$ steps.² The algorithm stops if (i) a collision is discovered, or (ii) the path has reached length $2 \cdot 2^{i^*}$ without a collision, or (iii) the path has closed a cycle. The probability of picking a node on a good path (one ending in a collision) is proportional to the fraction of nodes residing on such paths, each of which has length roughly $2^{i^*}$. The cost (in terms of queries) of picking a value on a cycle of the wrong size, say $2^j$, is at most $\min(2^j, 2 \cdot 2^{i^*})$. It is possible to show that the total expected work spent on wrong values is comparable to the work spent on good paths. Therefore the expected number of queries until a good path is found is dominated by the cost of the good attempts, and it beats, by a logarithmic factor, what any algorithm without the certificate can achieve, as we show next.
² The constant $2$ is a bit arbitrary, and any other constant larger than $1$ will also work.
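A hypothetical Python sketch of the certificate algorithm's inner loop (walk from a random start, abandon the walk after twice the known scale or as soon as it closes a cycle); the function name and constants are our choices:

```python
import random

def find_collision_with_hint(f, domain, scale, rng=random.Random(3)):
    """Sketch of the certificate algorithm on the scaled hard
    distribution: knowing the secret scale (here `scale` plays the role
    of 2**i*), repeatedly start a walk at a random point and abandon it
    after 2 * scale steps, or as soon as it closes a cycle."""
    while True:
        start = rng.choice(domain)
        visited = {start}
        pred = {}                        # value -> element that mapped to it
        x = start
        for _ in range(2 * scale):
            y = f(x)                     # one query
            if y in pred and pred[y] != x:
                return (pred[y], x)      # two preimages of y: a collision
            pred[y] = x
            if y in visited:             # closed a cycle: abandon this walk
                break
            visited.add(y)
            x = y
```

In line with the Las Vegas convention used throughout the paper, the sketch only terminates when a collision actually exists in the input.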
We next show that any algorithm that does not know $i^*$ requires a factor $\Omega(\log n)$ more queries, which results in a logarithmic separation from the algorithm with a certificate. In essence, this means that such an algorithm needs to spend a substantial effort at all $\log n$ possible scales (instead of just the one scale $i^*$) in order to find the collision.
Consider an algorithm without a certificate, and suppose that we choose the secret parameter $i^*$ in an online manner as follows. Our initial construction starts with cycles of every length $2^i$, $i \in \{1, \dots, \log n\}$. For each such $i$, we pick some of the cycles of length $2^i$, and color one of the nodes in each such cycle red (call these points the "$i$-red points"). Note that at this time we have no information whatsoever on $i^*$. Now, each time that a red point on some cycle of length $2^i$ is encountered, we flip a coin with an appropriate probability (which initially is of order $1/\log n$) to decide whether the current value of $i$ is the secret parameter or not. If it is, then we turn all $i$-red points (for this specific value of $i$) into collisions as described above, and remove the color from all other red points (at all possible scales). Otherwise, we remove the color from all $i$-red points (for this specific $i$) and continue.
It turns out that this construction produces the same distribution as we described before (where $i^*$ was chosen in advance). However, it can also be shown that to find a collision with constant probability, $\Omega(\log n)$ red points need to be encountered along the way. The rest of the analysis provides an amortized argument lower bounding the expected number of queries any algorithm spends per red vertex it finds. The main idea of the amortized analysis (which we will not describe in depth here) is to treat cycles in which we made many queries – at least a constant fraction of the cycle – differently from cycles where we made few queries. For cycles of the first type, of length $\ell$ say, the probability to find a red point (if one exists) is of order $1$, but the number of queries that we need to spend is proportional to $\ell$. For cycles of the second type, each additional query only has probability about $1/\ell$ to succeed in finding a red point, but the query cost is only $1$. In both cases, the ratio between the success probability and the query cost is of order $1/\ell$.
1.2.3 Finding claws: $O(\log n)$-close to instance optimality in the merging-free regime
The proof that collision detection is logarithmically far from instance optimality extends very similarly to claw detection in graphs. We next show that this bound is tight in the "low query" regime, where the input graph $G$ admits an algorithm for claw detection (with a certificate) using at most $n^{\delta}$ queries, for a small constant $\delta > 0$.
Every claw-free graph is a disjoint union of cycles and paths. Thus, every algorithm for finding a claw can be viewed as maintaining a set of walks of several types: some of the walks may have closed a cycle; others may have reached one or two ends of the path that the walk resides in. All other walks simply consist of consecutive vertices in the interior of some path in the graph.
Clearly, walks of the first type – those that have closed a simple cycle – cannot be extended to find a claw. The first part of our proof is a reduction eliminating the need to consider walks of the second type (i.e., ones that have reached at least one endpoint). Specifically, we show that for every graph $G$ on $n$ vertices there is a graph $G'$ on $n + 2$ vertices satisfying several properties:
-
In all simple paths in $G'$, either both endpoints have degree $1$, or both have degree $3$ or more (i.e., are claw centers).
-
The query complexity of detecting a claw in $G'$ is equal, up to a multiplicative constant, to that of $G$. This is true both with and without an unlabeled certificate.
-
The construction of $G'$ from $G$ is deterministic. Thus, an unlabeled certificate for $G'$ can be constructed if one has an unlabeled certificate for $G$.
The construction is very simple: we add two new vertices and connect them to all degree-$1$ vertices (and to each other, if needed).
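In code, the reduction might look as follows (a hypothetical rendering over adjacency sets; the vertex names `aux0`/`aux1` are ours):

```python
def eliminate_degree_one(adj):
    """Sketch of the reduction from G to G': add two fresh vertices
    adjacent to every degree-1 vertex of G and to each other, so that
    no walk can get stuck at a degree-1 path endpoint."""
    new = {v: set(nbrs) for v, nbrs in adj.items()}
    a, b = "aux0", "aux1"
    new[a], new[b] = {b}, {a}            # connect the two fresh vertices
    for v, nbrs in adj.items():
        if len(nbrs) == 1:               # a path endpoint in G
            new[v] |= {a, b}
            new[a].add(v)
            new[b].add(v)
    return new
```

On the path 1-2-3, for example, both former endpoints end up with degree 3, and no degree-1 vertex remains.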
Merging without claws requires $n^{\Omega(1)}$ queries
The second part of our argument shows that one cannot make two walks merge in the first $n^{c}$ queries (with constant probability, for some small constant $c > 0$) without finding a claw beforehand. The proof relies on an advanced birthday-paradox-type analysis that is suitable for adaptive algorithms. For the purpose of this part, one may consider the graph as a union of paths and cycles (without any claws). We bucket these paths and cycles into groups, where in each group all paths and cycles are of roughly the same length – within a multiplicative factor of $2$ from each other. Suppose that after each "new" (previously unseen) vertex is queried, the algorithm immediately learns to which bucket this vertex (and the corresponding walk emanating from it) belongs. We show that even in this case the lower bound holds.
Focusing on a specific bucket, let $W$ be the set of all walks in this bucket at some point in the algorithm's run. An assignment of walks to "locations" within the bucket is considered valid if no two walks intersect. We argue that the set of locations of the walks is uniformly random among all valid configurations. Similarly to our analysis of the logarithmic separation (see the previous subsection), the next step in this part deals separately with "short walks" and "long walks". Very roughly speaking, our proof shows that walks have a sufficient "degree of freedom" so that their probability to merge is very small, even conditioned on lying in the same bucket. We omit the precise details of the analysis in this overview. The proofs can be found in the full paper on arXiv: https://doi.org/10.48550/arXiv.2312.10196.
Asymptotic stochastic dominance of $A^*$
The third and last part of the argument shows that the algorithm $A^*$ stochastically dominates any other algorithm (asymptotically) in the following sense. Conditioned on an algorithm $A$ not encountering any merging walks during its operation, the algorithm $A^*$ is at least as likely to find a claw, while using a slightly larger (roughly, a logarithmic factor larger) number of queries.
Recall first how $A^*$ is defined. For $j \in \{1, \dots, \log n\}$ let $A_j$ be the algorithm which repeatedly does the following: pick a random vertex in the graph; make a bidirectional walk from it for $2^j$ steps, or until a claw is found (leading to a "win") or an endpoint is found (leading to an early termination). $A^*$ maintains one copy of each $A_j$ (for a total of $\log n$ copies), and alternates between them: each round of $A^*$ makes $\log n$ queries, one for each $A_j$.
Now let $A$ be any algorithm operating on graphs on $n$ vertices. Consider the event $E_j$ that $A$ finds a claw (for the first time) through the following process: $A$ queries a "new" vertex $v$ at distance between $2^{j-1}$ and $2^j$ from a claw center $u$, and finds the claw by completing a walk from $v$ to $u$. We claim that the event that $A$ finds a claw is equal to the union of the events $E_j$ (over $j \in \{1, \dots, \log n\}$).
The proof goes through a careful coupling argument between $A^*$ and any fixed algorithm $A$ (separately for each $A$). Through the coupling, we may assume that $A$ and $A^*$ have access to the same source of randomness generating "new" vertex queries; that is, the $t$-th new vertex starting a walk in $A$ is also the $t$-th new vertex in $A^*$. We may further assume, by symmetry considerations, that $A$ respects an "older first" principle: if two walks $W$ and $W'$ have exactly the same "shape" (within the graph) at some point, then $A$ will prefer to extend the older of the two. Now, suppose that $A$ finds the claw in a walk started at a new vertex $v$, at distance between $2^{j-1}$ and $2^j$ from the claw center. By the "older first" principle, this implies that for all new vertices that started earlier and are at least $2^j$ away from an endpoint (call these starting vertices good), $A$ must have walked at least $2^{j-1}$ steps from each of them so far. For all other starting vertices (call these bad), $A$ walked a number of steps that is at least their distance from an endpoint. In contrast, $A^*$ makes at most $O(2^j)$ queries around each good vertex, and at most twice as many queries as $A$ around each bad one.
1.3 Related Work
The term instance optimality was coined by Fagin, Lotem and Naor [6]. If an algorithm always performs as well as every other algorithm (up to a constant factor), in particular an algorithm that is aware of the input distribution, it is defined as being instance optimal. This definition is very strict, and thus there are numerous situations in which it cannot be meaningfully attained. As a result, several works (including ours) address this by requiring that an instance optimal algorithm be competitive against a certain class of algorithms (for example, those that know a permutation of the input, rather than the input itself). This notion of instance optimality is sometimes referred to as "unlabeled instance optimality".
Unlabeled instance optimality
Afshani, Barbay, and Chan [1] presented unlabeled instance-optimal algorithms for computing the convex hull and the maxima of a set of points. Valiant and Valiant [21] developed an unlabeled instance optimal algorithm for approximating a distribution from independent samples (with the cost function being the number of samples). Later [22], they provided an algorithm for the identity testing problem: given an explicit description of a distribution, determine whether a collection of samples was drawn from that distribution or from one promised to be far from it. More recent works on instance optimality in distribution testing include, for example, the work of Hao et al. [14, 13].
Grossman, Komargodski, and Naor [8] examined unlabeled instance optimality in the query model. Their work relaxes the definition of instance optimality by no longer requiring an optimal algorithm to compete against an algorithm that knows the entire input, but rather against an algorithm that knows something about the input. Arnon and Grossman [2] define the notion of min-entropic optimality, where instead of relaxing the “for-all” quantifier over algorithms, they relax the “for-all” quantifier over inputs. That is, for an algorithm to be optimal it is still required to perform as well as any other algorithm; however, it is no longer required to be optimal on every input distribution, but only on a certain class of inputs.
Instance optimality and universal optimality in graphs
Subgraph detection and counting has not been thoroughly investigated from the perspective of instance optimality (but see the discussion below on shortest paths); establishing a unified theory of instance optimality in this context remains an intriguing open problem. However, instance optimality has been investigated for other graph problems of interest. For example, Haeupler, Wajc and Zuzic [12] investigate instance optimality and a related notion called universal optimality in a family of classical and more “global” distributed graph problems such as computing minimum spanning trees and approximate shortest paths. Roughly speaking, universal optimality is a notion of instance optimality suitable for weighted graph problems, where the structure of the graph is known and the weights are unknown. An algorithm is universally optimal if it is competitive with any algorithm that knows the weights of the edges.
A recent exciting breakthrough of Haeupler, Hladík, Rozhoň, Tarjan, and Tětek [9] shows that Dijkstra’s algorithm – with an appropriate choice of heap enjoying beyond-worst-case guarantees – is, in fact, universally optimal. Another very recent and independent work of the same set of authors [10] shows that (bidirectional) Dijkstra is also instance optimal for the problem of bidirectional search in weighted multigraphs (and instance optimal up to a tight term in the unweighted case). While this is perhaps the closest work to ours, in the sense that it deals with a variant of a subgraph detection problem (in this case, a shortest path between two given vertices), neither the results nor the techniques translate between that work and ours (or vice versa).
Strong instance optimality
The original, strict definition of instance optimality calls for an algorithm to be superior to every other algorithm. Fagin, Lotem and Naor [6] provided an instance-optimal algorithm for retrieving the top aggregate scores in a database in which each column is sorted. Demaine, López-Ortiz, and Munro [5] provided instance-optimal algorithms for computing intersections, unions, and differences of a collection of sorted sets. Baran and Demaine [3] showed an instance optimal algorithm for finding the nearest and farthest points on a black-box curve. Grossman, Komargodski and Naor [8] and Liu and Mande [17] studied instance optimality in the decision tree model.
Cryptography and complexity
The problem of finding a constant sized structure in , where is a total function guaranteed to contain the structure at hand, has been studied extensively; it is fundamental in computational complexity, and there are complexity classes in TFNP defined around it [18, 20]. We note that the functions in our paper can be slightly modified to make these total problems, and all our proofs still hold.
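One standard way to make collision finding total, in the spirit of the pigeonhole-based classes in TFNP [18, 20], is to shrink the range: any function from a domain of size n+1 into a range of size n must contain a collision, so the search problem always has a solution. A minimal sketch (our own illustration, not a construction from the paper):

```python
# By pigeonhole, any f: {0,...,n} -> {0,...,n-1} has a collision, so the
# search problem "find x != y with f(x) == f(y)" is total: a solution
# always exists, and a linear scan of the domain finds one.

def find_collision(f, n):
    """Return (x, y) with x != y and f(x) == f(y)."""
    seen = {}  # value -> first preimage observed
    for x in range(n + 1):  # n + 1 domain points, only n possible values
        v = f(x)
        if v in seen:
            return seen[v], x
        seen[v] = x
    raise AssertionError("unreachable: pigeonhole guarantees a collision")
```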
As mentioned above, the problem of finding collisions is a fundamental one in cryptography. The standard definition is that of a collision resistant hash (CRH), for which finding a collision is a computationally hard problem. Such functions are central to obtaining efficient signature schemes and to committing to a large piece of data using little communication. Other related structures are also considered in the literature: for instance, functions in which it is hard to find multiple collisions [15].
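For intuition on the benchmark that collision resistance must beat: a generic birthday search finds a collision in any function with an n-bit output after roughly 2^{n/2} evaluations, so a CRH must make even that many evaluations infeasible. A sketch of the generic attack; the 24-bit truncated hash `tiny_hash` is our own deliberately insecure toy example:

```python
import hashlib

def birthday_attack(h, max_tries=1 << 20):
    """Generic collision search: hash distinct inputs until two collide."""
    seen = {}  # digest -> first message that produced it
    for i in range(max_tries):
        m = i.to_bytes(8, "big")  # distinct inputs by construction
        d = h(m)
        if d in seen:
            return seen[d], m
        seen[d] = m
    return None

def tiny_hash(m):
    # Truncating SHA-256 to 3 bytes (24 bits) makes collisions easy:
    # about 2**12 = 4096 attempts suffice in expectation.
    return hashlib.sha256(m).digest()[:3]
```

Running `birthday_attack(tiny_hash)` returns a pair of distinct 8-byte messages with equal 24-bit digests; for a full 256-bit output the same attack would need on the order of 2^{128} evaluations.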
Other works
1.4 Full Paper
Due to space constraints, this (conference) version of the paper is an extended abstract including only the introduction (which in itself presents only some of the main proof ideas and concepts). For full proofs, as well as formal definitions of the model, please see the full paper on arXiv: https://arxiv.org/abs/2312.10196.
References
- [1] Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. Instance-optimal geometric algorithms. J. ACM, 64(1):3:1–3:38, 2017. doi:10.1145/3046673.
- [2] Gal Arnon and Tomer Grossman. Min-entropic optimality. Electron. Colloquium Comput. Complex., TR21-152, 2021. URL: https://eccc.weizmann.ac.il/report/2021/152.
- [3] Ilya Baran and Erik D. Demaine. Optimal adaptive algorithms for finding the nearest and farthest point on a parametric black-box curve. In Proceedings of the 20th ACM Symposium on Computational Geometry (SOCG), pages 220–229, 2004. doi:10.1145/997817.997852.
- [4] Lijie Chen, Jian Li, and Mingda Qiao. Towards instance optimal bounds for best arm identification. In Proceedings of the 2017 Conference on Learning Theory (COLT), volume 65 of Proceedings of Machine Learning Research, pages 535–592, 2017. URL: http://proceedings.mlr.press/v65/chen17b.html.
- [5] Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. Adaptive set intersections, unions, and differences. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 743–752, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338634.
- [6] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci., 66(4):614–656, 2003. doi:10.1016/S0022-0000(03)00026-6.
- [7] Mira Gonen, Dana Ron, and Yuval Shavitt. Counting stars and other small subgraphs in sublinear-time. SIAM Journal on Discrete Mathematics, 25(3):1365–1411, 2011. doi:10.1137/100783066.
- [8] Tomer Grossman, Ilan Komargodski, and Moni Naor. Instance complexity and unlabeled certificates in the decision tree model. In 11th Innovations in Theoretical Computer Science Conference (ITCS), pages 56:1–56:38, 2020. doi:10.4230/LIPIcs.ITCS.2020.56.
- [9] Bernhard Haeupler, Richard Hladík, Václav Rozhoň, Robert E. Tarjan, and Jakub Tětek. Universal optimality of Dijkstra via beyond-worst-case heaps. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 2099–2130. IEEE, 2024. doi:10.1109/FOCS61266.2024.00125.
- [10] Bernhard Haeupler, Richard Hladík, Václav Rozhoň, Robert E. Tarjan, and Jakub Tětek. Bidirectional Dijkstra’s algorithm is instance-optimal. In Ioana Oriana Bercea and Rasmus Pagh, editors, 2025 Symposium on Simplicity in Algorithms, SOSA 2025, New Orleans, LA, USA, January 13-15, 2025, pages 202–215. SIAM, 2025. doi:10.1137/1.9781611978315.16.
- [11] Bernhard Haeupler, Richard Hladík, John Iacono, Václav Rozhoň, Robert E. Tarjan, and Jakub Tětek. Fast and simple sorting using partial information. In Proceedings of the Annual 2025 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3953–3973, 2025. doi:10.1137/1.9781611978322.134.
- [12] Bernhard Haeupler, David Wajc, and Goran Zuzic. Universally-optimal distributed algorithms for known topologies. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC ’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21-25, 2021, pages 1166–1179. ACM, 2021. doi:10.1145/3406325.3451081.
- [13] Yi Hao and Alon Orlitsky. Data amplification: Instance-optimal property estimation. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 4049–4059. PMLR, 2020. URL: http://proceedings.mlr.press/v119/hao20a.html.
- [14] Yi Hao, Alon Orlitsky, Ananda Theertha Suresh, and Yihong Wu. Data amplification: A unified and competitive approach to property estimation. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 8848–8857, 2018. URL: https://proceedings.neurips.cc/paper/2018/hash/a753a43564c29148df3150afb4475440-Abstract.html.
- [15] Ilan Komargodski, Moni Naor, and Eylon Yogev. Collision resistant hashing for paranoids: Dealing with multiple collisions. In EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 162–194, 2018. doi:10.1007/978-3-319-78375-8_6.
- [16] Zhaoqi Li, Lillian J. Ratliff, Houssam Nassif, Kevin G. Jamieson, and Lalit Jain. Instance-optimal PAC algorithms for contextual bandits. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/f4821075019a058700f6e6738eea1365-Abstract-Conference.html.
- [17] Alison Hsiang-Hsuan Liu and Nikhil S. Mande. Instance complexity of boolean functions. CoRR, abs/2309.15026, 2023. doi:10.48550/arXiv.2309.15026.
- [18] Nimrod Megiddo and Christos H. Papadimitriou. On total functions, existence theorems and computational complexity. Theor. Comput. Sci., 81(2):317–324, 1991. doi:10.1016/0304-3975(91)90200-L.
- [19] Shyam Narayanan, Václav Rozhoň, Jakub Tětek, and Mikkel Thorup. Instance-optimality in I/O-efficient sampling and sequential estimation. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 658–688. IEEE, 2024. doi:10.1109/FOCS61266.2024.00048.
- [20] Christos H. Papadimitriou. On the complexity of the parity argument and other inefficient proofs of existence. J. Comput. Syst. Sci., 48(3):498–532, 1994. doi:10.1016/S0022-0000(05)80063-7.
- [21] Gregory Valiant and Paul Valiant. Instance optimal learning of discrete distributions. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 142–155, 2016. doi:10.1145/2897518.2897641.
- [22] Gregory Valiant and Paul Valiant. An automatic inequality prover and instance optimal identity testing. SIAM J. Comput., 46(1):429–455, 2017. doi:10.1137/151002526.
- [23] Ivor van der Hoog, Eva Rotenberg, and Daniel Rutschmann. Simpler optimal sorting from a directed acyclic graph. In 2025 Symposium on Simplicity in Algorithms (SOSA), pages 350–355, 2025. doi:10.1137/1.9781611978315.26.