Optimal Antimatroid Sorting
Abstract
The classical comparison-based sorting problem asks us to find the underlying total ordering of a given set of elements, where we can only access the elements via comparisons. In this paper, we study a restricted version, where, as a hint, a set of possible total orderings is given, usually in some compressed form.
Recently, an algorithm called topological heapsort with optimal running time was found for case where is the set of topological orderings of a given directed acyclic graph, or, equivalently, is the set of linear extensions of a partial ordering [Haeupler et al. 2024]. We show that a simple generalization of topological heapsort is applicable to a much broader class of restricted sorting problems, where corresponds to a given antimatroid.
As a consequence, we obtain optimal algorithms for the following restricted sorting problems, where the allowed total orders are …
-
…restricted by a given set of monotone precedence formulas;
-
…the perfect elimination orders of a given chordal graph; or
-
…the possible vertex search orders of a given connected rooted graph.
Keywords and phrases:
sorting, working-set heap, greedy, antimatroidCopyright and License:
2012 ACM Subject Classification:
Theory of computation Design and analysis of algorithmsEditors:
Anne Benoit, Haim Kaplan, Sebastian Wild, and Grzegorz HermanSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
One of the fundamental problems in theoretical computer science is sorting. It has a wide variety of applications in practice [24] and is frequently used as an introduction to design and analysis of algorithms.
Perhaps the most intensely studied variant is comparison-based sorting, where one can only access the input elements via comparisons (otherwise, we assume the standard word RAM model). This excludes bucket sort, radix sort, and other integer-sorting techniques [10, 16, 15]. The well-known information-theoretic bound (ITB) states that comparison-based sorting requires comparisons in the worst case, and many algorithms asymptotically matching this bound are known.
In this paper, we study a restricted variant of comparison-based sorting. In addition to the elements to be sorted, we are give a set of possible total orders, with the guarantee that the actual order of the input is contained in . Call this the -sorting problem. Note that may be very large compared to , so it makes sense to consider variants of the problem where is given in some compressed form. Regardless of that, the ITB for the -sorting problem is , so perhaps the first question to ask is: Can we solve the -sorting problem with only comparisons?
In turns out that the answer is yes if . Fredman [9] showed how to solve the -sorting problem with comparisons. This algorithm assumes that is given explicitly and is thus entirely impractical. Fredman also showed that there exist sets with where the -sorting problem requires comparisons, which means the ITB is not always tight.
Most later work on this problem has been focused on the special case where is the set of linear extensions of a partial order , given by its Hasse diagram. It is known that comparisons are sufficient, that is, the ITB is tight (even when ). After decades of research [21, 9, 29, 19, 2, 5, 33], eventually, two algorithms were found that are simultaneously comparison-optimal and running-time optimal [14, 32]. Both algorithms actually solve the slightly more general DAG sorting problem: Given is a directed acyclic graph , and is taken as the set of topological orderings of . Note that each DAG defines a partial order (though may contain some “unnecessary” edges), and is the set of linear extensions of .
Most important for our purposes is the topological heapsort algorithm of Haeupler, Hladík, Iacono, Rozhon, Tarjan, and Tětek [14]. This was the first DAG-sorting algorithm with optimal running time , though it makes comparisons, which is non-optimal if is small. Achieving comparison optimality requires further modifications, which we discuss later.
We briefly describe topological heapsort. Let be the given DAG. At the beginning, identify the set of sources (vertices of in-degree 0) of , and insert them into a priority queue . In each following step, we first find and remove the minimum element from . Then, we remove from . This may create several new sources, which we all insert into . Repeat until is empty. It is easy to see that this algorithm is correct if the underlying total order of the vertices is indeed a topological ordering of . Haeupler et al. achieve optimal running time by using a special priority queue with the so-called working-set property.
Let us now try to generalize topological heapsort to solve the general -sorting problem. Suppose that instead of the graph , we are given a black-box data structure that reports all elements that are the minimum of at least one ordering . After “removing” the first elements , let report all elements such that is a prefix of some ordering (the available elements). Again, it is easy to see that topological heapsort with this data structure is correct. But is it optimal?
The main contribution of this paper (Section 3) is to show that topological heapsort has optimal running time if corresponds to a class of structures called antimatroids, which are a generalization of partial orders, when we disregard the running time for the data structure . We additionally show that, for several possible representations of antimatroids, we can implement such that its total running time is linear in the size of the input, so the running time stays optimal (Section 4). Finally, we show how to additionally achieve comparison optimality (Section 5). The full version of the paper contains various omitted proofs, as well as a discussion of several generalizations of antimatroids where topological heapsort is not optimal. In the remainder of this section, we give an informal definition of antimatroids, and discuss some related work.
Antimatroids.
Fix a set of total orderings, and consider the black-box data structure mentioned above. Recall that, for each sequence of elements, tells us the set of available elements such that is a prefix of some . It is clear that any set can be described in this way. Intuitively, is an antimatroid if the following conditions apply. First, once an element becomes available, it stays available until it is chosen (availability is monotone); and second, availability of an element can only depend on the set of elements chosen so far (but not their order).
The first condition can be nicely related to topological heapsort: It means that our priority queue at no point contains unavailable elements. The second condition may seem more surprising, but it turns out to be also necessary for optimality of topological heapsort and tightness of the ITB (for details, see the full version of the paper).
Antimatroids were first studied by Dilworth [7] in lattice theory. Later, they were found to be special cases of so-called greedoids [25, 26], which are structures admitting a certain greedy optimization procedure. A common alternative definition of antimatroids treats them as set systems; this definition can be related to matroids (hence the name), which are also special cases of greedoids. We refer to the surveys of Björner and Ziegler [1] and Korte, Schrader, and Lovász [27] for more.
Related work.
Optimal algorithms are known [17, 4, 29] for merging two sorted lists; this corresponds to a partial order consisting of exactly two disjoint chains. Merging can be generalized to multiway merging, where the input is split into any number of sorted lists; here, the ITB is known to be related to the Shannon entropy of the list lengths. Multiway merging is important for pratical sorting algortihms that exploit runs in a given unsorted sequence (see, e.g., Gelling, Nebel, Smith, and Wild [11]).
A different special case of the -sorting problem concerns the case where the set is the permutations that avoid a certain pattern .111A permutation of a set avoids a permutation of if contains no subsequence that is order-isomorphic to , according to some fixed total order on . This restricted sorting problem is usually formulated in a different, but equivalent way where is given as a sequence that is known to avoid a pattern. It is known that for some constant depending on , so the ITB is linear in this case. Opler [31] gave an optimal linear-time algorithm for the pattern-avoiding sorting problem, which works even when the pattern is unknown. We refer to his paper for more information on that problem.
The -sorting problem is a restricted sorting problem where, interestingly, the ITB is not tight. Here, two sets of size each are given, and we need to sort the set of size . If only comparisons of the form with , are allowed, then this is a restricted sorting problem by our definition. It is known that the number of possible orderings is only [9], which makes the ITB ; however, Fredman [9] showed that comparisons are needed. It is currently unknown if an algorithm with optimal running time exists.
Finally, a question closely related to restricted sorting is finding balanced pairs, which are comparisons that split the set of possible total orders into two parts of approximately equal size. There is a body of work on balanced pairs in partial orders [21, 9, 29, 2, 3], and the concept has been generalized to antimatroids [8].
2 Preliminaries
All the following definitions are taken from Björner and Ziegler [1] with only slight changes.
Words and languages.
Let an alphabet be a set of letters. A word on is a finite sequence of letters. We write words without commas as , where , and we write for the concatenation of two words and . We denote the length of a word by . The support of a word is the set of letters contained in it, and the empty word is denoted by .
A word is simple if each letter occurs at most once in it (i.e., ), and is a permutation of if each letter occurs precisely once in it. The set of all simple words on is denoted by , and a simple language is a subset . We denote the set of permutations contained in a simple language by , and we call full if .
Partial orders.
A partial order on a set is denoted with a capital letter, like , and we write for the actual partial order relation. Total orders are treated as permutations of the underlying set . Whenever necessary, we write for the total order relation corresponding to the permutation , i.e., we have if precedes in . An oracle for a total order is a black-box function that answers queries of the form in constant time.
Antimatroids.
An antimatroid (sometimes called antimatroid language) is a simple language that satisfies the following two axioms:
-
(i)
For all and , if , then . ( is closed under taking prefixes.)
-
(ii)
For each with , there is some such that .
Axiom (i) is called accessibility and implies that every word in can be built by starting with and successively appending letters, without ever leaving . Axiom (ii) ensures the two “availability properties” of antimatroids mentioned in the introduction.
In this paper, we only consider full antimatroids (recall that this means ).222Some authors define antimatroids to be always full [27]. A full antimatroid is completely determined by ; in fact, is exactly the set of prefixes of . However, not every set of permutations can be described by an antimatroid; for example, any antimatroid on that contains and also contains .
Throughout this paper, we use a different formalism, which is closer to the informal definition of antimatroids given in the introduction. For an alphabet , a precedence function for a letter is a function that is monotone, i.e., we have if . A monotone precedence system (MPS) on is a collection of precedence functions. The language of is the set
The full version of the paper contains a proof that monotone precedence systems indeed characterize (full) antimatroids. In the following, we write for convenience.
Example 1.
Let be a partial order on a set . Define with if and only if . It is easy to see that is exactly the set of linear extensions of . Thus, antimatroids generalize partial orders.
Example 2.
Let , and consider the MPS with for all , and iff or . Then . Since contains both and its reverse , but not all possible orderings, it is not the set of linear extensions of any partial order. Thus, antimatroids strictly generalize partial orders.
Priority queues.
The central data structure of topological heapsort is the working-set priority queue. This is a priority queue (like the well-known binary heap [34]) with the so-called working-set property. Informally, this property ensures that extracting an element is fast if that element has been inserted only shortly before. We omit the precise definitions here, since they are not needed for our proofs, and refer to Haeupler et al. [14] for more details.
3 Generalized topological heapsort
Let us now formally state the antimatroid-sorting meta-problem. Given is a set , an antimatroid on , and a comparison oracle for a permutation . The task is to output (or, equivalently, output in sorted order according to the oracle comparisons ). Note the assumption in particular implies that is full.
We call this a meta-problem since the precise representation of is unspecified. We discuss some possible representations in Section 4. For now, we give a meta-algorithm for the problem (Algorithm 1), using the characterization of as a monotone precedence system .
We can already show correctness of our meta-algorithm regardless of implementation details.
Lemma 3.
Topological heapsort is correct for every input .
Proof.
Let and . We need to show that topological heapsort produces .
Since , we have for each . In particular, this means that , so is contained in at the start. Since also for all , the first element added to is indeed . Suppose now that we have at the start of some iteration. Since , we have . Since for all , the algorithm next appends to . By induction, the algorithm indeed returns .
Two parts of the algorithm are left unspecified: How to identify the initial elements in , and the elements inserted in each iteration. We now define an abstract data structure for this task. A candidate data structure for a monotone precedence system maintains a set , initially , and supports the following two operations:
-
: Set and report all with .
-
: Given with , report all with and . Then, add to .
Observe that allows at most calls to step after init, one for each . Further, parameters of step calls form a sequence . The total time for , written , is the overall time for one call to init and a valid sequence of step calls, in the worst case.
Clearly, any correct candidate data structure can be used to finish the implementation of topological heapsort. The actual implementation will heavily depend on the given representation of the antimatroid; see Section 4.
3.1 Running time analysis
Since we want our analysis to be independent of the candidate data structure, we split the work done by topological heapsort into two essential parts.
-
The total time spend by the candidate data structure, as defined above.
-
The time required to initialize (given the initial elements), delete the minimum from , and insert given elements into in each step. Call this the queue time .
Since the loop performs at most iterations, the running time of topological heapsort is .
Note that the queue time does not depend on the representation of at all. In the following, we show that is always , by reducing to the special case where is a partial order, which has been solved by Haeupler et al. [14]. The number of comparisons can be as large as the queue time, which is not optimal if . In Section 5, we improve this bound to the optimal by modifying the algorithm.
Partial orders.
Let be a directed acyclic graph with vertices and edges, and consider the set of topological orderings of . Clearly, is the set of linear extensions of a partial order on . Thus, is an also antimatroid in (see Example 1).
It is easy to implement a candidate data structure for via the standard topological sorting algorithm [18, 23], with total running time . Haeupler et al. [14] showed that the queue time of topological heapsort is optimal for partial orders:
Lemma 4 (Haeupler et al. [14]).
Let be a monotone precedence system corresponding to a partial order on a set , and let be a total order on . Then, for topological heapsort with input , we have .
General antimatroids.
We now generalize Lemma 4 to antimatroids. We need the following definition. The transcript of a run of topological heapsort on an input is the sequence , where is the set of elements initially in the priority queue, and for is the set of elements in the priority queue after the -th step. Observe that, if , then , and for each . See Figure 1 for an example.
A crucial fact that we use in the proof below is that the queue time only depends on the transcript of the run. In particular, two runs on inputs and that have the same transcript also have the same queue time, even if .
Theorem 5.
Let be a monotone precedence system on , and let be a total order on . Then, for topological heapsort with input , we have .
Proof.
Consider a run of topological heapsort on with transcript , and let .
We now define a partial order on . For each , let for each . Informally, we have if is deleted from the priority queue before is inserted. Let be the monotone precedence system corresponding to . Clearly, . We claim the following:
-
(i)
The transcript of running topological heapsort on is again .
-
(ii)
.
We first explain how this implies the theorem. By Lemma 4, running topological heapsort on has queue time . By claim (i), running topological heapsort on has the same queue time (since the transcript is the same), which, by claim (ii), is at most , as required.
We now prove claim (i). Let be the transcript of running topological heapsort on . Each is a minimal element of by definition, so . On the other hand, each satisfies by definition and thus , so .
Now take some and assume that by induction, we have for all . For each , we have , and , so by definition. On the other hand, each satisfies either (a) or (b) . (a) directly implies ( is removed earlier). (b) implies , which again implies .
We finish the proof with claim (ii). Let . We show that , i.e., that for each , we have .
Let , and let be minimal such that . Let . First, by definition of , we have . Second, we have by minimality of . Now suppose , and let . Then , implying . By monotonicity, we have , as desired.
Remark.
The technique of constructing an auxiliary partial order is inspired by the work of Haeupler, Hladík, Rozhoň, Tarjan, and Tětek [12] (we discuss further connections in Section 4.2). In fact, the exact partial order in the proof of Theorem 5 appears in the revised version333An earlier preprint [13] used a different proof technique. of the paper that introduced topological heapsort [14]. They observe that is an interval order, which means is relatively easy to bound and can be related to the working-set bound.
The algorithm of van der Hoog, Rotenberg, and Rutschmann [32] is also analyzed with the help of interval orders, but seemingly cannot be easily generalized to antimatroids.
4 Antimatroid representations
In this section, we consider several representations of antimatroids (some only covering special cases), and discuss how to implement candidate data structures (CDSs) for these representations. If is a CDS for an animatroid with some representation, then we call efficient if , where is the size of the representation (in machine words). Theorem 5 implies that topological heapsort with an efficient CDS has optimal running time.
4.1 Precedence formulas
We first consider an explicit representation of general full antimatroids based on monotone precedence systems. The precedence formula representation of a full antimatroid on is a collection , where is a monotone boolean formula on the variable set . (A monotone boolean formula allows only the operators and and the constants 0 and 1, notably prohibiting the negation .)
Each formula represents a precedence function as follows: For , we let be the value of when setting each to 1, and each to 0. Since any monotone function can be represented as a monotone boolean formula, this representation is suitable for every antimatroid. In the following, we assume each formula is given as a linked list of symbols with each variable, operator, constant, or parenthesis in the formula occupying one machine word. Other representations, like parsing trees and circuits, are also possible.
An CDS for is implemented as follows. We need two preprocessing steps. First, for each formula, we simplify it. We repeatedly replace substrings , , , and so on, in linear overall time. Afterwards, we have for each with .
Second, compute a directed graph , where , and if contains the variable . For each such edge, we store a list of pointers to each occurrence of within . Note that the size of (including pointers) is at most the total size of all formulas.
The initial candidate set reported by is the set of sources in . is implemented as follows: For each out-neighbor of , replace each occurrence of in with 1. Then, simplify as outlined above, and report if after the simplification. Finally, remove from the graph. To see that the data structure is correct, observe that after calling on each in some set , we have for each with .
The total running time is linear, since each edge is visited once, and simplification is linear overall. Thus, the data structure is efficient, and so:
Theorem 6.
Given an antimatroid on in a precedence formula representation of length , we can solve the -sorting problem in optimal time .
There are two representations of antimatroids that are similar to precedence formulas: rooted set collections and alternative precedence structures [27]. These roughly correspond to precedence formulas in conjuctive normal form and disjunctive normal form, respectively. A third related characterization with so-called elementary ranking conditions [30] is used in linguistics (see the full version of the paper for details). In the following, we discuss two more examples that only apply to subclasses of antimatroids.
4.2 Vertex search and distance orderings
Given a connected graph and a designated root , the vertex search antimatroid on , written , contains the empty word as well as each simple word where and each prefix induces a connected subgraph of . The words in model each way of traversing the graph in a manner similar to BFS or DFS. The basic idea is easily extended to directed graphs; here we require that each prefix induces a subgraph of where every vertex is reachable from . (We assume that each vertex is reachable from in .)
It is easy to see that is indeed an antimatroid; for example, as a precedence formula for each vertex , take , where are the neighbors (resp. in-neighbors) of . However, not every antimatroid is a vertex search antimatroid.
Again, the restricted sorting problem for vertex search antimatroids is solved in optimal time with the precedence formula representation given above.
The vertex search antimatroid plays a role in the distance ordering problem, a variant of single-source shortest path (SSSP) problem. Given is an edge-weighted directed graph with a designated root , and the task is to order the vertices by distance from . Recently, Haeupler et al. [12] showed that Dijkstra’s classical SSSP is universally optimal when equipped with a certain working-set heap (which is more powerful than the one presented in Section 2). Universally optimal means, informally, that for every fixed graph, the algorihtm is worst-case optimal w.r.t. the weight function. In fact, they essentially match the information-theoretic lower bound with a running time of , where is the set of possible distance orderings for the input graph with root , with any weight function.
We now demonstrate a strong connection between antimatroid sorting and the distance ordering problem.444Such a connection is not surprising, since the technique used in the analysis of Dijkstra’s algorithm [12] inspired the original analysis of topological heapsort [14] and large parts of this paper. First off, the set of possible distance orderings is precisely the vertex search antimatroid (see the full version of the paper for a proof). Thus, we can run topological heapsort to find a distance ordering with optimal running time – if we are allowed to directly compare two vertices. That means comparing the distances of two vertices to , which we cannot do in the distance ordering problem (without computing the distances first).
Still, it can be seen that the transcript (see the proof of Theorem 5) of a run of Dijsktra’s algorithm on an input is the same as running topological heapsort on the corresponding vertex search antimatroid. This gives an alternative proof of universal optimality; details are found in the full version of the paper. It should be noted that Haupler et al. also give an algorithm that is universally optimal in both time and number of comparisons, which is harder (c.f. Section 5).
4.3 Elimination orderings
An undirected graph is chordal if has no induced cycles of length four or more. It is well-known that chordal graphs can be characterized by the existence of perfect elimination orderings (PEOs). A PEO is obtained by successively removing simplicial vertices, that is, vertices whose neighborhood form a clique.
The set of elimination orderings form an antimatroid, also known as the simplicial vertex pruning [1] or simplicial shelling [27] antimatroid. Given a graph , we can define the simplicial vertex pruning antimatroid via the MPS with
A candidate data structure for is essentially a data structure that maintains the set of simplicial vertices in under simplicial vertex deletion.
There are known static and fully-dynamic algorithms to compute simplicial vertices in arbitrary graphs [22, 28]. Since we only care about chordal graphs and only need vertex deletion, we can use a relatively simple data structure based on clique trees. Details are found in the full version of the paper. The total running time of removing all vertices is . Thus, we can solve the corresponding restricted sorting problem in optimal time.
5 Comparison optimality
In this section, we modify the topological heapsort algorithm to only use an optimal comparisons, instead of , while keeping the optimal running time. The special case of partial orders was again solved by Haeupler et al. [14]. Our algorithm is similar, but some non-trivial adaptation is required.
First of all, note that the additional term is only relevant when . In the partial order/DAG-sorting case, Haeupler et al. [14] showed that this condition implies the existence of a long directed path in the DAG. Observe that such a path is pre-sorted. We prove a corresponding fact for antimatroids (Section 5.1).
Haeupler et al. then proceed roughly as follows. They first compute a set of marked vertices on the pre-sorted path. Intuitively, the marked vertices represent the interaction with the remainder of the graph: The removal of unmarked vertices does not affect “availability” of vertices outside of the path. This means that often, a large set of unmarked vertices can be removed at once, without any additional comparisons.
This strategy cannot be adapted to general antimatroids in a straight-forward way, since the availability constraints are too complex to preprecompute a small set of marked vertices. We instead use the following three-step algorithm (given is an antimatroid on ):
-
1.
Compute a long pre-sorted sequence (Section 5.1).
-
2.
Sort the set , using topological heapsort on the sub-antimatroid of induced by (Section 5.2). This gives us a sorted sequence .
-
3.
Merge the two sequences and , utilizing the antimatroid constraints (Section 5.3).
All three steps can be implemented efficiently using a candidate data structure for . We obtain:
Theorem 7.
Given a set , a candidate data structure for an antimatroid on , and an oracle for a total order , we can sort w.r.t. in time and with comparisons.
This means that each efficient candidate data structure presented in Section 4 yields a comparison-optimal sorting algorithm.
5.1 Bottlenecks in antimatroids
Let be a monotone precedence system on . The layer sequence of is a partition of into nonempty layers inductively defined as follows:
-
.
-
, where , for each .
Observe that a layer sequence exists if and only if , and that a layer sequence is always unique. If a candidate data structure is available, we can compute the layer sequence in time as follows: Call and add the reported elements to . Then, call for each , add the reported elements to , and so on.
Lemma 8.
Let be a monotone precedence system with layers. Then .
Proof.
We essentially follow Haeupler et al. [14]. Let be the layer sequence for . For each , let be an arbitrary permutation of . We claim that the word is contained in . Indeed, if , then all precede in . Since by definition of , we have .
There are ways of constructing . Writing and , we have
We now proceed to show how to find a long sequence such that for all , we have for all ; in other words, the sequence is pre-sorted. In the DAG-sorting case, we can simply take a longest path in the DAG. Here, we need a different approach, which is also related to the technique of Haeupler et al. [14].
Lemma 9.
Let be the layers of a monotone precedence system , and let . Then, for all and , there exists an such that .
Proof.
Suppose the lemma does not hold for some pair , i.e., there exists some such that for all . Let . Since , we have for all , implying that actually . This means that , implying that , a contradiction.
An element is a called a bottleneck if it is the only element in its layer. Let be the number of bottlenecks and let be the number of layers. Since each non-bottleneck layer contains at least two vertices, we have , which implies . By Lemma 8, we have
Corollary 10.
Let be an MPS with bottleneck elements. Then .
The bottleneck sequence of contains all bottlenecks, ordered by the layer they appear in. Lemma 9 implies that, for every permutation , we have . Overall, we obtain
Lemma 11.
Let be an MPS on , and write . Then there exists a sequence that appears as a subsequence in every , and . Moreover, given a candidate data structure for , we can compute in time .
5.2 Sorting subsets
In this section, we show how to sort the non-bottleneck elements of an antimatroid . Recall that a candidate data structure is given, and we want to sort in overall time , with comparisons (see Theorem 7). We show how to sort any subset within this budget.
We first need some definitions. Let be an alphabet. Given a word and a subset , the restriction of to , written , is obtained by removing all letters in from . For a simple language , let . It is known that if is an antimatroid, then the restriction is also an antimatroid (called the trace [27, 1]).
Let us now assume we have a candidate data structure for . Then we can use topological heapsort to sort . The running time is , and we need comparisons. First, observe that , since the restriction operation surjectively maps to . Second, recall that if is obtained by removing all bottleneck elements (Corollary 10). If we further assume that , then we are within the budget of Theorem 7.
It remains to build a candidate data structure for with . Suppose is given. Then is implemented as follows. First, call , and store the result in a set . Then, we repeat the following as long as there is an element : Discard , call , and add all reported elements to . At the end, report the final set .
To implement , we similarly first call , store the result in a set , and perform the repeated replacement as above.
The (rather technical) correctness proof for is found in the full version of the paper. The running time is clearly . Thus, we have:
Lemma 12.
Given a set , a subset , a candidate data structure for an antimatroid on , and an oracle for a total order , we can sort w.r.t. in time and with comparisons.
5.3 Merging
We now want to optimally merge under antimatroid constraints:
Theorem 13.
Let be an antimatroid on , let , and let be a partition of . Let and . Let contain each permutation such that and .
Given , , , an oracle for , and a candidate data structure for , we can compute in time and with comparisons.
A proof of Theorem 13 is given in the full version of the paper. The algorithm used can be seen as a simplified version of Haeupler et al.’s heapsort with lookahead algorithm [14]; in fact, if the antimatroid is given as a collection of precedence formulas, it simplifies to a partial order under the assumption that and are sorted. (However, our proof works with general candidate data structures.)
5.4 Putting things together
We now prove Theorem 7. First, we compute the bottleneck sequence with Lemma 11. Then, we use Lemma 12 to sort the subset . In other words, if is the underlying total order, we compute . Finally, we merge and using Theorem 13. The overall running time is , and we use comparisons, where is as defined in Theorem 13. Note that . Moreover, we have by Lemma 11. Thus, we need time and comparisons, as desired.
6 Conclusion
In this paper, we have shown that the basic variant of topological heapsort nicely generalizes to the problem of sorting antimatroids, and its running time stays optimal. We also showed how the algorithm can be modified to be comparison-optimal in the antimatroid case, which in particular implies that the information-theoretic bound is tight for antimatroids. Since topological heapsort is not optimal for further generalizations like greedoids (see the full version of the paper), perhaps antimatroids are the most general structure that can be sorted “greedily”. The question whether the information-theoretic bound for greedoids holds, and whether greedoids can be sorted efficiently in some other way, is left open.
For another interesting open question, let us consider an important class of antimatroids that was omitted from Section 4. Convex shelling antimatroids [6, 27, 1] are obtained by progressively removing points from the convex hull of a given point set. A candidate data structure for such an antimatroid would be a certain kind of decremental convex hull data structure. The total running time of this data structure cannot be linear as in the other examples, since computing the convex hull can take superlinear time [20]. Is there a candidate data structure for the convex shelling antimatroid that is fast enough to sort optimally?
References
- [1] Anders Björner and Günter M. Ziegler. Introduction to Greedoids. In Neil White, editor, Matroid Applications, volume 40 of Encyclopedia of Mathematics and Its Applications. Cambridge University Press, 1992. doi:10.1017/CBO9780511662041.009.
- [2] G. R. Brightwell, S. Felsner, and W. T. Trotter. Balancing pairs and the cross product conjecture. Order, 12(4):327–349, 1995. doi:10.1007/bf01110378.
- [3] Graham R. Brightwell. Balanced pairs in partial orders. Discret. Math., 201(1-3):25–52, 1999. doi:10.1016/S0012-365X(98)00311-2.
- [4] Mark R. Brown and Robert Endre Tarjan. A fast merging algorithm. J. ACM, 26(2):211–226, 1979. doi:10.1145/322123.322127.
- [5] Jean Cardinal, Samuel Fiorini, Gwenaël Joret, Raphaël M. Jungers, and J. Ian Munro. Sorting under partial information (without the ellipsoid algorithm). In Leonard J. Schulman, editor, Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pages 359–368. ACM, 2010. doi:10.1145/1806689.1806740.
- [6] Brenda L. Dietrich. A circuit set characterization of antimatroids. Journal of Combinatorial Theory, Series B, 43(3):314–321, 1987. doi:10.1016/0095-8956(87)90007-4.
- [7] R. P. Dilworth. Lattices with unique irreducible decompositions. Annals of Mathematics, 41(4):771–777, 1940. doi:10.1007/978-1-4899-3558-8_10.
- [8] David Eppstein. Antimatroids and balanced pairs. Order, 31(1):81–99, 2014. doi:10.1007/S11083-013-9289-1.
- [9] Michael L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, 1976. doi:10.1016/0304-3975(76)90078-5.
- [10] Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound with fusion trees. J. Comput. Syst. Sci., 47(3):424–436, December 1993. doi:10.1016/0022-0000(93)90040-4.
- [11] William Cawley Gelling, Markus E. Nebel, Benjamin Smith, and Sebastian Wild. Multiway powersort. In Gonzalo Navarro and Julian Shun, editors, Proceedings of the Symposium on Algorithm Engineering and Experiments, ALENEX 2023, Florence, Italy, January 22-23, 2023, pages 190–200. SIAM, 2023. doi:10.1137/1.9781611977561.CH16.
- [12] Bernhard Haeupler, Richard Hladík, Václav Rozhon, Robert E. Tarjan, and Jakub Tetek. Universal optimality of dijkstra via beyond-worst-case heaps. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 2099–2130. IEEE, 2024. doi:10.1109/FOCS61266.2024.00125.
- [13] Bernhard Haeupler, Richard Hladík, John Iacono, Vaclav Rozhon, Robert Tarjan, and Jakub Tětek. Fast and simple sorting using partial information, 2024. arXiv:2404.04552v1.
- [14] Bernhard Haeupler, Richard Hladík, John Iacono, Václav Rozhoň, Robert E. Tarjan, and Jakub Tětek. Fast and simple sorting using partial information. In Yossi Azar and Debmalya Panigrahi, editors, Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3953–3973. SIAM, 2025. doi:10.1137/1.9781611978322.134.
- [15] Yijie Han. Deterministic sorting in time and linear space. J. Algorithms, 50(1):96–105, January 2004. doi:10.1016/j.jalgor.2003.09.001.
- [16] Yijie Han and Mikkel Thorup. Integer sorting in expected time and linear space. In Proceedings of the 43 rd Annual IEEE Symposium on Foundations of Computer Science (FOCS’02), pages 135–144. IEEE Comput. Soc, 2002. doi:10.1109/SFCS.2002.1181890.
- [17] Frank K. Hwang and Shen Lin. A simple algorithm for merging two disjoint linearly-ordered sets. SIAM J. Comput., 1(1):31–39, 1972. doi:10.1137/0201004.
- [18] Arthur B. Kahn. Topological sorting of large networks. Commun. ACM, 5(11):558–562, 1962. doi:10.1145/368996.369025.
- [19] Jeff Kahn and Jeong Han Kim. Entropy and sorting. In S. Rao Kosaraju, Mike Fellows, Avi Wigderson, and John A. Ellis, editors, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing - STOC ’92, STOC ’92, pages 178–187. ACM, 1992. doi:10.1145/129712.129731.
- [20] David G. Kirkpatrick and Raimund Seidel. The ultimate planar convex hull algorithm? SIAM J. Comput., 15(1):287–299, 1986. doi:10.1137/0215021.
- [21] S. S. Kislitsyn. A finite partially ordered set and its corresponding set of permutations. Mathematical Notes of the Academy of Sciences of the USSR, 4(5):798–801, November 1968. doi:10.1007/bf01111312.
- [22] Ton Kloks, Dieter Kratsch, and Haiko Müller. Finding and counting small induced subgraphs efficiently. Inf. Process. Lett., 74(3-4):115–121, 2000. doi:10.1016/S0020-0190(00)00047-8.
- [23] Donald E. Knuth. The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley, 1968.
- [24] Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 1973.
- [25] B. Korte and L. Lovász. Mathematical structures underlying greedy algorithms. In Ferenc Gécseg, editor, Fundamentals of Computation Theory, volume 117 of Lecture Notes in Computer Science, pages 205–209, Berlin, Heidelberg, 1981. Springer Berlin Heidelberg. doi:10.1007/3-540-10854-8_22.
- [26] Bernhard Korte and László Lovász. Greedoids - a structural framework for the greedy algorithm. In William R. Pulleyblank, editor, Progress in Combinatorial Optimization, pages 221–243. Elsevier, 1984. doi:10.1016/b978-0-12-566780-7.50019-2.
- [27] Bernhard Korte, Rainer Schrader, and László Lovász. Greedoids. Springer Berlin Heidelberg, 1991. doi:10.1007/978-3-642-58191-5.
- [28] Min Chih Lin, Francisco J. Soulignac, and Jayme Luiz Szwarcfiter. Arboricity, h-index, and dynamic algorithms. Theor. Comput. Sci., 426:75–90, 2012. doi:10.1016/J.TCS.2011.12.006.
- [29] Nathan Linial. The information-theoretic bound is good for merging. SIAM J. Comput., 13(4):795–801, 1984. doi:10.1137/0213049.
- [30] Nazarré Merchant and Jason Riggle. OT grammars, beyond partial orders: ERC sets and antimatroids. Natural Language & Linguistic Theory, 34(1):241–269, August 2015. doi:10.1007/s11049-015-9297-5.
- [31] Michal Opler. An optimal algorithm for sorting pattern-avoiding sequences. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 689–699. IEEE, 2024. doi:10.1109/FOCS61266.2024.00049.
- [32] Ivor van der Hoog, Eva Rotenberg, and Daniel Rutschmann. Simpler optimal sorting from a directed acyclic graph. In Ioana Oriana Bercea and Rasmus Pagh, editors, 2025 Symposium on Simplicity in Algorithms (SOSA), pages 350–355. SIAM, January 2025. doi:10.1137/1.9781611978315.26.
- [33] Ivor van der Hoog and Daniel Rutschmann. Tight bounds for sorting under partial information. In 65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, Chicago, IL, USA, October 27-30, 2024, pages 2243–2252. IEEE, 2024. doi:10.1109/FOCS61266.2024.00131.
- [34] J. W. J. Williams. Algorithm 232: Heapsort. Communications of the ACM, 7(6):347–348, 1964.
