Abstract 1 Introduction 2 Model 3 Characterizing On-target Polymers 4 Why Balancing is Nontrivial 5 Main Result: Concentration of Off-target Polymers 6 Framework for Upper-Bounding Off-target Polymers 7 Concentrations of Polymers in the TBN Model 8 Example Applications 9 Discussion References Appendix A Hilbert Basis Implementation Appendix B Detailed Balance and Equilibrium

Computing and Bounding Equilibrium Concentrations in Athermic Chemical Systems

Hamidreza Akef ORCID The University of Texas at Austin, TX, USA Minki Hhan ORCID The University of Texas at Austin, TX, USA David Soloveichik ORCID The University of Texas at Austin, TX, USA
Abstract

Computing equilibrium concentrations of molecular complexes is generally analytically intractable and requires numerical approaches. In this work we focus on the polymer-monomer level, where indivisible molecules (monomers) combine to form complexes (polymers). Rather than employing free-energy parameters for each polymer, we focus on the athermic setting where all interactions preserve enthalpy. This setting aligns with the strongly bonded (domain-based) regime in DNA nanotechnology when strands can bind in different ways, but always with maximum overall bonding – and is consistent with the saturated configurations in the Thermodynamic Binding Networks (TBNs) model. Within this context, we develop an iterative algorithm for assigning polymer concentrations to satisfy detailed-balance, where on-target (desired) polymers are in high concentrations and off-target (undesired) polymers are in low. Even if not directly executed, our algorithm provides effective insights into upper bounds on concentration of off-target polymers, connecting combinatorial arguments about discrete configurations such as those in the TBN model to real-valued concentrations. We conclude with an application of our method to decreasing leak in DNA logic and signal propagation. Our results offer a new framework for design and verification of equilibrium concentrations when configurations are distinguished by entropic forces.

Keywords and phrases:
Equilibrium concentrations, Thermodynamic Binding Networks, Monomer-polymer model, Detailed balance
Funding:
Hamidreza Akef: Support was provided by Schmidt Sciences.
Minki Hhan: Support was provided by Schmidt Sciences.
David Soloveichik: Support was provided by Schmidt Sciences, Department of Energy award DE-SC0024467, and National Science Foundation SemiSynBio III: GOALI award.
Copyright and License:
[Uncaptioned image] © Hamidreza Akef, Minki Hhan, and David Soloveichik; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Models of computation
; Theory of computation Design and analysis of algorithms
Acknowledgements:
We thank Joshua Petrack for valuable discussions and insights.
Editors:
Josie Schaeffer and Fei Zhang

1 Introduction

In general, chemical equilibria of complex chemical systems are not analytically solvable and numerical tools are required for analysis. Such tools include NUPACK [17] for thermodynamic analysis of nucleic-acid systems, as well as more abstract platforms that support domain-level abstraction and free energy specification [12] including via rule-based modeling [11], and software for computing steady-state concentrations of chemical reaction networks [7, 10]. However, engineers often want a deeper understanding of the equilibrium than what analytically opaque numerical calculations can provide. Moreover, we often seek to understand infinite classes of designs such as logic circuits constructed from gate modules or parameterized constructions. For example, there is a growing body of work on leak in DNA-based systems, exhibiting a family of schemes parameterized by a “redundancy parameter” meant to decrease leak arbitrarily at the cost of additional system components [13, 5, 15, 1, 16, 14]. It is proven that producing off-target species necessarily decreases the overall number of separate complexes – the change controlled by the redundancy parameter – and thus incurs an entropic penalty. However, due to system complexity, the relationship between this rigorously proven thermodynamic unfavorability and the actual concentrations of off-target species often remains implicit.

While properties of equilibria of abstract coupled chemical reactions have been extensively explored in the chemical reaction network theory literature (e.g., [6], Chapter 14, for detailed-balance equilibria), and full explicit-parameter schemes (e.g., [9]) can characterize the entire space of equilibria, these analytical approaches have severe limitations. For large systems the resulting explicit formulas are unwieldy and offer no effective guidance on how to choose the parameters so that off-target concentrations remain below a desired bound.

In this paper we are interested in the following problem: Our chemical species are complexes (termed polymers) made up of indivisible units called monomers. We assume there are finitely many possible polymers, and among these we are given a set of on-target species that we want to have desired equilibrium concentrations, and all other off-target species to have some sufficiently low equilibrium concentrations. Our task is to determine a consistent (detailed-balance) equilibrium, and therefore the concentrations of the monomers that would lead to this equilibrium. Note that our interest in setting equilibrium concentrations rather than solving for them (based on initial concentrations or total monomer concentrations, for example) is misaligned with most computational approaches. For instance, it can be seen as the reverse problem to NUPACK 3’s concentrations tool which takes the concentrations of the monomers (strands) and returns the corresponding equilibrium concentrations of the polymers (complexes).

While having the flexibility to set monomer concentrations may appear to simplify the problem in the same way that finding some detailed-balance equilibrium is easier than computing the one consistent with input monomer concentrations, the complexity comes from simultaneously ensuring that off-target polymers remain in low concentration. Consider the natural approach of taking logarithms of all concentrations, thereby converting the detailed-balance equations (balancing each reaction) into a linear system amenable to standard linear solvers. When we fix the concentrations of on-target polymers, the remaining (off-target) concentrations are typically under-determined. The linear system then describes an unbounded affine subspace, and there is no obvious way to extract upper bounds on off-target species’ concentrations without leaving the linear framework. Our solution is an iterative algorithm that assigns concentrations to off-target polymers in decreasing order of concentration, ensuring that each off-target species remains below a desired threshold concentration when possible. Importantly, terminating the algorithm at any iteration still provides valid upper bounds for all remaining off-target polymers.

In the most general formulation of the monomer-polymer equilibrium concentration problem, the polymer free energies can be assigned arbitrarily incorporating binding strength, geometric constraints, etc. In this work, we focus on the simpler athermic case rather than tackling the problem in its full generality. Our allowed polymers are such that all possible reactions between them are enthalpy-neutral. This model is consistent with systems of strong fully-complementary DNA domains in which domain-level bonds can only switch binding partners but not de-hybridize. Thermodynamic Binding Network (TBN) [5, 2] saturated configurations capture this condition, but our setting is more general without a built-in notion of domains (binding sites).

The main result of this paper is Algorithm 1 and Theorem 5.4 showing how starting with desired concentrations of on-target polymers (already in detailed balance), we can set the concentrations of off-target polymers to satisfy detailed balance and thus thermodynamic equilibrium. In Section 4 we explain the apparent difficulties in balancing reactions which our approach needed to overcome.

If, rather than computing exact concentrations of off-target polymers, it is sufficient to bound them, then we refer the reader to Section 6. In Section 7 we apply our framework to the analysis of systems in the TBN model specifically, connecting the combinatorial notions of stability and entropy loss in the TBN model to equilibrium concentrations. In Section 8, we show applications of our method to the analysis of a simple TBN AND gate, as well as a parameterized family of signal propagation systems (translator cascades) from prior work. For the translator cascade, we argue that tuning concentrations of on-target polymers according to our framework is essential for leak to decrease exponentially with the redundancy parameter. We conclude with a discussion of future work (Section 9), including a formulation of new combinatorial conditions in the TBN model to make our framework easily applicable.

2 Model

Let denote the set of nonnegative integers. Given a finite set 𝒜, we define 𝒜 as the set of functions f:𝒜.

A multiset over the finite set 𝒜 is described by its counting function f𝒜, where for each element a𝒜, the value f(a) indicates how many times a appears in . We often write 𝒜 to mean that is a multiset over 𝒜, and we denote the count of a𝒜 in by [𝒶]. The notation a means that [𝒶]1. The cardinality of a multiset , denoted ||, is the total number of elements in the multiset ||=𝒶𝒜[𝒶]. For two multisets and over 𝒜, we define their union + as the multiset whose counting function is the pointwise sum (+)(𝒶)=[𝒶]+[𝒶] for all 𝒶𝒜. For example, let ={𝒶,𝒶,𝒷,𝒸}. Then, [𝒶]=2, [𝒷]=1, [𝒸]=1, and [𝒹]=0 for all d{a,b,c}. Also, the cardinality of is ||=2+1+1=4. Finally, note that could also be written as the union {a,b}+{a,c}, for example.

The linear combination of multisets with nonnegative integers is defined analogously: For multisets 1,,𝓃 and a1,,an, i=1nai𝒾 corresponds to the counting function i=1nai𝒾[𝒶]. Let 1,2𝒮 be two multisets over the same set 𝒮. The difference 12 is defined as the multiset 𝒮 such that for every a𝒮, (12)[𝒶]=1[𝒶]2[𝒶] provided that 1[𝒶]2[𝒶] for all a𝒮.

We also define the intersection of a multiset with a set. Given a multiset over 𝒜 and a subset 𝒮𝒜, the intersection 𝒮 is a multiset over 𝒜 defined by (𝒮)[𝒶]=[𝒶] if a𝒮, otherwise (𝒮)[𝒶]=0. For instance, {a,a,b,c}{a,c}={a,a,c}.

The main object of this paper is an abstract model of monomers and polymers, motivated by systems in DNA nanotechnology. This model captures how simple indivisible molecules (monomers) combine to form complexes (polymers) under specific physical and chemical constraints.

Definition 2.1.

Let 𝚿𝟎 be a finite set of monomers, and 𝚿𝚿𝟎 be a finite set of polymers over these monomers, where each polymer P𝚿 is a multiset of monomers.

Let 𝐱𝟎(0,1)𝚿𝟎 represent the vector of concentrations for all monomers, and let 𝐱(0,1)𝚿 represent the vector of concentrations for all polymers (also called configuration). The relationship between monomer and polymer concentrations is governed by mass conservation. Specifically, we require

𝐱𝟎=𝐀𝐱 (1)

where 𝐀|𝚿𝟎|×|𝚿| is a matrix such that each entry Aij specifies the number of monomers of type i in polymer j.

For example, the polymer P={m1,m1,m2,m3} contains two copies of m1, one of m2, and one of m3. Note that we will be interested in cases where the set of polymers of interest 𝚿 is a finite (proper) subset of all possible polymers over 𝚿𝟎.

In DNA nanotechnology the monomers are typically DNA strands with different sequences. Polymers are analogous to a multistranded DNA structure composed of multiple DNA strands. We use the term polymer rather than “complex” in order to better emphasize their composition from monomers and to be consistent with the TBN literature.

To model the equilibrium behavior of such systems, we use the free energy formulation in the notation of Dirks et al. [4], where the equilibrium concentrations are obtained by minimizing the following free energy function (corresponding to the pseudo-Helmholtz free energy used throughout chemical reaction network theory literature [8, 6]):

𝐠(𝐱)=P𝚿xP(logxPlogΩP1) (2)

where xP denotes the concentration of polymer P, and ΩP is its partition function corresponding to the exponential of the polymer’s negative free energy. The minimization is subject to the mass conservation constraint given in Equation 1.

In this work, we focus on athermic systems where all interactions are equally favored enthalpically. Thus we assume that P𝚿xPlogΩP is constant for all configurations 𝐱 satisfying mass conservation (Equation 1), yielding a simplified cost function that is entirely entropic:

g(𝐱)=P𝚿xP(logxP1). (3)

This function serves as the objective of our optimization problem. Its minimizer under the constraint of Equation 1 corresponds to the equilibrium concentration of polymers.

Understanding the equilibrium requires us to formalize how polymers can transform into one another. We do so by defining reconfigurations and the reactions they induce.

Definition 2.2.

Two multisets of polymers 1,2𝚿 are reconfigurations of each other, written 12, if for every monomer m𝚿𝟎, the total count of m is the same in 1 as in 2; i.e., P𝚿1[𝒫]𝒫[𝓂]=𝒫𝚿2[𝒫]𝒫[𝓂]. Whenever 12, we also define reaction 12.

We occasionally use Greek alphabets to denote the reactions. Note that while is a binary relation, a reaction α:12 is an ordered pair representing a directed transformation between the multisets. Our arguments will refer to reactants (left-hand side) and products (right-hand side) of particular reactions.111All reactions are of course reversible, but we treat each direction as a separate reaction.

An important property of the minimum of Equations 2 and 3 is that it satisfies detailed balance over reactions [8, 6]. For us (Equation 3), for any single reaction α:12, the equilibrium concentrations satisfy:

P1xP1[𝒫]=P2xP2[𝒫].

We say that a reaction α is balanced when this equality holds. This notion of balance will be central to our characterization of equilibrium concentrations formed by on-target polymers, which we explore next.

The fact that balance leads to the minimum of the free energy is well-established, and we present its sketch in Appendix B for the completeness of the paper.

3 Characterizing On-target Polymers

We now define the set 𝒮 of on-target polymers – intuitively, these are the high-concentration polymers whose equilibrium concentration we set as input to our algorithm.

Definition 3.1.

Let 𝒮𝚿 be a set of polymers, and let μ:𝒮(0,1] be a function assigning a concentration exponent to each polymer in 𝒮. We say that 𝒮 is on-target with concentration exponents μ if:

  1. 1.

    For every polymer P𝚿, there are multisets of polymers 𝒮 and 𝚿 where contains P such that .

  2. 2.

    For any two multisets 1,2𝒮, if 12, then their concentration exponents are equal μ(1)=μ(2). Here, the concentration exponent of a multiset is defined as μ()=P[𝒫]μ(𝒫).

While presented in terms of restricting 𝒮, another (perhaps more apropos) interpretation of the first condition is that we are restricting 𝚿 to be only those polymers that can be obtained via a reconfiguration of polymers over 𝒮. The interpretation of the second condition is that every reaction among polymers in 𝒮 is in detailed balance. More precisely:

 Remark 3.2.

Let 0<c<1, and suppose every on-target polymer P𝒮 has concentration cμ(P). Then every reaction 12, where 1,2𝒮, satisfies cμ(1)=cμ(2) and thus the product of the concentrations of the left-hand side polymers is equal to the product of the concentrations of the right-hand side polymers.222Note that we use symbol μ for concentration exponents because of the direct parallel to the standard notion of chemical potential, which is expressed with logarithmic terms of concentration: chemical potential μ=μ+RTln(x), where x is the mole fraction.333Our base concentration c is smaller than one because the units of concentration are mole fractions – the ratio of polymers to solvent molecules, and the derivation of the energy function Equation 2 only holds in the regime of less polymer than solvent [4].

If 𝒮=𝚿 then we are done, being guaranteed that all reactions are in detailed balance. However, the case of interest is where we do not know the concentration exponents of all polymers – making it the goal of this paper to compute them or bound them to ensure that the equilibrium concentration of off-target polymers is small.

We now define canonical reactions, which are the reactions with reactants from 𝒮. These reactions can generate polymers outside of 𝒮 from polymers in 𝒮. Each canonical reaction has key quantities we call imbalance and novelty which will play a key role in our algorithm.

Definition 3.3.

For an on-target set 𝒮 with μ, a reaction α:12 is called canonical if all its reactants are over 𝒮 (i.e., 1𝒮). Then the imbalance of α is defined as k(α):=μ(1)μ(2^), where 2^:=2𝒮. The novelty of α is defined by l(α):=|22^|.

Given the definition of multiset subtraction, the novelty l(α) is always non-negative.

Intuitively, the imbalance of the reaction represents how far from detailed balance it is if we consider only the polymers whose concentrations are known. The novelty is the number of off-target (new) polymers produced by the same reaction.

Intuitively, a large k(α) means a larger bias of the reaction toward its reactants, i.e., elements of 𝒮, prior to assigning concentration exponents to off-target polymers. This is desired since it implies less pressure to make off-target polymers and gives us more room to maneuver in assigning concentrations to them.

Intuitively, a larger novelty l(α) means that the canonical reaction is more entropically favored to its products. Note that the term P𝚿xPlogxP in Equation 3 is minimized when the concentration is more “spread out” over the different species (like Shannon entropy), and thus there is an entropic force to generate new polymers. A large l(α) is thus undesired. As justified in the subsequent results, the ratio k(α)/l(α) captures the effective tradeoff between imbalance and novelty.

Note that summing canonical reactions gives another canonical reaction. While on the one hand this leads to infinitely many canonical reactions, on the other hand, this additive property allows us to analyze the set of all canonical reactions using a Hilbert basis of finite size. We explain how to employ the Hilbert basis in our algorithm to avoid the infinity of canonical reactions in Appendix A.

The notion of stability of 𝒮, defined below, captures the idea that on-target polymers are in higher concentration compared to off-target polymers. Recall that concentration exponents μ are 1 for polymers in 𝒮 (Definition 3.1). As shown later by Theorem 5.4, the following definition ensures that all concentration exponents of polymers outside of 𝒮 are greater than 1 by our algorithm. Since the base concentration c<1, this implies that off-target polymers are in smaller concentration.

Definition 3.4.

The on-target set 𝒮 with μ is called stable if, for every canonical reaction α with l(α)0, the ratio k(α)/l(α)>1.

While our formalism allows different concentration exponents for different polymers in 𝒮, nearly all insight can be obtained from the simple case of uniform on-target concentration exponents:

Definition 3.5.

The on-target set 𝒮 with μ is called uniform if every polymer P𝒮 has concentration exponent μ(P)=1.

4 Why Balancing is Nontrivial

Our main goal is to ensure that in equilibrium the concentration of on-target polymers is much higher than the others. This section presents two examples illustrating why this is nontrivial.

Consider a uniform set of on-target polymers 𝒮={A,B,C,D}, aiming at the equilibrium concentration xA=xB=xC=xD=c. For two off-target polymers P1 and P2, suppose that we figured out a canonical reaction

α:2A+B+CP1+P2.

This reaction is in detailed balance if and only if xP1xP2=xA2xBxC=c4. At this point, it is unclear how to balance xP1 and xP2 without inspecting the other reactions: For example, the reaction α is balanced by assigning xP1=xP2=c2, but another canonical reaction A+2B+2D2P1 would not be if it exists.

The following example exhibits a different potential problem. While Remark 3.2 shows that reactions entirely within 𝒮 are balanced, it is not clear that given our choice of concentration exponents μ for 𝒮, the reactions involving polymers outside of 𝒮 could be balanced at all. Suppose there are two reactions

β:A+BP3 and γ:B+C+D2P3.

These reactions intuitively say that the off-target polymer P3 is non-interacting to other polymers in these reactions so that the above problem of balancing the concentrations over off-target polymers is absent.444Looking ahead, these non-interacting polymers are indeed easier to assign concentrations to as shown in Section 5.1. However, balancing the reaction β suggests the concentration xP3=c2 at equilibrium, but the reaction γ gives xP3=c1.5. In other words, it is unclear if the concentrations of each of the polymers in 𝒮 can be the same or even close to each other.

Despite these difficulties, our main result guarantees that the configuration on on-target polymers we demand is in equilibrium without the above issue, and can be extended to the configuration over all polymers at equilibrium, and ensures that the off-target polymers remain in very low concentration at equilibrium. Intuitively, we prove that there exists a canonical reaction for which we can assign concentrations to product polymers without creating conflicts with other reactions, and we provide a method to find this reaction. Moreover, as we will see later, these assignments occur in order of decreasing concentration allowing us to restrict the concentration of off-target species.

5 Main Result: Concentration of Off-target Polymers

For the remainder, we fix 𝚿𝟎, 𝚿, and a particular set of on-target polymers 𝒮 with concentration exponents μ.

To systematically assign concentration exponents to all polymers, we will organize them into hierarchical groups called levels. The process begins with a designated set of polymers 𝒮, the on-target polymers, which are assigned initial concentration exponents via μ:𝒮+.

All other polymers, called off-target polymers, will be partitioned into level sets 𝒮1,𝒮2,, constructed inductively based on how these polymers appear in certain canonical reactions. At each level i, we will compute a scalar value μi that serves as the concentration exponent for all polymers newly added at that level.

In this way, we gradually extend the initial function μ to a global function μ¯:𝚿+ that assigns a concentration exponent to every polymer in the system. This inductive process and the precise definitions of μi, 𝒮i, and μ¯ will be described in detail below.

Definition 5.1.

Given 𝒮0,,𝒮i1, assume μ¯(P) is assigned for any Pj=0i1𝒮j. For a canonical reaction α:12, we define 2^:=2(𝒿=0𝒾1𝒮𝒿). The ith-level imbalance of α is defined as ki(α):=μ¯(1)μ¯(2^), and the ith-level novelty by li(α):=|2||2^|.

While 2^ technically depends on the level index i, we suppress this dependency in the notation for simplicity; it will be clear from the context that it is updated at each level. Additionally, by the definition of a canonical reaction, we have μ¯(1)=μ(1) in the expression above.

Definition 5.2.

Let μi=minα{ki(α)/li(α)} where the minimum is taken over all canonical reactions α with li(α)0.555The minimum can actually be taken over a finite subset of canonical reactions using Hilbert basis. We refer the reader to Appendix A for more detail. The canonical reactions achieving the minimum are termed i-levelizing canonical reactions. The ith level set 𝒮i is defined as the set of all polymers P appearing in any i-levelizing reactions not already in j=0i1𝒮j. Every polymer P𝒮i is assigned concentration exponent μ¯(P):=μi.

At the heart of this inductive construction is the requirement that each i-levelizing reaction maintains balance with respect to the assigned concentration exponents. That is, for every reaction α:12 used to define level 𝒮i, the assigned exponents ensure the reaction remains detailed balanced, cμ¯(1)=cμ¯(2). This holds because each polymer in 2 that already appeared in previous levels contributes its already-defined exponent, while newly introduced polymers in 𝒮i all receive the same value μi. Decomposing 2 into previously levelized polymers ^2 and new polymers in 𝒮i, we obtain:

μ¯(1)=μ¯(^2)+ki(α)=μ¯(^2)+μili(α)=μ¯(^2)+μi(|2||^2|)=μ¯(2),

ensuring that the total concentration exponent on both sides of the reaction is equal. This confirms that the assignment of μi for polymers in level 𝒮i is consistent with the balance requirements dictated by the underlying reactions.

The full procedure for determining the level sets 𝒮i and the corresponding concentration exponents μi is formally specified in Definitions 5.1 and 5.2 and carried out through the step-by-step process described in Algorithm 1. This algorithm inductively builds the extended concentration exponent function μ¯ by identifying polymers that can be levelized at each step and assigning them an appropriate exponent value based on their participation in canonical reactions.

While the algorithm itself does not explicitly state a stopping condition, its termination is ensured by the structure of the level construction process. Each time a new level 𝒮i is defined, it includes at least one polymer not present in any previous level. Since the set of all polymers 𝚿 is finite by assumption, only finitely many levels can be introduced. As a result, the inductive process must terminate after assigning a level and concentration exponent to each polymer, thus completing the definition of the extended function μ¯.

For us, canonical reactions represent the simplest meaningful interactions and serve as building blocks for more complex behavior. Their central role for our results is motivated by the following lemma:

Lemma 5.3.

Let 𝐱𝟎(0,1)𝚿𝟎 be a vector of concentrations of the monomers. If all canonical reactions are balanced at configuration 𝐱(0,1)𝚿, then the cost function g(𝐱) is minimum subject to 𝐀𝐱=𝐱𝟎.

Proof.

We will show that any arbitrary reaction can be decomposed into a canonical reaction. Consequently, if detailed balance holds for all canonical reactions, it follows that detailed balance holds for all reactions. Per Appendix B, the cost function g reaches its minimum – under the constraint 𝐀𝐱=𝐱𝟎 – when all reactions are balanced. This confirms the statement of the lemma.

Consider an arbitrary non-canonical reaction α:1+𝒫2 with P𝒮 where 1+𝒫 denotes the union of two multisets 1 and {P}. According to the first condition of Definition 3.1, there exists a canonical reaction β:12+𝒫 that produces P. Now, apply β and α sequentially on the reactants 1+1. This gives rise to a new reaction:

γ:1+11+(2+𝒫)=(1+𝒫)+22+2.

Therefore, the reaction α, when catalyzed by 2 (i.e., adding 2 to reactants and products) and using the inverse of the canonical reaction β, can be replaced by the reaction γ, which involves fewer reactant polymers outside of 𝒮 than α did originally:

(1+𝒫)+2=1+(2+𝒫)1+12+2.

By repeating this procedure, α decomposes into a canonical reaction. This concludes the proof.

With this groundwork in place, we now state our main theorem, which characterizes the equilibrium distribution of the entire polymer system, both on-target and off-target, in terms of their assigned levels.

Theorem 5.4.

Let 𝒮 be the stable set of on-target polymers with concentration exponents μ:𝒮(0,1]. For the extended concentration exponent μ¯:𝚿+ generated by Algorithm 1 and for any 0<c<1, there are monomer concentrations 𝐱𝟎(0,1)𝚿𝟎 such that the configuration 𝐱(0,1)𝚿 with each polymer P𝚿 at concentration cμ¯(P) is the minimum of g(𝐱) subject to 𝐀𝐱=𝐱𝟎 (i.e., Equations 3 and 1). Furthermore, μ¯(P)μ1>1 for all P𝒮.

In this configuration, every off-target polymer has strictly lower concentration than any on-target polymer. This is because concentrations scale exponentially with the extended exponent μ¯(P), and off-target polymers are assigned strictly larger exponents than the shared minimal value μ of the on-target set 𝒮. As a result, for 0<c<1, off-target concentrations are exponentially suppressed.

Algorithm 1 Calculating level sets and concentration exponents. The repeat loop will terminate because there are finitely many polymers. The Hilbert basis implementation avoiding the infinite set Λ is discussed in Appendix A.

Proof of Theorem 5.4.

We use a proof by contradiction to show that each canonical reaction must be balanced in the configuration with [P]=cμ¯(P), which suffices to conclude the proof thanks to Lemma 5.3.

Suppose that there exists a canonical reaction α:12. We consider two cases μ¯(1)<μ¯(2) or μ¯(1)>μ¯(2). Recall all polymers are levelized.

Case 1.

μ¯(1)<μ¯(2). Let t be the top level among the polymers in 2. Note that μ¯(P)=μt for each P𝒮t. Since μt is defined as the minimum of kt(α)/lt(α) over all canonical reactions αΛ with lt(α)0, we must have:

μtkt(α)lt(α)=μ¯(1)μ¯(^2)lt(α)<μ¯(2)μ¯(^2)lt(α)=μt

which is a contradiction.

Case 2.

μ¯(1)>μ¯(2). Let βP be the levelizing reaction including P as product for each polymer P2. Consider a canonical reaction P2[𝒫]β𝒫:12+2 obtained by summing 2[𝒫] copies of βP which includes 2 as products.666Here we use the linear combination of reactions in a standard way; for example, for α:𝒩1𝒩2 and β:𝒩1𝒩2, the reaction 2α+β denotes the reaction 2𝒩1+𝒩12𝒩2+𝒩2. By applying 21 to this reaction, we have a canonical reaction β:12+22+1 such that

μ¯(1)=μ¯(2)+μ¯(2)<μ¯(2)+μ¯(1).

Therefore this case is reduced to Case 1, leading once again to a contradiction.

The “Furthermore” part is proven in Corollary 6.4.

5.1 Non-Interacting Off-target Polymers

Our main theorem implies that if the on-target polymers 𝒮 and the corresponding concentration exponents μ are chosen appropriately, the equilibrium exists with the extended concentration exponents μ¯ consistent with μ. In this section, we show that μ¯ exponents for non-interacting off-target polymers P can be assigned in a particularly easy way. Non-interacting off-target polymers are those that can be independently made by a canonical reaction:

Corollary 5.5.

Let 𝒮 be a stable set of on-target polymers. Suppose that an off-target polymer P𝒮 is a product of a canonical reaction αP:𝒫P+𝒫 for some multisets 𝒫 and P from 𝒮.777Note that there may be other reactions involving P with other off-target polymers. Then μ¯(P)=μ(𝒫)μ(P).

Proof.

By Theorem 5.4, the reaction αP is in detailed balance, cμ¯(𝒫)=cμ¯(P+𝒫), and we take the logarithm of both sides.

The same idea extends further. At any point of the execution of the algorithm, suppose that the extended concentration exponents of the set of polymers 𝒮¯ have been already assigned. For a non-interacting polymer P outside of 𝒮¯ with the reaction αP:𝒫P+𝒫 for multisets 𝒫 and P from 𝒮¯, we can assign μ¯(P)=μ¯(𝒫)μ¯(P). This assignment is valid by the same reason to the corollary.

Example 5.6.

Consider the set of monomers and polymers given by 𝚿𝟎={a,b,c} and 𝚿={A,B,C,X,Y,Z}; where A={a,a}, B={a,b}, C={c}, X={a,a,a,b}, Y={b,b,c,c}, and Z={b,b,c,c,c}. Consider the uniform on-target set 𝒮={A,B,C}. Instead of inspecting all canonical reactions, we can focus on three canonical reactions

α:A+BX,β:3B+2CX+Y,γ:A+2B+3CX+Z.

Here X is the non-interacting off-target polymer in α, thus μ¯(X)=2. After assigning μ¯(X), Y and Z become non-interacting off-target polymers in β and γ, respectively, so that we derive μ¯(Y)=3 and μ¯(Z)=4.

6 Framework for Upper-Bounding Off-target Polymers

Often we are only interested in an upper bound on the concentration of an off-target undesired polymer. In this case, rather than generating its exact equilibrium concentration via Algorithm 1, we can more efficiently compute an upper bound by narrowing our focus to reactions that directly produce P instead of examining the full set of canonical reactions. This leads to a simpler surrogate quantity that approximates μ¯(P) from below.

Definition 6.1.

For Pj=0i1𝒮j, let μ~i(P)=minα{ki(α)/li(α)} where the minimum is taken over all canonical reactions α that include P as a product.

Since P has not yet been levelized by construction, any such reaction α producing P, obviously involves at least one unlevelized polymer, ensuring li(α)0.

Intuitively, μ~i(P) captures a conservative estimate of the exponent with which polymer P can grow in concentration at level i, based only on reactions that actually produce P. Since it is defined as a minimum over a subset of possible reactions, it is easier to compute than the full concentration exponent μ¯(P), yet is still meaningful for analysis:

Theorem 6.2.

For any polymer Pj=0i1𝒮j, μ¯(P)μ~i(P)μi.

Lemma 6.3.

For a canonical reaction α with li+1(α)0, it holds that ki(α)/li(α)ki+1(α)/li+1(α).

Proof.

Note that μiki(α)/li(α) by definition of μi. Let n be the count of level-i product polymers in α, which must be smaller than li+1(α). Then we have

ki+1(α)li+1(α)=ki(α)nμili(α)n=ki(α)nμili(α)nki(α)n(ki(α)/li(α))li(α)n=ki(α)li(α).

Proof of Theorem 6.2.

Since all polymers will eventually be levelized, there exists a j-levelizing canonical reaction α that includes P as a product for some ji. Then, μ~i(P)ki(α)/li(α)kj(α)/lj(α)=μ¯(P) where the first inequality follows from Definition 6.1, and the second follows from Lemma 6.3.

Furthermore, the value μ~i(P) is computed by taking the minimum of ki(α)/li(α) only over canonical reactions that produce P (and have li(α)0). In contrast, μi is defined as the minimum over all canonical reactions, regardless of whether they produce P or not. In other words, the set of reactions considered when computing μ~i(P) is a subset of those considered for μi, making the search space for μ~i(P) more restricted. Since a minimum over a smaller set cannot be smaller than the minimum over a larger set, we conclude that μ~i(P)μi.

To summarize, μ~i(P) serves as a computationally efficient lower bound on μ¯(P). When used alongside Theorem 5.4, this upper bounds the equilibrium concentration of P.

The usefulness of Theorem 6.2 extends beyond individual estimates. It contributes to structural insights about level assignments during the iterations of our algorithm. Specifically, after levelizing up through Si1, we know that any unlevelized polymer Pj=0i1𝒮j satisfies μ¯(P)μi>μi1. This inequality ensures that polymers awaiting level assignment must exhibit smaller concentrations. In particular, because the first level imbalance k1 and novelty l1 are precisely identical to k and l defined in Definition 3.3, we derive μ¯(P)μiminα{k(α)/l(α)}>1 and the following corollary.

Corollary 6.4.

μ¯(P)>1 for all P𝒮.

7 Concentrations of Polymers in the TBN Model

While the TBN model is combinatorial in nature, quantifying over discrete (saturated) configurations, at the end we are most often interested in determining real-valued concentrations, which are accessible to (bulk) wet-lab experiment. The framework developed in this paper helps to bridge this gap between combinatorics of discrete configurations and concentrations.888Of course, much of the heavy lifting in bridging this gap is done by the derivation [4] of the cost function g(𝐱), but our work expands on it beyond numerical simulation.

Consider a saturated (i.e., maximally bonded) configuration in the TBN model. Quantified over all saturated reconfigurations , the key quantity of interest in the TBN model is the “entropy” of , defined as the number of polymers in (i.e., ||). The TBN model defines the “stable” configurations to be those that have maximum entropy among all saturated configurations.

Corresponding to the TBN literature [5], a multiset of polymers is called TBN-stable if any reaction has ||||. We say that a set 𝒮 of polymers is TBN-stability closed if every multiset 𝒮 is TBN-stable and further any reaction where contains a polymer outside of 𝒮 (i.e., P for some P𝒮) satisfies ||>||. In other words, producing a polymer outside of a TBN-stability closed set 𝒮 costs entropy. The following lemma connects our notion of stability of on-target polymers (Definition 3.4) to TBN-stability.

Lemma 7.1.

If 𝒮 is TBN-stability closed and satisfies condition (1) of Definition 3.1, then 𝒮 is on-target with μ(P)=1 for all P𝒮 (i.e., uniform) and stable.

Proof.

We first check 𝒮 is on-target with μ. For a reaction α:12 where 1 and 2 are multisets over 𝒮, μ(1)=|1||2|=μ(2). Applying the same argument to α:21 yields μ(1)μ(2). Combining the two gives μ(1)=μ(2), proving 𝒮 with μ is an on-target set.

Now we will prove that uniform on-target 𝒮 is stable. Consider a canonical reaction β:12 with l(β)>0. By definition, we have k(β)=μ(1)μ(^2)=|1||^2| and l(β)=|2||^2|. Therefore, the condition for 𝒮 being stable, namely k(β)/l(β)>1, is implied by |1|>|2|.

Thus, the polymers in the TBN-stability closed set 𝒮 represent the intended “high-concentration” polymers, while everything outside of 𝒮 is considered undesired (off-target).

Let canonical reaction α be ; we define |||| to be the entropy loss e(α) of the reaction α. Recall that during the first iteration of our algorithm, novelty l(α) is the number of off-target polymers generated in canonical reaction α. The imbalance k(α) in the first iteration can be understood in terms of the entropy loss of the reaction: the decrease in the number of polymers of a reaction is exactly e(α)=k(α)l(α). Thus to have an upper bound on the concentration of off-target polymers via Theorem 6.2, it is sufficient to find the smallest ratio e(α)/l(α) of any reaction:

Corollary 7.2.

Let set 𝒮𝚿 be a TBN-stability closed set of polymers. Let μ1=minα{e(α)/l(α)}+1 minimized over all reactions α: where 𝒮, 𝚿 and l(α)>0. For any 0<c<1, there are monomer concentrations 𝐱𝟎(0,1)𝚿𝟎 and configuration 𝐱(0,1)𝚿 that minimizes g(𝐱) subject to 𝐀𝐱=𝐱𝟎 where 𝐱 satisfies: each polymer P𝒮 has concentration exactly c, and each polymer P𝚿𝒮 has concentration not more than cμ1.

In other words, if 𝒮 is a TBN-stability closed set of polymers then we can assign concentration c to each of them (uniform assignment). Then we consider the “worst” way to generate any polymers outside of 𝒮 (i.e., off-target): the way that has the smallest entropy loss and the largest novelty. The ratio of the entropy loss to novelty gives us an upper bound on the concentration exponent μ1 of any off-target polymer, bounding its concentration by cμ1. The smallest entropy loss to generate off-target polymers is already the key item of interest in the TBN literature. Thus the above corollary helps to bootstrap concentration bounding arguments via existing arguments based on entropy loss.

8 Example Applications

In this section we show several applications of our mathematical tools in the analysis of existing systems of interest in the DNA molecular programming literature. We base our arguments on previous results proving the entropy loss (quantity e(α) of Corollary 7.2) of the systems in producing off-target (leak) species.

We note that while existing literature succeeds in characterizing entropy loss via TBN-like combinatorial arguments, more work is needed to develop similarly rigorous combinatorial arguments on novelty (quantity l(α)); see also the discussion in Section 9. In the examples to follow, we claim to have identified the worst-case canonical reactions – i.e., having the least e(α)/l(α) or k(α)/l(α) ratio – without proof.

We first consider the TBN AND gate introduced in prior work [5, 2] and recently experimentally realized [14]. Figure 1 (left) shows the desired functionality of the AND gate in which inputs A and B cooperate to produce C. We are interested in bounding the concentration of C (leak) that can be produced in the absence of inputs A and B. Phrased in our terminology, combinatorial TBN arguments have shown that 𝒮={X,Y,Z} is TBN-stability closed, as well as {X,Y,Z,A} and {X,Y,Z,B} where one of the two inputs is present. This implies that any canonical reaction α producing C has entropy loss e(α)1. However, such arguments do not directly connect entropy loss to leak concentrations, justifying the need for a tool like our Corollary 7.2.

Figure 1: A TBN module implementing an AND gate. The presence of polymers A and B represents input 1, while their absence represents input 0. (Left) The intended reaction pathway generates the output polymer C when both A and B are present. (Middle) When both A and B are absent, the worst-case canonical reaction is α:X+Y+ZG1+C producing an erroneous output C (leak). The entropy loss and novelty of the reaction are e(α)=1 and l(α)=2, resulting in the concentration upper bound of C of c1.5 via Corollary 7.2. (Right) If only one input is present (B), then the worst-case canonical reaction is β:B+X+Y+ZG2+G3+C with e(β)=1 and l(β)=3, yielding the upper bound concentration of C of c1.33.

To apply our framework to the TBN AND gate, we use Corollary 7.2. In the absence of both inputs, take 𝒮={X,Y,Z}. We claim that the worst-case canonical reaction α producing C is X+Y+ZG1+C shown in Figure 1 (middle). This reaction has entropy loss e(α)=1 and novelty l(α)=2 since G1 and C are outside of 𝒮. Thus for any base concentration 0<c<1, there is an equilibrium with [X]=[Y]=[Z]=c where the leak concentration is [C]c1.5.

Similarly, consider the case of having input B but no input A. We take 𝒮={X,Y,Z,B} and claim999While we do not provide a formal proof that this reaction is worst-case, our confidence is based on the observation that we cannot increase the novelty further without increasing the entropy loss in proportion. For example, we can combine two copies of reaction α to yield reaction α but then e(α)=2e(α) and l(α)=2l(α), maintaining the same ratio. that the worst-case canonical reaction α producing C is X+Y+Z+BG2+G3+C as shown in Figure 1 (right). This reaction has entropy loss e(α)=1 and novelty l(α)=3 since G2, G3, and C are outside of 𝒮, yielding the equilibrium leak concentration of [C]c1.33. Our analysis thus provides concrete polynomial upper bounds on leak, consistent with the qualitative expectation of the TBN model that leak should become comparatively negligible in the limit of decreasing c. The bound is smaller when both inputs are absent than when B is present.

As the next example, we consider the “leakless” DNA strand displacement system previously theoretically and experimentally studied [15]. That work focuses on a family of “translator” modules that convert an input strand to an output strand of independent sequence. The family is parameterized by the redundancy parameter N defined as the number of bound domains in each fuel polymer Fi, such that the number of domains in each signal Xi is N+1. Leak is expected to decrease with decreasing overall concentration (as for the AND gate), as well as with increasing redundancy N. The decrease of leak with increasing N was confirmed by experiment, at least for small N. The reactions of the translator with N=3 are shown in Figure 2.

Figure 2: Translator cascade with redundancy parameter N=3. Polymer X0 serves as input and polymer Xn as output signal. The leak pathway with N+1 fuels coming together to generate a signal XN+1 in the absence of input is described in (a). Part (b) describes the intended reaction for each i; iterating them for i=1,,N+1 produces XN+1 given input X0. Note that if several translators are composed, then XN+1 is the input to the downstream translator and once the leak signal is generated via the pathway shown in (a) it can propagate via intended reactions to the output Xn of the last translator.

To apply our framework, we consider the system without toeholds, driven solely by entropy; with long domains alone, we are in the enthalpy neutral (athermic) regime. Prior work focused on the case where the system was prepared in a state with only fuel polymers (Fi), all at equal concentration, and zero initial concentration of waste polymers (Wi). Let on-target set be 𝒮={Fi,Wii=1,,n} with uniform μ=1. Rephrasing in our terminology, ref. [13] proved that any canonical reaction α: where Xi has entropy loss e(α)=N1, and this fact was used to argue that by increasing N we can make leak arbitrarily small. We now show that considering novelty in addition to entropy loss makes this argument problematic, and suggest an alternative parameter setting to arbitrarily decrease leak.

Consider a cascade of two translators. Importantly, the number of fuels for a single translator module is N+1; i.e., Xi and Xj overlap in sequence for i<j<i+N+1 but are completely sequence-independent for j=i+N+1. We claim that the worst-case leak pathway α is where the first translator leaks resulting in the triggering of the second translator. This pathway generates l(α)=N+3 new off-target polymers: 1 for the leaked upstream translator (all fuels coming together), N+1 for the triggered fuels of the second translator, and 1 for the output. Thus e(α)/l(α)=(N1)/(N+3). By Corollary 7.2, the concentration of the leak product in the uniform setting is bounded above by c4/3 for N=3. The bound tightens to c2 with increasing N; however, it does not get arbitrarily small. This suggests that we do not decrease equilibrium leak concentration arbitrarily by increasing “redundancy” N because while the entropy loss increases, the novelty increases as well.

To arbitrarily decrease leak, we propose to use positive initial concentrations of the waste polymers Wi. The fuel polymers are denoted F1,,Fn, and the waste polymers produced after translator triggering are W1,,Wn. Let X0 represent the input and Xn represent the output. We have n=2(N+1) for two translators composed together. This cascade of length n proceeds through the following reactions described in Figure 2:

Xi1+FiXi+Wi

for i=1,,n.

We first consider the triggered system (with-input) showing at least a constant fraction of the signal is propagated to the end as Xn. We then focus on the case without input and bound the leak. Note that this analysis utilizes Theorem 5.4 directly rather than Corollary 7.2 because we will have different concentration exponents μ for Fi and Wi in 𝒮.

For the triggered (with-input) system we define our on-target set as 𝒮={Fi,Wi,X0,Xii=1,,n}. We assign concentrations to the on-target polymers as follows: All fuel concentrations are equal to 2c, and all waste concentrations are equal to c. These concentrations correspond to concentration exponents μ(Fi)=1+logc2 and μ(Wi)=1. Let the concentration of the output in the final layer Xn be y and assign balancing concentrations of the other Xi. Since [Fi]/[Wi]=2, we have i=0n[Xi]<2y, meaning that more than half of the total signal (Xi) is at the output layer.

Now, we investigate the system without input. Let 𝒮={Fi,Wii=1,,n}. For the situation to properly correspond to the with-input case, we need to ensure that all monomer concentrations are the same between the two cases, except removing the monomer corresponding to input X0. Rather than thinking about specific monomers, we start in the with-input case and conceptually run reactions Xi+WiXi1+Fi to completely push all Xi to X0, and then remove X0 from the system.101010More precisely, this ensures that the concentrations of monomers making up on-target polymers are the same between the with-input and without-input case (other than X0). Since the total amount of all Xi is less than 2y in the with-input case, this results in: [Fi]<2c+2y and [Wi]>c2y. These correspond to μ(Fi)>logc(2c+2y) and μ(Wi)<logc(c2y).

Recall that redundancy N results in entropy penalty N1. We claim that the reaction with the smallest imbalance-novelty ratio (i.e., worst-case) is reaction β:

F1++FnG+WN+2++Wn+Xn,

where G is the “large polymer” formed after all F1,,FN displace the top strand from FN+1, and Xn is the leak output. The imbalance of this reaction is:

k(β)=i=1nμ(Fi)i=N+2nμ(Wi)>nlogc(2c+2y)n2logc(c2y). (4)

We can ensure that k(β) is at least a constant fraction of n for small enough c. For example, if we let yc/4, then k(β)n/4 for any c0.0064. The novelty is independent of n: l(β)=2 since G and Xn are not in 𝒮. Therefore, the imbalance-novelty ratio k(β)/l(β) of the worst case reaction is at least n/8, which increases linearly with N (recall n=2(N+1) for two translators composed together). Applying Theorem 5.4 leads to leak concentration of at most cn/8=c1/4+N/4. This upper bound111111Note that this upper bound is loose because of inequalities such as Equation 4. implies smaller-than c concentration of leak for N4, with the leak exponentially decreasing for larger N.

To summarize, by increasing redundancy N in the appropriate regime, we maintain the property that a constant fraction of the input is converted to output in the with-input case, while arbitrarily (exponentially in N) decreasing leak in the without-input case.

9 Discussion

Our results suggest a few important directions for future work. Given the central role of worst-case canonical reactions – i.e., canonical reactions with the lowest imbalance-novelty ratio (for Algorithm 1 and Theorem 5.4) or entropy loss-novelty ratio (Corollary 7.2) – it is important to develop formal techniques to prove that a given canonical reaction is indeed worst-case overall, at least or for a particular off-target polymer (for Section 6). Note that while combinatorial techniques in prior work in the TBN model have focused on proving entropy loss, more work is needed to study the ratio directly, making our framework more easily applicable. While we believe the canonical reactions highlighted in Section 8 are indeed worst-case, the argument is informal.

Another promising avenue of research is to establish a more direct link between a polymer’s monomer composition and its equilibrium concentration. Our current framework is effectively reaction-centric, inferring concentrations based on how polymers transform into one another. An alternative approach could be to derive concentration bounds directly from the structural properties of the off-target polymers, such as their size (monomer count) or degree of overlap with one another (multiset difference). Nonetheless, we hope that a variety of structure-based results could be proven based on a reduction to our canonical reaction framework.

Finally, this work has focused exclusively on the athermic case, where all molecular interactions are enthalpically neutral. While this is a reasonable and useful abstraction for systems with strong, saturated bonds, many real-world molecular systems, including many popular in DNA molecular programming, involve a range of binding strengths and enthalpic effects (e.g., from toehold binding). Extending our algorithmic framework to incorporate user-specified ΔG’s for each polymer could significantly broaden its applicability, and although this would complicate our algorithm, we do not anticipate any insurmountable difficulties.

References

  • [1] Keenan Breik, Cameron Chalk, David Doty, David Haley, and David Soloveichik. Programming substrate-independent kinetic barriers with thermodynamic binding networks. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 18(1):283–295, 2021. doi:10.1109/TCBB.2019.2959310.
  • [2] Keenan Breik, Chris Thachuk, Marijn Heule, and David Soloveichik. Computing properties of stable configurations of thermodynamic binding networks. Theor. Comput. Sci., 785:17–29, 2019. doi:10.1016/j.tcs.2018.10.027.
  • [3] W. Bruns, B. Ichim, C. Söger, and U. von der Ohe. Normaliz. Algorithms for rational cones and affine monoids. Available at https://www.normaliz.uni-osnabrueck.de.
  • [4] Robert M Dirks, Justin S Bois, Joseph M Schaeffer, Erik Winfree, and Niles A Pierce. Thermodynamic analysis of interacting nucleic acid strands. SIAM review, 49(1):65–88, 2007. doi:10.1137/060651100.
  • [5] David Doty, Trent A. Rogers, David Soloveichik, Chris Thachuk, and Damien Woods. Thermodynamic binding networks. In DNA Computing and Molecular Programming, 23rd International Conference, DNA 23, pages 249–266, 2017. doi:10.1007/978-3-319-66799-7_16.
  • [6] Martin Feinberg. Foundations of chemical reaction network theory. Springer, 2019.
  • [7] Stefan Hoops, Sven Sahle, Ralph Gauges, Christine Lee, Jürgen Pahle, Natalia Simus, Mudita Singhal, Liang Xu, Pedro Mendes, and Ursula Kummer. COPASI—a complex pathway simulator. Bioinformatics, 22(24):3067–3074, 2006. doi:10.1093/BIOINFORMATICS/BTL485.
  • [8] Fritz Horn and Roy Jackson. General mass action kinetics. Archive for rational mechanics and analysis, 47:81–116, 1972.
  • [9] Matthew D Johnston, Stefan Müller, and Casian Pantea. A deficiency-based approach to parametrizing positive equilibria of biochemical reaction systems. Bulletin of Mathematical Biology, 81:1143–1172, 2019.
  • [10] A. M. M. Leal. Reaktoro: An open-source unified framework for modeling chemically reactive systems, 2015. URL: https://reaktoro.org.
  • [11] John AP Sekar, Justin S Hogg, and James R Faeder. Energy-based modeling in BioNetGen. In 2016 IEEE international conference on bioinformatics and biomedicine (BIBM), pages 1460–1467. IEEE, 2016.
  • [12] Elie Soloveichik, Leo Orshansky, Cameron Chalk, and Boya Wang. Concentrat.io, 2025. Accessed 25 June 2025. URL: https://concentrat.io/.
  • [13] Chris Thachuk, Erik Winfree, and David Soloveichik. Leakless DNA strand displacement systems. In Andrew Phillips and Peng Yin, editors, DNA Computing and Molecular Programming, pages 133–153, Cham, 2015. Springer International Publishing. doi:10.1007/978-3-319-21999-8_9.
  • [14] Boya Wang, Cameron Chalk, David Doty, and David Soloveichik. Molecular computation at equilibrium via programmable entropy. bioRxiv, 2025. doi:10.1101/2024.09.13.612990.
  • [15] Boya Wang, Chris Thachuk, Andrew D. Ellington, Erik Winfree, and David Soloveichik. Effective design principles for leakless strand displacement systems. Proc. Natl. Acad. Sci. USA, 115(52):E12182–E12191, 2018. doi:10.1073/pnas.1806859115.
  • [16] Boya Wang, Chris Thachuk, and David Soloveichik. Speed and correctness guarantees for programmable enthalpy-neutral DNA reactions. ACS Synthetic Biology, 12(4):993–1006, 2023.
  • [17] Joseph N Zadeh, Conrad D Steenberg, Justin S Bois, Brian R Wolfe, Marshall B Pierce, Asif R Khan, Robert M Dirks, and Niles A Pierce. NUPACK: analysis and design of nucleic acid systems. Journal of computational chemistry, 32(1):170–173, 2011. doi:10.1002/JCC.21596.

Appendix A Hilbert Basis Implementation

This section recalls some results on Hilbert bases and explains how to use the Hilbert basis in Algorithm 1. While this section makes the main theorem mathematically rigorous, most readers can safely skip this section.

The main contents of this section are (1) the termination of Algorithm 1 and (2) the use of minimum (versus infimum) in Definition 5.2. We address both concerns by showing that, although there are infinitely many canonical reactions, we can always restrict our attention to a finite subset of them in our analysis and algorithm.

For integral vectors 𝐯1,,𝐯mn, the set C={i=1mbi𝐯i:b1,,bm0} is called a (rational polyhedral) cone, which is also known to be described by a system of inequalities C={𝐯:𝐁𝐯0} for some matrix 𝐁l×n. It is known that the set Cn has a finite subset (C)={𝐡1,,𝐡t}, called Hilbert basis of C, that generates Cn with non-negative integer coefficients, that is, for any 𝐯Cn there are a1,,at such that 𝐯=i=1tai𝐡i.

The set of canonical reactions Λ can be precisely described in terms of a rational polyhedral cone. For a canonical reaction α:12, we define a vector

𝐯α=(1[𝒫]2[𝒫])𝒫𝚿|𝚿|

capturing the stoichiometric change of concentrations due to reaction α. Note that 1[𝒫]2[𝒫]=2[𝒫]0 for P𝒮, thus 𝐯α𝐞P0 for 𝐞P=(δPP)P𝚿 for the delta function δij=1 iff i=j. Definition 2.2 ensures that this vector must satisfy the condition 𝐀𝐯α=0. Combining these, the cone

C𝒮={𝐯α|𝚿|:𝐀𝐯α0 and 𝐀𝐯α0 and 𝐯α𝐞P0 for all P𝒮}

characterizes the canonical relations: C𝒮|𝚿| is the set of vectors 𝐯α. Therefore, there exists a finite set of canonical reactions H that corresponds to the Hilbert basis (C𝒮).

The following lemma implies that for the purposes of our algorithm and analysis, we can focus on the Hilbert basis H of canonical reactions.

Lemma A.1.

Let α:12 and β:𝒩1𝒩2 be canonical reactions with li(α),li(β)0. Then, for any a,b, it holds that

ki(aα+bβ)li(aα+bβ)min(ki(α)li(α),ki(β)li(β)). (5)

The equality holds only when a=0, b=0, or ki(α)/li(α)=ki(β)/li(β).

Proof.

It is not hard to see that ki and li are linear, i.e., ki(aα+bβ)=aki(α)+bki(β) and li(aα+bβ)=ali(α)+bli(β). Then, Equation 5 is identical to the mediant inequality, which states that for p,q,r,s0 with q,s0 it holds that min(p/q,r/s)(p+r)/(q+s). Consider a canonical reaction α=a1η1++atηt for a1,,at>0 and η1,,ηtH. If α is i-levelizing, then the equality ki(α)/li(α)=μi=minj=1,..,t(ki(ηj)/li(ηj)) must hold, and the equality condition above ensures that μi=ki(ηj)/li(ηj) for all j=1,,t. In other words, the set of i-levelizing reactions is

{j=1tajηj:a1,,at} where {η1,,ηt} is the set of i-levelizing reactions in H.

This allows us to inspect the minimum over the finite set H instead of Λ in Definition 5.2. Similarly, as the Hilbert basis can be computed in finite time and implemented in [3], we can run Algorithm 1 with guaranteed termination by computing the minimum over H.

Appendix B Detailed Balance and Equilibrium

Recall that 𝐀|𝚿𝟎|×|𝚿| is the matrix such that each entry Aij specifies the number of monomers of type i in polymer j, and that 𝐀𝐱=𝐱𝟎 (Equation 1) captures the mass-conservation constraint.

Theorem B.1.

Let 𝐱𝟎(0,1)𝚿𝟎 be a fixed vector of monomer concentrations. If all reactions are balanced at the configuration 𝐱(0,1)𝚿 of polymer concentrations, then the cost function g(𝐱) is minimum subject to 𝐀𝐱=𝐱𝟎.

Proof.

(Sketch) The function g is strictly convex since its Hessian H is positive definite (specifically diagonal with Hjj=1/xj>0). Strict convexity of g implies that the local minimum of g becomes the unique (global) minimum.

We associate a vector 𝐯α|𝚿| with every reaction α, capturing the net stoichiometric effect of reaction α.121212This is the same vector 𝐯α as defined in Appendix A. For example, (1,1,0,) corresponds to X1X2 for Ψ={X1,X2,}. It is straightforward to show that the function g along with the direction of 𝐯α has zero derivative at 𝐱 if and only if the reaction α is balanced at 𝐱. More explicitly, for α:12, the P-th entry of the vector 𝐯α is (1[𝒫]2[𝒫]). The directional derivative D𝐯αg(𝐱)=P(1[𝒫]2[𝒫])log𝓍𝒫=0 implies that P1xP1[𝒫]=P2xP2[𝒫] holds, which means the reaction α is balanced.

The set {𝐱:𝐀𝐱=0} is spanned by the vectors 𝐯α for all reactions by Definition 2.2. Therefore, if all reactions are balanced at 𝐱, then any directional derivative at 𝐱 vanishes and 𝐱 is a critical point, which is the unique minimum.