Computing and Bounding Equilibrium Concentrations in Athermic Chemical Systems
Abstract
Computing equilibrium concentrations of molecular complexes is generally analytically intractable and requires numerical approaches. In this work we focus on the polymer-monomer level, where indivisible molecules (monomers) combine to form complexes (polymers). Rather than employing free-energy parameters for each polymer, we focus on the athermic setting where all interactions preserve enthalpy. This setting aligns with the strongly bonded (domain-based) regime in DNA nanotechnology when strands can bind in different ways, but always with maximum overall bonding – and is consistent with the saturated configurations in the Thermodynamic Binding Networks (TBNs) model. Within this context, we develop an iterative algorithm for assigning polymer concentrations to satisfy detailed-balance, where on-target (desired) polymers are in high concentrations and off-target (undesired) polymers are in low. Even if not directly executed, our algorithm provides effective insights into upper bounds on concentration of off-target polymers, connecting combinatorial arguments about discrete configurations such as those in the TBN model to real-valued concentrations. We conclude with an application of our method to decreasing leak in DNA logic and signal propagation. Our results offer a new framework for design and verification of equilibrium concentrations when configurations are distinguished by entropic forces.
Keywords and phrases:
Equilibrium concentrations, Thermodynamic Binding Networks, Monomer-polymer model, Detailed balanceFunding:
Hamidreza Akef: Support was provided by Schmidt Sciences.Copyright and License:
2012 ACM Subject Classification:
Theory of computation Models of computation ; Theory of computation Design and analysis of algorithmsAcknowledgements:
We thank Joshua Petrack for valuable discussions and insights.Editors:
Josie Schaeffer and Fei ZhangSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Introduction
In general, chemical equilibria of complex chemical systems are not analytically solvable and numerical tools are required for analysis. Such tools include NUPACK [17] for thermodynamic analysis of nucleic-acid systems, as well as more abstract platforms that support domain-level abstraction and free energy specification [12] including via rule-based modeling [11], and software for computing steady-state concentrations of chemical reaction networks [7, 10]. However, engineers often want a deeper understanding of the equilibrium than what analytically opaque numerical calculations can provide. Moreover, we often seek to understand infinite classes of designs such as logic circuits constructed from gate modules or parameterized constructions. For example, there is a growing body of work on leak in DNA-based systems, exhibiting a family of schemes parameterized by a “redundancy parameter” meant to decrease leak arbitrarily at the cost of additional system components [13, 5, 15, 1, 16, 14]. It is proven that producing off-target species necessarily decreases the overall number of separate complexes – the change controlled by the redundancy parameter – and thus incurs an entropic penalty. However, due to system complexity, the relationship between this rigorously proven thermodynamic unfavorability and the actual concentrations of off-target species often remains implicit.
While properties of equilibria of abstract coupled chemical reactions have been extensively explored in the chemical reaction network theory literature (e.g., [6], Chapter 14, for detailed-balance equilibria), and full explicit-parameter schemes (e.g., [9]) can characterize the entire space of equilibria, these analytical approaches have severe limitations. For large systems the resulting explicit formulas are unwieldy and offer no effective guidance on how to choose the parameters so that off-target concentrations remain below a desired bound.
In this paper we are interested in the following problem: Our chemical species are complexes (termed polymers) made up of indivisible units called monomers. We assume there are finitely many possible polymers, and among these we are given a set of on-target species that we want to have desired equilibrium concentrations, and all other off-target species to have some sufficiently low equilibrium concentrations. Our task is to determine a consistent (detailed-balance) equilibrium, and therefore the concentrations of the monomers that would lead to this equilibrium. Note that our interest in setting equilibrium concentrations rather than solving for them (based on initial concentrations or total monomer concentrations, for example) is misaligned with most computational approaches. For instance, it can be seen as the reverse problem to NUPACK 3’s concentrations tool which takes the concentrations of the monomers (strands) and returns the corresponding equilibrium concentrations of the polymers (complexes).
While having the flexibility to set monomer concentrations may appear to simplify the problem in the same way that finding some detailed-balance equilibrium is easier than computing the one consistent with input monomer concentrations, the complexity comes from simultaneously ensuring that off-target polymers remain in low concentration. Consider the natural approach of taking logarithms of all concentrations, thereby converting the detailed-balance equations (balancing each reaction) into a linear system amenable to standard linear solvers. When we fix the concentrations of on-target polymers, the remaining (off-target) concentrations are typically under-determined. The linear system then describes an unbounded affine subspace, and there is no obvious way to extract upper bounds on off-target species’ concentrations without leaving the linear framework. Our solution is an iterative algorithm that assigns concentrations to off-target polymers in decreasing order of concentration, ensuring that each off-target species remains below a desired threshold concentration when possible. Importantly, terminating the algorithm at any iteration still provides valid upper bounds for all remaining off-target polymers.
In the most general formulation of the monomer-polymer equilibrium concentration problem, the polymer free energies can be assigned arbitrarily incorporating binding strength, geometric constraints, etc. In this work, we focus on the simpler athermic case rather than tackling the problem in its full generality. Our allowed polymers are such that all possible reactions between them are enthalpy-neutral. This model is consistent with systems of strong fully-complementary DNA domains in which domain-level bonds can only switch binding partners but not de-hybridize. Thermodynamic Binding Network (TBN) [5, 2] saturated configurations capture this condition, but our setting is more general without a built-in notion of domains (binding sites).
The main result of this paper is Algorithm 1 and Theorem 5.4 showing how starting with desired concentrations of on-target polymers (already in detailed balance), we can set the concentrations of off-target polymers to satisfy detailed balance and thus thermodynamic equilibrium. In Section 4 we explain the apparent difficulties in balancing reactions which our approach needed to overcome.
If, rather than computing exact concentrations of off-target polymers, it is sufficient to bound them, then we refer the reader to Section 6. In Section 7 we apply our framework to the analysis of systems in the TBN model specifically, connecting the combinatorial notions of stability and entropy loss in the TBN model to equilibrium concentrations. In Section 8, we show applications of our method to the analysis of a simple TBN AND gate, as well as a parameterized family of signal propagation systems (translator cascades) from prior work. For the translator cascade, we argue that tuning concentrations of on-target polymers according to our framework is essential for leak to decrease exponentially with the redundancy parameter. We conclude with a discussion of future work (Section 9), including a formulation of new combinatorial conditions in the TBN model to make our framework easily applicable.
2 Model
Let denote the set of nonnegative integers. Given a finite set , we define as the set of functions .
A multiset over the finite set is described by its counting function , where for each element , the value indicates how many times appears in . We often write to mean that is a multiset over , and we denote the count of in by . The notation means that . The cardinality of a multiset , denoted , is the total number of elements in the multiset . For two multisets and over , we define their union as the multiset whose counting function is the pointwise sum . For example, let . Then, , , , and for all . Also, the cardinality of is . Finally, note that could also be written as the union , for example.
The linear combination of multisets with nonnegative integers is defined analogously: For multisets and , corresponds to the counting function . Let be two multisets over the same set . The difference is defined as the multiset such that for every , provided that for all .
We also define the intersection of a multiset with a set. Given a multiset over and a subset , the intersection is a multiset over defined by if , otherwise . For instance, .
The main object of this paper is an abstract model of monomers and polymers, motivated by systems in DNA nanotechnology. This model captures how simple indivisible molecules (monomers) combine to form complexes (polymers) under specific physical and chemical constraints.
Definition 2.1.
Let be a finite set of monomers, and be a finite set of polymers over these monomers, where each polymer is a multiset of monomers.
Let represent the vector of concentrations for all monomers, and let represent the vector of concentrations for all polymers (also called configuration). The relationship between monomer and polymer concentrations is governed by mass conservation. Specifically, we require
| (1) |
where is a matrix such that each entry specifies the number of monomers of type in polymer .
For example, the polymer contains two copies of , one of , and one of . Note that we will be interested in cases where the set of polymers of interest is a finite (proper) subset of all possible polymers over .
In DNA nanotechnology the monomers are typically DNA strands with different sequences. Polymers are analogous to a multistranded DNA structure composed of multiple DNA strands. We use the term polymer rather than “complex” in order to better emphasize their composition from monomers and to be consistent with the TBN literature.
To model the equilibrium behavior of such systems, we use the free energy formulation in the notation of Dirks et al. [4], where the equilibrium concentrations are obtained by minimizing the following free energy function (corresponding to the pseudo-Helmholtz free energy used throughout chemical reaction network theory literature [8, 6]):
| (2) |
where denotes the concentration of polymer , and is its partition function corresponding to the exponential of the polymer’s negative free energy. The minimization is subject to the mass conservation constraint given in Equation 1.
In this work, we focus on athermic systems where all interactions are equally favored enthalpically. Thus we assume that is constant for all configurations satisfying mass conservation (Equation 1), yielding a simplified cost function that is entirely entropic:
| (3) |
This function serves as the objective of our optimization problem. Its minimizer under the constraint of Equation 1 corresponds to the equilibrium concentration of polymers.
Understanding the equilibrium requires us to formalize how polymers can transform into one another. We do so by defining reconfigurations and the reactions they induce.
Definition 2.2.
Two multisets of polymers are reconfigurations of each other, written , if for every monomer , the total count of is the same in as in ; i.e., . Whenever , we also define reaction .
We occasionally use Greek alphabets to denote the reactions. Note that while is a binary relation, a reaction is an ordered pair representing a directed transformation between the multisets. Our arguments will refer to reactants (left-hand side) and products (right-hand side) of particular reactions.111All reactions are of course reversible, but we treat each direction as a separate reaction.
An important property of the minimum of Equations 2 and 3 is that it satisfies detailed balance over reactions [8, 6]. For us (Equation 3), for any single reaction , the equilibrium concentrations satisfy:
We say that a reaction is balanced when this equality holds. This notion of balance will be central to our characterization of equilibrium concentrations formed by on-target polymers, which we explore next.
The fact that balance leads to the minimum of the free energy is well-established, and we present its sketch in Appendix B for the completeness of the paper.
3 Characterizing On-target Polymers
We now define the set of on-target polymers – intuitively, these are the high-concentration polymers whose equilibrium concentration we set as input to our algorithm.
Definition 3.1.
Let be a set of polymers, and let be a function assigning a concentration exponent to each polymer in . We say that is on-target with concentration exponents if:
-
1.
For every polymer , there are multisets of polymers and where contains such that .
-
2.
For any two multisets , if , then their concentration exponents are equal . Here, the concentration exponent of a multiset is defined as .
While presented in terms of restricting , another (perhaps more apropos) interpretation of the first condition is that we are restricting to be only those polymers that can be obtained via a reconfiguration of polymers over . The interpretation of the second condition is that every reaction among polymers in is in detailed balance. More precisely:
Remark 3.2.
Let , and suppose every on-target polymer has concentration . Then every reaction , where , satisfies and thus the product of the concentrations of the left-hand side polymers is equal to the product of the concentrations of the right-hand side polymers.222Note that we use symbol for concentration exponents because of the direct parallel to the standard notion of chemical potential, which is expressed with logarithmic terms of concentration: chemical potential , where is the mole fraction.333Our base concentration is smaller than one because the units of concentration are mole fractions – the ratio of polymers to solvent molecules, and the derivation of the energy function Equation 2 only holds in the regime of less polymer than solvent [4].
If then we are done, being guaranteed that all reactions are in detailed balance. However, the case of interest is where we do not know the concentration exponents of all polymers – making it the goal of this paper to compute them or bound them to ensure that the equilibrium concentration of off-target polymers is small.
We now define canonical reactions, which are the reactions with reactants from . These reactions can generate polymers outside of from polymers in . Each canonical reaction has key quantities we call imbalance and novelty which will play a key role in our algorithm.
Definition 3.3.
For an on-target set with , a reaction is called canonical if all its reactants are over (i.e., ). Then the imbalance of is defined as , where . The novelty of is defined by .
Given the definition of multiset subtraction, the novelty is always non-negative.
Intuitively, the imbalance of the reaction represents how far from detailed balance it is if we consider only the polymers whose concentrations are known. The novelty is the number of off-target (new) polymers produced by the same reaction.
Intuitively, a large means a larger bias of the reaction toward its reactants, i.e., elements of , prior to assigning concentration exponents to off-target polymers. This is desired since it implies less pressure to make off-target polymers and gives us more room to maneuver in assigning concentrations to them.
Intuitively, a larger novelty means that the canonical reaction is more entropically favored to its products. Note that the term in Equation 3 is minimized when the concentration is more “spread out” over the different species (like Shannon entropy), and thus there is an entropic force to generate new polymers. A large is thus undesired. As justified in the subsequent results, the ratio captures the effective tradeoff between imbalance and novelty.
Note that summing canonical reactions gives another canonical reaction. While on the one hand this leads to infinitely many canonical reactions, on the other hand, this additive property allows us to analyze the set of all canonical reactions using a Hilbert basis of finite size. We explain how to employ the Hilbert basis in our algorithm to avoid the infinity of canonical reactions in Appendix A.
The notion of stability of , defined below, captures the idea that on-target polymers are in higher concentration compared to off-target polymers. Recall that concentration exponents are for polymers in (Definition 3.1). As shown later by Theorem 5.4, the following definition ensures that all concentration exponents of polymers outside of are greater than by our algorithm. Since the base concentration , this implies that off-target polymers are in smaller concentration.
Definition 3.4.
The on-target set with is called stable if, for every canonical reaction with , the ratio .
While our formalism allows different concentration exponents for different polymers in , nearly all insight can be obtained from the simple case of uniform on-target concentration exponents:
Definition 3.5.
The on-target set with is called uniform if every polymer has concentration exponent .
4 Why Balancing is Nontrivial
Our main goal is to ensure that in equilibrium the concentration of on-target polymers is much higher than the others. This section presents two examples illustrating why this is nontrivial.
Consider a uniform set of on-target polymers , aiming at the equilibrium concentration . For two off-target polymers and , suppose that we figured out a canonical reaction
This reaction is in detailed balance if and only if At this point, it is unclear how to balance and without inspecting the other reactions: For example, the reaction is balanced by assigning , but another canonical reaction would not be if it exists.
The following example exhibits a different potential problem. While Remark 3.2 shows that reactions entirely within are balanced, it is not clear that given our choice of concentration exponents for , the reactions involving polymers outside of could be balanced at all. Suppose there are two reactions
These reactions intuitively say that the off-target polymer is non-interacting to other polymers in these reactions so that the above problem of balancing the concentrations over off-target polymers is absent.444Looking ahead, these non-interacting polymers are indeed easier to assign concentrations to as shown in Section 5.1. However, balancing the reaction suggests the concentration at equilibrium, but the reaction gives . In other words, it is unclear if the concentrations of each of the polymers in can be the same or even close to each other.
Despite these difficulties, our main result guarantees that the configuration on on-target polymers we demand is in equilibrium without the above issue, and can be extended to the configuration over all polymers at equilibrium, and ensures that the off-target polymers remain in very low concentration at equilibrium. Intuitively, we prove that there exists a canonical reaction for which we can assign concentrations to product polymers without creating conflicts with other reactions, and we provide a method to find this reaction. Moreover, as we will see later, these assignments occur in order of decreasing concentration allowing us to restrict the concentration of off-target species.
5 Main Result: Concentration of Off-target Polymers
For the remainder, we fix , , and a particular set of on-target polymers with concentration exponents .
To systematically assign concentration exponents to all polymers, we will organize them into hierarchical groups called levels. The process begins with a designated set of polymers , the on-target polymers, which are assigned initial concentration exponents via .
All other polymers, called off-target polymers, will be partitioned into level sets , constructed inductively based on how these polymers appear in certain canonical reactions. At each level , we will compute a scalar value that serves as the concentration exponent for all polymers newly added at that level.
In this way, we gradually extend the initial function to a global function that assigns a concentration exponent to every polymer in the system. This inductive process and the precise definitions of , , and will be described in detail below.
Definition 5.1.
Given , assume is assigned for any . For a canonical reaction , we define . The th-level imbalance of is defined as , and the th-level novelty by .
While technically depends on the level index , we suppress this dependency in the notation for simplicity; it will be clear from the context that it is updated at each level. Additionally, by the definition of a canonical reaction, we have in the expression above.
Definition 5.2.
Let where the minimum is taken over all canonical reactions with .555The minimum can actually be taken over a finite subset of canonical reactions using Hilbert basis. We refer the reader to Appendix A for more detail. The canonical reactions achieving the minimum are termed -levelizing canonical reactions. The th level set is defined as the set of all polymers appearing in any -levelizing reactions not already in . Every polymer is assigned concentration exponent .
At the heart of this inductive construction is the requirement that each -levelizing reaction maintains balance with respect to the assigned concentration exponents. That is, for every reaction used to define level , the assigned exponents ensure the reaction remains detailed balanced, . This holds because each polymer in that already appeared in previous levels contributes its already-defined exponent, while newly introduced polymers in all receive the same value . Decomposing into previously levelized polymers and new polymers in , we obtain:
ensuring that the total concentration exponent on both sides of the reaction is equal. This confirms that the assignment of for polymers in level is consistent with the balance requirements dictated by the underlying reactions.
The full procedure for determining the level sets and the corresponding concentration exponents is formally specified in Definitions 5.1 and 5.2 and carried out through the step-by-step process described in Algorithm 1. This algorithm inductively builds the extended concentration exponent function by identifying polymers that can be levelized at each step and assigning them an appropriate exponent value based on their participation in canonical reactions.
While the algorithm itself does not explicitly state a stopping condition, its termination is ensured by the structure of the level construction process. Each time a new level is defined, it includes at least one polymer not present in any previous level. Since the set of all polymers is finite by assumption, only finitely many levels can be introduced. As a result, the inductive process must terminate after assigning a level and concentration exponent to each polymer, thus completing the definition of the extended function .
For us, canonical reactions represent the simplest meaningful interactions and serve as building blocks for more complex behavior. Their central role for our results is motivated by the following lemma:
Lemma 5.3.
Let be a vector of concentrations of the monomers. If all canonical reactions are balanced at configuration , then the cost function is minimum subject to .
Proof.
We will show that any arbitrary reaction can be decomposed into a canonical reaction. Consequently, if detailed balance holds for all canonical reactions, it follows that detailed balance holds for all reactions. Per Appendix B, the cost function reaches its minimum – under the constraint – when all reactions are balanced. This confirms the statement of the lemma.
Consider an arbitrary non-canonical reaction with where denotes the union of two multisets and . According to the first condition of Definition 3.1, there exists a canonical reaction that produces . Now, apply and sequentially on the reactants . This gives rise to a new reaction:
Therefore, the reaction , when catalyzed by (i.e., adding to reactants and products) and using the inverse of the canonical reaction , can be replaced by the reaction , which involves fewer reactant polymers outside of than did originally:
By repeating this procedure, decomposes into a canonical reaction. This concludes the proof.
With this groundwork in place, we now state our main theorem, which characterizes the equilibrium distribution of the entire polymer system, both on-target and off-target, in terms of their assigned levels.
Theorem 5.4.
Let be the stable set of on-target polymers with concentration exponents . For the extended concentration exponent generated by Algorithm 1 and for any , there are monomer concentrations such that the configuration with each polymer at concentration is the minimum of subject to (i.e., Equations 3 and 1). Furthermore, for all .
In this configuration, every off-target polymer has strictly lower concentration than any on-target polymer. This is because concentrations scale exponentially with the extended exponent , and off-target polymers are assigned strictly larger exponents than the shared minimal value of the on-target set . As a result, for , off-target concentrations are exponentially suppressed.
Proof of Theorem 5.4.
We use a proof by contradiction to show that each canonical reaction must be balanced in the configuration with , which suffices to conclude the proof thanks to Lemma 5.3.
Suppose that there exists a canonical reaction . We consider two cases or . Recall all polymers are levelized.
Case 1.
Let be the top level among the polymers in . Note that for each . Since is defined as the minimum of over all canonical reactions with , we must have:
which is a contradiction.
Case 2.
Let be the levelizing reaction including as product for each polymer . Consider a canonical reaction obtained by summing copies of which includes as products.666Here we use the linear combination of reactions in a standard way; for example, for and , the reaction denotes the reaction . By applying to this reaction, we have a canonical reaction such that
Therefore this case is reduced to Case 1, leading once again to a contradiction.
The “Furthermore” part is proven in Corollary 6.4.
5.1 Non-Interacting Off-target Polymers
Our main theorem implies that if the on-target polymers and the corresponding concentration exponents are chosen appropriately, the equilibrium exists with the extended concentration exponents consistent with . In this section, we show that exponents for non-interacting off-target polymers can be assigned in a particularly easy way. Non-interacting off-target polymers are those that can be independently made by a canonical reaction:
Corollary 5.5.
Let be a stable set of on-target polymers. Suppose that an off-target polymer is a product of a canonical reaction for some multisets and from .777Note that there may be other reactions involving with other off-target polymers. Then
Proof.
By Theorem 5.4, the reaction is in detailed balance, , and we take the logarithm of both sides.
The same idea extends further. At any point of the execution of the algorithm, suppose that the extended concentration exponents of the set of polymers have been already assigned. For a non-interacting polymer outside of with the reaction for multisets and from , we can assign This assignment is valid by the same reason to the corollary.
Example 5.6.
Consider the set of monomers and polymers given by and ; where , , , , , and . Consider the uniform on-target set . Instead of inspecting all canonical reactions, we can focus on three canonical reactions
Here is the non-interacting off-target polymer in , thus . After assigning , and become non-interacting off-target polymers in and , respectively, so that we derive and .
6 Framework for Upper-Bounding Off-target Polymers
Often we are only interested in an upper bound on the concentration of an off-target undesired polymer. In this case, rather than generating its exact equilibrium concentration via Algorithm 1, we can more efficiently compute an upper bound by narrowing our focus to reactions that directly produce instead of examining the full set of canonical reactions. This leads to a simpler surrogate quantity that approximates from below.
Definition 6.1.
For , let where the minimum is taken over all canonical reactions that include as a product.
Since has not yet been levelized by construction, any such reaction producing , obviously involves at least one unlevelized polymer, ensuring .
Intuitively, captures a conservative estimate of the exponent with which polymer can grow in concentration at level , based only on reactions that actually produce . Since it is defined as a minimum over a subset of possible reactions, it is easier to compute than the full concentration exponent , yet is still meaningful for analysis:
Theorem 6.2.
For any polymer , .
Lemma 6.3.
For a canonical reaction with , it holds that .
Proof.
Note that by definition of . Let be the count of level- product polymers in , which must be smaller than . Then we have
Proof of Theorem 6.2.
Since all polymers will eventually be levelized, there exists a -levelizing canonical reaction that includes as a product for some . Then, where the first inequality follows from Definition 6.1, and the second follows from Lemma 6.3.
Furthermore, the value is computed by taking the minimum of only over canonical reactions that produce (and have ). In contrast, is defined as the minimum over all canonical reactions, regardless of whether they produce or not. In other words, the set of reactions considered when computing is a subset of those considered for , making the search space for more restricted. Since a minimum over a smaller set cannot be smaller than the minimum over a larger set, we conclude that .
To summarize, serves as a computationally efficient lower bound on . When used alongside Theorem 5.4, this upper bounds the equilibrium concentration of .
The usefulness of Theorem 6.2 extends beyond individual estimates. It contributes to structural insights about level assignments during the iterations of our algorithm. Specifically, after levelizing up through , we know that any unlevelized polymer satisfies . This inequality ensures that polymers awaiting level assignment must exhibit smaller concentrations. In particular, because the first level imbalance and novelty are precisely identical to and defined in Definition 3.3, we derive and the following corollary.
Corollary 6.4.
for all .
7 Concentrations of Polymers in the TBN Model
While the TBN model is combinatorial in nature, quantifying over discrete (saturated) configurations, at the end we are most often interested in determining real-valued concentrations, which are accessible to (bulk) wet-lab experiment. The framework developed in this paper helps to bridge this gap between combinatorics of discrete configurations and concentrations.888Of course, much of the heavy lifting in bridging this gap is done by the derivation [4] of the cost function , but our work expands on it beyond numerical simulation.
Consider a saturated (i.e., maximally bonded) configuration in the TBN model. Quantified over all saturated reconfigurations , the key quantity of interest in the TBN model is the “entropy” of , defined as the number of polymers in (i.e., ). The TBN model defines the “stable” configurations to be those that have maximum entropy among all saturated configurations.
Corresponding to the TBN literature [5], a multiset of polymers is called TBN-stable if any reaction has . We say that a set of polymers is TBN-stability closed if every multiset is TBN-stable and further any reaction where contains a polymer outside of (i.e., for some ) satisfies . In other words, producing a polymer outside of a TBN-stability closed set costs entropy. The following lemma connects our notion of stability of on-target polymers (Definition 3.4) to TBN-stability.
Lemma 7.1.
If is TBN-stability closed and satisfies condition (1) of Definition 3.1, then is on-target with for all (i.e., uniform) and stable.
Proof.
We first check is on-target with . For a reaction where and are multisets over , . Applying the same argument to yields . Combining the two gives , proving with is an on-target set.
Now we will prove that uniform on-target is stable. Consider a canonical reaction with . By definition, we have and . Therefore, the condition for being stable, namely , is implied by .
Thus, the polymers in the TBN-stability closed set represent the intended “high-concentration” polymers, while everything outside of is considered undesired (off-target).
Let canonical reaction be ; we define to be the entropy loss of the reaction . Recall that during the first iteration of our algorithm, novelty is the number of off-target polymers generated in canonical reaction . The imbalance in the first iteration can be understood in terms of the entropy loss of the reaction: the decrease in the number of polymers of a reaction is exactly . Thus to have an upper bound on the concentration of off-target polymers via Theorem 6.2, it is sufficient to find the smallest ratio of any reaction:
Corollary 7.2.
Let set be a TBN-stability closed set of polymers. Let minimized over all reactions where , and . For any , there are monomer concentrations and configuration that minimizes subject to where satisfies: each polymer has concentration exactly , and each polymer has concentration not more than .
In other words, if is a TBN-stability closed set of polymers then we can assign concentration to each of them (uniform assignment). Then we consider the “worst” way to generate any polymers outside of (i.e., off-target): the way that has the smallest entropy loss and the largest novelty. The ratio of the entropy loss to novelty gives us an upper bound on the concentration exponent of any off-target polymer, bounding its concentration by . The smallest entropy loss to generate off-target polymers is already the key item of interest in the TBN literature. Thus the above corollary helps to bootstrap concentration bounding arguments via existing arguments based on entropy loss.
8 Example Applications
In this section we show several applications of our mathematical tools in the analysis of existing systems of interest in the DNA molecular programming literature. We base our arguments on previous results proving the entropy loss (quantity of Corollary 7.2) of the systems in producing off-target (leak) species.
We note that while existing literature succeeds in characterizing entropy loss via TBN-like combinatorial arguments, more work is needed to develop similarly rigorous combinatorial arguments on novelty (quantity ); see also the discussion in Section 9. In the examples to follow, we claim to have identified the worst-case canonical reactions – i.e., having the least or ratio – without proof.
We first consider the TBN AND gate introduced in prior work [5, 2] and recently experimentally realized [14]. Figure 1 (left) shows the desired functionality of the AND gate in which inputs and cooperate to produce . We are interested in bounding the concentration of (leak) that can be produced in the absence of inputs and . Phrased in our terminology, combinatorial TBN arguments have shown that is TBN-stability closed, as well as and where one of the two inputs is present. This implies that any canonical reaction producing has entropy loss . However, such arguments do not directly connect entropy loss to leak concentrations, justifying the need for a tool like our Corollary 7.2.
To apply our framework to the TBN AND gate, we use Corollary 7.2. In the absence of both inputs, take . We claim that the worst-case canonical reaction producing is shown in Figure 1 (middle). This reaction has entropy loss and novelty since and are outside of . Thus for any base concentration , there is an equilibrium with where the leak concentration is .
Similarly, consider the case of having input but no input . We take and claim999While we do not provide a formal proof that this reaction is worst-case, our confidence is based on the observation that we cannot increase the novelty further without increasing the entropy loss in proportion. For example, we can combine two copies of reaction to yield reaction but then and , maintaining the same ratio. that the worst-case canonical reaction producing is as shown in Figure 1 (right). This reaction has entropy loss and novelty since , , and are outside of , yielding the equilibrium leak concentration of . Our analysis thus provides concrete polynomial upper bounds on leak, consistent with the qualitative expectation of the TBN model that leak should become comparatively negligible in the limit of decreasing . The bound is smaller when both inputs are absent than when is present.
As the next example, we consider the “leakless” DNA strand displacement system previously theoretically and experimentally studied [15]. That work focuses on a family of “translator” modules that convert an input strand to an output strand of independent sequence. The family is parameterized by the redundancy parameter defined as the number of bound domains in each fuel polymer , such that the number of domains in each signal is . Leak is expected to decrease with decreasing overall concentration (as for the AND gate), as well as with increasing redundancy . The decrease of leak with increasing was confirmed by experiment, at least for small . The reactions of the translator with are shown in Figure 2.
To apply our framework, we consider the system without toeholds, driven solely by entropy; with long domains alone, we are in the enthalpy neutral (athermic) regime. Prior work focused on the case where the system was prepared in a state with only fuel polymers (), all at equal concentration, and zero initial concentration of waste polymers (). Let on-target set be with uniform . Rephrasing in our terminology, ref. [13] proved that any canonical reaction where has entropy loss , and this fact was used to argue that by increasing we can make leak arbitrarily small. We now show that considering novelty in addition to entropy loss makes this argument problematic, and suggest an alternative parameter setting to arbitrarily decrease leak.
Consider a cascade of two translators. Importantly, the number of fuels for a single translator module is ; i.e., and overlap in sequence for but are completely sequence-independent for . We claim that the worst-case leak pathway is where the first translator leaks resulting in the triggering of the second translator. This pathway generates new off-target polymers: for the leaked upstream translator (all fuels coming together), for the triggered fuels of the second translator, and for the output. Thus . By Corollary 7.2, the concentration of the leak product in the uniform setting is bounded above by for . The bound tightens to with increasing ; however, it does not get arbitrarily small. This suggests that we do not decrease equilibrium leak concentration arbitrarily by increasing “redundancy” because while the entropy loss increases, the novelty increases as well.
To arbitrarily decrease leak, we propose to use positive initial concentrations of the waste polymers . The fuel polymers are denoted , and the waste polymers produced after translator triggering are . Let represent the input and represent the output. We have for two translators composed together. This cascade of length proceeds through the following reactions described in Figure 2:
for .
We first consider the triggered system (with-input) showing at least a constant fraction of the signal is propagated to the end as . We then focus on the case without input and bound the leak. Note that this analysis utilizes Theorem 5.4 directly rather than Corollary 7.2 because we will have different concentration exponents for and in .
For the triggered (with-input) system we define our on-target set as . We assign concentrations to the on-target polymers as follows: All fuel concentrations are equal to , and all waste concentrations are equal to . These concentrations correspond to concentration exponents and . Let the concentration of the output in the final layer be and assign balancing concentrations of the other . Since , we have , meaning that more than half of the total signal () is at the output layer.
Now, we investigate the system without input. Let . For the situation to properly correspond to the with-input case, we need to ensure that all monomer concentrations are the same between the two cases, except removing the monomer corresponding to input . Rather than thinking about specific monomers, we start in the with-input case and conceptually run reactions to completely push all to , and then remove from the system.101010More precisely, this ensures that the concentrations of monomers making up on-target polymers are the same between the with-input and without-input case (other than ). Since the total amount of all is less than in the with-input case, this results in: and . These correspond to and .
Recall that redundancy results in entropy penalty . We claim that the reaction with the smallest imbalance-novelty ratio (i.e., worst-case) is reaction :
where is the “large polymer” formed after all displace the top strand from , and is the leak output. The imbalance of this reaction is:
| (4) |
We can ensure that is at least a constant fraction of for small enough . For example, if we let , then for any . The novelty is independent of : since and are not in . Therefore, the imbalance-novelty ratio of the worst case reaction is at least , which increases linearly with (recall for two translators composed together). Applying Theorem 5.4 leads to leak concentration of at most . This upper bound111111Note that this upper bound is loose because of inequalities such as Equation 4. implies smaller-than concentration of leak for , with the leak exponentially decreasing for larger .
To summarize, by increasing redundancy in the appropriate regime, we maintain the property that a constant fraction of the input is converted to output in the with-input case, while arbitrarily (exponentially in ) decreasing leak in the without-input case.
9 Discussion
Our results suggest a few important directions for future work. Given the central role of worst-case canonical reactions – i.e., canonical reactions with the lowest imbalance-novelty ratio (for Algorithm 1 and Theorem 5.4) or entropy loss-novelty ratio (Corollary 7.2) – it is important to develop formal techniques to prove that a given canonical reaction is indeed worst-case overall, at least or for a particular off-target polymer (for Section 6). Note that while combinatorial techniques in prior work in the TBN model have focused on proving entropy loss, more work is needed to study the ratio directly, making our framework more easily applicable. While we believe the canonical reactions highlighted in Section 8 are indeed worst-case, the argument is informal.
Another promising avenue of research is to establish a more direct link between a polymer’s monomer composition and its equilibrium concentration. Our current framework is effectively reaction-centric, inferring concentrations based on how polymers transform into one another. An alternative approach could be to derive concentration bounds directly from the structural properties of the off-target polymers, such as their size (monomer count) or degree of overlap with one another (multiset difference). Nonetheless, we hope that a variety of structure-based results could be proven based on a reduction to our canonical reaction framework.
Finally, this work has focused exclusively on the athermic case, where all molecular interactions are enthalpically neutral. While this is a reasonable and useful abstraction for systems with strong, saturated bonds, many real-world molecular systems, including many popular in DNA molecular programming, involve a range of binding strengths and enthalpic effects (e.g., from toehold binding). Extending our algorithmic framework to incorporate user-specified ’s for each polymer could significantly broaden its applicability, and although this would complicate our algorithm, we do not anticipate any insurmountable difficulties.
References
- [1] Keenan Breik, Cameron Chalk, David Doty, David Haley, and David Soloveichik. Programming substrate-independent kinetic barriers with thermodynamic binding networks. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 18(1):283–295, 2021. doi:10.1109/TCBB.2019.2959310.
- [2] Keenan Breik, Chris Thachuk, Marijn Heule, and David Soloveichik. Computing properties of stable configurations of thermodynamic binding networks. Theor. Comput. Sci., 785:17–29, 2019. doi:10.1016/j.tcs.2018.10.027.
- [3] W. Bruns, B. Ichim, C. Söger, and U. von der Ohe. Normaliz. Algorithms for rational cones and affine monoids. Available at https://www.normaliz.uni-osnabrueck.de.
- [4] Robert M Dirks, Justin S Bois, Joseph M Schaeffer, Erik Winfree, and Niles A Pierce. Thermodynamic analysis of interacting nucleic acid strands. SIAM review, 49(1):65–88, 2007. doi:10.1137/060651100.
- [5] David Doty, Trent A. Rogers, David Soloveichik, Chris Thachuk, and Damien Woods. Thermodynamic binding networks. In DNA Computing and Molecular Programming, 23rd International Conference, DNA 23, pages 249–266, 2017. doi:10.1007/978-3-319-66799-7_16.
- [6] Martin Feinberg. Foundations of chemical reaction network theory. Springer, 2019.
- [7] Stefan Hoops, Sven Sahle, Ralph Gauges, Christine Lee, Jürgen Pahle, Natalia Simus, Mudita Singhal, Liang Xu, Pedro Mendes, and Ursula Kummer. COPASI—a complex pathway simulator. Bioinformatics, 22(24):3067–3074, 2006. doi:10.1093/BIOINFORMATICS/BTL485.
- [8] Fritz Horn and Roy Jackson. General mass action kinetics. Archive for rational mechanics and analysis, 47:81–116, 1972.
- [9] Matthew D Johnston, Stefan Müller, and Casian Pantea. A deficiency-based approach to parametrizing positive equilibria of biochemical reaction systems. Bulletin of Mathematical Biology, 81:1143–1172, 2019.
- [10] A. M. M. Leal. Reaktoro: An open-source unified framework for modeling chemically reactive systems, 2015. URL: https://reaktoro.org.
- [11] John AP Sekar, Justin S Hogg, and James R Faeder. Energy-based modeling in BioNetGen. In 2016 IEEE international conference on bioinformatics and biomedicine (BIBM), pages 1460–1467. IEEE, 2016.
- [12] Elie Soloveichik, Leo Orshansky, Cameron Chalk, and Boya Wang. Concentrat.io, 2025. Accessed 25 June 2025. URL: https://concentrat.io/.
- [13] Chris Thachuk, Erik Winfree, and David Soloveichik. Leakless DNA strand displacement systems. In Andrew Phillips and Peng Yin, editors, DNA Computing and Molecular Programming, pages 133–153, Cham, 2015. Springer International Publishing. doi:10.1007/978-3-319-21999-8_9.
- [14] Boya Wang, Cameron Chalk, David Doty, and David Soloveichik. Molecular computation at equilibrium via programmable entropy. bioRxiv, 2025. doi:10.1101/2024.09.13.612990.
- [15] Boya Wang, Chris Thachuk, Andrew D. Ellington, Erik Winfree, and David Soloveichik. Effective design principles for leakless strand displacement systems. Proc. Natl. Acad. Sci. USA, 115(52):E12182–E12191, 2018. doi:10.1073/pnas.1806859115.
- [16] Boya Wang, Chris Thachuk, and David Soloveichik. Speed and correctness guarantees for programmable enthalpy-neutral DNA reactions. ACS Synthetic Biology, 12(4):993–1006, 2023.
- [17] Joseph N Zadeh, Conrad D Steenberg, Justin S Bois, Brian R Wolfe, Marshall B Pierce, Asif R Khan, Robert M Dirks, and Niles A Pierce. NUPACK: analysis and design of nucleic acid systems. Journal of computational chemistry, 32(1):170–173, 2011. doi:10.1002/JCC.21596.
Appendix A Hilbert Basis Implementation
This section recalls some results on Hilbert bases and explains how to use the Hilbert basis in Algorithm 1. While this section makes the main theorem mathematically rigorous, most readers can safely skip this section.
The main contents of this section are (1) the termination of Algorithm 1 and (2) the use of minimum (versus infimum) in Definition 5.2. We address both concerns by showing that, although there are infinitely many canonical reactions, we can always restrict our attention to a finite subset of them in our analysis and algorithm.
For integral vectors , the set is called a (rational polyhedral) cone, which is also known to be described by a system of inequalities for some matrix . It is known that the set has a finite subset , called Hilbert basis of , that generates with non-negative integer coefficients, that is, for any there are such that
The set of canonical reactions can be precisely described in terms of a rational polyhedral cone. For a canonical reaction , we define a vector
capturing the stoichiometric change of concentrations due to reaction . Note that for , thus for the delta function iff . Definition 2.2 ensures that this vector must satisfy the condition . Combining these, the cone
characterizes the canonical relations: is the set of vectors . Therefore, there exists a finite set of canonical reactions that corresponds to the Hilbert basis .
The following lemma implies that for the purposes of our algorithm and analysis, we can focus on the Hilbert basis of canonical reactions.
Lemma A.1.
Let and be canonical reactions with . Then, for any , it holds that
| (5) |
The equality holds only when , , or .
Proof.
It is not hard to see that and are linear, i.e., and . Then, Equation 5 is identical to the mediant inequality, which states that for with it holds that Consider a canonical reaction for and . If is -levelizing, then the equality must hold, and the equality condition above ensures that for all . In other words, the set of -levelizing reactions is
This allows us to inspect the minimum over the finite set instead of in Definition 5.2. Similarly, as the Hilbert basis can be computed in finite time and implemented in [3], we can run Algorithm 1 with guaranteed termination by computing the minimum over .
Appendix B Detailed Balance and Equilibrium
Recall that is the matrix such that each entry specifies the number of monomers of type in polymer , and that (Equation 1) captures the mass-conservation constraint.
Theorem B.1.
Let be a fixed vector of monomer concentrations. If all reactions are balanced at the configuration of polymer concentrations, then the cost function is minimum subject to .
Proof.
(Sketch) The function is strictly convex since its Hessian is positive definite (specifically diagonal with ). Strict convexity of implies that the local minimum of becomes the unique (global) minimum.
We associate a vector with every reaction , capturing the net stoichiometric effect of reaction .121212This is the same vector as defined in Appendix A. For example, corresponds to for . It is straightforward to show that the function along with the direction of has zero derivative at if and only if the reaction is balanced at . More explicitly, for , the -th entry of the vector is . The directional derivative implies that holds, which means the reaction is balanced.
The set is spanned by the vectors for all reactions by Definition 2.2. Therefore, if all reactions are balanced at , then any directional derivative at vanishes and is a critical point, which is the unique minimum.
