Smaller Circuits for Bit Addition

Goncharov, Mikhail; Kulikov, Alexander S.; Levtsov, Georgie

doi:10.4230/LIPIcs.STACS.2026.46

Mikhail Goncharov (Neapolis University Pafos, Cyprus; JetBrains Research, Pafos, Cyprus)
Alexander S. Kulikov (JetBrains Research, Pafos, Cyprus)
Georgie Levtsov (Neapolis University Pafos, Cyprus; JetBrains Research, Pafos, Cyprus)

Abstract

Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it. A vast majority of these results are devoted to optimizing the circuit depth (also known as delay).

In this paper, we investigate the circuit size (also known as area) of bit addition over the full binary basis. Most of the known circuits are built from Half Adders and Full Adders, as suggested by Dadda in 1965 for designing multiplier circuits. Applying these ideas to the bit addition function, one gets a $5n-3m$ upper bound on its circuit size, where $n$ is the number of input bits and $m$ is the number of output bits. We prove an upper bound of $4.5n-2m$. In the regimes where $m$ is small compared to $n$ (for example, for computing the sum of $n$ bits or multiplying two $n$-bit integers), this leads to a $10\%$ improvement. We also show that it is provably impossible to improve the two upper bounds above to $5n-3.01m$ or $4.5n-2.51m$. We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with $m=n+1$) is equal to $2n$.

We complement our theoretical result by an open-source implementation of generators producing circuits for bit addition and multiplication. The generators allow one to produce the corresponding circuits in two lines of code and to compare them to existing designs.

Keywords and phrases:

bit addition, summation, multiplier, multiplication, Boolean, circuit, synthesis, combinational, digital

2012 ACM Subject Classification:

Theory of computation $\rightarrow$ Logic and verification; Theory of computation $\rightarrow$ Complexity theory and logic; Theory of computation $\rightarrow$ Circuit complexity

Supplementary Material:

Software (Source Code): https://github.com/spbsat/cirbo, archived at swh:1:dir:bb1d9d50044cfe5a05b7c8dadc7404b306c90cd1

Acknowledgements:

We thank the anonymous reviewers for many helpful comments.

Event:

43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026)

Editors:

Meena Mahajan, Florin Manea, Annabelle McIver, and Nguyễn Kim Thắng

Series and Publisher:

Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik

1 Overview

Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Three specific scenarios where it is used frequently are listed below.

$\blacksquare$ Adding two $n$-bit numbers.

$\blacksquare$ Computing a symmetric Boolean function (such as majority or sorting). A natural way of doing this is to first compute the binary representation of the sum of $n$ input bits (that is, to compress $n$ bits into about $\log_{2}n$ bits) and then to compute the function at hand out of the computed binary representation.

$\blacksquare$ To multiply two $n$-bit numbers, one may first compute all partial products (that is, products of the bits of the two input numbers) and then sum up the resulting bits.

In terms of the dot-notation introduced by Dadda [4], the three scenarios discussed above are visualized as shown in Figure 1. In this notation, one places bits of the same significance on the same vertical layer.

Figure 1: Dot diagrams for three Boolean functions: $\operatorname{ADD}_{5}$ adds two five-bit numbers, $\operatorname{SUM}_{5}$ adds five bits, and $\operatorname{MULT}_{5}$ adds five (appropriately shifted) five-bit numbers.

There are many other cases where one needs to add bits. Say, one may want to add a single bit to an $n$ -bit number (the increment operation is a special case), or to add three $n$ -bit numbers, or to add a few bits of varying significance, see Figure 2.

Figure 2: More scenarios of adding bits of varying significance.

A function capturing all such scenarios is known as the bit adder

\operatorname{BA}_{n}^{s_{1},\dotsc,s_{n}}\colon\{0,1\}^{n}\to\{0,1\}^{m}.

It is parameterized by the significance vector $s=(s_{1},\dotsc,s_{n})\in\mathbb{Z}_{\geq 0}^{n}$ , takes $n$ input bits $(x_{1},\dotsc,x_{n})\in\{0,1\}^{n}$ , and outputs the binary representation of

\sum_{i=1}^{n}2^{s_{i}}x_{i}.

This way, $\operatorname{SUM}_{n}=\operatorname{BA}^{0,0,\dotsc,0}_{n}$ and $\operatorname{ADD}_{n}=\operatorname{BA}^{0,0,1,1,\dotsc,n-1,n-1}_{2n}$ .

Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it, see, for example, [9, 8, 12, 2]. A vast majority of these results are devoted to optimizing the circuit depth (also known as delay). In this paper, we investigate the circuit size (also known as area) of bit addition. Specifically, we study circuits over the full binary basis.

Two basic building blocks for adding bits are known as Half Adder (HA) and Full Adder (FA). They compute the binary representation of the sum of two and three bits, respectively (that is, $\operatorname{SUM}_{2}$ and $\operatorname{SUM}_{3}$ ). In the full binary basis, they can be implemented in two and five gates, respectively, see Figure 3.

Figure 3: The Half Adder (top) and Full Adder (bottom): dot diagrams and circuits.

Using Half Adders and Full Adders, one can synthesize a bit adder using the following algorithm that goes back to Napier’s Rabdologiæ (1617), as modernized by Dadda [4].

Process the bits layer by layer, in the order of increasing significance. While the current significance layer $i$ contains at least three bits, take three of them and apply the Full Adder to replace them with a pair of bits of significance $i$ and $i+1$ . If there are two bits left at the current layer $i$ , apply the Half Adder to them to get a pair of bits of significance $i$ and $i+1$ .

This algorithm ensures that, for any vector $s\in\mathbb{Z}_{\geq 0}^{n}$ ,

\operatorname{size}(\operatorname{BA}_{n}^{s})\leq 5n-3m.

Indeed, each application of the Full Adder reduces the number of bits by one, hence the total cost of all Full Adders is at most $5(n-m)$ . The Half Adder is applied at most once for each of the significance layers, hence the total cost of all Half Adders is at most $2m$ . Hence, the total size is at most $5(n-m)+2m=5n-3m$ . Note that most of the previous works consider specific special cases of bit addition (like summation or multiplication) where $m$ is a specific function on $n$ . In such cases, the upper bounds are stated in terms of $n$ only. The parameter $m$ arises naturally when one considers the general bit addition function.
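The counting argument above can be checked mechanically. The following is a minimal sketch that simulates the layer-by-layer procedure and only counts gates (it does not build a circuit). The function name `dadda_gate_count` is an illustrative choice, and $m$ is taken to be the number of bits the procedure leaves behind.

```python
from collections import Counter


def dadda_gate_count(s):
    """Simulate the layer-by-layer procedure; return (gates, outputs)."""
    layers = Counter(s)          # significance -> number of bits on that layer
    gates = 0
    outputs = 0
    while layers:
        i = min(layers)          # lowest non-empty significance layer
        k = layers.pop(i)
        while k >= 3:            # Full Adder: three bits -> sum (layer i), carry (layer i+1)
            k -= 2
            layers[i + 1] += 1
            gates += 5
        if k == 2:               # Half Adder: two bits -> sum (layer i), carry (layer i+1)
            layers[i + 1] += 1
            gates += 2
        outputs += 1             # exactly one bit remains on layer i
    return gates, outputs


for s in ([0] * 5, [0] * 16, list(range(6)) * 2, list(range(8))):
    gates, m = dadda_gate_count(s)
    assert gates <= 5 * len(s) - 3 * m
    print(len(s), m, gates)
```

For the all-zero vector of length five, the count is $12$ with three outputs, which agrees with the circuit of two Full Adders and one Half Adder for $\operatorname{SUM}_{5}$.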

By applying this algorithm to partial products of bits of two input $n$ -bit numbers, one gets the well-known Dadda multiplier circuit [4]. For many vectors $s$ , the upper bound $5n-3m$ is loose: it does not match the size of the actual circuit produced by the algorithm. A straightforward example is $s=(0,1,\dotsc,n-1)$ : in this case, no gates are needed whereas the upper bound is $2n$ . It is also worth noting that, in some cases, the resulting circuit is provably optimal. For example, for the $\operatorname{ADD}_{n}$ function (that computes the sum of two $n$ -bit integers), the method constructs a circuit out of a single Half Adder and $(n-1)$ Full Adders. The resulting circuit is known as ripple-carry adder and has size $5n-3$ . Red’kin [10] proved that there is no smaller circuit for this function.

At the same time, in many scenarios, not only is the bound $5n-3m$ loose, but the circuit produced by the algorithm is also suboptimal. For example, for $\operatorname{SUM}_{5}$, it gives a circuit of size $12$ consisting of two Full Adders and one Half Adder, see Figure 4. However, $\operatorname{SUM}_{5}$ can be computed by a circuit of size $11$, as shown in [7] (see also Figure 7 later in the text). In general, whereas the algorithm produces a circuit of size about $5n$ for $\operatorname{SUM}_{n}$, this function can be computed by a circuit of size about $4.5n$, as shown by Demenkov et al. [5].

Figure 4: A circuit of size $12$ computing $\operatorname{SUM}_{5}$ composed of two Full Adders and one Half Adder: dot notation (top left), block structure (bottom left), and a circuit (right).

In this paper, we generalize the construction by Demenkov et al. Namely, we prove an upper bound $4.5n-2m$ for the circuit size of bit addition. In the regimes where $m$ is small compared to $n$ , this gives a circuit that is about $10\%$ smaller. This applies to the Dadda multiplier. We complement our theoretical result by an open source implementation of generators producing circuits for bit addition and multiplication.

We also show that it is provably impossible to improve the two upper bounds above to $5n-3.01m$ or $4.5n-2.51m$ . We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with $m=n+1$ ) is equal to $2n$ .

2 General Setting

In this section, we formally introduce the Boolean functions studied in this paper as well as the main building blocks for computing them.

2.1 Boolean Functions

The main Boolean function studied in this paper is the bit adder

\operatorname{BA}_{n}^{s_{1},\dotsc,s_{n}}\colon\{0,1\}^{n}\to\{0,1\}^{m}.

It computes the binary representation of the weighted sum of input bits:

\sum_{i=1}^{n}2^{s_{i}}x_{i}.

In most interesting scenarios, all bits of the binary representation of this sum depend on the input and the number of outputs can be expressed as follows:

m=\left\lceil\log_{2}\left(\sum_{i=1}^{n}2^{s_{i}}+1\right)\right\rceil.

In such cases,

\operatorname{BA}(x_{1},\dotsc,x_{n})=(y_{0},\dotsc,y_{m-1})\colon\sum_{i=1}^{n}2^{s_{i}}x_{i}=\sum_{i=0}^{m-1}2^{i}y_{i}.

However, for some other significance vectors, some of the bits of the binary representation of the sum are identically equal to zero (and thus, do not depend on the input). We exclude such bits from the outputs. Thus, more generally, when we say that

\operatorname{BA}(x_{1},\dotsc,x_{n})=(y_{0},\dotsc,y_{m-1}),

we mean that there exists a vector $t=(t_{0},\dotsc,t_{m-1})\in\mathbb{Z}_{\geq 0}^{m}$ such that $t_{0}<t_{1}<\dotsb<t_{m-1}$ and

\sum_{i=1}^{n}2^{s_{i}}x_{i}=\sum_{i=0}^{m-1}2^{t_{i}}y_{i}.

It is not difficult to see that the vector $t$ is unique and that $m\leq n$ .

This way, the goal of bit addition is to “flatten” the distribution of bits, that is, to leave at most one bit at each significance layer. Figure 5 gives an example.

Figure 5: The function $\operatorname{BA}_{7}^{0,1,1,5,5,5,6}\colon\{0,1\}^{7}\to\{0,1\}^{6}$ replaces seven bits of significance $(0,1,1,5,5,5,6)$ with six bits of significance $(0,1,2,5,6,7)$.
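For small significance vectors, the output significances $t$ (and hence $m$) can be found by brute force. The sketch below, with the illustrative name `output_positions`, collects all values the weighted sum can take and keeps the bit positions that are set in at least one of them. Since the all-zero input gives the sum $0$, these are exactly the bits that depend on the input.

```python
def output_positions(s):
    """Return the significances t_0 < ... < t_{m-1} of the output bits of BA^s."""
    sums = {0}                                   # all values the weighted sum can take
    for si in s:
        sums |= {v + (1 << si) for v in sums}
    mask = 0
    for v in sums:                               # a position matters iff some sum sets it
        mask |= v
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


print(output_positions([0, 1, 1, 5, 5, 5, 6]))   # the vector from Figure 5
```

For the vector of Figure 5, this yields the six significances $(0,1,2,5,6,7)$. The set of sums may grow exponentially with $n$, so this is meant for experiments only.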

Many practically important Boolean functions can be computed using bit summation.

$\blacksquare$ The function $\operatorname{SUM}_{n}\colon\{0,1\}^{n}\to\{0,1\}^{\lceil\log_{2}(n+1)\rceil}$ computes the sum of $n$ bits:

$\operatorname{SUM}_{n}(x_{1},\dotsc,x_{n})=\operatorname{BA}_{n}^{0,0,\dotsc,0}(x_{1},\dotsc,x_{n}).$

$\blacksquare$ The function $\operatorname{ADD}_{n}\colon\{0,1\}^{2n}\to\{0,1\}^{n+1}$ computes the sum of two $n$-bit numbers:

$\operatorname{ADD}_{n}(x_{0},\dotsc,x_{n-1},y_{0},\dotsc,y_{n-1})=\operatorname{BA}_{2n}^{0,\dotsc,n-1,0,\dotsc,n-1}(x_{0},\dotsc,x_{n-1},y_{0},\dotsc,y_{n-1}).$

$\blacksquare$ The function $\operatorname{MULT}_{n}\colon\{0,1\}^{2n}\to\{0,1\}^{2n}$ computes the product of two $n$-bit numbers:

$\operatorname{MULT}_{n}(x_{0},\dotsc,x_{n-1},y_{0},\dotsc,y_{n-1})=\operatorname{BA}_{n^{2}}^{(i+j)_{0\leq i,j<n}}\left(\left(x_{i}\land y_{j}\right)_{0\leq i,j<n}\right).$
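The reductions listed above can be verified exhaustively for small $n$. The following sketch works on the level of integers rather than circuits: `ba_value` (an illustrative helper) returns the number encoded by the outputs of $\operatorname{BA}^{s}$, and the check compares it with ordinary addition and multiplication.

```python
from itertools import product


def ba_value(s, x):
    """Integer encoded by the outputs of BA^s on input bits x: sum of 2^{s_i} * x_i."""
    return sum(xi << si for si, xi in zip(s, x))


def bits(value, n):
    """The n least significant bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def check_reductions(n):
    """Brute-force check of the ADD_n and MULT_n reductions for one small n."""
    add_s = list(range(n)) + list(range(n))
    mult_s = [i + j for i in range(n) for j in range(n)]
    for a, b in product(range(1 << n), repeat=2):
        x, y = bits(a, n), bits(b, n)
        if ba_value(add_s, x + y) != a + b:
            return False
        partial = [x[i] & y[j] for i in range(n) for j in range(n)]
        if ba_value(mult_s, partial) != a * b:
            return False
    return True


print(all(check_reductions(n) for n in range(1, 5)))
```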

2.2 Boolean Circuits

A circuit is a natural way of computing Boolean functions. It is a directed acyclic graph in which every node has in-degree $0$ or $2$. Its $n+2$ source nodes are labeled with input variables $x_{1},\dotsc,x_{n}$ and constants $0$ and $1$, whereas all other nodes are labeled with binary Boolean operations. The source nodes are called input gates, all other nodes are called internal gates. Each gate computes a (single-output) Boolean function of $x_{1},\dotsc,x_{n}$. If $m$ gates of the circuit are marked as outputs, it computes a function of the form $\{0,1\}^{n}\to\{0,1\}^{m}$. For a circuit $C$, its size, $\operatorname{size}(C)$, is the number of internal gates of $C$, whereas its depth, $\operatorname{depth}(C)$, is the maximum length of a path from an input gate of $C$ to an output gate of $C$.
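To make the definition concrete, here is a minimal sketch of a circuit as a list of gates in topological order, together with an evaluator. The encoding (tuples of an operation and two node indices) is an assumption made for this example and is unrelated to any particular library. The gate list is one well-known five-gate Full Adder over the full binary basis; it need not coincide gate by gate with the circuit drawn in Figure 3.

```python
from itertools import product
from operator import or_, xor

# Nodes 0..n-1 are the inputs; gate number j becomes node n + j.
# The constant source nodes 0 and 1 are omitted in this sketch.
FULL_ADDER = [
    (xor, 0, 1),   # node 3: x1 XOR x2
    (xor, 1, 2),   # node 4: x2 XOR x3
    (or_, 3, 4),   # node 5: equals 0 only if x1 = x2 = x3
    (xor, 3, 2),   # node 6: sum bit x1 XOR x2 XOR x3
    (xor, 5, 6),   # node 7: carry bit
]
FULL_ADDER_OUTPUTS = [6, 7]


def evaluate(gates, outputs, x):
    """Evaluate the circuit on the input bits x and return the output bits."""
    values = list(x)
    for op, a, b in gates:
        values.append(op(values[a], values[b]))
    return [values[o] for o in outputs]


def size(gates):
    """The size of a circuit is its number of internal gates."""
    return len(gates)


for x in product((0, 1), repeat=3):
    low, high = evaluate(FULL_ADDER, FULL_ADDER_OUTPUTS, x)
    assert low + 2 * high == sum(x)
print(size(FULL_ADDER))
```

Note that four of the five gates use only $x_{1}\oplus x_{2}$, $x_{2}$, and $x_{3}$, so the block costs four gates when $x_{1}\oplus x_{2}$ is already available.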

2.3 Basic Building Blocks

As discussed before, the Half Adder and Full Adder are basic building blocks for computing bit addition. Figure 6 shows how to synthesize a circuit of size $63$ computing $\operatorname{SUM}_{16}$ out of four Half Adders and eleven Full Adders. It is not difficult to see that a similar block structure can be used for any $n$ yielding a circuit of size at most $5n$ for $\operatorname{SUM}_{n}$ .

Figure 6: A circuit computing $\operatorname{SUM}_{16}$ composed out of four Half Adders and eleven Full Adders. Its size is $4\cdot 2+11\cdot 5=63$.

It turns out that better circuit designs are possible for $\operatorname{SUM}_{n}$ as shown by Demenkov et al. [5]. Consider two consecutive Full Adders shown on the top left of Figure 7. The corresponding function DFA (for Double Full Adder) has the following specification:

\operatorname{DFA}(x_{1},x_{2},x_{3},x_{4},x_{5})=(b_{0},b_{1},a_{1})\colon x_{1}+x_{2}+x_{3}+x_{4}+x_{5}=b_{0}+2(b_{1}+a_{1}).

Then, MDFA (for Modified Double Full Adder) has the following specification:

\operatorname{MDFA}(x_{1}\oplus x_{2},x_{2},x_{3},x_{4},x_{4}\oplus x_{5})=(b_{0},a_{1},a_{1}\oplus b_{1}).

That is, for pairs of bits $(x_{1},x_{2})$ , $(x_{4},x_{5})$ , and $(a_{1},b_{1})$ it uses a slightly different encoding: $(p,p\oplus q)$ instead of $(p,q)$ . We call such bits paired and show them in gray boxes in dot diagrams. It allows one to compute MDFA in eight gates (whereas the circuit size of DFA is 10). Moreover, the corresponding circuit of size eight is just a part of an optimal circuit of size $11$ computing $\operatorname{SUM}_{5}$ shown on the right of Figure 7.

Figure 7: Two consecutive Full Adders (top left), the MDFA block (bottom left), an optimal circuit for $\operatorname{SUM}_{5}$ (top right) whose highlighted part computes MDFA, and a dot diagram for MDFA.

We also need a block called MDFA’ that can be viewed as a subfunction of MDFA:

\operatorname{MDFA^{\prime}}(x_{1}\oplus x_{2},x_{2},x_{4},x_{4}\oplus x_{5})=\operatorname{MDFA}(x_{1}\oplus x_{2},x_{2},0,x_{4},x_{4}\oplus x_{5}).

It is not difficult to see that one can compute MDFA’ using six gates: when one replaces $x_{3}$ by zero in the circuit for MDFA, the two gates fed by $x_{3}$ can be eliminated. (In the same manner, HA is a subfunction of FA: $\operatorname{HA}(x_{1},x_{2})=\operatorname{FA}(x_{1},x_{2},0)$ .)
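The specifications of MDFA and MDFA’ can be checked exhaustively. The sketch below uses one possible arrangement of eight binary gates, obtained here by chaining two four-gate Full Adders on paired inputs and merging the last two gates; it is an assumption of this example and not necessarily the circuit highlighted in Figure 7. Each assignment inside `mdfa` and `mdfa_prime` corresponds to exactly one gate.

```python
from itertools import product


def full_adder(x, y, z):
    total = x + y + z
    return total & 1, total >> 1          # (sum bit, carry bit)


def dfa(x1, x2, x3, x4, x5):
    """Reference: two consecutive Full Adders, outputs (b0, b1, a1)."""
    s, a1 = full_adder(x1, x2, x3)
    b0, b1 = full_adder(s, x4, x5)
    return b0, b1, a1


def mdfa(p, x2, x3, x4, q):
    """Eight gates; p = x1 XOR x2 and q = x4 XOR x5 are the paired inputs."""
    g1 = x2 ^ x3
    g2 = p | g1
    s = p ^ x3            # x1 XOR x2 XOR x3
    a1 = g2 ^ s           # carry of the first Full Adder
    h1 = x4 ^ s
    h2 = h1 & (1 - q)     # h1 AND (NOT q): a single gate over the full binary basis
    b0 = q ^ s            # x1 XOR x2 XOR x3 XOR x4 XOR x5
    a1_xor_b1 = g2 ^ h2
    return b0, a1, a1_xor_b1


def mdfa_prime(p, x2, x4, q):
    """MDFA with x3 = 0: g1 collapses to x2 and s collapses to p, leaving six gates."""
    g2 = p | x2
    a1 = g2 ^ p
    h1 = x4 ^ p
    h2 = h1 & (1 - q)
    b0 = q ^ p
    return b0, a1, g2 ^ h2


for x1, x2, x3, x4, x5 in product((0, 1), repeat=5):
    b0, b1, a1 = dfa(x1, x2, x3, x4, x5)
    assert mdfa(x1 ^ x2, x2, x3, x4, x4 ^ x5) == (b0, a1, a1 ^ b1)
    if x3 == 0:
        assert mdfa_prime(x1 ^ x2, x2, x4, x4 ^ x5) == (b0, a1, a1 ^ b1)
```

The saving over two separate Full Adders comes from the output encoding: since $a_{1}\oplus b_{1}$ is requested instead of $b_{1}$, the two gates that would produce $b_{1}$ and then $a_{1}\oplus b_{1}$ can be replaced by the gates `h2` and `a1_xor_b1`.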

Using MDFA and MDFA’ blocks, one can compute $\operatorname{SUM}_{n}$ roughly as follows:

1. Compute $x_{2}\oplus x_{3},x_{4}\oplus x_{5},\dotsc,x_{n-1}\oplus x_{n}$ ($n/2$ gates).

2. Apply at most $n/2$ $\operatorname{MDFA}$ blocks (no more than $4n$ gates).

3. The last MDFA block outputs two bits: $a$ and $a\oplus b$. Instead of them, one needs to compute $a\oplus b$ and $a\land b$. To achieve this, it suffices to apply the operation $x>y=(x\land\overline{y})$: $a\land b=a>(a\oplus b)$.

This gives an upper bound of $4.5n$ for $\operatorname{SUM}_{n}$; a formal proof can be found in [5]. Figure 8 gives an example of the corresponding design for $\operatorname{SUM}_{16}$.

Figure 8: A circuit computing $\operatorname{SUM}_{16}$ composed out of eight $\oplus$-gates at the top, three MDFA’ blocks, four MDFA blocks, and one final gate. Its size is $8+3\cdot 6+4\cdot 8+1=59$.

3 New Upper Bound for Circuit Size of Bit Addition

In this section, we prove a new upper bound $4.5n-2m$ for the circuit size of bit addition. For regimes where $m$ is small compared to $n$ , this is better than $5n-3m$ by about $10\%$ . This applies to $\operatorname{MULT}_{n}$ and $\operatorname{SUM}_{n}$ .

Theorem 1.

For any vector $s\in\mathbb{Z}_{\geq 0}^{n}$ ,

\operatorname{size}(\operatorname{BA}_{n}^{s})\leq 4.5n-2m.

In the proof, we use the following straightforward observation. Assume that $s_{1}<s_{2},s_{3},\dotsc,s_{n}$ . In this case, the first output is equal to $x_{1}$ , the cost of computing this particular bit of the output is zero, allowing one to forget about it. Thus,

\operatorname{size}(\operatorname{BA}^{s}_{n})=\operatorname{size}(\operatorname{BA}^{s^{\prime}}_{n-1}),

where $s^{\prime}=(s_{2},\dotsc,s_{n})$. We call the operation of replacing $s$ by $s^{\prime}$ shifting. Note that shifting reduces both the number of inputs and the number of outputs by one. Figure 9 gives an example.

Figure 9: Shifting: $\operatorname{size}(\operatorname{BA}_{3}^{2,3,3})=\operatorname{size}(\operatorname{BA}_{2}^{3,3})$. In turn, $\operatorname{BA}_{2}^{3,3}$ can be computed by the Half Adder. Thus, $\operatorname{BA}_{3}^{2,3,3}(x_{1},x_{2},x_{3})=(x_{1},x_{2}\oplus x_{3},x_{2}\land x_{3})$.

Proof.

As the first step, we do the following: at every significance layer, we break all bits, except for possibly one, into pairs and compute the parity for every pair. This takes at most $n/2$ gates.

Then, it remains to prove that one can compute the sum of $n$ bits using $4n-2m$ gates if every significance layer contains at most one bit without a pair. We prove this by induction on $n$ . The base case $n=1$ is clear: in this case, the circuit size is zero (nothing needs to be summed up) and the upper bound is at least zero since $m\leq n$ . To prove the induction step, denote by $l$ the number of minimum elements in the significance vector $s$ (that is, the number of bits in the rightmost non-empty column in dot-notation).

Consider the following seven cases. (In fact, the first three cases are special cases of the last three cases, but we believe that the presentation is cleaner when they are stated as separate cases.) In each of the cases below, we shift and proceed by induction.

1. $l=1$. In this case, we just shift. By the induction hypothesis, the rest can be computed by a circuit of size at most

$4(n-1)-2(m-1)=4n-2m-2<4n-2m.$

2. $l=2$. Then, the corresponding two bits $x_{1}$ and $x_{2}$ are paired, meaning that their sum $x_{1}\oplus x_{2}$ is computed already. Then, we compute their carry

$c=x_{1}>(x_{1}\oplus x_{2})=x_{1}\land x_{2}$

and transfer it to the next layer. If this layer has an unpaired bit $b$, we pair $b$ and $c$ by computing $b\oplus c$. Finally, we shift. By the induction hypothesis, the size of the resulting circuit is at most

$1+1+4(n-1)-2(m-1)=4n-2m.$

3. $l=3$. For the corresponding three bits $x_{1},x_{2},x_{3}$, we have $x_{1}\oplus x_{2}$, $x_{2}$, and $x_{3}$ (that is, $x_{1}$ and $x_{2}$ are paired). We apply the Full Adder to the three bits. This costs four gates, as $x_{1}\oplus x_{2}$ is already computed and $x_{1}$ is not needed (recall Figure 3). The sum bit stays on the same layer, whereas the carry bit $c$ goes to the next layer. Then, we pair $c$ with an unpaired bit on the next layer if needed and shift. This gives an upper bound

$4+1+4(n-2)-2(m-1)<4n-2m.$

4. $l=4k$. Apply MDFA’ to two pairs to produce an unpaired bit. For the remaining $2k-2$ pairs, keep applying MDFA, each time reusing the unpaired bit. Then, we shift. The upper bound is

$6+8(k-1)+4(n-2k)-2(m-1)=4n-2m.$

5. $l=4k+1$. Apply MDFA $k$ times, then shift. The upper bound is

$8k+4(n-2k-1)-2(m-1)<4n-2m.$

6. $l=4k+2$. For two bits $a$ and $a\oplus b$ from the same pair, compute the most significant bit of their sum using a single binary gate: $a>(a\oplus b)=a\land b$. This leaves their sum ($a\oplus b$) at the current layer and puts the just computed carry bit ($a\land b$) to the next layer. If needed, compute the parity of an unpaired bit with the just transferred carry bit. Then, apply MDFA $k$ times and shift. Overall, the upper bound is

$1+1+8k+4(n-2k-1)-2(m-1)=4n-2m.$

7. $l=4k+3$. Apply the Full Adder to a pair of bits and the unpaired bit. If needed, pair the just transferred carry bit with an unpaired bit from the next layer. Then, apply MDFA $k$ times and shift. The resulting upper bound is

$4+1+8k+4(n-2k-2)-2(m-1)<4n-2m.$

$\hfill\blacktriangleleft$
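The case analysis of the proof translates directly into a gate counter. The sketch below is our own bookkeeping of the seven cases (merged into four by taking $k=0$), not the Cirbo generator; the name `mdfa_gate_count` is illustrative. It relies on the invariant that a layer has an unpaired bit exactly when it holds an odd number of bits.

```python
from collections import Counter


def mdfa_gate_count(s):
    """Gate count of the construction in the proof; returns (gates, outputs)."""
    layers = Counter(s)
    gates = sum(c // 2 for c in layers.values())   # initial pairing step
    outputs = 0
    while layers:
        i = min(layers)
        l = layers.pop(i)
        k, r = divmod(l, 4)
        if r == 0:
            gates += 6 + 8 * (k - 1)   # one MDFA' block, then k - 1 MDFA blocks
        elif r == 1:
            gates += 8 * k             # k MDFA blocks
        elif r == 2:
            gates += 1 + 8 * k         # carry of one pair, then k MDFA blocks
        else:
            gates += 4 + 8 * k         # four-gate Full Adder, then k MDFA blocks
        if r >= 2 and layers[i + 1] % 2 == 1:
            gates += 1                 # pair the single carry with the unpaired bit there
        moved = 2 * k + (1 if r >= 2 else 0)   # bits passed to layer i + 1
        if moved:
            layers[i + 1] += moved
        outputs += 1                   # shifting: one output bit stays on layer i
    return gates, outputs


for s in ([0] * 5, [0] * 7, [0] * 16, [0] * 31, list(range(6)) * 2):
    gates, m = mdfa_gate_count(s)
    assert 2 * gates <= 9 * len(s) - 4 * m     # gates <= 4.5 n - 2 m
    print(len(s), m, gates)
```

For the all-zero vectors of length $7$ and $31$, the count is $19$ and $119$, which agrees with the MDFA row of Table 1; for $\operatorname{SUM}_{16}$ it gives $59$, as in Figure 8.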

4 Lower Bounds and Limitations

The upper bound $\operatorname{size}(\operatorname{BA}_{n}^{s})\leq 5n-3m$ holds for any vector $s$ , but in many scenarios it is loose. For example, for the function

\operatorname{ADD}_{n}=\operatorname{BA}_{2n}^{0,\dotsc,n-1,0,\dotsc,n-1},

this upper bound turns into $5\cdot 2n-3(n+1)=7n-3$, whereas $\operatorname{size}(\operatorname{ADD}_{n})=5n-3$. Interestingly, the term $3m$ in the upper bound $5n-3m$ cannot be increased: for any constant $\alpha>3$, there exists a vector $s$ such that $\operatorname{size}(\operatorname{BA}_{n}^{s})\geq 5n-\alpha m-O(1)$. One example of such a vector is $s=(0,0,1,\dotsc,n-1)$. The corresponding function $\operatorname{BA}_{n}^{s}$ adds a bit to an $n$-bit number. It is not difficult to see that it can be computed using $n$ Half Adders (see Figure 10), hence its circuit size is at most $2n$. Below, we show that this straightforward circuit is optimal (up to an additive constant). It also shows that it is impossible to improve our upper bound $4.5n-2m$ to $4.5n-\beta m$ for $\beta>2.5$.

Figure 10: One can add a bit $t$ to a $7$-bit integer $x_{0}\dotsb x_{6}$ (to get an $8$-bit integer $y_{0}\dotsb y_{7}$) using $14$ gates. A straightforward generalization of this construction ensures that $\operatorname{size}(\operatorname{BA}_{n}^{0,0,1,\dotsc,n-2})\leq 2n-2$.
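The chain of Half Adders behind Figure 10 can be sketched as follows. The function `add_bit` is an illustrative name; it evaluates the chain on concrete bits and counts two gates per Half Adder.

```python
from itertools import product


def add_bit(t, x):
    """Add the bit t to the number with bits x (least significant first)."""
    gates = 0
    carry = t
    y = []
    for xi in x:
        y.append(xi ^ carry)   # Half Adder, sum bit
        carry = xi & carry     # Half Adder, carry bit
        gates += 2
    y.append(carry)            # the most significant output is the last carry
    return y, gates


# Exhaustive check for a 7-bit integer, as in Figure 10.
for t, *x in product((0, 1), repeat=8):
    y, gates = add_bit(t, x)
    value = sum(b << i for i, b in enumerate(x))
    assert sum(b << i for i, b in enumerate(y)) == t + value
    assert gates == 14
```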

Theorem 2.

\operatorname{size}(\operatorname{BA}_{n}^{0,0,1,\dotsc,n-1})\geq 2n-O(1).

Proof.

Assume that a bit to be added is equal to one (clearly, this only makes the function easier to compute). In other words, we consider the increment function $\operatorname{INC}_{n}\colon\{0,1\}^{n}\to\{0,1\}^{n+1}$ . Thus, $\operatorname{INC}_{n}(x_{0},\dotsc,x_{n-1})=(y_{0},\dotsc,y_{n})$ such that

1+\sum_{i=0}^{n-1}2^{i}x_{i}=\sum_{i=0}^{n}2^{i}y_{i}.

It is not difficult to write down explicit formulas for all output bits of $\operatorname{INC}_{n}$ . For example, for $n=5$ , they are expressed as follows:

$y_{0}=1\oplus x_{0}$
$y_{1}=x_{0}\oplus x_{1}$
$y_{2}=x_{0}x_{1}\oplus x_{2}$
$y_{3}=x_{0}x_{1}x_{2}\oplus x_{3}$
$y_{4}=x_{0}x_{1}x_{2}x_{3}\oplus x_{4}$
$y_{5}=x_{0}x_{1}x_{2}x_{3}x_{4}$

We prove that $\operatorname{size}(\operatorname{INC}_{n})\geq 2n-2$ by induction on $n$ . The base case $n=1$ is clear. For the induction step, take an optimal circuit computing $\operatorname{INC}_{n}$ and consider its (topologically) first gate $A(x_{i},x_{j})$ .

Now, if both the variables $x_{i}$ and $x_{j}$ had out-degree one, the whole circuit would depend on $x_{i}$ and $x_{j}$ through the gate $A$ only. Since $A$ takes only two values on the four assignments to its inputs, this would mean that there are two different pairs of constants $(a_{i},a_{j}),(b_{i},b_{j})\in\{0,1\}^{2}$ such that $A(a_{i},a_{j})=A(b_{i},b_{j})$. This, in turn, would mean that the circuit does not distinguish between assignments

x_{i}\leftarrow a_{i},\,x_{j}\leftarrow a_{j}\,\text{ and }\,x_{i}\leftarrow b_{i},\,x_{j}\leftarrow b_{j}.

But such a circuit cannot compute the function $\operatorname{INC}_{n}$ as $\operatorname{INC}_{n}$ clearly distinguishes all four different assignments to $x_{i}$ and $x_{j}$ .

Thus, assume that, say, $x_{i}$ feeds at least two gates. Then, assign $x_{i}\leftarrow 1$ and simplify the circuit. During the simplification, the gates fed by $x_{i}$ are eliminated. (If a gate fed by $x_{i}$ becomes a negation of its other input under an assignment $x_{i}\leftarrow 1$ , this negation is incorporated into the Boolean binary operations computed by the successors of this gate.) Also, the resulting circuit computes $\operatorname{INC}_{n-1}$ . To see this, it is instructive to get back to the previous toy example where $n=5$ . Say, we assign $x_{2}\leftarrow 1$ . Then, the outputs are simplified as follows:

$y_{0}=1\oplus x_{0}$
$y_{1}=x_{0}\oplus x_{1}$
$y_{2}=x_{0}x_{1}\oplus 1$
$y_{3}=x_{0}x_{1}\oplus x_{3}$
$y_{4}=x_{0}x_{1}x_{3}\oplus x_{4}$
$y_{5}=x_{0}x_{1}x_{3}x_{4}$

By ignoring the output $y_{2}$ , one gets a function computing $\operatorname{INC}_{4}$ :

(y_{0},y_{1},y_{3},y_{4},y_{5})=\operatorname{INC}_{4}(x_{0},x_{1},x_{3},x_{4})\,.

By the induction hypothesis, the simplified circuit contains at least $2(n-1)-2=2n-4$ gates. Thus, the original circuit has size at least $2+2n-4=2n-2$ . $\hfill\blacktriangleleft$
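The restriction step of the proof, illustrated above for $n=5$ and $x_{2}\leftarrow 1$, can be checked by brute force for every position. In the sketch below (the helper names are illustrative), fixing $x_{i}\leftarrow 1$ and dropping the output $y_{i}$ is compared with $\operatorname{INC}_{n-1}$ on the remaining variables.

```python
from itertools import product


def inc(x):
    """Output bits of INC_n on the input bits x (least significant first)."""
    n = len(x)
    value = 1 + sum(b << i for i, b in enumerate(x))
    return [(value >> i) & 1 for i in range(n + 1)]


def restriction_is_inc(n, i):
    """Check that INC_n with x_i = 1 and output y_i dropped equals INC_{n-1}."""
    for rest in product((0, 1), repeat=n - 1):
        x = list(rest[:i]) + [1] + list(rest[i:])
        y = inc(x)
        reduced = y[:i] + y[i + 1:]
        if reduced != inc(list(rest)):
            return False
    return True


print(all(restriction_is_inc(n, i) for n in range(2, 8) for i in range(n)))
```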

5 Implementation and Experimental Evaluation

We implemented efficient generators of our new circuits in the Cirbo open-source framework [1]. To generate a circuit computing $\operatorname{BA}_{n}^{s}$ , one passes the vector $s$ . Listing 1 shows how to use the generator to produce an efficient circuit computing $\operatorname{SUM}_{31}$ in a single line of code. When the circuit is generated, one can use a wide range of Cirbo methods to analyze and manipulate the circuit.

Listing 1: Generating an efficient circuit for $\operatorname{SUM}_{31}$ (that computes the binary representation of the sum of $31$ bits). The code also prints the size of the resulting circuit and draws it.

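from cirbo.synthesis.generation.arithmetics.summation import generate_add_weighted_bits_efficient

# Significance vector of SUM_31: thirty-one bits, all of significance 0.
ckt = generate_add_weighted_bits_efficient([0] * 31)

print(ckt.gates_number())  # print the size of the generated circuit
ckt.view_graph()           # draw the circuit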

5.1 Adding Bits and Integers

Table 1 compares the size of circuits for $\operatorname{SUM}_{n}$ composed out of Full Adders with circuits composed out of MDFA blocks (that can be generated using our new method), for various $n$. As the table reveals, for large values of $n$, the latter circuits are about $10\%$ smaller than the former ones. Also, Listing 2 checks that, for the addition of two $n$-bit integers, the generator produces circuits of size $5n-3$. Recall that $\operatorname{ADD}_{n}=\operatorname{BA}_{2n}^{0,\dotsc,n-1,0,\dotsc,n-1}$ and that $5n-3$ is the provably optimal circuit size for this function, as shown by Red’kin [10].

Table 1: Comparing the size of circuits for $\operatorname{SUM}_{n}$ composed out of Full Adders with circuits composed out of MDFA. The bottom row shows the improvement in percent.

$n$	7	31	127	511	2047	8191	32767	131071
FA	20	130	600	2510	10180	40890	163760	655270
MDFA	19	119	543	2263	9167	36807	147391	589751
Improvement	5.0%	8.5%	9.5%	9.8%	10.0%	10.0%	10.0%	10.0%

Listing 2: Ensuring that the generator produces circuits of size $5n-3$ for $\operatorname{ADD}_{n}$.

from cirbo.synthesis.generation.arithmetics.summation import generate_add_weighted_bits_efficient as generate

for n in range(2, 100):
    # Significance vector of ADD_n: two bits on each of the layers 0, ..., n-1.
    ckt = generate(list(range(n)) + list(range(n)))
    assert ckt.gates_number() == 5 * n - 3

5.2 Multiplying Integers

Dadda’s multiplier is one of the first circuit designs for multiplying $n$-bit integers. Basically, it adds the partial products (conjunctions of the bits of the two input numbers) using Full Adders and Half Adders. Its size is about $n^{2}+5n^{2}=6n^{2}$: $n^{2}$ gates compute the partial products and about $5n^{2}$ gates sum them up. Our method of summing up bits allows one to reduce the size to roughly $5.5n^{2}$. An asymptotically faster method of multiplying $n$-bit integers was discovered by Karatsuba [6]. It is based on the divide-and-conquer approach: to multiply two $n$-bit integers, it makes three recursive calls to multiply two $n/2$-bit integers and then combines them using summation and subtraction only. This way, the running time $T(n)$ of the algorithm satisfies a recurrence $T(n)\leq 3T(n/2)+O(n)$, hence $T(n)=O(n^{\log_{2}3})$. As with many other algorithms based on divide-and-conquer, when $n$ becomes small, it is beneficial to multiply the given numbers directly (rather than recursively). We implemented generators based on the Karatsuba algorithm that use Dadda multipliers and MDFA-based bit addition when $n$ is smaller than $20$. Table 2 and Figure 11 compare the size of the corresponding five circuit designs for $40\leq n\leq 300$. More accurate estimates of the circuit size of multipliers are given by Sergeev [11].
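The divide-and-conquer scheme with a cutoff can be sketched on the level of integers. The code below is a generic textbook version of the Karatsuba algorithm, not the circuit generator: below `cutoff` bits (a parameter of this sketch, assumed to be at least $4$) it multiplies directly, which stands in for switching to a Dadda or MDFA multiplier.

```python
def karatsuba(a, b, cutoff=20):
    """Multiply non-negative integers; below `cutoff` bits, multiply directly."""
    n = max(a.bit_length(), b.bit_length())
    if n < cutoff:
        return a * b                      # stands in for a direct multiplier
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    low = karatsuba(a_lo, b_lo, cutoff)
    high = karatsuba(a_hi, b_hi, cutoff)
    # Third recursive call: (a_lo + a_hi)(b_lo + b_hi) = low + high + cross terms.
    mixed = karatsuba(a_lo + a_hi, b_lo + b_hi, cutoff)
    cross = mixed - low - high
    return low + (cross << half) + (high << (2 * half))


a, b = 123456789123456789, 987654321987654321
assert karatsuba(a, b) == a * b
```

Only three recursive products are computed per level, and the remaining work consists of additions, subtractions, and shifts, which is what the recurrence $T(n)\leq 3T(n/2)+O(n)$ expresses.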

Table 2: Comparing the size of circuits for $\operatorname{MULT}_{n}$. The first multiplier, Dadda, computes the sum of the partial products using Full Adders and Half Adders. The second one, MDFA, sums up the partial products using MDFA blocks. The third one, Karatsuba, makes three recursive calls and recurses all the way down to 4-bit numbers. The fourth and fifth multipliers use the Karatsuba algorithm, but switch to Dadda or MDFA multipliers when $n$ becomes smaller than $20$. The last row shows the size improvement of the fifth multiplier over the fourth one.

$n$	40	80	120	160	200	240	280
Dadda	9280	37760	85440	152320	238400	343680	468160
MDFA	8539	34679	78419	139759	218699	315239	429379
Karatsuba	11789	37836	72209	118152	168200	223093	293405
Karatsuba, Dadda	7522	24770	49200	78598	113870	153948	199102
Karatsuba, MDFA	7155	23642	46504	75168	108284	145787	189657
Improvement	4.9%	4.6%	5.5%	4.4%	4.9%	5.3%	4.7%
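The Dadda row is consistent with a simple count. As a sketch, assume that the layer-by-layer procedure from Section 1 is applied to the $n^{2}$ partial products (so that $m=2n$) and that it ends up using exactly one Half Adder on each of the layers $1,\dotsc,n$; the latter is an assumption of this calculation, obtained by following the procedure by hand. Then the number of Full Adders is $n^{2}-2n$, and the total size is:

```latex
\underbrace{n^{2}}_{\text{partial products}}
+\underbrace{5\,(n^{2}-2n)}_{\text{Full Adders}}
+\underbrace{2n}_{\text{Half Adders}}
=6n^{2}-8n,
\qquad\text{e.g., } n=40\colon\ 6\cdot 1600-8\cdot 40=9280 .
```

The same expression gives $37760$ for $n=80$ and $468160$ for $n=280$, matching the corresponding table entries.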

Figure 11: Comparing the size of five circuit designs for $\operatorname{MULT}_{n}$, for $40\leq n\leq 300$.

5.3 Logarithmic Depth

The depth of most of the circuits described above is linear, that is, $\Theta(n)$. With additional care, one can make the depth logarithmic ($\Theta(\log n)$) at the cost of a slight increase in size. To achieve this, one processes the layers in parallel rather than consecutively. Namely, let $h$ be the maximum height of a significance layer (that is, every layer contains at most $h$ bits). While $h>3$, apply in parallel as many FA’s as possible to every layer. After one such step, the maximum height becomes at most $2h/3$ (to simplify the exposition, we ignore constant additive terms here): indeed, if there are $k\leq h$ bits on the current layer, then about $k/3\leq h/3$ bits remain after the application of FA’s; also, at most $h/3$ carry bits are transferred from the neighboring less significant layer. Since the maximum height decreases geometrically, in at most $O(\log n)$ steps, one reaches the case when $h\leq 3$. This takes depth $O(\log n)$ and size $O(n)$ (since each FA reduces the total number of bits by one). When $h\leq 3$, apply either HA or FA to every layer. This ensures that every layer has at most two bits, that is, $h\leq 2$ (the size of the resulting circuit is still linear and the depth is still logarithmic). Then, everything boils down to adding two $k$-bit numbers (with $k\leq n$). This can be performed using, for example, the Brent–Kung adder [3] that has size $O(k)$ and depth $O(\log k)$. By using MDFA’s instead of FA’s, one can further reduce the size of the resulting circuits. Table 3 shows the size and the depth of the circuits generated this way for the three previously considered functions: $\operatorname{SUM}$, $\operatorname{ADD}$, and $\operatorname{MULT}$.

Table 3: The size and the depth of circuits computing $\operatorname{SUM}_{n}$, $\operatorname{ADD}_{n}$, and $\operatorname{MULT}_{n}$.

Function	Measure	$n=10$	$n=20$	$n=30$	$n=40$	$n=60$	$n=80$	$n=160$	$n=320$
ADD	depth	15	18	23	24	28	31	32	42
ADD	size	49	101	153	194	297	383	755	1526
SUM	depth	10	14	16	18	20	22	26	30
SUM	size	64	141	215	298	452	615	1252	2529
MULT	depth	29	39	46	51	55	62	70	81
MULT	size	627	2301	5209	9158	20356	35971	142388	566539

6 Conclusion and Further Directions

In this paper, we presented smaller circuits for bit addition. In many practically relevant scenarios, the described circuits are about 10% smaller than the known circuits composed of Half Adders and Full Adders. We also implemented generators that allow one to produce the corresponding circuits in two lines of code via the Cirbo open-source package [1].

There are three natural open problems related to the circuit size of bit addition.

1. What is the largest $\alpha$ such that

$\operatorname{size}(\operatorname{BA}_{n}^{s})\leq 4.5n-\alpha m$

holds for every vector $s$ ? In this paper, we proved that $\alpha\geq 2$ . Theorem 1 shows that $\alpha\leq 2.5$ . An example of a vector where our upper bound $4.5n-2m$ matches the size of the circuit produced by our algorithm is

$s^{*}=\left(0,0,0,0,1,1,2,2,\dotsc,\frac{n}{2}-2,\frac{n}{2}-2\right).$

In this case, our method first spends $n/2$ gates to pair the bits and then applies $n/2$ MDFA’ blocks. The size of the resulting circuit is, up to an additive constant, $n/2+6\cdot n/2=3.5n$ . This matches the upper bound $4.5n-2m$ . Thus, to improve the bound $4.5n-2m$ to $4.5n-\beta m$ , for $\beta>2$ , one needs to find a smaller circuit for this particular vector $s^{*}$ . And vice versa, by proving a lower bound $\operatorname{size}(\operatorname{BA}_{n}^{s^{*}})\geq 3.5n-O(1)$ , one would prove that $\alpha=2$ .

2. What is the smallest $\gamma$ such that

$\operatorname{size}(\operatorname{BA}_{n}^{s})\leq\gamma n-O(m)$

holds for every vector $s$ ? In this paper, we proved that $\gamma\leq 4.5$ . Improving this seems to be more challenging than just improving $2m$ to $2.5m$ as this would most probably require using new building blocks.

3. Finally, note that the upper bounds $5n-3m$ and $4.5n-2m$ hold for all vectors $s$. It would be interesting to improve the known upper and lower bounds for specific vectors. Perhaps one of the most interesting such functions is $\operatorname{SUM}_{n}$ (here, $s=(0,0,\dotsc,0)$). For it, we know an upper bound $4.5n$ (originally proved by Demenkov et al. [5]; it also follows from our Theorem 1) and a lower bound $2.5n-O(1)$ due to Stockmeyer [13].
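The arithmetic behind the vector $s^{*}$ from the first open problem can be checked numerically. The sketch below assumes that $m$ is the bit length of the largest attainable sum (this reading of "number of output bits" is an assumption made here), and the helper names `s_star`, `output_bits`, and `upper_bound` are hypothetical.

```python
def s_star(n):
    """Build s* = (0,0,0,0,1,1,2,2,...,n/2-2,n/2-2) for even n >= 4."""
    assert n % 2 == 0 and n >= 4
    return [0, 0, 0, 0] + [j for j in range(1, n // 2 - 1) for _ in range(2)]


def output_bits(s):
    """Assumed definition of m: bit length of the maximum possible sum."""
    return sum(1 << x for x in s).bit_length()


def upper_bound(n, m):
    """The bound 4.5n - 2m from the paper."""
    return 4.5 * n - 2 * m


if __name__ == "__main__":
    for n in (4, 8, 16, 100):
        s = s_star(n)
        m = output_bits(s)
        print(n, m, upper_bound(n, m), 3.5 * n)
```

Under this assumption, the largest sum for $s^{*}$ is $2^{n/2}$, so $m=n/2+1$ and $4.5n-2m=3.5n-2$, which agrees with the $3.5n$ count up to an additive constant.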

References

[1] Daniil Averkov, Tatiana Belova, Gregory Emdin, Mikhail Goncharov, Viktoriia Krivogornitsyna, Alexander S. Kulikov, Fedor Kurmazov, Daniil Levtsov, Georgie Levtsov, Vsevolod Vaskin, and Aleksey Vorobiev. Cirbo: A new tool for Boolean circuit analysis and synthesis. In AAAI, pages 11105–11112. AAAI Press, 2025. doi:10.1609/AAAI.V39I11.33207.
[2] K’Andrea C. Bickerstaff, Earl E. Swartzlander Jr., and Michael J. Schulte. Analysis of column compression multipliers. In IEEE Symposium on Computer Arithmetic, pages 33–39. IEEE Computer Society, 2001. doi:10.1109/ARITH.2001.930101.
[3] Richard P. Brent and H. T. Kung. A regular layout for parallel adders. IEEE Transactions on Computers, C-31(3):260–264, 1982. doi:10.1109/TC.1982.1675982.
[4] Luigi Dadda. Some schemes for parallel multipliers. Alta Frequenza, 34(5):349–356, 1965.
[5] Evgeny Demenkov, Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. New upper bounds on the Boolean circuit complexity of symmetric functions. Inf. Process. Lett., 110(7):264–267, 2010. doi:10.1016/J.IPL.2010.01.007.
[6] Anatoly Karatsuba and Yury Ofman. Multiplication of many-digital numbers by automatic computers. Proceedings of the USSR Academy of Sciences, 145:293–294, 1962.
[7] Alexander S. Kulikov, Danila Pechenev, and Nikita Slezkin. SAT-based circuit local improvement. In MFCS, volume 241 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.MFCS.2022.67.
[8] Charles U. Martel, Vojin G. Oklobdzija, R. Ravi, and Paul F. Stelling. Design strategies for optimal multiplier circuits. In IEEE Symposium on Computer Arithmetic, pages 42–49. IEEE Computer Society, 1995. doi:10.1109/ARITH.1995.465378.
[9] Mike Paterson and Uri Zwick. Shallow circuits and concise formulae for multiple addition and multiplication. Comput. Complex., 3:262–291, 1993. doi:10.1007/BF01271371.
[10] Nikolay Red’kin. Minimal realization of a binary adder. Problemy kibernetiki, 38:181–216, 1981. In Russian.
[11] Igor S. Sergeev. On the circuit complexity of the standard and the Karatsuba methods of multiplying integers. In Information tools and technologies, volume 3, pages 180–187, 2016. In Russian. English version is available at http://arxiv.org/abs/1602.02362. arXiv:1602.02362.
[12] Paul F. Stelling, Charles U. Martel, Vojin G. Oklobdzija, and R. Ravi. Optimal circuits for parallel multipliers. IEEE Trans. Computers, 47(3):273–285, 1998. doi:10.1109/12.660163.
[13] Larry J. Stockmeyer. On the combinational complexity of certain symmetric Boolean functions. Math. Syst. Theory, 10:323–336, 1977. doi:10.1007/BF01683282.
