Abstract 1 Overview 2 General Setting 3 New Upper Bound for Circuit Size of Bit Addition 4 Lower Bounds and Limitations 5 Implementation and Experimental Evaluation 6 Conclusion and Further Directions References

Smaller Circuits for Bit Addition

Mikhail Goncharov Neapolis University Pafos, Cyprus
JetBrains Research, Pafos, Cyprus
Alexander S. Kulikov ORCID JetBrains Research, Pafos, Cyprus Georgie Levtsov Neapolis University Pafos, Cyprus
JetBrains Research, Pafos, Cyprus
Abstract

Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it. A vast majority of these results are devoted to optimizing the circuit depth (also known as delay).

In this paper, we investigate the circuit size (also known as area) over the full binary basis of bit addition. Most of the known circuits are built from Half Adders and Full Adders as suggested by Dadda in 1965 for designing multiplier circuits. Applying these ideas to the bit addition function, one gets a 5n3m upper bound on its circuit size, where n is the number of input bits and m is the number of output bits. We prove an upper bound 4.5n2m. In the regimes where m is small compared to n (for example, for computing the sum of n bits or multiplying two n-bit integers), this leads to 10% improvement. We also show that it is provably impossible to improve the two upper bounds above to 5n3.01m or 4.5n2.51m. We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with m=n+1) is equal to 2n.

We complement our theoretical result by an open-source implementation of generators producing circuits for bit addition and multiplication. The generators allow one to produce the corresponding circuits in two lines of code and to compare them to existing designs.

Keywords and phrases:
bit addition, summation, multiplier, multiplication, Boolean, circuit, synthesis, combinational, digital
Copyright and License:
[Uncaptioned image] © Mikhail Goncharov, Alexander S. Kulikov, and Georgie Levtsov; licensed under Creative Commons License CC-BY 4.0
2012 ACM Subject Classification:
Theory of computation Logic and verification
; Theory of computation Complexity theory and logic ; Theory of computation Circuit complexity
Supplementary Material:
Software  (Source Code): https://github.com/spbsat/cirbo
  archived at Software Heritage Logo swh:1:dir:bb1d9d50044cfe5a05b7c8dadc7404b306c90cd1
Acknowledgements:
We thank the anonymous reviewers for many helpful comments.
Editors:
Meena Mahajan, Florin Manea, Annabelle McIver, and Nguyễn Kim Thắng

1 Overview

Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Three specific scenarios where it is used frequently are listed below.

  • Adding two n-bit numbers.

  • Computing a symmetric Boolean function (such as majority or sorting). A natural way of doing this is to first compute the binary representation of the sum of n input bits (that is, to compress n bits into about log2n bits) and then to compute the function at hand out of the computed binary representation.

  • To multiply two n-bit numbers, one may first compute all partial products (that is, products of the bits of the two input numbers) and then sum up the resulting bits.

In terms of the dot-notation introduced by Dadda [4], the three scenarios discussed above are visualized as shown in Figure 1. In this notation, one places bits of the same significance on the same vertical layer.

Figure 1: Dot diagrams for three Boolean functions: ADD5 adds two five-bit numbers, SUM5 adds five bits, and MULT5 adds five (appropriately shifted) five-bit numbers.

There are many other cases where one needs to add bits. Say, one may want to add a single bit to an n-bit number (the increment operation is a special case), or to add three n-bit numbers, or to add a few bits of varying significance, see Figure 2.

Figure 2: More scenarios of adding bits of varying significance.

A function capturing all such scenarios is known as bit adder

BAns1,,sn:{0,1}n{0,1}m.

It is parameterized by the significance vector s=(s1,,sn)0n, takes n input bits (x1,,xn){0,1}n, and outputs the binary representation of

i=1n2sixi.

This way, SUMn=BAn0,0,,0 and ADDn=BA2n0,0,1,1,,n1,n1.

Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it, see, for example, [9, 8, 12, 2]. A vast majority of these results is devoted to optimizing the circuit depth (also known as delay). In this paper, we investigate the circuit size (also known as area) of bit addition. Specifically, we study circuits over the full binary basis.

Two basic building blocks for adding bits are known as Half Adder (HA) and Full Adder (FA). They compute the binary representation of the sum of two and three bits, respectively (that is, SUM2 and SUM3). In the full binary basis, they can be implemented in two and five gates, respectively, see Figure 3.

Figure 3: The Half Adder (top) and Full Adder (bottom): dot diagrams and circuits.

Using Half Adders and Full Adders, one can synthesize a bit adder using the following algorithm that goes back to Napier’s Rabdologiæ (1617), as modernized by Dadda [4].

Process the bits layer by layer, in the order of increasing significance. While the current significance layer i contains at least three bits, take three of them and apply the Full Adder to replace them with a pair of bits of significance i and i+1. If there are two bits left at the current layer i, apply the Half Adder to them to get a pair of bits of significance i and i+1.

This algorithm ensures that, for any vector s0n,

size(BAns)5n3m.

Indeed, each application of the Full Adder reduces the number of bits by one, hence the total cost of all Full Adders is at most 5(nm). The Half Adder is applied at most once for each of the significance layers, hence the total cost of all Half Adders is at most 2m. Hence, the total size is at most 5(nm)+2m=5n3m. Note that most of the previous works consider specific special cases of bit addition (like summation or multiplication) where m is a specific function on n. In such cases, the upper bounds are stated in terms of n only. The parameter m arises naturally when one considers the general bit addition function.

By applying this algorithm to partial products of bits of two input n-bit numbers, one gets the well-known Dadda multiplier circuit [4]. For many vectors s, the upper bound 5n3m is loose: it does not match the size of the actual circuit produced by the algorithm. A straightforward example is s=(0,1,,n1): in this case, no gates are needed whereas the upper bound is 2n. It is also worth noting that, in some cases, the resulting circuit is provably optimal. For example, for the ADDn function (that computes the sum of two n-bit integers), the method constructs a circuit out of a single Half Adder and (n1) Full Adders. The resulting circuit is known as ripple-carry adder and has size 5n3. Red’kin [10] proved that there is no smaller circuit for this function.

At the same time, in many scenarios, not only the bound 5n3m is loose, but also the circuit produced by the algorithm is suboptimal. For example, for SUM5, it gives a circuit of size 12 consisting of two Full Adders and one Half Adder, see Figure 4. However, SUM5 can be computed by a circuit of size 11 as shown by [7] (see also Figure 7 later in the text). In general, whereas the algorithm produces a circuit of size about 5n for SUMn, this function can be computed by a circuit of size about 4.5n as shown by Demenkov et al. [5].

Figure 4: A circuit of size 12 computing SUM5 composed of two Full Adders and one Half Adder: dot notation (top left), block structure (bottom left), and a circuit (right).

In this paper, we generalize the construction by Demenkov et al. Namely, we prove an upper bound 4.5n2m for the circuit size of bit addition. In the regimes where m is small compared to n, this gives a circuit that is about 10% smaller. This applies to the Dadda multiplier. We complement our theoretical result by an open source implementation of generators producing circuits for bit addition and multiplication.

We also show that it is provably impossible to improve the two upper bounds above to 5n3.01m or 4.5n2.51m. We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with m=n+1) is equal to 2n.

2 General Setting

In this section, we formally introduce the Boolean functions studied in this paper as well as the main building blocks for computing them.

2.1 Boolean Functions

The main Boolean function studied in this paper is bit adder

BAns1,,sn:{0,1}n{0,1}m.

It computes the binary representation of the weighted sum of input bits:

i=1n2sixi.

In most interesting scenarios, all bits of the binary representation of this sum depend on the input and the number of outputs can be expressed as follows:

m=log2(i=1n2si+1).

In such cases,

BA(x1,,xn)=(y0,,ym1):i=1n2sixi=i=0m12iyi.

However, for some other significance vectors, some of the bits of the binary representation of the sum are identically equal to zero (and thus, do not depend on the input). We exclude such bits from the outputs. Thus, more generally, when we say that

BA(x1,,xn)=(y0,,ym1),

we mean that there exists a vector t=(t0,,tm1)0 such that t0<t1<<tm1 and

i=1n2sixi=i=0m12tiyi.

It is not difficult to see that the vector t is unique and that mn.

This way, the goal of bit addition is to “flatten” the distribution of bits, that is, to leave at most one bit at each significance layer. Figure 5 gives an example.

Figure 5: The function BA70,1,1,5,5,5,6:{0,1}7{0,1}6 replaces seven bits of significance (0,1,1,5,5,5,6) with six bits of significance (0,1,2,5,6,7).

Many practically important Boolean functions can be computed using bit summation.

  • The function SUMn:{0,1}n{0,1}log2(n+1) computes the sum of n bits:

    SUMn(x1,,xn)=ADDn0,0,,0(x1,,xn).
  • The function ADDn:{0,1}2n{0,1}n+1 computes the sum of two n-bit numbers:

    ADDn(x0,,xn1,y0,,yn1)=BA2n0,,n1,0,,n1(x0,,xn1,y0,,yn1).
  • The function MULTn:{0,1}2n{0,1}2n computes the product of two n-bit numbers:

    MULTn(x0,,xn1,y0,,yn1)=BAn2(i+j)0i,j<n((xiyj)0i,j<n).

2.2 Boolean Circuits

A circuit is a natural way of computing Boolean functions. It is an acyclic directed graph of in-degree 0 and 2 whose n+2 source nodes are labeled with input variables x1,,xn and constants 0 and 1, whereas all other nodes are labeled with binary Boolean operations. The source nodes are called input gates, all other nodes are called internal gates. Each gate computes a (single-output) Boolean function of x1,,xn. If m gates of the circuit are marked as outputs, it computes a function of the form {0,1}n{0,1}m. For a circuit C, its size, size(C), is the number of internal gates of C, whereas its depth, depth(C), is the maximum length of a path from an input gate of C to an output gate of C.

2.3 Basic Building Blocks

As discussed before, the Half Adder and Full Adder are basic building blocks for computing bit addition. Figure 6 shows how to synthesize a circuit of size 63 computing SUM16 out of four Half Adders and eleven Full Adders. It is not difficult to see that a similar block structure can be used for any n yielding a circuit of size at most 5n for SUMn.

Figure 6: A circuit computing SUM16 composed out of four Half Adders and eleven Full Adders. Its size is 42+115=63.

It turns out that better circuit designs are possible for SUMn as shown by Demenkov et al. [5]. Consider two consecutive Full Adders shown on the top left of Figure 7. The corresponding function DFA (for Double Full Adder) has the following specification:

DFA(x1,x2,x3,x4,x5)=(b0,b1,a1):x1+x2+x3+x4+x5=b0+2(b1+a1).

Then, MDFA (for Modified Double Full Adder) has the following specification:

MDFA(x1x2,x2,x3,x4,x4x5)=(b0,a1,a1b1).

That is, for pairs of bits (x1,x2), (x4,x5), and (a1,b1) it uses a slightly different encoding: (p,pq) instead of (p,q). We call such bits paired and show them in gray boxes in dot diagrams. It allows one to compute MDFA in eight gates (whereas the circuit size of DFA is 10). Moreover, the corresponding circuit of size eight is just a part of an optimal circuit of size 11 computing SUM5 shown on the right of Figure 7.

Figure 7: Two consecutive Full Adders (top left), the MDFA block (bottom left), an optimal circuit for SUM5 (top right) whose highlighted part computes MDFA, and a dot diagram for MDFA.

We also need a block called MDFA’ that can be viewed as a subfunction of MDFA:

MDFA(x1x2,x2,x4,x4x5)=MDFA(x1x2,x2,0,x4,x4x5).

It is not difficult to see that one can compute MDFA’ using six gates: when one replaces x3 by zero in the circuit for MDFA, the two gates fed by x3 can be eliminated. (In the same manner, HA is a subfunction of FA: HA(x1,x2)=FA(x1,x2,0).)

Using MDFA and MDFA’ blocks, one can compute SUMn roughly as follows:

  1. 1.

    Compute x2x3,x4x5,,xn1xn (n/2 gates).

  2. 2.

    Apply at most n/2 MDFA blocks (no more than 4n gates).

  3. 3.

    The last MDFA block outputs two bits: a and ab. Instead of them, one needs to compute ab and ab. To achieve this, it suffices to apply x>y=(xy¯) operation: ab=a>(ab).

This gives an upper bound 4.5n for SUMn, its formal proof can be found in [5]. Figure 8 gives an example of the corresponding design for SUM16.

Figure 8: A circuit computing SUM16 composed out of eight -gates at the top, three MDFA’ blocks, four MDFA blocks, and one final gate. Its size is 8+36+48+1=59.

3 New Upper Bound for Circuit Size of Bit Addition

In this section, we prove a new upper bound 4.5n2m for the circuit size of bit addition. For regimes where m is small compared to n, this is better than 5n3m by about 10%. This applies to MULTn and SUMn.

Theorem 1.

For any vector s0n,

size(BAns)4.5n2m.

In the proof, we use the following straightforward observation. Assume that s1<s2,s3,,sn. In this case, the first output is equal to x1, the cost of computing this particular bit of the output is zero, allowing one to forget about it. Thus,

size(BAns)=size(BAn1s),

where s=(s2,,sn). We call the operation of replacing s by s as shifting. Note that shifting reduces both the number of inputs and the number of outputs by one. Figure 9 gives an example.

Figure 9: Shifting: size(BA32,3,3)=size(BA23,3). In turn, BA23,3 can be computed by the Half Adder. Thus, BA32,3,3(x1,x2,x3)=(x1,x2x3,x2x3).

Proof.

As the first step, we do the following: at every significance layer, we break all bits, except for possibly one, into pairs and compute the parity for every pair. This takes at most n/2 gates.

Then, it remains to prove that one can compute the sum of n bits using 4n2m gates if every significance layer contains at most one bit without a pair. We prove this by induction on n. The base case n=1 is clear: in this case, the circuit size is zero (nothing needs to be summed up) and the upper bound is at least zero since mn. To prove the induction step, denote by l the number of minimum elements in the significance vector s (that is, the number of bits in the rightmost non-empty column in dot-notation).

Consider the following seven cases. (In fact, the first three cases are special cases of the last three cases, but we believe that the presentation is cleaner when they are stated as separate cases.) In each of the cases below, we shift and proceed by induction.

  1. 1.

    l=1. In this case, we just shift. By the induction hypothesis, the rest can be computed by a circuit of size at most

    4(n1)2(m1)=4n2m2<4n2m.
  2. 2.

    l=2. Then, the corresponding two bits x1 and x2 are paired meaning that their sum x1x2 is computed already. Then, we compute their carry

    c=x1>(x1x2)=x1x2

    and transfer it to the next layer. If this layer has an unpaired bit b, we pair b and c by computing bc. Finally, we shift. By the induction hypothesis, the size of the resulting circuit is at most

    1+1+4(n1)2(m1)=4n2m.
  3. 3.

    l=3. For the corresponding three bits x1,x2,x3, we have x1x2, x2, and x3 (that is, x1 and x2 are paired). We apply the Full Adder to the three bits. This costs four gates, as x1x2 is already computed and x1 is not needed (recall Figure 3). The sum bit stays on the same layer, whereas the carry bit c goes to the next layer. Then, we pair c with an unpaired bit on the next layer if needed and shift. This gives an upper bound

    4+1+4(n2)2(m1)<4n2m.
  4. 4.

    l=4k. Apply MDFA’ to two pairs to produce an unpaired bit. For the remaining 2k2 pairs, keep applying MDFA, each time reusing the unpaired bit. Then, we shift. The upper bound is

    6+8(k1)+4(n2k)2(m1)=4n2m.
  5. 5.

    l=4k+1. Apply MDFA k times, then shift. The upper bound is

    8k+4(n2k1)2(m1)<4n2m.
  6. 6.

    l=4k+2. For two bits a and ab from the same pair, compute the most significant bit of their sum using a single binary gate: a>(ab)=ab. This leaves their sum (ab) at the current layer and puts the just computed carry bit (ab) to the next layer. If needed, compute the parity of an unpaired bit with the just transferred carry bit. Then, apply MDFA k times and shift. Overall, the upper bound is

    1+1+8k+4(n2k1)2(m1)=4n2m.
  7. 7.

    l=4k+3. Apply the Full Adder to a pair of bits and the unpaired bit. If needed, pair the just transferred carry bit with an unpaired bit from the next layer. Then, apply MDFA k times and shift. The resulting upper bound is

    4+1+8k+4(n2k2)2(m1)<4n2m.

4 Lower Bounds and Limitations

The upper bound size(BAns)5n3m holds for any vector s, but in many scenarios it is loose. For example, for the function

ADDn=BA2n0,,n1,0,,n1,

this upper bounds turns into 52n3(n+1)=7n3, whereas size(ADDn)=5n3. Interestingly, the term 3m in the upper bound 5n3m cannot be increased: for any constant α>3, there exists a vector s such that size(BAns)5nαmO(1). One example of such a vector is s=(0,0,1,,n1). The corresponding function BAns adds a bit to an n-bit number. It is not difficult to see that it can be computed using n Half Adders (see Figure 10), hence its circuit size is at most 2n. Below, we show that this straightforward circuit is optimal (up to an additive constant). It also shows that it is impossible to improve our upper bound 4.5n2m to 4.5nβm for β>2.5.

Figure 10: One can add a bit t to a 7-bit integer x0x6 (to get an 8-bit integer y0y7) using 14 gates. A straightforward generalization of this construction ensures that size(BAn0,0,1,,n2)2n2.
Theorem 2.
size(BAn0,0,1,,n1)2nO(1).

Proof.

Assume that a bit to be added is equal to one (clearly, this only makes the function easier to compute). In other words, we consider the increment function INCn:{0,1}n{0,1}n+1. Thus, INCn(x0,,xn1)=(y0,,yn) such that

1+i=0n12ixi=i=0n2iyi.

It is not difficult to write down explicit formulas for all output bits of INCn. For example, for n=5, they are expressed as follows:

y0 =1x0
y1 =x0x1
y2 =x0x1x2
y3 =x0x1x2x3
y4 =x0x1x2x3x4
y5 =x0x1x2x3x4

We prove that size(INCn)2n2 by induction on n. The base case n=1 is clear. For the induction step, take an optimal circuit computing INCn and consider its (topologically) first gate A(xi,xj).

Now, if both the variables xi and xj had out-degree one, the whole circuit would depend on xi and xj through the gate A only. This would mean that there are two different pairs of constant (ai,aj),(bi,bj){0,1}2 such that A(ai,aj)=A(bi,bj). This, in turn, would mean that the circuit does not distinguish between assignments

xiai,xjaj and xibi,xjbj.

But such a circuit cannot compute the function INCn as INCn clearly distinguishes all four different assignments to xi and xj.

Thus, assume that, say, xi feeds at least two gates. Then, assign xi1 and simplify the circuit. During the simplification, the gates fed by xi are eliminated. (If a gate fed by xi becomes a negation of its other input under an assignment xi1, this negation is incorporated into the Boolean binary operations computed by the successors of this gate.) Also, the resulting circuit computes INCn1. To see this, it is instructive to get back to the previous toy example where n=5. Say, we assign x21. Then, the outputs are simplified as follows:

y0 =1x0
y1 =x0x1
y2 =x0x11
y3 =x0x1x3
y4 =x0x1x3x4
y5 =x0x1x3x4

By ignoring the output y2, one gets a function computing INC4:

(y0,y1,y3,y4,y5)=INC4(x0,x1,x3,x4).

By the induction hypothesis, the simplified circuit contains at least 2(n1)2=2n4 gates. Thus, the original circuit has size at least 2+2n4=2n2.

5 Implementation and Experimental Evaluation

We implemented efficient generators of our new circuits in the Cirbo open-source framework [1]. To generate a circuit computing BAns, one passes the vector s. Listing 1 shows how to use the generator to produce an efficient circuit computing SUM31 in a single line of code. When the circuit is generated, one can use a wide range of Cirbo methods to analyze and manipulate the circuit.

Listing 1: Generating an efficient circuit for SUM31 (that computes the binary representation of the sum of 31 bits). The code also prints the size of the resulting circuit and draws it.
from cirbo.synthesis.generation.arithmetics.summation
import generate_add_weighted_bits_efficient
ckt = generate_add_weighted_bits_efficient([0] * 31)
print(ckt.gates_number())
ckt.view_graph()

5.1 Adding Bits and Integers

Table 1 compares the size of circuits for SUMn composed out of Full Adders with circuits composed out of MDFA blocks (that can be generated using our new method), for various n. As the table reveals, for large values of n, the latter circuits are about 10% smaller than the former ones. Also, Listing 2 ensures that for the addition of two n-bit integers the generator produces circuits of size 5n3. Recall that ADDn=BA2n0,,n1,0,,n1 and that 5n3 is provably optimal circuit size for this function as shown by Red’kin [10].

Table 1: Comparing the size of circuits for SUMn composed out of Full Adders with circuits composed out of MDFA. The bottom row shows the improvement in percents.
n 7 31 127 511 2047 8191 32767 131071
FA 20 130 600 2510 10180 40890 163760 655270
MDFA 19 119 543 2263 9167 36807 147391 589751
Improvement 5.0% 8.5% 9.5% 9.8% 10.0% 10.0% 10.0% 10.0%
Listing 2: Ensuring that the generator produces circuits of size 5n3 for ADDn.
from cirbo.synthesis.generation.arithmetics.summation
import generate_add_weighted_bits_efficient as generate
for n in range(2, 100):
ckt = generate(list(range(n)) + list(range(n)))
assert ckt.gates_number() == 5 * n - 3

5.2 Multiplying Integers

Dadda’s multiplier is one of the first circuit designs for multiplying n-bits integers. Basically, it adds the partial products (conjunctions of the bits of the two input numbers) using Full Adders and Half Adders. Its size is about n2+5n2=6n2. Our method of summing up bits allows to reduce the size to roughly 5.5n2. An asymptotically faster method of multiplying n-bit integers was discovered by Karatsuba [6]. It is based on the divide-and-conquer approach: to multiply two n-bit integers, it makes three recursive calls to multiply two n/2-bit integers and then combines them using summation and subtraction only. This way, the running time T(n) of the algorithm satisfies a recurrence T(n)3T(n/2)+O(n), hence T(n)=O(nlog23). As with many other algorithms based on divide-and-conquer, when n becomes small, it is beneficial to multiply the given numbers directly (rather than recursively). We implemented generators based on Karatsuba algorithm that use Dadda multipliers and MDFA-based bit addition when n is smaller than 20. Table 2 and Figure 11 compare the size of the corresponding five circuit designs for 40n300. More accurate estimates on the circuits size of multipliers are given by Sergeev [11].

Table 2: Comparing the size of circuits for MULTn. The first multiplier, Dadda, computes the sum of the partial products using Full Adders and Half Adders. The second one, MDFA, sums up the partial products using MDFA blocks. The third one, Karatsuba, makes three recursive calls and recurses all the way down to 4-bit numbers. The fourth and fifth multipliers use Karatsuba algorithm, but switch to Dadda or MDFA multipliers when n becomes smaller than 20. The last row shows size improvement of the fifth multiplier over the fourth one.
n 40 80 120 160 200 240 280
Dadda 9280 37760 85440 152320 238400 343680 468160
MDFA 8539 34679 78419 139759 218699 315239 429379
Karatsuba 11789 37836 72209 118152 168200 223093 293405
Karatsuba, Dadda 7522 24770 49200 78598 113870 153948 199102
Karatsuba, MDFA 7155 23642 46504 75168 108284 145787 189657
Improvement 4.9% 4.6% 5.5% 4.4% 4.9% 5.3% 4.7%
Refer to caption
Figure 11: Comparing the size of five circuit designs for MULTn, for 40n300.

5.3 Logarithmic Depth

The depth of most of the circuits described above is linear, that is, Θ(n). With an additional care, one can make the depth logarithmic (Θ(logn)) by increasing the size slightly. To achieve this, one processes the layers in parallel rather than consecutively. Namely, let h be the maximum height of a significance layer (that is, every layer contains at most h bits). While h>3, apply in parallel as many FA’s as possible to every layer. After one such step, the maximum height becomes at most 2h/3 (to simplify the exposition, we ignore constant additive terms here): indeed, if there are kh bits on the current layer, then about k/3h/3 bits remain after the application of FA’s; also, at most h/3 bits are transferred from the next layer. Since the maximum height decreases geometrically, in at most O(logn) steps, one reaches the case when h3. This takes depth O(logn) and size O(n) (since each FA reduces the total number of bits by one). When h3, apply either HA or FA to every layer. This ensures that every layer has at most two bits, that is, h2 (the size of the resulting circuit is still linear and the depth is still logarithmic). Then, everything boils down to adding two k-bit numbers (with kn). This can be performed using, for example, the Brent–Kung adder [3] that has size O(k) and depth O(logk). By using MDFA’s instead of FA’s, one can further reduce the size of the resulting circuits. Table 3 shows the size and the depth of the circuits generated this way for the three previously considered functions: SUM, ADD, and MULT.

Table 3: The size and the depth of circuits computing SUMn, ADDn, and MULTn.
n 10 20 30 40 60 80 160 320
ADD 15 18 23 24 28 31 32 42 depth
49 101 153 194 297 383 755 1526 size
SUM 10 14 16 18 20 22 26 30 depth
64 141 215 298 452 615 1252 2529 size
MULT 29 39 46 51 55 62 70 81 depth
627 2301 5209 9158 20356 35971 142388 566539 size

6 Conclusion and Further Directions

In this paper, we presented smaller circuits for bit addition. In many practically relevant scenarios, the described circuits are about 10% smaller than the known circuits composed out of Half Adders and Full Adders. Also, we implemented generators that allow one to produce the corresponding circuits using a single line of code via the Cirbo open-source package [1].

There are three natural open problems related to the circuit size of bit addition.

  1. 1.

    What is the largest α such that

    size(BAns)4.5nαm

    holds for every vector s? In this paper, we proved that α2. Theorem 1 shows that α2.5. An example of a vector where our upper bound 4.5n2m matches the size of the circuit produced by our algorithm is

    s=(0,0,0,0,1,1,2,2,,n22,n22).

    In this case, our method first spends n/2 gates to pair the bits and then applies n/2 MDFA’ blocks. The size of the resulting circuit is, up to an additive constant, n/2+6n/2=3.5n. This matches the upper bound 4.5n2m. Thus, to improve the bound 4.5n2m to 4.5nβm, for β>2, one needs to find a smaller circuit for this particular vector s. And vice versa, by proving a lower bound size(BAns)3.5nO(1), one would prove that α=2.

  2. 2.

    What is the smallest γ such that

    size(BAns)γnO(m)

    holds for every vector s? In this paper, we proved that γ4.5. Improving this seems to be more challenging than just improving 2m to 2.5m as this would most probably require using new building blocks.

  3. 3.

    Finally, note also that the upper bounds 5n3m and 4.5n2m hold for all vectors s. It would be interesting to improve known upper and lower bounds for specific vectors. Perhaps, one of the most interesting such functions is SUMn (here, s=(0,0,,0)). For it, we know an upper bound 4.5n (originally proved by Demenkov et al. [5]; also follows from our Theorem 1) and a lower bound 2.5nO(1) due to Stockmeyer [13].

References

  • [1] Daniil Averkov, Tatiana Belova, Gregory Emdin, Mikhail Goncharov, Viktoriia Krivogornitsyna, Alexander S. Kulikov, Fedor Kurmazov, Daniil Levtsov, Georgie Levtsov, Vsevolod Vaskin, and Aleksey Vorobiev. Cirbo: A new tool for Boolean circuit analysis and synthesis. In AAAI, pages 11105–11112. AAAI Press, 2025. doi:10.1609/AAAI.V39I11.33207.
  • [2] K’Andrea C. Bickerstaff, Earl E. Swartzlander Jr., and Michael J. Schulte. Analysis of column compression multipliers. In IEEE Symposium on Computer Arithmetic, pages 33–39. IEEE Computer Society, 2001. doi:10.1109/ARITH.2001.930101.
  • [3] Richard P. Brent and H. T. Kung. A regular layout for parallel adders. IEEE Transactions on Computers, C-31(3):260–264, 1982. doi:10.1109/TC.1982.1675982.
  • [4] Luigi Dadda. Some schemes for parallel multipliers. Alta Frequenza, 34(5):349–356, 1965.
  • [5] Evgeny Demenkov, Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. New upper bounds on the Boolean circuit complexity of symmetric functions. Inf. Process. Lett., 110(7):264–267, 2010. doi:10.1016/J.IPL.2010.01.007.
  • [6] Anatoly Karatsuba and Yury Ofman. Multiplication of many-digital numbers by automatic computers. Proceedings of the USSR Academy of Sciences, 145:293–294, 1962.
  • [7] Alexander S. Kulikov, Danila Pechenev, and Nikita Slezkin. SAT-based circuit local improvement. In MFCS, volume 241 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.MFCS.2022.67.
  • [8] Charles U. Martel, Vojin G. Oklobdzija, R. Ravi, and Paul F. Stelling. Design strategies for optimal multiplier circuits. In IEEE Symposium on Computer Arithmetic, pages 42–49. IEEE Computer Society, 1995. doi:10.1109/ARITH.1995.465378.
  • [9] Mike Paterson and Uri Zwick. Shallow circuits and concise formulae for multiple addition and multiplication. Comput. Complex., 3:262–291, 1993. doi:10.1007/BF01271371.
  • [10] Nikolay Red’kin. Minimal realization of a binary adder. Problemy kibernetiki, 38:181–216, 1981. In Russian.
  • [11] Igor S. Sergeev. On the circuit complexity of the standard and the Karatsuba methods of multiplying integers. In Information tools and technologies, volume 3, pages 180–187, 2016. In Russian. English version is available at http://arxiv.org/abs/1602.02362. arXiv:1602.02362.
  • [12] Paul F. Stelling, Charles U. Martel, Vojin G. Oklobdzija, and R. Ravi. Optimal circuits for parallel multipliers. IEEE Trans. Computers, 47(3):273–285, 1998. doi:10.1109/12.660163.
  • [13] Larry J. Stockmeyer. On the combinational complexity of certain symmetric Boolean functions. Math. Syst. Theory, 10:323–336, 1977. doi:10.1007/BF01683282.