Smaller Circuits for Bit Addition
Abstract
Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it. A vast majority of these results are devoted to optimizing the circuit depth (also known as delay).
In this paper, we investigate the circuit size (also known as area) over the full binary basis of bit addition. Most of the known circuits are built from Half Adders and Full Adders as suggested by Dadda in 1965 for designing multiplier circuits. Applying these ideas to the bit addition function, one gets a upper bound on its circuit size, where is the number of input bits and is the number of output bits. We prove an upper bound . In the regimes where is small compared to (for example, for computing the sum of bits or multiplying two -bit integers), this leads to improvement. We also show that it is provably impossible to improve the two upper bounds above to or . We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with ) is equal to .
We complement our theoretical result by an open-source implementation of generators producing circuits for bit addition and multiplication. The generators allow one to produce the corresponding circuits in two lines of code and to compare them to existing designs.
Keywords and phrases:
bit addition, summation, multiplier, multiplication, Boolean, circuit, synthesis, combinational, digitalCopyright and License:
2012 ACM Subject Classification:
Theory of computation Logic and verification ; Theory of computation Complexity theory and logic ; Theory of computation Circuit complexitySupplementary Material:
Software (Source Code): https://github.com/spbsat/cirboarchived at
swh:1:dir:bb1d9d50044cfe5a05b7c8dadc7404b306c90cd1
Acknowledgements:
We thank the anonymous reviewers for many helpful comments.Editors:
Meena Mahajan, Florin Manea, Annabelle McIver, and Nguyễn Kim ThắngSeries and Publisher:
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik
1 Overview
Bit addition arises virtually everywhere in digital circuits: arithmetic operations, increment/decrement operators, computing addresses and table indices, and so on. Three specific scenarios where it is used frequently are listed below.
-
Adding two -bit numbers.
-
Computing a symmetric Boolean function (such as majority or sorting). A natural way of doing this is to first compute the binary representation of the sum of input bits (that is, to compress bits into about bits) and then to compute the function at hand out of the computed binary representation.
-
To multiply two -bit numbers, one may first compute all partial products (that is, products of the bits of the two input numbers) and then sum up the resulting bits.
In terms of the dot-notation introduced by Dadda [4], the three scenarios discussed above are visualized as shown in Figure 1. In this notation, one places bits of the same significance on the same vertical layer.
There are many other cases where one needs to add bits. Say, one may want to add a single bit to an -bit number (the increment operation is a special case), or to add three -bit numbers, or to add a few bits of varying significance, see Figure 2.
A function capturing all such scenarios is known as bit adder
It is parameterized by the significance vector , takes input bits , and outputs the binary representation of
This way, and .
Since bit addition is such a basic task in Boolean circuit synthesis, a lot of research has been done on constructing efficient circuits for various special cases of it, see, for example, [9, 8, 12, 2]. A vast majority of these results is devoted to optimizing the circuit depth (also known as delay). In this paper, we investigate the circuit size (also known as area) of bit addition. Specifically, we study circuits over the full binary basis.
Two basic building blocks for adding bits are known as Half Adder (HA) and Full Adder (FA). They compute the binary representation of the sum of two and three bits, respectively (that is, and ). In the full binary basis, they can be implemented in two and five gates, respectively, see Figure 3.
Using Half Adders and Full Adders, one can synthesize a bit adder using the following algorithm that goes back to Napier’s Rabdologiæ (1617), as modernized by Dadda [4].
Process the bits layer by layer, in the order of increasing significance. While the current significance layer contains at least three bits, take three of them and apply the Full Adder to replace them with a pair of bits of significance and . If there are two bits left at the current layer , apply the Half Adder to them to get a pair of bits of significance and .
This algorithm ensures that, for any vector ,
Indeed, each application of the Full Adder reduces the number of bits by one, hence the total cost of all Full Adders is at most . The Half Adder is applied at most once for each of the significance layers, hence the total cost of all Half Adders is at most . Hence, the total size is at most . Note that most of the previous works consider specific special cases of bit addition (like summation or multiplication) where is a specific function on . In such cases, the upper bounds are stated in terms of only. The parameter arises naturally when one considers the general bit addition function.
By applying this algorithm to partial products of bits of two input -bit numbers, one gets the well-known Dadda multiplier circuit [4]. For many vectors , the upper bound is loose: it does not match the size of the actual circuit produced by the algorithm. A straightforward example is : in this case, no gates are needed whereas the upper bound is . It is also worth noting that, in some cases, the resulting circuit is provably optimal. For example, for the function (that computes the sum of two -bit integers), the method constructs a circuit out of a single Half Adder and Full Adders. The resulting circuit is known as ripple-carry adder and has size . Red’kin [10] proved that there is no smaller circuit for this function.
At the same time, in many scenarios, not only the bound is loose, but also the circuit produced by the algorithm is suboptimal. For example, for , it gives a circuit of size consisting of two Full Adders and one Half Adder, see Figure 4. However, can be computed by a circuit of size as shown by [7] (see also Figure 7 later in the text). In general, whereas the algorithm produces a circuit of size about for , this function can be computed by a circuit of size about as shown by Demenkov et al. [5].
In this paper, we generalize the construction by Demenkov et al. Namely, we prove an upper bound for the circuit size of bit addition. In the regimes where is small compared to , this gives a circuit that is about smaller. This applies to the Dadda multiplier. We complement our theoretical result by an open source implementation of generators producing circuits for bit addition and multiplication.
We also show that it is provably impossible to improve the two upper bounds above to or . We achieve this by establishing that the circuit size of the increment function (a special case of the bit addition function with ) is equal to .
2 General Setting
In this section, we formally introduce the Boolean functions studied in this paper as well as the main building blocks for computing them.
2.1 Boolean Functions
The main Boolean function studied in this paper is bit adder
It computes the binary representation of the weighted sum of input bits:
In most interesting scenarios, all bits of the binary representation of this sum depend on the input and the number of outputs can be expressed as follows:
In such cases,
However, for some other significance vectors, some of the bits of the binary representation of the sum are identically equal to zero (and thus, do not depend on the input). We exclude such bits from the outputs. Thus, more generally, when we say that
we mean that there exists a vector such that and
It is not difficult to see that the vector is unique and that .
This way, the goal of bit addition is to “flatten” the distribution of bits, that is, to leave at most one bit at each significance layer. Figure 5 gives an example.
Many practically important Boolean functions can be computed using bit summation.
-
The function computes the sum of bits:
-
The function computes the sum of two -bit numbers:
-
The function computes the product of two -bit numbers:
2.2 Boolean Circuits
A circuit is a natural way of computing Boolean functions. It is an acyclic directed graph of in-degree and whose source nodes are labeled with input variables and constants and , whereas all other nodes are labeled with binary Boolean operations. The source nodes are called input gates, all other nodes are called internal gates. Each gate computes a (single-output) Boolean function of . If gates of the circuit are marked as outputs, it computes a function of the form . For a circuit , its size, , is the number of internal gates of , whereas its depth, , is the maximum length of a path from an input gate of to an output gate of .
2.3 Basic Building Blocks
As discussed before, the Half Adder and Full Adder are basic building blocks for computing bit addition. Figure 6 shows how to synthesize a circuit of size computing out of four Half Adders and eleven Full Adders. It is not difficult to see that a similar block structure can be used for any yielding a circuit of size at most for .
It turns out that better circuit designs are possible for as shown by Demenkov et al. [5]. Consider two consecutive Full Adders shown on the top left of Figure 7. The corresponding function DFA (for Double Full Adder) has the following specification:
Then, MDFA (for Modified Double Full Adder) has the following specification:
That is, for pairs of bits , , and it uses a slightly different encoding: instead of . We call such bits paired and show them in gray boxes in dot diagrams. It allows one to compute MDFA in eight gates (whereas the circuit size of DFA is 10). Moreover, the corresponding circuit of size eight is just a part of an optimal circuit of size computing shown on the right of Figure 7.
We also need a block called MDFA’ that can be viewed as a subfunction of MDFA:
It is not difficult to see that one can compute MDFA’ using six gates: when one replaces by zero in the circuit for MDFA, the two gates fed by can be eliminated. (In the same manner, HA is a subfunction of FA: .)
Using MDFA and MDFA’ blocks, one can compute roughly as follows:
-
1.
Compute ( gates).
-
2.
Apply at most blocks (no more than gates).
-
3.
The last MDFA block outputs two bits: and . Instead of them, one needs to compute and . To achieve this, it suffices to apply operation: .
This gives an upper bound for , its formal proof can be found in [5]. Figure 8 gives an example of the corresponding design for .
3 New Upper Bound for Circuit Size of Bit Addition
In this section, we prove a new upper bound for the circuit size of bit addition. For regimes where is small compared to , this is better than by about . This applies to and .
Theorem 1.
For any vector ,
In the proof, we use the following straightforward observation. Assume that . In this case, the first output is equal to , the cost of computing this particular bit of the output is zero, allowing one to forget about it. Thus,
where . We call the operation of replacing by as shifting. Note that shifting reduces both the number of inputs and the number of outputs by one. Figure 9 gives an example.
Proof.
As the first step, we do the following: at every significance layer, we break all bits, except for possibly one, into pairs and compute the parity for every pair. This takes at most gates.
Then, it remains to prove that one can compute the sum of bits using gates if every significance layer contains at most one bit without a pair. We prove this by induction on . The base case is clear: in this case, the circuit size is zero (nothing needs to be summed up) and the upper bound is at least zero since . To prove the induction step, denote by the number of minimum elements in the significance vector (that is, the number of bits in the rightmost non-empty column in dot-notation).
Consider the following seven cases. (In fact, the first three cases are special cases of the last three cases, but we believe that the presentation is cleaner when they are stated as separate cases.) In each of the cases below, we shift and proceed by induction.
-
1.
. In this case, we just shift. By the induction hypothesis, the rest can be computed by a circuit of size at most
-
2.
. Then, the corresponding two bits and are paired meaning that their sum is computed already. Then, we compute their carry
and transfer it to the next layer. If this layer has an unpaired bit , we pair and by computing . Finally, we shift. By the induction hypothesis, the size of the resulting circuit is at most
-
3.
. For the corresponding three bits , we have , , and (that is, and are paired). We apply the Full Adder to the three bits. This costs four gates, as is already computed and is not needed (recall Figure 3). The sum bit stays on the same layer, whereas the carry bit goes to the next layer. Then, we pair with an unpaired bit on the next layer if needed and shift. This gives an upper bound
-
4.
. Apply MDFA’ to two pairs to produce an unpaired bit. For the remaining pairs, keep applying MDFA, each time reusing the unpaired bit. Then, we shift. The upper bound is
-
5.
. Apply MDFA times, then shift. The upper bound is
-
6.
. For two bits and from the same pair, compute the most significant bit of their sum using a single binary gate: . This leaves their sum () at the current layer and puts the just computed carry bit () to the next layer. If needed, compute the parity of an unpaired bit with the just transferred carry bit. Then, apply MDFA times and shift. Overall, the upper bound is
-
7.
. Apply the Full Adder to a pair of bits and the unpaired bit. If needed, pair the just transferred carry bit with an unpaired bit from the next layer. Then, apply MDFA times and shift. The resulting upper bound is
4 Lower Bounds and Limitations
The upper bound holds for any vector , but in many scenarios it is loose. For example, for the function
this upper bounds turns into , whereas . Interestingly, the term in the upper bound cannot be increased: for any constant , there exists a vector such that . One example of such a vector is . The corresponding function adds a bit to an -bit number. It is not difficult to see that it can be computed using Half Adders (see Figure 10), hence its circuit size is at most . Below, we show that this straightforward circuit is optimal (up to an additive constant). It also shows that it is impossible to improve our upper bound to for .
Theorem 2.
Proof.
Assume that a bit to be added is equal to one (clearly, this only makes the function easier to compute). In other words, we consider the increment function . Thus, such that
It is not difficult to write down explicit formulas for all output bits of . For example, for , they are expressed as follows:
We prove that by induction on . The base case is clear. For the induction step, take an optimal circuit computing and consider its (topologically) first gate .
Now, if both the variables and had out-degree one, the whole circuit would depend on and through the gate only. This would mean that there are two different pairs of constant such that . This, in turn, would mean that the circuit does not distinguish between assignments
But such a circuit cannot compute the function as clearly distinguishes all four different assignments to and .
Thus, assume that, say, feeds at least two gates. Then, assign and simplify the circuit. During the simplification, the gates fed by are eliminated. (If a gate fed by becomes a negation of its other input under an assignment , this negation is incorporated into the Boolean binary operations computed by the successors of this gate.) Also, the resulting circuit computes . To see this, it is instructive to get back to the previous toy example where . Say, we assign . Then, the outputs are simplified as follows:
By ignoring the output , one gets a function computing :
By the induction hypothesis, the simplified circuit contains at least gates. Thus, the original circuit has size at least .
5 Implementation and Experimental Evaluation
We implemented efficient generators of our new circuits in the Cirbo open-source framework [1]. To generate a circuit computing , one passes the vector . Listing 1 shows how to use the generator to produce an efficient circuit computing in a single line of code. When the circuit is generated, one can use a wide range of Cirbo methods to analyze and manipulate the circuit.
5.1 Adding Bits and Integers
Table 1 compares the size of circuits for composed out of Full Adders with circuits composed out of MDFA blocks (that can be generated using our new method), for various . As the table reveals, for large values of , the latter circuits are about smaller than the former ones. Also, Listing 2 ensures that for the addition of two -bit integers the generator produces circuits of size . Recall that and that is provably optimal circuit size for this function as shown by Red’kin [10].
| 7 | 31 | 127 | 511 | 2047 | 8191 | 32767 | 131071 | |
| FA | 20 | 130 | 600 | 2510 | 10180 | 40890 | 163760 | 655270 |
| MDFA | 19 | 119 | 543 | 2263 | 9167 | 36807 | 147391 | 589751 |
| Improvement | 5.0% | 8.5% | 9.5% | 9.8% | 10.0% | 10.0% | 10.0% | 10.0% |
5.2 Multiplying Integers
Dadda’s multiplier is one of the first circuit designs for multiplying -bits integers. Basically, it adds the partial products (conjunctions of the bits of the two input numbers) using Full Adders and Half Adders. Its size is about . Our method of summing up bits allows to reduce the size to roughly . An asymptotically faster method of multiplying -bit integers was discovered by Karatsuba [6]. It is based on the divide-and-conquer approach: to multiply two -bit integers, it makes three recursive calls to multiply two -bit integers and then combines them using summation and subtraction only. This way, the running time of the algorithm satisfies a recurrence , hence . As with many other algorithms based on divide-and-conquer, when becomes small, it is beneficial to multiply the given numbers directly (rather than recursively). We implemented generators based on Karatsuba algorithm that use Dadda multipliers and MDFA-based bit addition when is smaller than . Table 2 and Figure 11 compare the size of the corresponding five circuit designs for . More accurate estimates on the circuits size of multipliers are given by Sergeev [11].
| 40 | 80 | 120 | 160 | 200 | 240 | 280 | |
| Dadda | 9280 | 37760 | 85440 | 152320 | 238400 | 343680 | 468160 |
| MDFA | 8539 | 34679 | 78419 | 139759 | 218699 | 315239 | 429379 |
| Karatsuba | 11789 | 37836 | 72209 | 118152 | 168200 | 223093 | 293405 |
| Karatsuba, Dadda | 7522 | 24770 | 49200 | 78598 | 113870 | 153948 | 199102 |
| Karatsuba, MDFA | 7155 | 23642 | 46504 | 75168 | 108284 | 145787 | 189657 |
| Improvement | 4.9% | 4.6% | 5.5% | 4.4% | 4.9% | 5.3% | 4.7% |
5.3 Logarithmic Depth
The depth of most of the circuits described above is linear, that is, . With an additional care, one can make the depth logarithmic () by increasing the size slightly. To achieve this, one processes the layers in parallel rather than consecutively. Namely, let be the maximum height of a significance layer (that is, every layer contains at most bits). While , apply in parallel as many FA’s as possible to every layer. After one such step, the maximum height becomes at most (to simplify the exposition, we ignore constant additive terms here): indeed, if there are bits on the current layer, then about bits remain after the application of FA’s; also, at most bits are transferred from the next layer. Since the maximum height decreases geometrically, in at most steps, one reaches the case when . This takes depth and size (since each FA reduces the total number of bits by one). When , apply either HA or FA to every layer. This ensures that every layer has at most two bits, that is, (the size of the resulting circuit is still linear and the depth is still logarithmic). Then, everything boils down to adding two -bit numbers (with ). This can be performed using, for example, the Brent–Kung adder [3] that has size and depth . By using MDFA’s instead of FA’s, one can further reduce the size of the resulting circuits. Table 3 shows the size and the depth of the circuits generated this way for the three previously considered functions: , , and .
| 10 | 20 | 30 | 40 | 60 | 80 | 160 | 320 | ||
|---|---|---|---|---|---|---|---|---|---|
| ADD | 15 | 18 | 23 | 24 | 28 | 31 | 32 | 42 | depth |
| 49 | 101 | 153 | 194 | 297 | 383 | 755 | 1526 | size | |
| SUM | 10 | 14 | 16 | 18 | 20 | 22 | 26 | 30 | depth |
| 64 | 141 | 215 | 298 | 452 | 615 | 1252 | 2529 | size | |
| MULT | 29 | 39 | 46 | 51 | 55 | 62 | 70 | 81 | depth |
| 627 | 2301 | 5209 | 9158 | 20356 | 35971 | 142388 | 566539 | size |
6 Conclusion and Further Directions
In this paper, we presented smaller circuits for bit addition. In many practically relevant scenarios, the described circuits are about 10% smaller than the known circuits composed out of Half Adders and Full Adders. Also, we implemented generators that allow one to produce the corresponding circuits using a single line of code via the Cirbo open-source package [1].
There are three natural open problems related to the circuit size of bit addition.
-
1.
What is the largest such that
holds for every vector ? In this paper, we proved that . Theorem 1 shows that . An example of a vector where our upper bound matches the size of the circuit produced by our algorithm is
In this case, our method first spends gates to pair the bits and then applies MDFA’ blocks. The size of the resulting circuit is, up to an additive constant, . This matches the upper bound . Thus, to improve the bound to , for , one needs to find a smaller circuit for this particular vector . And vice versa, by proving a lower bound , one would prove that .
-
2.
What is the smallest such that
holds for every vector ? In this paper, we proved that . Improving this seems to be more challenging than just improving to as this would most probably require using new building blocks.
-
3.
Finally, note also that the upper bounds and hold for all vectors . It would be interesting to improve known upper and lower bounds for specific vectors. Perhaps, one of the most interesting such functions is (here, ). For it, we know an upper bound (originally proved by Demenkov et al. [5]; also follows from our Theorem 1) and a lower bound due to Stockmeyer [13].
References
- [1] Daniil Averkov, Tatiana Belova, Gregory Emdin, Mikhail Goncharov, Viktoriia Krivogornitsyna, Alexander S. Kulikov, Fedor Kurmazov, Daniil Levtsov, Georgie Levtsov, Vsevolod Vaskin, and Aleksey Vorobiev. Cirbo: A new tool for Boolean circuit analysis and synthesis. In AAAI, pages 11105–11112. AAAI Press, 2025. doi:10.1609/AAAI.V39I11.33207.
- [2] K’Andrea C. Bickerstaff, Earl E. Swartzlander Jr., and Michael J. Schulte. Analysis of column compression multipliers. In IEEE Symposium on Computer Arithmetic, pages 33–39. IEEE Computer Society, 2001. doi:10.1109/ARITH.2001.930101.
- [3] Richard P. Brent and H. T. Kung. A regular layout for parallel adders. IEEE Transactions on Computers, C-31(3):260–264, 1982. doi:10.1109/TC.1982.1675982.
- [4] Luigi Dadda. Some schemes for parallel multipliers. Alta Frequenza, 34(5):349–356, 1965.
- [5] Evgeny Demenkov, Arist Kojevnikov, Alexander S. Kulikov, and Grigory Yaroslavtsev. New upper bounds on the Boolean circuit complexity of symmetric functions. Inf. Process. Lett., 110(7):264–267, 2010. doi:10.1016/J.IPL.2010.01.007.
- [6] Anatoly Karatsuba and Yury Ofman. Multiplication of many-digital numbers by automatic computers. Proceedings of the USSR Academy of Sciences, 145:293–294, 1962.
- [7] Alexander S. Kulikov, Danila Pechenev, and Nikita Slezkin. SAT-based circuit local improvement. In MFCS, volume 241 of LIPIcs, pages 67:1–67:15. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022. doi:10.4230/LIPIcs.MFCS.2022.67.
- [8] Charles U. Martel, Vojin G. Oklobdzija, R. Ravi, and Paul F. Stelling. Design strategies for optimal multiplier circuits. In IEEE Symposium on Computer Arithmetic, pages 42–49. IEEE Computer Society, 1995. doi:10.1109/ARITH.1995.465378.
- [9] Mike Paterson and Uri Zwick. Shallow circuits and concise formulae for multiple addition and multiplication. Comput. Complex., 3:262–291, 1993. doi:10.1007/BF01271371.
- [10] Nikolay Red’kin. Minimal realization of a binary adder. Problemy kibernetiki, 38:181–216, 1981. In Russian.
- [11] Igor S. Sergeev. On the circuit complexity of the standard and the Karatsuba methods of multiplying integers. In Information tools and technologies, volume 3, pages 180–187, 2016. In Russian. English version is available at http://arxiv.org/abs/1602.02362. arXiv:1602.02362.
- [12] Paul F. Stelling, Charles U. Martel, Vojin G. Oklobdzija, and R. Ravi. Optimal circuits for parallel multipliers. IEEE Trans. Computers, 47(3):273–285, 1998. doi:10.1109/12.660163.
- [13] Larry J. Stockmeyer. On the combinational complexity of certain symmetric Boolean functions. Math. Syst. Theory, 10:323–336, 1977. doi:10.1007/BF01683282.
