# Electron Counting based High-Radix Multiplication in Single Electron Tunneling Technology

Cor Meenderinck and Sorin Cotofana Computer Engineering Lab Delft University of Technology, Delft, The Netherlands {cor,sorin}@ce.et.tudeft.nl

Abstract— This paper investigates the implementation of highradix multiplication based on the Electron Counting (EC) paradigm in Single Electron Tunneling (SET) technology. First we propose a multiplication scheme which conceptually speaking follows the structure of traditional full-tree multipliers. The highradix EC multiplication scheme comprises three steps and of each an implementation is presented. Second, an 8-bit radix 4 EC multiplier is designed and verified by means of simulation. The high-radix multiplication scheme is evaluated in terms of area and delay for different operand sizes and compared with corresponding binary multiplication schemes in SET technology. Both type of implementations prove to have similar delay times, but the EC based scheme requires up to five times less area.

*Index Terms*— single electron tunneling, high-radix multiplication, computer arithmetic.

# I. INTRODUCTION

It is generally expected that current semiconductor technologies, i.e., CMOS, cannot be pushed beyond a certain limit because of problems arising in the area of power consumption and scalability. A promising alternative is Single Electron Tunneling (SET) technology [1], which has the potential of performing computation with lower power consumption than CMOS and is scalable to the nanometer region and beyond [2].

Several proposals have been made to implement computational operations using SET technology and these implementations are mainly categorized in two types (see for example [1], [3]). The first type of implementation represents logic values by voltage (see [3] for an overview) while the second type of implementation represents bits by single electrons. Single Electron Encoded Logic (SEEL) [4] is an example of the latter.

Using the second type of implementation, arithmetic units can be designed in a conventional logic design styles, e.g., using Boolean and/or threshold gates (see for example [5]). The Electron Counting (EC) paradigm [6], on the other hand, uses a novel design style and appears promising as an efficient computational paradigm for the implementation of SET based arithmetic operations, e.g., addition and multiplication. Previous EC based adder and multiplier implementations assumed that an unlimited amount of electrons could be transported within the EC building blocks, which does not hold true in practice. Therefore, a limit to the operand size of the previous proposed schemes is implied by the available SET fabrication technology. One way to alleviate this problem is to do highradix computation [7] and a high-radix addition scheme was proposed in [8]. In this paper we propose a high-radix EC multiplication scheme.

The remainder of this paper is organized as follows. Section II briefly describes the single electron tunnel phenomenon and introduces the EC paradigm. Section III introduces the proposed high-radix multiplication scheme. In Section IV the high-radix EC multiplier scheme is compared with a SEEL multiplication scheme and Section V concludes the paper.

#### II. BACKGROUND

SET circuits are based on tunnel junctions which consist of an ultra-thin insulating layer in a conducting material (see Figure 1). In classical physics no charge transport is possible through an insulator. However, when the insulating layer is thin enough the transport or *tunneling* of charge can be controlled in a discrete and accurate manner, i.e., one electron at a time. Tunneling through a junction becomes possible when the junction's current voltage  $V_j$  exceeds the junction's critical voltage  $V_c = \frac{q_e}{2(C_e + C_j)}$  [9], where  $q_e = 1.602 \cdot 10^{-19}C$ ,  $C_j$  is the capacitance of the junction, and  $C_e$  is the capacitive value of the remainder of the circuit as seen from the junction. In other words, tunneling can occur if and only if  $|V_i| \ge V_c$ .



Fig. 1. Schematic representation of the tunnel junction.

Electron tunneling is stochastic in nature and as such the delay cannot be analyzed in the traditional sense. Instead, for each transported electron one can describe the switching delay as  $t_d = \frac{-ln(P_{error})q_eR_t}{|V_j|-V_c}$ , where  $R_t$  is the junction's resistance and  $P_{error}$  is the chance that the desired charge transport has not occurred after  $t_d$  seconds. In this paper we assume  $R_t = 10^5\Omega$  and  $P_{error} = 10^{-8}$ . Each transported electron reduces the system energy by  $\Delta E = q_e(|V_j| - V_c)$  from which the consumed energy can be calculated.

Note that the implementations discussed in here are technology independent. SET tunnel junctions can for example be implemented by classical semiconductor lithography or by carbon nanotubes [10]. Therefore, circuit area is evaluated in terms the total number of circuit elements (capacitors and junctions).

As mentioned in the introduction, there are many ways to do computation using SET technology of which the Electron Counting paradigm seems to exploit the potential of SET most of all. In the EC paradigm, the ability to control the transport of individual electrons is utilized to encode integer values X directly as a charge  $Xq_e$ . Once binary values have been encoded as a number of electrons, one can perform arithmetic operations directly in electron charges, which reveals a broad range of novel computational schemes, generally referred to as Electron Counting (EC).

## III. HIGH-RADIX EC MULTIPLIER

The high-radix EC multiplication scheme we propose is based on the full-tree multiplication strategy [11] often used in fast multipliers, and comprises three steps. In the first step all partial products are produced at once in parallel. When assuming binary operands this can be done by simple ANDgates and for *n*-bit operands, this first step produces *n* rows of bits. In the second step the number of rows is reduced using one or more stages of counters. With each stage of counters, the number of rows is reduced until only two rows are left over. In the last step these two rows of bits are added, often by using a fast addition scheme like carry look-ahead.

The strategy of the high-radix EC multiplication we propose, which is depicted in Figure 2 for the case of 8-bit radix 4 multiplication, comprises the same three steps with some adjustments. In the next paragraphs each step is explained in more detail.



Fig. 2. 8-bit radix 4 EC multiplication strategy

In the *first step*, the partial products are formed. Since we work in radix r, the operands are split into digits of  $log_2r$  bits. Assuming an operand size of n bits, this results in  $\lceil \frac{n}{log_2r} \rceil$  digits for each operand. The multiplication of these digits is performed by EC multipliers, of which a 2-bit instance is depicted in Figure 3, and which operate as follows. Each bit of operand B is connected to an MVke block [6], which adds  $2^iq_e$  charge to the bottom charge reservoir if input  $b_i$  is logic '1'. Consequently, the charge reservoir contains the intermediate sum  $IS = \sum_{i=0}^{1} b_i 2^i q_e$ . This intermediate sum is feeded into a next set of MVke blocks, of which each block adds  $IS * a_i * 2^i q_e$  charge to the top charge reservoir,



Fig. 3. 2-bit EC multiplication scheme

which consequently contains the result of the multiplication. The value in the top reservoir is analog and it is not converted to the digital domain, since the next step is designed to accept analog values. The multiplication scheme contains an OpAmp which has not been designed yet, but which can potentially be implemented using a hybrid FET-SET technology [12], [13].

The direct application of the EC multiplication scheme for step one requires  $(\frac{n}{log_{2T}})^2$  such multipliers. However, we can reduce the number of elements if we observe that each digit of *B* can be converted to analog once, after which this analog value is used by all multipliers in the same row (see Figure 4).



Fig. 4. Step one of high-radix EC multiplication.

In the *second step* the number of rows is reduced by EC counters, which functionality is similar to normal population counters [14] used for binary operands. However, an EC counter assumes a number (k) of analog high-radix (r) inputs, all having the same weight, i.e., the inputs are all in the same column. The EC counter produces a number of outputs (s) in the same radix as the inputs, representing the sum of the inputs values, i.e., it produces a row. In the remainder of this paper we denote a specific instance of such an EC counter as EC (k,r;s) counter.

An EC counter implementation is depicted in Figure 5 for the case of four radix 16 inputs and operates as follows. Each analog input is buffered and amplified before it is fed into the MPSF blocks [15] in order to eliminate feedback effects. The



Fig. 5. EC (4,16;2) counter

*MPSF* blocks perform both the addition of the inputs and the conversion of the intermediate sum to the digital domain.

An MPSF block implements a Periodic Symmetric Function (PSF)  $F_s$  whose output is logic '1' within an interval from a to b, and with a period T (see Figure 6). Each bit  $s_i$ of a digital representation of a value X can be described as a PSF of X as  $s_i = F_{s,i}(X)$ , where the period is  $2^{i+1}$ . Thus, utilizing an MPSF block for each bit an analog to digital conversion can be performed. Once the intermediate sum is converted to the digital domain by the MPSF blocks, it is split into digits of  $log_2r$  bits, in order to guaranty that the output is in the correct radix. These digits are each converted back to the analog domain by sets of MVke blocks and the final result is stored in charge reservoirs.

Since in our case the maximum sum is 64, six MPSF blocks are required each one producing one bit. The first four of these bits are used to produce the analog sum output, which is done by four MVke blocks [6]. The last two bits are used to produce the carry signal, which is done by two more MVke blocks.



Fig. 6. Periodic Symmetric Function.

We note here that the partial products produced by the first step do not all have the same weight. In Figure 2 this is graphically represented, as the partial products in the bottom four rows have a different alignment as the ones in the top four rows. In order to end up with equal aligned intermediate sums, an adjusted counter can be used for the partial products in the bottom rows. In counter based binary full-tree multiplication, the number of rows is reduced to two, which subsequently is reduced to one row using a fast adder. However, as opposed to standard adders, EC adders can perform k:1 reduction (within certain limits for k) in almost the same delay as 2:1 reduction [15]. Therefore, in the high-radix EC multiplication scheme the number of rows does not have to be reduced to two, and the reduction process can be stopped earlier than in binary multiplication schemes. For example in the 8-bit radix 4 EC multiplication in Figure 2 only one stage of five counters is required.

The *third step* of the high-radix EC multiplication is the final addition of a number of rows of intermediate sums. In general, this step of the multiplication is performed by some fast addition scheme like carry look-ahead, carry-skip, etc. For the EC paradigm, such a fast addition scheme is not designed yet, thus in this paper we use a ripple carry structured adder.

The addition scheme we use in here, consists of several EC addition blocks, as depicted in Figure 7, and which functions as a high radix, multiple input full adder. The addition block has l inputs, a carry-in, and a carry-out all in radix  $r^2$  and  $2\log r$  output bits. The number of inputs l is limited to a maximum of r to ensure that the carry-out does not exceed the maximum possible value  $(r^2)$ . The EC addition block functions similar to the EC counter, excepts that the first part of bits is not converted to the analog domain. The bits of the second part are converted to the analog domain, which forms the carry-out signal. To create an adder, the addition blocks are cascaded in parallel with the carry-out of block i connected to the carry-in of block i + 1.



Fig. 7. EC Addition Block

To verify the high-radix multiplication scheme we simulated the 8-bit radix 4 multiplier in SIMON [16]. We used an approach of partitioning to simulate the whole multiplier, for the following reason. Although SIMON contains the OpAmp as circuit element, using it in SET circuitry causes some random effects to occur. Partitioning the circuit in parts ending with an OpAmp resolves this problem. To simulate the entire circuit, the output of each OpAmp was stored and used as an input in the next part. The simulation results indicated that the multiplier functions correctly.

#### IV. COMPARISON

To assess the efficiency of our proposal, we compared the high-radix EC multiplication scheme, with respect to area and delay, with a binary multiplication scheme based on SET Threshold Logic Gates (TLGs) [17]. The SET TLG multiplier uses a similar strategy as the EC multiplier, i.e., first partial products are calculated, which are next reduced to two rows, which are finally added by a carry look-ahead adder. Each of the components of this multiplier was implemented using TLGs in SET technology.

The reason this comparison was chosen, is to obtain a comparison on paradigm level and to avoid technology specific influences. Both the EC and TLG multiplier were implemented in SET technology, and both implementations were scaled such that the voltage representing the logic value '1' is the same in both cases. Though TLGs are somewhat unconventional in digital circuits, they do belong to the class of binary components, and as this is the only binary multiplier implemented in SET, as far as we know, we chose this one as comparison.

We compared the multipliers for both 16-bit and 64-bit operands. We chose to use radix 4 for the EC multiplier, resulting in radix 16 EC counters. We calculated that a minimum area can be achieved when using counters with 14 inputs, thus we utilized (14,16;2) EC counters. In our design we assumed that all counters are the same. Further optimizations for area could be done by selecting smaller counters where less inputs are required.

The area and delay for all four multipliers are presented in Table I. Since, for the EC multiplier in step three a simple ripple carry adder was used, while a carry look-ahead adder was used in the binary multiplier, a fair comparison of the two paradigms cannot be made on the total area and delay. Therefore, in the table the area and delay of step three is displayed separately while we focus on the area and delay of steps 1 and 2.

The delay of steps 1 and 2 of the 16-bit and 64-bit EC multipliers are equal, because for both designs only one stage of counters was required. We note here that for the EC multipliers, the delay of the OpAmp buffers is not included, since no implementation of these was done. Despite this, we can conclude that the delay of both implementations are in the same order of magnitude. The area of the high-radix EC multiplier is respectively 2 and 5 times smaller than the area of the TLG based multiplier.

# V. CONCLUSIONS

This paper investigated the implementation of high-radix multiplication based on the Electron Counting (EC) paradigm in Single Electron Tunneling (SET) technology. A multiplication scheme was proposed based on the full-tree multiplier strategy often used in binary multiplication. To verify the proposed scheme an 8-bit radix 4 instance was simulated. Evaluation of the proposed scheme in terms of area and delay

|            |        | Area (elements) |        | Delay (ns) |        |
|------------|--------|-----------------|--------|------------|--------|
| multiplier | # bits | step1 & 2       | step 3 | step 1 & 2 | step 3 |
| EC radix 4 | 16     | 3285            | 1715   | 19.6       | 96.8   |
| SET TLG    | 16     | 6766            | 3083   | 18.6       | 16.1   |
| EC radix 4 | 64     | 36420           | 6923   | 19.6       | 413.6  |
| SET TLG    | 64     | 192101          | 8406   | 25.8       | 17.9   |

TABLE I

COMPARISON OF EC AND BINARY MULTIPLIERS.

allowed a comparison to a binary multiplication scheme. Both schemes proved to have similar delay times, but the EC based scheme requires up to five times less area.

#### REFERENCES

- R. Waser, Ed., Nanoelectronics and Information Technology Advanced Electronic Materials and Novel Devices, 1st ed. Wiley-VCH, Berlin, 2003.
- [2] "International Technology Roadmap for Semiconductors, 2003 Edition, Executive Summary," Downloadable from website http://public.itrs.net/Home.htm, 2003, available from SEMATECH, ITRS department, 2706 Montoppolis Drive, Austin TX 78741, USA.
- [3] K. Likharev, "Single-Electron Devices and Their Applications," Proceeding of the IEEE, vol. 87, no. 4, pp. 606–632, April 1999.
- [4] C. Lageweg, S. Cotofana, and S. Vassiliadis, "Static buffered set based logic gates," in 2nd IEEE Conference on Nanotechnology (NANO), August 2002, pp. 491–494.
- [5] —, "Binary Addition based on Single Electron Tunneling Devices," in 4th IEEE Conference on Nanotechnology (NANO), August 2004.
- [6] S. Cotofana, C. Lageweg, and S. Vassiliadis, "On computing addition related arithmetic operations via controlled transport of charge," in proceedings of 16th IEEE Symposium on Computer Arithmetic, June 2003, pp. 245–252.
- [7] —, "Addition Related Arithmetic Operations via Controlled Transport of Charge," *IEEE Transactions of Computers*, vol. 54, no. 3, pp. 243– 256, March 2005.
- [8] C. Meenderinck and S. Cotofana, "Implementing high radix addition via conditional charge transport," in *Proceedings of 16th IEEE conference* on Application-Specific Systems, Architectures and Processors (ASAP), July 2005, pp. 294–299.
- [9] C. Wasshuber, "About single-electron devices and circuits," Ph.D. dissertation, TU Vienna, 1998.
- [10] K. Ishibashi, D. Tsuya, M. Suzuki, and Y. Aoyagi, "Fabrication of a Single-Electron Inverter in Multiwall Carbon Nanotubes," *Applied Physics Letters*, vol. 82, no. 19, pp. 3307–3309, February 2001.
- [11] B. Parhami, Computer Arithmetic. Oxford University Press, 2000.
- [12] S. Banerjee, S. Huang, and S. Oda, "Operation of Nanocrystalline-Silicon-Based Few-Electron Memory Devices in the Light of Electron Storage, Ejection and Lifetime Characteristics," *IEEE Transactions on Nanotechnology*, vol. 2, no. 2, pp. 88–92, 2003.
- [13] K. Likharev and A. Korotkov, "Ultradense Hybrid SET/FET Dynamic RAM: Feasibility of Background Charge Independent Room Temperature Single Electron Digital Circuits," in *Proceedings of the International Semiconductor Device Research Symposium*, Charlottesville, Virginia, October 1995, pp. 355–359.
- [14] L. Dadda, "Some schemes for parallel multipliers," *Alta Frequenza*, vol. 34, pp. 349–356, 1965.
- [15] C. Meenderinck and S. D. Cotofana, "Computing periodic symmetric functions in single electron tunneling technology," in *Proceedings of International Semiconductor Conference (CAS)*, October 2005, pp. 47– 50.
- [16] http://www.lybrary.com/simon/.
- [17] C. Lageweg, S. Cotofana, and S. Vassiliadis, "Binary multiplication based on single electron tunneling," in *proceedings of the 15th International Conference on Application-specific Systems, Architectures and Processors*, September 2004, pp. 152–166.