# Designing Tunable Subthreshold Logic Circuits Using Adaptive Feedback Equalization

Mahmoud Zangeneh, Student Member, IEEE, and Ajay Joshi, Member, IEEE

Abstract-Ultralow-power subthreshold logic circuits are becoming prominent in embedded applications with limited energy budgets. Minimum energy consumption of digital logic circuits can be obtained by operating in the subthreshold regime. However, in this regime process variations can result in up to an order of magnitude variations in  $I_{ON}/I_{OFF}$  ratios leading to timing errors, which can have a destructive effect on the functionality of the subthreshold circuits. These timing errors become more frequent in scaled technology nodes where process variations are highly prevalent. Therefore, mechanisms to mitigate these timing errors while minimizing the energy consumption are required. In this paper, we propose a tunable adaptive feedback equalizer circuit that can be used with a sequential digital logic to mitigate the process variation effects and reduce the dominant leakage energy component in the subthreshold digital logic circuits. We also present detailed energy-performance models of the adaptive feedback equalizer circuit. As part of the modeling approach, we also develop an analytical methodology to estimate the equivalent resistance of MOSFET devices in subthreshold regime. For a 64-bit adder designed in 130 nm, our proposed approach can reduce the normalized variation of the critical path delay from 16.1% to 11.4% while reducing the energy-delay product by 25.83% at minimum energy supply voltage.

Index Terms—Feedback equalizer, leakage energy component, subthreshold.

## I. INTRODUCTION

The use of subthreshold digital CMOS logic circuits is becoming increasingly popular in energy-constrained applications where high performance is not required. The main idea here is that scaling down the supply voltage can significantly reduce the dynamic energy consumed by digital circuits. Scaling the supply voltage also lowers the leakage current due to a reduction in the drain-induced barrier lowering (DIBL) effect. However, as the supply voltage is scaled below the threshold voltage of the transistors, the propagation delay of the logic gates increases, which in turn increases the leakage energy of the transistors. These two opposite trends in the leakage and the dynamic energy components lead to a minimum energy supply voltage that occurs below the threshold voltage of the transistors for digital logic circuits [1]. However, digital logic circuits operating in the subthreshold region suffer from process variations that directly affect the threshold voltage ($V_T$). This in turn has a significant impact on the drive current due to the exponential relationship between the drive current and the threshold voltage of the transistors in the subthreshold regime. Moreover, subthreshold digital circuits suffer from degraded $I_{\rm ON}/I_{\rm OFF}$ ratios [2], resulting in a failure to provide rail-to-rail output swings when restricted by aggressive timing constraints. These degraded $I_{\rm ON}/I_{\rm OFF}$ ratios and process-related variations make subthreshold circuits highly susceptible to timing errors that can further lead to complete system failures. Since the standard deviation of $V_T$ varies inversely with the square root of the channel area [3], one approach to overcome the process variation is to upsize the transistors [2]. Alternatively, one can increase the logic path depth to leverage the statistical averaging of the delay across gates [4] to overcome process variations. These approaches, however, increase the transistor parasitics, which in turn increases the energy consumption.

The authors are with the Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215 USA (e-mail: zangeneh@bu.edu; joshi@bu.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2015.2421881

In this paper, we first propose the use of a feedback equalizer circuit for lowering the energy consumption of digital logic operating in the subthreshold region while achieving robustness equivalent to that provided by [2]. Here, the feedback equalizer circuit (placed just before the flip-flop) adjusts the switching threshold of its inverter based on the output of the flip-flop in the previous cycle to reduce the charging/discharging time of the flip-flop's input capacitance. Moreover, the smaller input capacitance of the feedback equalizer reduces the switching time of the last gate in the combinational logic block. Overall, this reduces the total delay of the sequential logic, which makes it more robust to timing errors and allows aggressive clocking to reduce the dominant leakage energy. In addition to reducing energy consumption, we also demonstrate how the tuning capability of the equalizer can be used to enable extra charging/discharging paths for the flip-flop input capacitance after fabrication to mitigate timing errors resulting from worse than expected process variations in the subthreshold digital logic. In general, our approach of using a feedback equalizer to lower energy consumption and improve robustness is independent of the methodology used for designing a combinational logic block operating in the subthreshold regime. The main contributions of this paper are as follows.

- 1) We propose using an adaptive feedback equalizer circuit in the design of tunable subthreshold digital logic circuits. This adaptive feedback equalizer circuit can reduce energy consumption and improve performance of the subthreshold digital logic circuits. At the same time, the tunability of this feedback equalizer circuit enables postfabrication tuning of the digital logic block to overcome worse than expected process variations as well as lower energy and improve performance.
- 2) We present detailed analytical models (AMs) for performance and energy of the adaptive feedback equalizer circuit. These models can be easily used in combination with the existing performance and energy models for subthreshold circuits to generate subthreshold designs that meet energy and/or performance constraints.
- 3) For a 64-bit adder example circuit, we show that compared with [2], the use of our proposed adaptive feedback equalizer circuit can reduce the energy-delay product (EDP) by 25.83% and also reduce the normalized variation (3σ/μ) of the critical path delay from 16.1% to 11.4%. In addition, in case of worse than expected process variations, we show that the tuning capability of the equalizer circuit can be used postfabrication to reduce the normalized variation (3σ/μ) of the critical path delay with minimal increase in energy.

Manuscript received November 14, 2014; revised February 13, 2015; accepted March 22, 2015.

1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

The rest of this paper is organized as follows. Section II discusses the related work in the design of low-power robust subthreshold circuits. A detailed description of the operation of the adaptive feedback equalizer circuit in subthreshold regime is presented in Section III. In Section IV, we present detailed circuit-level performance and energy models for the equalized circuits in subthreshold regime. In Section V, we explore the use of the adaptive feedback equalizer circuit in various digital logic circuits to improve energy efficiency and mitigate process variation effects.

## II. RELATED WORK

Several techniques have been proposed to design robust ultralow power subthreshold circuits. As described earlier, transistor upsizing [2] and increasing the logic path depth [4], [5] can be used to overcome process variations. The use of gates of different drive strengths has also been proposed to overcome process variations [6]. A detailed analysis on the timing variability and the metastability of the flip-flops designed in the subthreshold region has been presented in [7] and [8], respectively. Lotze and Manoli [9] have used the Schmitt trigger structures in subthreshold logic circuits to improve the  $I_{\rm ON}/I_{\rm OFF}$  ratio and effectively reduce the leakage from the gate output node. Pu et al. [10] proposed a design technique that uses a configurable  $V_T$  balancer to mitigate the  $V_T$  mismatch of transistors operating in subthreshold regime. Zhou et al. [11] propose to boost the drain current of the transistors using minimum-sized devices with fingers to mitigate the inverse narrow width effect in subthreshold domain. An analytical framework for subthreshold logic gate sizing based on statistical variations has been proposed in [12], which provides narrower delay distributions compared with the state-of-the-art approaches. Body-biasing has also been proposed to mitigate the impact of variations [13]. A controller that uses a sensor to first quantify the effect of process variations on subthreshold circuits and then generates an appropriate supply voltage to overcome that effect has been proposed in [14]. De Vita and Iannaccone [15] have used a current reference circuit to design a voltage regulator providing a supply voltage that makes the propagation delay of the subthreshold digital circuits almost insensitive to temperature and process variations. Using a differential dynamic logic in standby mode, Liu and Rabaey [16] propose to suppress leakage in the subthreshold circuits.

Error detection and correction techniques have been widely used to design resilient, energy-efficient above-threshold architectures [17]-[20]. Tschanz et al. [17] and Bowman and Tschanz [18] have used a tunable replica circuit (with 3.5% leakage power overhead and 2.2% area overhead) and error-detection sequentials (with 5.1% leakage power overhead and 3.8% area overhead) to monitor critical path delays and mitigate dynamic variation guardbands for maximum throughput in the above-threshold regime. Using an adaptive clock controller based on error statistics, the proposed processor architecture operates at maximum efficiency across a range of dynamic variations. Bull et al. [19] applied the Razor error correction technique (with 9.4% power overhead and 6.9% area overhead) to a 32-bit ARM processor with a microarchitecture design for energy-efficient operation through the elimination of timing margins. Whatmough et al. [20] applied Razor (with 16.9% power overhead and 1.59% area overhead) to a 16-tap finite-impulse response (FIR) filter, realizing a 37% improvement in energy efficiency. These error correction techniques could potentially be used in combination with our feedback equalization technique to improve robustness in sequential logic blocks operating in the subthreshold regime.

We propose a circuit-level scheme that uses a communication-inspired feedback equalization technique in the critical path to mitigate the timing errors arising from aggressive voltage scaling and process variations in subthreshold digital logic circuits. It should be noted that we are not designing subthreshold communication circuits. We are proposing the design of subthreshold logic circuits that leverage principles of communication theory. Several authors have already used feedback-based techniques to boost the weak low-voltage signals in global interconnections [21]-[25]. Seo et al. [21] proposed the self-timed regenerator technique to improve the speed and power for on-chip global interconnects, leading to 14% delay improvement over the conventional repeater design in the above-threshold regime. Schinkel et al. [22] presented a pulsewidth preemphasis equalization approach with lower latency compared with the classic repeater insertion technique. Kim and Seok [23] proposed a reconfigurable interconnect design technique based on regenerators for ultradynamic-voltage-scaling systems to improve performance and energy efficiency across a large range of above-threshold supply voltages. Seo et al. [24] proposed the design of an adaptively controlled preemphasis transceiver to reduce intersymbol interference in on-chip signaling. Kim and Stojanović [25] presented an energy-efficient transceiver design that performs feedforward equalization for repeaterless, high-performance on-chip communication.

Equalization techniques have been proposed to design energy-efficient logic circuits operating in the above-threshold regime. Takhirov et al. [27] proposed to use the feedback equalizer circuit with Schmitt trigger to mitigate timing errors resulting from voltage scaling and in turn improve energy efficiency for the above-threshold logic circuits. Similarly, Zangeneh and Joshi [28] used feedback equalization to reduce the dominant leakage energy of subthreshold logic circuits. However, this technique is static and it does not have the capability to handle worse than expected intradie and interdie process variations. We propose using an adaptive feedback equalizer circuit in the design of tunable subthreshold digital logic circuits. This adaptive feedback equalizer circuit can reduce energy consumption and improve performance of the subthreshold digital logic circuits. Moreover, the tunability of this feedback equalizer circuit enables postfabrication tuning of the digital logic block to overcome worse than expected process variations as well as improve energy and performance.

Fig. 1. Adaptive feedback equalizer circuit with multiple feedback paths (designed using a variable threshold inverter [26]) can be combined with a traditional master–slave flip-flop to design an adaptive E-flip-flop.

## III. ADAPTIVE EQUALIZED FLIP-FLOP VERSUS CONVENTIONAL FLIP-FLOP

In this section, we first explain the use of the adaptive feedback equalizer circuit in the design of an adaptive equalized flip-flop (E-flip-flop) and then provide a detailed comparison of the E-flip-flop with the conventional flip-flop in terms of area, setup time, and performance. We propose the use of a variable threshold inverter [26] (Fig. 1) as an adaptive feedback equalizer along with the classic master-slave positive edge-triggered flip-flop [29] (Fig. 2) to design an adaptive E-flip-flop. This adaptive feedback equalizer circuit consists of two feedforward transistors (M1 and M2 in Fig. 1) and four control transistors (M3 and M4 for feedback path 1, which is always ON, and M5 and M6 for feedback path 2, which can be conditionally switched ON postfabrication in Fig. 1). These transistors provide extra pull-up/pull-down paths for the data flip-flop input capacitance in addition to the pull-up/pull-down path in the static inverter. The extra pull-up/pull-down paths are enabled whenever the output of the critical path in the combinational logic changes. The control transistors M5 and M6 are enabled/disabled through transistor switches (M7 and M8) that are controlled by an asynchronous control latch. The value of the static control latch is initially reset to 0 during chip bootup. After bootup, if required, a square pulse is sent to the En terminal to set the output of the latch to 1 to switch ON M7 and M8, which enables feedback path 2.

Fig. 2. Circuit diagram of classic master-slave positive edge-triggered flip-flop [29].

The adaptive E-flip-flop effectively modifies the switching threshold of the static inverter in the feedback equalizer based on the output of the flip-flop in the previous cycle. If the previous output of the flip-flop is a 0, the switching threshold of the static inverter is lowered, which speeds up the transition of the flip-flop input from 0 to 1. Similarly, if the previous output is 1, the switching threshold is increased, which speeds up the transition to 0. Effectively, the circuit adjusts the switching threshold and facilitates faster high-to-low and low-to-high transitions of the flip-flop input. Moreover, the smaller input capacitance of the feedback equalizer reduces the switching time of the last gate in the combinational logic block. Overall, this reduces the total delay of the sequential logic. The dc response of the adaptive feedback equalizer circuit with two different feedback paths in the subthreshold regime is shown in Fig. 3.
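The threshold adjustment described above can be illustrated with a purely behavioral sketch. It is not a circuit model: the nominal threshold of $V_{\rm DD}/2$, the `shift_fraction` parameter, the linear input ramp, and all numeric values are assumptions chosen only to show why moving the threshold toward the starting rail shortens the time to the switching point.

```python
def switching_threshold(prev_q, vdd, shift_fraction=0.1):
    """Switching threshold (V) of the equalizer inverter for a given previous output.

    Assumption: the nominal threshold is VDD/2 and the feedback moves it by
    shift_fraction * VDD; both values are illustrative, not taken from the paper.
    """
    nominal = vdd / 2.0
    if prev_q == 0:
        # Previous output 0: threshold is lowered, so a rising input trips earlier.
        return nominal - shift_fraction * vdd
    # Previous output 1: threshold is raised, so a falling input trips earlier.
    return nominal + shift_fraction * vdd


def crossing_time(v_start, v_end, threshold, t_transition):
    """Time at which an idealized linear ramp from v_start to v_end crosses threshold."""
    return t_transition * (threshold - v_start) / (v_end - v_start)


VDD = 0.3      # supply voltage in volts (hypothetical)
T_RAMP = 10.0  # full-swing transition time in ns (hypothetical)

# Rising input after a previous output of 0.
t_fixed_rise = crossing_time(0.0, VDD, VDD / 2.0, T_RAMP)
t_adaptive_rise = crossing_time(0.0, VDD, switching_threshold(0, VDD), T_RAMP)

# Falling input after a previous output of 1.
t_fixed_fall = crossing_time(VDD, 0.0, VDD / 2.0, T_RAMP)
t_adaptive_fall = crossing_time(VDD, 0.0, switching_threshold(1, VDD), T_RAMP)

print(f"rise: fixed {t_fixed_rise:.2f} ns, adaptive {t_adaptive_rise:.2f} ns")
print(f"fall: fixed {t_fixed_fall:.2f} ns, adaptive {t_adaptive_fall:.2f} ns")
```

In this toy setting both transitions reach the switching point earlier with the adaptive threshold than with a fixed midpoint threshold, which is the qualitative behavior the text describes.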

Fig. 3. DC response of the adaptive feedback equalizer circuit with two different feedback paths in the subthreshold regime. The switching threshold of the inverter is modified based on the previous sampled output data.

The adaptive E-flip-flop has eight more transistors than the conventional master–slave flip-flop [29]. Compared with a classic master–slave flip-flop with 22 transistors [seven inverters and four transmission gates (TGs)], the area overhead of the adaptive E-flip-flop is 36%. The area overhead of the control latch with ten transistors (three inverters and two TGs) is 45%. This area overhead gets amortized across the entire sequential logic block.
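The transistor counts and overhead percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes two transistors per static inverter and two per transmission gate, and uses transistor count as a proxy for area; the helper name `transistor_count` is illustrative.

```python
TRANSISTORS_PER_INVERTER = 2  # one pMOS and one nMOS (assumed static CMOS inverter)
TRANSISTORS_PER_TG = 2        # one pMOS and one nMOS (assumed transmission gate)


def transistor_count(inverters, transmission_gates):
    """Total transistors in a cell built from inverters and transmission gates."""
    return (inverters * TRANSISTORS_PER_INVERTER
            + transmission_gates * TRANSISTORS_PER_TG)


flip_flop = transistor_count(inverters=7, transmission_gates=4)
control_latch = transistor_count(inverters=3, transmission_gates=2)
equalizer_extra = 8  # extra transistors of the E-flip-flop, as stated in the text

equalizer_overhead = 100.0 * equalizer_extra / flip_flop
latch_overhead = 100.0 * control_latch / flip_flop

print(f"flip-flop: {flip_flop} transistors, control latch: {control_latch} transistors")
print(f"equalizer overhead: {equalizer_overhead:.1f}%")
print(f"control latch overhead: {latch_overhead:.1f}%")
```

Both ratios are taken relative to the 22-transistor flip-flop, which is how the 36% and 45% figures arise.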

The total energy consumed by a digital circuit in the subthreshold regime can be calculated using

$$E_T = E_{\text{DYN}} + E_L = C_{\text{eff}} V_{\text{DD}}^2 + I_{\text{leak}} V_{\text{DD}} T_D. \tag{1}$$

In (1),  $E_{\text{DYN}}$  and  $E_L$  are the dynamic and leakage energy components, respectively.  $C_{\text{eff}}$  is the total capacitance of the entire circuit,  $V_{\text{DD}}$  is the supply voltage, and  $T_D = 1/f$  is the total delay along the path of the digital logic block. Feedback equalization enables us to reduce the delay of the path in the digital logic block, which in turn reduces the leakage energy. In (1),  $I_{\text{leak}}$  is the leakage current and can be written as

$$I_{\text{leak}} = \mu_0 C_{\text{ox}} \frac{W}{L} (n-1) V_{\text{th}}^2 e^{\frac{\eta V_{\text{DS}} - V_T}{nV_{\text{th}}}}. \tag{2}$$

In (2),  $V_T$  is the transistor threshold voltage,  $V_{\text{th}}$  is the thermal voltage, n is the subthreshold slope factor, and  $\eta$  is the DIBL coefficient. There is an exponential relationship between the leakage current and the supply voltage (due to the DIBL effect and because  $V_{\text{DS}} \approx V_{\text{DD}}$ ). Using the E-flip-flop, we can scale down the supply voltage while maintaining the zero-error rate at a given operating frequency and achieve lower dynamic energy consumption (due to the quadratic relationship between the dynamic energy and the supply voltage) as well as lower leakage energy (due to smaller DIBL effect that exponentially decreases the leakage current). Similar to the area overhead, the dynamic energy as well as the leakage energy overhead of the variable threshold inverter gets amortized across the entire sequential logic block.
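As a minimal numerical sketch of (1) and (2), the following code evaluates the leakage current of a single device and the resulting energy components for two path delays. All parameter values (`mu0_cox`, `w_over_l`, `n`, `eta`, `v_t`, `C_EFF`, and the delays) are hypothetical placeholders rather than UMC 130-nm data, and a real logic block would sum the leakage of many devices.

```python
import math


def leakage_current(mu0_cox, w_over_l, n, v_thermal, eta, v_ds, v_t):
    """Subthreshold leakage current (A) of one device, following (2)."""
    prefactor = mu0_cox * w_over_l * (n - 1.0) * v_thermal ** 2
    return prefactor * math.exp((eta * v_ds - v_t) / (n * v_thermal))


def total_energy(c_eff, v_dd, i_leak, t_d):
    """Return (E_DYN, E_L, E_T) in joules, following (1)."""
    e_dyn = c_eff * v_dd ** 2
    e_leak = i_leak * v_dd * t_d
    return e_dyn, e_leak, e_dyn + e_leak


# Hypothetical device and circuit parameters (assumptions for illustration only).
PARAMS = dict(mu0_cox=300e-6, w_over_l=2.0, n=1.5,
              v_thermal=0.02585, eta=0.1, v_t=0.35)
V_DD = 0.3     # supply voltage (V)
C_EFF = 50e-15 # effective capacitance (F)

i_leak = leakage_current(v_ds=V_DD, **PARAMS)  # V_DS is taken as V_DD, as in the text

# Same circuit with a longer and a shorter path delay T_D.
e_dyn, e_leak_slow, e_total_slow = total_energy(C_EFF, V_DD, i_leak, t_d=100e-9)
_, e_leak_fast, e_total_fast = total_energy(C_EFF, V_DD, i_leak, t_d=80e-9)

print(f"I_leak = {i_leak:.3e} A")
print(f"E_L at 100 ns = {e_leak_slow:.3e} J, at 80 ns = {e_leak_fast:.3e} J")
```

Because $E_L$ is linear in $T_D$, a 20% shorter delay gives a 20% lower leakage energy at a fixed supply voltage, while the exponential term in (2) makes the leakage current itself fall as $V_{\text{DS}}$ is reduced.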

The setup time of the conventional master–slave positive edge-triggered flip-flop is $t_{s-t} = 3t_{inv} + t_{TG}$ [29]. Since the adaptive E-flip-flop uses an extra variable-threshold inverter at its input, the setup time of the adaptive E-flip-flop is larger: $t_{s-t-equ} \approx 4t_{inv} + t_{TG}$ [28]. The clk-to-q delay of the conventional flip-flop is $t_{c-q} = t_{inv} + t_{TG}$. Since the E-flip-flop has the variable threshold inverter as extra load at the output, the $t_{c-q}$ delay of the E-flip-flop is $t_{c-q-equ} = t_{inv} + t_{TG} + \Delta t_{c-q}$, which is larger than the $t_{c-q}$ delay of the conventional flip-flop. Here, $\Delta t_{c-q}$ is the increase in inverter delay due to the extra load of the adaptive feedback equalizer circuit. However, the adaptive feedback equalizer circuit can significantly lower the propagation delay of the critical path because the small input capacitance of the feedback equalizer reduces the switching time of the last gate in the combinational logic. The hold time of the classic master–slave positive edge-triggered flip-flop is zero [29]. Therefore, the adaptive feedback equalizer circuit does not lead to hold time violations. Table I compares the propagation delay, setup time, and the $t_{c-q}$ delay of the two 64-bit adders designed with the conventional flip-flop and the E-flip-flop in the United Microelectronics Corporation (UMC) 130-nm process when operating with different supply voltages in the subthreshold regime.

We analyze the capability of the adaptive feedback equalizer circuit to reduce the transition time of the last gate in the critical path of the subthreshold logic and compare it with the original nonequalized design and the buffer-inserted nonequalized design (Fig. 4). The classic buffer insertion technique [Fig. 4(c)] will reduce the total delay along the critical path of the subthreshold logic. Like the gates in the combinational logic, the buffer used in Fig. 4(c) is upsized to account for the process variation effects based on the design methodology proposed in [2]. Using a minimum-sized inverter instead of an upsized inverter would further lower the delay but has lower reliability with respect to the dominant process variation effects in the subthreshold regime. So, we propose to use a combination of a minimum-sized inverter and a feedback equalizer circuit along the critical path of the subthreshold logic. The minimum-sized inverter reduces the total delay and the feedback equalizer mitigates the effect of process variation. Table II compares the timing characteristics of the original nonequalized logic (NE-logic) design, the buffer-inserted NE-logic, and the E-logic design with one feedback path ON. The adaptive feedback equalizer circuit reduces the propagation delay along the critical path of the digital subthreshold logic while ensuring reliable operation compared with the NE-logic and the buffer-inserted design. Our analysis shows that the classic buffer insertion technique reduces the transition time of the last gate in the critical path of the NE-logic by more than half (from 25.03 to 11.38 ns), and the proposed adaptive feedback equalizer circuit reduces it further to about one quarter of the buffer-inserted value (2.9 ns). The setup time and the clk-to-q delay of the E-flip-flop are larger than those of the conventional flip-flop, but the total delay of the E-logic is smaller than the total delay of the NE-logic.

Fig. 5 shows the timing waveforms of the output carry bit of a 64-bit adder implemented in UMC 130-nm process using NE-logic and E-logic. In the figure, we show the waveforms of the clock signal, the input node of the non-E-flip-flop (NE-flip-flop), the input node of the E-flip-flop, and the flip-flop output for both cases. Compared with the signal at the input node of the NE-flip-flop, the variable threshold circuit enables sharper transitions and decreases the propagation delay of the critical path of the subthreshold logic.

### TABLE I

Comparison between the timing characteristics of the E-logic design and the conventional NE-logic design of a 64-bit adder at different supply voltages in the subthreshold regime. The feedback equalization technique reduces the propagation delay of the 64-bit adder, but the setup time and clk-to-q delay of the E-flip-flop are larger than those of the conventional flip-flop. Here, feedback path 2 is OFF.

| Supply voltage (mV) | Propagation delay, NE-logic (ns) | Propagation delay, E-logic (ns) | $t_{c-q}$, NE-flip-flop (ns) | $t_{c-q-equ}$, E-flip-flop (ns) | $t_{s-t}$, NE-flip-flop (ns) | $t_{s-t-equ}$, E-flip-flop (ns) |
|---------------------|----------------------------------|---------------------------------|------------------------------|---------------------------------|------------------------------|---------------------------------|
| 350                 | 35                               | 27                              | 3.82                         | 4.06                            | 6.07                         | 8.70                            |
| 330                 | 49                               | 38                              | 5.66                         | 6.51                            | 9.01                         | 13.51                           |
| 310                 | 70                               | 58                              | 8.23                         | 10.90                           | 13.11                        | 19.74                           |
| 290                 | 107                              | 80                              | 12.61                        | 16.71                           | 20.09                        | 30.25                           |
| 270                 | 150                              | 117                             | 17.87                        | 23.67                           | 28.49                        | 42.73                           |
| 250                 | 248                              | 210                             | 27.89                        | 36.95                           | 44.46                        | 66.69                           |

Fig. 4. Block diagrams of (a) original nonequalized design, (b) equalized design with one feedback path ON, and (c) buffer-inserted nonequalized design.

### TABLE II

Comparison between the timing characteristics of the original nonequalized design, the equalized design with one feedback path ON, and the buffer-inserted nonequalized design.

| Design methodology       | Transition time (ns) | $t_{c-q}$ (ns) | $t_{s-t}$ (ns) |
|--------------------------|----------------------|----------------|----------------|
| NE-logic                 | 25.03                | 10.7           | 14.1           |
| Buffer-inserted NE-logic | 11.38                | 10.7           | 20.2           |
| E-logic                  | 2.9                  | 11.5           | 21             |

Fig. 5. Comparison between the timing waveforms of the NE-logic design and the E-logic design of a 64-bit adder. Here, the waveforms include the clock signal (A), input node of the conventional flip-flop (B), output node of the conventional flip-flop (C), input node of the E-flip-flop (D), output node of the E-flip-flop (E). Feedback circuit enables sharper transitions in the waveforms of the combinational logic output node helping the E-flip-flop sample the correct data. Here, the feedback path 2 is OFF.

Fig. 6. Maximum feedback strength in adaptive E-flip-flop. The switching threshold of the adaptive E-flip-flop should be larger than the maximum amplitude of the glitch.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

However, it should be noted that the E-flip-flop might sample glitches due to the change in switching threshold. To avoid sampling of the glitch by the E-flip-flop, the positive edge of the clock signal should arrive after the occurrence of the glitch. Moreover, the switching threshold of the adaptive feedback equalizer circuit should still be larger than the amplitude of the glitch. This specifies the maximum allowable feedback strength of the adaptive feedback equalization technique (maximum tolerable glitch amplitude shown in Fig. 6). The sampling of a glitch leads to a marginal increase in the dynamic energy of the sequential logic block (0.72% increase in the 64-bit adder), but it has a negligible impact on the overall energy consumption as it is not the dominant energy component in the subthreshold regime. The feedback equalizer circuit also reduces the pulse width of the glitch (by 41%) (Fig. 5). This decreases the required guardband in the clock period to avoid sampling the glitch (and hence we can reduce the clock period), which ultimately reduces the dominant leakage energy component of the subthreshold logic block by 5.1% in the 64-bit adder at minimum-energy supply voltage.

To avoid the metastability problem in the E-flip-flop, both the setup time and hold time constraints should be satisfied. The setup time and the clk-to-q delay of the adaptive E-flip-flop are larger than those of the classic master–slave positive edge-triggered flip-flop. However, the feedback equalizer circuit can lower the propagation delay of the critical path since it significantly reduces the switching time of the last gate in the combinational logic. Therefore, if we match the clock period for both the NE-logic and the E-logic, then the setup time condition is easily met. In fact, it should be noted that the E-logic enables a reduction in the clock period (Table I).
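The clock-period reduction noted for Table I can be checked directly from the tabulated values. The sketch below assumes the usual lower bound on the clock period, namely the sum of the logic propagation delay, the clk-to-q delay, and the setup time; the dictionary layout and the function name `min_clock_period` are illustrative.

```python
# (propagation delay, clk-to-q delay, setup time) in ns, copied from Table I.
TABLE_I = {
    350: {"NE": (35, 3.82, 6.07), "E": (27, 4.06, 8.70)},
    330: {"NE": (49, 5.66, 9.01), "E": (38, 6.51, 13.51)},
    310: {"NE": (70, 8.23, 13.11), "E": (58, 10.90, 19.74)},
    290: {"NE": (107, 12.61, 20.09), "E": (80, 16.71, 30.25)},
    270: {"NE": (150, 17.87, 28.49), "E": (117, 23.67, 42.73)},
    250: {"NE": (248, 27.89, 44.46), "E": (210, 36.95, 66.69)},
}


def min_clock_period(t_pd, t_cq, t_setup):
    """Lower bound on the clock period (ns): logic delay + clk-to-q + setup."""
    return t_pd + t_cq + t_setup


# Minimum clock period per supply voltage (mV) and design style.
periods = {
    vdd_mv: {style: min_clock_period(*timing) for style, timing in row.items()}
    for vdd_mv, row in TABLE_I.items()
}

for vdd_mv in sorted(periods, reverse=True):
    ne, e = periods[vdd_mv]["NE"], periods[vdd_mv]["E"]
    saving = 100.0 * (ne - e) / ne
    print(f"{vdd_mv} mV: NE-logic {ne:.2f} ns, E-logic {e:.2f} ns ({saving:.1f}% shorter)")
```

For every supply voltage in Table I, the shorter propagation delay of the E-logic outweighs the larger setup time and clk-to-q delay of the E-flip-flop, so the bound on the clock period is lower for the equalized design.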

The hold time constraint of the flip-flop is as follows:

$$t_{\rm hold} < t_{\rm cdFF} + t_{\rm cdlogic} \tag{3}$$

where $t_{\rm cdFF}$ is the minimum propagation delay of the flip-flop and $t_{\rm cdlogic}$ is the minimum propagation delay of logic. The hold time of the E-flip-flop is zero. So, the hold time constraint is also fulfilled, which ensures the stability of the feedback equalizer circuit in the subthreshold regime.

## IV. MODELING OF FEEDBACK EQUALIZER CIRCUITS

In this section, we present detailed AMs for the performance and the energy of adaptive equalizer circuits operating in the subthreshold regime. Using these models, we determine the sizes for feedforward transistors and control transistors in the feedback equalizer circuit that minimize total delay and leakage energy for the equalized subthreshold logic. Without loss of generality, we choose minimum-sized transistors for matching high-to-low and low-to-high propagation delay in the static inverter of the feedback equalizer circuit. As part of the effort, we first develop an analytical methodology to calculate the equivalent channel resistance of active MOSFET devices operating in the subthreshold regime. The proposed model is validated against HSPICE simulations (HSs) using UMC 130-nm process.

The average channel resistance of MOSFET devices in the subthreshold regime can be approximated as

$$R_{\rm eq} = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} R_{\rm ON}(t) \, dt = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \frac{V_{\rm DS}(t)}{I_D(t)} \, dt \tag{4}$$

where $R_{\rm ON}(t)$ is the finite switching resistance, $V_{\rm DS}(t)$ is the drain to source voltage, and $I_D(t)$ is the drain current. For the case of an nMOS transistor discharging a load capacitor from $V_{\rm DD}$ to $V_{\rm DD}/2$ (this is virtually the definition of the propagation delay), we can derive the value of the equivalent resistance using

$$R_{\rm eq} = \frac{1}{V_{\rm DD}/2} \int_{V_{\rm DD}/2}^{V_{\rm DD}} \frac{v}{I_D} \, dv \tag{5}$$

where v is the auxiliary variable that accounts for the change in the  $V_{\text{DS}}$  voltage. The equivalent channel resistance in (5) can be approximated as

$$R_{\rm eq} \approx \frac{1}{I_0 \times V_{\rm DD}/2} \int_{V_{\rm DD}/2}^{V_{\rm DD}} \frac{v}{1 - e^{-v/V_{\rm th}}} \, dv \tag{6}$$

where the constant $I_0 = \mu_0 C_{\text{ox}}(W/L)(n-1)V_{\text{th}}^2 e^{(V_{\text{DD}}-V_T)/(nV_{\text{th}})}$. The equivalent channel resistance in (6) is valid near the minimum energy point where the rise time of the input signal is smaller than the propagation delay of the logic gate in the subthreshold regime. Fig. 7 compares the channel resistance of nMOS devices operating in the subthreshold regime calculated using (6) with HSs for three different channel widths using UMC 130-nm process. The average error between the derived model and the HS is 6.96% in the entire subthreshold regime.

Fig. 7. Comparison between AM and HSs for equivalent channel resistance of MOSFET devices operating in subthreshold regime. The average error between the derived model and the HS results is 6.96% in the entire subthreshold regime.
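The integral in (6) has no elementary closed form, but it is straightforward to evaluate numerically. The sketch below uses composite Simpson integration and reads the exponent of $I_0$ as $(V_{\text{DD}}-V_T)/(nV_{\text{th}})$; the device parameters (`mu0_cox`, `w_over_l`, `n`, `v_t`) are hypothetical placeholders, not fitted UMC 130-nm values.

```python
import math

V_THERMAL = 0.02585  # thermal voltage near room temperature (V)


def subthreshold_i0(mu0_cox, w_over_l, n, v_dd, v_t, v_thermal=V_THERMAL):
    """Constant I_0 of (6), with the gate driven at V_DD."""
    prefactor = mu0_cox * w_over_l * (n - 1.0) * v_thermal ** 2
    return prefactor * math.exp((v_dd - v_t) / (n * v_thermal))


def equivalent_resistance(v_dd, i_0, v_thermal=V_THERMAL, steps=1000):
    """Equivalent channel resistance (ohms) from (6) via composite Simpson's rule."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    lower, upper = v_dd / 2.0, v_dd
    h = (upper - lower) / steps

    def integrand(v):
        return v / (1.0 - math.exp(-v / v_thermal))

    total = integrand(lower) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    integral = total * h / 3.0
    return integral / (i_0 * v_dd / 2.0)


# Hypothetical device parameters (assumptions for illustration only).
V_DD = 0.3
i0_narrow = subthreshold_i0(mu0_cox=300e-6, w_over_l=2.0, n=1.5, v_dd=V_DD, v_t=0.35)
i0_wide = subthreshold_i0(mu0_cox=300e-6, w_over_l=4.0, n=1.5, v_dd=V_DD, v_t=0.35)

r_narrow = equivalent_resistance(V_DD, i0_narrow)
r_wide = equivalent_resistance(V_DD, i0_wide)
r_approx = 0.75 * V_DD / i0_narrow  # limit of (6) when v >> V_th over the whole range

print(f"R_eq (W/L = 2): {r_narrow:.3e} ohm, 3*V_DD/(4*I_0): {r_approx:.3e} ohm")
print(f"R_eq (W/L = 4): {r_wide:.3e} ohm")
```

When $V_{\text{DD}}/2$ is several thermal voltages, the term $e^{-v/V_{\text{th}}}$ is small over the integration range and (6) approaches $3V_{\text{DD}}/(4I_0)$, which gives a quick sanity check on the numerical result. Doubling $W/L$ doubles $I_0$ and therefore halves $R_{\rm eq}$.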

The clock period constraint of a typical sequential digital logic block can be written as

$$t_{\rm clk} > t_{\rm PD} + t_{s-t} + t_{c-q} \tag{7}$$

where $t_{clk}$ is the clock period, $t_{PD}$ is the propagation delay of the logic, $t_{s-t}$ is the setup time, and $t_{c-q}$ is the clk-to-q delay of the flip-flop. In an equalized sequential logic block, the propagation delay of the E-logic can be written as

$$t_{\rm PD-equ} = t'_{\rm PD} + 0.69R_{\rm out} \times C_{\rm in-equ} \tag{8}$$

where  $t'_{PD}$  is the propagation delay of the combinational logic part excluding the final gate,  $R_{out}$  is the output resistance of the final gate in the critical path of NE-logic, and  $C_{in-equ}$  is the input capacitance of the feedback equalizer circuit and can be written as (Fig. 1)

$$C_{\text{in-equ}} = C_{\text{stat-inv-}g} + C_{M1-g} + C_{M2-g}. \tag{9}$$

In (9), $C_{\text{stat-inv}-g}$ is the input capacitance of the static inverter, and $C_{M1-g}$ and $C_{M2-g}$ are the gate capacitances of the feedforward transistors. The setup time of the E-flip-flop can be written as $t_{s-t-\text{equ}} = t_{s-t} + \Delta t_{s-t}$, where $t_{s-t}$ is the setup time of the conventional NE-flip-flop and $\Delta t_{s-t}$ is due to the equalization overhead. $\Delta t_{s-t}$ for a falling transition can be written as

$$\Delta t_{s-t} = 0.69[R_{M1} \times (C_{M1-d} + C_{M3,5-d}) + (R_{\text{stat-inv}} ||(R_{M1} + R_{M3(5)})) \times C_T] \tag{10}$$

where $R_{\text{stat-inv}}$ and $R_{M1}$ are the equivalent resistances of the typical static inverter and the feedforward transistor, respectively. $R_{M3(5)}$ is the equivalent resistance of the control transistor for feedback path 1, or of the control transistors for both feedback paths if the second path is activated. $C_{M1-d}$ is the drain junction capacitance of the feedforward transistor, $C_{M3,5-d}$ is the junction capacitance of the control transistors for feedback paths 1 and 2, and $C_T = (C_{\text{stat-inv}-d} + C_{M3,4,5,6-d} + C_{\text{in}-FF})$ is the total capacitance at the output node of the variable threshold inverter. Here, $C_{\text{stat-inv}-d}$ is the drain junction capacitance of the typical static inverter, $C_{M3,4,5,6-d}$ is the drain junction capacitance of the control transistors for feedback paths 1 and 2, and $C_{\text{in}-FF}$ is the input capacitance of the conventional NE-flip-flop.

Fig. 8. Contour plots for the $\Delta t_{s-t}$ (nanosecond), $\Delta t_{c-q}$ (nanosecond) of the adaptive E-flip-flop, and $t_{PD-equ}$ (nanosecond) of the critical path in the E-logic (64-bit adder). Control path strength and feedforward path strength values are normalized to minimum-sized transistor sizes.

As mentioned in Section III, the clk-to-q delay of the E-flip-flop is $t_{c-q-\text{equ}} = t_{\text{inv}} + \Delta t_{c-q} + t_{\text{TG}}$, where $\Delta t_{c-q}$ is the increase in inverter delay due to the extra load of the adaptive feedback equalizer circuit. The $\Delta t_{c-q}$ in the E-flip-flop can be written as

$$\Delta t_{c-q} = 0.69[R_{\text{out}-\text{FF}} \times (C_{M7,8-d} + C_{M3,4-g}) + (R_{\text{out}-\text{FF}} + R_{M7}) \times C_{M5-g} + R_{\text{out}-\text{FF}} \times C_{M6-g}]. \tag{11}$$

Here, $R_{out-FF}$ is the output resistance of the NE-flip-flop, and $R_{M7}$ and $C_{M7,8-d}$ are the equivalent resistance and drain/source capacitance of the M7 and M8 transistor switches, which enable/disable the control transistors. $C_{M3,4-g}$ and $C_{M5,6-g}$ are the gate capacitances of the control transistors for feedback paths 1 and 2, respectively. The total gate capacitance of the MOSFET in the subthreshold regime is size-dependent and can be written as [30]

$$C_g = WC_{gso} + WC_{gdo} + WLC_{ox}(1 - 1/n) \tag{12}$$

where  $C_{gso}$  and  $C_{gdo}$  are the overlap capacitance per unit length at the source and drain, respectively, and *n* is the subthreshold slope factor. The total source or drain junction capacitance of the MOSFET in the subthreshold regime can be written as

$$C_i = A \cdot C_1 + (W + 2L_D) \cdot C_2 \tag{13}$$

where A represents the source or drain diffusion areas,  $C_1$  represents the capacitance per unit area from the bottom of the source/drain diffusion region pointing into the bulk,  $C_2$  is the capacitance per unit length of the sidewall regions,  $L_D$  is the length of the diffusion regions, and  $W + 2L_D$  represents the perimeter of the side wall.
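The two capacitance expressions (12) and (13) translate directly into code. The sketch below is only illustrative: the function and argument names are hypothetical, and any numerical values a reader supplies for the per-unit capacitances are assumptions that would normally come from the process models.

```python
def gate_capacitance(w, l, c_gso, c_gdo, c_ox, n):
    """Total gate capacitance in the subthreshold regime, as in (12).

    w, l      : channel width and length
    c_gso/gdo : source/drain overlap capacitance coefficients
    c_ox      : oxide capacitance per unit area
    n         : subthreshold slope factor
    """
    return w * c_gso + w * c_gdo + w * l * c_ox * (1.0 - 1.0 / n)


def junction_capacitance(area, w, l_d, c_bottom, c_sidewall):
    """Source or drain junction capacitance, as in (13).

    area       : diffusion area A
    l_d        : length of the diffusion region L_D
    c_bottom   : bottom-plate capacitance per unit area (C_1)
    c_sidewall : sidewall capacitance per unit length (C_2)
    """
    return area * c_bottom + (w + 2.0 * l_d) * c_sidewall


# Width dependence: doubling W doubles every term of (12).
c_min = gate_capacitance(1.0, 1.0, 0.3, 0.3, 1.0, 1.4)
c_2x = gate_capacitance(2.0, 1.0, 0.3, 0.3, 1.0, 1.4)
print(c_2x / c_min)
```

Because every term in (12) scales with $W$, the gate load presented by the feedforward and control transistors grows linearly with their normalized strength, which is what drives the trends discussed next.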

To better understand the timing issues in E-logic, the contour plots for the  $\Delta t_{s-t}$ ,  $\Delta t_{c-q}$  of the adaptive E-flip-flop, and  $t_{PD-equ}$  of the critical path in an equalized 64-bit adder designed in UMC 130-nm process are shown in Fig. 8.



Fig. 9. Comparison between AM contour plots for the total delay (nanosecond) of the critical path in an equalized 64-bit adder with HSs.

The contour plots are for different strengths of the feedforward path and the control path (normalized to a minimum-sized transistor) of the feedback equalizer circuit. For this analysis, we assume that only feedback path 1 is ON. From the delay models described in (10) and (11), we can see that increasing the size of the feedforward and control transistors (i.e., the feedback strength) reduces the $\Delta t_{s-t}$ overhead of the E-flip-flop. However, the increase in the control path strength increases the $\Delta t_{c-q}$ overhead of the E-flip-flop (due to the larger control transistors M3, M4, M5, and M6). The change in the feedforward path strength does not have any impact on the clk-to-q delay (11). Similarly, the increase in the feedforward path strength increases the propagation delay of the logic (due to the larger feedforward transistors M1 and M2) and correspondingly increases the total delay of the critical path. The change in the control path strength does not have any impact on the critical path delay (8).
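The delay expressions (8), (10), and (11) can be written as small functions to make these dependencies explicit. This is a minimal sketch under stated assumptions: the function names are hypothetical, all resistance and capacitance values are placeholders, and the width scaling (capacitance proportional to normalized strength) is a first-order assumption rather than the paper's extracted model.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


def t_pd_equ(t_pd_prime, r_out, c_in_equ):
    """Propagation delay of the E-logic critical path, as in (8)."""
    return t_pd_prime + 0.69 * r_out * c_in_equ


def delta_t_setup(r_m1, r_m35, r_stat_inv, c_m1_d, c_m35_d, c_t):
    """Setup-time overhead for a falling transition, as in (10)."""
    return 0.69 * (r_m1 * (c_m1_d + c_m35_d)
                   + parallel(r_stat_inv, r_m1 + r_m35) * c_t)


def delta_t_clk_to_q(r_out_ff, r_m7, c_m78_d, c_m34_g, c_m5_g, c_m6_g):
    """Clk-to-q overhead of the E-flip-flop, as in (11)."""
    return 0.69 * (r_out_ff * (c_m78_d + c_m34_g)
                   + (r_out_ff + r_m7) * c_m5_g
                   + r_out_ff * c_m6_g)


# Placeholder values (assumptions), chosen only to exercise the formulas.
R_OUT = 5e6             # output resistance of the final gate (ohm)
R_OUT_FF = 5e6          # output resistance of the NE-flip-flop (ohm)
R_M7 = 8e6              # switch resistance (ohm)
C_MIN_GATE = 0.2e-15    # gate capacitance of a minimum-sized device (F)
C_MIN_DRAIN = 0.1e-15   # drain capacitance of a minimum-sized device (F)
C_STAT_INV_G = 0.4e-15  # input capacitance of the static inverter (F)
T_PD_PRIME = 90e-9      # logic delay excluding the final gate (s)


def critical_path_delay(feedforward_strength):
    """(8) with C_in-equ from (9); M1 and M2 gates scale with strength."""
    c_in = C_STAT_INV_G + 2 * feedforward_strength * C_MIN_GATE
    return t_pd_equ(T_PD_PRIME, R_OUT, c_in)


def clk_to_q_overhead(control_strength):
    """(11) with the M3 to M6 gate capacitances scaling with strength."""
    c_gate = control_strength * C_MIN_GATE
    return delta_t_clk_to_q(R_OUT_FF, R_M7, 2 * C_MIN_DRAIN,
                            2 * c_gate, c_gate, c_gate)


for s in (1, 2, 4):
    print(s, critical_path_delay(s), clk_to_q_overhead(s))
```

In this sketch the feedforward strength enters only `critical_path_delay` and the control strength only `clk_to_q_overhead`, mirroring the observation that (8) is independent of the control path and (11) is independent of the feedforward path. `delta_t_setup` is provided for completeness and can be swept in the same way once resistance scaling is specified.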

The contour plots for the total delay calculated from the AMs of the different delay components for an equalized 64-bit adder designed in UMC 130-nm process are shown in Fig. 9. The total delay is plotted for different normalized strengths of feedforward path (M1 and M2) and control path (M3, M4, M5, and M6) of the feedback equalizer circuit. Fig. 9 also shows the total delay values from HSs for various combinations of feedforward and control path strength. We can see that our models match well with HSs. In addition, Fig. 9 shows that choosing the minimum possible size for the feedforward and control transistors will lead to the minimum latency for the E-logic designed in the subthreshold regime.


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS



Fig. 10. Comparison between AM contour plots for the total energy (femtojoule/operation) of the equalized 64-bit adder with HSs.

The total energy consumed in the E-logic circuit can be calculated as

$$E'_T = E_T + E'_{\text{leak}} + E'_{\text{dyn}} \tag{14}$$

where $E_T$ is the energy consumption of the E-logic circuit excluding the feedback equalizer circuit and can be calculated using (1). $E'_{leak}$ is the leakage energy in the feedback equalizer circuit and can be calculated as $I'_{leak}V_{DD}T_{D-equ}$, where $T_{D-equ} = t_{PD-equ} + t_{s-t-equ} + t_{c-q-equ}$ is the total latency of the E-logic. $I'_{leak}$ is the leakage current overhead of the adaptive feedback equalizer circuit and can be calculated as

$$I_{\text{leak}}' = \mu_0 C_{\text{ox}} \frac{\Sigma W_i}{L} (n-1) V_{\text{th}}^2 e^{\frac{\eta V_{\text{DS}} - V_T}{nV_{\text{th}}}} \tag{15}$$

where $\Sigma W_i$ is the sum of the widths of all the transistors in the adaptive feedback equalizer circuit. The dynamic energy of the adaptive equalizer circuit $(E'_{dyn})$ can be calculated as $\Sigma C_{eff}(W_i)V_{DD}^2$, where $\Sigma C_{eff}(W_i)$ is the total parasitic capacitance due to all the transistors of the feedback equalizer circuit. A comparison between the AM contour plots for the total energy of the equalized 64-bit adder in the UMC 130-nm process with HSs is shown in Fig. 10. The leakage energy component is directly proportional to the latency of the subthreshold logic. Therefore, using larger feedforward and control transistors increases the dominant leakage energy component of the digital logic in the subthreshold regime.
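The energy model in (14) and (15) can be sketched as follows. The function names are hypothetical, and the numerical inputs (mobility-capacitance product, DIBL coefficient `eta`, threshold voltage, widths, effective capacitance, and baseline energy) are placeholder assumptions, not values from the paper.

```python
import math


def equalizer_leakage_current(mu0_cox, widths, length, n,
                              v_thermal, eta, v_ds, v_t):
    """Leakage current overhead of the equalizer, as in (15)."""
    return (mu0_cox * (sum(widths) / length) * (n - 1) * v_thermal ** 2
            * math.exp((eta * v_ds - v_t) / (n * v_thermal)))


def equalizer_energy_overhead(i_leak, v_dd, t_d_equ, c_eff_total):
    """Return (E'_leak, E'_dyn) of the feedback equalizer circuit."""
    e_leak = i_leak * v_dd * t_d_equ   # I'_leak * V_DD * T_D-equ
    e_dyn = c_eff_total * v_dd ** 2    # sum of C_eff(W_i) * V_DD^2
    return e_leak, e_dyn


def total_energy(e_t, e_leak, e_dyn):
    """Total energy of the E-logic, as in (14)."""
    return e_t + e_leak + e_dyn


# Placeholder inputs (assumptions).
I_LEAK = equalizer_leakage_current(
    mu0_cox=300e-6, widths=[0.16e-6] * 8, length=0.13e-6, n=1.4,
    v_thermal=0.026, eta=0.1, v_ds=0.30, v_t=0.35)
E_LEAK, E_DYN = equalizer_energy_overhead(
    I_LEAK, v_dd=0.30, t_d_equ=105e-9, c_eff_total=2e-15)
print(total_energy(50e-15, E_LEAK, E_DYN))
```

Since $E'_{leak}$ is the product of $I'_{leak}$ and $T_{D-equ}$, and both grow with transistor width in this model, the sketch reflects why larger feedforward and control transistors raise the leakage overhead.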

## V. EVALUATION

In this section, using a 64-bit adder designed in UMC 130-nm process as a sample circuit, we first explore the use of the feedback equalizer circuit to reduce energy consumption while maintaining reliable operation of the 64-bit adder. This is followed by the evaluation of the postfabrication tunability property of the adaptive equalizer circuit to manage the occurrence of worse than expected process variations in the 64-bit adder circuit after fabrication. In addition, we provide an evaluation of the use of feedback equalizer circuit in the 64-bit adder designed using aggressive technology nodes.

### A. Improvement of Energy Efficiency

We first explore the case where the feedback equalizer circuit reduces the rise/fall time of the last gate, and hence the delay of the critical path of the combinational logic block, leading to a higher operating frequency without any change in supply voltage. In general, the variable threshold inverter can be used to reduce the propagation delay of the critical path at any operating supply voltage. Fig. 11 shows the operating frequency of the 64-bit adder for different subthreshold supply voltages at zero-error rate for the E-logic and NE-logic when only the first feedback path is ON. Here, we determined the optimum sizing for the feedback equalizer circuit that minimizes the propagation delay of the critical path and avoids sampling of glitches to achieve zero-error rate operation at each supply voltage. The sizing of the combinational logic block is the same for both the E-logic and NE-logic and is determined using the design methodology described in [2] (assuming $\sigma_{V_T} = 10$ mV) to address the degraded noise margin levels in the subthreshold regime. The operating frequency of the E-logic is 18.91% (on average) higher than that of the NE-logic over the range of 250-350 mV.

Fig. 11. Operating frequency of the 64-bit adder for zero word error rate as function of different subthreshold supply voltages. The E-logic can run 18.91% (on average) faster than the NE-logic.

Fig. 12. Comparison between the total consumed energy as well as the dynamic/leakage components of the 64-bit adder for different supply voltages. Operating at the respective minimum energy supply voltage, the E-logic consumes 10.85% less total energy compared with the NE-logic.

By reducing the propagation delay of the critical path, the feedback equalizer circuit is capable of reducing the dominant leakage energy consumption of the digital logic in the subthreshold regime. Fig. 12 shows a head-to-head comparison between the total energy, the dynamic energy, and the leakage energy of the 64-bit adder for different supply voltages for the E-logic and NE-logic. By adding the feedback equalizer to the conventional flip-flop, the dynamic energy of the E-logic is 2.69% (on average) larger than that of the NE-logic. This is negligible compared with the 18.5% reduction in the leakage energy (on average) of the design. The feedback circuit drops the minimum energy supply voltage of the E-logic by 10 mV while maintaining the zero-error rate operation. If operated at the respective minimum energy supply voltage, the E-logic consumes 10.85% less total energy compared with the NE-logic and runs 8.04% faster. If both designs are operated at the minimum energy supply voltage of the NE-logic, the E-logic runs 19.1% faster and consumes close to 10% less energy.

Fig. 13. Block diagram of the 32-bit array multiplier.

Fig. 14. Block diagram of the three-tap 16-bit FIR filter.

By decreasing the dominant leakage energy component of the subthreshold logic together with reducing the propagation delay of the critical path, the feedback equalization technique lowers the EDP of the logic designed in the weak inversion region. On average, the E-logic design of the 64-bit adder has a 24.4% smaller EDP compared with the NE-logic design over the range of 250-350 mV for zero word error rate operation. If we compare the EDP at the respective minimum energy supply voltages, the use of the E-flip-flop reduces the EDP of the 64-bit adder by 25.83%. To further evaluate the viability of E-logic, we consider a 32-bit array multiplier and a three-tap 16-bit FIR filter. In general, our methodology will be applicable, with similar improvements, to other types of binary multipliers, such as Wallace tree and Dadda multipliers, and to other digital signal processing blocks. The block diagrams of the 32-bit array multiplier and the three-tap 16-bit FIR filter are shown in Figs. 13 and 14, respectively.

TABLE III Comparison Between the Minimum Energy Point and the Corresponding Operating Frequency of the NE-Logic Versus E-Logic Design of Various Logic Blocks

| Logic block       | NE-logic energy (fJ/cycle) | E-logic energy (fJ/cycle) | NE-logic freq. (MHz) | E-logic freq. (MHz) |
|-------------------|----------------------------|---------------------------|----------------------|---------------------|
| 64-bit Adder      | 57.1                       | 50.9                      | 7.69                 | 9.52                |
| 32-bit Multiplier | 319                        | 298                       | 3.18                 | 3.44                |
| 16-bit FIR filter | 503                        | 470                       | 2.78                 | 3.01                |



Fig. 15. EDP of the scaled-down equalized 64-bit adder for zero word error rate operation. We can achieve reliable operation even when the transistors in the E-logic design are scaled down to as small as  $75\% \times W_{\text{baseline}}$ .

Table III compares the minimum energy point and the corresponding operating frequency of the E-logic design versus the NE-logic design of a 64-bit adder, a 32-bit array multiplier, and a three-tap 16-bit FIR filter, all designed using Cadence Encounter in the UMC 130-nm process. On average, the E-logic design has 18.45% lower EDP than the NE-logic design.
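The average EDP figure can be reproduced from the Table III entries by taking the EDP as energy per cycle divided by operating frequency (fJ/MHz equals fJ·µs). The short Python sketch below does this; the dictionary layout and function names are illustrative choices, not part of the original methodology.

```python
# Table III: (NE energy, E energy) in fJ/cycle, (NE freq., E freq.) in MHz.
TABLE_III = {
    "64-bit adder": (57.1, 50.9, 7.69, 9.52),
    "32-bit multiplier": (319.0, 298.0, 3.18, 3.44),
    "16-bit FIR filter": (503.0, 470.0, 2.78, 3.01),
}


def edp(energy_fj, freq_mhz):
    """Energy-delay product in fJ*us, taking the delay as 1/frequency."""
    return energy_fj / freq_mhz


def edp_reduction(row):
    """Fractional EDP reduction of the E-logic relative to the NE-logic."""
    e_ne, e_eq, f_ne, f_eq = row
    return 1.0 - edp(e_eq, f_eq) / edp(e_ne, f_ne)


REDUCTIONS = {name: edp_reduction(row) for name, row in TABLE_III.items()}
AVERAGE_REDUCTION = sum(REDUCTIONS.values()) / len(REDUCTIONS)

for name, value in REDUCTIONS.items():
    print(f"{name}: {100 * value:.2f}% lower EDP")
print(f"average: {100 * AVERAGE_REDUCTION:.2f}%")
```

With the rounded table entries, the per-block reductions average to roughly the 18.45% quoted above.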

Using the proposed feedback-based technique, the critical sizing approach proposed in [2] for designing the subthreshold logic circuits can be relaxed while ensuring reliable operation in the presence of process variations. Fig. 15 compares the EDP of the scaled-down E-logic and NE-logic of the 64-bit adder in UMC 130 nm for different subthreshold supply voltages, assuming a $3\sigma_{V_T} = 30$ mV systematic variability in threshold voltage. Here, the transistors sized using [2] $(W_{\text{baseline}})$ for the NE-logic can be scaled down to $75\% \times W_{\text{baseline}}$ when using E-logic while ensuring reliable operation (no timing errors) at any given voltage. As a result, the dynamic energy of the E-logic decreases due to the decrease in the transistor parasitic capacitances. For a given supply voltage, all E-logic designs are operated at the same frequency. The E-logic with transistor sizing below 75% of $W_{\text{baseline}}$ cannot operate at this frequency and has timing errors. Table IV summarizes the energy savings of the E-logic with scaled-down transistors compared with the NE-logic and the E-logic. Overall, the feedback equalization along with transistor size scaling consumes up to 19.39% lower total energy compared with the NE-logic in the subthreshold regime.

TABLE IV ENERGY SAVINGS IN SCALED-DOWN E-LOGIC COMPARED WITH BASELINE NE-LOGIC AND E-LOGIC AT THE MINIMUM ENERGY SUPPLY VOLTAGE WITH ZERO WORD ERROR RATE OPERATION

| Scaled-down E-logic size   | Energy saving w.r.t. NE-logic | Energy saving w.r.t. E-logic |
|----------------------------|-------------------------------|------------------------------|
| $95\% \times W_{baseline}$ | 9.63%                         | 4.38%                        |
| $85\% \times W_{baseline}$ | 14.61%                        | 9.75%                        |
| $75\% \times W_{baseline}$ | 19.39%                        | 14.71%                       |



Fig. 16. Delay distribution of the critical path in the 64-bit adder designed in UMC 130-nm process. The  $3 \times \sigma/\mu$  of the NE-logic, the E-logic with two different feedback strengths, and the buffer-inserted NE-logic are 16.1%, 11.4%, 7.14%, and 15% for  $\sigma_{VT} = 10$  mV at the minimum energy supply voltage, respectively. Here, E-logic designs are operating at 300 mV.

### B. Maintaining Robustness Using Postfabrication Tuning

In this section, we explore the use of the adaptive feedback equalizer circuit to mitigate worse than expected process variations. As described earlier, the adaptive feedback equalizer circuit dynamically modifies the switching threshold of the inverter driving the flip-flop and, at the same time, the smaller input capacitance of the feedback equalizer reduces the switching time of the last gate in the combinational logic. This reduces the standard deviation $\sigma$ of the total delay in the critical path. Fig. 16 shows the distribution of total delay of the critical path in the 64-bit adder designed in the UMC 130-nm process for different standard deviation values of threshold voltage. The delay distributions are shown for the NE-logic, for the buffer-inserted NE-logic, for the E-logic when only one feedback path is ON (1-FB), and when both feedback paths are ON (2-FB). The sizing of the combinational logic block is the same for both the E-logic and the NE-logic and is determined using the design methodology described in [2], assuming $\sigma_{V_T} = 10$ mV. Considering $\Delta V_T = 3 \times \sigma_{V_T} = 30$ mV variation in the threshold voltage of the transistors, the normalized delay variation $(3 \times \sigma/\mu)$ of the NE-logic, E-logic (1-FB), E-logic (2-FB), and the buffer-inserted NE-logic are 16.1%, 11.4%, 7.14%, and 15%, respectively, at the minimum energy supply voltage. Both the equalized designs have lower delay and lower total energy than the NE-logic designs. Between the two equalized designs, the E-logic (2-FB) design has lower normalized delay variation due to the extra pull-up/pull-down path in the feedback equalizer circuit. However, it has higher energy consumption due to more parasitics and higher dynamic/leakage energy components. Table V provides a head-to-head comparison of normalized delay variation, energy, and delay of the NE-logic, buffer-inserted NE-logic, E-logic (1-FB), and E-logic (2-FB). In the E-logic design, the control latch consumes 2.43 nW on average.

TABLE V COMPARISON BETWEEN THE TOTAL DELAY, TOTAL ENERGY, AND DELAY VARIATION OF THE DIGITAL LOGIC (64-bit ADDER) AT MINIMUM ENERGY SUPPLY VOLTAGE WHEN THE CONVENTIONAL UPSIZING METHOD [2] HAS BEEN USED TOGETHER WITH THE ADAPTIVE FEEDBACK EQUALIZER CIRCUIT IN THE SUBTHRESHOLD REGIME

| Method                   | $\sigma_{V_T}$ (mV) | $3\sigma/\mu$ (delay) | Energy (fJ/cycle) | Delay (ns) |
|--------------------------|---------------------|-----------------------|-------------------|------------|
| NE-logic Upsized [2]     | 10                  | 16.1%                 | 57.1              | 130        |
| Buffer-inserted NE-logic | 10                  | 15%                   | 53                | 113        |
| E-logic Upsized + 1-FB   | 10                  | 11.4%                 | 50.9              | 105        |
| E-logic Upsized + 2-FB   | 10                  | 7.14%                 | 52.4              | 105        |
| NE-logic Upsized [2]     | 15                  | 20.8%                 | 57.1              | 130        |
| Buffer-inserted NE-logic | 15                  | 19.6%                 | 53                | 113        |
| E-logic Upsized + 1-FB   | 15                  | 16%                   | 50.9              | 105        |
| E-logic Upsized + 2-FB   | 15                  | 11.7%                 | 52.4              | 105        |
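As a consistency check, the Table V delays can be converted into the highest clock frequency allowed by the clock period constraint (7) and into an EDP. The sketch below assumes that the tabulated delay is the total latency that the clock period must cover; the function names are hypothetical.

```python
# Table V rows for sigma_VT = 10 mV: (energy in fJ/cycle, total delay in ns).
TABLE_V = {
    "NE-logic": (57.1, 130.0),
    "Buffer-inserted NE-logic": (53.0, 113.0),
    "E-logic (1-FB)": (50.9, 105.0),
    "E-logic (2-FB)": (52.4, 105.0),
}


def max_frequency_mhz(delay_ns):
    """Clock frequency whose period just covers the total delay."""
    return 1e3 / delay_ns


def edp_fj_us(energy_fj, delay_ns):
    """Energy-delay product in fJ*us."""
    return energy_fj * delay_ns * 1e-3


for name, (energy, delay) in TABLE_V.items():
    print(f"{name}: {max_frequency_mhz(delay):.2f} MHz, "
          f"EDP = {edp_fj_us(energy, delay):.2f} fJ*us")
```

For the NE-logic and the E-logic (1-FB), the resulting frequencies and EDP values agree, to within rounding, with the 64-bit adder entries reported in Tables III and VI.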

In our feedback equalizer circuit, we propose that the second feedback path is switched ON postfabrication if the $\sigma_{V_T}$ variations are worse than expected. The second feedback path compensates for the increase in the variation in logic path delays due to worse than expected $\sigma_{V_T}$ variations and reduces the normalized $3 \times \sigma / \mu$ of the total delay for the E-logic. As an example, suppose we design a 64-bit adder using E-logic assuming $\sigma_{V_T} = 10$ mV. With only one feedback path ON, the design has a $3 \times \sigma/\mu$ of 11.4% for the delay. If, postfabrication, $\sigma_{V_T}$ turns out to be larger and equal to 15 mV, then we can switch ON the second feedback path to achieve a $3 \times \sigma/\mu$ of close to 11.4% for the delay (see Fig. 16 and Table V). This will result in a 2.94% increase in energy. One could argue that we could design the 64-bit adder upfront to achieve a $3 \times \sigma/\mu$ for the delay that is <11.4%, so that even if $\sigma_{V_T}$ is larger than expected, we can still have a $3 \times \sigma/\mu$ close to 11.4%. However, to do this we would need to use larger M3 and M4 transistors (Fig. 1), resulting in higher energy consumption in the baseline 1-FB E-logic design. Thus, we propose that the first feedback path should be designed to achieve a target $3 \times \sigma/\mu$ specification for the delay for an expected $\sigma_{V_T}$. Our proposed feedback equalizer then provides the option of switching ON the second feedback path to achieve the target $3 \times \sigma/\mu$ specification for the delay in case $\sigma_{V_T}$ turns out to be worse than expected. The size of the second feedback path is determined considering the worse than expected process variations.
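The tuning decision described above can be illustrated with a small lookup over the E-logic rows of Table V. This is only a sketch: the selection function, its name, and the 0.5 percentage-point tolerance used to model "close to 11.4%" are assumptions introduced here, not part of the proposed circuit.

```python
# E-logic rows of Table V:
# (sigma_VT in mV, configuration) -> (3*sigma/mu in %, energy in fJ/cycle)
TABLE_V_E_LOGIC = {
    (10, "1-FB"): (11.4, 50.9),
    (10, "2-FB"): (7.14, 52.4),
    (15, "1-FB"): (16.0, 50.9),
    (15, "2-FB"): (11.7, 52.4),
}


def choose_feedback(sigma_vt_mv, target_pct, tolerance_pct=0.5):
    """Pick the lowest-energy configuration that meets the variation target.

    tolerance_pct is a hypothetical allowance for "close to" the target.
    Returns None when no tabulated configuration qualifies.
    """
    candidates = [
        (energy, config)
        for (sigma, config), (variation, energy) in TABLE_V_E_LOGIC.items()
        if sigma == sigma_vt_mv and variation <= target_pct + tolerance_pct
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def energy_increase(config_from, config_to, sigma_vt_mv):
    """Fractional energy change when switching configurations."""
    e_from = TABLE_V_E_LOGIC[(sigma_vt_mv, config_from)][1]
    e_to = TABLE_V_E_LOGIC[(sigma_vt_mv, config_to)][1]
    return (e_to - e_from) / e_from


print(choose_feedback(10, 11.4))   # expected sigma_VT
print(choose_feedback(15, 11.4))   # worse than expected sigma_VT
print(f"{100 * energy_increase('1-FB', '2-FB', 15):.2f}% more energy")
```

With the tabulated values, one feedback path suffices at $\sigma_{V_T} = 10$ mV, whereas at 15 mV only the two-path configuration meets the target, at an energy cost of about 2.9%.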

Table VI compares the normalized delay variation and the EDP of the NE-logic design, buffer-inserted NE-logic, E-logic (1-FB) design, and E-logic (2-FB) design of a 64-bit adder, 32-bit array multiplier, and 16-bit FIR filter, all designed using Cadence Encounter in the UMC 130-nm process. In each case, both the E-logic approaches have lower $3 \times \sigma/\mu$ delay variation than the NE-logic. Between the two E-logic designs, E-logic (2-FB) provides more robustness (smaller $3 \times \sigma/\mu$) but higher energy compared with E-logic (1-FB).

TABLE VI Comparison Between the Normalized Delay Variation and the EDP of the E-Logic Versus NE-Logic and Buffer-Inserted Nonequalized Design of Various Logic Blocks Assuming $\sigma_{V_T} = 10$ mV

| Logic block       | NE-logic $3 \times \sigma/\mu$ (delay) | Buffer-inserted NE-logic $3 \times \sigma/\mu$ (delay) | E-logic (1-FB) $3 \times \sigma/\mu$ (delay) | E-logic (2-FB) $3 \times \sigma/\mu$ (delay) | NE-logic EDP (fJ·µs) | Buffer-inserted NE-logic EDP (fJ·µs) | E-logic (1-FB) EDP (fJ·µs) | E-logic (2-FB) EDP (fJ·µs) |
|-------------------|------|------|------|------|--------|-------|--------|-------|
| 64-bit Adder      | 16.1% | 15%   | 11.4% | 7.14% | 7.42   | 5.98  | 5.34   | 5.5   |
| 32-bit Multiplier | 12.2% | 10.6% | 8.5%  | 6.1%  | 100.16 | 96    | 86.42  | 89    |
| 16-bit FIR filter | 10.1% | 8.2%  | 6.7%  | 4.8%  | 180.57 | 172.4 | 156.04 | 160.6 |

Fig. 17. Delay distribution of the critical path in the 64-bit adder designed in UMC 130-nm process considering supply voltage variation.

### C. Mitigating Voltage/Temperature Variations

In this section, we explore the use of the equalization technique to mitigate the effect of voltage and temperature variations on the performance of digital logic designed in the subthreshold regime. Fig. 17 shows the distribution of total delay of the critical path in the 64-bit adder designed in UMC 130-nm process in case of supply voltage variations. The delay distributions are shown for the NE-logic, for the buffer-inserted NE-logic, for the E-logic when only one feedback path is ON (1-FB), and when both feedback paths are ON (2-FB). Considering  $\Delta V_{dd} = 10$  mV supply voltage variation, the feedback equalization technique reduces the worst case delay of the subthreshold logic by 20.44% compared with the original NE-logic (3.1% smaller  $3 \times \sigma/\mu$ ) and by 8.8% compared with the buffer-inserted NE-logic. Considering  $\Delta V_{\rm dd} = 20$  mV supply voltage variation, the feedback equalization technique reduces the worst case delay of the subthreshold logic by 22.23% compared with the original NE-logic (4.7% smaller  $3 \times \sigma/\mu$ ) and by 9.27% compared with the buffer-inserted NE-logic. Here, there is not much difference between the results from E-logic (1-FB) and E-logic (2-FB).

Fig. 18 shows the distribution of total delay of the critical path in the 64-bit adder designed in the UMC 130-nm process in case of temperature variations. The delay distributions are shown for the NE-logic, for the buffer-inserted NE-logic, for the E-logic when only one feedback path is ON (1-FB), and when both feedback paths are ON (2-FB). Considering $\Delta T = 10$ K temperature variation, the feedback equalization technique reduces the worst case delay of the subthreshold logic by 21.27% compared with the original NE-logic (2.3% smaller $3 \times \sigma/\mu$) and by 7.6% compared with the buffer-inserted NE-logic. Considering $\Delta T = 20$ K temperature variation, the feedback equalization technique reduces the worst case delay of the subthreshold logic by 22.42% compared with the original NE-logic (4.3% smaller $3 \times \sigma/\mu$) and by 9.17% compared with the buffer-inserted NE-logic. Here, there is not much difference between the results from E-logic (1-FB) and E-logic (2-FB).

Fig. 18. Delay distribution of the critical path in the 64-bit adder designed in UMC 130-nm process considering temperature variation.

### D. Effect of Technology Scaling

In this section, we analyze the effect of technology scaling on the performance improvement and the energy reduction that can be obtained using the feedback equalization technique in the subthreshold regime. In scaled technology nodes, the contribution of the leakage energy component increases due to the larger DIBL effect as well as smaller $V_T$ values. By running the subthreshold logic faster, the feedback equalizer can reduce the leakage energy component and in turn decrease the EDP in scaled technology nodes. Table VII compares the minimum energy supply voltage, the contribution of the dynamic/leakage energy components, and the delay of the 64-bit adder designed using the E-logic and the NE-logic at different technology nodes using the Predictive Technology Model [31]. Fig. 19 shows the value of the EDP of the 64-bit adder for the four different technology nodes. Here, we assume the second feedback path is switched OFF. Compared with the NE-logic design, the EDP of the E-logic design is 18.37%, 22.02%, 25.34%, and 28.66% smaller in the 130-, 90-, 65-, and 45-nm technology nodes, respectively.

TABLE VII COMPARISON IN TERMS OF THE MINIMUM ENERGY SUPPLY VOLTAGE, THE CONTRIBUTION OF DYNAMIC/LEAKAGE ENERGY COMPONENTS, AND DELAY OF THE E-LOGIC VERSUS NE-LOGIC AT DIFFERENT TECHNOLOGY NODES

| Technology Node (nm) | Logic Style | Min-Energy Supply Voltage (mV) | Leakage Energy (fJ/cycle) | Dynamic Energy (fJ/cycle) | Delay (ns) |
|----------------------|-------------|--------------------------------|---------------------------|---------------------------|------------|
| 45                   | NE-logic    | 287                            | 11                        | 9                         | 52         |
| 45                   | E-logic     | 287                            | 8.5                       | 10.1                      | 40.1       |
| 65                   | NE-logic    | 290                            | 11.9                      | 12.9                      | 56.1       |
| 65                   | E-logic     | 290                            | 9.5                       | 14.1                      | 43.9       |
| 90                   | NE-logic    | 295                            | 15                        | 18.9                      | 61         |
| 90                   | E-logic     | 295                            | 12.4                      | 20                        | 50         |
| 130                  | NE-logic    | 300                            | 21.1                      | 28.9                      | 69.5       |
| 130                  | E-logic     | 300                            | 17.9                      | 30.1                      | 59.1       |

Fig. 19. EDP of a 64-bit adder designed using E-logic versus NE-logic at zero word error rate at different technology nodes. The E-logic approach reduces the EDP of the subthreshold logic by 23.6% on average across the four technology nodes at the minimum energy supply voltage.
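The scaling trend can be checked against the Table VII entries by forming the EDP as (leakage energy + dynamic energy) × delay for each node. The Python sketch below does so; the data layout and function names are illustrative choices.

```python
# Table VII: node (nm) -> style -> (leakage fJ/cycle, dynamic fJ/cycle, delay ns)
TABLE_VII = {
    45: {"NE": (11.0, 9.0, 52.0), "E": (8.5, 10.1, 40.1)},
    65: {"NE": (11.9, 12.9, 56.1), "E": (9.5, 14.1, 43.9)},
    90: {"NE": (15.0, 18.9, 61.0), "E": (12.4, 20.0, 50.0)},
    130: {"NE": (21.1, 28.9, 69.5), "E": (17.9, 30.1, 59.1)},
}


def edp(leakage_fj, dynamic_fj, delay_ns):
    """Energy-delay product in fJ*ns from the tabulated components."""
    return (leakage_fj + dynamic_fj) * delay_ns


def edp_reduction(node_nm):
    """Fractional EDP reduction of the E-logic at a given node."""
    row = TABLE_VII[node_nm]
    return 1.0 - edp(*row["E"]) / edp(*row["NE"])


REDUCTION = {node: edp_reduction(node) for node in TABLE_VII}
for node in sorted(REDUCTION, reverse=True):
    print(f"{node} nm: {100 * REDUCTION[node]:.2f}% lower EDP")
```

Because the table entries are rounded, the computed reductions match the 130-nm figure quoted in the text and fall within about half a percentage point of the other three, while preserving the trend of larger savings at smaller nodes.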

### E. Comparison With Other Subthreshold Design Techniques

In this section, we compare different techniques proposed in [2], [5], and [13] with our adaptive feedback equalizer circuit to mitigate process variations in digital subthreshold logic circuits. Feedback equalization complements these existing techniques and can be used along with them for subthreshold circuit design. The upsizing design methodology proposed in [2] increases the device parasitics, which in turn increases the dynamic and leakage energy components of the entire digital subthreshold logic block. As discussed in Section V-A, the feedback equalization technique relaxes the critical transistor upsizing method for subthreshold logic design proposed in [2] by 25% while ensuring reliable operation in the presence of process variations. For the 64-bit adder, the proposed feedback equalization technique has 10.8% lower total energy and 8.9% lower delay variation compared with the upsizing methodology proposed in [2]. As proposed in [5], increasing the logic path depth requires inserting additional buffers in the critical path of the subthreshold design to reduce the normalized $(\sigma/\mu)$ delay variation. This increases the parasitics and the dominant leakage energy of the design, with a 33% overhead in the critical path area and a 24% reduction in the normalized variation [5]. As discussed in Section V-B, compared with the buffer-inserted NE-logic, the feedback equalization technique has 8.02% lower EDP and 7.86% lower delay variation. As described in [32], body-biasing necessitates extra complex on-chip circuitry to generate the required voltage for the substrate terminal of the CMOS devices to reduce the dominant leakage energy of the subthreshold logic. In the processor example discussed in [32], body-biasing reduces the normalized variation by 3.1% but results in a 2% area overhead. The proposed adaptive feedback equalization technique reduces the normalized delay variation by 8.9% with a 0.56% area overhead in the entire 64-bit adder. The proposed adaptive feedback equalizer circuit has a simple topology, negligible area and energy overhead, and the capability to reduce the normalized delay variations postfabrication.

# VI. CONCLUSION

We proposed the application of a tunable adaptive feedback equalizer circuit to reduce the normalized variation of the total delay along the critical path and the dominant leakage energy of digital CMOS logic operating in the subthreshold regime. By adjusting the switching thresholds of the gates before the flip-flop based on the gate output in the previous cycle, the adaptive feedback equalizer circuit enables faster switching of the gate outputs and provides the opportunity to reduce the leakage energy of digital logic in the weak inversion region. We implemented a nonequalized and an equalized design of a 64-bit adder in a UMC 130-nm process using static complementary CMOS logic. Using the equalized design, the normalized variation of the total critical path delay can be reduced from 16.1% (nonequalized) to 11.4% (equalized) while reducing the EDP by 25.83% at the minimum energy supply voltage. Moreover, we showed that in the case of worse-than-expected process variation, the tuning capability of the equalizer circuit can be used postfabrication to reduce the normalized variation $(3\sigma/\mu)$ of the critical path delay with a minimal increase in energy. We also presented detailed delay and energy models of the equalized digital logic circuit operating in the subthreshold regime.

#### References

- [1] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, Jan. 2005.
- [2] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65 nm sub-V<sub>t</sub> microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 115–126, Jan. 2009.
- [3] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
- [4] N. Verma, J. Kwong, and A. P. Chandrakasan, "Nanometer MOSFET variation in minimum energy subthreshold circuits," *IEEE Trans. Electron Devices*, vol. 55, no. 1, pp. 163–174, Jan. 2008.
- [5] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2005, pp. 20–25.
- [6] S. H. Choi, B. C. Paul, and K. Roy, "Novel sizing algorithm for yield improvement under process variation in nanometer technology," in *Proc. 41st Design Autom. Conf.*, Jul. 2004, pp. 454–459.
- [7] N. Lotze, M. Ortmanns, and Y. Manoli, "Variability of flip-flop timing at sub-threshold voltages," in *Proc. ACM/IEEE Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2008, pp. 221–224.
- [8] D. Li, P. I.-J. Chuang, D. Nairn, and M. Sachdev, "Design and analysis of metastable-hardened flip-flops in sub-threshold region," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2011, pp. 157–162.
- [9] N. Lotze and Y. Manoli, "A 62 mV 0.13 μm CMOS standard-cell-based design technique using Schmitt-trigger logic," *IEEE J. Solid-State Circuits*, vol. 47, no. 1, pp. 47–60, Jan. 2012.
- [10] Y. Pu, J. P. de Gyvez, H. Corporaal, and Y. Ha, "An ultra-low-energy multi-standard JPEG co-processor in 65 nm CMOS with sub/near threshold supply voltage," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 668–680, Mar. 2010.
- [11] J. Zhou, S. Jayapal, B. Busze, L. Huang, and J. Stuyt, "A 40 nm inverse-narrow-width-effect-aware sub-threshold standard cell library," in *Proc. 48th ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, Jun. 2011, pp. 441–446.
- [12] B. Liu, M. Ashouei, J. Huisken, and J. P. de Gyvez, "Standard cell sizing for subthreshold operation," in *Proc. 49th ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, Jun. 2012, pp. 962–967.
- [13] N. Jayakumar and S. P. Khatri, "A variation-tolerant sub-threshold design approach," in *Proc. 42nd Design Autom. Conf.*, Jun. 2005, pp. 716–719.
- [14] B. Mishra, B. M. Al-Hashimi, and M. Zwolinski, "Variation resilient adaptive controller for subthreshold circuits," in *Proc. Design, Autom., Test Eur. Conf. Exhibit. (DATE)*, Apr. 2009, pp. 142–147.
- [15] G. De Vita and G. Iannaccone, "A voltage regulator for subthreshold logic with low sensitivity to temperature and process variations," in *IEEE Int. Solid-State Circuits Conf., Dig. Tech. Papers (ISSCC)*, Feb. 2007, pp. 530–620.
- [16] T.-T. Liu and J. M. Rabaey, "A 0.25 V 460 nW asynchronous neural signal processor with inherent leakage suppression," in *Proc. Symp. VLSI Circuits (VLSIC)*, 2012, pp. 158–159.
- [17] J. Tschanz et al., "A 45 nm resilient and adaptive microprocessor core for dynamic variation tolerance," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2010, pp. 282–283.
- [18] K. A. Bowman and J. W. Tschanz, "Resilient microprocessor design for improving performance and energy efficiency," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Nov. 2010, pp. 85–88.
- [19] D. Bull, S. Das, K. Shivashankar, G. S. Dasika, K. Flautner, and D. Blaauw, "A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 18–31, Jan. 2011.
- [20] P. N. Whatmough, S. Das, and D. M. Bull, "A low-power 1 GHz razor FIR accelerator with time-borrow tracking pipeline and approximate error correction in 65 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2013, pp. 428–429.
- [21] J. Seo, P. Singh, D. Sylvester, and D. Blaauw, "Self-timed regenerators for high-speed and low-power interconnect," in *Proc. 8th Int. Symp. Quality Electron. Design (ISQED)*, Mar. 2007, pp. 621–626.

- [22] D. Schinkel, E. Mensink, E. A. M. Klumperink, E. van Tuijl, and B. Nauta, "A 3-Gb/s/ch transceiver for 10-mm uninterrupted RC-limited global on-chip interconnects," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 297–306, Jan. 2006.
- [23] S. Kim and M. Seok, "Reconfigurable regenerator-based interconnect design for ultra-dynamic-voltage-scaling systems," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, 2014, pp. 99–104. [Online]. Available: http://doi.acm.org/10.1145/2627369.2627632
- [24] J.-S. Seo, R. Ho, J. Lexau, M. Dayringer, D. Sylvester, and D. Blaauw, "High-bandwidth and low-energy on-chip signaling with adaptive pre-emphasis in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, Feb. 2010, pp. 182–183.
- [25] B. Kim and V. Stojanović, "A 4 Gb/s/ch 356 fJ/b 10 mm equalized on-chip interconnect with nonlinear charge-injecting transmit filter and transimpedance receiver in 90 nm CMOS," in *IEEE Int. Solid-State Circuits Conf.-Dig. Tech. Papers (ISSCC)*, Feb. 2009, pp. 66–67, 67a.
- [26] S. R. Sridhara, G. Balamurugan, and N. R. Shanbhag, "Joint equalization and coding for on-chip bus communication," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 3, pp. 314–318, Mar. 2008.
- [27] Z. Takhirov, B. Nazer, and A. Joshi, "Error mitigation in digital logic using a feedback equalization with Schmitt trigger (FEST) circuit," in *Proc. 13th Int. Symp. Quality Electron. Design (ISQED)*, Mar. 2012, pp. 312–319.
- [28] M. Zangeneh and A. Joshi, "Sub-threshold logic circuit design using feedback equalization," in *Proc. Design, Autom., Test Eur. Conf. Exhibit. (DATE)*, Mar. 2014, pp. 1–6.
- [29] J. M. Rabaey, A. Chandrakasan, and B. Nikolić, *Digital Integrated Circuits: A Design Perspective*. Upper Saddle River, NJ, USA: Pearson Education, 2003.
- [30] R. Sarpeshkar, *Ultra Low Power Bioelectronics: Fundamentals, Biomedical Applications, and Bio-Inspired Systems*. Cambridge, U.K.: Cambridge Univ. Press, 2010.
- [31] (2011). *Predictive Technology Model*. [Online]. Available: http://ptm.asu.edu/
- [32] J. W. Tschanz et al., "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC)*, vol. 1, Feb. 2002, pp. 422–478.



Mahmoud Zangeneh (S'08) received the B.S. degree in electrical engineering from the Amirkabir University of Technology, Tehran, Iran, in 2007, and the M.S. degree in electrical engineering from the University of Tehran, Tehran, in 2010. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA.

His current research interests include low-power subthreshold design techniques, analog and RF integrated circuits and systems design for wireless communications, and the design of nonvolatile resistive random-access memory array architectures.



Ajay Joshi (S'99–M'07) received the B.Eng. degree in computer engineering from the University of Mumbai, Mumbai, India, in 2001, and the M.S. and Ph.D. degrees from the Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA, in 2003 and 2006, respectively.

He was a Post-Doctoral Researcher with the Massachusetts Institute of Technology, Cambridge, MA, USA, from 2006 to 2009. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA. His current research interests include various aspects of VLSI design, including circuits and architectures for communication and computation, and emerging device technologies, including silicon photonics and memristors.

Dr. Joshi was a recipient of the NSF CAREER Award in 2012.