# **Energy-Efficient Pass-Transistor-Logic Using Decision Feedback Equalization**

Zafar Takhirov, Bobak Nazer, and Ajay Joshi Electrical and Computer Engineering Department, Boston University, Boston, MA, USA {zafar, bobak, joshi}@bu.edu

*Abstract*—Decision feedback equalization (DFE) has been used to improve energy efficiency and/or reduce error rates in communication links. We propose a novel circuit technique that applies DFE to pass-transistor logic (PTL)-based computational circuits to mitigate errors and to reduce energy per computation or improve performance. We also present an optimization framework for designing low-energy equalized PTL circuits that meet target performance and error rate specifications. On average, for the same operating frequency and error rate, the equalized PTL design consumes 15% and 30% less energy per operation than PTL and static complementary logic, respectively.

# I. INTRODUCTION

The performance of CMOS-based computing systems is increasingly constrained by limited power budgets [1]. In addition, the aggressive scaling of CMOS devices has increased the probability of faults/defects, which has further exacerbated the situation [2], [3]. To lower power dissipation, several techniques, including supply voltage scaling, sleep transistors, pass-transistor logic (PTL), multiple threshold voltage ($V_{th}$) devices, and dynamic scaling of $V_{th}$, are widely deployed. To mitigate, or detect and correct, the errors caused by faults/defects in CMOS devices, various techniques including redundant latches/paths, slack redistribution, and confidence-driven computation have been proposed.

We take inspiration from communication theory and propose a novel circuit design technique that uses equalization to lower energy consumption, improve performance and manage errors in digital logic circuits. Equalization techniques have been proposed to shape the signals that are transmitted through lossy on-chip communication links [4], [5], [6]. This signal shaping mitigates errors due to any "inter-symbol interference" (ISI), and creates opportunities for lowering the link energy consumption and/or improving the throughput. We propose to apply these equalization principles to shape the signals that are transmitted through digital logic blocks to mitigate errors (resulting from aggressive supply voltage scaling or over-clocking) and at the same time lower energy consumption and/or improve performance.

We propose a novel differential equalized pass-transistor logic (E-PTL) that dynamically adjusts the strength of the currents in its internal paths to ease the output transitions of the logic circuit. This in turn mitigates timing errors and creates opportunities for lowering power dissipation and/or improving performance. Our proposed E-PTL can be readily incorporated into the digital design flow for both low-power custom ASICs and general-purpose processors. The main contributions of our paper are as follows:

- We propose a novel differential E-PTL circuit design technique that enables aggressive voltage scaling to lower energy consumption and/or enables aggressive overclocking to improve performance, while mitigating the occurrence of timing errors by dynamically adjusting the strength of the current in its internal paths.
- We present detailed circuit-level power, error and delay models for the E-PTL circuit. We present the formulation of a convex optimization problem using these models to determine the minimum energy design for a given performance and error constraint. We solve the optimization problem using the CVX toolbox [7], [8] and validate our model-based design against HSPICE simulations for an example 16-bit adder.
- We compare E-PTL, conventional PTL and static complementary CMOS logic (SCL)-based designs of four different arithmetic blocks. On average, our proposed technique reduces energy consumption by 30% compared to SCL and by 15% compared to PTL, while sustaining the circuit throughput and maintaining target error rates. We also evaluate the variability tolerance of our proposed design technique.

The rest of the paper is organized as follows. Section II discusses related work, and Section III describes the E-PTL circuit design technique in detail. Section IV presents the power, error and delay models for E-PTL circuits and the optimization framework that determines the minimum energy point for a given throughput and error rate constraint. Section V provides a head-to-head comparison of the SCL, PTL and E-PTL designs of large digital logic blocks, and Section VI concludes the paper.

# II. RELATED WORK

Traditional circuit-level low-power design techniques, including supply voltage scaling [9], sleep transistors [10], pass-transistor logic (PTL) [11], multiple threshold voltage ($V_{th}$) devices [12] and dynamic scaling of $V_{th}$ [13], have been widely deployed in today's semiconductor systems. To tackle, with minimal power/performance overhead, the errors arising from the ever-increasing unreliability of CMOS devices, various circuit-level mechanisms including redundant latches/paths [14], [15], [16], [17], [18], slack redistribution [19], and confidence-driven computation [20] have been proposed.

Equalization is a well-established technique for decreasing the probability of error in communication links subject to ISI [21]. Recently, several groups have explored equalization for on-chip communication as part of the design of low energy transceiver circuits [4], [5], [6]. The key distinction between this body of work and our own is that we focus on equalization for computation, rather than communication.

We propose to explore the application of these equalization techniques for designing low-energy digital logic circuits. The use of an equalizer and a Schmitt Trigger for error mitigation in digital logic circuits was proposed in [22]. However, that work focused on static complementary CMOS logic, and equalization was applied only to the last gate in the combinational logic path, resulting in a sub-optimal low-power design. We target pass-transistor logic (PTL) because its equivalent RC model closely resembles the RC model of an on-chip interconnect. Hence, we can treat the PTL circuit as a communication channel and apply equalization techniques to reduce energy consumption and/or improve performance while mitigating timing errors.

# III. EQUALIZED PASS-TRANSISTOR LOGIC (E-PTL)

We propose the Equalized Pass-Transistor-Logic (E-PTL) as a low-power alternative to conventional SCL. The choice of PTL for designing equalized digital logic circuits is driven by the fact that the equivalent resistance-capacitance (RC) model of a PTL design closely resembles the RC model of on-chip communication links, which makes it more amenable to the application of equalization techniques.

Figure 1 shows the circuit topology for our proposed E-PTL design technique. The circuit consists of two stages: the PTL network and the sense amplifier with DFE (SA-E). The non-equalized PTL design is the same as the E-PTL design, except that its SA does not have any DFE. The PTL circuit is based on the differential cascode voltage switch logic (DCVSL) family and uses PMOS transistors with their gates connected to logic 0 as pull-up transistors. The PTL network consists of two sub-networks, one each for the complemented and non-complemented implementation of the minimized sum-of-products (SOP) form of the logic function. Each product term (AND) is implemented using pass transistors in series, while the sum (OR) is implemented by connecting the product implementations in parallel. The gate inputs of the NMOS devices in the PTL sub-networks are controlled by the outputs of the sense amplifiers in the previous pipeline stages. The PTL and SA-E stages have their own dedicated supply voltages, $V_{PTL}$ and $V_{SA}$. In the circuit shown in Figure 1, one sub-network of the PTL stage performs the $A_n \oplus B_n \oplus C_n$ operation, while the other sub-network performs the $\overline{A_n \oplus B_n \oplus C_n}$ operation. These sub-networks complete their operation during the positive half of the clock cycle, and their outputs are fed to the differential inputs NC and C of the SA-E. In the negative half of every clock cycle, the DFE in E-PTL dynamically adjusts the strength of the current in each arm of the SA based on the data sampled in the previous clock cycle.
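The SOP structure of the two sub-networks can be illustrated with a small switch-level Python sketch. It assumes idealized pass transistors (a product term is a series chain that conducts only when every literal is true, and a sub-network conducts when any parallel branch does) and uses hand-written literal lists for the two sub-networks of Figure 1; node voltages, sizing and the PMOS pull-ups are ignored, and the names `SUM_TERMS`, `SUM_BAR_TERMS` and `conducts` are illustrative only.

```python
from itertools import product

# Each product term maps an input name to the value it must take for the
# series pass transistor driven by that literal to be ON.
# Hypothetical hand-written literals for the two sub-networks of Fig. 1.
SUM_TERMS = [  # A xor B xor C: true for an odd number of 1s
    {"A": True, "B": False, "C": False},
    {"A": False, "B": True, "C": False},
    {"A": False, "B": False, "C": True},
    {"A": True, "B": True, "C": True},
]
SUM_BAR_TERMS = [  # complement: true for an even number of 1s
    {"A": False, "B": False, "C": False},
    {"A": True, "B": True, "C": False},
    {"A": True, "B": False, "C": True},
    {"A": False, "B": True, "C": True},
]


def conducts(terms, inputs):
    """True if any parallel branch (OR) has all its series transistors (AND) ON."""
    return any(all(inputs[name] == value for name, value in term.items())
               for term in terms)


def differential_output(inputs):
    """Conduction state of the (non-complemented, complemented) sub-networks."""
    return conducts(SUM_TERMS, inputs), conducts(SUM_BAR_TERMS, inputs)


# Enumerate all input combinations of the full-adder sum bit.
for a, b, c in product([False, True], repeat=3):
    on, on_bar = differential_output({"A": a, "B": b, "C": c})
    print(f"A={int(a)} B={int(b)} C={int(c)} -> sum path {int(on)}, "
          f"sum_bar path {int(on_bar)}")
```

For every input combination exactly one of the two sub-networks conducts, which is what presents the SA with a differential input.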

Figure 2 shows the timing waveforms of the non-equalized and equalized designs of our sample circuit, where the expected output bit stream is 1110111. At 1.5 ns the value of $V_C - V_{NC}$ is negative, which corresponds to an expected output of logic 0. Note that the rise/fall times of $V_C - V_{NC}$ are very steep. This is because the tight error rate constraint ($BER \approx 0$ in the worst-case scenario) drives the optimization toward transistors with larger channel widths, which makes the rise/fall times steep in the average case. In the non-equalized case, the current in Arm 2 (i.e., $I_0 + I_C$) is not strong enough to trip the cross-coupled inverters in a timely manner. Here, $I_0$ is the current through each arm when $V_C = V_{NC} = V_{SA}$, and $I_C$ is the current due to the voltage at node C. The current is lower because $V_C - V_{NC}$ has not completely reversed within the allocated half clock period, as the transistors in the PTL switch slowly. As a result, the latch maintains its previous output of logic 1. In the equalized case, the logic 1 output from the previous cycle switches ON transistor M2, which boosts the current in Arm 2 to $I_0 + I_C + I_{FB}^{Q'}$, where $I_{FB}^{Q'}$ is the current through transistor M2. This boost trips the cross-coupled inverters earlier than in the non-equalized case, even though the transistors in the PTL switch equally slowly, and the SR latch is then able to sample the data correctly. The current boost provided by equalization creates opportunities for aggressive voltage scaling to lower energy consumption and/or over-clocking of the circuit to improve performance. Note, however, that transistors M1 and M2 must be sized carefully to avoid under- and over-equalization. Under-equalization can lead to a situation where the feedback current is not sufficient to ensure correct operation. Over-equalization, on the other hand, can provide a larger boost than required, leading to incorrect tripping of the cross-coupled inverters.

We also compared a single-ended E-PTL (i.e., one in which the sense amplifier receives one input from the PTL and a threshold voltage as the other input) with our differential E-PTL. Unlike the differential E-PTL, the single-ended E-PTL required different threshold voltages for different PTL networks, which led to a non-trivial overhead, while using the same threshold voltage for all PTL networks led to a sub-optimal design.

# IV. MODELING AND DESIGN AUTOMATION OF E-PTL

In this section, we present detailed models for the power dissipation, bit error rate, and performance of E-PTL circuits. We also present an automated toolflow that uses these models to generate an energy-efficient design that meets error rate constraints.

## A. Power Modeling

Our E-PTL circuit consists of two stages: the PTL stage and the SA-E stage. The different components of power dissipation in the PTL stage can be calculated as

$$P_{PTL}^{dynamic} = \left( V_{SA}^2 \sum_i C_{g,PTL}^i + V_{PTL}^2 \sum_i C_{d,PTL}^i \right) \cdot f \cdot \alpha \quad (1)$$



Fig. 1: Schematic diagram of an Equalized Pass-Transistor Logic (E-PTL) circuit for $Sum_n = A_n \oplus B_n \oplus C_n$, where $C_n = A_{n-1} \cdot B_{n-1} + A_{n-1} \cdot C_{n-1} + B_{n-1} \cdot C_{n-1}$.



Fig. 2: Timing diagram of a non-equalized (left) and equalized (right) system. The highlighted waveforms in row 4 show the boosted current in the equalized system compared to the non-equalized system.

$$P_{PTL}^{static} = \frac{V_{PTL}^2}{\left(\sum_i R_{on}^i + R\right) \parallel \left(\sum_i R_{off}^i + R\right)} \tag{2}$$

$$P_{PTL}^{leak} = V_{PTL}\, \mu_0 C_{ox} \sum_i \frac{W_i}{L_i} (n-1) V_T^2 \, e^{\frac{-V_{th}}{nV_T}} \tag{3}$$

where $V_{SA}$ is the sense amplifier supply voltage, $V_{PTL}$ is the PTL supply voltage, $V_{th}$ is the threshold voltage of the transistors, $C_{g,PTL}^i$ and $C_{d,PTL}^i$ are the gate and diffusion capacitances of transistor $i$, $R_{on}^i$ is the resistance of transistor $i$ in saturation, $R_{off}^i$ is its resistance in cutoff, $R$ is the resistance of the pull-up transistors M3 and M4 (see Figure 1), $f$ is the operating frequency, $\alpha$ is the activity factor, $\mu_0$ is the carrier mobility, $C_{ox}$ is the gate oxide capacitance per unit area, $n$ is a technology-dependent parameter (the subthreshold slope factor), $V_T$ is the thermal voltage, and $W_i$ and $L_i$ are the width and length of transistor $i$.
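The PTL-stage power model in (1) to (3) can be evaluated directly. The Python sketch below is a minimal, hypothetical transcription: the `PtlStage` container and all numeric values are placeholders rather than 22nm PTM data, `mu0_cox` lumps $\mu_0 C_{ox}$ into one parameter, and (2) is read as $V_{PTL}^2$ divided by the parallel combination of the on-path and off-path resistances.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PtlStage:
    """Hypothetical container for the PTL-stage quantities used in (1)-(3)."""
    v_ptl: float           # PTL supply voltage V_PTL [V]
    v_sa: float            # SA supply voltage V_SA [V]; SA outputs drive the PTL gates
    c_gate: List[float]    # gate capacitances C_g,PTL^i [F]
    c_diff: List[float]    # diffusion capacitances C_d,PTL^i [F]
    r_on: List[float]      # R_on^i along the conducting path [ohm]
    r_off: List[float]     # R_off^i along the non-conducting path [ohm]
    r_pullup: float        # pull-up resistance R of M3/M4 [ohm]
    w_over_l: List[float]  # W_i / L_i of every PTL transistor


def p_dynamic(s: PtlStage, f: float, alpha: float) -> float:
    """Eq. (1): switching power of the gate and diffusion capacitances."""
    return (s.v_sa ** 2 * sum(s.c_gate) + s.v_ptl ** 2 * sum(s.c_diff)) * f * alpha


def p_static(s: PtlStage) -> float:
    """Eq. (2), read (assumption) as V_PTL^2 over the parallel combination
    of the (sum R_on + R) and (sum R_off + R) paths."""
    r1 = sum(s.r_on) + s.r_pullup
    r2 = sum(s.r_off) + s.r_pullup
    return s.v_ptl ** 2 / (r1 * r2 / (r1 + r2))


def p_leak(s: PtlStage, mu0_cox: float, n: float, v_th: float,
           v_t: float = 0.0259) -> float:
    """Eq. (3): subthreshold leakage power. mu0_cox = mu_0 * C_ox [A/V^2];
    v_t is the thermal voltage (about 25.9 mV near 300 K)."""
    i_leak = (mu0_cox * sum(s.w_over_l) * (n - 1) * v_t ** 2
              * math.exp(-v_th / (n * v_t)))
    return s.v_ptl * i_leak


# Placeholder values only (not taken from the paper or from 22nm PTM).
stage = PtlStage(v_ptl=0.5, v_sa=0.8, c_gate=[0.1e-15] * 4, c_diff=[0.05e-15] * 4,
                 r_on=[5e3, 5e3], r_off=[1e9, 1e9], r_pullup=20e3, w_over_l=[2.0] * 8)
total = (p_dynamic(stage, f=2e9, alpha=0.5) + p_static(stage)
         + p_leak(stage, mu0_cox=300e-6, n=1.4, v_th=0.35))
print(f"PTL-stage power: {total * 1e6:.2f} uW")
```

Keeping the three terms as separate functions makes it easy to see which one dominates as $V_{PTL}$ is scaled down.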

The power consumed in SA-E stage can be calculated as

$$P_{SA} = 2\alpha V_{PTL}^2 C_{g,in}^{SA} f + V_{SA} I_{min} \beta + P_{latch} \tag{4}$$

The first term corresponds to the dynamic power consumed in charging/discharging the gate capacitance of the PMOS transistors that receive C or NC as inputs. The second term is the static power consumed in the sense amplifier. Here, $I_{min}$ is the total current passing through the sense amplifier when all transistors are minimum-sized, and $\beta$ is the scaling factor corresponding to sizing up all the transistors (by the same factor) to scale the current. $P_{latch}$ corresponds to the (dynamic and static) power consumed by the latch. We have ignored the power consumed in the wires and clock, as we expect the equalized and non-equalized PTL designs to differ minimally in these two power components.

Fig. 3: Conditional probability density function for the noisy PTL output y = x + z, where x is either a or -a and z is zero-mean Gaussian noise.

## B. Error Rate Modeling

Timing errors in a circuit are caused by inter-symbol interference (ISI) due to variations in circuit RC delay. The change in the delay of a circuit can be due to voltage scaling, process-voltage-temperature variations, negative bias temperature instability, cosmic radiation, noise, etc., which change the RC properties of a system and thus affect the transition times of the various nodes of the circuit. Below, we model the probability of error, first in the absence of ISI and then in its presence.

1) Noise Model: Our model focuses on the impact of noise at the SA stage, where the differential output of the PTL stage is thresholded into a logical output and latched. For now, assume that the observation at the input of the SA stage, sampled once per clock period, can be written as $y_i = x_i + z_i$, where $x_i$ is the output of the PTL stage and $z_i$ is noise. For a logical 1, the differential PTL output is $x_i = a$ and, for a logical 0, it is $x_i = -a$ for some positive value $a$. The noise $z_i$ is assumed to be independent of $x_i$ and Gaussian with mean zero and variance $\sigma^2$ (see Figure 3).

Fig. 4: Model of our proposed E-PTL. R and C represent equivalent parasitics in the PTL. Thresholding and latching are performed using the sense amplifier.

The SA stage simply thresholds its observation: it latches a logical 1 if $y_i \ge 0$ and a 0 if $y_i < 0$. Clearly, the probability of error decreases as the strength $a$ of the differential PTL output increases. An incorrect decision occurs when the noise pushes the PTL output across the decision threshold, which happens with probability $p_{\text{error, no ISI}} = Q(a/\sigma)$, where $Q(v) \triangleq \int_v^\infty \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du$.

2) ISI Model: The error model above ignores the possibility of ISI. That is, it assumes that the previous PTL differential output has completely dissipated by the time the SA thresholds the current output. Here, however, we assume that the supply voltage is scaled to the point that ISI is a significant factor. To quantify the effect of ISI, consider a simple low-pass RC filter (see Figure 4). The voltage across the resistor R is defined by $i(t)R = V_{in}(t) - V_{out}(t)$, where $i(t) = \frac{dQ_c}{dt}$ is the current flowing through the resistor and $Q_c = CV_{out}(t)$ is the charge on the capacitor. Substituting $Q_c$ into $i(t)$, we get

$$V_{in}(t) - V_{out}(t) = RC \frac{dV_{out}(t)}{dt} \tag{5}$$

In discrete time (with a sampling period of  $\Delta t$ ), this becomes

$$V_{in,i} - V_{out,i} = RC \frac{V_{out,i} - V_{out,i-1}}{\Delta t}. \tag{6}$$

Equivalently,

$$V_{out,i} = (1-\omega)V_{in,i} + \omega V_{out,i-1}, \tag{7}$$

where

$$\omega = \frac{RC}{RC + \Delta t}. \tag{8}$$

As the RC delay increases (or the clock period decreases), $\omega$ approaches 1, and the previous PTL output starts affecting the current output. Conversely, if $\Delta t \gg RC$, the effect of ISI approaches zero. Notice that the input-output relationship in (7) has an infinite impulse response. For our purposes, we can safely assume that $RC$ and $\Delta t$ are such that all but the first-order ISI term have a negligible effect; that is, the previous output has settled to the previous input ($V_{out,i-1} \approx V_{in,i-1}$), so that (7) becomes

$$V_{out,i} \approx (1-\omega)V_{in,i} + \omega V_{in,i-1}. \tag{9}$$
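The discretized RC model can be checked numerically. This Python sketch, with hypothetical function names and normalized units, iterates the exact recursion (7) and the first-order approximation (9) on the bit stream of Figure 2 mapped to ±1 levels, assuming $RC = 0.25\,\Delta t$ and that the line was settled at +1 before the stream starts.

```python
def omega(rc, dt):
    """Eq. (8): weight of the previous sample for time constant rc and period dt."""
    return rc / (rc + dt)


def rc_exact(v_in, w, v_prev=0.0):
    """Eq. (7): V_out,i = (1 - w) V_in,i + w V_out,i-1 (infinite impulse response)."""
    out = []
    for v in v_in:
        v_prev = (1 - w) * v + w * v_prev
        out.append(v_prev)
    return out


def rc_first_order(v_in, w, v_in_prev=0.0):
    """Eq. (9): keep only the first ISI term, i.e. use V_in,i-1 for V_out,i-1."""
    out = []
    for v in v_in:
        out.append((1 - w) * v + w * v_in_prev)
        v_in_prev = v
    return out


# Hypothetical example: the 1110111 stream of Fig. 2 as +/-1 levels.
bits = [1, 1, 1, -1, 1, 1, 1]
w = omega(rc=0.25, dt=1.0)  # 0.2
for exact, approx in zip(rc_exact(bits, w, v_prev=1.0),
                         rc_first_order(bits, w, v_in_prev=1.0)):
    print(f"exact {exact:+.3f}   first-order {approx:+.3f}")
```

The gap between the two columns is of order $\omega^2$, so the first-order model is accurate when $RC$ is small relative to $\Delta t$ and degrades as $RC$ approaches the clock period.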

3) Probability of Error: Combining our ISI and noise models, we arrive at the following model of the input to the SA stage

$$y_i = (1 - \omega)x_i + \omega x_{i-1} + z_i, \tag{10}$$

where  $x_i \in \{-a, a\}$  is the differential output of the PTL stage at clock period *i* and  $z_i$  is independent zero-mean Gaussian noise with variance  $\sigma^2$ . The SA thresholds its observation  $y_i$ and latches it. If we make no attempt to mitigate the ISI, it can significantly increase the probability of error. In the worst case, the current and previous PTL outputs have opposite signs. For instance, if  $x_i = a$  and  $x_{i-1} = -a$ , then the SA observation will be

$$y_i = (1 - 2\omega)a + z_i. \tag{11}$$

Thus, the probability of error with ISI can be upper bounded as

$$p_{\text{error, ISI}} \le Q\left(\frac{(1-2\omega)a}{\sigma}\right). \tag{12}$$
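The error probabilities above can be evaluated with the standard identity $Q(v) = \tfrac{1}{2}\operatorname{erfc}(v/\sqrt{2})$. In the Python sketch below, the function names are hypothetical and the operating point ($a = 1$, $\sigma = 0.3$) is a normalized choice for illustration, not a value extracted from the circuit.

```python
import math


def q_function(v):
    """Gaussian tail Q(v) = P(N(0, 1) > v) = erfc(v / sqrt(2)) / 2."""
    return 0.5 * math.erfc(v / math.sqrt(2.0))


def p_error_no_isi(a, sigma):
    """Error probability without ISI: Q(a / sigma)."""
    return q_function(a / sigma)


def p_error_isi_bound(a, sigma, w):
    """Right-hand side of (12): worst case x_i = -x_{i-1} leaves (1 - 2w) a."""
    return q_function((1 - 2 * w) * a / sigma)


# Hypothetical normalized operating point: a = 1, sigma = 0.3.
for w in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"w={w:.1f}  no ISI: {p_error_no_isi(1.0, 0.3):.2e}  "
          f"ISI bound: {p_error_isi_bound(1.0, 0.3, w):.2e}")
```

For $\omega \ge 1/2$ the argument of $Q$ is no longer positive and the bound reaches $1/2$ or more: in the worst case the previous symbol has completely closed the eye.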

The differential PTL output $a$ is a function of the currents $I_C$ and $I_{NC}$:

$$a = |I_C - I_{NC}| = \mu_0 C_{ox} \frac{W}{L} (n-1) V_T^2 \left( e^{\frac{V_{GS}^+ - V_{th}}{nV_T}} - e^{\frac{V_{GS}^- - V_{th}}{nV_T}} \right), \tag{13}$$

where

$$V_{GS}^{+} = V_{PTL} \cdot \frac{\sum_{i} R_{on}^{i}}{\sum_{i} R_{on}^{i} + R}, \qquad V_{GS}^{-} = V_{PTL} \cdot \frac{\sum_{i} R_{off}^{i}}{\sum_{i} R_{off}^{i} + R}.$$

Here the values for  $\{R_{on}, R_{off}, R\}$  are dependent on the corresponding widths and lengths of the transistors in the PTL.

4) Impact of Decision Feedback Equalization: To counter the effects of ISI, our E-PTL architecture employs decision feedback equalization (DFE) prior to SA thresholding. Specifically, the circuit uses its estimate $\hat{x}_{i-1}$ of the previous PTL output $x_{i-1}$ to remove the ISI from the SA observation. This results in the following new observation at the SA:

$$\tilde{y}_i = (1 - \omega)x_i + \omega x_{i-1} + z_i - \omega \hat{x}_{i-1}. \tag{14}$$

If $\hat{x}_{i-1} = x_{i-1}$ (meaning the DFE decision was correct), then the SA observes the current PTL output free of ISI, which significantly decreases the likelihood of an error. However, since $\hat{x}_{i-1}$ is the result of thresholding the previous noisy observation $\tilde{y}_{i-1}$, it may itself be in error. In that case, the DFE subtracts the wrong ISI term, which further diminishes the signal strength and increases the likelihood of an error. The key point is that, *on average*, the error probability decreases. This is well established in the communication theory literature [21], and the same analysis can be carried out for our system model. However, owing to the non-linearity of the thresholding step used to produce $\hat{x}_{i-1}$, it is not possible to write down the error probability in closed form, although it can be accurately characterized using numerical methods.
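Because the DFE error probability has no closed form, a Monte Carlo simulation of (10) and (14) is one simple numerical method. The Python sketch below assumes equiprobable, independent symbols, a correctly decided logical 1 before the first sample, and a hypothetical normalized operating point ($a = 1$, $\sigma = 0.3$, $\omega = 0.4$); `simulate_ber` is an illustrative name, and this is not the characterization used for the results in Section V.

```python
import random


def simulate_ber(n_bits, a, sigma, w, use_dfe, seed=1):
    """Monte Carlo estimate of the bit error rate of the model in (10)/(14).

    Symbols are equiprobable +/-a; the SA decides 1 if its input is >= 0.
    """
    rng = random.Random(seed)
    x_prev, x_hat_prev = a, a  # assume a correctly decided 1 before the start
    errors = 0
    for _ in range(n_bits):
        x = a if rng.random() < 0.5 else -a
        y = (1 - w) * x + w * x_prev + rng.gauss(0.0, sigma)  # eq. (10)
        if use_dfe:
            y -= w * x_hat_prev                                 # eq. (14)
        x_hat = a if y >= 0 else -a
        errors += x_hat != x
        x_prev, x_hat_prev = x, x_hat
    return errors / n_bits


# Hypothetical operating point with strong ISI (w = 0.4).
for dfe in (False, True):
    ber = simulate_ber(100_000, a=1.0, sigma=0.3, w=0.4, use_dfe=dfe)
    print(f"DFE={dfe}: BER ~ {ber:.4f}")
```

At this operating point the DFE run reports a markedly lower error rate than the non-equalized run, even though a wrong decision occasionally propagates into the next sample, which is the on-average behaviour described above.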

## C. Delay Modeling

The overall delay of the E-PTL circuit can be written as the sum of the PTL delay  $\tau_{PTL}$  and the SA delay  $\tau_{SA}$ . The PTL can be modeled as a simple RC network and its delay can be calculated using the Elmore delay technique. The SA delay consists of the delay from the falling edge of the clock until the input of one of the inverters increases above  $V_{th}$  as well as the setup time of the latch. It can be written as

$$\tau_{SA} = \frac{V_{th}^{P}C}{I_0 + I_{C/NC} + I_{FB}^{Q'/Q}} + t_{RS} \tag{15}$$

where $I_{C/NC}$ is the current contributed by the input, $I_{FB}^{Q'/Q}$ is the current contributed by the feedback, and $I_0$ is the default current offset in the modified StrongARM SA. The switching time of the RS latch is $t_{RS}$, and $I_{C/NC} + I_{FB}^{Q'/Q}$ determines how fast the cross-coupled inverters switch. Note that whichever inverter first reaches $V_{th}^P$ at its source terminal determines the outcome of the cycle.
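The overall delay can be sketched as the sum of an Elmore time constant for the pass-transistor chain and (15). In this Python sketch the PTL is assumed to be a simple RC ladder (one resistance and one node capacitance per series transistor), the Elmore value is a time constant rather than a 50% delay, and all numbers and function names are hypothetical.

```python
def elmore_ladder(r, c):
    """Elmore time constant of an RC ladder: node k's capacitance c[k]
    is charged through the upstream resistance r[0] + ... + r[k]."""
    tau, r_upstream = 0.0, 0.0
    for rk, ck in zip(r, c):
        r_upstream += rk
        tau += r_upstream * ck
    return tau


def tau_sa(v_th_p, c, i0, i_in, i_fb, t_rs):
    """Eq. (15): time for the arm current to charge C to V_th^P, plus t_RS."""
    return v_th_p * c / (i0 + i_in + i_fb) + t_rs


# Hypothetical values: a 3-transistor pass chain, then the SA with and without feedback.
t_ptl = elmore_ladder(r=[5e3, 5e3, 5e3], c=[1e-15, 1e-15, 2e-15])
for i_fb in (0.0, 10e-6):
    t = t_ptl + tau_sa(v_th_p=0.3, c=2e-15, i0=10e-6, i_in=5e-6,
                       i_fb=i_fb, t_rs=15e-12)
    print(f"I_FB = {i_fb * 1e6:.0f} uA -> total delay ~ {t * 1e12:.1f} ps")
```

Since $I_{FB}$ only appears in the denominator of (15), any positive feedback current shortens $\tau_{SA}$ while leaving the PTL term unchanged.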



Fig. 5: Model vs. Simulation for 16-bit CLA. (a) The PTL voltage is fixed at its optimal value, while the sizing of the transistors is free for optimization. (b) The sizing of the transistors is fixed at its optimal value, while the PTL voltage is free for optimization. (c) The SA voltage is fixed at its optimal value, while the sizing of the transistors is free for optimization.

## D. Optimization Toolflow

The optimization toolflow consists of several steps. The first step converts a combinational function into a minimized sum-of-products (SOP) form; we used the Quine-McCluskey algorithm to generate the minimized expressions. The second step formulates the problem $\min \|\mathbf{Ax} - \mathbf{b}\|_1$, where the matrix $\mathbf{A}$ represents the equations governing the energy dissipation, critical delays, error rates, etc. of the circuit, the vector $\mathbf{b}$ represents the design goals and circuit constraints, and the vector $\mathbf{x}$ contains the free parameters. We used the CVX optimization toolbox [7], [8] to solve $\min \|\mathbf{Ax} - \mathbf{b}\|_1$ and obtain the optimal transistor parameters $\mathbf{x}$. The CVX results, together with the previously generated minimized SOP, are then fed into a subcircuit netlister, which generates a SPICE netlist for further verification.
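The first toolflow step can be illustrated with a compact two-level minimizer. The Python sketch below generates prime implicants in the Quine-McCluskey manner but then selects essential primes followed by a greedy cover, so it is not guaranteed to be minimal for every function (exact Quine-McCluskey resolves the remaining cover with Petrick's method). It is a hypothetical illustration, not the authors' implementation, and all function names are illustrative.

```python
from itertools import combinations
from typing import Optional


def merge(a: str, b: str) -> Optional[str]:
    """Merge two cubes ('0', '1', '-') that differ in exactly one fixed bit."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n_vars):
    terms = {format(m, f"0{n_vars}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used  # cubes that could not grow are prime
        terms = merged
    return primes


def covers(cube, m, n_vars):
    return all(p in ("-", b) for p, b in zip(cube, format(m, f"0{n_vars}b")))


def minimize(minterms, n_vars):
    """Essential primes first, then a greedy cover of the remaining minterms."""
    primes = sorted(prime_implicants(minterms, n_vars))
    cover = []
    for m in minterms:
        hits = [p for p in primes if covers(p, m, n_vars)]
        if len(hits) == 1 and hits[0] not in cover:
            cover.append(hits[0])
    left = {m for m in minterms if not any(covers(p, m, n_vars) for p in cover)}
    while left:
        best = max(primes, key=lambda p: sum(covers(p, m, n_vars) for m in left))
        cover.append(best)
        left = {m for m in left if not covers(best, m, n_vars)}
    return sorted(cover)


def to_sop(cover, names):
    return " + ".join("*".join(n if bit == "1" else n + "'"
                               for n, bit in zip(names, cube) if bit != "-")
                      for cube in cover)


# Carry C_n of Fig. 1 is the majority of (A, B, C): minterms 3, 5, 6, 7.
print(to_sop(minimize([3, 5, 6, 7], 3), "ABC"))  # B*C + A*C + A*B
```

Each cube in the result corresponds to one series chain of pass transistors, and the list of cubes to the parallel branches of a PTL sub-network.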

## E. Modeling vs. Simulation

To validate our modeling approach, we designed a 16-bit carry-lookahead adder (CLA) via the optimization approach above and compared the energy predicted by our model to that obtained from HSPICE simulations using the 22nm PTM [23] technology model. Figure 5a shows the model-based optimization and simulation results for the 16-bit CLA when $V_{PTL}$ is fixed at its optimal value and the transistor sizes are swept for each value of $V_{SA}$ to determine the minimum energy per operation. This figure shows that the optimal $V_{SA}$ obtained from our model-based optimization matches that obtained by exhaustively searching the parameter space. Similarly, in Figure 5b, we held the transistor sizing fixed and swept $V_{PTL}$ for each value of $V_{SA}$ to determine the minimum energy point. Finally, for Figure 5c, we kept $V_{SA}$ fixed at its optimal value and swept the transistor sizing for each value of $V_{PTL}$ to determine the minimum energy point. Overall, the design parameters obtained using our model-based optimization approach closely match those obtained by sweeping the design space in simulation.

# V. EVALUATION

In this section, we first compare the E-PTL design with the non-equalized PTL design for different target frequencies and error rates using a 16-bit adder example. Then we compare E-PTL, PTL and SCL designs of a 16-bit CLA circuit, an 8-bit multiplier circuit, an 8-bit 3-tap FIR filter circuit, and a 64-bit CRC circuit.

Fig. 6: Energy vs. operating frequency for a 16-bit carry-lookahead adder. All design points have a 1% word error rate.

Fig. 7: Energy dissipation vs. word error rate for a 16-bit carry-lookahead adder. All design points can operate at 2 GHz.

Figure 6 shows the energy versus target operating frequency for both the PTL and E-PTL design approaches for a 16-bit CLA. For each target frequency, we determined the PTL and E-PTL designs that consumed the least energy per operation while achieving a 1% word error rate. These designs were generated using the automated toolflow described in Section IV. On average, the E-PTL design consumes 20% less energy than the PTL design, because E-PTL allows more aggressive voltage scaling. Figure 7 plots the energy per operation versus word error rate for the PTL and E-PTL designs of a 16-bit CLA operating at 2 GHz. As the target word error rate increases, the two curves diverge because the E-PTL design has more flexibility in sizing the transistors in the PTL stage. On average, the E-PTL design consumes 45% less energy than the PTL design over a target word error rate range of 0% to 2%.



Fig. 8: Monte Carlo simulation results for the delay and energy of a 16-bit CLA. PTL and E-PTL 1 are designed to operate at a fixed energy budget. E-PTL 2 is designed to have a $\sim 0\%$ failure rate in the presence of variations at 2 GHz.

Table I compares the energy per operation of the SCL, PTL and E-PTL designs of a 16-bit CLA circuit, an 8-bit multiplier circuit, an 8-bit 3-tap FIR filter circuit, and a 64-bit CRC circuit. The adder and multiplier circuits were designed for 2 GHz, while the FIR filter and CRC circuits were designed for 500 MHz. All circuits were designed to have zero error rate, and the optimization objective was set to minimize the energy-per-bit figure of merit. Compared to the SCL designs, the E-PTL designs have 30% lower energy per operation on average, due to the lower supply voltage in the computational part of the circuit ($V_{PTL} \ll$ nominal voltage). Similarly, compared to the PTL designs, the E-PTL designs have 15% lower energy per operation on average, mainly because the equalization technique enables more aggressive supply voltage scaling.
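The average savings quoted above can be reproduced from Table I. The short Python sketch below assumes that the averages are arithmetic means of the per-block percentage reductions; the averaging method is not stated in the text, and `saving_pct` is an illustrative name.

```python
# Energy from Table I as (SCL, PTL, E-PTL).
# The CRC row is in fJ/bit and the others in fJ/op; only ratios are used.
TABLE_I = {
    "16-bit CLA": (45.1, 29.6, 21.1),
    "8-bit multiplier": (285.1, 219.6, 204.3),
    "8-bit 3-tap FIR": (1750.0, 1590.0, 1360.0),
    "64-bit CRC": (259.2, 237.1, 217.0),
}


def saving_pct(baseline, eptl):
    """Percentage energy reduction of E-PTL relative to a baseline design."""
    return 100.0 * (1.0 - eptl / baseline)


vs_scl = [saving_pct(scl, eptl) for scl, _, eptl in TABLE_I.values()]
vs_ptl = [saving_pct(ptl, eptl) for _, ptl, eptl in TABLE_I.values()]
mean_vs_scl = sum(vs_scl) / len(vs_scl)
mean_vs_ptl = sum(vs_ptl) / len(vs_ptl)

for name, s, p in zip(TABLE_I, vs_scl, vs_ptl):
    print(f"{name:18s} vs SCL {s:5.1f}%   vs PTL {p:5.1f}%")
print(f"{'mean':18s} vs SCL {mean_vs_scl:5.1f}%   vs PTL {mean_vs_ptl:5.1f}%")
```

Under that assumption the means come to about 30.0% versus SCL and 14.7% versus PTL, consistent with the 30% and 15% figures, while the per-block savings range from about 16% (CRC) to 53% (CLA) against SCL and from about 7% (multiplier) to 29% (CLA) against PTL.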

We have also compared the robustness of 16-bit CLAs designed using PTL and E-PTL. A Monte Carlo simulation was performed with 22 nm PTM models and a $\pm 10\%$ variation in supply voltage, channel length and temperature. Figure 8 shows the resulting variation in delay and energy. The PTL design was optimized to have a delay of 500 ps. We considered two different E-PTL designs. The E-PTL 1 design was optimized to have the same energy consumption (29.6 fJ/op) as the optimized PTL design; its mean delay was $\approx 365$ ps. The E-PTL 2 design was optimized such that its worst-case delay under variations was less than 500 ps; its energy consumption was approximately 27.5 fJ/op. Thus, the E-PTL 2 design offers a win-win: it tolerates variations in delay (i.e., it meets the target performance) while consuming 7% less energy than the PTL design.

TABLE I: Comparison of the minimum energy in SCL, PTL and E-PTL designs of various digital logic blocks. The word error rate is set to 0.

| Target frequency | Digital block    | SCL          | PTL          | E-PTL        |
|------------------|------------------|--------------|--------------|--------------|
| 2 GHz            | 16-bit CLA       | 45.1 fJ/op   | 29.6 fJ/op   | 21.1 fJ/op   |
| 2 GHz            | 8-bit Multiplier | 285.1 fJ/op  | 219.6 fJ/op  | 204.3 fJ/op  |
| 500 MHz          | 8-bit 3-tap FIR  | 1750 fJ/op   | 1590 fJ/op   | 1360 fJ/op   |
| 500 MHz          | 64-bit CRC       | 259.2 fJ/bit | 237.1 fJ/bit | 217.0 fJ/bit |

# VI. CONCLUSION

We proposed an equalized pass-transistor logic (E-PTL) design technique for digital CMOS logic. The equalizer in the proposed technique mitigates timing errors occurring due to ISI and noise, and in turn creates opportunities for reducing power and/or improving performance. We have presented detailed circuit-level models for the power, error rate, and delay of an E-PTL circuit. Using these models, we formulate an optimization problem that determines the most energy-efficient design point for a target operating frequency and error rate. Using a 16-bit CLA as a test case, the energy-efficient design generated by our optimization framework was validated against SPICE simulations. As a case study, we compared the SCL, PTL and E-PTL designs of 16-bit CLA, 8-bit multiplier, 8-bit 3-tap FIR filter, and 64-bit CRC circuits. On average, for the same operating frequency and error rate, the E-PTL design consumed 15% and 30% less energy per operation than the PTL and SCL designs, respectively.

# REFERENCES

- [1] S. Fuller and L. Millett, "Computing performance: Game over or next level?" *Computer*, vol. 44, no. 1, pp. 31–38, 2011.
- [2] S. Borkar, T. Karnik, and V. De, "Design and reliability challenges in nanometer technologies," in *Proc. DAC*, 2004, pp. 75–75.
- [3] G. Gielen *et al.*, "Emerging yield and reliability challenges in nanometer CMOS technologies," in *Proc. DATE*, 2008, pp. 1322–1327.
- [4] B. Kim *et al.*, "An energy-efficient equalized transceiver for RC-dominant channels," *IEEE JSSC*, vol. 45, no. 6, pp. 1186–1197, 2010.
- [5] E. Mensink *et al.*, "A 0.28pJ/b 2Gb/s/ch transceiver in 90nm CMOS for 10mm on-chip interconnects," in *Proc. ISSCC*, 2007, pp. 414–612.
- [6] D. Schinkel *et al.*, "A 3-Gb/s/ch transceiver for 10-mm uninterrupted RC-limited global on-chip interconnects," *IEEE JSSC*, vol. 41, no. 1, pp. 297–306, 2006.
- [7] CVX Research, Inc., "CVX: Matlab software for disciplined convex programming, version 2.0 beta," http://cvxr.com/cvx, Sep. 2012.
- [8] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in *Recent Advances in Learning and Control*. Springer-Verlag Limited, 2008, pp. 95–110.
- [9] S. Jain *et al.*, "A 280mV-to-1.2V wide-operating-range IA-32 processor in 32nm CMOS," in *Proc. ISSCC*, San Francisco, CA, Feb. 2012.
- [10] S. R. Vangal *et al.*, "An 80-tile sub-100-W teraFLOPS processor in 65-nm CMOS," *IEEE JSSC*, vol. 43, no. 1, pp. 29 – 41, Jan. 2008.
- [11] L. P. Alarcón *et al.*, "Exploring very low-energy logic: A case study," *Journal of Low Power Electronics*, vol. 3, pp. 223–233, 2007.
- [12] Jotwani *et al.*, "An x86-64 core in 32 nm SOI CMOS," *IEEE JSSC*, vol. 46, no. 1, pp. 162 – 172, Jan. 2011.
- [13] B. Nezamfar, E. Alon, and M. Horowitz, "Energy-performance tunable logic," *IEEE JSSC*, vol. 44, no. 9, pp. 2554 – 2567, Sep. 2009.
- [14] D. Ernst *et al.*, "Razor: Circuit-level correction of timing errors for low-power operation," *Micro, IEEE*, vol. 24, pp. 10–20, Nov.-Dec. 2004.
- [15] S. Das et al., "A self-tuning DVS processor using delay-error detection and correction," *IEEE JSSC*, vol. 41, no. 4, pp. 792 – 804, Apr. 2006.
- [16] —, "RazorII: In situ error detection and correction for PVT and SER tolerance," *IEEE JSSC*, vol. 44, no. 1, pp. 32 – 48, Jan. 2009.
- [17] H. Naeimi and A. DeHon, "Fault-tolerant sub-lithographic design with rollback recovery," *Nanotechnology*, vol. 19, no. 11, p. 115708, 2008.
- [18] K. Bowman *et al.*, "Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance," *IEEE JSSC*, vol. 44, no. 1, pp. 49 – 63, Jan. 2009.
- [19] A. B. Kahng *et al.*, "Slack redistribution for graceful degradation under voltage overscaling," in *Proc. ASP-DAC*, Taipei, Taiwan, Jan. 2010.
- [20] C.-H. Chen *et al.*, "A confidence-driven model for error-resilient computing," in *Proc. DATE*, Grenoble, France, Mar. 2011.
- [21] C. A. Belfiore and J. H. Park Jr., "Decision feedback equalization," *Proceedings of the IEEE*, vol. 67, no. 8, pp. 1143–1156, Aug. 1979.
- [22] Z. Takhirov *et al.*, "Error mitigation in digital logic using a feedback equalization with Schmitt trigger (FEST) circuit," in *Proc. ISQED*, Santa Clara, CA, Mar. 2012.
- [23] ASU. (2011) Predictive technology model. [Online]. Available: http://ptm.asu.edu/