Abstract: A power-gating scheme was previously presented to support multiple power-off modes and reduce the leakage power during short periods of inactivity. However, this scheme can suffer from high sensitivity to process variations, which impedes manufacturability. Recently, a new power-gating technique was proposed that is tolerant to process variations and scalable to more than two intermediate power-off modes. However, this scheme relies on low-threshold-voltage devices, which exhibit increased subthreshold leakage and hence higher standby power consumption. We propose a body biasing technique to reduce this power. The proposed design requires less design effort and offers greater power reduction and smaller area cost than the previous method. In addition, it can be combined with existing techniques to offer further static power reduction benefits. Analysis and extensive simulation results demonstrate the effectiveness of the proposed design.

Keywords: Leakage Power, Multi-Mode VTCMOS Switches, Power Consumption Reduction, Process Variation, Reconfigurable Power-Gating Structure.

### I. INTRODUCTION

As chip density increases relentlessly along Moore’s law, power consumption is emerging as a major burden for contemporary systems [1]. Dynamic energy is proportional to the square of the supply voltage. Thus, a lower voltage level yields a quadratic reduction in the energy consumption. To further reduce the dynamic power, systems-on-chip (SoCs) are partitioned into voltage islands with separate supply rails and unique power characteristics [2]–[4]. Moreover, as devices keep shrinking, the channel length shortens and the gate oxide thickness reduces, increasing the gate-induced drain leakage, the gate oxide tunneling current, and the junction leakage [5]. Many techniques have been presented in the literature for reducing static power. One common approach is to synthesize the circuit using dual-Vₜ libraries [6]. High-Vₜ cells reduce the leakage current at the expense of reduced performance; thus, their use on noncritical circuit domains reduces the leakage power considerably without affecting circuit performance. Another technique exploits the fact that the leakage power consumed by each gate strongly depends on the input vector applied at the gate. Therefore, in order to reduce static power, it controls the input vector and the internal state of the circuit during periods of inactivity. Various other techniques reduce the peak rush current that occurs when a power-gated circuit wakes up.
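The quadratic dependence of dynamic energy on supply voltage can be illustrated with a minimal sketch. The function name `dynamic_energy`, the 1 pF effective capacitance, and the 0.9 V scaled supply are assumptions chosen for illustration; only the 1.1 V nominal supply appears later in this paper.

```python
def dynamic_energy(c_eff_farads: float, vdd_volts: float) -> float:
    """Switching energy per transition, E = C_eff * Vdd^2 (illustrative model)."""
    return c_eff_farads * vdd_volts ** 2


C_EFF = 1e-12          # hypothetical effective switched capacitance (1 pF)
VDD_NOMINAL = 1.1      # nominal supply used in the evaluation section
VDD_SCALED = 0.9       # hypothetical reduced supply

e_nominal = dynamic_energy(C_EFF, VDD_NOMINAL)
e_scaled = dynamic_energy(C_EFF, VDD_SCALED)

# The ratio depends only on the voltages, not on the assumed capacitance.
energy_ratio = e_scaled / e_nominal
print(f"Energy at {VDD_SCALED} V relative to {VDD_NOMINAL} V: {energy_ratio:.3f}")
```

Under these assumptions, an 18% reduction in supply voltage removes roughly a third of the switching energy, which is the motivation for voltage scaling mentioned above.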

A special class of these techniques reduces the large current rush by using one intermediate power-off mode, while other methods apply a three-step wake-up process. Intermediate power-off modes overcome another limitation of power switches, i.e., the time required for recovering from the idle mode, referred to as the wake-up time. A long wake-up time prohibits the use of power switches during short periods of inactivity. In addition, there are applications that can exploit static power savings in parts of the system provided that these parts can wake up fast upon request. The long wake-up time of power switches prohibits their use in such cases too. In particular, this technique requires that the memory elements (flip-flops) be forced to predictable states.

Finally, dedicated design automation tools, which are not commonly available, are needed to support this design style. Increased overhead is also imposed by another previously proposed method, which requires additional power rails and extra bypass switches. A further method requires the intelligent placement of keepers on selected circuit lines. Besides the additional overhead, the keepers cannot be easily placed in nonregular structures. A structure with an intermediate power-off mode has also been proposed, which reduces the wake-up time at the expense of reduced leakage current suppression, and similar structures followed. Later work extended this tradeoff between wake-up overhead and leakage power savings into multiple power-off modes. Using these techniques, instead of consuming power by remaining in the active mode during short periods of inactivity, the circuit is put into an appropriate power-off mode (i.e., a low-power state), which is determined by both the wake-up time and the length of the idle period. The longer the period of inactivity, the higher the power savings achieved by using the most aggressive power-off mode that can be tolerated.
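The mode-selection policy described above can be sketched as follows. This is a minimal illustration, not the controller used in this work: the class `PowerOffMode`, the function `select_mode`, and all wake-up times and leakage figures are hypothetical placeholders. The only property taken from the text is the ordering, in which a more aggressive mode saves more leakage but takes longer to wake up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PowerOffMode:
    name: str
    wakeup_time_ns: float  # time needed to return to the active mode
    leakage_mw: float      # static power drawn while in this mode


# Hypothetical figures, ordered from least to most aggressive.
MODES = [
    PowerOffMode("sleep", wakeup_time_ns=10.0, leakage_mw=4.0),
    PowerOffMode("dream", wakeup_time_ns=50.0, leakage_mw=2.0),
    PowerOffMode("snore", wakeup_time_ns=200.0, leakage_mw=0.5),
]


def select_mode(idle_ns: float, max_wakeup_ns: float) -> Optional[PowerOffMode]:
    """Pick the lowest-leakage mode whose wake-up time is tolerable.

    A mode is usable only if its wake-up time fits both the latency the
    system can tolerate and the idle period itself. Returns None when no
    mode qualifies, meaning the circuit should stay in the active mode.
    """
    usable = [
        m for m in MODES
        if m.wakeup_time_ns <= max_wakeup_ns and m.wakeup_time_ns < idle_ns
    ]
    return min(usable, key=lambda m: m.leakage_mw) if usable else None


choice = select_mode(idle_ns=1000.0, max_wakeup_ns=60.0)
print(choice.name if choice else "stay active")
```

With a long idle period but a 60 ns latency budget, the sketch settles on the dream mode, since the snore mode would wake up too slowly.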

Even though the previously proposed multi-mode architecture is efficient for reducing leakage power during short periods of inactivity, it has several drawbacks that limit its applicability. First, it cannot be easily extended to support more than two intermediate power-off modes, and thus it cannot fully exploit the power reduction potential of the power-gating structure, especially for high-performance circuits. Second, the architecture itself consumes a significant amount of power, which reduces the benefits offered by the power switches.

![Fig.1. Multi-Mode power gating Architectures: a) Snore mode b) Dream mode c) Sleep mode.](image)

Third, this structure is very sensitive to process variations, which can adversely affect its manufacturability and predictability. Finally, it is not easily testable, as it consists of analog components. In this paper we present an effective body biasing architecture that has none of the above drawbacks. The proposed structure requires minimal design effort since it is very simple and has no analog components. It is considerably smaller than the previous architecture and offers greater power savings for similar wake-up times.

The proposed architecture is also more tolerant to process variations; thus, its operation is more predictable. Finally, a reconfigurable version of the architecture is also proposed, which can tolerate even greater process variations, enabling its use in newer technologies. The rest of this paper is organized as follows. Section II presents background material to place the proposed work in an appropriate context. Section III introduces the proposed body biasing architecture, the design method, and the reconfigurable architecture. Section IV presents an evaluation of the proposed architecture, including comparisons with previous work. Section V analyzes the simulated waveforms, and Section VI concludes this paper.

### II. BACKGROUND

Fig. 1 presents the multi-mode power-gating architecture. It consists of the main power switch transistor $M_P$ and two small transistors $M_0$ and $M_1$, each corresponding to an intermediate power-off mode ($M_0$ corresponds to the dream mode and $M_1$ corresponds to the sleep mode). Transistor $M_P$ is a high-$V_t$ transistor and it remains on only during the active mode. Transistors $M_0$ and $M_1$ are small low-$V_t$ transistors that are turned on only during the corresponding power-off mode (i.e., $M_0$ is turned on during the dream mode and $M_1$ is turned on during the sleep mode). In the proposed system, the VTCMOS technique varies the threshold voltage of the low-threshold devices by applying a variable substrate bias voltage from a control circuit.

- Low-threshold-voltage devices exhibit increased subthreshold leakage and hence higher standby power consumption.
- One way to reduce power is to use a low supply voltage together with a low threshold voltage, which avoids losing speed performance.
- It provides a power reduction of only 10%. It has major advantages.

### III. BODY BIASING TECHNIQUE

#### A. Proposed Architecture

Fig. 2 presents the proposed design. It consists of the main power switch transistor $M_P$ and two small transistors $M_0$ and $M_1$, each corresponding to an intermediate power-off mode ($M_0$ corresponds to the dream mode and $M_1$ corresponds to the sleep mode). Transistor $M_P$ is a high-$V_t$ transistor and it remains on only during the active mode. Transistors $M_0$ and $M_1$ are small low-$V_t$ transistors that are turned on only during the corresponding power-off mode (i.e., $M_0$ is turned on during the dream mode and $M_1$ is turned on during the sleep mode). The various modes of operation are as follows.

**Active Mode:** Transistors $M_P$, $M_0$, $M_1$ are on.

**Snore Mode:** Transistors $M_P$, $M_0$, and $M_1$ are off as shown in Fig. 2(a). In this case, the leakage current of the core, $I_{\text{core}}$, is equal to the aggregate leakage current flowing through transistors $M_0$, $M_1$, and $M_P$ ($I_{\text{core}} = I_{L_0} + I_{L_1} + I_{L_P}$), which is very small (note that $M_0$ and $M_1$ are small transistors and $M_P$ is a high-$V_t$ transistor). Thus the voltage level at $V_{L_{\text{GND}}}$ is close to $V_{dd}$ and the circuit consumes a negligible amount of energy, but the wake-up time is high.

**Dream Mode:** Transistor $M_0$ is on and transistors $M_P$ and $M_1$ are off as shown in Fig. 2(b). In this case the current flowing through transistor $M_0$ (and thus the aggregate current flowing through $M_0$, $M_1$ and $M_P$) increases because $M_0$ is on ($I_{L_0} < I_{L_P}$). The exact value of $I_{L_0}$ depends on the size of transistor $M_0$, and it sets the virtual ground node at a voltage level which is lower than $V_{dd}$ (i.e., $V_{L_{\text{GND}}} < V_{dd}$). Thus the static power consumed by the core is higher compared to the snore mode, but the wake-up time is less.

**Sleep Mode:** Transistor \( M_1 \) is on, and \( M_P \), \( M_0 \) are off as shown in Fig. 2(c). Provided that transistor \( M_1 \) has a larger aspect ratio than \( M_0 \) (\( W_{M_1}/L_{M_1} > W_{M_0}/L_{M_0} \)), the aggregate current flowing through \( M_0 \), \( M_1 \), and \( M_P \) increases even more when \( M_1 \) is on (note that \( I_{M_1} > I_{M_0} \)). Consequently, the voltage level at the virtual ground node is further reduced compared to the dream mode, and thus the wake-up time decreases at the expense of increased power consumption.
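The four operating modes described above amount to a small truth table over the three footer transistors. The sketch below encodes that table; the names `Mode`, `SWITCH_STATES`, and `conducting` are illustrative assumptions and do not correspond to any signal names in the actual design.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    ACTIVE = "active"
    SNORE = "snore"
    DREAM = "dream"
    SLEEP = "sleep"


# True means the transistor is switched on in that mode,
# following the mode descriptions in the text.
SWITCH_STATES: Dict[Mode, Dict[str, bool]] = {
    Mode.ACTIVE: {"M_P": True, "M_0": True, "M_1": True},
    Mode.SNORE: {"M_P": False, "M_0": False, "M_1": False},
    Mode.DREAM: {"M_P": False, "M_0": True, "M_1": False},
    Mode.SLEEP: {"M_P": False, "M_0": False, "M_1": True},
}


def conducting(mode: Mode) -> List[str]:
    """Return the names of the transistors that are on in the given mode."""
    return [name for name, is_on in SWITCH_STATES[mode].items() if is_on]


for mode in Mode:
    print(f"{mode.value:>6}: on = {conducting(mode) or 'none'}")
```

Exactly one small transistor conducts in each intermediate mode, which is why the virtual ground level in those modes is set by the size of that single device.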

![Fig.2. Proposed architecture: (a) Snore mode (b) Dream mode (c) Sleep mode.](image)

#### B. Design Method

Body biasing has been demonstrated to be effective in addressing process variability in a variety of simple chip designs. However, for modern microprocessor ICs with multiple cores and dynamic voltage/frequency scaling (DVFS), the use of body biasing has significant implications. For a 16-core chip-multiprocessor implemented in a high-performance 22 nm technology, the body biases required to meet the frequency target at the lowest and highest voltage/frequency levels differ by an average of 0.7 V, implying that per-level biases are required to fully leverage body biasing. The need to make abrupt changes in the body biases when the voltage/frequency level changes affects the cost/benefit analysis of body biasing schemes. It has been demonstrated that computing unique body biases for each voltage/frequency level at chip power-on offers the best tradeoff among a variety of methods in terms of area, performance, and power. While continuously adjusting the body biases during operation offers improvements in energy efficiency, these benefits were outweighed by the implementation costs.

The implementation costs of continuously adjusting the body biases are dominated by the settling time of the controller. Existing controllers designed for simple general-purpose microprocessors do not optimize for settling time, and require D/A converters with high time constants. We propose a fully analog controller that is able to achieve significantly lower settling time for a fixed area and power than previous controllers. With the proposed controller, continuously computing the body biases offers a better tradeoff in terms of area, performance, and power than computing unique body biases for each voltage/frequency level at chip power-on. Further improvements in energy efficiency can be achieved with an integrated approach to body biasing and DVFS. Because \( V_{DD} \) scaling and body biasing have different effects on static versus dynamic power, the operating point yielding the lowest overall power depends on the percentage of total power due to leakage. Leakage power, in turn, is strongly influenced by process variations.

#### C. Body Biasing

Body biasing is another method of improving energy efficiency, by reclaiming performance lost to margins due to variations. After fabrication, the threshold voltage \( (V_{TH}) \) of transistors can be modulated by changing the body-to-source voltage \( V_{BS} \). In bulk MOSFETs, \( V_{TH} \) is given by:

\[
V_{TH} = V_{TH0} + \gamma \left( \sqrt{2\Phi_F - V_{BS}} - \sqrt{2\Phi_F} \right)
\]

where \( V_{TH0} \) is the device threshold voltage with no body bias applied, \( 2\Phi_F \) is the surface potential at strong inversion, and \( \gamma \) is the body effect coefficient. For simplicity, we examine this equation for the case of an NFET with the source tied to ground. If a negative voltage is applied to the body, then the depletion width increases, which means that a higher gate voltage is required to form an inversion layer and thus \( V_{TH} \) increases; this is known as a reverse body bias (RBB).

Similarly, if a positive voltage is applied to the body while the source is grounded, then the depletion width decreases, and thus \( V_{TH} \) decreases; this is known as a forward body bias (FBB). Throughout this work, \( V_{BSS} \) and \( V_{BSP} \) will represent the body-to-source voltage of NFETs and PFETs, respectively. Negative values of these parameters indicate RBB and positive values indicate FBB, regardless of which direction the body-to-source voltage must actually be shifted. There are several technology issues with body biasing in bulk MOS. RBB increases short-channel effects, which increases variability within devices sharing a bias. This is especially problematic in circuits that are sensitive to device matching, such as SRAMs. FBB improves short-channel effects, but also increases junction leakage, potentially to the point where the source-to-bulk junction is forward biased. Additionally, an analog signal, the body bias, must be distributed over a significant distance, in the extreme case across the entire die. This becomes increasingly problematic with scaling because cross-talk between wires worsens. Finally, the sensitivity of \( V_{TH} \) to the body bias decreases with scaling, because the channel doping increases. Body biasing is limited in the magnitude of the \( V_{TH} \) shift that can be induced.
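The threshold-voltage expression given earlier in this subsection can be evaluated numerically to see the direction of the shift under reverse and forward bias. The sketch below is only illustrative: the function name `threshold_voltage` and the parameter values (`V_TH0`, `GAMMA`, `PHI_F`) are assumed placeholders, not values extracted from the 45-nm technology used in this work.

```python
import math


def threshold_voltage(v_th0: float, gamma: float, phi_f: float, v_bs: float) -> float:
    """V_TH = V_TH0 + gamma * (sqrt(2*phi_f - v_bs) - sqrt(2*phi_f)).

    v_bs is the body-to-source voltage of an NFET: negative for reverse
    body bias (RBB), positive for forward body bias (FBB).
    """
    radicand = 2.0 * phi_f - v_bs
    if radicand < 0.0:
        raise ValueError("forward bias exceeds 2*phi_f; the model no longer applies")
    return v_th0 + gamma * (math.sqrt(radicand) - math.sqrt(2.0 * phi_f))


# Hypothetical device parameters, for illustration only.
V_TH0 = 0.40   # zero-bias threshold voltage (V)
GAMMA = 0.20   # body effect coefficient (V^0.5)
PHI_F = 0.40   # Fermi potential (V)

v_rbb = threshold_voltage(V_TH0, GAMMA, PHI_F, v_bs=-0.5)   # reverse bias
v_zero = threshold_voltage(V_TH0, GAMMA, PHI_F, v_bs=0.0)   # no bias
v_fbb = threshold_voltage(V_TH0, GAMMA, PHI_F, v_bs=0.5)    # forward bias

print(f"RBB: {v_rbb:.3f} V, zero bias: {v_zero:.3f} V, FBB: {v_fbb:.3f} V")
```

Because of the square-root dependence, the same bias magnitude shifts the threshold more under forward bias than under reverse bias in this model.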

The maximum forward bias is limited by current flow across the P-N junction formed between the n-well and p-well. A thyristor-like device is formed in the substrate by the two bipolar transistors, as shown in Fig. 3. Oowaki et al. found that there was no latch-up effect with up to 0.5 V forward bias (the limit assumed by Miyazaki et al., Tachibana et al., and Narendra et al.). The maximum reverse bias is limited by high leakage and possible breakdown across the reverse-biased drain-body junction, particularly during burn-in. The sensitivity of the threshold voltage to the body bias for NFETs and PFETs has been evaluated for the 90 nm, 45 nm, and 22 nm predictive technologies. While the sensitivity of $V_{TH}$ to the body biases does decrease as technology scales, the decrease from 90 nm to 22 nm (4 technology generations) is only 12% for the NFET and 10% for the PFET.

![Fig. 3. Leakage path in forward body biasing.](image)

### IV. EVALUATION AND COMPARISONS

In this section, we present simulation results and comparisons against other techniques presented in the literature.

#### A. Results and Comparisons Using a Large Logic Core

The target of this first subsection is to evaluate the proposed method when it is applied to large logic cores that are comparable in size to real designs from industry. To this end, we present simulation results on a large logic core consisting of 9 million transistors. This core consists of multiple inverters of various sizes which are driven by various input vectors. Even though it is not a real circuit, it is representative of a realistic industrial circuit in terms of static power consumption during dc operation in power-off mode. We used the 45-nm predictive technology with a 1.1-V power supply. The leakage power consumption of the core in idle mode with no power gating is equal to 10 mW. All simulations were done using the Synopsys HSPICE simulation engine. We note that, because we use a different core as well as different experimental parameters (such as the technology, the voltage setting, and the input vector) from the previous work, we cannot directly compare the experimental results of our method with the previously published results. Therefore, we implemented both the previous architecture [see Fig. 1(c)] and the proposed architecture (see Fig. 2) for the aforementioned logic core. As suggested, the width of the main power switch (the transistor denoted as $M_P$) was set equal to 12% of the total width of the nMOS transistors in the logic core.

For both architectures, we assume that the voltage at the $V_{GND}$ node settles to the expected value before the wake-up process begins. In addition, the core is considered fully operational after the virtual ground node is discharged to 1% of $V_{dd}$. First, we compare both architectures in terms of area overhead measured as transistor sizes. The widths of transistors $M_0$ and $M_1$ have been selected in such a way as to provide the same voltage level at the virtual ground node as the previously proposed scheme at each power-off mode and for the same input vector. Thus, the logic core consumes the same amount of static power in both architectures at each power-off mode. For example, considering an input vector that drives two-thirds of the transistors to logic “1” and the rest of the transistors to logic “0,” the voltage level at the $V_{GND}$ node is equal to 217.1, 415.8, 541.8, and 668.5 mV at four intermediate power-off modes.
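To make the role of these virtual ground levels concrete, the following sketch computes the voltage remaining across the core and the swing that must be discharged on wake-up for each quoted level. It assumes a footer-gated core, so the core sees $V_{dd} - V_{GND}$; the helper names are illustrative, while the 1.1-V supply, the 1% wake-up criterion, and the four $V_{GND}$ levels are taken from the text above.

```python
VDD_MV = 1100.0                                   # 1.1-V supply
VGND_LEVELS_MV = [217.1, 415.8, 541.8, 668.5]     # virtual ground levels from the text
WAKEUP_FRACTION = 0.01                            # core is operational at 1% of Vdd


def core_voltage_mv(vgnd_mv: float, vdd_mv: float = VDD_MV) -> float:
    """Voltage across a footer-gated core (assumed model): Vdd - V_GND."""
    return vdd_mv - vgnd_mv


wakeup_threshold_mv = WAKEUP_FRACTION * VDD_MV

# Voltage left across the core in each intermediate mode.
core_voltages_mv = [round(core_voltage_mv(v), 1) for v in VGND_LEVELS_MV]

# Voltage that the virtual ground node must shed before the core is usable.
discharge_swings_mv = [round(v - wakeup_threshold_mv, 1) for v in VGND_LEVELS_MV]

for vgnd, vcore, swing in zip(VGND_LEVELS_MV, core_voltages_mv, discharge_swings_mv):
    print(f"V_GND = {vgnd:6.1f} mV -> core sees {vcore:6.1f} mV, "
          f"wake-up must discharge {swing:6.1f} mV")
```

A higher virtual ground level leaves less voltage across the core, which lowers leakage, but it also leaves a larger swing to discharge, which is the source of the longer wake-up time.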

Note that the previously proposed schemes support only one intermediate power-off mode, which is denoted as Dream for comparison purposes. Entries in Table II corresponding to the second intermediate power-off mode (i.e., the Sleep mode), which is not applicable to those schemes, are

**TABLE I: Static Power Dissipation**

<table>
<thead>
<tr>
<th>Device</th>
<th>Mode of operation</th>
<th>Power dissipation (W)</th>
<th>Power dissipation with variable threshold (W)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Inv1</td>
<td>Normal inverter</td>
<td>6.83E-03</td>
<td>3.29E-03</td>
</tr>
<tr>
<td>Inv2</td>
<td>With W footer</td>
<td>7.83E-03</td>
<td>1.78E-03</td>
</tr>
<tr>
<td>Inv3</td>
<td>With transmission gate</td>
<td>8.26E-03</td>
<td>2.43E-03</td>
</tr>
<tr>
<td>Inv4</td>
<td>With bias network</td>
<td>1.91E-02</td>
<td>6.59E-03</td>
</tr>
<tr>
<td>Inv5</td>
<td>Sleep mode</td>
<td>2.50E-02</td>
<td>1.02E-02</td>
</tr>
<tr>
<td>Inv6</td>
<td>Dream mode</td>
<td>3.35E-01</td>
<td>2.67E-05</td>
</tr>
<tr>
<td>Inv7</td>
<td>Sleep mode</td>
<td>3.35E-01</td>
<td>2.82E-05</td>
</tr>
<tr>
<td>Inv8</td>
<td>Active mode</td>
<td>1.77E-04</td>
<td>5.15E-05</td>
</tr>
</tbody>
</table>
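The relative savings implied by Table I can be computed directly from its two power columns. The sketch below assumes that the third column is the dissipation without the variable-threshold technique and the fourth column is the dissipation with it; that reading of the headers, and the names `TABLE_I` and `reduction_percent`, are assumptions made for illustration.

```python
# (device, mode, power without variable threshold in W, power with it in W)
TABLE_I = [
    ("Inv1", "Normal inverter", 6.83e-03, 3.29e-03),
    ("Inv2", "With W footer", 7.83e-03, 1.78e-03),
    ("Inv3", "With transmission gate", 8.26e-03, 2.43e-03),
    ("Inv4", "With bias network", 1.91e-02, 6.59e-03),
    ("Inv5", "Sleep mode", 2.50e-02, 1.02e-02),
    ("Inv6", "Dream mode", 3.35e-01, 2.67e-05),
    ("Inv7", "Sleep mode", 3.35e-01, 2.82e-05),
    ("Inv8", "Active mode", 1.77e-04, 5.15e-05),
]


def reduction_percent(baseline_w: float, reduced_w: float) -> float:
    """Percentage by which reduced_w is lower than baseline_w."""
    return 100.0 * (baseline_w - reduced_w) / baseline_w


reductions = {
    device: reduction_percent(baseline, reduced)
    for device, _mode, baseline, reduced in TABLE_I
}

for device, mode, _baseline, _reduced in TABLE_I:
    print(f"{device} ({mode}): {reductions[device]:.2f}% lower")
```

Under that reading, every row shows a reduction of more than half, with the Inv6 and Inv7 rows showing the largest relative drop.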

With the width of the main power switch set equal to 12% of the total width of the nMOS transistors in the logic core, the width/length ratio of transistor $M_P$ is calculated as equal to $43.2 \times 10^{6}$ nm/45 nm, and it is implemented as the parallel connection of a number of smaller transistors. In order to provide a fair comparison, the transistor sizes in both architectures were selected in such a way as (a) to be of the minimum size required and (b) to provide similar wake-up times. In the dream and sleep modes the power dissipation is the same, as only one transistor is on in the network apart from the core logic.
Static Power Reduction using Variation Tolerant and Body Biased Multi-Mode Switches

The last column presents the results for the proposed method. It is evident that the previously proposed methods fail to deliver a tradeoff between wake-up time and power consumption regardless of the kind of parker transistor (high-\(V_t\) or low-\(V_t\)) or the bias voltage. Even when multiple types of these transistors and/or bias voltages are used in the same core, with an obvious impact on area overhead, they still fail to deliver a sufficient range of wake-up times. One of the earlier methods offers low static power consumption at the expense of very large wake-up times and increased area overhead. More importantly, like the other earlier methods, it supports only a single intermediate power-off mode. In contrast, the proposed method offers more than one intermediate power-off mode with a wide range of wake-up times and, as will be presented later, it can easily provide even more than two intermediate power-off modes, a target that is unachievable for the other methods. Finally, the proposed method has the smallest area overhead. Therefore, the proposed method better exploits the tradeoff between static power dissipation and wake-up time with much less area overhead than the rest of the methods.

### V. EXPERIMENTAL RESULTS ANALYSIS

Figs. 4 and 5 show the input/output waveforms for the snore, dream, and sleep modes. When the input is high, the output is low. The waveforms are plotted with time on the x-axis and voltage on the y-axis.

### VI. CONCLUSION

We described a body biasing scheme that provides multiple power-off modes. The proposed design offers the advantage of simplicity and requires minimal design effort. Extensive simulation results showed that, in contrast to a recent power-gating method, the proposed design is robust to process variations and is scalable to more than two power-off modes. Moreover, it requires significantly less area and consumes much less power than the previous design. Finally, a reconfigurable version of this method can be used to increase the manufacturability and robustness of the proposed design in technologies with larger process variations.

### VII. REFERENCES

