Understanding Voltage Variations in Chip Multiprocessors using a Distributed Power-Delivery Network

Meeta S. Gupta*, Jarod L. Oatley*, Russ Joseph†, Gu-Yeon Wei* and David M. Brooks*
*Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA
†Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL
{meeta,jloatley,wei,dbrooks}@eecs.harvard.edu
rjoseph@ece.northwestern.edu

Abstract—Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and power management require that designers be increasingly cognizant of power supply variations. These variations, primarily due to fast changes in supply current, can be attributed to architectural gating events that reduce power dissipation. To study this problem, we propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip supply fluctuations in high-performance microprocessors. Using this model, we analyze voltage variations in the context of next-generation chip-multiprocessor (CMP) architectures using both real applications and synthetic current traces. We find that the activity of distinct cores in CMPs presents several new design challenges when considering power supply noise, and we describe potentially problematic activity sequences that are unique to CMP architectures.

I. INTRODUCTION

Supply-voltage fluctuations have emerged as a serious cause for concern in high-performance processor design. These perturbations occur when the processor's current consumption changes rapidly over a relatively small time scale. Since the power-delivery subsystem can have substantial parasitic inductance, this current variation produces voltage ripple on the chip’s supply lines. This is significant because if the supply voltage rises above or drops below a specific tolerance range, the CPU may malfunction. This fundamental challenge is known as the $\frac{dI}{dt}$ problem since the magnitude of these voltage ripples is affected by the instantaneous change of current with respect to time. Current fluctuations derive primarily from fluctuations in dynamic resource utilization, which are heavily influenced by architectural power-saving events such as clock- and power-supply gating and idle/sleep modes. Thus, analysis at the architecture level is critical to allow designers to understand the impact of these techniques on power-supply voltage stability under a variety of power-delivery and package-modeling assumptions.
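As a rough first-order illustration of why the rate of change matters, the inductive part of the supply noise can be approximated by the voltage across an effective series inductance $L_{eff}$. The symbol $L_{eff}$ and the numbers below are illustrative assumptions, not parameters taken from this paper:

$$\Delta V \approx L_{eff}\,\frac{dI}{dt}, \qquad L_{eff} = 5\,\mathrm{pH},\ \Delta I = 10\,\mathrm{A}\ \text{in}\ 1\,\mathrm{ns} \;\Rightarrow\; \Delta V \approx 5\times10^{-12} \times \frac{10}{10^{-9}} = 50\,\mathrm{mV}.$$

Against a 1V supply, a swing of this size would already consume a substantial part of a typical tolerance band, which is why fast gating events deserve architectural attention.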

Previous architecture-level $\frac{dI}{dt}$ studies ([1] and [2]) have used lumped models of the on-chip power-delivery network to capture the mid-frequency resonance. The major limitation of these architectural models is the global treatment of on-chip VDD/GND as single nodes, which fails to capture local on-die voltage variations across the chip. As the effects of supply variation play a more prominent role in performance and reliability, architects will have to pay closer attention to localized supply fluctuations due to package connections and the on-chip power-supply grid. In this paper, we describe an architecture-level, fine-grained, power-delivery model that captures localized voltage variations across the entire chip.

Current technology trends are moving towards chip multiprocessor (CMP) architectures like IBM’s Cell processor [3] and Intel’s Core Duo processor [4]. It is important to understand inter-core voltage variations for multiple cores on a CMP machine. Core utilization patterns and activity interactions between cores can lead to large inter-core voltage variations. Understanding these inter-core variations requires a fine-grained model of the power-delivery network. Using a distributed power-delivery model of the on-chip power-supply grid, we explore the repercussions of different combinations of activity patterns.

The main contributions of our work are:

1) We provide a parameterizable, distributed, power-delivery model, which can be configured to closely match measured impedances found in the literature [5].
2) This paper investigates voltage variations across a CMP machine using both real and synthetic activity patterns.
3) We illustrate possible problematic activity sequences that are unique to CMP architectures.

The paper is organized as follows: Section II describes the modeling of a distributed power-delivery network. The different types of activities and their effects on voltage variations are studied in Section III. Section IV reviews prior research generally related to power delivery modeling. Finally, Section V concludes the paper.

II. MODELING THE POWER DELIVERY NETWORK

This section presents a detailed yet flexible power-delivery model that captures the characteristic mid-frequency resonance, transients related to board and package interfaces, and localized on-chip voltage variations.

Figure 1(a) presents our detailed model of the power-delivery network with a distributed on-chip power-supply grid. The off-chip network includes the motherboard, package, and off-chip decoupling capacitors and parasitic inductances, modeled via a ladder RLC network. Figure 1(b) illustrates the distributed on-chip grid model used in our analysis. The C4 bumps are modeled as parallel connections (via RL pairs) that connect the grid to the off-chip network, with each grid point having a bump connection. The on-chip grid itself is modeled as an RL network. The evenly distributed on-chip capacitance between the VDD and GND grids is modeled in two ways: $C_{SPC}$ represents the decoupling capacitance placed in the free space between functional units and $C_{BLK}$ represents the intrinsic parasitic capacitance of the functional units. In contrast, an on-chip lumped model would consist of a single RLC network connected across the package-to-chip interface. Table I provides the values of the resistances, inductances, and capacitances used for the PCB, package and on the die, for the lumped and distributed power-delivery models. These values were chosen to match the measured off-chip impedance of the Pentium 4 processor [5], [6]. Figure 2(a) plots the off-chip impedance for the lumped and distributed models, which closely match one another and are validated with respect to the available Pentium 4 measurements [5]. The slight difference in the on-chip impedance, shown in Figure 2(b), can be attributed to the slightly higher bump resistances in the lumped model, which are required to match off-chip impedances. It is important to note that these parameters can easily be modified to model different architectures and power-delivery networks.
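To make the notion of an impedance profile with a resonance peak concrete, the following minimal sketch sweeps the impedance of a single-stage lumped network: a series R-L path in parallel with the on-die capacitance. It is a simplification of the ladder and grid models described above, and the component values are hypothetical numbers chosen only to place the resonance near 100MHz; they are not the Table I parameters.

```python
import math

# Hypothetical lumped parameters (NOT the Table I values), chosen so that
# 1 / (2*pi*sqrt(L*C)) lands near 100 MHz.
R_SERIES = 0.5e-3     # ohms, series resistance of the package/bump path
L_SERIES = 5.066e-12  # henries, effective series inductance
C_DIE = 500e-9        # farads, total on-die decoupling capacitance


def impedance(freq_hz, res=R_SERIES, ind=L_SERIES, cap=C_DIE):
    """Impedance seen by the die: (R + jwL) in parallel with 1/(jwC)."""
    w = 2.0 * math.pi * freq_hz
    z_path = complex(res, w * ind)
    z_cap = 1.0 / complex(0.0, w * cap)
    return (z_path * z_cap) / (z_path + z_cap)


def sweep(f_start=1e6, f_stop=1e9, points=601):
    """Return (frequency, |Z|) pairs on a logarithmic frequency grid."""
    decades = math.log10(f_stop / f_start)
    freqs = [f_start * 10 ** (decades * i / (points - 1))
             for i in range(points)]
    return [(f, abs(impedance(f))) for f in freqs]


def resonance_peak(curve):
    """Frequency and magnitude of the largest impedance in the sweep."""
    return max(curve, key=lambda point: point[1])


if __name__ == "__main__":
    f_peak, z_peak = resonance_peak(sweep())
    print(f"peak |Z| = {z_peak * 1e3:.2f} mOhm at {f_peak / 1e6:.1f} MHz")
```

Changing the three constants shifts and reshapes the peak, which mirrors in miniature how the parameterizable model can be retargeted to other packages.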

![Lumped Model](image1)

![On-die grid model](image2)

![Off-chip Impedance Plot](image3)

![On-Chip Impedance Plot](image4)

![On-Chip Impedance Plot](image5)

![On-die grid model](image6)

Figure 1. Power delivery model

Table I

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Unit</th>
<th>Parameter</th>
<th>Value</th>
<th>Unit</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_{s,PC}$</td>
<td>0.004</td>
<td>mΩ</td>
<td>$L_{s,PC}$</td>
<td>21</td>
<td>pH</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>0.1065</td>
<td>mΩ</td>
<td>$C_{s,PC}$</td>
<td>240</td>
<td>pC</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>1</td>
<td>mΩ</td>
<td>$L_{s,PC}$</td>
<td>120</td>
<td>pH</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>0.5415</td>
<td>mΩ</td>
<td>$C_{s,PC}$</td>
<td>26</td>
<td>pC</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>0.5</td>
<td>mΩ</td>
<td>$L_{s,PC}$</td>
<td>0.5</td>
<td>pC</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>40</td>
<td>mΩ</td>
<td>$L_{s,PC}$</td>
<td>72</td>
<td>pC</td>
</tr>
<tr>
<td>$R_{s,PC}$</td>
<td>0.1</td>
<td>mΩ</td>
<td>$L_{s,PC}$</td>
<td>3.3</td>
<td>pC</td>
</tr>
<tr>
<td>$C_{PC}$</td>
<td>5.6</td>
<td>nF</td>
<td>$C_{PC}$</td>
<td>1.5</td>
<td>nF</td>
</tr>
</tbody>
</table>

Table I: Parameters for the power delivery model.

Voltage regulator modules (VRMs) typically have response frequencies in the sub-MHz range, which is much lower than the challenging higher frequencies associated with the rest of the power-delivery network. For simplicity, the power supply is modeled as a fixed voltage source, which is scaled with respect to the average current draw to deliver 1V at the bump nodes, mimicking the feedback loop associated with the VRM.
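The source scaling described above can be sketched as a simple DC calculation. This is a minimal illustration that assumes the entire path between the ideal source and the bumps can be summarized by one series resistance; the function name and the `r_path_ohm` parameter are hypothetical.

```python
V_TARGET = 1.0  # volts at the bump nodes, as stated in the text


def source_voltage(avg_current_a, r_path_ohm, v_target=V_TARGET):
    """Fixed source voltage that yields v_target at the bumps on average.

    r_path_ohm is the total DC series resistance between the ideal source
    and the bump nodes (a hypothetical lumped value in this sketch).
    """
    return v_target + avg_current_a * r_path_ohm


if __name__ == "__main__":
    # Example: four cores averaging 7 A each through a 1 mOhm path.
    print(source_voltage(avg_current_a=28.0, r_path_ohm=1e-3))
```

Because the source is fixed for the whole run, it compensates only the average IR drop; the transient deviations around 1V are left for the network model to capture.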

Our architectural simulation framework consists of a four-core setup, shown in Figure 3, with each core divided into five microarchitectural blocks: FPU (floating point unit), OOO (which combines the rename, regfile, resultbus and window units on a core), INT (Integer ALU), Fetch (which combines the I-cache and branch predictor) and Data (representing the Data cache and Load-Store Queue). Each block’s power, derived from architectural simulations [7], is distributed evenly across the grid points that the block occupies, according to its area. To have a reasonably accurate model with low simulation overhead, we use a 12x12 grid, with each core having 36 grid points. A fast circuit solver, based on preconditioned Krylov subspace iterative methods [8], utilizes a SPICE netlist of the entire power-delivery network and per-block current profiles to simulate on-die voltages.
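The mapping from per-block power to per-grid-point current can be sketched as follows. This is only an illustration under stated assumptions: the number of grid points assigned to each block, the row-major floorplan within a core, the 2x2 core arrangement, and the example power numbers are all hypothetical. Only the 12x12 grid, the 36 points per core, and the five block names come from the text.

```python
GRID = 12  # 12x12 grid points for the whole chip (from the text)
CORE = 6   # each core covers a 6x6 tile, i.e. 36 grid points

# Hypothetical number of grid points per block; the counts must sum to 36.
BLOCK_POINTS = {"FPU": 6, "OOO": 9, "INT": 6, "Fetch": 6, "Data": 9}


def core_tile(block_power_w, vdd=1.0):
    """Return a 6x6 tile of per-grid-point currents (amps) for one core.

    Blocks are laid out in row-major order (an assumed floorplan), and each
    block's current is split evenly over the grid points it occupies.
    """
    assert sum(BLOCK_POINTS.values()) == CORE * CORE
    flat = []
    for name, count in BLOCK_POINTS.items():
        per_point = block_power_w[name] / vdd / count
        flat.extend([per_point] * count)
    return [flat[r * CORE:(r + 1) * CORE] for r in range(CORE)]


def chip_grid(per_core_power_w):
    """Place four core tiles in a 2x2 arrangement on the 12x12 grid."""
    grid = [[0.0] * GRID for _ in range(GRID)]
    for core_id, block_power in enumerate(per_core_power_w):
        row0 = (core_id // 2) * CORE
        col0 = (core_id % 2) * CORE
        tile = core_tile(block_power)
        for r in range(CORE):
            for c in range(CORE):
                grid[row0 + r][col0 + c] = tile[r][c]
    return grid


if __name__ == "__main__":
    # Hypothetical per-block power (watts) summing to 10 W for one core.
    busy = {"FPU": 1.5, "OOO": 3.0, "INT": 2.0, "Fetch": 1.5, "Data": 2.0}
    grid = chip_grid([busy] * 4)
    print(sum(map(sum, grid)))  # total chip current in amps
```

Each entry of the resulting grid would become one current source in the netlist handed to the circuit solver.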

The power consumption of a CMP typically varies per core due to variations in the application profiles for each core, as
these variations in the context of CMP workload scenarios.

To better understand variations both within and across different cores, it is necessary to have a distributed on-chip power delivery model. In the rest of the paper, we focus on the distributed model for the CMP processor.

III. ANALYSIS OF VOLTAGE VARIATIONS

Voltage variations within a CMP architecture are a strong function of different workloads and current profiles associated with each core. In this section, we classify the different kinds of load current profiles and understand their effects on voltage variations within each core and across the chip.

A. Classification of Activity Patterns

To facilitate a thorough analysis of CMP architectures using a distributed power-delivery network model, we begin by classifying current consumption profiles based on a suite of SPEC benchmarks. Figure 5 illustrates snapshots of interesting current profiles for four of the SPEC benchmarks (equake, apsi, bzip, and mcf) on a single core. The currents for the SPEC benchmarks were measured using an architectural power model based on Wattch [7]. Based on the observed characteristics, we broadly classify current consumption profiles into three categories:

1) Step Currents: This type of current profile commonly occurs when a core suddenly changes state, for example when activity suddenly increases or decreases after long stalls caused by events such as cache misses or branch mispredictions. It can also occur when the firmware enables sleep/active transitions that power cores up or down.

2) Pulse Currents: These are sudden, short-duration increases or decreases in core activity, which can again be caused by long stalls. Figures 5(a) and 5(b) show two examples of isolated pulses with varying pulse widths.

3) Resonating Currents: Periodic behavior is largely associated with recurring activity patterns generally attributed to loops in an application. In particular, a periodic sequence of current pulses occurring at or near the resonant frequency of the power-delivery network is of most interest. These resonating currents are shown in Figures 5(c) and 5(d), occurring for bzip and mcf, respectively.

Fig. 5. Snapshot of current consumption for equake, apsi, bzip and mcf for a single core

Fig. 6. Effect of powering on cores

Given the observed application profiles, we can simplify the analysis by substituting synthetic current profiles in order to interrogate the power-delivery network for a wide range of problematic scenarios. In this paper, we focus on the effects of step currents and sequences of pulse currents on the power-delivery network leading to voltage variations. Current pulses of long enough duration can be classified as step currents. Worst-case analysis can be achieved by using two states for each core: max-power and min-power. The max-power state refers to the core drawing maximum power, which corresponds to 10W/core in our simulations. The min-power state refers to the core consuming minimum power, which corresponds to 4W/core. In our remaining analysis we model steps and pulses with these max/min power levels to mimic powering up/down cores or activities observed in the SPEC benchmarks.
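The synthetic profiles described above can be generated with a few lines of code. The sketch below is one possible formulation, assuming per-cycle current samples and converting the 10W and 4W power levels to amps at the 1V nominal supply; the function names and parameters are hypothetical.

```python
I_MAX = 10.0  # amps: 10 W per core at the 1 V nominal supply
I_MIN = 4.0   # amps: 4 W per core at the 1 V nominal supply


def step(n_cycles, start, low=I_MIN, high=I_MAX):
    """Min-power until cycle `start`, then max-power for the rest."""
    return [high if n >= start else low for n in range(n_cycles)]


def pulse(n_cycles, start, width, low=I_MIN, high=I_MAX):
    """A single max-power pulse of `width` cycles beginning at `start`."""
    return [high if start <= n < start + width else low
            for n in range(n_cycles)]


def pulse_train(n_cycles, period, duty=0.5, phase=0, low=I_MIN, high=I_MAX):
    """Periodic pulses; `period` and `phase` are given in clock cycles."""
    on_cycles = int(round(duty * period))
    return [high if (n - phase) % period < on_cycles else low
            for n in range(n_cycles)]


if __name__ == "__main__":
    # With an assumed 3 GHz clock, a 100 MHz resonance spans 30 cycles.
    print(pulse_train(60, period=30)[:32])
```

A sufficiently wide `pulse` behaves like a `step` over the window of interest, matching the observation that long pulses can be classified as step currents.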

B. Voltage Variations given Step Currents

Current steps can induce large voltage fluctuations around the nominal voltage. A drop in voltage is the more alarming scenario as this can cause timing violations. Figure 6 shows the voltage variation for a node on the chip when all four cores are powered on at the same time. Given that a step comprises signals across a wide range of frequencies, the initial drop in voltage and the subsequent ringing can be attributed to the high frequency resonance (100MHz) in the power-delivery network. The voltage dip that occurs at 500 cycles can be attributed to the low frequency resonance. The voltage eventually stabilizes to the nominal voltage of the system (1V). Figure 6 (inset) plots the minimum voltage with respect to the number of simultaneously engaged cores. As expected, the worst drop is observed when all the cores are switched on simultaneously.

To avoid this worst case condition, a staggering mechanism can be used to gradually ramp the current profile with assistance from the firmware. The inter-core delay for switching on the cores is called the stagger interval. Figure 7 (inset) illustrates one such staggering mechanism. The combined waveform reflects the overall current consumed by the chip. Figure 7 shows that increasing stagger intervals can reduce voltage fluctuations. As stagger intervals increase beyond three clock cycles, the worst case minimum voltage across the chip improves and eventually stabilizes as the stagger interval extends beyond ten clock cycles. At this point each core behaves independently and is equivalent to a single core switching on (Figure 6 (inset)).
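The benefit of staggering can be reproduced qualitatively with a very small time-domain simulation. The sketch below is not the distributed model or the Krylov-based solver used in this paper: it assumes a single lumped series R-L path feeding an on-die capacitance, with hypothetical component values that place the resonance near 100MHz, and it expresses the stagger interval in seconds rather than clock cycles. Its numbers should therefore not be compared with Figure 7.

```python
# Hypothetical lumped network with a resonance near 100 MHz.
R, L, C = 0.5e-3, 5.066e-12, 500e-9   # ohms, henries, farads
DT = 0.05e-9                          # simulation time step: 0.05 ns
I_MIN, I_MAX, N_CORES = 4.0, 10.0, 4  # amps per core (at 1 V), core count


def load_current(t, stagger):
    """Total current when core k switches from I_MIN to I_MAX at k*stagger."""
    return sum(I_MAX if t >= k * stagger else I_MIN for k in range(N_CORES))


def min_voltage(stagger, t_end=300e-9):
    """Lowest die voltage after the cores power on, for a given stagger."""
    i_start = N_CORES * I_MIN
    v_src = 1.0 + R * i_start   # source set so the die starts at 1 V
    i_l, v_die = i_start, 1.0   # inductor current, die (capacitor) voltage
    v_min = v_die
    for n in range(int(t_end / DT)):
        t = n * DT
        # Semi-implicit Euler: update the inductor current first, then the
        # capacitor voltage using the updated current.
        i_l += DT * (v_src - R * i_l - v_die) / L
        v_die += DT * (i_l - load_current(t, stagger)) / C
        v_min = min(v_min, v_die)
    return v_min


if __name__ == "__main__":
    for stagger_ns in (0.0, 2.5, 5.0, 20.0):
        print(stagger_ns, round(min_voltage(stagger_ns * 1e-9), 4))
```

Because this toy network is linear, the response to staggered steps is a sum of time-shifted copies of the single-step response, so the individual droops no longer line up and the worst-case minimum is shallower than in the simultaneous case.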

C. Voltage Variations given Periodic Current Pulses

Resonating currents are periodic current pulses occurring with frequencies within the resonant band of the power-delivery network. Figure 8 plots the peak voltage swing observed across the chip when the current consumption of all four cores simultaneously switches between max and min power at different frequencies with 50% duty cycle. As anticipated by the impedance plot of the power-delivery network, worst-case voltage swings occur in the vicinity of 100MHz. Previous studies [1], [9] for single core machines have highlighted the detrimental effects of resonating currents on supply voltage stability. In this section, we explore the effect of resonating currents in CMP machines.
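The link between the impedance plot and the steady-state swing can be approximated by keeping only the first harmonic of the pulse train. For a 50% duty-cycle square wave with peak-to-peak current $\Delta I_{total}$, the fundamental has amplitude $\frac{2}{\pi}\Delta I_{total}$, which gives the rough estimate below. The impedance value used in the example is a hypothetical number, and the estimate ignores higher harmonics and the spatial effects of the on-chip grid:

$$V_{pp}(f) \approx 2\,|Z(f)|\,\frac{2}{\pi}\,\Delta I_{total}, \qquad \Delta I_{total} = 4 \times (10\,\mathrm{A} - 4\,\mathrm{A}) = 24\,\mathrm{A},\ |Z(f_{res})| = 2\,\mathrm{m\Omega} \;\Rightarrow\; V_{pp} \approx 61\,\mathrm{mV}.$$

Here the per-core currents follow from the 10W and 4W power levels at the 1V nominal supply.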

Given resonating currents, the resulting voltage ripple initially grows and then settles to a periodic waveform around the nominal voltage (as shown in Figure 8 (inset)). In steady state, even small current pulses can induce large peak-to-peak swings, which are the focus of our analysis. Resonance can be further classified into: Locally Resonant, where each core individually has periodic current pulses at the resonant frequency; and Globally Resonant, where the aggregate current, globally seen across the die, has or appears to have current pulses at the resonant frequency of the power-delivery network’s impedance. We further investigate the combination and interaction of these two types of resonating currents:

1) Locally and Globally Resonant: This is a scenario where each core has resonating current and the combined (or average) current across all of the cores also pulses at the same resonant frequency. Figure 9 plots worst-case minimum and maximum voltages seen across the chip as the number of active cores increases. As expected, swings grow as the number of resonating cores increases due to the higher aggregate current amplitudes. The theoretical worst-case condition occurs when current pulses across all of the cores are aligned in phase.

2) Locally Resonant but Globally Non-Resonant: Locally, the cores are resonating, but due to phase differences the combined view seen by the system is not a resonating wave. For conditions where the resonating currents across the four cores are phase-shifted with respect to one another, currents between the cores can interact to cancel out some of the effects of the locally resonating currents at the global scale. When 50% duty cycle current pulses are 90° out of phase with one another, as shown in Figure 10(a), the currents combine to appear as constant current with fixed amplitude at the global scale. It is important to note that due to the distributed power-supply grid model with non-zero impedance between cores, localized fluctuations exist, but interaction between the cores cancels out the resonant behavior that was seen when all of the phases were aligned. On the other hand, a lumped model would underestimate the potential problem because it lacks the localized view of the resonating currents. Figure 10(b) presents the case where resonating current pulses are each offset by 60°. In this case, the combined currents have periodicity at the resonant frequency, but the stepwise waveform leads to smaller voltage fluctuations. Figure 11 summarizes the effect of varying the phase shift between resonant currents across the four cores, and a range of duty cycles, on the resulting peak-to-peak voltage swing magnitudes seen across the CMP. As seen before, the worst-case condition is when all current pulses are aligned in phase (0° or 360°). In general, a larger duty cycle means higher overall current draw and, hence, larger voltage swings. Interestingly, in this four-core CMP example, interactions between cores lead to the most canceling when current pulses are phase-shifted by multiples of 90°. Given this dependence on the number of cores, a 16-core CMP may exhibit similar dips for phase differences occurring in multiples of 22.5°.
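The cancellation between phase-shifted cores can be checked at the level of the aggregate current alone. The following sketch sums four 50% duty-cycle pulse trains with a configurable phase step and reports both the aggregate swing and the amplitude of its component at the pulse frequency. It models only the global current, so it says nothing about the localized fluctuations that the distributed grid captures. The per-core currents assume the 1V nominal supply, and all function names are hypothetical.

```python
import cmath
import math

SAMPLES = 360             # one sample per degree of the resonant period
I_MIN, I_MAX = 4.0, 10.0  # amps per core, assuming the 1 V nominal supply


def core_current(shift_deg, duty=0.5):
    """One period of a core's pulse train, delayed by shift_deg degrees."""
    on = int(round(duty * SAMPLES))
    return [I_MAX if (n - shift_deg) % SAMPLES < on else I_MIN
            for n in range(SAMPLES)]


def aggregate(phase_step_deg, n_cores=4, duty=0.5):
    """Sum of n_cores pulse trains, core k delayed by k * phase_step_deg."""
    traces = [core_current(k * phase_step_deg, duty) for k in range(n_cores)]
    return [sum(samples) for samples in zip(*traces)]


def fundamental(trace):
    """Amplitude of the component at the pulse (resonant) frequency."""
    n = len(trace)
    coeff = sum(x * cmath.exp(-2j * math.pi * k / n)
                for k, x in enumerate(trace))
    return 2.0 * abs(coeff) / n


if __name__ == "__main__":
    for phase_step in (0, 60, 90):
        total = aggregate(phase_step)
        swing = max(total) - min(total)
        print(phase_step, swing, round(fundamental(total), 3))
```

With a 90° step the aggregate is flat, so the component at the resonant frequency vanishes. With a 60° step the aggregate becomes a staircase whose resonant-frequency component is smaller than in the aligned case but not zero, which is consistent with the reduced but non-negligible fluctuations described above.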

3) Locally Non-Resonant but Globally Resonant : While the previous two conditions were examples of resonating
mainly focused on throttling approaches to mitigate voltage swings in single-core microprocessors [1], [2], [9], [10]. This work focuses on the inductive noise problem in the context of CMP architectures and primarily considers issues that are specific to core-to-core interactions in these machines.

V. CONCLUSIONS

As the industry trends towards aggressive power management and voltage scaling in future multi-core designs, it is increasingly important for architects to understand the potential for voltage fluctuations within this new paradigm. This paper presents a distributed power-delivery model that is designed to analyze local on-chip voltage variations to allow architects to understand the impact of inter-core interactions. We analyze this system across a range of current loads using SPEC benchmarks and synthetic current traces. We find that powering on all cores simultaneously can lead to a significant voltage drop in the system and that staggering this activity can be beneficial. Resonating current pulses can cause significant voltage swings, but if cores resonate out-of-phase, swings can be reduced. We also find that in some cases current behavior that would not be resonant within a local core, can become resonant when combined with the activity of other cores.

This paper is an initial attempt to understand the voltage variations in a CMP system. A more detailed model of the CMP architecture with different kinds of applications would lead to more insights into $\frac{dI}{dt}$ effects on CMPs and possible solutions. Future research should consider more gating styles, including Vdd-gating; the impact of isolated per-core power domains; and more multi-threaded workload scenarios.

REFERENCES


