POWER2+

The Enhanced POWER2 Superscalar RISC Processor

Agenda

- Microprocessor Roadmap
- Project Goals
- System Features
- Chip and Packaging Technology
- Performance Monitor
- Performance
- Summary

Project Goals

- Leverage the POWER2 design
- Target commercial transaction processing capability on the high end
- Target a cost-reduced system on the low end
- Maintain competitive fixed-point and floating-point performance

High Performance MCM Chip Set for Servers

- 4 Data Cache Unit Chips
  - 128 Kbytes of DCache
- 32 Kbyte ICache
- 512 Kbyte – 2 Mbyte L2 Cache
- 4 Word Memory Interface
  - Minimum Memory Configuration of two memory cards
  - 64 Mbyte – 2048 Mbyte
- Ceramic Multi-Chip Module (MCM) CPU Package

Cost Reduced Chip Set for Desktop Systems

- 2 Data Cache Unit Chips
  - 64 Kbytes of DCache
- 32 Kbyte ICache
- 512 Kbyte – 1 Mbyte L2 Cache
- 2 Word Memory Interface
  - Minimum Memory Configuration of one memory card
  - 32 Mbyte – 512 Mbyte
- Single Chip Solder Ball Connect (SBC) CPU Package

Processor Core Features

- 6 Instruction Dispatch
- 8 Operations/cycle
- Large, multi-ported Data Cache
- High bandwidth buses
- Dual Fixed Point, Floating Point, Branch Units

Optimized L2 Cache Subsystem

- 512 KB, 1 MB, 2 MB Second Level Cache
- Direct-Mapped, 128 byte line
- Store-through – Overlapped write to L2 and Memory
- Industry standard Burst SRAM
  - 2–1–1–1 Cache Hit Read and Write Timing
- Run at CPU clock speed
- Single bit correct, double bit detect ECC for all L2 Cache Accesses
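
The direct-mapped organization with 128-byte lines can be illustrated with a small address-decomposition sketch. This is a generic textbook mapping, not the documented POWER2+ indexing scheme: the function name `split_address` and the simple modulo mapping of a physical address are assumptions made for illustration.

```python
LINE_BYTES = 128  # L2 line size stated on the slide


def split_address(addr: int, cache_bytes: int) -> tuple[int, int, int]:
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    Hypothetical helper: assumes plain modulo indexing on the address.
    """
    num_lines = cache_bytes // LINE_BYTES
    offset = addr % LINE_BYTES                 # byte within the line
    index = (addr // LINE_BYTES) % num_lines   # which cache line
    tag = addr // (LINE_BYTES * num_lines)     # remaining upper bits
    return tag, index, offset


# Number of lines and index bits for the three L2 sizes on the slide.
for size_kb in (512, 1024, 2048):
    lines = size_kb * 1024 // LINE_BYTES
    print(f"{size_kb:5d} KB: {lines:6d} lines, {lines.bit_length() - 1} index bits")

# Two addresses exactly one cache size apart land on the same line
# and differ only in tag, so they evict each other.
a = 128 * 3 + 5
b = a + 512 * 1024
print(split_address(a, 512 * 1024), split_address(b, 512 * 1024))
```

Under this assumption, doubling the L2 size adds one index bit, so two addresses that conflict in a 512 KB cache can coexist in a 1 MB cache.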

Storage Control Unit L2 Cache Features

- Programmable Second Level Cache and Main Memory Size
- Programmable Bus Width
- Integrated L2 Cache Tag RAM
- Overlapped L2 Tag lookup/compare with DRAM access
  - Single cycle L2 Tag lookup
  - DRAM access never started on L2 Cache Hit
  - No Memory cycle penalty for L2 Cache Miss
- Direct Store Segment Load/Store to L2 directory and data

CMOS Technology

- Lithography length of 0.7 micron
- Effective Channel length of 0.45 micron
- 4 levels of metal wiring
- 1 level of polysilicon
- 12.7 x 12.7 mm (ICU, FXU, FPU)
- 11.7 x 9.55 mm (DCU, SCU)

Multi-Chip Module (MCM) Technology

- 64 x 64 mm
- 44 Total Planes
- 20 Signal Planes
- Maximum power 60 watts
- 512 signal pins

Performance Monitor Hardware

[Block diagram: each of the ICU, FXU, FPU, and SCU chips contains sensor logic in which groups of 16 monitoring points feed 1-of-16 multiplexers, each producing a 1-bit increment signal. A 4-bit group ID for each of the FPU, FXU, ICU, and SCU selects the monitored signals. Each chip's monitor has 5 incrementors, and a separate cycle counter is driven by the CPU clock.]

Up to 80 different signals can be monitored per chip
22 Total counts can be monitored concurrently
Each of the 4 chips has 5 selectable sensor points.
Also: 1 Absolute Cycle Counter
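
The selection scheme can be sketched as a small software model. This is one reading of the slide, not a description of the actual hardware: it assumes a single 4-bit group ID per chip steers all five 1-of-16 multiplexers, and the class and method names (`ChipMonitor`, `set_group`, `clock`) are hypothetical. It models only the five selectable counters of one chip and does not try to account for the total of 22 concurrent counts.

```python
COUNTERS_PER_CHIP = 5   # incrementors per chip (from the slide)
POINTS_PER_MUX = 16     # monitoring points feeding each multiplexer

# 5 multiplexers x 16 points = 80 selectable signals per chip
SIGNALS_PER_CHIP = COUNTERS_PER_CHIP * POINTS_PER_MUX


class ChipMonitor:
    """Toy model of one chip's monitor (assumed structure)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.group_id = 0                       # assumed 4-bit select
        self.counts = [0] * COUNTERS_PER_CHIP

    def set_group(self, group_id: int) -> None:
        if not 0 <= group_id < POINTS_PER_MUX:
            raise ValueError("group ID must fit in 4 bits")
        self.group_id = group_id

    def clock(self, points: list[list[bool]]) -> None:
        """Advance one cycle; points[c][p] is point p of multiplexer c."""
        for c in range(COUNTERS_PER_CHIP):
            if points[c][self.group_id]:        # 1-bit increment signal
                self.counts[c] += 1


def idle_cycle() -> list[list[bool]]:
    return [[False] * POINTS_PER_MUX for _ in range(COUNTERS_PER_CHIP)]


fxu = ChipMonitor("FXU")
fxu.set_group(3)
busy = idle_cycle()
busy[0][3] = True   # selected point on counter 0 fires
busy[1][7] = True   # fires too, but is not selected by group 3
for cycle in (busy, idle_cycle(), busy):
    fxu.clock(cycle)
print(fxu.name, fxu.counts, "of", SIGNALS_PER_CHIP, "selectable signals")
```

Only events on the currently selected points are counted, which is why measuring all 80 signals of a chip would take several runs with different group IDs under this model.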

IBM RISC System/6000 Division

SAMPLE OF TPC–C RESULTS

<table>
<thead>
<tr>
<th></th>
<th>59H (1M L2)</th>
<th>390 (512K L2)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Percent FXU instructions</td>
<td>75.9%</td>
<td>74.5%</td>
</tr>
<tr>
<td>Percent ICU instructions</td>
<td>24.1%</td>
<td>25.5%</td>
</tr>
<tr>
<td>Percent FPU instructions</td>
<td>0.01%</td>
<td>0.01%</td>
</tr>
<tr>
<td>Percent FXU ops executed by FXU0 unit</td>
<td>67%</td>
<td>68%</td>
</tr>
<tr>
<td>Percent FXU ops executed by FXU1 unit</td>
<td>33%</td>
<td>32%</td>
</tr>
<tr>
<td>CPI (cycles per instruction)</td>
<td>1.40</td>
<td>1.68</td>
</tr>
<tr>
<td>Percent conditional branches</td>
<td>33%</td>
<td>34%</td>
</tr>
<tr>
<td>Percent conditional branches taken</td>
<td>58%</td>
<td>58%</td>
</tr>
<tr>
<td>Average basic block length (instructions)</td>
<td>4.1</td>
<td>4.1</td>
</tr>
<tr>
<td>I–Cache miss rate (per instruction)</td>
<td>2.65%</td>
<td>2.73%</td>
</tr>
<tr>
<td>D–Cache miss rate (per instruction)</td>
<td>0.88%</td>
<td>1.38%</td>
</tr>
<tr>
<td>I–TLB miss rate (per instruction)</td>
<td>0.13%</td>
<td>0.14%</td>
</tr>
<tr>
<td>D–TLB miss rate (per instruction)</td>
<td>0.16%</td>
<td>0.16%</td>
</tr>
<tr>
<td>Avg memory references to satisfy TLB miss</td>
<td>0.61</td>
<td>0.78</td>
</tr>
<tr>
<td>% hit in L2 per I–Cache L1 miss</td>
<td>85.4%</td>
<td>74.6%</td>
</tr>
<tr>
<td>% hit in L2 per D–Cache L1 miss</td>
<td>38.1%</td>
<td>16.6%</td>
</tr>
</tbody>
</table>
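
The L1 miss rates and L2 hit percentages in the table can be combined into an estimate of how often an instruction has to go past the L2. The following is a rough sketch using the table's figures; it assumes every L1 miss that also misses in the L2 results in one main-memory access, ignores store-through write traffic and TLB reloads, and uses hypothetical names (`SAMPLES`, `memory_refs_per_1000`).

```python
# Figures copied from the TPC-C sample table (rates are per instruction).
SAMPLES = {
    "59H (1M L2)": {
        "cpi": 1.40,
        "icache_miss": 0.0265, "dcache_miss": 0.0088,
        "l2_hit_i": 0.854, "l2_hit_d": 0.381,
    },
    "390 (512K L2)": {
        "cpi": 1.68,
        "icache_miss": 0.0273, "dcache_miss": 0.0138,
        "l2_hit_i": 0.746, "l2_hit_d": 0.166,
    },
}


def memory_refs_per_1000(s: dict) -> float:
    """Estimated L1 misses per 1000 instructions that also miss in L2."""
    i_side = s["icache_miss"] * (1.0 - s["l2_hit_i"])
    d_side = s["dcache_miss"] * (1.0 - s["l2_hit_d"])
    return 1000.0 * (i_side + d_side)


for name, s in SAMPLES.items():
    print(f"{name:14s} CPI {s['cpi']:.2f}  "
          f"~{memory_refs_per_1000(s):.1f} L2 misses per 1000 instructions")
```

Under these assumptions the 390 sends roughly twice as many misses to main memory per instruction as the 59H, which is in line with its higher CPI.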

### Low-End Performance

<table>
<thead>
<tr>
<th>System</th>
<th>RS/6000 model 375</th>
<th>RS/6000 model 380</th>
<th>RS/6000 model 390</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>POWER1</td>
<td>POWER2+</td>
<td>POWER2+</td>
</tr>
<tr>
<td>Clock rate</td>
<td>62.5 MHz</td>
<td>59 MHz</td>
<td>67.0 MHz</td>
</tr>
<tr>
<td>SPECint92</td>
<td>70.3</td>
<td>99.3</td>
<td>114.3</td>
</tr>
<tr>
<td>SPECfp92</td>
<td>121.1</td>
<td>187.2</td>
<td>205.3</td>
</tr>
<tr>
<td>Linpack</td>
<td>25.9</td>
<td>49.7</td>
<td>55.1</td>
</tr>
</tbody>
</table>

### Mid-Range Performance

<table>
<thead>
<tr>
<th>System</th>
<th>RS/6000 model 580</th>
<th>RS/6000 model 590</th>
<th>RS/6000 model 59H</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>POWER1</td>
<td>POWER2</td>
<td>POWER2+</td>
</tr>
<tr>
<td>Clock rate</td>
<td>62.5 MHz</td>
<td>66.5 MHz</td>
<td>66.7 MHz</td>
</tr>
<tr>
<td>SPECint92</td>
<td>73.3</td>
<td>121.6</td>
<td>122.4</td>
</tr>
<tr>
<td>SPECfp92</td>
<td>134.6</td>
<td>259.7</td>
<td>250.7</td>
</tr>
<tr>
<td>Linpack</td>
<td>38.1</td>
<td>131.8</td>
<td>132.0</td>
</tr>
<tr>
<td>TPC-C (tpmC)</td>
<td>-</td>
<td>726.1</td>
<td>1122.3</td>
</tr>
<tr>
<td>K$/tpmC</td>
<td>-</td>
<td>1.6</td>
<td>1.0</td>
</tr>
</tbody>
</table>
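
To make the mid-range comparison easier to read, the ratio of the 59H to the 590 can be computed for each metric. This is a simple arithmetic sketch over the table's figures; the names `MID_RANGE` and `ratios` are hypothetical.

```python
# Figures copied from the mid-range table above.
MID_RANGE = {
    "590 (POWER2)": {
        "SPECint92": 121.6, "SPECfp92": 259.7, "Linpack": 131.8,
        "tpmC": 726.1, "K$/tpmC": 1.6,
    },
    "59H (POWER2+)": {
        "SPECint92": 122.4, "SPECfp92": 250.7, "Linpack": 132.0,
        "tpmC": 1122.3, "K$/tpmC": 1.0,
    },
}


def ratios(new: dict, old: dict) -> dict:
    """Return new/old for every metric present in both systems."""
    return {k: new[k] / old[k] for k in new if k in old}


r = ratios(MID_RANGE["59H (POWER2+)"], MID_RANGE["590 (POWER2)"])
for metric, value in r.items():
    print(f"{metric:10s} {value:.2f}x")
```

The integer, floating-point, and Linpack ratios stay close to 1, while the TPC-C throughput ratio is roughly 1.5 and the cost per tpmC drops, which matches the stated goal of targeting commercial transaction processing.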

### High-End Performance

<table>
<thead>
<tr>
<th>System</th>
<th>RS/6000 model 980</th>
<th>RS/6000 model 990</th>
<th>RS/6000 model R24</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>POWER1</td>
<td>POWER2</td>
<td>POWER2+</td>
</tr>
<tr>
<td>Clock rate</td>
<td>62.5 MHz</td>
<td>71.5 MHz</td>
<td>71.5 MHz</td>
</tr>
<tr>
<td>SPECint92</td>
<td>73.3</td>
<td>131.0</td>
<td>134.1</td>
</tr>
<tr>
<td>SPECfp92</td>
<td>134.6</td>
<td>279.0</td>
<td>273.8</td>
</tr>
<tr>
<td>Linpack</td>
<td>38.1</td>
<td>141.6</td>
<td>141.0</td>
</tr>
<tr>
<td>TPC-A (tpsA)</td>
<td>160.3</td>
<td>275.6</td>
<td>357.2</td>
</tr>
<tr>
<td>K$/tpsA</td>
<td>10.1</td>
<td>7.0</td>
<td>7.3</td>
</tr>
</tbody>
</table>

POWER2+ Summary

- High Performance Superscalar RISC Processor
- Optimized L2 Cache subsystem
- High bandwidth buses
- High performance, cost reduced desktop system
- Industry leading Commercial transaction processing system