3D Graphics Processor Chip Set

Makoto Awaga
Senior Engineer
awaga@slde.ed.fujitsu.co.jp

APPLIED SYSTEM LSI DIVISION
FUJITSU LIMITED

Outline

- Background of the design
- Chip set configuration
- Product overview
- Summary
Background of the design

3D Graphics performance trend

---


Background of the design

3D Graphics operation layers applied to PC

- Application
- OS: Windows NT (Successor of Win '95)
- API: OpenGL, Reality Lab, RenderWare, etc.
- 3DG Libraries: Geometry, Rendering

Edge slope calculation
- Type 1: CPU program
- Type 2: Hardware assistance
- Type 3: CPU program

---
Background of the design

3D Graphics processing stages

<table>
<thead>
<tr>
<th>Process</th>
<th colspan="5">Stages</th>
</tr>
</thead>
<tbody>
<tr>
<td>Geometry process</td>
<td>Coordinate transform</td>
<td>View volume clip</td>
<td>Lighting calculation</td>
<td>Screen projection</td>
<td>Z sort</td>
</tr>
<tr>
<td>Rendering process</td>
<td>Rasterization</td>
<td>DDA</td>
<td>Texture mapping, Alpha blending</td>
<td>Scissor/Window clip, Stencil</td>
<td>Z compare</td>
</tr>
</tbody>
</table>

- Z-sort is not applicable in 3D-DDI

Background of the design

Objectives

- Provide a total solution for local 3DG acceleration (balanced geometry and rendering performance)

- Be applicable to both run-time and authoring applications (support both Z-sort and Z-buffer algorithms)

- Achieve 300k polygons/sec sustained performance with the chip set (to be competitive with high-end arcade game machines)

- Provide a flexible solution whose performance scales with the requirements of each target system
Chip Set Configuration

- MB86235 (TGPx4: Three dimensional Geometry Processor x4)
  - Floating point DSP (80MFLOPS@40MHz)
  - Geometry operation performance:
    - 300k polygons/sec (isolated triangle flat polygons)
    - 180k polygons/sec (isolated triangle Gouraud polygons)

- MB86271 (AGP: Advanced Graphics Processor)
  - Rendering Processor (120MIPS@60MHz)
  - Rendering performance:
    - 450k polygons/sec (isolated triangle Gouraud polygons: 25 pixels/polygon)
    - 15M texels/sec (point sampled texture mapping: 200 pixels/polygon)

- MB86272 (Z-sorter)
  - Polygon Sorter
  - Sorting performance: 300k polygons/sec
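The quoted rendering figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes one texel is fetched per drawn pixel under point sampling, which the slide does not state.

```python
# Figures quoted for the MB86271 (AGP).
GOURAUD_POLYGONS_PER_SEC = 450_000
GOURAUD_PIXELS_PER_POLYGON = 25
TEXELS_PER_SEC = 15_000_000
TEXTURED_PIXELS_PER_POLYGON = 200


def implied_pixel_rate(polygons_per_sec, pixels_per_polygon):
    """Pixels drawn per second at a given polygon rate and polygon size."""
    return polygons_per_sec * pixels_per_polygon


def implied_polygon_rate(pixels_per_sec, pixels_per_polygon):
    """Polygons per second sustainable at a given pixel (or texel) rate."""
    return pixels_per_sec / pixels_per_polygon


# Gouraud case: 450k polygons/sec at 25 pixels each.
print(implied_pixel_rate(GOURAUD_POLYGONS_PER_SEC, GOURAUD_PIXELS_PER_POLYGON))

# Textured case, ASSUMING one texel per pixel: 15M texels/sec at 200 pixels each.
print(implied_polygon_rate(TEXELS_PER_SEC, TEXTURED_PIXELS_PER_POLYGON))
```

Under that assumption the Gouraud figure corresponds to 11.25M pixels/sec and the texture figure to 75k polygons/sec for 200-pixel polygons.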

Chip set configuration
System block diagram [1]
Chip set configuration
System block diagram [2]

Chip set configuration
System block diagram [3]
Product overview

MB86235 (TGPx4) : Block diagram

Product overview

MB86235 (TGPx4) : Execution block
Product overview

MB86235 (TGPx4) : Coordinate transformation

\[
\begin{pmatrix}
X_n' \\
Y_n' \\
Z_n'
\end{pmatrix} =
\begin{pmatrix}
C_{00} & C_{01} & C_{02} & C_{03} \\
C_{10} & C_{11} & C_{12} & C_{13} \\
C_{20} & C_{21} & C_{22} & C_{23}
\end{pmatrix}
\begin{pmatrix}
X_n \\
Y_n \\
Z_n \\
1
\end{pmatrix}
\]
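
As a reference for the arithmetic, the transform above can be sketched in a few lines of Python. This is not the TGPx4 code; the function name and the example matrix are hypothetical, and plain floats stand in for the DSP's floating-point registers.

```python
def transform_vertex(c, vertex):
    """Apply a 3x4 matrix c (three rows of four coefficients) to (x, y, z).

    The fourth column is multiplied by the implicit homogeneous 1,
    so it acts as a translation term.
    """
    x, y, z = vertex
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in c)


# Hypothetical example: no rotation, translation by (1, 2, 3).
C = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
]
print(transform_vertex(C, (10.0, 20.0, 30.0)))
```

Each output coordinate is a dot product of one matrix row with (X, Y, Z, 1), so a vertex uses all twelve coefficients.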

Product overview

MB86235 (TGPx4) : Instruction format and sample coding

<table>
<thead>
<tr>
<th>MSB (bits 63-61)</th>
<th>ALU field (bits 60-42)</th>
<th>MUL field (bits 41-27)</th>
<th>Transfer field [Type 1] (bits 26-0)</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>XOR AA0, AA0, AA0 :</td>
<td>FMUL MB0, PR++, AB0 :</td>
<td>MOV1 FI, MB0</td>
</tr>
<tr>
<td></td>
<td>XOR AA2, AA2, AA2 :</td>
<td>FMUL MB1, PR++, AB1 :</td>
<td>MOV1 FI, MB2</td>
</tr>
<tr>
<td></td>
<td>FADD AA0, AB0, AA0 :</td>
<td>FMUL MB2, PR++, AB2 :</td>
<td>MOV1 #1, MB3</td>
</tr>
<tr>
<td></td>
<td>FADD AA0, AB1, AA0 :</td>
<td>FMUL MB3, PR++, AB3 :</td>
<td>MOV1 AA0, @AA7++</td>
</tr>
<tr>
<td></td>
<td>FADD AA0, AB2, AA0 :</td>
<td>FMUL MB1, PR++, AB1 :</td>
<td>MOV1 AA1, @AA7++</td>
</tr>
<tr>
<td></td>
<td>FADD AA1, AB0, AA1 :</td>
<td>FMUL MB2, PR++, AB2 :</td>
<td>MOV1 #0, PRP</td>
</tr>
<tr>
<td></td>
<td>FADD AA1, AB1, AA1 :</td>
<td>FMUL MB3, PR++, AB3 :</td>
<td>MOV1 AA2, @AA7++</td>
</tr>
<tr>
<td></td>
<td>FADD AA1, AB2, AA1 :</td>
<td>FMUL MB0, PR++, AB0 :</td>
<td>MOV1 AA0, @AA7++</td>
</tr>
<tr>
<td></td>
<td>FADD AA2, AB0, AA2 :</td>
<td>FMUL MB2, PR++, AB2 :</td>
<td>MOV1 AA1, @AA7++</td>
</tr>
<tr>
<td></td>
<td>FADD AA2, AB1, AA2 :</td>
<td>FMUL MB3, PR++, AB3 :</td>
<td>MOV1 #0, PRP</td>
</tr>
<tr>
<td></td>
<td>FADD AA2, AB2, AA2 :</td>
<td>FMUL MB0, PR++, AB0 :</td>
<td>MOV1 AA2, @AA7++</td>
</tr>
</tbody>
</table>

Hot Chips VII  Stanford, California, August 14–15, 1995  page 127
Product overview

MB86235 (TGPx4) : Parameter registers

Parameter register: 24 words x 32 bits

read entry

write entry

PRP : PR Read Pointer

PRP : PR Write Pointer

Product overview

MB86235 (TGPx4) : Performance estimation

<table>
<thead>
<tr>
<th></th>
<th>Flat shading</th>
<th>Gouraud shading</th>
</tr>
</thead>
<tbody>
<tr>
<td>[1] Coordinate transform</td>
<td>52</td>
<td>90</td>
</tr>
<tr>
<td>[2] Side detection</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>[3] Lighting calculation</td>
<td>13</td>
<td>38</td>
</tr>
<tr>
<td>[4] Screen projection</td>
<td>20</td>
<td>20</td>
</tr>
</tbody>
</table>

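<table>
<thead>
<tr>
<th></th>
<th>Flat shading</th>
<th>Gouraud shading</th>
</tr>
</thead>
<tbody>
<tr>
<td>Case I [1]-[5]: Execution cycles</td>
<td>100</td>
<td>163</td>
</tr>
<tr>
<td>Case I [1]-[5]: Performance @40MHz</td>
<td>400k pps</td>
<td>245k pps</td>
</tr>
<tr>
<td>Case II [1]-[6]: Execution cycles</td>
<td>136</td>
<td>219</td>
</tr>
<tr>
<td>Case II [1]-[6]: Performance @40MHz</td>
<td>300k pps</td>
<td>183k pps</td>
</tr>
</tbody>
</table>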
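
The performance figures follow from dividing the 40 MHz clock by the execution cycles per polygon. A minimal sketch of that arithmetic, with a hypothetical helper name and assuming one polygon completes every "execution cycles" clocks with no other stalls:

```python
CLOCK_HZ = 40_000_000  # 40 MHz, as quoted for the MB86235


def polygons_per_second(cycles_per_polygon, clock_hz=CLOCK_HZ):
    """Polygon rate if each polygon costs a fixed number of clock cycles."""
    return clock_hz / cycles_per_polygon


cases = [
    ("Case I, flat", 100),
    ("Case I, Gouraud", 163),
    ("Case II, flat", 136),
    ("Case II, Gouraud", 219),
]
for label, cycles in cases:
    print(f"{label}: {polygons_per_second(cycles) / 1000:.0f}k pps")
```

This gives 400k, 245k, 294k and 183k pps. The Case II flat value of about 294k pps appears on the slide rounded to 300k pps.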
Product overview

MB86271 (AGP) : Block diagram

Product overview

MB86271 (AGP) : Internal parallel operation scheme

Product overview

MB86271 (AGP) : Input data packet


- \((X_s, Y_0, Z_s)\)
- \((R_s, G_s, B_s, A_s, D_s)\)
- \((S_s, T_s, Q_s)\)

\[\begin{align*}
&dX_s/dY_0, dZ_s/dY_0, \\
&dR_s/dY_0, dG_s/dY_0, dB_s/dY_0, \\
&dA_s/dY_0, dD_s/dY_0, \\
&dS_s/dY_0, dT_s/dY_0, dQ_s/dY_0
\end{align*}\]
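
The packet pairs each start value with a gradient per unit step in Y, which is the form a DDA consumes: every scanline, each parameter is advanced by adding its gradient. The sketch below illustrates that stepping only. The function and dictionary keys are hypothetical, and Python floats are used for simplicity, whereas hardware of this kind would more likely use fixed-point values.

```python
def edge_dda(start, gradient, num_scanlines):
    """Yield the interpolated parameters for each successive scanline.

    start:    dict of parameter name -> value at the first scanline
    gradient: dict of parameter name -> change per scanline (d/dY)
    """
    values = dict(start)
    for _ in range(num_scanlines):
        yield dict(values)
        for name in values:
            values[name] += gradient[name]


# Hypothetical packet contents for three of the parameters.
start = {"X": 10.0, "Z": 0.5, "R": 255.0}
gradient = {"X": 0.5, "Z": 0.25, "R": -1.0}
rows = list(edge_dda(start, gradient, 3))
print(rows[2])
```

Only additions are needed per scanline, which is why the gradients are computed once per polygon and shipped in the packet.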


Product overview

MB86271 (AGP) : Execution unit block diagram

Product overview

MB86271 (AGP): Data path flow in the rendering engine unit

[Figure: data path flow in the rendering engine unit]

Blocks shown:

- DDA
- Perspective correction
- Texture memory address generation
- Texture memory
- Texture blend
- Local memory address generation
- Local memory
- Alpha blend, Drawing condition check
- Frame memory address generation
- Frame memory

Signal labels:

- Parameter set input from Microprocessor block, Trigger command
- (S, T, Q), (R, G, B, A, D), (X, Y, Z)
- Texture memory address, Texture (R, G, B, A)
- Input color (R, G, B, A, D)
- Local memory address
- Current information of the pixel: (R, G, B, A), (Z, W, S)
- (R, G, B, A), (X, Y)

Legend:

- Z: Z-buffer value
- W: Window attribute
- S: Stencil attribute
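
The "Alpha blend, Drawing condition check" stage combines an incoming pixel with the current pixel information read from local memory. The sketch below shows one plausible form of that step and is not taken from the MB86271. It assumes a smaller Z means nearer, alpha in the range 0 to 1, and a standard source-over blend, and it omits the window and stencil attribute checks entirely. The function name is hypothetical.

```python
def draw_pixel(src_rgba, src_z, dst_rgba, dst_z):
    """Return the (rgba, z) to store for one pixel.

    ASSUMPTIONS: smaller z is nearer; alpha is in [0, 1];
    window (W) and stencil (S) checks are not modeled.
    """
    if src_z >= dst_z:
        # Drawing condition fails: the current pixel is kept unchanged.
        return dst_rgba, dst_z
    a = src_rgba[3]
    # Source-over blend applied to all four channels.
    blended = tuple(a * s + (1.0 - a) * d for s, d in zip(src_rgba, dst_rgba))
    return blended, src_z


# Half-transparent red drawn in front of opaque blue.
result = draw_pixel((1.0, 0.0, 0.0, 0.5), 0.25, (0.0, 0.0, 1.0, 1.0), 0.5)
print(result)
```

With the Z-sort algorithm the depth comparison would be unnecessary, since polygons arrive already ordered. The comparison applies to the Z-buffer case.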

Product overview

MB86271 (AGP): Flat/Gouraud shading performance

Hidden surface algorithm: Z-sort

[Figure: polygon throughput in pps (axis 0 to 900K) versus polygon size in pixels/polygon (axis 25 to 500)]

Curves:

- Flat + Alpha blend (Microprocessor block peak)
- Gouraud + Alpha blend (Microprocessor block peak)
- Flat + Alpha blend
- Gouraud + Alpha blend
Product overview

MB86271 (AGP) : Texture mapping performance

Summary

- New 3D Graphics chip set that includes a geometry processor and a Z-sorter

- World's highest level of 3DG performance for both engineering/design CAD and real-time 3D Graphics applications

- Provides a flexible structure to realize the most appropriate performance level for various types of application systems

- Extensive software driver support for standard 3D Graphics platforms is planned

- The next stage is to integrate the functions of these three chips into a single LSI to offer an integrated solution for PC 3D Graphics
MB86235 (TGPx4) : Die photo

MB86271 (AGP) : Die photo