This apparent paradox, superior performance despite inferior fillrate and bandwidth, arises because Kyro has a highly bandwidth-efficient design. It employs region-based rendering, as opposed to GeForce's conventional immediate mode rendering. Unlike immediate mode renderers, which texture and shade polygons as soon as they are received from the application, Kyro first evaluates all the polygons that constitute an entire scene before rendering.
To balance the workload across the entire pipeline and keep operations on-chip, the scene is sorted into tiles of a fixed size (32x16 pixels), so that one tile is processed at a time. One of the overheads of a region-based renderer is therefore the sorting process. If a polygon spans several tiles, it is subdivided into smaller polygons that are processed in separate tiles. Additional bandwidth is also needed to store the scene geometry (polygonal data) and to retrieve it later so that only the visible portions of the scene can be extracted (hidden surface removal). Finally, scene geometry occupies video board memory. By default, ten megabytes are set aside for this purpose. For the MBTR benchmark, 2.7 megabytes of video board memory were allocated as scene buffer.
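The sorting step can be illustrated with a minimal sketch. It is not Kyro's actual hardware algorithm, which the text does not describe. It assumes screen-space polygons given as lists of integer (x, y) vertices and assigns each polygon to every 32x16 tile that its bounding box touches. Instead of subdividing a polygon that spans several tiles, it simply lists the polygon under each tile. The names `bin_polygons`, `TILE_W` and `TILE_H` are hypothetical.

```python
from collections import defaultdict

TILE_W, TILE_H = 32, 16  # tile size quoted in the text


def bin_polygons(polygons, screen_w, screen_h):
    """Map each tile (tx, ty) to the indices of polygons whose bounding box touches it.

    Assumption: a conservative bounding-box test stands in for exact
    polygon/tile overlap, so a tile may list a polygon that misses it.
    """
    tiles = defaultdict(list)
    max_tx = (screen_w - 1) // TILE_W
    max_ty = (screen_h - 1) // TILE_H
    for index, verts in enumerate(polygons):
        xs = [x for x, _ in verts]
        ys = [y for _, y in verts]
        # Skip polygons whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= screen_w or min(ys) >= screen_h:
            continue
        # Convert the bounding box to tile coordinates, clamped to the screen.
        tx0 = max(0, int(min(xs)) // TILE_W)
        ty0 = max(0, int(min(ys)) // TILE_H)
        tx1 = min(max_tx, int(max(xs)) // TILE_W)
        ty1 = min(max_ty, int(max(ys)) // TILE_H)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)].append(index)
    return dict(tiles)


if __name__ == "__main__":
    scene = [
        [(0, 0), (10, 0), (0, 10)],       # fits inside tile (0, 0)
        [(30, 14), (40, 14), (30, 20)],   # straddles four tiles
    ]
    for tile, members in sorted(bin_polygons(scene, 64, 32).items()):
        print(tile, members)
```

In this model the second triangle appears in four tile lists, which shows why polygons that cross tile boundaries add to the sorting overhead and to the size of the scene buffer.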
Dividing each scene into tiles (32x16 pixels) and passing each tile individually through the graphics pipeline makes it possible to maintain the depth buffer and framebuffer on-chip.
Since depth-testing and hidden surface removal are implemented entirely on-chip, an external depth buffer has been dispensed with. In contrast, GeForce accesses external, or off-chip, memory for depth-testing. Another important difference between Kyro and GeForce is the order in which depth-testing takes place. Kyro performs depth-testing and hidden surface removal prior to texturing. As only visible pixels are textured, the savings in texture bandwidth are a function of the extent of overdraw. GeForce, by contrast, performs depth-testing after textures have been applied to the polygon.
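The dependence of the texture savings on overdraw can be made concrete with a simplified model. The sketch below assumes fully opaque polygons, each described only by the set of pixels it covers, and assumes that a renderer which tests depth after texturing textures every covered pixel of every polygon. It ignores texture caching and any early rejection the real hardware might perform. The function name `textured_pixels` is hypothetical.

```python
def textured_pixels(layers):
    """Return (texture_first, depth_first) counts of textured pixels.

    layers: list of sets of (x, y) pixels, one set per opaque polygon.
    texture_first: every covered pixel of every polygon is textured.
    depth_first: each covered screen pixel is textured exactly once.
    """
    texture_first = sum(len(layer) for layer in layers)
    depth_first = len(set().union(*layers))
    return texture_first, depth_first


if __name__ == "__main__":
    # Three opaque polygons that each cover one whole 32x16 tile.
    tile = {(x, y) for x in range(32) for y in range(16)}
    texture_first, depth_first = textured_pixels([tile, tile, tile])
    overdraw = texture_first / depth_first
    saving = 1 - depth_first / texture_first
    print(f"overdraw {overdraw:.1f}, texture work saved {saving:.0%}")
```

Under these assumptions an overdraw of 3 means two thirds of the texturing work is avoided, and with no overdraw the two orderings texture the same number of pixels.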
Region-based rendering permits an on-chip 32-bit framebuffer (32x16 pixels). This arrangement enables full-speed full-scene anti-aliasing (FSAA) and accelerates multi-pass texturing / blending. It also prevents dithering in 16-bit rendering.
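A short calculation shows why a tile-sized framebuffer is small enough to hold on-chip. The sketch uses only the tile size and 32-bit colour depth stated above. The screen resolutions are illustrative assumptions, and the function names are hypothetical.

```python
TILE_W, TILE_H = 32, 16   # tile size quoted in the text
BYTES_PER_PIXEL = 4       # 32-bit colour


def tile_buffer_bytes():
    """Size in bytes of a colour buffer covering one tile."""
    return TILE_W * TILE_H * BYTES_PER_PIXEL


def tile_count(screen_w, screen_h):
    """Number of tiles needed to cover the screen, rounding partial tiles up."""
    tiles_x = -(-screen_w // TILE_W)  # ceiling division
    tiles_y = -(-screen_h // TILE_H)
    return tiles_x * tiles_y


if __name__ == "__main__":
    print(tile_buffer_bytes(), "bytes per tile colour buffer")
    for width, height in [(800, 600), (1024, 768)]:  # assumed resolutions
        print(f"{width}x{height}: {tile_count(width, height)} tiles")
```

A single tile therefore needs 2048 bytes of colour storage, while a 1024x768 frame is covered by 1536 such tiles processed one at a time.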