Hidden Secrets: Investigation Shows That NVIDIA GPUs Implement Tile Based Rasterization for Greater Efficiency
by Ryan Smith on August 1, 2016 5:00 AM EST
As someone who analyzes GPUs for a living, one of the more vexing things in my life has been NVIDIA’s Maxwell architecture. The company’s 28nm refresh offered a huge performance-per-watt increase for only a modest die size increase, essentially allowing NVIDIA to offer a full generation’s performance improvement without a corresponding manufacturing improvement. We’ve had architectural updates on the same node before, but never anything quite like Maxwell.
The vexing aspect to me has been that while NVIDIA shared some details about how they improved Maxwell’s efficiency over Kepler, they have never disclosed all of the major improvements under the hood. We know, for example, that Maxwell implemented a significantly altered SM structure that was easier to reach peak utilization on, and thanks to its partitioning wasted much less power on interconnects. We also know that NVIDIA significantly increased the L2 cache size and did a number of low-level (transistor level) optimizations to the design. But NVIDIA has also held back information – the technical advantages that are their secret sauce – so I’ve never had a complete picture of how Maxwell compares to Kepler.
For a while now, a number of people have suspected that one of the ingredients of that secret sauce was that NVIDIA had applied some mobile power efficiency technologies to Maxwell. It was, after all, their original mobile-first GPU architecture, and now we have some data to back that up. Friend of AnandTech and all around tech guru David Kanter of Real World Tech has gone digging through Maxwell/Pascal, and in an article & video published this morning, he outlines how he has uncovered very convincing evidence that NVIDIA implemented a tile based rendering system with Maxwell.
In short, by playing around with some DirectX code specifically designed to look at triangle rasterization, he has come up with some solid evidence that NVIDIA’s handling of triangles has significantly changed since Kepler, and that their current method of triangle handling is consistent with a tile based renderer.
NVIDIA Maxwell Architecture Rasterization Tiling Pattern (Image Courtesy: Real World Tech)
Tile based rendering is something we’ve seen for some time in the mobile space, with both Imagination PowerVR and ARM Mali implementing it. The significance of tiling is that by splitting a scene up into tiles, tiles can be rasterized piece by piece by the GPU almost entirely on die, as opposed to the more memory (and power) intensive process of rasterizing the entire frame at once via immediate mode rendering. The trade-off with tiling, and why it’s a bit surprising to see it here, is that the PC legacy is immediate mode rendering, and this is still how most applications expect PC GPUs to work. So to implement tile based rasterization on Maxwell means that NVIDIA has found a practical means to overcome the drawbacks of the method and the potential compatibility issues.
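To make the idea of splitting a scene into tiles concrete, here is a minimal sketch of a binning pass. It is purely illustrative: NVIDIA has not disclosed how Maxwell does this, and the function name, the 16-pixel tile size, and the bounding-box test are all assumptions chosen for simplicity.

```python
# Toy binning pass: an illustration of tiling in general,
# NOT a description of NVIDIA's undisclosed hardware.
from collections import defaultdict

TILE = 16  # hypothetical tile edge, in pixels


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of the triangles that may touch it.

    Each triangle is a list of three (x, y) screen-space vertices.
    The overlap test uses the triangle's bounding box, so it is
    conservative: a tile can be listed even if the triangle misses it.
    """
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(0, int(min(xs))) // tile
        x1 = min(width - 1, int(max(xs))) // tile
        y0 = max(0, int(min(ys))) // tile
        y1 = min(height - 1, int(max(ys))) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return bins


if __name__ == "__main__":
    scene = [
        [(2, 2), (10, 2), (2, 10)],       # small triangle inside one tile
        [(10, 10), (40, 10), (10, 40)],   # larger triangle spanning several tiles
    ]
    for tile_xy, tri_ids in sorted(bin_triangles(scene, 64, 64).items()):
        print(tile_xy, tri_ids)
```

Once the triangles are binned, a renderer can work through one tile at a time, keeping that tile's color and depth data in a small on-die buffer and writing to memory only when the tile is finished. That is the source of the bandwidth and power savings described above.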
In any case, Real World Tech’s article goes into greater detail about what’s going on, so I won’t spoil it further. But with this information in hand, we now have a more complete picture of how Maxwell (and Pascal) work, and consequently how NVIDIA was able to improve over Kepler by so much. Finally, at this point in time Real World Tech believes that NVIDIA is the only PC GPU manufacturer to use tile based rasterization, which also helps to explain some of NVIDIA’s current advantages over Intel’s and AMD’s GPU architectures, and gives us an idea of what we may see them do in the future.
Source: Real World Tech
191 Comments
jabber - Monday, August 1, 2016 - link
Yeah I loved playing Unreal with my 4MB Matrox Mystique/M3D setup. Looked really good.
BrokenCrayons - Monday, August 1, 2016 - link
Yeah it was fantastic looking on PowerVR hardware. I'm not sure what it was about those cards (pretty sure mine was the same Matrox board...I recall thinking something along the lines of "What? That's it?" when pulling the card out of the box since there was so little on the PCB, just the chip and the two memory ICs) but I liked the visuals more than I did when Unreal was running under Glide.
jabber - Monday, August 1, 2016 - link
Yeah tiny card...and no nasty image quality sapping analogue passthrough cable! I used that till I swapped over to a 3dFX Banshee when PowerVR lost the battle.
Ryan Smith - Monday, August 1, 2016 - link
"but wasn't tile rasterization implemented on PCs already back in times when mobile phones resembled a brick?"
If it makes it any clearer, I could write that it's a first for "video cards that weren't a market failure." Technically the early PowerVR desktop cards did it first, but ultimately they weren't successful on the market. Good idea, bad timing and poor execution.
HollyDOL - Monday, August 1, 2016 - link
Ic, makes it clear, thx... I thought I am missing some major difference.
Scali - Monday, August 1, 2016 - link
Sadly, it wasn't even so much the hardware or the drivers at fault, but rather the software.
Thing with immediate renderers is that they render immediately. A deferred renderer such as the PowerVR had to buffer the draw calls until a scene was finished. Direct3D had specific BeginScene()/EndScene() functions to mark this, but developers were very sloppy with their usage.
As a result, the driver could not accurately determine when it should render, and when it needs to preserve or flush the z-buffer (the PowerVR doesn't actually need a z-buffer in VRAM, the temporary tile-cache z-buffer is all it needs).
This caused a lot of depth-sorting bugs. Not because the hardware was broken, not because the driver was broken, but because people didn't write proper D3D code. It just 'happened to work' on cards that render directly to VRAM.
invinciblegod - Monday, August 1, 2016 - link
Isn't that what is innovative about Maxwell? They were able to implement it and the driver takes care of compatibility issues like the one you cite.
Scali - Monday, August 1, 2016 - link
Well, firstly, I don't think they're doing the same as what PowerVR is doing.
Secondly, PowerVR is now a major player in the mobile segment (powering every single iOS device out there, and also various other phones/tablets), the compatibility issues belong to a distant past.
wumpus - Wednesday, August 3, 2016 - link
Just out of curiosity, is this why we "need" g/freesync? It seems to be the solution to a problem that never should have existed, but either GPUs spew bad frames or LCDs get lost when accepting frames during an update.
Scali - Wednesday, August 3, 2016 - link
G-Sync is just to remove the legacy of the old CRT displays.
CRTs literally scan the display, left-to-right and top-to-bottom, at a given fixed frequency. Historically, the video signal was driving the CRT electronics directly, so you had to be in sync with the CRT.
LCDs just adopted that model, and buffered the frames internally. It was simple and effective. Initially just by digitizing the analog VGA signal that would normally drive a CRT. Later by a digital derivative in the form of DVI/HDMI.
But now that we have more advanced LCDs and more advanced image processors, we can choose to refresh the image whenever the GPU has finished rendering one, eliminating the vsync/double/triple buffering issues.