GPU Physics

When ATI and NVIDIA launched their first physics initiatives in 2006, they rallied behind Havok, the physics middleware provider whose software has powered a great number of PC games this decade. Havok in turn produced Havok FX, a separate licensable middleware package that used Shader Model 3.0 for calculating physics on supported GPUs. Havok FX was released in Q2 of 2006, and if you haven't heard about it you're not alone.

So far not a single game has shipped that uses Havok FX; plenty of games have shipped using the normal Havok middleware which is entirely CPU-powered, but none with Havok FX. The only title we know of that has been announced with Havok FX support is Hellgate: London, which is due this year. However we've noticed there has been next-to-no mention of this since NVIDIA's announcement in 2006, so make of that what you will.

Why any individual developer chooses to use Havok FX or not will be a unique answer, but there are a couple of common threads that we believe explain much of the situation. The first is pure business: Havok FX costs extra to license. We're not privy to the exact fee schedule Havok charges, but it's no secret PC gaming has been on a decline - it's a bad time to be spending more if it can be avoided. Paying for Havok FX isn't going to break the bank for the large development houses, but there are other potentially cheaper options.

The second reason, and that which has the greater effect, is a slew of technical details that stem from using Havok FX. Paramount to this is what the GPU camp is calling physics is not what the rest of us would call physics with a straight face. As Havok FX was designed, the physics simulations run on the GPU are not retrievable in a practical manner, as such Havok FX is designed to be used to generate "second-order" physics. Such physics are not related to gameplay and are inserted as eye-candy. A good example of this is Ghost Recon: Advanced Warfighter, which we'll ignore was a PhysX powered title for the moment and focus on the fact that it used the PhysX hardware primarily for extra debris.

The problem with this of course is obvious, and Havok goes through a great deal of trouble in their Havok FX literature to make this clear. The extra eye-candy is nice and it's certainly an interesting solution to bypassing the problem of lots-of-little-things loading down the CPU (although Direct3D 10 has reduced the performance hit of this), but it also means that the GPU can't have any meaningful impact on gameplay. It doesn't make Havok FX entirely useless since eye-candy does serve its purpose, but it's not what most people (ourselves included) envision when we think hardware accelerated physics; we're looking for the next step in interactive physics, not more eye-candy.

There's also a secondary issue that sees little discussion, largely because it's not immediately quantifiable, and that's performance. Because Havok FX is doing its work on the GPU, shader resources being used for rendering may be getting reallocated to physics calculations, while the remainder of the resources are left to pick up the rest of the work on top of the additional work generated by Havok FX as a result of creating more eye-candy. When the majority of new titles are GPU limited, it's not hard to imagine this scenario.


A Jetway board with 3 PCIe x16 slots. We're still waiting to put them to use

Thankfully for the GPU camp, Havok isn't the only way to get some level of physics, Shader Model 4.0 introduces some new options. Besides implementing Havok FX in the form of custom code, with proper preparation the geometry shader can be used to do second-order physics like Havok. For example the Call of Juarez technology demonstration uses this technique for its water effects. That said using the geometry shader brings on the same limitations as Havok FX in not being able to retrieve the data for first-order physics.

The second, and by far more interesting use of new GPU technology is exploiting the use of GPGPU techniques to do physics calculations for games. ATI and NVIDIA provide the CTM and CUDA interfaces respectively to allow developers to write high-level code for GPUs to do computing work, and although the primary use of GPGPU technology is for the secondary market of high-performance research computing, it's possible to use this same technology with games. NVIDIA is marketing this under the Quantum Effects initiative, separating it from their early Havok-powered SLI Physics initiative.

Unfortunately the tools for all of these technologies are virtually brand new, games using GPGPU techniques are going to take some time to arrive. This would roughly be in line with the arrival of games that make serious use of DirectX10, which includes the lag period where games will need to support older hardware and hence can't take full advantage of GPGPU techniques. The biggest question here is if any developers using GPGPU techniques will end up using the GPU for first-order physics or solely second-order.

It's due to all of the above that the GPU camp has been so quiet about physics as of late. Given that the only currently commercial-ready GPU accelerated physics technology is limited to second-order physics and only one game is due to be released using said technology this year, there's simply not much to be excited about at the moment. If serious GPU accelerated physics are to arrive, it's going to be another video card upgrade away at the least.

Index PhysX
POST A COMMENT

32 Comments

View All Comments

  • Bladen - Thursday, July 26, 2007 - link

    When I say first order physics, I mean the most obvious type: fully destructible environments.

    In UT3, you could have a fully destructible environment as an on/off option without making the game unbalanced in single player. The game is mindless killing, who care is you blow a hole through a wall to kill your enemy?

    I guess you could have fully destructible environments processed via hardware and software, but I'd assume that the software performance hit would be huge, maybe only playable on a quad core.
    Reply
  • Bladen - Thursday, July 26, 2007 - link

    Whether or not the game has fully destructible environments, I don't know. Reply
  • Verdant - Thursday, July 26, 2007 - link

    dedicated hardware for this is pointless, with GPU speeds and numbers of cores on a die increasing on CPUs, I see no point in focusing on another pipe.

    Plus the article has a ton of typographical errors :(.
    Reply
  • bigpow - Wednesday, July 25, 2007 - link

    Maybe it's just me, but I hate multiple-page reviews Reply
  • Shark Tek - Wednesday, July 25, 2007 - link

    Just click on the "Print this article" link and you will have the whole article in one page. Reply
  • Visual - Thursday, July 26, 2007 - link

    indeed thats what i do. almost.
    i hate that printarticle shows up in a popup, can't open it in a tab easily with a middle-click too... same as the comments page btw. really hate it.
    so i manually change the url - category/showdoc -> printarticle, it all stays in the same tab and is great. i'm planning on writing a ".user.js" (for opera/greasemonkey/trixie) for fixing the links some time
    Reply
  • Egglick - Wednesday, July 25, 2007 - link

    Other than what was mentioned in the article, I think another big problem is that the PCI bus doesn't have enough bandwidth (bi-directional or otherwise) for a card doing heavy real-time processing. For whatever reason, manufacturers still seem apprehensive about using PCIe x1, so it will be rough for standalone cards to perform at any decent level.

    I've always felt the best application for physics processors would be to piggyback them on high-end videocards with lots of ram. Not only would this solve the PCI bandwidth problem, but the physics processor would be able to share the GPU's fast memory, which is probably what constitutes the majority of the cost for standalone physics cards.

    This setup would benefit both NVidia/ATI and Ageia. On one hand, Ageia gets massive market penetration by their chips being sold with the latest videocards, while NVidia/ATI get to tout having a huge new feature. They could also use their heavy influence to get game developers to start using the Ageia chip.
    Reply
  • cfineman - Wednesday, July 25, 2007 - link

    I thought one of the advantages of DX10 was that it would allow one to partition off some of the GPU subprocessors for physics work.

    I was *very* surprised that the author implied that the GPUs were not well suited to embarrassingly parallel applications.... um.... what's more embarrassingly parallel than rendering?
    Reply
  • Ryan Smith - Wednesday, July 25, 2007 - link

    I think you're misinterpreting what I'm saying. GPUs are well suited to embarrassingly parallel applications, however with the core-war now you can put these tasks on a CPU which while not as fast at FP as a GPU/PPU, is quickly catching up thanks to having multiple CPU cores and how easy it is to put embarrassingly parallel tasks on such a CPU. GPUs are still better suited, but CPUs are becoming well enough suited that the GPU advantage is being chipped away.

    As for DX10, there's nothing specifically in it for physics. Using SM4.0/geometry shaders you can do some second-order work, but first-order work looks like it will need to be done with CUDA/CTM which isn't a part of DX10. You may also be thinking of the long-rumored DirectPhysics API, which is just that: a rumor.
    Reply
  • yyrkoon - Thursday, July 26, 2007 - link

    Actualy, because of the limited bandwidth capabilities of any GPU interface, the CPU is far better suited. Sure a 16x PCIe interface is limited to a huge 40Gbit/s bandwidth (asyncronous), and as I said, this may *seem* huge, but I personally know many game devers who have maxed this limit easily when experimenting with game technologies. When, and if the PCIe bus expands to 32x, and *if* graphics OEMs / motherboard OEMs implement it, then we'll see something that resembles the CPU-> memory of current bandwidth capabilities(10GB/s). By then however, who is to say how much the CPU -> memory bandwidth will be capable of. Granted, having said all that, this is why *we* load compressed textures into video memory, and do the math on the GPU . . .

    Anyhow, the whole time reading this article, I could not help but think that with current CPUs being at 4 cores, and Andahls law, that the two *other* cores could be used for this purpose, and it makes total sense. I think it would behoove Aegia, and Havok both to forget about Physics hardware, and start working on a liscencable software solution.
    Reply

Log in

Don't have an account? Sign up now