NVIDIA Announces Quadro M6000 & Quadro VCA (2015)
by Ryan Smith on March 19, 2015 10:00 AM EST
Earlier this week we took a look at the GeForce GTX Titan X, NVIDIA’s first product to use their new high-end Maxwell GPU, the GM200. Now just 2 days later the company is back again with GM200 and is set to launch it in their new professional graphics counterpart, the Quadro M6000.
As Titan is for GeForce, the 6000 series is NVIDIA’s flagship Quadro card, and today’s launch sees the new GM200-based Quadro M6000 take its place at the top of the Quadro graphics stack. What makes this launch interesting is that NVIDIA has never launched a flagship Quadro card so close to a flagship GeForce card. Quadro cards usually launch months down the line, not days. The end result is that professional users are getting much earlier access to NVIDIA’s best hardware.
|NVIDIA Quadro Specification Comparison|
|Memory Clock||6.6GHz GDDR5||6GHz GDDR5||6GHz GDDR5||3GHz GDDR5|
|Memory Bus Width||384-bit||384-bit||256-bit||384-bit|
|FP64||1/32 FP32||1/3 FP32||1/3 FP32||1/2 FP32|
|Manufacturing Process||TSMC 28nm||TSMC 28nm||TSMC 28nm||TSMC 40nm|
So just what is Quadro M6000? Packing a fully enabled GPU, this is GM200 at its best. All 3072 CUDA cores are enabled, and with a maximum clockspeed of 1.14GHz the card is capable of pushing 7 TFLOPS of single precision performance. Along with this come GM200’s double-sized ROP clusters, giving M6000 96 ROPs and better than 2x the pixel throughput of the outgoing K6000.
Meanwhile it’s interesting to note that NVIDIA’s GPU Boost technology has finally come to the Quadro lineup via the M6000. The M6000 supports 10 different boost states, the fastest of which is the 1.14GHz state that gives the card its 7 TFLOPS of performance. As with GeForce and Tesla cards, GPU Boost allows NVIDIA to raise their shipping clockspeeds for better performance without violating the card’s cooling or power delivery restrictions.
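As a rough check on the 7 TFLOPS figure, the sketch below multiplies the core count by the top boost clock. It assumes each CUDA core retires one fused multiply-add per clock, counted as two floating point operations; the function name is ours and purely illustrative.

```python
# Rough check of the quoted single precision throughput.
# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.
CUDA_CORES = 3072              # fully enabled GM200, per the article
BOOST_CLOCK_GHZ = 1.14         # fastest boost state, per the article
FLOPS_PER_CORE_PER_CLOCK = 2   # FMA counted as two operations (assumption)


def peak_fp32_tflops(cores: int, clock_ghz: float,
                     flops_per_clock: int = FLOPS_PER_CORE_PER_CLOCK) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    # cores * GHz * FLOPs/clock gives GFLOPS; divide by 1000 for TFLOPS.
    return cores * clock_ghz * flops_per_clock / 1000.0


if __name__ == "__main__":
    print(f"{peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_GHZ):.2f} TFLOPS")
```

This is a theoretical peak at the highest boost state; sustained clocks, and therefore real throughput, will depend on the workload and the card's thermal and power headroom.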
Paired with the GM200 is 12GB of GDDR5 memory, which is as much as the K6000 and still the most one can pack on a memory bus of this size. M6000 clocks its memory at 6.6GHz, which is good for 317GB/sec of memory bandwidth. Furthermore, as with past high-end Quadro cards ECC protection is available for the memory (and only the memory, no cache), which trades off some memory bandwidth for better protection against memory errors.
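The bandwidth figure can be reproduced from the memory clock and bus width. This is a minimal sketch that treats the quoted 6.6GHz as an effective per-pin data rate of 6.6Gbps; the helper name is hypothetical.

```python
# Peak memory bandwidth from effective data rate and bus width.
# Assumption: the article's "6.6GHz" is the effective per-pin rate in Gbps.
DATA_RATE_GBPS = 6.6    # per pin, per the article
BUS_WIDTH_BITS = 384    # per the specification table


def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/sec (8 bits per byte)."""
    return data_rate_gbps * bus_width_bits / 8


if __name__ == "__main__":
    print(f"{memory_bandwidth_gbs(DATA_RATE_GBPS, BUS_WIDTH_BITS):.1f} GB/sec")
```

The result rounds to the 317GB/sec quoted above. Enabling ECC would lower the usable figure, though by how much is not stated here.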
On the overall performance front, Quadro M6000 is expected to offer a significant performance boost over K6000, similar to what we’ve seen on the consumer side with GTX Titan X. Along with the greater clockspeed and the slight increase in the number of CUDA cores, M6000 brings with it the Maxwell 2 family architecture and its efficiency improvements. Actual performance will depend on the application, but 50% or more is possible, especially in exotic scenarios that stress the ROPs. To that end NVIDIA gave Lucasfilm some of the first M6000 cards, and they reported a better than expected performance increase:
"To create the most immersive and visually exciting imagery imaginable, Lucasfilm artists and developers need optimal graphics performance and GPU power," said Lutz Latta, Principal Engineer at Lucasfilm. "With the NVIDIA Quadro M6000 GPU, we saw overall gains of 55% in a heavy compute and memory access ray-tracing application using layered shadow maps. This kind of performance boost gives our artists a necessary edge to realize their creative vision."
Along with Maxwell 2’s architectural efficiency improvements, Maxwell 2 also brings with it a series of feature improvements that make their debut in the Quadro family on the M6000. On the display side, M6000 is the first Quadro capable of driving four 4K displays (previous gen Quadros were limited to two such displays) thanks to the updated display controller. Meanwhile Quadro also gains the latest NVENC video encoder, which though unlikely to be used at this early stage, opens the door up to real-time HEVC encoding on Quadro.
As for the card’s construction and power requirements, both have changed compared to K6000. M6000’s TDP is 250W, up from 225W on K6000. The increased TDP allows for higher clockspeeds than the Quadro family’s historically conservative clockspeeds, and is at this point equivalent to the consumer GTX Titan X’s power requirements. Interestingly, despite this increase, M6000 only requires a single 8-pin PCIe power connector (located on the far side of the card, as in past Quadro designs); this technically puts the M6000 out of spec on PCIe, since 250W is more than what the slot + 8-pin connector can provide (225W). We asked NVIDIA about this, and they told us that the card pulls the extra power from the 8-pin connector, and though this is not officially in spec, the kind of systems expected to house the M6000 should have no problem delivering the extra amperage necessary.
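The power arithmetic behind the out-of-spec observation can be sketched as follows. The 75W slot and 150W 8-pin limits are the standard PCIe figures that add up to the 225W quoted above; the function names are illustrative only.

```python
# In-spec power budget for a card with a given number of 8-pin connectors.
SLOT_LIMIT_W = 75          # PCIe x16 graphics slot limit
EIGHT_PIN_LIMIT_W = 150    # limit for one 8-pin auxiliary connector
M6000_TDP_W = 250          # per the article


def in_spec_budget_w(eight_pin_connectors: int) -> int:
    """Maximum in-spec board power from the slot plus 8-pin connectors."""
    return SLOT_LIMIT_W + eight_pin_connectors * EIGHT_PIN_LIMIT_W


def shortfall_w(tdp_w: int, eight_pin_connectors: int) -> int:
    """Watts by which the TDP exceeds the in-spec budget (0 if within spec)."""
    return max(0, tdp_w - in_spec_budget_w(eight_pin_connectors))


if __name__ == "__main__":
    print(in_spec_budget_w(1), "W budget,",
          shortfall_w(M6000_TDP_W, 1), "W over")
```

If the slot stays at its 75W limit, the extra 25W that NVIDIA says comes from the connector would put the 8-pin at roughly 175W at full TDP. That split is an inference from the numbers above, not a figure NVIDIA has provided.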
Meanwhile the card’s construction has seen the K6000’s plastic shroud and cooling apparatus replaced with the metal GTX Titan shroud and cooler, similar to the GTX Titan X. This change is largely driven by the power increase, as the GTX Titan cooler is already qualified to handle 250W designs. To set it apart from the GTX Titan X, the M6000 gets a black & green paint job rather than the Titan X’s all-black paint job. Otherwise the change in coolers has no effect on the card’s dimensions, with the card still being a double-slot 10.5” long card, just like the K6000.
Moving on, while M6000 will be a graphics monster, as it’s using the GM200 GPU this means that it will also inherit GM200’s compute capabilities, including the GPU’s highly limited double precision (FP64) performance. On the more recent Quadro 6000 cards, NVIDIA has used GPUs with high FP64 throughput (largely an artifact of also using these GPUs in Tesla compute cards) and left FP64 throughput unrestricted on Quadro cards. This made the Quadro K6000 a sort of jack of all trades, offering NVIDIA’s best pro graphics performance along with their full compute performance.
However GM200 and the Quadro M6000 change that. With Quadro M6000 having a native FP64 rate of 1/32 FP32, M6000 will only have minimal FP64 capabilities. In our GTX Titan X article we discuss the development rationale for this, but NVIDIA has essentially opted to build the best graphics and FP32 compute GPU they can, and not waste space on FP64 resources. Consequently this is the first Quadro 6000 series card in some time to have such poor FP64 performance. However as FP64 compute is not widely used in graphics, this is not something NVIDIA believes will be an issue. In the far more common scenario of FP32 compute (e.g. most ray-tracing engines), M6000 will be far more performant than its predecessors.
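To put the 1/32 rate in absolute terms, the sketch below scales the card's rounded 7 TFLOPS single precision figure by the FP64 ratio. It is a theoretical peak only, and the names are ours.

```python
from fractions import Fraction

# Theoretical FP64 throughput derived from the FP32 peak and the FP64 ratio.
FP32_TFLOPS = 7.0               # rounded figure quoted in the article
FP64_RATIO = Fraction(1, 32)    # native FP64 rate relative to FP32


def peak_fp64_gflops(fp32_tflops: float, ratio: Fraction) -> float:
    """Theoretical peak FP64 throughput in GFLOPS."""
    return fp32_tflops * 1000 * float(ratio)


if __name__ == "__main__":
    print(f"{peak_fp64_gflops(FP32_TFLOPS, FP64_RATIO):.0f} GFLOPS FP64")
```

That works out to roughly 0.22 TFLOPS of double precision throughput, which is why the card is best thought of as a graphics and FP32 compute part.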
Finally, as far as use cases go, NVIDIA is aiming the M6000 at a cross-section of possible markets. There is of course the traditional pro visualization market, the high-end of which is always in need of greater GPU performance, something the M6000 can provide in spades. However the company is also pushing the use of Physically Based Rendering (PBR), a compute-intensive rendering solution that uses far more accurate algorithms to model the physical characteristics of a material, in essence properly capturing how light will interact with that material and reflect off of it rather than using a rough approximation. We’ll have more on PBR a bit later this week when we talk about Quadro developments at GDC.
Wrapping things up, NVIDIA tells us that Quadro M6000 will be available soon in complete systems through the company’s regular OEM partners, and as individual cards via the typical retail channels. As is customary for NVIDIA, they have not announced a launch price for the M6000, but we would expect to see it launch at $5000+, as has been the case with past Quadro 6000 series cards.
Quadro VCA (2015)
Meanwhile with the launch of the Quadro M6000, NVIDIA is also using this opportunity to refresh their Iray Visual Computing Appliance (VCA), the company’s high-end network-attached render server. The VCA specializes in very high performance remote rendering jobs, packing multiple GPUs into a single server box, with further scale-out capabilities to multiple VCA boxes via 10GigE and InfiniBand.
Now dubbed the Quadro VCA, this updated VCA packs in 8 of NVIDIA’s high-end Quadro cards. The cards themselves are GM200 based but are technically not M6000 – NVIDIA is quick to note that they have a different BIOS that has them clocked slightly differently – but should perform similarly to the aforementioned M6000. These cards have 12GB per GPU and are fully enabled, giving the entire VCA some 96GB of VRAM and 24,576 CUDA cores.
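The VCA totals are simply the per-card figures multiplied across the eight GPUs, as this short sketch shows. The per-card core count is taken from the fully enabled GM200 described earlier, and the function name is hypothetical.

```python
# Aggregate resources of one Quadro VCA box from its per-card figures.
GPUS_PER_VCA = 8
VRAM_PER_GPU_GB = 12
CUDA_CORES_PER_GPU = 3072   # fully enabled GM200


def vca_totals(gpus: int, vram_gb: int, cores: int) -> tuple[int, int]:
    """Return (total VRAM in GB, total CUDA cores) for one VCA."""
    return gpus * vram_gb, gpus * cores


if __name__ == "__main__":
    total_vram, total_cores = vca_totals(
        GPUS_PER_VCA, VRAM_PER_GPU_GB, CUDA_CORES_PER_GPU)
    print(f"{total_vram}GB VRAM, {total_cores:,} CUDA cores")
```

Note that the 96GB is the sum across cards; each GPU still only has its own 12GB attached.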
Driving the Quadro cards will be a pair of 10-core Xeon processors (we don’t have the specific model at this time, but believe it to be from the Xeon E5 V3 family), 256GB of system memory, and 2TB of solid state storage. Other than the change in processors and the updated Quadro cards, the rest of these specs are identical to the previous generation VCA.
On the software side, the new Quadro VCA runs CentOS 6.6. It will also come with Iray 2015 and Chaos’s V-Ray RT pre-installed to make setup easier. However, it should be noted that the VCA does not include licenses for those software packages, which must be purchased separately.
The Quadro VCA will be available soon through NVIDIA's VCA partners for $50,000.
Dorek - Thursday, March 19, 2015: Performant isn't a word. Try "capable" there.
tipoo - Thursday, March 19, 2015: The first recorded use was in 1847. Plus it's widely used in IT vernacular. Every word is made up, after all, so the detractors will have to get over it :P
xthetenth - Thursday, March 19, 2015: If it's not a word, then there sure are a lot of people coincidentally making the same series of mouth noises across the tech industry when expressing the same concept.
ddriver - Thursday, March 19, 2015: 1/32 DP performance? No thanks, neeeext!
tipoo - Thursday, March 19, 2015: What was your planned use case, out of curiosity?
Intervenator - Thursday, March 19, 2015: Lol
JarredWalton - Thursday, March 19, 2015: Trolling most likely. :-)
tipoo - Thursday, March 19, 2015: Everyone knows you need at least 1/2 to full DP performance for trolling!
ddriver - Thursday, March 19, 2015: And you need to be clueless and born yesterday for your conception of professional GPUs to boil down to running games. This is a professional product, compute performance is important, professional applications use double precision unlike games.
tipoo - Thursday, March 19, 2015: If you can quote where I mentioned games - I'll buy you one. I'm just curious as to what exactly your use case for double precision compute is, which you still have not provided, since you seem to need it.
That was a complete non-sequitur and straw man, you're not fooling anyone with saying that.
Yes, I know this is a professional product. I also know that there are many professional uses for single precision, which you didn't seem to know, which one would expect given you have probably never used a pro card. Double precision is required for certain scientific work, but it's still a niche within a niche.
The proof of that is self-evident, as Nvidia thought it was worth cutting DP out in favor of squeezing as much SP out of it as they could per unit die area.
Now, for you to give us that very specific use case you have that uses double precision. You surely have one, right, and weren't just talking out your butt?