NVIDIA Announces Quadro Pascal Family: Quadro P6000 & P5000by Ryan Smith on July 25, 2016 4:45 PM EST
If there was one word to describe the launch of NVIDIA’s Pascal generation products, it’s “expedient.” On the consumer side of the business the company has launched 3 different GeForce cards and announced a fourth (Titan X), while on the HPC side the company has already launched their Tesla P100 accelerator, with the PCIe version due next quarter. With the company moving so quickly it was only a matter of time until a Quadro update was announced, and now today at SIGGRAPH 2016 the company is doing just that.
Being announced today are the two Quadro models that will fill out the high-end of the Quadro family, the P6000 and P5000. As hinted at by the name, these are based on NVIDIA’s latest Pascal generation GPUs., marking the introduction of Pascal to the Quadro family. And like NVIDIA’s consumer counterparts, these new cards should offer significant performance and feature upgrades over their Maxwell 2 based predecessors.
|NVIDIA Quadro Specification Comparison|
|Memory Clock||9Gbps GDDR5X||9Gbps GDDR5X||6.6Gbps GDDR5||6.6Gbps GDDR5|
|Memory Bus Width||384-bit||256-bit||384-bit||258-bit|
|FP64||1/32 FP32||1/32 FP32||1/32 FP32||1/32 FP32|
|Architecture||Pascal||Pascal||Maxwell 2||Maxwell 2|
|Manufacturing Process||TSMC 16nm||TSMC 16nm||TSMC 28nm||TSMC 28nm|
|Launch Date||October 2016||October 2016||03/22/2016||08/11/2015|
|Launch Price (MSRP)||TBD||TBD||$5000||$2000|
We will start, as always, at the top, with the Quadro P6000. As NVIDIA’s impending flagship Quadro card, this is based on the just-announced GP102 GPU. The direct successor to the GM200 used in the Quadro M6000, the GP102 mixes a larger number of SMs/CUDA cores and higher clockspeeds to significantly boost performance.
Paired with P6000 is 24GB of GDDR5X memory, running at a conservative 9Gbps, for a total memory bandwidth of 432GB/sec. This is the same amount of memory as in the 24GB M6000 refresh launched this spring, so there’s no capacity boost at the top of NVIDIA’s lineup. But for customers who didn’t jump on the 24GB – which is likely a lot of them, including most 12GB M6000 owners – then this is a doubling (or more) of memory capacity compared to past Quadro cards. At this time the largest capacity GDDR5X memory chips we know of (8Gb), so this is as large of a capacity that P6000 can be built with at this time. Meanwhile this is so far the first and only Pascal card with GDDR5X to support ECC, with NVIDIA implementing an optional soft-ECC method for the DRAM only, just as was the case on M6000.
NVIDIA has also sent over pictures of the card design, and confirmed that the card ships with the Quadro 6000-series standard TDP of 250W. Utilizing the same basic metal shroud and blower design as the M6000 cards, the P6000 should be suitable as drop-in replacement for older M6000 cards. Do note however that like M6000, external power is pulled via a single 8-pin power connector, so technically this card is out of spec (not that this was a problem for M6000).
Unfortunately in their zeal to get this announcement out in time for SIGGRAPH - a frequent venue for Quadro announcements – we don’t have specific performance numbers available. NVIDIA has not locked down the GPU clockspeeds, and as a result we don’t just how P6000s clockspeeds and total throughput will compare to M6000’s. It goes without saying that it should be higher, but how much higher remains to be seen.
For overall expected performance, NVIDIA has published that the P6000 is rated for 12 TFLOPs FP32. Given that it's a fully enabled GP102 we're looking at, this works out to a clockspeed of around 1560MHz. On paper this gives P6000 around 71% more shading performance and 37% more ROP throughput than the older Maxwell 2 M6000. This also puts the P6000 around 9% ahead of the recently announced NVIDIA Titan X.
On a quick technical note, as this announcement comes just 4 days after NVIDIA announced the GP102 GPU used on this card, this Quadro announcement does confirm a few more things about GP102. Quadro P6000 ships with 3840 CUDA cores (30 SMs), confirming our earlier suspicions that GP102 was a (or at least) 30 SM part. Meanwhile this also confirms that GP102 can be outfit with 24GB of GDDR5X. Finally, NVIDIA has confirmed that there’s no high-speed FP64 support on GP102, which is why we’re looking at a 1/32 rate for even the top Quadro card.
Moving on, let’s talk about Quadro M5000. Based on NVIDIA’s GP104 GPU, this is the smaller, cheaper, lower power sibling to the P6000. This is a fully enabled part with all 2560 CUDA cores (20 SMs) active, so the performance gains versus M5000 should be similar to what we saw with the consumer GeForce GTX 1080. Clockspeeds are also comparable, so we're looking at sizable boost in shading/compute/texture performance of 2.06x, and ROP throughput has increased by 65%. Of the two cards, M5000 is going to the bigger upgrade versus its direct predecessor.
Meanwhile on the memory front, P5000 is equipped with 16GB of GDDR5X memory. This is attached to GP104’s 256-bit memory bus, and like P6000 is clocked at 9Gbps. P5000’s predecessor, M5000, maxed out at just 8GB of memory, so along with a 36% increase in memory bandwidth, this doubles the amount of memory available for a Quadro 5000 tier card.
Looking at the card design itself, to no surprise it strongly resembles the M5000, with its plastic blower dressed up in Quadro livery. The card’s TDP stands at 180W, which is a slight increase over M5000, but shouldn’t too significantly impact the drop-in replacement nature of the design.
Pascal Features & Availability
Along with the significant performance increase afforded by the Pascal architecture and TSMC’s 16nm FinFET manufacturing process, the other big news here is of course the functionality that comes to the Quadro P-series courtesy of Pascal. While for our regular readers there’s nothing new we haven’t seen already with GeForce, Pascal’s new functionality will apply a bit differently to the Quadro lineup.
Perhaps the biggest change here is Pascal’s new display controller. With both the P6000 and P5000 shipping with 4 DisplayPorts, the DisplayPort 1.4 capable controller means that both cards can now support higher resolutions and refresh rates. Whereas the M-series maxed out at 4 4K@60Hz monitors, the P-series can now handle 4 monitors running 5K@60Hz, 4 4K monitors running at 120Hz, or even 8K monitors with additional limitations. Do note however that the per-card monitor limit is still 4 displays, as this is as many displays as Pascal can support.
Speaking of multiple displays, alongside the Quadro card announcements NVIDIA is also announcing a new Quadro Sync card, the aptly named Quadro Sync 2. The multi-adapter/multi-display timing synchronization card is being updated to support the Pascal cards, and will support a larger number of adapters as well. The new Sync 2 will support 8 cards in sync, as opposed to 4 on the original Sync card. Coupled with the 4 display per card capability of Pascal, and this means synchronized video walls and other systems can now be built out to 32 displays.
NVIDIA will also be heavily promoting Simultaneous Multi-Projection (SMP), the company’s multi-viewport technology. Like the consumer cards, VR is a big driver here, with NVIDIA looking reach out to VR developers. NVIDIA is also pitching this at VR CAVE systems, as they can see similar benefits from SMP’s geometry reprojection.
Taking a look at the overall Quadro lineup, the P6000 and P5000 will at least for the time being be sitting alongside the existing M4000 and lower cards. Within the Quadro lineup these cards are meant for the most demanding workloads– massive memory sets and complex rendering/compute tasks – and they will be priced accordingly. Specific pricing has not been announced, but NVIDIA tells us to expect them to be priced similarly to the last generation cards. This would work out to $5000+ for Quadro P6000, and $2000+ for Quadro P5000 at launch.
Finally, as we mentioned before NVIDIA was announcing these cards early, before the final clockspeeds have been locked down. This means that while the cards are being announced today, they won’t launch for another two months; NVIDIA expects them to be available in early October. It’s not unusual for Quadro cards to be announced ahead of time, though as SIGGRAPH is also a popular venue for AMD pro card announcements, the earlier than usual announcement may have been for multiple reasons.
Ecosystem Announcements: New SDKs, Iray VR, & OptiX 4
Along with the announcement of the Quadro P-series, NVIDIA is also using SIGGRAPH to announce updates to various software and ecosystem initiatives within the company. Overall a number of the company’s SDKs are receiving an update in some form, ranging from rendering to video encode and capture, the latter taking advantage of Pascal’s 8K encode/decode capabilities.
Of particular note here, NVIDIA’s Iray physically based render plugin for 3D modeling applications is getting a significant update. As with other parts of their ecosystem, NVIDIA is doubling down on VR here as well. The next update to Iray will include support for generating panoramic VR lightfields – think high detail fixed position 3D panoramas – which can then be displayed on other devices. NVIDIA has been showing off an early version of this technology at GTC 2016 and other events, where it was used to show off renders of the company’s under-construction headquarters.
The Iray update will also be part of a larger focus on integrating the company’s software with their DGX-1 server, which incorporates 8 Tesla P100 accelerators. Iray will be coming to DGX-1 this fall, supporting the same features that are already available in multi-GPU setups with the older Quadro VCA. Longer term, in 2017, the company will be adding NVLink support for better multi-GPU scaling.
NVIDIA’s OptiX ray tracing engine is the other product that’s getting a DGX-1 update. OptiX 4.0, which is being released this week, adds support for the DGX-1, including NVLink support. It is interesting to note though that the company is only supporting clusters of 4 GPUs, despite the fact that DXG-1 has 8 GPUs (the other 4 GPUs form a second cluster). This may mean that OptiX needs direct GPU links to perform best – as in an 8-way configuration, some GPUs are 2 hops away – or it may just be that OptiX naturally doesn’t scale well beyond 4 GPUs.
Finally, NVIDIA is also announcing a change to how mental ray support is handled for Maya. Previous, integrating the ray tracer with Maya was handled by Autodesk, but NVIDIA is currently in the process of taking that over. The goal of doing so is to allow mental ray to be updated and have features added at the more brisk pace that NVIDIA tends to work at. The new plugin is currently scheduled to ship in September, and as one of their first actions, NVIDIA will be integrating a new global illumination engine, GI-Next.
Post Your CommentPlease log in or sign up to comment.
View All Comments
Freakie - Monday, July 25, 2016 - linkNeutered FP16 really stinks. It was wishful thinking to be able to use the new Titan X for neural networking but I was at least hoping that Quadro would do it. Guess I won't be buying anything Nvidia just for the neural networking boost. The P100 is just way too pricey, and I think Nvidia is really shooting themselves in the foot for getting their products to be the de facto neural network accelerators. Good news for other vendors selling more capable hardware, though!
Yojimbo - Monday, July 25, 2016 - linkWhat other vendors have anything more capable than running the code using FP32 on Pascal?
Gondalf - Tuesday, July 26, 2016 - linkToo bad FP64 performance is 1/32 of this.
This quadro line is not competitive at all with Xeon Phi in a lot of applications. Reason why Intel has orders for 100K Phi this year.
extide - Tuesday, July 26, 2016 - linkNeural networking is largely FP16, not FP32. With P100, you get a 2x perf boost going from FP32 to FP16. On Quadro, and Titan X, there is nor performance boost. That's what he is talking about. Not FP32, not FP64, guys.
I thin Polaris also has a 2x speedup going from FP32 to FP16 -- and AMD tends to be much more generous in leaving the enhanced compute abilities intact on consumer SKU's, so maybe have a look over there.
jabbadap - Monday, July 25, 2016 - linkRyan, they did mention fp32 flops:
P6000 12TFlops amd P5000 8.9TFlops.
That would make clocks(boost clocks maybe):
P6000 12000/(2*3840) ~ 1.56GHz and P5000 8900/(2*2560) ~ 1.738 GHz
Ryan Smith - Monday, July 25, 2016 - linkInteresting. Thanks. That information was not available at my briefing last week.
jabbadap - Monday, July 25, 2016 - linkWell there's a bit typo on the article too, you are starting to speak m5000 after P6000, while obviously you mean P5000. But overall great read, thanks!
eddman - Monday, July 25, 2016 - linkNow I'm wondering what 1080 Ti is going to be (if we get it this gen to begin with. NVidia might delay it for the eventual 20x0 series, depending on when Vega shows up).
Given the past relations between quadro, tesla, titan and geforce, it's possible that it'd be a 3584 core chip but with higher clocks than titan. Later on, we might see a new titan black or X black, etc.
prashontech - Monday, July 25, 2016 - linkWell, the next gen would be the 11x0 series, not 20x0
eddman - Monday, July 25, 2016 - linkIt's not important. Just a number. Besides, a lot of people thought this gen is going to be 1x00, but nvidia trolled everyone.