NVIDIA Details DRIVE AGX Orin: A Herculean Arm Automotive SoC For 2022
by Ryan Smith on December 18, 2019 8:30 AM EST

While NVIDIA’s SoC efforts haven’t gone entirely to plan since the company first started on them over a decade ago, NVIDIA has been able to find a niche that works in the automotive field. Backing the company’s powerful DRIVE hardware, these SoCs have become increasingly specialized as the DRIVE platform itself evolves to meet the needs of the slowly maturing market for the brains behind self-driving cars. And now, NVIDIA’s family of automotive SoCs is growing once again, with the formal unveiling of the Orin SoC.
First outlined as part of NVIDIA’s DRIVE roadmap at GTC 2018, the chip was properly introduced this morning by NVIDIA CEO Jensen Huang on stage at GTC China, where it was positioned as the heart of the next generation of the DRIVE platform. Officially dubbed the NVIDIA DRIVE AGX Orin, the new chip will eventually succeed NVIDIA’s currently shipping Xavier SoC, which has been available for about a year now. In fact, as has been the case with previous NVIDIA DRIVE unveils, NVIDIA is announcing the chip well in advance: the company isn't expecting the chip to be fully ready for automakers until 2022.
What lies beneath Orin, then, is a lot of hardware, with NVIDIA going into some high-level details on certain parts while skimming over others. Overall, Orin is a 17 billion transistor chip, almost double the transistor count of Xavier and continuing the trend of very large, very powerful automotive SoCs. NVIDIA is not disclosing the manufacturing process being used at this time, but given the timeframe, some sort of 7nm or 5nm process (or a derivative thereof) is pretty much a given. And NVIDIA will definitely need a smaller manufacturing process – for comparison, the company’s top-end Turing GPU, TU102, takes up 754mm² for 18.6B transistors, so Orin will pack in almost as many transistors as one of NVIDIA’s best GPUs today.
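To put that transistor count into rough die-size terms, a back-of-envelope estimate is possible; the 7nm-class node and its roughly 2x density gain over TSMC's 12nm are our assumptions here, not NVIDIA disclosures:

```python
# Back-of-envelope die size estimate, using TU102 as the known reference point.
tu102_transistors = 18.6e9      # NVIDIA TU102, TSMC 12nm FFN
tu102_area_mm2 = 754
tu102_density = tu102_transistors / tu102_area_mm2   # ~24.7 MTr/mm^2

orin_transistors = 17e9
assumed_density_gain = 2.0      # hypothetical 12nm -> 7nm-class scaling; not confirmed
orin_area_mm2 = orin_transistors / (tu102_density * assumed_density_gain)
print(f"~{orin_area_mm2:.0f} mm^2")  # ~345 mm^2 -- large, but well short of TU102
```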
NVIDIA ARM SoC Specification Comparison

| | Orin | Xavier | Parker |
| --- | --- | --- | --- |
| CPU Cores | 12x Arm "Hercules" | 8x NVIDIA Custom Arm "Carmel" | 2x NVIDIA Denver + 4x Arm Cortex-A57 |
| GPU Cores | "Next-Generation" NVIDIA iGPU | Xavier Volta iGPU (512 CUDA Cores) | Parker Pascal iGPU (256 CUDA Cores) |
| INT8 DL TOPS | 200 TOPS | 30 TOPS | N/A |
| FP32 TFLOPS | ? | 1.3 TFLOPS | 0.7 TFLOPS |
| Manufacturing Process | 7nm? | TSMC 12nm FFN | TSMC 16nm FinFET |
| TDP | ~65-70W? | 30W | 15W |
Those transistors, in turn, will be driving several elements. Surprisingly, for today’s announcement NVIDIA has confirmed what CPU core they’ll be using. And even more surprisingly, it isn’t theirs. After flirting with both Arm and NVIDIA-designed CPU cores for several years now, NVIDIA has seemingly settled down with Arm. Orin will integrate a dozen of Arm’s upcoming Hercules CPU cores, which come from Arm’s client device line. Hercules, in turn, succeeds today’s Cortex-A77, with customers recently receiving the first IP for the core. For the moment we have very little information on Hercules itself, but Arm has previously disclosed that it will be a further refinement of the A76/A77 cores.
I won’t spend too much time dwelling on NVIDIA’s decision to go with Arm’s Cortex-A cores after using their own CPU cores for their last couple of SoCs, but it’s consistent with the direction we’ve seen most of Arm’s other high-end customers take. Developing a fast, high-performance CPU core only gets harder every generation. And with Arm taking a serious stab at the subject, there’s a lot of sense in backing Arm’s efforts by licensing their cores as opposed to investing even more money in further improving NVIDIA’s Project Denver-based designs. It does remove one area where NVIDIA could make a unique offering, but on the flip side, it means they can focus more on their GPU and accelerator efforts.
Speaking of GPUs, Jensen revealed very little about the GPU technology that Orin will integrate. Besides confirming that it’s a “next generation” architecture that offers all of the CUDA core and tensor core functionality that NVIDIA has become known for, nothing else was stated. This isn’t wholly surprising since NVIDIA hasn’t disclosed anything about their forthcoming GPU architectures – we haven’t seen a roadmap there in a while – but it means the GPU side is a bit of a blank slate. Given the large gap between now and Orin’s launch, it’s not even clear if the architecture will be NVIDIA’s next immediate GPU architecture or the one after that. However, given how Xavier’s development went and the extensive validation required for automotive, NVIDIA’s 2020(ish) GPU architecture seems like a safe bet.
Meanwhile NVIDIA’s Deep Learning Accelerator (DLA) blocks will also be making a return. These blocks don’t get too much attention since they’re unique to NVIDIA’s DRIVE SoCs, but these are hardware blocks to further offload neural network inference, above and beyond what NVIDIA’s tensor cores already do. On the programmable/fixed-function scale they’re closer to the latter, with the task-specific hardware being a good fit for the power and energy-efficiency needs NVIDIA is shooting for.
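NVIDIA hasn't said how Orin's DLAs will be programmed, but as a point of reference, here's a minimal sketch of how today's Xavier DLAs are targeted through the TensorRT Python API; treat the exact flags as illustrative of the current Xavier workflow rather than a statement about Orin:

```python
import tensorrt as trt  # NVIDIA's inference runtime, shipped with the DRIVE/Jetson stacks

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()  # in practice, populated by parsing an ONNX model
config = builder.create_builder_config()

# Steer eligible layers onto a DLA engine rather than the GPU's tensor cores.
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0  # Xavier exposes two DLA cores, indexed 0 and 1

# The DLA runs at reduced precision; layers it can't handle fall back to the GPU.
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

# engine = builder.build_engine(network, config)  # would fail here: the network is empty
```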
All told, NVIDIA expects Orin to deliver 7x the INT8 performance of Xavier and its 30 TOPS, with the combination of the GPU and the DLAs pushing 200 TOPS. It goes without saying that NVIDIA is still heavily invested in neural networks as the solution to self-driving systems, so they are similarly heavily investing in hardware to execute those neural nets.
Rounding out the Orin package, NVIDIA’s announcement also confirms that the chip will offer plenty of hardware for supporting features. The chip will offer 4x 10 Gigabit Ethernet hosts for sensors and in-vehicle communication, and while the company hasn’t disclosed how many camera inputs the SoC can field, it will offer 4Kp60 video stream encoding and 8Kp30 decoding for H.264/HEVC/VP9. The company has also set a goal of 200GB/sec of memory bandwidth. Given the timeframe for Orin and what NVIDIA does with Xavier today, a 256-bit memory bus with LPDDR5 support sounds like a shoo-in, but of course this remains to be confirmed.
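For what it's worth, the math behind that guess lines up almost exactly with NVIDIA's stated goal. A quick sketch, where the bus width and LPDDR5 speed grade are our assumptions rather than anything NVIDIA has disclosed:

```python
# Hypothetical memory configuration -- NVIDIA has only stated the 200GB/sec goal.
bus_width_bits = 256        # assumed, in line with Xavier's class of design
data_rate_mt_s = 6400       # assumed top LPDDR5 speed grade (megatransfers/sec per pin)

bandwidth_gb_s = (bus_width_bits / 8) * data_rate_mt_s / 1000
print(f"~{bandwidth_gb_s:.1f} GB/sec")  # ~204.8 GB/sec, right at the 200GB/sec target
```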
Finally, while NVIDIA hasn’t disclosed any official figures for power consumption, it’s clear that overall power usage is going up relative to Xavier. While Orin is expected to be 7x faster than Xavier, NVIDIA is only claiming it’s 3x as power efficient. Assuming NVIDIA is basing all of this on INT8 TOPS as they usually do, then the 1 TOPS/Watt Xavier would be replaced by the 3 TOPS/Watt Orin, putting the 200 TOPS chip at around 65-70 Watts. Which is admittedly still fairly low for a single chip at a company that sells 400 Watt GPUs, but it could add up if NVIDIA builds another multi-processor board like the DRIVE Pegasus.
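To show our work on that estimate, here's the back-of-envelope arithmetic. The 30W figure is Xavier's official TDP; the per-chip wattage for Orin is our extrapolation, not an NVIDIA figure:

```python
xavier_tops = 30                     # Xavier's INT8 throughput (GPU + DLA)
xavier_tdp_w = 30                    # Xavier's TDP -> ~1 TOPS/Watt
orin_tops = 200                      # NVIDIA's stated target (~7x Xavier)

orin_efficiency = 3 * (xavier_tops / xavier_tdp_w)  # NVIDIA's "3x as power efficient" claim
orin_power_w = orin_tops / orin_efficiency
print(f"~{orin_power_w:.0f} W")      # ~67 W, hence our 65-70 Watt estimate
```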
Overall, NVIDIA certainly has some lofty expectations for Orin. Like Xavier before it, NVIDIA intends for various forms of Orin to power everything from level 2 autonomous cars right up to full self-driving level 5 systems. And, of course, it will do so while being able to provide the necessary ASIL-D level system integrity that will be expected for self-driving cars.
But as always, NVIDIA is far from the only silicon vendor with such lofty goals. The company will be competing with a number of other companies all providing their own silicon for self-driving cars – ranging from start-ups to the likes of Intel – and while Orin will be a big step forward in single-chip performance for the company, it’s still very much the early days for the market as a whole. So NVIDIA has their work cut out for them across hardware, software, and customer relations.
Source: NVIDIA
Fataliity - Monday, December 23, 2019

That's what the Arm core does: the redundancy. My point was that the T4 is a current-gen product that you can compare performance against, instead of using 2-3 generation old hardware for the comparison. And in Xavier, the DL TOPS of NVIDIA's GPU itself was only 20-21; they added 10 TOPS by counting the Arm core.

While the T4 is a current-gen part you can compare to a future gen and get an idea of the improvement their generation is offering.

Xavier was 30 TOPS, but it was 20 TOPS for the GPU (redundancy) and 10 TOPS for the Arm core.
Fataliity - Thursday, December 19, 2019

Where has Ian been for this stuff? He actually does research and makes sure what Nvidia or Intel or other companies provide in slides is actually relevant to what their current products are. It's easy to make slides look good when you're comparing to a 4-5 year old arch by now.

MASSAMKULABOX - Thursday, December 19, 2019

I thought of Nvidia driving my car, then next I thought of their gaming heritage, and then the place where many buy their games, the Steam platform. I then realized in the future I may have a steam-powered car, Chitty Chitty Bang Bang (2 large cores and two little cores).