AMD Releases Instinct MI210 Accelerator: CDNA 2 On a PCIe Card
by Ryan Smith on March 22, 2022 9:00 AM EST
Posted in: GPUs, AMD, HPC, AMD Instinct, Infinity Fabric, CDNA 2
With both GDC and GTC going on this week, this is a big time for GPUs of all sorts. And today, AMD wants to get in on the game as well, with the release of the PCIe version of their MI200 accelerator family, the MI210.
First unveiled alongside the MI250 and MI250X back in November, when AMD initially launched the Instinct MI200 family, the MI210 is the third and final member of AMD’s latest generation of GPU-based accelerators. Bringing the CDNA 2 architecture to a PCIe card, the MI210 is aimed at customers who are after the MI200 family’s HPC and machine learning performance, but need it in a standardized form factor for mainstream servers. Overall, the MI210 is being launched widely today as part of AMD moving the entire MI200 product stack to general availability for OEM customers.
| AMD Instinct Accelerators | MI250 | MI210 | MI100 | MI50 |
|---|---|---|---|---|
| Compute Units | 2 x 104 | 104 | 120 | 60 |
| Matrix Cores | 2 x 416 | 416 | 480 | N/A |
| Boost Clock | 1700MHz | 1700MHz | 1502MHz | 1725MHz |
| FP64 Vector | 45.3 TFLOPS | 22.6 TFLOPS | 11.5 TFLOPS | 6.6 TFLOPS |
| FP32 Vector | 45.3 TFLOPS | 22.6 TFLOPS | 23.1 TFLOPS | 13.3 TFLOPS |
| FP64 Matrix | 90.5 TFLOPS | 45.3 TFLOPS | 11.5 TFLOPS | 6.6 TFLOPS |
| FP32 Matrix | 90.5 TFLOPS | 45.3 TFLOPS | 46.1 TFLOPS | 13.3 TFLOPS |
| FP16 Matrix | 362.1 TFLOPS | 181 TFLOPS | 184.6 TFLOPS | 26.5 TFLOPS |
| INT8 Matrix | 362.1 TOPS | 181 TOPS | 184.6 TOPS | N/A |
| Memory Clock | 3.2 Gbps HBM2E | 3.2 Gbps HBM2E | 2.4 Gbps HBM2 | 2.0 Gbps HBM2 |
| Memory Bus Width | 8192-bit | 4096-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1.02 TB/s |
| VRAM | 128GB | 64GB | 32GB | 16GB |
| ECC | Yes (Full) | Yes (Full) | Yes (Full) | Yes (Full) |
| Infinity Fabric Links | 6 | 3 | 3 | N/A |
| CPU Coherency | No | N/A | N/A | N/A |
| TDP | 560W | 300W | 300W | 300W |
| Manufacturing Process | TSMC N6 | TSMC N6 | TSMC 7nm | TSMC 7nm |
| Transistor Count | 2 x 29.1B | 29.1B | 25.6B | 13.2B |
| Architecture | CDNA 2 | CDNA 2 | CDNA (1) | Vega |
| GPU | 2 x CDNA 2 GCD "Aldebaran" | CDNA 2 GCD "Aldebaran" | CDNA 1 "Arcturus" | Vega 20 |
| Form Factor | OAM | PCIe (4.0) | PCIe (4.0) | PCIe (4.0) |
| Launch Date | 11/2021 | 03/2022 | 11/2020 | 11/2018 |
Starting with a look at the top-line specifications, the MI210 is an interesting variant of the existing MI250 accelerators. Whereas the MI250 and MI250X are based on a pair of Aldebaran (CDNA 2) dies in an MCM configuration on a single package, for the MI210 AMD is paring everything back to a single die and its related hardware. With the MI250(X) requiring 560W in the OAM form factor, AMD essentially needed to halve the hardware anyhow to get things down to 300W for a PCIe card, and they’ve done so by ditching the second on-package die.
The net result is that the MI210 is essentially half of an MI250, both in terms of physical hardware and expected performance. The CDNA 2 Graphics Compute Die (GCD) features the same 104 enabled CUs as on the MI250, with the chip running at the same peak clockspeed of 1.7GHz. So workload scalability aside, the performance of the MI210 is for all practical purposes half that of an MI250.
That halving goes for memory as well. Whereas the MI250 paired 64GB of HBM2E memory with each GCD – for a total of 128GB of memory – the MI210 brings that down to 64GB for its single GCD. AMD is using the same 3.2Gbps HBM2E memory here, so the overall memory bandwidth for the chip is 1.6 TB/second.
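As a quick sanity check on that figure, peak bandwidth follows directly from the per-pin data rate and the bus width. The short sketch below is our own back-of-the-envelope arithmetic, not an AMD-provided formula.

```python
# Rough sanity check of MI210's memory bandwidth (our own arithmetic, not AMD's figures).
pin_rate_gbps = 3.2     # HBM2E per-pin data rate, in Gbps
bus_width_bits = 4096   # memory bus width of the single GCD

bandwidth_gb_s = pin_rate_gbps * bus_width_bits / 8  # bits -> bytes
print(f"~{bandwidth_gb_s / 1000:.1f} TB/s")          # ~1.6 TB/s, matching AMD's spec
```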
In regards to performance, the use of a single Aldebaran die does make for some odd comparisons to AMD’s previous-generation PCIe card, the Radeon Instinct MI100. While the MI210 is clocked higher, its slightly reduced CU count relative to the MI100 means that for some workloads, the older accelerator is, at least on paper, a bit faster. In practice, the MI210 has more memory and more memory bandwidth, so it should still hold the performance edge in the real world, but it’s going to be close. In workloads that can’t take advantage of CDNA 2’s architectural improvements, the MI210 is not going to be a step up from the MI100.
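To put rough numbers behind both the halving and the MI100 comparison, peak vector throughput is just CUs x FLOPS-per-clock-per-CU x clockspeed. The sketch below is our own arithmetic using the per-CU rates from the table further down; treat it as an illustration rather than a benchmark.

```python
# Peak FP32 vector throughput estimates (our own arithmetic from published CU counts and clocks).
def vector_tflops(cus, flops_per_clock_per_cu, clock_ghz):
    return cus * flops_per_clock_per_cu * clock_ghz / 1000

print(vector_tflops(2 * 104, 128, 1.700))  # MI250: ~45.3 TFLOPS
print(vector_tflops(104, 128, 1.700))      # MI210: ~22.6 TFLOPS, half of MI250
print(vector_tflops(120, 128, 1.502))      # MI100: ~23.1 TFLOPS, slightly ahead on paper
```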
All of this underscores the overall similarity between the CDNA (1) and CDNA 2 architectures, and how developers need to make use of CDNA 2’s new features to get the most out of the hardware. Where CDNA 2 shines in comparison to CDNA (1) is with FP64 vector workloads, FP64 matrix workloads, and packed FP32 vector workloads. All three use cases benefit from AMD doubling the width of their ALUs to a full 64-bits wide, allowing FP64 operations to be processed at full speed. Meanwhile, when FP32 operations are packed together to completely fill the wider ALU, then they too can benefit from the new ALUs.
But, as we noted in our initial MI250 discussion, like all packed instruction formats, packed FP32 isn’t free. Developers and libraries need to be coded to take advantage of it; packed operands need to be adjacent and aligned to even registers. For software being written specifically for the architecture (e.g. Frontier), this is easy enough to do, but more portable software will need to be updated to take this into account. And it’s for that reason that AMD wisely still advertises its FP32 vector performance at the unpacked rate (22.6 TFLOPS), rather than assuming the use of packed instructions.
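For illustration, here is how the packed rate compares to the advertised figure, using the same back-of-the-envelope method. This assumes a best-case workload where every FP32 operation can be packed, which real code rarely achieves.

```python
# Unpacked vs. packed FP32 vector throughput on MI210 (our own best-case arithmetic).
cus, clock_ghz = 104, 1.700
unpacked_tflops = cus * 128 * clock_ghz / 1000  # ~22.6 TFLOPS, AMD's advertised rate
packed_tflops   = cus * 256 * clock_ghz / 1000  # ~45.3 TFLOPS, only with adjacent, aligned operands
print(unpacked_tflops, packed_tflops)
```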
The launch of the MI210 also marks the introduction of AMD’s improved matrix cores into a PCIe card. For CDNA 2, these have been expanded to allow full-speed FP64 matrix operations, bringing them up to the same 256 FLOPS/clock/CU rate as FP32 matrix operations, a 4x improvement over CDNA (1)’s 64 FLOPS/clock/CU rate.
| AMD GPU Throughput Rates (FLOPS/clock/CU) | CDNA 2 | CDNA (1) | Vega 20 |
|---|---|---|---|
| FP64 Vector | 128 | 64 | 64 |
| FP32 Vector | 128 | 128 | 128 |
| Packed FP32 Vector | 256 | N/A | N/A |
| FP64 Matrix | 256 | 64 | 64 |
| FP32 Matrix | 256 | 256 | 128 |
| FP16 Matrix | 1024 | 1024 | 256 |
| BF16 Matrix | 1024 | 512 | N/A |
| INT8 Matrix | 1024 | 1024 | N/A |
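The per-CU rates above map directly onto the TFLOPS figures in the spec table at the top of this article. As a rough worked example (our own arithmetic, not vendor code), the sketch below derives the MI210’s matrix throughput from them.

```python
# Deriving MI210 matrix throughput from the per-CU rates table (our own arithmetic).
cus, clock_ghz = 104, 1.700
for precision, rate in (("FP64 matrix", 256), ("FP32 matrix", 256), ("FP16 matrix", 1024)):
    tflops = cus * rate * clock_ghz / 1000
    print(f"{precision}: ~{tflops:.0f} TFLOPS")  # ~45, ~45, and ~181 TFLOPS respectively
```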
Moving on, the PCIe format MI210 also gets a trio of Infinity Fabric 3.0 links along the top of the card, just like the MI100. This allows an MI210 card to be linked up with one or three other cards, forming a 2 or 4-way cluster of cards. Meanwhile, backhaul to the CPU or any other PCIe devices is provided via a PCIe 4.0 x16 connection, which is being powered by one of the flexible IF links from the GCD.
As previously mentioned, the TDP for the MI210 is set at 300W, the same level as the MI100 and MI50 before it – and essentially the limit for a PCIe server card. Like most server accelerators, this is a fully passive, dual-slot card design, relying on significant airflow from the server chassis to keep things cool. The card itself is powered by a combination of the PCIe slot and an 8-pin EPS12V connector at the rear of the card.
Otherwise, despite the change in form factors, AMD is going after much the same market with the MI210 as they have with the MI250(X): HPC users who specifically need a fast FP64 accelerator. Thanks to its heritage as a chip designed first and foremost for supercomputers (i.e. Frontier), the MI200 family currently stands alone in its FP64 vector and FP64 matrix performance, as rival GPUs have focused instead on improving performance at the lower precisions used in most industry/non-scientific workloads. Though even at lower precisions, the MI200 family is nothing to sneeze at, with its 1024 FLOPS/clock/CU rate on FP16 and BF16 matrix operations.
Wrapping things up, MI210 is slated to become available today from AMD’s usual server partners, including ASUS, Dell, Supermicro, HPE, and Lenovo. Those vendors are now also offering servers based on AMD’s MI250(X) accelerators, so AMD’s more mainstream customers will have access to systems based on AMD’s full lineup of MI200 accelerators.
39 Comments
Khanan - Thursday, March 24, 2022 - link
“Don't agree. Maybe not fp64, but fp32 and lower-precision have applicability to games. At the lowest-precision, they can use it for AI-driven upscaling, like Nvidia's DLSS.”

“They have a long track record with iGPUs. At my job, we use their iGPUs in a shipping product.”
So you’re talking about track records and iGPUs and comparing dGPUs to iGPUs of a vendor that never delivered any good dGPUs. I think it’s safe to say that you’re a) not a dev b) trolling and c) never seriously used a AMD Pro card. Last time I checked drivers were pretty good, nobody cares about ancient track records from yesteryear. And hearsay isn’t relevant either. You used the words “track record” instead of “my experience with”, so maybe stop talking out of your ass for a second. Intel is a clusterfuck when it comes to GPUs, a meme at best and a disaster otherwise. Nvidia is good but also very expensive and locks you down into their stuff with CUDA and software limitations. AMD has great open source drivers for Linux, Nvidia isn’t even comparable. You’d know that if you were a serious or real dev.
Maybe could’ve been used for something else, but not for FSR 2.0 as AMD aims for compatibility and it won’t use any special cores, probably the same as with FSR 1.0 and if you ask me that’s the way to go, not Nvidias.
“How many developers have that kind of money? This is why Nvidia is winning the the GPU-compute race. Because they support development on their *entire* stack of GPUs, even down to the near-bottom tier of gaming GPUs and the Jetson development boards.”
A coincidence which happened because Nvidia needs tensor cores to do DLSS and RT denoising. A coincidence and intentional proprietarity locking you artificially into stuff.
I don’t think you need tensor cores to do development for anything, not anything a Radeon Pro can’t do as well. And then we have to wait and see if RDNA3 isn’t coming with some sort of AI cores as well, but at this point I guess it’s unlikely.
mode_13h - Thursday, March 24, 2022 - link
> So you’re talking about track records and iGPUs and comparing dGPUs
> to iGPUs of a vendor that never delivered any good dGPUs.
Yes, because they're far more similar than they are different. 95% of the software stack needed to do one is also needed for the other.
If you paid attention to my concerns, I'm principally interested in software issues. I don't need Intel's dGPUs to be the best performance, as long as the software support is there for what I/we need and the perf/$ and perf/W is reasonably competitive.
> I think it’s safe to say that you’re a) not a dev b) trolling
This is precisely a troll comment, which is why I'm not going to address it or any similar attempts to impeach my credentials. You're free to disregard my statements and opinions, however I owe you nothing.
> c) never seriously used a AMD Pro card.
This exactly misses my point. Nvidia and Intel fully support development on virtually their entire hardware stack. Why should I have to pay $$$ for an AMD Pro card? If AMD wants developer mindshare, they need to reach developers where they *are*, not blame developers for not beating a path to their door.
> Last time I checked drivers were pretty good,
Where's ROCm's RDNA support?
> nobody cares about ancient track records from yesteryear.
Yes they do. AMD loses hearts and minds when people buy RDNA cards and wait *years* for ROCm support never to materialize. Or when it *breaks* what had been working on earlier generations, like Vega and Polaris.
You've clearly never read the endless threads of people trying to get/keep their AMD GPUs working on DaVinci Resolve, for instance. Many are continuing to run heavily-outdated images, for (very real) fears of breaking their setup.
I know that's slightly off-topic, but not really, since I'm talking about lack of stability in AMD's hardware support of their GPU-compute stack.
> And hearsay isn’t relevant either. You used the words “track record”
> instead of “my experience with”
If you don't value my perspective, that's not *my* problem.
> Intel is a clusterfuck when it comes to GPUs
It's funny how you say this, right after attacking *me* for using hearsay and ancient track records.
The fact of the matter is that Intel has been among the first to support each OpenCL release since about version 2.0. AMD seemed to stall out around 2.1. It was a nearly 3 years late on OpenCL 2.2 support. After several minutes of searching, I haven't even found any clear evidence that AMD supports OpenCL 3.0, yet both Intel and (even!) Nvidia do.
> AMD has great open source drivers for Linux
So does Intel.
> You’d know that if you were a serious or real dev.
How do you know I don't? You didn't ask.
Open source drivers are definitely nice. They're not a deal-breaker for me. What I care about most is:
1. Hardware support - the compute stack should work on everything from whatever GCN-based APUs people have to the latest gaming GPUs. Both for developer access, and also so that developers have some assurance they'll be able to support customers with this hardware.
2. API support - I have no more interest in using Hip than I do in using CUDA. I only trust Hip to be any good or stable on AMD hardware, which means it effectively locks me into AMD, even if it's open source. I'm willing to use open source libraries, like deep learning frameworks, that have a Hip or CUDA backend, however. But I will not personally or professionally invest in developing for a vendor-specific API.
3. Platform support. Right now, all I care about is Linux/x86. I think AMD hasn't required kernel patches to use new ROCm releases in a couple years, which is definitely progress.
> I don’t think you need tensor cores to do development for anything
That's not my point. My point is that if AMD wants broader support for hardware features like their Matrix Cores and other new CDNA features, they should worry about getting these features into the hands of more developers via gaming cards.
With Nvidia GPUs recently being so ridiculously expensive and hard to find, AMD has been squandering a huge opportunity by their poor/absent ROCm support & missing hardware features on RDNA GPUs.
> not anything a Radeon Pro can’t do as well.
There's nothing magic about Radeon Pro. You know that, right? They're just the same as AMD's gaming GPUs, with maybe a few extra features enabled, maybe some more RAM, and costing lots more $$$.
> we have to wait and see if RDNA3 isn’t coming with some sort of AI cores as well
Yup. I'm willing to keep an open mind. AMD has finally started coming around with ROCm support of RDNA2, so it's still possible they'll turn over a new leaf.
Khanan - Friday, March 25, 2022 - link
Here just a example of you trolling: https://en.m.wikipedia.org/wiki/ROCm
It is clearly stated that all RDNA and GCN 4 and 5 are supported, pro card or not. And of course support is better for Pro cards, that’s why you buy Pro cards, to get more support. Not any different with Nvidia. Is CUDA better supported than ROCm, yes, but ROCm did a lot of progression and gets better every day. Is Intel even comparable to AMD or Nvidia? No. They simply don’t have any noteworthy GPUs and thus cannot even be compared to those. Intel’s new dGPUs are delayed since 4 months almost, do you know why? Because they drivers suck. It’s well known the products got delayed because of that. They will release their shit so late, that it will compete with current gen AND RDNA 3 and ADL, nobody will buy it. Then later this year their CPUs will get destroyed by Zen 4. Hard days for Intel fanboys.
mode_13h - Friday, March 25, 2022 - link
> It is clearly stated that all RDNA and GCN 4 and 5 are supported, pro card or not.

This proves you don't actually know anything about ROCm, because that's an utter fiction.
Their documentation is about as clear as mud about which devices are actually known to work, and the recent history of ROCm has been a litany of breakage and dropping support for even some recently-sold GPUs.
Even today, this is still a live issue:
https://github.com/RadeonOpenCompute/ROCm/issues/1...
As of now, AMD cannot even answer a simple question about this. In the docs, the person who filed that issue found only a list of 7 Instinct and Pro models.
> ROCm did a lot of progression and gets better every day.
According to whom? And how do you expect me to believe this statement, given you obviously know so little about ROCm that you cite Wikipedia over anything on amd.com or the ROCm github.
> Is Intel even comparable to AMD or Nvidia? No. They simply don’t
> have any noteworthy GPUs and thus cannot even be compared to those.
Spoken like a true gamer.
Khanan - Friday, March 25, 2022 - link
And I wanna add, you made a few fair points, but your shilling for Intel and too many negative comments about AMD which are simply outdated blabber, leave a bad taste.

Matrix cores won’t come to consumer, AMD isn’t into locking consumers into just one gen of product, thus they aren’t needed for FSR 2.0 and 2.0 will run on any hardware that also supports 1.0 (most probably).
mode_13h - Friday, March 25, 2022 - link
> your shilling for Intel

How is anything I said "shilling for Intel"? The only good thing I said about them was that their compute stack support is better, which I supported with specific points about OpenCL support compared with AMD. Oh, and I pointed to *actual benchmark results* to correct Spunjji, who usually knows their stuff.
If I were trying to shill for Intel, don't you think I'd be a bit more effusive? No, you apparently don't seem to think beyond seeing anything you don't like about AMD.
At first, I thought maybe you were an AMD employee, but I actually know AMD employees and I'm getting the clear sense that you wouldn't meet their standards.
Khanan - Sunday, March 27, 2022 - link
Yep maybe I’m wrong about ROCm, maybe not. But at least I can admit it, while 3 people call you biased and you’re still deflecting like a kid. And then I didn’t compare real GPUs that are released for years to some iGPU trash and unreleased stuff from Intel, which really just shows that you’re a huge fanboy of Intel. I really don’t care about you being critical about AMD, or let’s say I barely care. But what I can’t accept is you praising Intel for unreleased stuff or trashy iGPUs at the same time. That’s utterly dumb and unacceptable. We will see how good their GPUs will be, for now they postponed the release for over 2 years because of a mix of terrible performance and terrible drivers. Not much to see so far and nothing that would confirm your points. Some people are just hard fanboys and can’t admit it.

mode_13h - Monday, March 28, 2022 - link
> maybe I’m wrong about ROCm, maybe not. But at least I can admit it

"maybe not" doesn't sound like an admission. Better yet, don't take a strong position on something you don't know anything about, and then you won't be in a position where you need to climb down.
> while 3 people call you biased
Which 3? All I see are two gamers and @Spunjji who made a generally correct, if potentially anachronistic statement. Spunjji is more than capable of taking me on, if they want. The mere fact that all they did was to quip about Intel iGPU performance suggests nothing about my core points.
What you're missing is that a mountain of bad counterpoints doesn't add up to a few good ones. Your argument is only as strong as your strongest point, and I haven't seen you make a strong refutation of any of my core claims.
> you’re still deflecting like a kid.
Don't blame me for your own trolling fails.
> And then I didn’t compare real GPUs that are released for years to some iGPU trash
That shows a lack of understanding on your part, that the software stack for these devices is the same mostly irrespective of whether they're iGPUs or not. You're so caught up in hardware that you can't even see my whole point centers around software.
> you’re a huge fanboy of Intel.
I like whatever actually works for me, and so far Intel has a better track record. Did I mention that I use them in shipping product, earning $Millions of annual revenue with thousands of customers having multiple systems and support contracts which pick up the phone or email whenever anything goes wrong? And if we can't solve the problem remotely, we have to send a very highly-qualified and highly-paid support tech to fix it on site. That's dependability.
Whatever state their gaming drivers might be in, their compute stack is working well for us. And that builds the kind of confidence I'm talking about.
So, call me what you want, but the zero other people reading this thread will see that I never made any effusive claims about Intel. I made a few factual points and that's it. You'd think a real fan would be rather more emphatic.
> But what I can’t accept is you praising Intel for unreleased stuff
I didn't. I just said I expected it would probably be my next dGPU. And I further clarified that by saying I intended to wait at least for initial impressions. But, I actually tend not to be on the bleeding edge. So, it might take months before I finally decide to buy anything. It just depends on many things.
As you'll probably know, Intel is set to launch the first DG2 products later this week. I suggest you keep your powder dry, because then you'll have some much more substantial threads to tear into.
> nothing that would confirm your points.
Which points did I even make? I linked to some early benchmarks of Tiger Lake G7. That's it.
I even said that all I needed from Intel's GPUs was to be merely competitive on perf/$ and perf/W. If they can get in the ballpark, then the GPU compute software stack is what's much more important to me.
> Some people are just hard fanboys and can’t admit it.
Agreed. Maybe even some people in this thread!
Khanan - Monday, March 28, 2022 - link
Where exactly do you “compute” anything with a iGPU? Please. That’s just ridiculous. We will see how their dGPUs fare, iGPUs don’t prove much and I don’t agree with your points “it’s the same software”, hahahaha, not based on my vast experience.

And then again you keep adapting your opinions and pretending you had this opinion from the get go, you’re a smart little weasel. Too bad it won’t work with me.
mode_13h - Monday, March 28, 2022 - link
> Where exactly do you “compute” anything with a iGPU? Please. That’s just ridiculous.

They have more raw compute performance than the CPU cores, and leave the CPU cores free to do other things. The performance isn't great, but it met our needs.
Interesting fact: Intel's older Iris Pro iGPUs actually had more fp64 performance than Nvidia or AMD's gaming cards of that era. That's because they cut it back by only 1/2 of fp32 vs. Nvidia and AMD cutting it to 1/32 or 1/16.
> then again you keep adapting your opinions
When did I change any opinion I voiced?
> Too bad it won’t work with me.
Of course. Perhaps you're merely posing as a pro-AMD troll, but actually you're from Nvidia and just trying to give the AMD camp a bad image. If you gave up after the exchange had reached a reasonable outcome, that'd look far too decent.