AMD Releases Instinct MI210 Accelerator: CDNA 2 On a PCIe Card
by Ryan Smith on March 22, 2022 9:00 AM EST
Posted in: GPUs, AMD, HPC, AMD Instinct, Infinity Fabric, CDNA 2
With both GDC and GTC going on this week, this is a big time for GPUs of all sorts. And today, AMD wants to get in on the game as well, with the release of the PCIe version of their MI200 accelerator family, the MI210.
First unveiled alongside the MI250 and MI250X back in November, when AMD initially launched the Instinct MI200 family, the MI210 is the third and final member of AMD’s latest generation of GPU-based accelerators. Bringing the CDNA 2 architecture to a PCIe card, the MI210 is aimed at customers who are after the MI200 family’s HPC and machine learning performance, but need it in a standardized form factor for mainstream servers. Overall, the MI210 is being launched widely today as part of AMD moving the entire MI200 product stack to general availability for OEM customers.
AMD Instinct Accelerators

|  | MI250 | MI210 | MI100 | MI50 |
| --- | --- | --- | --- | --- |
| Compute Units | 2 x 104 | 104 | 120 | 60 |
| Matrix Cores | 2 x 416 | 416 | 480 | N/A |
| Boost Clock | 1700MHz | 1700MHz | 1502MHz | 1725MHz |
| FP64 Vector | 45.3 TFLOPS | 22.6 TFLOPS | 11.5 TFLOPS | 6.6 TFLOPS |
| FP32 Vector | 45.3 TFLOPS | 22.6 TFLOPS | 23.1 TFLOPS | 13.3 TFLOPS |
| FP64 Matrix | 90.5 TFLOPS | 45.3 TFLOPS | 11.5 TFLOPS | 6.6 TFLOPS |
| FP32 Matrix | 90.5 TFLOPS | 45.3 TFLOPS | 46.1 TFLOPS | 13.3 TFLOPS |
| FP16 Matrix | 362 TFLOPS | 181 TFLOPS | 184.6 TFLOPS | 26.5 TFLOPS |
| INT8 Matrix | 362.1 TOPS | 181 TOPS | 184.6 TOPS | N/A |
| Memory Clock | 3.2 Gbps HBM2E | 3.2 Gbps HBM2E | 2.4 Gbps HBM2 | 2.0 Gbps HBM2 |
| Memory Bus Width | 8192-bit | 4096-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 3.2 TBps | 1.6 TBps | 1.23 TBps | 1.02 TBps |
| VRAM | 128GB | 64GB | 32GB | 16GB |
| ECC | Yes (Full) | Yes (Full) | Yes (Full) | Yes (Full) |
| Infinity Fabric Links | 6 | 3 | 3 | N/A |
| CPU Coherency | No | N/A | N/A | N/A |
| TDP | 560W | 300W | 300W | 300W |
| Manufacturing Process | TSMC N6 | TSMC N6 | TSMC 7nm | TSMC 7nm |
| Transistor Count | 2 x 29.1B | 29.1B | 25.6B | 13.2B |
| Architecture | CDNA 2 | CDNA 2 | CDNA (1) | Vega |
| GPU | 2 x CDNA 2 GCD "Aldebaran" | CDNA 2 GCD "Aldebaran" | CDNA 1 "Arcturus" | Vega 20 |
| Form Factor | OAM | PCIe (4.0) | PCIe (4.0) | PCIe (4.0) |
| Launch Date | 11/2021 | 03/2022 | 11/2020 | 11/2018 |
Starting with a look at the top-line specifications, the MI210 is an interesting variant of the existing MI250 accelerators. Whereas the MI250 and MI250X are based on a pair of Aldebaran (CDNA 2) dies in an MCM configuration on a single package, for the MI210 AMD is paring everything back to a single die and its associated hardware. With the MI250(X) requiring 560W in the OAM form factor, AMD essentially needed to halve the hardware anyhow to get down to 300W for a PCIe card, and they’ve done exactly that by ditching the second on-package die.
The net result is that the MI210 is essentially half of an MI250, both in terms of physical hardware and expected performance. The CDNA 2 Graphics Compute Die (GCD) features the same 104 enabled CUs as each die on the MI250, and runs at the same peak clockspeed of 1.7GHz. So, workload scalability aside, the performance of the MI210 is for all practical purposes half that of an MI250.
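Those top-line numbers fall straight out of the CU count, the per-CU throughput rates (covered in the table further down), and the boost clock. As a quick sanity check, here's a minimal sketch of the math:

```cpp
// Back-of-the-envelope check of AMD's quoted MI210 peak FP64 rate.
// Values are taken from the spec and throughput tables in this article.
#include <cstdio>

int main() {
    constexpr double cus       = 104;  // enabled CUs on the MI210
    constexpr double clock_ghz = 1.7;  // peak boost clock
    constexpr double fp64_rate = 128;  // FP64 vector FLOPS/clock/CU on CDNA 2

    // 104 CUs x 128 FLOPS/clock x 1.7B clocks/sec ~= 22.6 TFLOPS
    const double peak_tflops = cus * fp64_rate * clock_ghz / 1000.0;
    std::printf("MI210 peak FP64 vector: %.1f TFLOPS\n", peak_tflops);
    return 0;
}
```

Double the CU count for the MI250’s two dies and the same math yields its 45.3 TFLOPS figure.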
That halving goes for memory as well. Whereas the MI250 paired 64GB of HBM2E memory with each GCD – for a total of 128GB of memory – the MI210 brings that down to 64GB for its single GCD. AMD is using the same 3.2 Gbps HBM2E memory here, so the overall memory bandwidth for the card is 1.6 TB/second (3.2 Gbps/pin across a 4096-bit bus).
In regards to performance, the use of a single Aldebaran die does make for some odd comparisons with AMD’s previous-generation PCIe card, the Radeon Instinct MI100. While the MI210 is clocked higher, its reduced CU count relative to the MI100 (104 vs. 120) means that for some workloads, the older accelerator is, at least on paper, a bit faster. In practice, the MI210 has more memory and more memory bandwidth, so it should still have the performance edge in the real world, but it’s going to be close. In workloads that can’t take advantage of CDNA 2’s architectural improvements, the MI210 is not going to be a step up from the MI100.
All of this underscores the overall similarity between the CDNA (1) and CDNA 2 architectures, and how developers need to make use of CDNA 2’s new features to get the most out of the hardware. Where CDNA 2 shines in comparison to CDNA (1) is with FP64 vector workloads, FP64 matrix workloads, and packed FP32 vector workloads. All three use cases benefit from AMD doubling the width of their ALUs to a full 64 bits, allowing FP64 operations to be processed at full speed. Meanwhile, when FP32 operations are packed together to completely fill the wider ALUs, they too can benefit from the extra width.
But, as we noted in our initial MI250 discussion, like all packed instruction formats, packed FP32 isn’t free. Developers and libraries need to be coded to take advantage of it; packed operands need to be adjacent and aligned to even registers. For software being written specifically for the architecture (e.g. Frontier), this is easy enough to do, but more portable software will need to be updated to take this into account. And it’s for that reason that AMD wisely still advertises its FP32 vector performance at the standard, non-packed rate (22.6 TFLOPS), rather than assuming the use of packed instructions.
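To illustrate what "coded to take advantage of it" looks like, here's a minimal HIP sketch of our own (not AMD sample code). Operating on float2 hands the compiler adjacent, aligned operand pairs that it may lower to CDNA 2's packed instructions (such as v_pk_fma_f32) when targeting the MI200 family (gfx90a); whether it actually does so is up to the compiler:

```cpp
#include <hip/hip_runtime.h>

// y = a*x + y over pairs of floats. On CDNA 2, the compiler can issue
// both lane-local FMAs below as a single packed FMA; on older GPUs the
// same code simply runs as two ordinary FP32 FMAs.
__global__ void axpy_pairs(int n, float a, const float2* x, float2* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i].x = a * x[i].x + y[i].x;
        y[i].y = a * x[i].y + y[i].y;
    }
}
```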
The launch of the MI210 also marks the introduction of AMD’s improved matrix cores into a PCIe card. For CDNA 2, these have been expanded to allow full-speed FP64 matrix operation, bringing them up to the same 256 FLOPS-per-clock-per-CU rate as FP32 matrix operations – a 4x improvement over CDNA (1)’s 64 FLOPS/clock/CU rate for FP64.
AMD GPU Throughput Rates (FLOPS/clock/CU)

|  | CDNA 2 | CDNA (1) | Vega 20 |
| --- | --- | --- | --- |
| FP64 Vector | 128 | 64 | 64 |
| FP32 Vector | 128 | 128 | 128 |
| Packed FP32 Vector | 256 | N/A | N/A |
| FP64 Matrix | 256 | 64 | 64 |
| FP32 Matrix | 256 | 256 | 128 |
| FP16 Matrix | 1024 | 1024 | 256 |
| BF16 Matrix | 1024 | 512 | N/A |
| INT8 Matrix | 1024 | 1024 | N/A |
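In practice, most applications will reach the FP64 matrix cores through a math library rather than raw matrix (MFMA) instructions. As a sketch – our illustration, with hypothetical buffer names – a standard rocBLAS DGEMM call is all it takes from the application side, and on CDNA 2 hardware the library can then route the operation through the matrix engines:

```cpp
#include <rocblas/rocblas.h>  // header path in recent ROCm releases

// C = A * B for n x n column-major matrices already resident on the GPU.
void dgemm_on_device(rocblas_handle handle, int n,
                     const double* dA, const double* dB, double* dC) {
    const double alpha = 1.0;
    const double beta  = 0.0;
    rocblas_dgemm(handle,
                  rocblas_operation_none, rocblas_operation_none,
                  n, n, n,        // m, n, k
                  &alpha, dA, n,  // A and its leading dimension
                  dB, n,          // B and its leading dimension
                  &beta, dC, n);  // C and its leading dimension
}
```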
Moving on, the PCIe-format MI210 also gets a trio of Infinity Fabric 3.0 links along the top of the card, just like the MI100. This allows an MI210 card to be bridged with one or three other cards, forming a 2-way or 4-way cluster. Meanwhile, backhaul to the CPU or any other PCIe devices is provided via a PCIe 4.0 x16 connection, which is driven by one of the GCD’s flexible IF links.
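From the software side, those bridges largely surface as fast peer-to-peer paths between GPUs. As a rough sketch of how an application would probe this – whether a given pair of cards is connected over Infinity Fabric or plain PCIe is a platform topology detail – the standard HIP peer-access queries apply:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);

    // Check every ordered pair of GPUs for direct peer-to-peer access.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            hipDeviceCanAccessPeer(&canAccess, a, b);
            std::printf("GPU %d -> GPU %d: peer access %s\n",
                        a, b, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```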
As previously mentioned, the TDP for the MI210 is set at 300W, the same level as the MI100 and MI50 before it – and essentially the limit for a PCIe server card. Like most server accelerators, this is a fully passive, dual-slot card design that relies on significant airflow from the server chassis to keep things cool. The card itself is powered by a combination of the PCIe slot and an 8-pin EPS12V connector at the rear of the card.
Otherwise, despite the change in form factors, AMD is going after much the same market with the MI210 as they have with the MI250(X). Which is to say, HPC users who specifically need a fast FP64 accelerator. Thanks to its heritage as a chip designed first and foremost for supercomputers (i.e. Frontier), the MI200 family currently stands alone in its FP64 vector and FP64 matrix performance, as rival GPUs have focused instead on improving performance at the lower precisions used in most industry/non-scientific workloads. Though even at lower precisions, the MI200 family is nothing to sneeze at, with its 1024 FLOPS/clock/CU rate on FP16 and BF16 matrix operations.
Wrapping things up, MI210 is slated to become available today from AMD’s usual server partners, including ASUS, Dell, Supermicro, HPE, and Lenovo. Those vendors are now also offering servers based on AMD’s MI250(X) accelerators, so AMD’s more mainstream customers will have access to systems based on AMD’s full lineup of MI200 accelerators.
Comments
Khanan - Friday, March 25, 2022
I agree with this. Anyone who praises unreleased products can only be a fan/trolling or whatever reason be heavily biased.

He made a lot of claims about driver problems with AMD cards, I can’t verify them, however I’m pretty sure they are exaggerated. If you want to do it there is a way, full stop. This is true for PC tech since infinity.
And about Matrix cores: they are only a thing with new CDNA GPUs, don’t hold your breath with them being released for RDNA3, they won’t. FSR 2.0 won’t need them so they aren’t needed, and AMD isn’t into locking support to just 1 gen of cards, unlike Nvidia, who did it with the 20 series despite DLSS being a disaster at the beginning and until the release of 2.0. And yes, it was absolutely possible to release DLSS for all cards, 1.9 proves this, just no interest by Nvidia, who want to copy Apple as much as possible. Great they couldn’t buy ARM, nobody needed that gridlock.
mode_13h - Friday, March 25, 2022
> Anyone who praises unreleased products

What I praised was their compute stack support for their iGPUs. Those *are* released products, you know?
Also, Tiger Lake has been a released product for almost 1.5 years.
> fan/trolling or whatever reason be heavily biased.
I'm definitely sensing some of that, in this thread.
> driver problems with AMD cards
Their GPU Compute stack. Don't mis-characterize my position. I have nothing to say about their graphics drivers, because that's not my primary focus.
Again, this comment thread is on an article for AMD Compute accelerators. So, it's relevant in ways that talking about graphics & gaming are not.
> And about Matrix cores: they are only a thing with new CDNA GPUs
> don’t hold your breath with them being released for RDNA3
Thanks, I won't. I mentioned that as a suggestion, and nothing more.
I'd like to see AMD be more competitive on the GPU compute front. I was really supporting them through the early days of ROCm and when they were pushing standards like OpenCL and HSA.
Sometimes, criticism can come from a place of concern, you know? It doesn't always have to be rooted in wanting to tarnish, undermine, and destroy.
mode_13h - Friday, March 25, 2022
Wow, the posse grows!

> You clearly are an Intel shill.

All I did was correct Spunjji, in pointing out that the Tiger Lake G7 iGPUs were actually competitive, which I supported with some evidence. You guys haven't provided anything to support your allegations, except a link to a sketchy Wikipedia article.
> Anything there should be taken with a grain of salt.
Benchmarks are benchmarks, though. That's real data, with day 1-quality drivers. If anything, it should've gotten better since then.
If you have any evidence which can invalidate what I claimed, you're free to provide it.
> Your past posts show this to be true and accurate.
In this thread? In general? Got links? If you're such a studied expert on me, why don't I recognize your username?
> You come out and disclaim how glorious Intel is in everything
"disclaim"? Wow, if AMD is paying you guys, they should ask for a refund.
> shit on anything AMD every chance you get.
No, that doesn't sound like me. There are posters who do that, however. You must have me confused with one of them.
> Perhaps you should step back a moment and remove some bias from your decision making?
At my job, we make decisions based on data, our customers, and the market for our products. In my personal purchasing decisions, I have goals for projects I want to work on, and I know which APIs I want to use. So, my decisions amount to looking at the hardware offerings and how well those have been working for others who are doing similar sorts of work.
> Seriously. You are here claiming that you would choose unreleased hardware
I'm not going to buy it on day 1. I'm going to at least wait and see what initial impressions of it are. All I said is that my current expectation is that I'll probably go with Intel, next time.
> Well who knows. No logical or rational person would make the decision.
I explained my rationale in quite some detail. You're free to disagree and make your own decisions for your own reasons.
> A logical and rational person would wait, keep their mouth shut and ...
Ah, that's what this is about. Well, if you guys are intent on trying to shut down any negative comments about AMD, this appears to be backfiring.
Why don't you just keep at it, and see where it goes next? I can promise it's not going to make AMD look any better. I have no personal agenda against AMD, but if somebody is calling me a liar, it'll force me to drag out some unflattering evidence to prove I'm not.
Khanan - Sunday, March 27, 2022
Just don’t talk about unreleased stuff and praise Intel for things they didn’t do. Their gaming drivers are terrible, you can quickly google this in 5 seconds, but maybe you’re just trolling. Don’t praise unreleased products full stop. AMD did a lot of progress recently with ROCm this is what I recently read, so don’t expect me to be interested in your gossip, there’s a clear bias you have with Intel whether you want to admit it or not. IGPUs aren’t relevant to this conversation, it’s a simple fact. They do not compete with full GPU cards. So don’t talk about it being better than competing products that are proven and released. Intel recently postponed release of Arc because of terrible drivers, doesn’t really confirm anything you say about Intels driver performance, to the contrary. And I would stay far away from any GPUs of them for at least a year until they have mature drivers that are proven.

If two people come and say you’re a Intel shill or biased I would start thinking about myself and not endlessly deflect everything. Start being a grown up maybe.
mode_13h - Monday, March 28, 2022
> Their gaming drivers are terrible, you can quickly google this in 5 seconds

I never said anything about that. My only real experience with their graphics drivers is just running regular desktop apps.
> Don’t praise unreleased products full stop.
I didn't. However, wouldn't it be hypocrisy for you to say that while simultaneously trashing unreleased products?
> AMD did a lot of progress recently with ROCm this is what I recently read
It's great that you're reading up on ROCm. That's at least something you can take away from this.
I still have hopes ROCm will mature well. Having followed it from the early days, it's been a much longer journey than I imagined.
The core problem with ROCm is that it's funded by their professional and cloud compute sales. That means all their priorities are driven by those sales and contracts, which tends to leave independent developers out in the cold. And it's students and independent developers who are often at the forefront of innovation.
I know they have some good people working on it. AMD just needs to decide they're going to invest in building the same sort of developer community that Nvidia did. The formula is pretty simple, but it takes investment and time.
> doesn’t really confirm anything you say about Intels driver performance, to the contrary.
I'm not talking about gaming. This article isn't even about a gaming card. Maybe you just saw AMD and failed to notice that?
> If two people come and say you’re a Intel shill or biased I would start thinking about myself
If you don't know when to trust your own competence on a subject matter, then I feel sorry for you. I suggest you get good enough at something that you can build some confidence and know when not to listen to others.
> not endlessly deflect everything.
I'm not the slightest bit sorry for defending against a bunch of malicious and poorly-informed critiques and allegations.
> Start being a grown up maybe.
That would be good advice for some in this thread.
Khanan - Monday, March 28, 2022
“If you don't know when to trust your own competence on a subject matter, then I feel sorry for you. I suggest you get good enough at something that you can build some confidence and know when not to listen to others.”

I have more confidence than you will ever have, what a cheap and weak allegation. I don’t need weaseling wannabes like yourself pretending to be something which they are not.
You’re constantly adapting your own opinion and then pretending you had this opinion from the get go. Too bad 3 different people called you biased or an AMD hater, so you’re just trying to weasel yourself out now, which won’t work with me and the others don’t even care anymore.
“I'm not the slightest bit sorry for defending against a bunch of malicious and poorly-informed critiques and allegations.”
Nice try, but to this point you didn’t prove anything about your alleged “shortcomings” of ROCm. So you essentially provided nothing and pretended to have something, which I won’t fall for. For every shit you have googled up I can easily google up positive sources to contradict yours. You’re essentially an argumentative idiot that never used the hardware he criticizes and when called out quotes some weak sources that don’t hold up an inspection. That’s it for me, won’t waste any more time with you.
Suffice to say, ROCm is working and anyone who wants to use it, can use it. Devs aren’t exactly noobs when it comes to software, they will know how to do it. You never had a point.
mode_13h - Monday, March 28, 2022
> I have more confidence than you will ever have

I'll grant you that's sure a confident statement, but sometimes a confident front is only that. True confidence is knowing when to stand your ground because the ground is indeed yours to hold. A foolish confidence is merely defiance in the face of facts that are obvious for all else to see.
See also: bluster.
> You’re constantly adapting your own opinion
My original post was so short, it couldn't possibly capture a rich and nuanced point of view. So, I don't know what this nonsense is about "adapting" my opinion. You couldn't hope to get a real view of my opinion and experience from only that.
> Too bad 3 different people called you biased or an AMD hater
If you're going to keep posting, at least come up with new points. I already debunked this one.
> to this point you didn’t prove anything about your alleged “shortcomings” of ROCm.
Ah, so you want to go there, eh? I figured maybe you wouldn't want all of its dirty laundry aired, but I guess that proves you're just an agitator sent to make AMD look bad.
> For every shit you have googled up I can easily google up positive sources to contradict yours.
Really? How's that going to work? For every buggy and broken release, you're going to prove that it's not a bug or didn't break some hardware?
> never used the hardware he criticizes
In this point, you're actually correct. I wish I could, but they never supported 1st gen RDNA GPUs!
Even if they did, AMD turned their back on OpenCL, while Intel did not. Given my experience with Intel's compute stack on iGPUs, I'm willing to give their dGPUs a chance.
> when called out quotes some weak sources that don’t hold up an inspection.
Which ones? If you had anything substantive to say, why not say it, instead of wasting so much typing on childish name-calling?
> ROCm is working and anyone who wants to use it, can use it.
And good luck to anyone who tries. To this very day, it still doesn't support 1st gen RDNA GPUs. Doesn't matter whether Pro or not.
Espinosidro - Wednesday, April 27, 2022
I'm sorry to interject, but sadly mode_13h is right about ROCm at least; I can't comment on Intel GPUs.

AMD's Linux OpenCL support is utter garbage, both for AMDGPU-Pro and ROCm. Both things have nearly no documentation, are hard to set up, and are extremely prone to breaking. Even trying to use Mesa's OpenCL support is broken somehow.
In my opinion AMD should just give up OpenCL, at this rate they will simply never be a competitor to Nvidia when it comes to compute. They could instead focus on Vulkan compute, which works beautifully and painlessly on their open source drivers. My absolute best track record getting any kind of acceleration out of my Polaris and Vega 10 GPUs has been with Vulkan compute.
lmcd - Tuesday, April 5, 2022
While I didn't end up changing the world with it, I did play around with CUDA on my Shield Tab while I was taking a CUDA class at university. I was stunned that it worked at all. It's broken these days, but so are Nvidia's Android aspirations. There's still a clear path to GPU compute using my aging GTX 660 Ti purchased around the same era.

Meanwhile, I quite literally never got OpenCL support for my Vega 11 IGP. Look at this beauty! https://bbs.archlinux.org/viewtopic.php?id=254491
Two open source drivers, no support. AMDGPU-Pro, the third Linux driver, in turn never added any IGPs either. Compute literally works better on my Intel m3 tablet. I got better compute support from fglrx and a Radeon HD 5770.
And here's the real killer -- Intel has consistently put effort toward GPU virtualization (not just pass-through). If that lands in FOSS for the restructured Xe GPUs (it already existed with GVT-G for the prior generation), there won't be any question as to which GPU is right for compute.