AMD EPYC Milan Review Part 2: Testing 8 to 64 Cores in a Production Platformby Andrei Frumusanu on June 25, 2021 9:30 AM EST
It’s been a few months since AMD first announced their new third generation EPYC Milan server CPU line-up. We had initially reviewed the first SKUS back in March, covering the core density optimised 64-core EPYC 7763, EPYC 7713 and the core-performance optimised 32-core EPYC 75F3. Since then, we’ve ben able to get our hands on several new mid and lower end SKUs in the form of the new 24-core EPYC 7443, the 16-core 7343, as well as the very curious 8-core EPYC 72F3 which we’ll be reviewing today.
What’s also changed since our initial review back in March, is the release of Intel’s newer 3rd generation Xeon Scalable processors (Ice Lake SP) with our review of the 40-core Xeon 8330 and 28-core Xeon 6330.
Today’s review will be focused around the new performance numbers of AMD’s EPYC CPUs, for a more comprehensive platform and architecture overview I highly recommend reading our respective initial reviews which go into more detail of the current server CPU landscape:
(April 6th) Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
(March 15th) AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance
(December 18th) The Ampere Altra Review: 2x 80 Cores Arm Server Performance Monster
What's New: EPYC 7443, 7343, 72F3 Low Core Count SKUs
In terms of new SKUs that we’re testing today, as mentioned, we’ll be looking at AMD new EPYC 7443, 7343 as well as the 72F3, mid- to low core-count SKUs that come at much more affordable price tags compared to the flagship units we had initially reviewed back in March. As part of the new platform switch, we’ll cover in a bit, we’re also re-reviewing the 64-core EPYC 7763 and the 32-core EPYC 75F3 – resulting in a few surprises and resolving some of the issues we’ve identified with 3rd generation Milan in our first review.
|AMD EPYC 7003 Processors
Core Performance Optimized
|EPYC 75F3||32 / 64||2950||4000||256
|EPYC 74F3||24 / 48||3200||4000||240 W||$2900|
|EPYC 73F3||16 / 32||3500||4000||240 W||$3521|
|EPYC 72F3||8 / 16||3700||4100||180 W||$2468|
Starting off with probably the weirdest CPU in AMD’s EPYC 7003 line-up, the new 72F3 is quite the speciality part in the form of it being an 8-core server CPU, yet still featuring the maximum available platform capabilities as well as the full 256MB of L3 cache. AMD achieves this by essentially populating the part with 8 chiplet dies with each a full 32MB of L3 cache, but only one core enabled per die. This enables the part (for a server part) relatively high base frequency of 3.7GHz, boosting up to 4.1GHz and landing with a TDP of 180W, with the part costing $2468.
The unit is a quite extreme case of SKU segmentation and focuses on deployments where per-core performance is paramount, or also use-cases where per-core software licenses vastly outweigh the cost of the actual hardware. We’re also re-reviewing the 32-core 75F3 in this core-performance optimised family, featuring up to 32 cores, but going for much higher 280W TDPs.
|AMD EPYC 7003 Processors
Core Density Optimized
|EPYC 7763||64 / 128||2450||3400||256
|EPYC 7713||64 / 128||2000||3675||225 W||$7060|
|EPYC 7663||56 / 112||2000||3500||240 W||$6366|
|EPYC 7643||48 / 96||2300||3600||225 W||$4995|
|P-Series (Single Socket Only)|
|EPYC 7713P||64 / 128||2000||3675||256||225 W||$5010|
In the core-density optimised series, we’re continuing on using the 64-core EPYC 7763 flagship SKU which lands in at 280W TDP and a high cost of $7890 MSRP. Unfortunately, we no longer have access to the EPYC 7713 so we couldn’t re-review this part, and benchmark numbers from this SKU in this review will carry forward our older scores, also being aptly labelled as such in our graphs.
|AMD EPYC 7003 Processors|
|EPYC 7543||32 / 64||2800||3700||256 MB||225 W||$3761|
|EPYC 7513||32 / 64||2600||3650||128 MB||200 W||$2840|
|EPYC 7453||28 / 56||2750||3450||64 MB||225 W||$1570|
|EPYC 7443||24 / 48||2850||4000||128
|EPYC 7413||24 / 48||2650||3600||180 W||$1825|
|EPYC 7343||16 / 32||3200||3900||190 W||$1565|
|EPYC 7313||16 / 32||3000||3700||155 W||$1083|
|P-Series (Single Socket Only)|
|EPYC 7543P||32 / 64||2800||3700||256 MB||225 W||$2730|
|EPYC 7443P||24 / 48||2850||4000||128 MB||200 W||$1337|
|EPYC 7313P||16 / 32||3000||3700||155 W||$913|
Finally, the most interesting parts of today’s evaluation are AMD’s mid- to low-core count EPYC 7443 and EPYC 7343 CPUs. At 24- and 16-core, the chips feature a fraction of the maximum theoretical core counts of the platform, but also come at much more affordable price points. These parts should especially be interesting for deployments that plan on using the platform’s full memory or I/O capabilities, but don’t require the raw processing power of the higher-end parts.
These two parts are also defined by having only 128MB of L3 cache, meaning the chips are running only 4 active chiplets, with respectively only 6 and 4 cores per chiplet active. The TDPs are also more reasonable at 200W and 190W, with also respectively lower pricing of $2010 and $1565.
Following Intel’s 3rd generation Xeon Ice Lake SP and our testing of the Xeon 28-core 6330 which lands in at an MSRP of $1894, it’s here where we’ll be seeing the most interesting performance and value comparison for today’s review.
Test Platform Change - Production Milan Board from GIGABYTE: MZ72-HB0 (rev. 3.0)
In our initial Milan review, we unfortunately had to work with AMD to remotely test newest Milan parts within the company’s local datacentre, as our own Daytona reference server platform encountered an unrecoverable hardware failure.
In general, if possible, we also prefer to test things on production systems as they represent a more mature and representative firmware stack.
A few weeks ago, at Computex, GIGABYTE had revealed their newest revision of the company’s dual-socket EPYC board, the E-ATX MZ72-HB0 rev.3.0, which now comes with out-of-box support for the newest 3rd generation Milan parts (The prior rev.1.0 boards don’t support the new CPUs).
The E-ATX form-factor allows for more test-bench setups and noiseless operation (Thanks to Noctua’s massive NH-U14S TR4-SP3 coolers) in more conventional workstation setups.
The platform change away from AMD’s Daytona reference server to the GIGABYTE system also have some significant impacts in regards to the 3rd generation Milan SKUs’ performance, behaving notably different in terms of power characteristics than what we saw on AMD’s system, allowing the chips to achieve even higher performance than what we had tested and published in our initial review.
Post Your CommentPlease log in or sign up to comment.
View All Comments
mode_13h - Sunday, June 27, 2021 - linkThanks for this update. Exciting findings!
Gondalf - Sunday, June 27, 2021 - linkSPECint2017 is good but....SPECint2017 Rate to estimate the per-core performance, no no absolutely no. SPECint2017 Rate have a very small dataset and it can not be utilized to estimate the single core performance, we need of the full SPECint2017 workload, the only manner to bypass the crazy L3 of Ryzen. Half the article have a so so sense ( obviously SPEC Rate is very criticized by many and very likely means less than nothing, expeciallly if you rise the bar on L3 ), the other half nope, without sense.
In fact Intel claim a new 10nm 32 cores superior than a 32 cores Milan, after all the two cores ( Zen 3 and Willow Cove) have around the same IPC, more or less, and being chiplets, 32 cores Milan is out of the games.
Obviously in this article the world "latency" is hidden or so. A single die solution is always better than chiplet design under load with the same number of cores.
Qasar - Sunday, June 27, 2021 - linkand there is the highly biased anti amd post from gondalf that he is known for.
" In fact Intel claim a new 10nm 32 cores superior than a 32 cores Milan, after all the two cores ( Zen 3 and Willow Cove) have around the same IPC, more or less, and being chiplets, 32 cores Milan is out of the games. "
yea ok, more pr bs from intel that you blindly believe ? post a link to this. the fact that you start with " in fact intel claim" kind of point to it being bs.
schujj07 - Monday, June 28, 2021 - linkGandalf missed a link I posed that has a 32c Intel vs 32c AMD. In that the AMD averages 20% better performance than the Intel across the entire test suite. https://www.servethehome.com/intel-xeon-gold-6314u...
iAPX - Sunday, June 27, 2021 - linkThere's a lot to read and understand on the last chart (per-Thread score / Socket Perf), about usefulness of SMT (or not), about who is the per-Thread performance leader and also the per-Socket performance leader, with a notable exception, the Altra Q80-33.
I would like to see these kind of chart more often, it sum-up things very clearly, while naturally you have to understand that it is just a long-story short, and have to read about specific performance depending on the payload (ie: DB as stated).
nordform - Thursday, July 1, 2021 - linkToo bad Apple's M1 was left out ... it clearly would have smoked the "competition". Everything with a TDP higher than 25W is inappropriate, not to say obscene.
Apple rules hands down
Qasar - Friday, July 2, 2021 - link" Everything with a TDP higher than 25W is inappropriate, not to say obscene. " and why would that be ?
mode_13h - Friday, July 2, 2021 - linkThat would be like drag racing a Tesla car against some 18-wheeled diesel trucks.
Server CPUs are not optimized for low-thread performance. They're designed to scale, and have data fabrics to handle massive amounts of I/O that the M1 can't. It wouldn't be a fair (or relevant) comparison.
Now, try running that Tesla car in a tractor pull and we'll see who's laughing!
Oxford Guy - Thursday, July 8, 2021 - linkHappy to have won another debate in which my suggestion was aggressively attacked.
I said having dual channel DDR4 for Zen 3 was unfortunate, as DDR4 is so long in the tooth — a fact that dual channel configuration makes more salient. I said it would have been good for the company to add more value by giving it quad channel RAM or, if possible, a support for both DDR4 and DDR5 — something some mainstream Intel quads had (support for DDR3 and DDR4).
My remark was derided mainly on the basis of the claim that dual channel is plenty. This new set of parts demonstrate the benefit of having more RAM and cache.
Considering how high the core counts are for Zen 3 desktop CPUs and how much Apple has set people on notice about what’s possible in CPU performance...
Also, part of the rebuttal was citing the existence of TR. That’s still Zen 2, eh? Can’t really go out and buy that rebuttal.
Is the benefit of being able to stay with the AM4 socket bigger than having less starvation of the CPU, particularly given the very high core counts of CPUs like the 5950? TR may be everyone’s segmentation dream (particularly when it’s being laughingly sold with obsolete Zen 2 and subjected to rapid expensive motherboard orphaning) but I think having five motherboard specs is a bridge too far. Let the low-end have dual channel and no overclocking, dump TR, and consolidate the enthusiast boards to a single (not two) chipset. But... that’s me. I like more value versus little crumbs and redundancies. When a whopping two companies is the state of the competition, though, people become trained to celebrate banality.
mode_13h - Thursday, July 8, 2021 - link> Zen 3 was unfortunate, as DDR4 is so long in the tooth ...
> it would have been good for the company to add more value by giving it quad channel RAM
Agreed. Would've been nice. In spite of that, the 5950X manages to show gains over the 5900X, but we can still wonder how much better it might be with more memory bandwidth.
I wouldn't have an issue with quad-channel being reserved for their TR platform if:
* they were more affordable
* they brought Zen3 to the platform more promptly
An interesting counter-point to consider is how little 8-channel RAM benefitted TR Pro:
"In the tests that matter, most noticeably the 3D rendering tests, we’re seeing a 3% speed-up on the Threadripper Pro compared to the regular Threadripper at the same memory frequency and sub-timings."
That's much less benefit than I'd have expected, as a 64-core TR on quad-channel should be far more bandwidth-starved than a 16-core Ryzen on dual-channel. However, that same article features a micro-benchmark which shows the full potential of 8-channel. So, it's obviously workload-dependent.