The Intel Xeon W Review: W-2195, W-2155, W-2123, W-2104 and W-2102 Tested
by Ian Cutress & Joe Shields on July 30, 2018 1:00 PM EST
Anyone looking at a high-end Intel system has three choices: Core i9, Xeon W, or the larger-socket Xeon Scalable. The first two both use the LGA2066 socket and have identical core/frequency configurations, but they are in effect different platforms, each locked to its own motherboards. The benefits of Xeon W and Xeon Scalable lie in support for ECC memory and vPro management features, and some processors also come in different cache variants.
In a previous generation, Intel's workstation parts lined up with its high-end desktop line. Both used the same socket, which made things easier for consumers: the same single-socket motherboard that held a Core i7 could also run a range of Xeons, including the big E7 series, the dual-socket-focused E5-2600 series, and the workstation-focused E5-1600 series. The benefits of these workstation chips were primarily ECC memory, management features, and OEM support.
Fast forward to today, and due to the socket split, none of the server-focused processors will fit into a modern HEDT platform. To that end, Intel created the Xeon W family, born from the E5-1600 line, which matches the consumer line but with the usual ECC/OEM add-ons. Intel also cut consumer chipset support, pushing Xeon W out of consumer hands and purely into the OEM/system market, given the lack of server-chipset motherboards at retail. Despite the doom and gloom, Supermicro recently sampled us its C422-based X11SRA motherboard and a handful of Xeon W processors for review.
The Xeon W Line-Up
Announced back in January 2018, the Xeon W launch was somewhat unexpected: we had reason to believe that Intel would introduce ECC-capable components for the consumer high-end desktop socket, but what form that would take was unknown, especially with processors of up to 18 cores being released on the consumer side. Intel would ultimately have to bring the Xeon W line to parity with its consumer parts, potentially causing a shift in its single-socket market on the server side as well.
In the end, Intel released eight new Xeon W processors for the market, along with two off-roadmap processors for particular OEMs and a set of versions exclusive to Apple. The configurations are essentially identical to the consumer HEDT line, using the enterprise-focused Skylake-SP cores with the new AVX-512 instructions, a new mesh interconnect, and a rearranged cache structure that emphasizes L2. We have covered the changes compared to the standard Skylake-S core in detail in our initial Skylake-X and Skylake-SP reviews.
What Intel has done with the Xeon W processors compared to the consumer HEDT line is focus more on the lower core count models: in the full line-up there are four quad-core models, some with hyperthreading, as well as two six-core parts, a single eight-core part, and a single ten-core part. One could interpret this SKU differentiation as Intel not focusing as much on the high end with the Xeon W line: where the consumer line has products at 18/16/14/12/10 cores, the top five Xeon W models sit at only 18/14/10/8/6 cores.
|Intel Xeon W Processors (LGA2066)|Cores/Threads|Base|Turbo|L3 (MB)|L3 per Core (MB)|DDR4|TDP|Price|
|---|---|---|---|---|---|---|---|---|
|Xeon W-2195|18/36|2.3 GHz|4.3 GHz|24.75|1.375|2666|140 W|$2553|
|Xeon W-2175|14/28|2.5 GHz|4.3 GHz|19.25|1.375|2666|140 W|$1947|
|Xeon W-2155|10/20|3.3 GHz|4.5 GHz|13.75|1.375|2666|140 W|$1440|
|Xeon W-2145|8/16|3.7 GHz|4.5 GHz|11.00|1.375|2666|140 W|$1113|
|Xeon W-2135|6/12|3.7 GHz|4.5 GHz|8.25|1.375|2666|140 W|$835|
|Xeon W-2133|6/12|3.6 GHz|3.9 GHz|8.25|1.375|2666|140 W|$617|
|Xeon W-2125|4/8|4.0 GHz|4.5 GHz|8.25|2.063|2666|120 W|$444|
|Xeon W-2123|4/8|3.6 GHz|3.9 GHz|8.25|2.063|2666|120 W|$294|
|Xeon W-2104*|4/4|3.2 GHz|-|8.25|2.063|2400|120 W|$255|
|Xeon W-2102*|4/4|2.9 GHz|-|8.25|2.063|2400|120 W|$202|
|Apple Only SKUs| | | | | | | | |
|Xeon W-2191B|18/36|2.3 GHz|4.3 GHz|24.75|1.375|2666|?|-|
|Xeon W-2170B|14/28|2.5 GHz|4.3 GHz|19.25|1.375|2666|?|-|
|Xeon W-2150B|10/20|3.0 GHz|4.5 GHz|13.75|1.375|2666|?|-|
|Xeon W-2140B|8/16|3.2 GHz|4.2 GHz|11.00|1.375|2666|?|-|

Parts marked * are off-roadmap.
One of the other changes is in AVX-512 support. On the Xeon Scalable processors, each core has two AVX-512 FMA ports to maximize throughput, except for the off-roadmap SKUs, which have one. The consumer product line also has two, although Intel initially said certain parts only had one. These Xeon W parts also have two AVX-512 FMA ports per core, allowing hand-tuned code to use AVX-512 to its fullest. Xeon W also supports ECC memory, which is usually one of the main reasons to buy these processors.
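To put the second FMA port in perspective, here is a minimal back-of-the-envelope sketch of theoretical peak FP64 throughput. It assumes each port retires one 512-bit FMA per clock, that an FMA counts as two FLOPs, and that a 512-bit vector holds eight doubles; the function names and the 2.0 GHz clock in the usage line are hypothetical illustrations, not Intel-published figures.

```python
# Theoretical peak FP64 FLOPs for an AVX-512 core (illustrative model only).
# Assumptions: one 512-bit FMA per port per clock, FMA = 2 FLOPs (mul + add).

DOUBLES_PER_512_BIT_VECTOR = 512 // 64  # 8 FP64 lanes
FLOPS_PER_FMA = 2


def peak_dp_flops_per_cycle(fma_ports: int) -> int:
    """Peak FP64 FLOPs per core per clock for a given number of FMA ports."""
    return fma_ports * DOUBLES_PER_512_BIT_VECTOR * FLOPS_PER_FMA


def peak_dp_gflops(cores: int, ghz: float, fma_ports: int) -> float:
    """Whole-chip peak in GFLOPS at an assumed all-core AVX-512 clock."""
    return cores * ghz * peak_dp_flops_per_cycle(fma_ports)


if __name__ == "__main__":
    print(peak_dp_flops_per_cycle(1))  # 16
    print(peak_dp_flops_per_cycle(2))  # 32
    # 18 cores at a hypothetical 2.0 GHz AVX-512 clock with two ports
    print(peak_dp_gflops(18, 2.0, 2))  # 1152.0
```

Doubling the ports doubles the theoretical peak, which is why the one-port versus two-port distinction matters for hand-tuned code; real code rarely sustains this peak.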
Each of the CPUs can support up to 512GB of DDR4-2666 ECC RDIMMs (DDR4-2400 on the W-2104 and W-2102) in a quad-channel configuration with two modules per channel, which means each of the eight modules can be up to 64GB. This is up from the 128GB UDIMM support in the consumer space, but lower than the 768GB RDIMM support of Xeon Scalable (thanks to its six memory channels).
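The capacity figures follow from simple multiplication: channels × DIMM slots per channel × largest supported module. The sketch below assumes two DIMMs per channel on every platform, 64GB as the largest RDIMM, and 16GB as the largest consumer UDIMM at the time; these per-module sizes are our assumptions, inferred from the totals rather than stated in this review.

```python
# Peak memory capacity = channels x DIMM slots per channel x largest DIMM.
# Assumptions: two DIMMs per channel, 64 GB RDIMMs for the Xeon parts,
# 16 GB UDIMMs for the consumer Core i9 platform.

def max_capacity_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Return the maximum installable memory in GB for one socket."""
    return channels * dimms_per_channel * dimm_gb


PLATFORMS = {
    # name: (memory channels, DIMMs per channel, largest DIMM in GB)
    "Core i9 (UDIMM)": (4, 2, 16),
    "Xeon W (RDIMM)": (4, 2, 64),
    "Xeon Scalable (RDIMM)": (6, 2, 64),
}

if __name__ == "__main__":
    for name, (channels, dpc, dimm_gb) in PLATFORMS.items():
        print(f"{name}: {max_capacity_gb(channels, dpc, dimm_gb)} GB")
```

The two extra channels alone account for the gap between Xeon W's 512GB and Xeon Scalable's 768GB when module size is held constant.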
A small note about the ‘different’ processors in the stack. The W-2102 and W-2104 are the low-end quad-core processors without hyperthreading or Turbo, and they are classified as ‘off roadmap’. Unlike the others, these processors are not for sale to all OEMs; they are typically built for specific OEMs that have contracts with specific customers in mind. As a result, price lists will not show these parts, and to be honest, Intel does not really like talking about them, as promoting them has no inherent value. Of course from our perspective, we like examining every member of the stack, regardless of how widely available it is.
The other set of different processors are the Apple-only parts, which are found only in the Late 2017 iMac Pro. Take for example the Xeon W-2150B: this 10-core processor is almost identical to the 10-core Xeon W-2155, but has a lower base frequency of 3.0 GHz (compared to 3.3 GHz). The lower base frequency should lower the rated TDP of the processor; however, these processors rarely run at base frequency and are almost always in a turbo state, where TDP is undefined, which makes it difficult to place this processor. It is most likely a part that is binned well for voltage, frequency, and power. Again, this is another part that isn’t available to everyone (but if someone has one, we’d love to test it).
All these processors require a motherboard that uses the C422 chipset. C422 is almost identical to the X299 chipset used in the consumer platform, but is firmware-locked to Xeon W processors only, with support for ECC. Because of the split between the consumer and workstation platforms, there are very few C422 motherboards on the open market for custom builders: most OEMs (Dell, Supermicro) build their own motherboards for pre-built systems specifically for their own customers, optimized for the intended outcome (performance, price, etc.).
Per Core Turbo Data
Intel's per-core turbo data for these workstation parts is split into three sections, one for each class of instructions: AVX-512, AVX2, and regular (non-AVX) instructions. For the 'hardest' instructions, AVX-512, Intel uses special turbo values, because the way these instructions are processed generates more heat on the chip. The chip has to balance frequency and power draw, so the AVX-512 turbo values are lower in order to keep power in check.
The first thing to notice with this data is that for most CPUs, when the whole CPU is using AVX-512 instructions, the frequency drops below the base frequency. For chips like the Xeon W-2123 and W-2133, even single-core AVX-512 loading drops the frequency below base. Intel's base frequency serves two purposes: first, it is the frequency at which the TDP rating applies, and second, it is the guaranteed minimum frequency for regular non-AVX instructions.
Behind AVX-512 is AVX2, which is still somewhat of a strain on the processor beyond regular instructions, but not as much. Where AVX-512 requires dedicated die area for support of the vector units, AVX2 is built into the back-end of the standard core design.
For AVX2, the W-2133 and W-2123 still end up below their base frequencies. But for the big chips, like the W-2195, the full 18-core AVX2 turbo is 500 MHz faster than the AVX-512 one. This indicates that users fine-tuning code should think about how well they can keep the AVX-512 unit fed: despite the 500 MHz deficit, a fully fed AVX-512 unit should still be faster, but a half-fed AVX-512 unit might be beaten by a fully fed AVX2 code path.
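The "half-fed AVX-512 versus full AVX2" trade-off can be sketched with a toy model in which throughput scales with vector width × clock × the fraction of issue slots actually kept busy. This assumes the same number of FMA ports for both widths and ignores memory bandwidth; the 2.0 GHz and 2.5 GHz clocks are hypothetical values chosen only to reproduce the 500 MHz gap, not Intel's published turbo figures.

```python
# Toy model: vector throughput ~ FP64 lanes x clock x fill fraction.
# Assumptions: equal FMA port counts for AVX2 and AVX-512, no memory limits.

AVX512_LANES = 8  # 512-bit / 64-bit
AVX2_LANES = 4    # 256-bit / 64-bit


def relative_throughput(lanes: int, ghz: float, fill: float) -> float:
    """Relative work per second for a vector path kept `fill` busy."""
    return lanes * ghz * fill


def avx512_breakeven_fill(avx512_ghz: float, avx2_ghz: float) -> float:
    """Fill fraction AVX-512 needs to match a fully fed AVX2 path."""
    return (AVX2_LANES * avx2_ghz) / (AVX512_LANES * avx512_ghz)


if __name__ == "__main__":
    f512, f2 = 2.0, 2.5  # hypothetical all-core clocks, 500 MHz apart
    print(avx512_breakeven_fill(f512, f2))               # 0.625
    half_fed_512 = relative_throughput(AVX512_LANES, f512, 0.5)
    full_avx2 = relative_throughput(AVX2_LANES, f2, 1.0)
    print(half_fed_512 < full_avx2)                      # True
```

Under these assumptions AVX-512 needs to stay more than about 62% busy just to break even, so a half-fed AVX-512 loop loses to a fully fed AVX2 loop.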
For the regular instructions, turbo goes a bit like this:
For many users, the key metrics here are the all-core turbos, with the 18-core part having an all-core turbo of 3.2 GHz. Interestingly, the W-2155 and W-2145 sit well here: for any code that can't reliably scale beyond 12-14 threads, the higher-frequency but lower-core-count part might actually perform better. We saw a bit of this in our review, with variably threaded loads executing somewhat better on the W-2155 than on the W-2195.
Then and Now: Defining a Workstation
By splitting the motherboard support for workstation-grade processors, Intel has (whether on purpose or not) redefined what it means to have an Intel workstation. In previous generations, a certain market of users would happily invest in an E5-2640-style processor and place it into a single-socket consumer motherboard, taking advantage of a potentially better-binned processor and, on some motherboards that qualified it, ECC memory. Depending on the location and time, this route was in some instances cheaper than buying the similar-grade consumer processor. Thanks to that motherboard support, these systems were certainly more widespread than they are today. In fact, some users are currently turning to eBay and investing in older 8-core and 10-core processors because they are extremely affordable.
In 2018, for the Intel workstation enthusiast, the situation is complicated and confusing.
If a workstation user looks at consumer-grade hardware, they can get the cores and the motherboard, but lose the ECC memory and co-processor compatibility commensurate with a professional system: some motherboard hardware may not be qualified with Quadro, Tesla, FirePro, Xilinx, Altera, etc., because those motherboards aren’t built for that market.
If a workstation user looked at professional-grade hardware, it becomes a case of struggling to self-build based on availability or paying through the nose for an OEM system that might have some horrendous markup. We spoke to one OEM in years past, who said that the prices on the website were almost fictitious – most of their sales in this area come from large-scale corporate contracts which offer discounts based on volume. The single home-brew workstation user was not their target market, unless they wanted to pay the high prices.
|Similar SKU Comparison|Core i9|Xeon W-2195|Xeon Scalable|
|---|---|---|---|
|Cores/Threads|18 / 36|18 / 36|18 / 36|
|Top Base/Turbo (GHz)|2.6 / 4.2|2.3 / 4.3|2.3 / 3.7|
|PCIe 3.0 Lanes|44|48|48|
|Max DRAM (DDR4)|128GB|512GB|768GB / 1536GB|
|Price|$1999|$2553|$2451 / $5448|
Ultimately, if a user is going above and beyond for an OEM system, it might be worth looking into Xeon Scalable processors, especially if multiple sockets in a single system are required. This increases the expense significantly, however. The benefits of building a consumer-based workstation, if ECC and large memory capacity are not needed, also come down to clock speed and AVX-512 support.
The alternative is to look at AMD’s workstation offering, Threadripper, which is cheaper and offers similar core counts, more ECC memory per processor (depending on motherboard support), and more PCIe lanes, but does not have AVX-512 and can suffer from its non-uniform memory architecture (NUMA) in software that requires a lot of core-to-core and core-to-memory communication. The multi-socket option here is EPYC, which gets more cores and more system memory, but no increase in PCIe lanes due to the way the platform shares resources.
|Intel vs AMD Comparison|Xeon W-2145|Threadripper|EPYC|
|---|---|---|---|
|Cores/Threads|8 / 16|8 / 16|24 / 48|
|Base/Turbo (GHz)|3.7 / 4.5|3.8 / 4.0|2.0 / 3.0|
|PCIe 3.0 Lanes|48|60|124|
|Max DRAM (DDR4)|512GB| | |
|L3 Cache|11 MB|32 MB|64 MB|
A lot of purchasing decisions will be driven by the specific workflow in mind, which is one of the reasons why we have so many benchmarks in play for our reviews: there is no ‘one benchmark fits all’ scenario, and we are now in a situation where there are multiple options to choose from depending on the size of the wallet.
For our analysis today, we were able to secure five of the Xeon W processors: the top-end 18-core W-2195, the mid-range ten-core W-2155, the more budget-oriented quad-core W-2123, and the two off-roadmap processors, the W-2104 and W-2102.
We have put these processors through our current generation testing suite, with the Spectre and Meltdown patches applied. The main targets for comparison are Intel’s Skylake-X high-end desktop platform, Intel’s Skylake-S consumer platform, and AMD’s Ryzen and Threadripper platforms.
The motherboard used in our review is the Supermicro X11SRA, one of the more 'available' Xeon W motherboards on the market, which we have also reviewed separately.
We must also thank Kingston for sampling us some DDR4-2666 C19 RDIMM memory for this review.
Xeon W processors support ECC RDIMMs, and our motherboard would not accept UDIMMs, so Kingston kindly supplied the memory needed. The modules (KSM26RS8/8HAI) were faultless in our testing.
Pages In This Review
- Overview of Xeon W
- Test Setup and Power Consumption
- CPU Benchmarking: Office Tests
- CPU Benchmarking: System Tests
- CPU Benchmarking: Rendering Tests
- CPU Benchmarking: Encoding Tests
- CPU Benchmarking: Web Tests
- CPU Benchmarking: Legacy Tests
- Spectre vs Meltdown: SYSMark
- Conclusions: Is Intel Serious About Xeon W?
HStewart - Monday, July 30, 2018 - I am curious why Xeon W at the same core count is typically slower than Core X. I also notice the Scalable CPUs have much more functionality, especially related to reliability, in essence to keep the system running 24/7. The Scalable CPUs also appear to have 6-channel memory instead of 4-channel memory. I wonder when 6-channel memory will come to consumer-level CPUs.
One test that would be interesting is a Xeon W against a Scalable CPU with the same core count, running with only one CPU.
Another test that could be interesting is a dual-CPU Scalable system with, say, two 12-core CPUs versus one 24-core CPU of the same tier.
Just a test to see what happens with more cores vs. more CPUs.
duploxxx - Monday, July 30, 2018 - One Threadripper 2.0 and you can throw all the Intel configs here into the bin.
tricomp - Monday, July 30, 2018 - Yeah
HStewart - Monday, July 30, 2018 - I wish people would keep to the topic and not blab about competitor products.
duploxxx - Tuesday, July 31, 2018 - If you knew anything about scalable CPU systems, you would not ask these questions. A 2x12 setup vs. a 1x24 will be roughly 20% slower if your application scales across the total core count, due to inter-socket communication. Even Intel provides data sheets on that. No need to test.
As long as Intel can screw consumers, they will not invest anything. You won't get 6 memory channels in Xeon W or consumer parts unless the competition does it and they get nailed. BTW, why on earth would you need that on a consumer platform?
BurntMyBacon - Tuesday, July 31, 2018 - If all things are equal, then what you say is true. There is a known performance drop due to inter-socket communication. However, you may have more TDP headroom (depending on the chips you are using) and most likely more effective cooling with two sockets, allowing for higher frequencies with the same number of active cores. If the workload doesn't require an abundance of socket-to-socket communication, then it is conceivable that the two-socket solution may have merit in such circumstances.
SanX - Tuesday, July 31, 2018 - Why is ARM just picking its boogers, watching the game where it could beat Intel? Where are the ARM server and supercomputer chips? ARM processors will soon surpass Intel in transistor count. And for the same number of transistors, ARM is 50-100x cheaper than the Intel/AMD duopoly. As an additional advantage for ARM, these two segments will soon completely abandon Microsoft.
firstname.lastname@example.org - Thursday, August 2, 2018 - ARM is RISC, which is completely different from CISC, so applications and OS support are limited. Microsoft's server OS has really evolved in every aspect in the last few years, and it may take RISC years to catch up on the software side.
JoJ - Saturday, August 4, 2018 - ARM is Fujitsu's choice of successor core to SPARC64+, an architecture in which Fujitsu invested decades of research, development and testing to offer it both commercially and at a national-laboratory supercomputing level. ARM is therefore not a knee-jerk choice of direction for a very interesting supercomputer builder.
Obviously you exaggerated a little bit, saying ARM is "50 - 100 times cheaper than AMD/Intel".
I wish I could shake my belief that pedantic literalism in Internet forums in general is preventing broad discussion. We exaggerate in real life without any socially degrading effects, so why not online?
Or are your conversation partners sniffing at any person who inadvertently speaks technically inaccurately despite forming a perfectly understandable inquiry, as if they were unwashed know-nothings, and turning on their heels to end the discussion? A bit like HN's "we don't tolerate humor here" reactions to innocent attempts at lightening the thread...
But I digress. My point here is that your comment above raised a couple of interesting questions that I feel haven't been answered, only because I think readers first overreact to hyperbole, then fill in the accepted wisdom to answer your questions, even though you asked about pertinent, value-critical concerns. I feel that by supplying the answer and dismissing the comment as uninformed, the most important thing happening is the reader voluntarily reinforcing received marketing positions, and not engaging with the subject at all. I work in advertising and am actually studying this, because advertising buyers adore this kind of "mind share", but we think that is at odds with the advertising buyers wanting "open minded, engaging, adaptable, innovative" customers.
1. Have a look at ServeTheHome's review of the Cavium ARM server generations. This architecture is definitely viable and competitive now in an increasing number of application areas.
2. In my estimation, Microsoft Azure has ARM deployed at a scale second only to Baidu. I am tempted to think it's actually politics, and little else, that prevents an ARM Azure server offering for commercial users. The problem with Microsoft is user expectation of all-round performance consistency, and Intel and Microsoft have been working on that smooth delivery for decades.
3. ARM is bit cheaper if you need to do more than a quick recompile with a few architecture options selected.
Re when we will see an Azure ARM instance, I think it could even be waiting on Cavium's ability to actually deliver hardware, because unmet demand is a fatal blow to new technology and to its successful realisation.
All my "quality time" with our server fleet is spent hands-on with the thermal and power profiles of our applications.
We will rewrite to gain fractions of a percentage point where it's a consistent number across runs. Ever since I crashed a colo cage twenty-five years ago by not considering the power-on surge of a huge half-terabyte RAID array, power loads have obsessed me. Power usage in Cavium ARM looks like a winner for us.
4. BUT I said that based on data mapping dense thermal sensor arrays to the functional code paths of the actual application logic in flight across the fleet at the time. Since we're able to calculate the cost benefit of routing a new application function to a specific server, depending on the thermal load and core behaviour at the time of dispatch, I admit we're not very typical for a small-scale customer. I think small is a server count below 10,000 here, including any peak on-demand usage in case you're consumer retail and sell half-price Gucci shoes on Black Monday.
(We were surprised by the reliability of the gains from very crude information. Originally we just wanted to see if we could balance the flows in the hot aisle, and even throttle hotspot buildup if we lost some cooling locally. For Intel, we got lots of gains by sending jobs so as not to exceed the optimal max turbo clock of a processor, and immediately filling out the slower cores with background chores. AMD and Cavium ARM are not as sophisticated about thermal management, whereas Intel has been keen on overkill recently, e.g. four nigh-identical Xeon Gold SKUs. Do really read that STH review about this "redundancy" of the Xeon processor parts: I came away with a purchase order for the reviewed SKU, because we're so excited about the power management system's role in production deployment as a competitive advantage.)
5. REAL COST ADVANTAGE DEPENDS ON CHANNEL PENETRATION, WITH AMD AT 2%, yes, TWO percent is considered healthy for them today, AMD need to be shipping in far greater volume, to move the money dial to realise the kind of cost advantage SanX is excited about.
Certification of countless applications is hardly begun...
I want to use an ARM workstation to eat my own dog food. This necessitates Nvidia Quadro card support. Yes, I write for a living. I target CUDA for an ever-increasing proportion of customer needs. SURE, I can just remote into machines at will. BUT IF YOU DON'T GIVE CRITICAL DEVELOPERS TRULY GREAT HARDWARE, YOU'RE ABANDONING THE PLATFORM FOR ANY IDEA OF GENERAL DEPLOYMENT.
6. Probably the last sentence should have been standalone here.
I'll just say that we need a workstation as cool as the Silicon Graphics Indy of '93 to get a chance of getting a new GENERAL purpose platform into the mainstream soon.
7. I am constantly both astounded by the simple fact that we have a chip good enough to compete at all, and scared, because I am starting to wonder if we'll ever see sales above "bargaining power level" and platform insurance, and beyond the niche market of companies able to extract whole value chains from controlling their entire software ecosystem, something almost nobody in the real world can do.
JoJ - Saturday, August 4, 2018 - typo, mea culpa,
in point 3, I mean to say, "ARM is NOT cheaper, if you need to do more than a quick recompile.."