Intel's Xeon E5-2600 V2: 12-core Ivy Bridge EP for Servers
by Johan De Gelas on September 17, 2013 12:00 AM EST
What Has Improved?
Ivy Bridge is what Intel calls a tick+: a transition to the latest 22nm process technology (the famous P1270 process) with minor architectural optimizations compared to its predecessor, Sandy Bridge (described in detail by Anand here):
- Divider is twice as fast
- MOVs take no execution slots
- Improved prefetchers
- Improved shift/rotate and split/Load
- Buffers are dynamically allocated to threads (not statically split in two parts for each thread)
Given the changes, we should not expect a major jump in single-threaded performance. Anand made a very interesting Intel CPU generational comparison in his Haswell review, showing the IPC improvements of the Ivy Bridge core are very modest. Clock for clock, the Ivy Bridge architecture performed:
- 5% better in 7-zip (single-threaded test, integer, low IPC)
- 8% better in Cinebench (single-threaded test, mostly FP, high IPC)
- 6% better in compiling (multi-threaded, mostly integer, high IPC)
So the Ivy Bridge core improvements are pretty small, but they are measurable across very different kinds of workloads.
The core architecture improvements might be very modest, but that does not mean that the new Xeon E5-2600 V2 series will show insignificant improvements over the previous Xeon E5-2600. The largest improvement comes of course from the P1270 process: 22nm tri-gate (instead of 32nm planar) transistors. Discussing the actual quality of Intel process technology is beyond our expertise, but the results are tangible:
Focus on the purple text: within the same power envelope, the Ivy Bridge Xeon is capable of delivering 25% more performance while still consuming less power. In other words, the P1270 process allowed Intel to increase the number of cores and/or the clock speed significantly. This is easily demonstrated by looking at the high-end parts. The eight-core Xeon E5-2680 came with a TDP of 130W and ran at 2.7GHz. The E5-2697 v2 runs at the same clock speed and carries the same TDP label, but comes with four extra cores.
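The comparison above reduces to simple arithmetic. As a rough sketch using only the core counts, clock speed, and TDP quoted in this section, and treating the TDP label as a stand-in for actual power draw (an assumption, since TDP is not measured consumption):

```python
# Figures quoted in the text: both parts are rated at 130W TDP and 2.7GHz.
XEON_E5_2680 = {"cores": 8, "ghz": 2.7, "tdp_w": 130}
XEON_E5_2697_V2 = {"cores": 12, "ghz": 2.7, "tdp_w": 130}


def core_ghz_per_watt(cpu):
    """Aggregate core-GHz per TDP watt; a crude proxy, not a benchmark."""
    return cpu["cores"] * cpu["ghz"] / cpu["tdp_w"]


old = core_ghz_per_watt(XEON_E5_2680)
new = core_ghz_per_watt(XEON_E5_2697_V2)
gain = new / old - 1  # relative increase in core-GHz per TDP watt
print(f"{old:.3f} -> {new:.3f} core-GHz/W ({gain:.0%} more)")
```

By this crude metric the newer part offers 50% more aggregate core-GHz per TDP watt. Real performance scaling depends on the workload, so this is an upper bound on paper rather than a prediction.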
Virtualization Improvements
Each new generation of Xeon has reduced the number of cycles required for a VMexit or a VMentry, but another way to reduce hardware virtualization overhead is to avoid VMexits altogether. Interrupts are one of the major causes of VMexits (and thus also VMentries). With external interrupts, the guest OS has to check which interrupt has priority, and it does this by checking the APIC Task Priority Register (TPR). Intel already introduced an optimization for external interrupts in the Xeon 7400 series (back in 2008) with Intel VT FlexPriority. By making sure a virtual copy of the APIC TPR exists, the guest OS is capable of reading out that register without a VMexit to the hypervisor.
The Ivy Bridge core is now capable of eliminating the VMexits due to "internal" interrupts, interrupts that originate from within the guest OS (for example inter-vCPU interrupts and timers). To handle these, the virtual processor needs to access the APIC registers, which previously required a VMexit. Apparently, the current Virtual Machine Monitors do not handle this very well, as they need somewhere between 2000 and 7000 cycles per exit, which is high compared to other exits.
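To put the 2000 to 7000 cycle figure in perspective, here is a minimal back-of-the-envelope sketch. The exit rate is a purely hypothetical value chosen for illustration, not a measurement, and the 2.7GHz clock is the E5-2697 figure quoted earlier:

```python
def vmexit_overhead(exits_per_second, cycles_per_exit, clock_hz):
    """Fraction of one core's cycles spent handling VMexits."""
    return exits_per_second * cycles_per_exit / clock_hz


CLOCK_HZ = 2.7e9           # 2.7GHz, the clock speed quoted earlier
EXITS_PER_SECOND = 50_000  # hypothetical rate, not a measured value

for cycles in (2000, 7000):  # per-exit cost range quoted in the text
    share = vmexit_overhead(EXITS_PER_SECOND, cycles, CLOCK_HZ)
    print(f"{cycles} cycles/exit: {share:.1%} of one core")
```

Under these assumptions, the exits would cost somewhere between roughly 4% and 13% of one core's cycles, which shows why avoiding the exits entirely is attractive.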
The solution is Advanced Programmable Interrupt Controller virtualization (APICv). The new Xeon has microcode that can be read by the guest OS without any VMexit, though writing still causes an exit. Some tests in Intel's labs show up to 10% better performance.
Related to this, Sandy Bridge introduced support for large pages in VT-d (faster DMA for I/O, where the chipset translates virtual addresses to physical ones), but in fact it still split large pages into 4KB pages. Ivy Bridge fully supports large pages in VT-d.
Only Xen 4.3 (July 2013) and KVM 1.4 (Spring 2013) support these new features. Both VMware and Microsoft are working on it, but the latest documents about vSphere 5.5 do not mention anything about APICv. AMD is working on an alternative called Advanced Virtual Interrupt Controller (AVIC). We found AVIC in the AMD64 programmer's manual on page 504, but it is not clear which Opterons will support it (Warsaw?).
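On a Linux KVM host, one way to see whether APICv is in use is to read the kvm_intel module parameter. The sketch below assumes the parameter is exposed in sysfs as enable_apicv under /sys/module/kvm_intel/parameters; the path and its availability depend on the kernel version, so treat it as an assumption rather than a guaranteed interface. The helper name apicv_enabled is made up for this example:

```python
from pathlib import Path

# Assumed sysfs location of the kvm_intel module parameter on a Linux KVM host.
APICV_PARAM = Path("/sys/module/kvm_intel/parameters/enable_apicv")


def apicv_enabled(param_path=APICV_PARAM):
    """Return True/False from the module parameter, or None if it is absent."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None  # module not loaded, not a KVM host, or a different kernel
    return value in ("Y", "1")


state = apicv_enabled()
print({True: "APICv enabled",
       False: "APICv disabled",
       None: "APICv status unknown"}[state])
```

A result of "unknown" only means the parameter file could not be read; it says nothing about whether the CPU itself supports the feature.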
70 Comments
ShieTar - Tuesday, September 17, 2013 - link
Oops, you are perfectly right of course. In that case the 4960X actually gets the slightly better efficiency (12.08 is 0.28 per thread and GHz) than the dual 2697s (33.56 is 0.26 per thread and GHz), which makes perfect sense.
It also indicates the 4960X gets about 70% of the performance of a single 2697 at 38% of the cost. Then again, a 1270v3 gets you 50% of the performance at 10% of the price. So when talking farms (i.e. more than one system cooperating), four single-socket boards with 1270v3 will get you almost the power of a dual-socket board with 2697v2 (minus communication overhead), will likely use similar power demand (plus communication overhead), and save you $4400 in the process. Since you use 32 instead of 48 threads, but 4 installations instead of 1, software licensing cost may vary strongly in either direction.
Would be interesting to see this tested. Anybody willing to send AT four single-socket workstations?
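For readers who want to check the per-thread, per-GHz figures in the comment above, here is a small sketch. The two scores and the 48 threads at 2.7GHz come from the thread and the article. The 4960X thread count and clock (12 threads at 3.6GHz base) are assumptions not stated in the comment, as is the even split of the dual-socket score across two CPUs:

```python
def score_per_thread_ghz(score, threads, ghz):
    """Benchmark score normalized by thread count and clock speed."""
    return score / (threads * ghz)


# Assumed: Core i7-4960X with 12 threads at 3.6GHz base clock.
i7_4960x = score_per_thread_ghz(12.08, threads=12, ghz=3.6)
# Dual Xeon E5-2697 v2: 48 threads at 2.7GHz.
dual_2697 = score_per_thread_ghz(33.56, threads=48, ghz=2.7)

# Assumed: the dual-socket score splits evenly across the two CPUs.
share_of_single_2697 = 12.08 / (33.56 / 2)

print(f"4960X: {i7_4960x:.2f}, dual 2697 v2: {dual_2697:.2f}")
print(f"4960X vs a single 2697 v2: {share_of_single_2697:.0%}")
```

Under those assumptions the numbers come out at 0.28 and 0.26 per thread and GHz, and about 72% of a single 2697, in line with the figures quoted in the comment.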
hpvd - Tuesday, September 17, 2013 - link
yes - this would be really interesting. But you should use an Infiniband interconnect for good scaling. And without an expensive IB switch this could only be done with 3 machines...
DanNeely - Tuesday, September 17, 2013 - link
Won't the much higher price of a 4 socket board kill any CPU cost savings?
In any event, the 1270v3 is a unisocket chip so you'd need to do 4 boxes to cluster.
Poking around on Intel's site it looks like all 1xxx Xeons are uniprocessor, 2xxx is dual socket, 4xxx quad, 8xxx octo socket. But the 4xxx series is still on 2012 models and 8xxx on 2011 releases. The 4 way chips could just be a bit behind the 2way ones being reviewed now; but with the 8 way ones not updated in 2 years I'm wondering if they're being stealth discontinued due to minimal cases where 2 smaller servers aren't a better buy.
hpvd - Tuesday, September 17, 2013 - link
I think we are talking about 4 systems, each with one CPU, one mainboard, RAM, ... + network interface card
hpvd - Tuesday, September 17, 2013 - link
another advantage would be that these CPUs use the latest Haswell architecture: some workloads would greatly benefit from its AVX2 ...
Kevin G - Tuesday, September 17, 2013 - link
I'd fathom the bigger benefit of Haswell is found in the TSX and L4 cache for server workloads. The benefits of AVX2 would be exploited in more HPC centric workloads. Now if Intel would just release a socketed 1200v3 series CPU with L4 cache.
MrSpadge - Tuesday, September 17, 2013 - link
> Now if Intel would just release a socketed 1200v3 series CPU with L4 cache.
Agreed! And someone would test it at server loads. And BOINC. And if only Intel would release an overclockable Haswell with L4 which we can actually buy!
ShieTar - Tuesday, September 17, 2013 - link
A 4 socket board is expensive, but that's not the point I was making. A Xeon E5-4xxx is not likely to be less expensive than the E5-2xxx part anyways.
The question was specifically how four single socket boards (with 4 cores each, at 3.5GHz, and Haswell technology) would position themselves against a dual-socket board with 24 cores at 2.7GHz and Ivy Bridge EP tech. Admittedly, the 3 extra boards will add a bit of cost (~500$), and extra memory & communications cards, etc. can also add something depending on usage scenario. Then again, a single 4-core might get the work done with less than half the memory of a 12-core, so you might save a little there as well.
psyq321 - Tuesday, September 17, 2013 - link
E5-46xx v2 is coming in a few months, qualification samples are already available and for all intents and purposes it is ready - Intel just needs to ramp up production.
E7-88xx v2 is coming in Q1 2014, it is definitely not discontinued, and the platform (Brickland) will be compatible with both Ivy Bridge EX (E7-88xx v2 among others) and Haswell EX (E7-88xx v3 among others) CPUs and will also be able to take DDR4 RAM. It will require a different LGA 2011 socket, though.
EX platform will come with up to 15 cores in Ivy Bridge EX generation.
Kevin G - Tuesday, September 17, 2013 - link
The E5-46xx is simply a rebranded E5-26xx with official support for quad socket. The dies are going to be the same between both families. Intel is just doing extra validation for the quad socket market as the market tends to favor more reliability features as socket count goes up.
While not socket compatible, Brickland as a platform is expected to be used for the next (last?) Itanium chips.