The Xeon E5-2600: Dual Sandy Bridge for Servers
by Johan De Gelas on March 6, 2012 9:27 AM EST
Posted in: IT Computing, Virtualization, Xeon, Opteron, Cloud Computing
Intel's Sandy Bridge architecture was introduced to desktop users more than a year ago. Server parts, however, have been much slower to arrive, as it has taken Intel that long to transpose this new engine into a Xeon processor. Although the core architecture is the same, the system architecture is significantly different from that of the LGA-1155 CPUs, making this CPU quite a challenge, even for Intel. Having completed that work late last year, Intel first introduced the resulting design as the six-core high-end Sandy Bridge-E desktop CPU, and has since been preparing SNB-E for use in Xeon processors. This has taken a few more months, but the wait for Xeon users is finally over: today Intel is launching its first SNB-E based Xeons.
Compared to its predecessor, the Xeon X5600, the Xeon E5-2600 offers a number of improvements:
A thoroughly improved core, as described in Anand's article. For example, the µop cache lowers the pressure on the decoding stages and lowers power consumption, killing two birds with one stone. Other core improvements include an improved branch prediction unit and a more efficient Out-of-Order backend with larger buffers.
A vastly improved Turbo 2.0. The CPU can briefly go beyond the TDP limit, and when it returns to the TDP limit, it can sustain a higher "steady-state" clockspeed. According to Intel, enabling turbo allows the Xeon E5 to perform 14% better in the SAP S&D 2-tier test. This compares well with the Turbo inside the Xeon 5600, which could only boost performance by 4% in the same SAP benchmark.
Support for AVX Instructions combined with doubling the load bandwidth should allow the Xeon to double the peak floating point performance compared to the Xeon "Westmere" 5600.
A bi-directional 32 byte ring interconnect that connects the 8 cores, the L3-cache, the QPI agent and the integrated memory controller. The ring replaces the individual wires from each core to the L3-cache. One of the advantages is that the wiring to the L3-cache can be simplified and it is easier to make the bandwidth scale with the number of cores. The disadvantage is that the latency is variable: it depends on how many hops a certain piece of data inside the L3-cache must cross before it ends up at the right core.
A faster QPI: revision 1.1, which delivers up to 8 GT/s instead of 6.4 GT/s (Westmere).
Lower latency to PCIe devices. Intel integrated a PCIe 3.0 I/O subsystem inside the die, which sits on the same bi-directional 32 byte ring as the cores. PCIe 3.0 runs at 8 GT/s (PCIe 2.0: 5 GT/s), and its encoding has less overhead. As a result, PCIe 3.0 can deliver up to about 1 GB/s per lane in each direction, which is roughly twice as much as PCIe 2.0.
Integrating the I/O on the die lowered PCIe latency by 25% on average, according to Intel. If you only access the local memory, Intel measured 32% lower read latency.
Not only is the access latency to PCIe I/O devices significantly lower, but Intel's Data Direct I/O Technology also allows PCIe NICs to read and write directly to the L3-cache instead of to main memory. In extremely bandwidth constrained situations (using 4 InfiniBand controllers or similar), this lowers power consumption and reduces latency by another 18%, which is a boon to HPC users with 10G Ethernet or InfiniBand NICs.
The new Xeon also supports faster DDR3-1600 memory: up to 2 DIMMs per channel can run at 1600 MHz.
Last but certainly not least: 2 additional cores and up to 66% more L3 cache (20 MB instead of 12 MB). Even with 8 cores and a PCIe agent (40 lanes), the Xeon E5 still runs at 2.2 GHz within a 95W TDP power envelope. Pretty impressive when compared with both the Opteron 6200 and Xeon 5600.
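The PCIe and L3 figures in the list above are easy to sanity-check with a bit of arithmetic. What follows is only a back-of-the-envelope sketch in Python. It assumes the line encodings from the PCIe specifications (8b/10b for PCIe 2.0, 128b/130b for PCIe 3.0), which are not spelled out above, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the PCIe and L3 numbers quoted above.
# Assumption (not stated in the article): PCIe 2.0 uses 8b/10b encoding,
# PCIe 3.0 uses 128b/130b encoding.

def lane_bandwidth_mb_s(transfer_rate_gt_s, payload_bits, encoded_bits):
    """Usable bandwidth of one lane in one direction, in MB/s."""
    bits_per_second = transfer_rate_gt_s * 1e9 * payload_bits / encoded_bits
    return bits_per_second / 8 / 1e6


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


pcie2 = lane_bandwidth_mb_s(5.0, 8, 10)      # PCIe 2.0: 5 GT/s, 20% overhead
pcie3 = lane_bandwidth_mb_s(8.0, 128, 130)   # PCIe 3.0: 8 GT/s, ~1.5% overhead

print(f"PCIe 2.0 per lane: {pcie2:.0f} MB/s per direction")
print(f"PCIe 3.0 per lane: {pcie3:.0f} MB/s per direction")
print(f"PCIe 3.0 vs 2.0:   {pcie3 / pcie2:.2f}x")

# All 40 lanes of one Xeon E5, one direction, in GB/s.
print(f"40 lanes:          {40 * pcie3 / 1000:.1f} GB/s per direction")

# L3 cache: 20 MB versus 12 MB.
print(f"L3 increase:       {percent_increase(12, 20):.1f}%")
```

Under those assumptions a PCIe 3.0 lane comes out just under 1 GB/s per direction, a little less than double a PCIe 2.0 lane, and the L3 increase works out to two thirds, in line with the rounded figures above.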
81 Comments
silverblue - Thursday, March 8, 2012 - link
You've put that Interlagos has 4x2MB L2, but that would only be true for Valencia; Interlagos is 8x2MB.
aranyagag - Thursday, March 8, 2012 - link
you forgot the E5-2687W with a 150w tdp and higher speeds
colonelclaw - Friday, March 9, 2012 - link
Hi There,
Thanks for an excellent article. With regards to the rendering benchmarks, would you consider using VRay as a rendering engine? It's fast becoming industry standard, is compatible with all the big hitters (Max, Maya, Softimage etc), is cross platform, and I believe, is incredibly well coded to scale with cores.
It's also incredibly popular, not something you could say about iRay right now.
Slik - Saturday, March 10, 2012 - link
Would be nice if some game benchmark was included as well.
colonelclaw - Wednesday, March 14, 2012 - link
Bloody hell those chips look good, and don't Intel know it; those prices make me wince.
Having waited what seems like forever, I was thrilled to see the Xeon E5s finally available, right up until I did some quick maths and figured out that for my business to buy a new 2U Twin squared rendernode with 16/32 cores per node will cost us around £10,000. Still the thing is, now that those chips are available, next time we buy kit we can't afford not to choose them.
Skouperd - Tuesday, March 20, 2012 - link
Great article... but can it run crysis?
Seriously, what will happen if you plug in some high end graphics card in that machine, how will that compare from a gaming perspective to say an LGA2011 cpu?
;-)
fudd101 - Wednesday, April 4, 2012 - link
From the 'article' .....'The Opteron might also have a role in the low end, price sensitive HPC market, where it still performs very well. It won't have much of chance in the high end clustered one as Intel has the faster and more power efficient PCIe interface'
Well, if that's the case, why exactly would AMD be scoring so many design wins with Interlagos. Including this one ...
http://www.pcmag.com/article2/0,2817,2394515,00.as...
http://www.eweek.com/c/a/IT-Infrastructure/Cray-Ti...
U think those guys at Cray were going for low performance ? In fact, seems like AMD has been rather cleaning up in the HPC market since the arrival of Interlagos. And the markets have picked up on it, AMD stock is thru the roof since the start of the year. Or just see how many Intel processors occupy the top 10 supercomputers on the planet. Nuff said ...
jaskhoo - Wednesday, July 11, 2012 - link
Hi, a bit blur here and would like to know if there's anyone who could enlighten a bit.
I'm looking to purchase a new server to work with an SQL 2012 4 core, the initial preference was for an E5620 which is now an outdated model but I can't go for higher E5-xxx models as all are 6 core and will affect the 4 core SQL licensing. I'm not running a huge database but would like to know if there are any serious performance difference between the two processor. Appreciate it.