The Xeon E5-2600: Dual Sandy Bridge for Servers
by Johan De Gelas on March 6, 2012 9:27 AM EST
Posted in IT Computing, Virtualization, Xeon, Opteron, Cloud Computing
The massive 416 mm² chip contains no fewer than 2263 million transistors. Each generation of Intel and AMD server CPUs seems to get a bit larger, as you can see below.
(Photo: the Xeon 5400, 5500/5600 and E5-2600 packages on top, the Opteron 2300/8300 and 6100/6200 below.)
So how does the new Xeon compare to the older Xeons and the latest Opterons? Let's take a look at the paper specs:
| | Xeon E5-2600 "Sandy Bridge EP" | Opteron 6200 "Interlagos" | Opteron 6100 "Magny-cours" | Xeon 5600 "Westmere" |
| Cores (Modules)/Threads | 8/16 | 8/16 | 12/12 | 6/12 |
| L1 Instruction | 8x 32 KB 4-way | 8x 64 KB 2-way | 12x 64 KB 2-way | 6x 32 KB 4-way |
| L1 Data | 8x 32 KB 8-way | 16x 16 KB 4-way | 12x 64 KB 2-way | 6x 32 KB 8-way |
| L2 Cache | 8x 256 KB | 8x 2MB | 12x 0.5MB | 6x 256 KB |
| L3 Cache | 20 MB | 2x 8MB | 2x 6MB | 12MB |
| Max. Memory Bandwidth (per socket) | 51.2 GB/s | 51.2 GB/s | 42.6 GB/s | 32 GB/s |
| IMC Clock Speed | = core speed | 2GHz | 1.8GHz | 2GHz |
| Interconnect | 2x QPI 2.0 (8 GT/s) | 4x HT 3.1 (6.4 GT/s) | 4x HT 3.1 (6.4 GT/s) | 2x QPI (4.8-6.4 GT/s) |
| Transistors (Billion) | 2.26 | 2x 1.2 | 2x 0.904 | 1.17 |
| Die Size (mm²) | 416 | 2x 315 | 2x 346 | 248 |
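The bandwidth row follows from a simple product: memory channels per socket × transfer rate × 8 bytes per 64-bit transfer. The sketch below assumes the usual maximum configurations (four DDR3-1600 channels for the Xeon E5 and Opteron 6200, four DDR3-1333 channels for the Opteron 6100, three DDR3-1333 channels for the Xeon 5600); those channel counts and speeds are not in the table and are our assumption.

```python
# Peak DRAM bandwidth per socket = channels x transfer rate (MT/s) x 8 bytes.
# Channel counts and DDR3 speeds are ASSUMED maximum configurations.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0

platforms = {
    "Xeon E5-2600 (4x DDR3-1600)": (4, 1600),
    "Opteron 6200 (4x DDR3-1600)": (4, 1600),
    "Opteron 6100 (4x DDR3-1333)": (4, 1333.33),
    "Xeon 5600 (3x DDR3-1333)": (3, 1333.33),
}

for name, (channels, mts) in platforms.items():
    print(f"{name}: {peak_bandwidth_gbs(channels, mts):.1f} GB/s")
```

The Opteron 6100 works out to about 42.7 GB/s, which the table truncates to 42.6 GB/s.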
The new Xeon comes with a huge die, and with its ring interconnect and improved RAS, it starts to look more like a successor to the Westmere-EX than to the Westmere-EP Xeon. In fact, the ring of the Xeon E5 is even more advanced: the PCIe agent, the PCU (power control unit) and the IMC sit on the same ring as the eight cores.
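A toy model gives a feel for what hanging more agents on one ring means: on a bidirectional ring, traffic can take the shorter way around, so the average hop count grows with the number of ring stops. The stop counts below are hypothetical and uniform all-to-all traffic is an assumption; this is not a model of Intel's actual ring protocol or stop layout.

```python
def average_ring_hops(stops: int) -> float:
    """Mean shortest-path hop count from one stop to every other stop
    on a bidirectional ring (by symmetry, also the all-pairs mean)."""
    if stops < 2:
        raise ValueError("a ring needs at least two stops")
    total = 0
    for distance in range(1, stops):
        # Traffic may travel either direction, so take the shorter way around.
        total += min(distance, stops - distance)
    return total / (stops - 1)

# Hypothetical stop counts: cores only, then cores plus extra agents.
for n in (8, 10, 11):
    print(n, round(average_ring_hops(n), 2))
```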
The massive die, the two extra cores, the integration of the PCIe controller and the lack of competition at the high end have made it easier for Intel to justify a price increase. The Sandy Bridge EP is somewhat more expensive than its predecessor, as you can see in the table below. The first clock speed listed is the regular (base) clock, the second the turbo clock with all cores active (the most realistic one), and the last the maximum turbo clock.
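Since every clock entry in the tables packs three values into one field, a small parser makes the convention explicit. This is a sketch; the `Clocks` type and `parse_clocks` function are hypothetical helpers, and a single value (as listed for the turbo-less E5607 and E5-2609) is treated as base = all-core turbo = max turbo.

```python
from typing import NamedTuple

class Clocks(NamedTuple):
    base: float            # regular clock, GHz
    all_core_turbo: float  # turbo with all cores active, GHz
    max_turbo: float       # maximum turbo clock, GHz

def parse_clocks(spec: str) -> Clocks:
    """Parse 'base/all-core turbo/max turbo' strings such as '2.9/3.3/3.8'.

    A single value such as '2.4' is read as a SKU without turbo.
    """
    parts = [float(p) for p in spec.split("/")]
    if len(parts) == 1:
        return Clocks(parts[0], parts[0], parts[0])
    if len(parts) == 3:
        return Clocks(*parts)
    raise ValueError(f"unexpected clock spec: {spec!r}")

print(parse_clocks("2.9/3.3/3.8"))  # Xeon E5-2690
print(parse_clocks("2.4"))          # Xeon E5-2609, no turbo
```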
Intel new vs. old 2-socket SKU Comparison
| Xeon 5600 | Cores/Threads | TDP | Clock (GHz) | Price | Xeon E5 | Cores/Threads | TDP | Clock (GHz) | Price |
| High Performance | | | | | High Performance | | | | |
| | | | | | 2690 | 8/16 | 135W | 2.9/3.3/3.8 | $2057 |
| X5690 | 6/12 | 130W | 3.46/3.6/3.73 | $1663 | 2680 | 8/16 | 130W | 2.7/3.1/3.5 | $1723 |
| | | | | | 2670 | 8/16 | 115W | 2.6/3/3.3 | $1552 |
| | | | | | 2665 | 8/16 | 115W | 2.4/2.8/3.1 | $1440 |
| X5675 | 6/12 | 95W | 3.06/3.33/3.46 | $1440 | | | | | |
| X5660 | 6/12 | 95W | 2.8/3.06/3.2 | $1219 | 2660 | 8/16 | 95W | 2.2/2.6/3.0 | $1329 |
| X5650 | 6/12 | 95W | 2.66/2.93/3.06 | $996 | 2650 | 8/16 | 95W | 2/2.4/2.8 | $1107 |
| Midrange | | | | | Midrange | | | | |
| E5649 | 6/12 | 80W | 2.53/2.66/2.8 | $774 | 2640 | 6/12 | 95W | 2.5/2.5/3 | $885 |
| | | | | | 2630 | 6/12 | 95W | 2.3/2.3/2.8 | $612 |
| E5645 | 6/12 | 80W | 2.4/2.53/2.66 | $551 | | | | | |
| | | | | | 2620 | 6/12 | 95W | 2/2/2.5 | $406 |
| E5620 | 4/8 | 80W | 2.4/2.53/2.66 | $387 | | | | | |
| High clock / budget | | | | | High clock / budget | | | | |
| X5647 | 4/8 | 130W | 2.93/3.06/3.2 | $774 | 2643 | 4/8 | 130W | 3.3/3.3/3.5 | $885 |
| E5630 | 4/8 | 80W | 2.53/2.66/2.8 | $551 | | | | | |
| E5607 | 4/4 | 80W | 2.26 | $276 | 2609 | 4/4 | 80W | 2.4 | $294 |
| Power Optimized | | | | | Power Optimized | | | | |
| L5640 | 6/12 | 60W | 2.26/2.4/2.66 | $996 | 2650L | 8/16 | 70W | 1.8/2/2.3 | $1107 |
| L5630 | 4/8 | 40W | 2.13/2.26/2.4 | $551 | 2630L | 8/16 | 60W | 2/2/2.5 | $662 |
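To put "somewhat more expensive" in numbers, the sketch below takes the SKUs that share a row in the table and computes the price change and the price per core. Pairing by row is our reading of the table, not Intel's official successor mapping.

```python
# Rows from the table above that list both an old and a new SKU:
# (old SKU, old cores, old price $, new SKU, new cores, new price $)
pairs = [
    ("X5690", 6, 1663, "E5-2680", 8, 1723),
    ("X5660", 6, 1219, "E5-2660", 8, 1329),
    ("X5650", 6, 996, "E5-2650", 8, 1107),
    ("E5649", 6, 774, "E5-2640", 6, 885),
    ("E5607", 4, 276, "E5-2609", 4, 294),
]

def price_change_pct(old_price: float, new_price: float) -> float:
    """Relative price change from the old SKU to the new one, in percent."""
    return (new_price - old_price) / old_price * 100.0

for old, old_cores, old_price, new, new_cores, new_price in pairs:
    print(f"{old} -> {new}: price {price_change_pct(old_price, new_price):+.1f}%, "
          f"per core ${old_price / old_cores:.0f} -> ${new_price / new_cores:.0f}")
```

By this measure the eight-core parts cost 4-11% more than the six-core parts next to them, so their price per core actually drops, while the six-core E5-2640 is about 14% more expensive than the E5649.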
The Xeon E5-2690's somewhat unusual TDP (135W) is easy to explain. Intel's engineers noticed that with a very small TDP increase (+5W) they could raise the clock of the best SKU by another 200 MHz, from 2.7 GHz (130W) to 2.9 GHz. The E5-2690 was more or less a safeguard in case the Interlagos Opteron turned out to be a real "Bulldozer". As the Opteron could not meet those expectations, the high performance of the 135W chip allows Intel to ask more than $2000 for its best Xeon EP, which is quite a bit more than the best Xeon EP used to sell for so far ($1500-1600).
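The trade-off described above is easy to quantify with the base clocks, TDPs and prices from the table. A minimal sketch:

```python
def pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0

# Values taken from the Intel SKU table (base clock, TDP, price).
e5_2680 = {"base_ghz": 2.7, "tdp_w": 130, "price_usd": 1723}
e5_2690 = {"base_ghz": 2.9, "tdp_w": 135, "price_usd": 2057}

for key in ("base_ghz", "tdp_w", "price_usd"):
    print(f"{key}: {pct(e5_2680[key], e5_2690[key]):+.1f}%")
```

That is roughly 7% more base clock for about 4% more TDP, at a price about 19% higher.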
Since the new Xeon has two extra cores and integrates the I/O hub (IOH), it is understandable that its TDP values are a bit higher than those of the older Xeon.
How do these new Xeon SKUs compare to the Opteron? See below.
AMD vs. Intel 2-socket SKU Comparison
| Xeon E5 | Cores/Threads | TDP | Clock (GHz) | Price | Opteron | Modules/Integer cores | TDP | Clock (GHz) | Price |
| High Performance | | | | | High Performance | | | | |
| 2665 | 8/16 | 115W | 2.4/2.8/3.1 | $1440 | | | | | |
| 2650 | 8/16 | 95W | 2/2.4/2.8 | $1107 | 6282 SE | 8/16 | 140W | 2.6/3.0/3.3 | $1019 |
| Midrange | | | | | Midrange | | | | |
| 2640 | 6/12 | 95W | 2.5/2.5/3 | $885 | 6276 | 8/16 | 115W | 2.3/2.6/3.2 | $788 |
| 2630 | 6/12 | 95W | 2.3/2.3/2.8 | $612 | 6274 | 8/16 | 115W | 2.2/2.5/3.1 | $639 |
| | | | | | 6272 | 8/16 | 115W | 2.1/2.4/3.0 | $523 |
| 2620 | 6/12 | 95W | 2/2/2.5 | $406 | 6238 | 6/12 | 115W | 2.6/2.9/3.2 | $455 |
| | | | | | 6234 | 6/12 | 115W | 2.4/2.7/3.0 | $377 |
| High clock / budget | | | | | High clock / budget | | | | |
| 2643 | 4/8 | 130W | 3.3/3.3/3.5 | $885 | | | | | |
| | | | | | 6220 | 4/8 | 115W | 3.0/3.3/3.6 | $455 |
| 2609 | 4/4 | 80W | 2.4 | $294 | 6212 | 4/8 | 115W | 2.6/2.9/3.2 | $266 |
| Power Optimized | | | | | Power Optimized | | | | |
| 2630L | 8/16 | 60W | 2/2/2.5 | $662 | 6262HE | 8/16 | 85W | 1.6/2.1/2.9 | $523 |
Let's start with the midrange, as the competition is fiercest there and these SKUs are among the most popular on the market. Based on the paper specs, AMD's 6276 and 6274 and Intel's 2640 and 2630 are in a neck-and-neck race. AMD offers 16 smaller integer clusters, while Intel offers six heavier, slightly higher clocked cores with SMT. While we did not receive a Xeon E5-2630 for benchmarking, we were able to quickly simulate one by disabling two cores of our Xeon E5-2660, which gave us a six-core processor at 2.2 GHz with 20 MB of L3 cache. This pseudo-2630 should perform very similarly to the real Xeon E5-2630, which is clocked 4.5% higher but has 5 MB less L3 cache.
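The 4.5% and 5 MB figures are easy to check. The sketch compares base clocks only, as the text does; whether Turbo Boost was left enabled on the modified chip is not stated, so turbo behavior is deliberately left out. The 15 MB L3 of the real E5-2630 follows from the "5 MB less" in the text.

```python
# Base clocks and L3 sizes as quoted in the text.
pseudo_2630 = {"cores": 6, "base_ghz": 2.2, "l3_mb": 20}  # E5-2660 with two cores disabled
real_2630 = {"cores": 6, "base_ghz": 2.3, "l3_mb": 15}

clock_gain = (real_2630["base_ghz"] / pseudo_2630["base_ghz"] - 1.0) * 100.0
l3_loss = pseudo_2630["l3_mb"] - real_2630["l3_mb"]
print(f"real E5-2630: {clock_gain:.1f}% higher base clock, {l3_loss} MB less L3")
```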
Meanwhile, in the high performance segment we'll be comparing our six-core 2660 with the Opteron 6276. The CPUs in this comparison aren't in the same price bracket, but as the AMD platform is typically a bit cheaper, the 2660 and the Opteron 6276 end up having similar total platform costs. For a more straightforward comparison based solely on CPU prices, the 2660's closest competitor would be the Opteron 6274. We don't have one of those on hand, but you can get a pretty good idea of how it would compare by knocking 4% off the performance of the 6276.
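The 4% rule of thumb corresponds to scaling by the base clock ratio of the two Opterons (2.2 GHz vs. 2.3 GHz). A minimal sketch, assuming performance scales linearly with clock, which overstates the loss for memory-bound workloads:

```python
def scale_by_clock(score: float, from_ghz: float, to_ghz: float) -> float:
    """Naive estimate: assume performance scales linearly with clock speed."""
    return score * to_ghz / from_ghz

# Opteron 6276 (2.3 GHz base) -> Opteron 6274 (2.2 GHz base)
factor = scale_by_clock(1.0, 2.3, 2.2)
print(f"6274 estimated at {factor:.3f}x the 6276 ({(1 - factor) * 100:.1f}% slower)")
```

Using the all-core turbo clocks instead (2.5 vs. 2.6 GHz) gives about 3.8%, so "roughly 4%" holds either way.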
Finally, in the "Power Optimized" market there seems to be little contest. Intel's chip is a bit more expensive, but it offers a much lower TDP, just as many threads, and a higher base clock. Considering that the Intel chip also integrates the PCIe controller, it looks like Intel will have no trouble winning this battle by a landslide. Fortunately for AMD, this review is mostly about the more popular midrange market.
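The power-optimized comparison in numbers, using the base clocks, TDPs and prices from the table. Base clocks are compared because the all-core turbo (2.0 vs. 2.1 GHz) and maximum turbo (2.5 vs. 2.9 GHz) columns actually favor the Opteron 6262HE.

```python
# Values from the AMD vs. Intel table (Power Optimized rows).
xeon_2630l = {"tdp_w": 60, "base_ghz": 2.0, "price_usd": 662}
opteron_6262he = {"tdp_w": 85, "base_ghz": 1.6, "price_usd": 523}

def relative(intel: float, amd: float) -> float:
    """Intel value relative to the AMD value, as a percentage difference."""
    return (intel / amd - 1.0) * 100.0

for key in ("tdp_w", "base_ghz", "price_usd"):
    diff = relative(xeon_2630l[key], opteron_6262he[key])
    print(f"{key}: Intel {diff:+.0f}% vs. AMD")
```

So the Xeon offers a 29% lower TDP and a 25% higher base clock for a 27% higher price.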
81 Comments
JohanAnandtech - Wednesday, March 7, 2012
Argh. You are absolutely right. I reversed all divisions. I am fixing this as we type. Luckily this does not alter the conclusion: LS-DYNA does not scale very well with clock speed.
alpha754293 - Wednesday, March 7, 2012
I think I might have an answer for you as to why it might not scale well with clock speed.
When you start a multiprocessor LS-DYNA run, it goes through a stage where it decomposes the problem using a method called recursive coordinate bisection (RCB).
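For readers unfamiliar with RCB, the idea fits in a few lines: split the set of nodes at the median along one coordinate axis, then recurse on both halves until there is one subdomain per core. The sketch below is a generic toy version (splitting along the widest axis is our choice, and the `rcb` function is hypothetical), not how LS-DYNA implements it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def rcb(points: Sequence[Point], ids: List[int], levels: int) -> List[List[int]]:
    """Toy recursive coordinate bisection (not LS-DYNA's implementation).

    Splits the point indices in `ids` at the median of the widest coordinate
    axis, then recurses on each half, yielding up to 2**levels groups.
    """
    if levels == 0 or len(ids) < 2:
        return [list(ids)]

    def extent(axis: int) -> float:
        values = [points[i][axis] for i in ids]
        return max(values) - min(values)

    axis = max(range(len(points[ids[0]])), key=extent)
    ordered = sorted(ids, key=lambda i: points[i][axis])
    half = len(ordered) // 2
    return (rcb(points, ordered[:half], levels - 1)
            + rcb(points, ordered[half:], levels - 1))

# Usage: 8 nodes on a 4x2 grid, split into 2**2 = 4 subdomains.
grid = [(float(x), float(y)) for x in range(4) for y in range(2)]
parts = rcb(grid, list(range(len(grid))), levels=2)
print(parts)
```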
This decomposition phase runs every time you start a run, and it runs on only a single processor core. So suppose you have a dual-socket server whose processors are hitting, say, 4 GHz. That can potentially be faster than a four-socket server whose processors each run at only 2.4 GHz.
In the first case, you have a small number of really fast cores, so the domain is decomposed very quickly. In the latter, you have a large number of much slower cores, so the decomposition happens slowly, but the machine MIGHT be able to solve the rest of the problem slightly faster (to make up for the difference) just because you're throwing more hardware at it.
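The trade-off described here is essentially Amdahl's law. A toy model, with hypothetical work amounts and an assumed eight cores per socket, shows how the serial decomposition can flip the result between the fast dual-socket and the wide quad-socket machine:

```python
def run_time(serial_work: float, parallel_work: float, cores: int, ghz: float) -> float:
    """Toy model: serial phase on one core, parallel phase spread over all cores.

    Work is in hypothetical giga-cycles, so time in seconds = giga-cycles / GHz.
    """
    return serial_work / ghz + parallel_work / (cores * ghz)

# Same (hypothetical) parallel work, two different amounts of serial work.
for serial in (10.0, 100.0):
    fast_2s = run_time(serial, 1000.0, cores=16, ghz=4.0)  # 2 sockets x 8 cores @ 4 GHz
    wide_4s = run_time(serial, 1000.0, cores=32, ghz=2.4)  # 4 sockets x 8 cores @ 2.4 GHz
    print(f"serial={serial:>5}: 2S {fast_2s:.1f} s, 4S {wide_4s:.1f} s")
```

With little serial work the quad-socket box wins (about 17.2 s vs. 18.1 s); with ten times more serial work the dual-socket box wins (about 40.6 s vs. 54.7 s).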
Here's where you can do a little more experimenting if you like.
Using the pfile (command line option/flag 'p=file'), not only can you control the decomposition method, but you can also tell it to write the decomposition to a file.
So, had you had more time, what I would probably have done is write out the decompositions for all of the various permutations you were going to run (n cores, m number of files).
Then, when you start a run, instead of decomposing the problem over and over again, you just reuse the decomposition that was already computed once. That way you would be testing PURELY the solver part of the run, rather than the run from beginning to end. (That isn't to say that the results you've got are bad; it's good data.) But that should help take more variables out of the equation when it comes to why it doesn't scale well with clock speed (it should).
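The "decompose once, reuse many times" workflow is a plain caching pattern. The sketch below does not use LS-DYNA's pfile format; the JSON file, the `load_or_build` helper and the per-configuration file name are hypothetical, and only illustrate why the decomposition cost disappears from every run after the first.

```python
import json
import os
import tempfile
from typing import Callable, List

Decomposition = List[List[int]]

def load_or_build(path: str, build: Callable[[], Decomposition]) -> Decomposition:
    """Return the decomposition saved at `path`, computing and saving it only once."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    parts = build()
    with open(path, "w") as f:
        json.dump(parts, f)
    return parts

build_calls = []

def fake_decompose() -> Decomposition:
    """Stand-in for an expensive decomposition into 4 subdomains."""
    build_calls.append(1)
    return [[0, 1], [2, 3], [4, 5], [6, 7]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "decomposition_4cores.json")  # hypothetical name
    first = load_or_build(path, fake_decompose)   # computes and writes the file
    second = load_or_build(path, fake_decompose)  # reads the file, no rebuild
    print(first == second, len(build_calls))
```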
IntelUser2000 - Tuesday, March 6, 2012
Please refrain from creating flamebait in your posts. Your post is almost like spam; there is almost no useful information in it. If you are going to love one side, don't hate the other.
Alexko - Tuesday, March 6, 2012
It's not "like spam", it's just plain spam at this point. A little ban + mass delete combo seems to be in order, just to clean up this thread, and probably others.
ultimav - Wednesday, March 7, 2012
My troll meter is reading off the charts with this guy. Reading between the lines, he's actually a hardcore AMD fan trying to come across as the Intel version of Sharikou to paint Intel fans in a bad light. Pretty obvious, actually.
JohanAnandtech - Wednesday, March 7, 2012
We had to mass delete his posts, as they did not contain any useful info and were full of insults. The signal-to-noise ratio has been good in recent years, so we must keep it that way.
Inteluser2000, Alexko, Ultimav, tipoo: thx for helping to keep the tone civil here. Appreciate it.
- Johan.
tipoo - Wednesday, March 7, 2012
And thank you for removing that stuff.
tipoo - Tuesday, March 6, 2012
We get it. Don't spam the whole place with the same post.
tipoo - Tuesday, March 6, 2012
No, he's just a rational person. I don't care which company you like; if you say the same thing 10 times in one article, someone's sure to get annoyed, and with justification.
MySchizoBuddy - Tuesday, March 6, 2012
I'm again requesting that when you do the benchmarks, you please include a performance-per-watt metric, along with stress testing by running folding@home for a straight 48 hours.