A New Architecture

This is a first. Usually when we go into these performance previews we're aware of the architecture we're reviewing; all we're missing are the intimate details of how well it performs. This was the case for Conroe, Nehalem and Lynnfield (we sat Westmere out until final hardware was ready). Sandy Bridge is a different story entirely.

Here’s what we do know.

Sandy Bridge is a 32nm CPU with an on-die GPU. While Clarkdale/Arrandale have a 45nm GPU on package, Sandy Bridge moves the GPU onto the die itself. Not only is the GPU on die, it also shares the CPU's L3 cache.

There are two different GPU configurations, referred to internally as 1 core or 2 cores. A single GPU core in this case refers to 6 EUs (execution units), Intel's rough equivalent of what NVIDIA calls CUDA cores. Sandy Bridge will be offered in configurations with 6 or 12 EUs.

While the numbers may not sound like much, the Sandy Bridge GPU is significantly redesigned compared to what's out currently. Intel already announced a ~2x performance improvement compared to Clarkdale/Arrandale, and after testing Sandy Bridge I can say Intel has achieved at least that.

Both the CPU and GPU on SB will be able to turbo independently of one another. If you’re playing a game that uses more GPU than CPU, the CPU may run at stock speed (or lower) and the GPU can use the additional thermal headroom to clock up. The same applies in reverse if you’re running something computationally intensive.
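The shared-headroom idea can be made concrete with a toy model. This is purely illustrative: the package budget, the stock and turbo wattages, and the "only a fully busy unit clocks up" rule are all assumptions for this sketch, not Intel's turbo algorithm, which hasn't been disclosed.

```python
def allocate_turbo(budget_w, cpu_util, gpu_util,
                   stock_w=(35.0, 15.0), turbo_max_w=(60.0, 30.0)):
    """Return (cpu_watts, gpu_watts) under a hypothetical shared package budget.

    cpu_util / gpu_util are 0..1. All wattages are made-up illustration values.
    """
    utils = (cpu_util, gpu_util)
    # Power each unit draws at stock clocks, scaled by how busy it is.
    draw = [u * s for u, s in zip(utils, stock_w)]
    # Headroom left in the package budget (this toy never throttles).
    headroom = max(0.0, budget_w - sum(draw))
    # Only a fully busy unit gains anything from clocking up.
    wants = [cap - d if u >= 1.0 else 0.0
             for u, d, cap in zip(utils, draw, turbo_max_w)]
    total_want = sum(wants)
    if total_want > 0:
        # Split the headroom in proportion to how far each unit can still climb.
        share = min(1.0, headroom / total_want)
        draw = [d + w * share for d, w in zip(draw, wants)]
    return draw[0], draw[1]
```

In a GPU-bound game, `allocate_turbo(95.0, 0.5, 1.0)` returns `(17.5, 30.0)`: the half-busy CPU stays at its stock draw while the GPU climbs to its ceiling. Swapping the utilizations gives the reverse case, `(60.0, 7.5)`.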

On the CPU side, little is known about the execution pipeline. Sandy Bridge adds support for AVX instructions, just like Bulldozer. The CPU will also have dedicated video transcoding hardware to fend off advances by GPUs in the transcoding space.

Caches remain mostly unchanged. The L1 cache is still 64KB (32KB instruction + 32KB data) and the L2 is still a low-latency 256KB. I measured their latencies at 4 and 10 cycles respectively, the same as before. The L3 cache has changed, however.

Only the Core i7 2600 has an 8MB L3 cache; the 2400 and 2500 have a 6MB L3 and the 2100 has a 3MB L3. The L3 size should matter more with Sandy Bridge because it's shared by the GPU whenever the integrated graphics is active. I am a bit puzzled why Intel strayed from the steadfast 2MB of L3 per core that Nehalem's lead architect wanted to commit to. I guess I'll find out more from him at IDF :)
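The per-core arithmetic behind that puzzlement is quick to check. The L3 sizes below come from the paragraph above; the core counts are not stated in the article and are an assumption here (quad-core i7 2600, i5 2400 and i5 2500; dual-core i3 2100).

```python
# L3 sizes from the article; core counts are ASSUMED (not given in the text).
L3_MB = {"i7 2600": 8, "i5 2400": 6, "i5 2500": 6, "i3 2100": 3}
CORES = {"i7 2600": 4, "i5 2400": 4, "i5 2500": 4, "i3 2100": 2}


def l3_per_core_mb(sku):
    """Shared L3 capacity divided evenly across the physical cores."""
    return L3_MB[sku] / CORES[sku]


for sku in L3_MB:
    print(f"{sku}: {l3_per_core_mb(sku):.1f} MB of L3 per core")
# i7 2600: 2.0 MB of L3 per core
# i5 2400: 1.5 MB of L3 per core
# i5 2500: 1.5 MB of L3 per core
# i3 2100: 1.5 MB of L3 per core
```

Under these assumptions only the 2600 keeps the 2MB-per-core ratio, and since the GPU also draws on the L3 when it's active, the effective per-core share can be lower still.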

The other change appears to be either L3 cache latency or prefetcher aggressiveness, or both. Although most third-party tools don't accurately measure L3 latency, they can usually give you a rough idea of latency changes between similar architectures. In this case I turned to cachemem, which reported Sandy Bridge's L3 latency as 26 cycles, down from ~35 in Lynnfield (Lynnfield's actual L3 latency is 42 clocks).

As I mentioned before, I’m not sure whether this is the result of a lower latency L3 cache or more aggressive prefetchers, or both. I had limited time with the system and was unfortunately unable to do much more.
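Latency tools commonly rely on a dependent pointer chase, although we don't know cachemem's internals. The sketch below shows the access pattern: a random single cycle (Sattolo's algorithm) that a stride prefetcher can't predict, next to a sequential chain that it can. Comparing the two is one way to separate raw latency from prefetching. Interpreter overhead in Python dwarfs a 4 to 42 cycle cache hit, so treat this as an illustration of the method; a real tool does it in C or assembly and sweeps the working-set size across L1, L2, L3 and memory. The 3.1 GHz clock in the usage block is a hypothetical value.

```python
import random
import time


def sattolo_cycle(n, seed=0):
    """Build nxt[] so that i -> nxt[i] is one random cycle through all n slots.

    Sattolo's algorithm: like a Fisher-Yates shuffle, but j is drawn from
    [0, i) instead of [0, i], which guarantees a single cycle.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def sequential_chain(n):
    """i -> i + 1 (wrapping): a fixed stride that hardware prefetchers can predict."""
    return [(i + 1) % n for i in range(n)]


def chase(nxt, steps):
    """Follow the chain; each hop needs the previous hop's result, so the
    loads cannot overlap. Returns (average ns per hop, final index)."""
    i = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        i = nxt[i]
    elapsed = time.perf_counter_ns() - start
    return elapsed / steps, i


def ns_to_cycles(ns, clock_ghz):
    """Latency in core clocks = nanoseconds x clock in GHz."""
    return ns * clock_ghz


if __name__ == "__main__":
    n = 1 << 18  # working-set size; a real tool sweeps this per cache level
    for name, chain in (("random", sattolo_cycle(n)), ("sequential", sequential_chain(n))):
        ns, _ = chase(chain, n)
        print(f"{name}: {ns:.1f} ns/hop (~{ns_to_cycles(ns, 3.1):.0f} cycles at 3.1 GHz)")
```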

And that’s about it. I can fit everything I know about Sandy Bridge onto a single page and even then it’s not telling us much. We’ll certainly find out more at IDF next month. What I will say is this: Sandy Bridge is not a minor update. As you’ll soon see, the performance improvements the CPU will offer across the board will make most anyone want to upgrade.

  • chizow - Saturday, August 28, 2010

    OK, it seems as if you were referring to the PCIe lanes connected off the actual P67 chipset, not the native PCIe controller integrated into the CPU. I do recall the P55 chipset supporting PCIe 2.0 but limiting it to PCIe 1.0 bandwidth for interconnects like USB or SATA controllers.
  • chizow - Saturday, August 28, 2010

    Overall it looks like Sandy Bridge is a disappointment. One really has to question why Intel has reversed their Tick Tock cadence this time around by launching their low-mid range parts and platform so soon on the heels of P55/Lynnfield/Clarkfield, but I guess it makes more sense in light of the fact that Intel delayed that platform's launch for nearly a year. I would be EXTREMELY disappointed if I bought a P55 board in the last year only to find out Intel is again requiring a platform/socket change for what appears to be a marginal upgrade.

    There are also some clear deficiencies and disappointments in terms of improvements over last-gen platforms with P67:

    1) No additional L2/L3 cache, in some cases less than previous gen.
    2) No native USB 3.0 support. One can conjure up myriad reasons why Intel is resisting USB 3.0 adoption, but it's clear at this point that they have been resisting it since its inception.
    3) Limited SATA 6G support. 2/8 ports I believe, but still better than nothing I suppose.
    4) No additional PCIe lanes or PCIe 3.0 support, but at least they're finally going to support their actual PCIe 2.0 rated specs?
    5) Limited/reduced overclockability. Big mistake imo, Intel seems to be forgetting why AMD was the enthusiast's choice back in the early Athlon/P4 days.

    That leaves us with the major improvements:

    1) 5-15% improvement clock-for-clock and core-for-core compared to older Nehalem and Westmere architectures.
    2) Lower TDP
    3) 2x faster GPU that's still too slow for any meaningful gaming

    Hopefully the high-end X58 replacement platform offers bigger improvements. There's also some question as to whether LGA2011 will be HPC/server only, with an intermediary platform (LGA1355) replacing LGA1366; however, early rumors suggest it will address many of the deficiencies I listed with P67 and show us what Sandy Bridge is really capable of. Get rid of that extraneous, massive GPU on the high end and replace it with more L3 and execution units, and we'll see some bigger gains than the underwhelming 5-15% we see with this first version of Sandy Bridge.
  • seapeople - Saturday, August 28, 2010

    Remember, that 5-15% clock-for-clock increase includes Turbo Boost functioning on the current processors, which generally ratchets up the clock speed even in heavily multithreaded loads. It looks like the IPC increase with Sandy Bridge is at least 20% here. I would consider that fairly significant, considering that Intel is already on top of the market with no real competition, other than AMD selling its top-of-the-line CPUs for cheap.

    It's also weird to see people deride IGP improvements that double the performance of the previous version. These integrated graphics are sufficient for probably 85% of the market (pretty much everyone who doesn't need to play current high-end games). Basically, the majority of people will be getting a free $50 graphics card built into their processor, which itself is giving you a 20-40% performance improvement over a similarly priced last-gen processor.
  • chizow - Saturday, August 28, 2010

    Yeah, I actually factored in Turbo Boost not working on Sandy Bridge; otherwise it would probably be closer to a 0-10% increase clock-for-clock. Anand pegs SB ~10% faster overall clock-for-clock in his conclusion, with another 3-7% with Turbo.

    Also, tempering any excitement over that 10% IPC increase is the very bad news about Intel significantly limiting overclocking. For virtually anyone who already owns a P55/Lynnfield/Clarkfield combo, anything but a "K"-designated chip may actually be a downgrade, as you won't be able to easily enable your own homebrewed "Turbo" any longer with most Sandy Bridge SKUs. I'd say the nearly guaranteed 30-40% OC you lose far outweighs the prospective 10-15% clock-for-clock gain you'd see with Sandy Bridge.

    As for the IGP being sufficient or any great accomplishment with what Sandy Bridge brings...I'd disagree. Sure, I guess it's great news for Intel that SB is actually able to adequately accelerate 1080p, but it's still far from replacing even mid-range parts from 2-3 generations ago. If Anand had run some benchmarks at resolutions and settings people actually use, it might be more relevant, but the fact of the matter is ~80% of "gamers" are gaming at resolutions of 1280x1024 or higher, according to the Steam Survey: http://store.steampowered.com/hwsurvey/

    My issue with the IGP is that it's going to take up significant die space; I estimate at least as much die area for the 2C IGP relative to the rest of the 4C CPU, using Clarkdale as a guideline. For those who have no interest in an IGP, or who go with the P67 platform that doesn't even support it, that's a waste of die space that you're still absorbing and paying for.

    I just find it amazingly ironic how times have changed: the CPU was once thought of as the "general purpose" ASIC and the GPU as the "fixed-function", inflexible ASIC. With Sandy Bridge, we now have the CPU, an on-die IGP, and now even talk of an integrated super-sekret hardware video transcoder! Roles have clearly reversed as the CPU becomes ever more segmented and specialized while the GPU continues to evolve toward general-purpose flexibility.

    In that sense, I really think AMD has the right approach with Fusion, as their ALU and FPUs will be shared on their Bulldozer and Bobcat designs rather than segregated and specialized like on Sandy Bridge with its single-purposed CPU cores and IGP EUs.
  • DanNeely - Sunday, August 29, 2010

    80% of Steam users is not the same thing as 80% of total PC buyers, or even 80% of total gamers (think Facebook games, etc.). Serious gamers are not, any more than overclockers, a core market for Intel or AMD's CPU divisions.
  • chizow - Sunday, August 29, 2010

    Yes, I'm well aware Steam users do not make up 100% of the total PC market, but I would say they are a fair representation of the kind of hardware and resolutions actual gamers use. For the browser-based games you're referring to, any existing IGP would have been adequate, but that's clearly not the point of the comparison; the market Intel is trying to entice is buyers who would otherwise choose discrete GPUs.

    As you can see, most of these users are not using Intel IGPs (only 7%) because they are inadequate for actual gaming at the resolutions ~80% of them game at, 1280x1024 or higher. So benching a bunch of games at 1024x768 and trying to pass off this new IGP as adequate tells me nothing, as it's not indicative of real-world use.

    Also, I'd take this a step further and argue the vast majority of those buying one of these new Sandy Bridge processors and systems would opt for a much higher resolution than even 1280x1024, as the most common desktop resolutions available for purchase today are going to be wide aspect 1680x1050, 1920x1080, and 1920x1200 displays. When's the last time you were able to buy an OEM build with a 1024x768 native display or even a 4:3 or 5:4 display for that matter?

    If Intel and AT want to pass this IGP off as an HD gaming solution to rival discrete solutions, bench some resolutions and settings people would actually expect to game at.
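The resolution argument above comes down to pixels per frame. Below is a sketch of the ratios, using the resolutions named in the comment. Pixel count is only a first-order proxy for GPU load: geometry, CPU and memory-bandwidth costs don't all scale with it, so frame rates won't drop in exact proportion.

```python
BASE = (1024, 768)  # the benchmark resolution the comment objects to


def pixel_ratio(width, height, base=BASE):
    """How many times more pixels per frame than the base resolution."""
    return (width * height) / (base[0] * base[1])


for w, h in [(1280, 1024), (1680, 1050), (1920, 1080), (1920, 1200)]:
    print(f"{w}x{h}: {w * h:,} pixels, {pixel_ratio(w, h):.2f}x {BASE[0]}x{BASE[1]}")
# 1280x1024: 1,310,720 pixels, 1.67x 1024x768
# 1680x1050: 1,764,000 pixels, 2.24x 1024x768
# 1920x1080: 2,073,600 pixels, 2.64x 1024x768
# 1920x1200: 2,304,000 pixels, 2.93x 1024x768
```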
  • tatertot - Sunday, August 29, 2010

    No, the 10% average outperformance in this review (see the conclusion) is against the i7 880 which has been allowed to turbo.

    Anand uses "clock-for-clock" to distinguish that part from the "same price replacement" the i5 760.

    So it achieves 10% average outperformance against a part that runs ~20% faster on single-threaded loads, ~15% faster on 2 threads... down to a bin or so of turboing on fully-threaded loads.

    That puts the clock/clock performance improvement at around 20%, and this is not including AVX / hardware transcoding.
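The correction described above can be written as a one-line normalization: divide each chip's score by the clock it actually ran at. In the sketch below, the 3.1 GHz Sandy Bridge clock is an assumption taken from the 3100 MHz figure used later in the thread, and the turbo percentages are tatertot's estimates. Linear scaling with clock is also assumed, which overstates the correction for memory-bound workloads.

```python
def per_clock_gain(perf_ratio, other_clock_ghz, own_clock_ghz):
    """Relative performance per clock of chip A over chip B.

    perf_ratio      -- A's score divided by B's score (1.10 means A is 10% faster)
    other_clock_ghz -- the clock B actually ran at during the test (with turbo)
    own_clock_ghz   -- the clock A ran at
    Assumes performance scales linearly with clock.
    """
    return perf_ratio * other_clock_ghz / own_clock_ghz


SB_GHZ = 3.1  # assumption: preview clock, from the 3100 MHz used in the thread
for label, i7_880_ghz in [("~20% turbo (1 thread)", SB_GHZ * 1.20),
                          ("~15% turbo (2 threads)", SB_GHZ * 1.15),
                          ("one 100 MHz bin", SB_GHZ + 0.1)]:
    print(f"{label}: {per_clock_gain(1.10, i7_880_ghz, SB_GHZ) - 1:.0%} per clock")
```

The single-threaded and one-bin cases bracket the per-clock gain at roughly 32% and 14%, so a figure of around 20% for a mixed workload is plausible under these assumptions.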
  • chizow - Sunday, August 29, 2010

    Yes, the i7 880 is the basis for the clock-for-clock comparison that yields the 10% increase. With Turbo on SB he expects another 3-7%, which is, again, in line with my estimate of a 5-15% gain (rather than 0-10%) clock-for-clock, with and without Turbo.

    From the conclusion verbatim:
    "Sandy Bridge seems to offer a 10% increase in performance. Keep in mind that this analysis was done without a functional turbo mode, so the shipping Sandy Bridge CPUs should be even quicker. I'd estimate you can add another 3 - 7% to these numbers for the final chips."

    In almost all of the benches in the test, you are going to be limited to 1 or 2 Turbo bins max, which is why Anand limited his estimates to 3-7%: all of the tests use more than 1 core. Under the same tests, the benefits of Turbo for both Lynnfield and SB are going to be the same, assuming the final Turbo bins and throttling are also the same. So if Lynnfield only gets 1 bin at 2+ cores, then SB would only get the same benefit, which I'm sure is what Anand based his estimates on (100/3100 and 200/3100).

    Simply put, a 15% or even 20% clock-for-clock increase after 2 years from a new architecture is underwhelming imo, especially considering everything else they've left out, but I guess this is what we've come to expect and be thrilled about in a market dominated by Intel without any real competition. Sorry, I'm just less than impressed, especially given the artificial restrictions Intel plans to place on overclocking, further reducing any IPC benefits from SB compared to Lynnfield.
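The 100/3100 and 200/3100 figures in the comment above are the turbo-bin arithmetic behind the 3-7% estimate. Here is a sketch taking the 100 MHz bin and 3100 MHz base from that comment, and assuming performance scales linearly with clock (real workloads usually gain somewhat less).

```python
def turbo_uplift(bins, bin_mhz=100, base_mhz=3100):
    """Best-case fractional speedup from `bins` turbo steps above base clock."""
    return bins * bin_mhz / base_mhz


for bins in (1, 2):
    print(f"{bins} bin(s): +{turbo_uplift(bins):.1%}")
# 1 bin(s): +3.2%
# 2 bin(s): +6.5%
```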
  • seapeople - Sunday, August 29, 2010

    If you throw out Netburst, which was a significant decrease in IPC from Pentium III, when have we had significantly greater than 20% IPC increase within 2 years for an architecture? I understand your other complaints (although I don't see what's wrong with just buying the K models, which all indications suggest won't be much more expensive), but what were you really expecting in IPC increases? 40%? 60%?
  • chizow - Sunday, August 29, 2010

    Netburst was a reduction in IPC but a tripling of clockspeed compared to P3, but surely you aren't forgetting the incredible gains in IPC from Netburst to Yonah (Core) and Conroe (Core 2)?

    Conroe effectively increased performance 100% clock-for-clock from P4 (or 50% or so from Yonah), as it offered some 50% better performance at 50% lower clockspeeds compared to Netburst. While I certainly don't expect that kind of revolutionary product every 2-3 years, we're not even close to that kind of gain in the 4-5 years since Conroe was introduced, with not even that much aggregate difference from Conroe/Penryn/Nehalem/Westmere to SB. From Conroe to SB, clock for clock, we're maybe looking at a 50% improvement?

    That's 2 full Tick-Tock cycles, signaling that Moore's Law is clearly dead to Intel when it comes to performance; they only loosely follow its cadence in terms of refreshes, die sizes, transistor counts and fab processes. In order to achieve those kinds of gains, they had to redesign their CPU nearly from the ground up to compete with AMD, which had the performance lead at the time. Intel clearly hasn't felt the need to improve or innovate significantly since then, as AMD is still essentially 2 generations behind in performance, about on par with Intel's Penryn offerings at this point.
