AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Posted in: CPUs, Bulldozer, AMD, Steamroller
Today at the annual Hot Chips conference, AMD’s new CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.
Steamroller is the third instantiation of AMD’s Bulldozer architecture, first conceived in the mid-2000s and finally brought to market in late 2011. Committed to this architecture for at least one more design after Steamroller, AMD has settled on roughly yearly updates to the architecture. For 2012 we have the introduction of Piledriver, the optimized Bulldozer derivative that formed the CPU foundation for AMD’s Trinity APU. By the end of the year we’ll also see a high-end desktop CPU without processor graphics based on Piledriver.
Piledriver saw a switch to hard edge flip flops, which allowed for a considerable decrease in power consumption at the cost of more careful design and validation work. Performance didn’t change, but AMD saw a 10% to 20% reduction in active power. Piledriver also brought some scheduling efficiency improvements, with prefetching and branch prediction being the two other major areas of improvement.
Steamroller is designed to keep the ball rolling. It takes fundamentals from the Bulldozer/Piledriver architectures and offers a healthy set of evolutionary improvements on top of them. In Intel speak Steamroller wouldn’t be a tick as it isn’t accompanied by a significant process change (28nm bulk is pretty close to 32nm SOI), but it’s not a tock either, as the architecture is enhanced rather than fundamentally changed. Steamroller fits somewhere in between those two extremes when it comes to changes.
Front End Improvements
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
Front End Comparison
| | AMD Phenom II | AMD FX | Intel Core i7 |
| Instruction Decode Width | 3-wide | 4-wide | 4-wide |
| Single Core Peak Decode Rate | 3 instructions | 4 instructions | 4 instructions |
| Dual Core Peak Decode Rate | 6 instructions | 4 instructions | 8 instructions |
| Quad Core Peak Decode Rate | 12 instructions | 8 instructions | 16 instructions |
| Six/Eight Core Peak Decode Rate | 18 instructions (6C) | 16 instructions | 24 instructions (6C) |
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.
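The arithmetic behind the peak decode rates is simple enough to sketch. The snippet below is only a back-of-the-envelope model, not anything AMD published: the function name `peak_decode_rate` and the `cores_per_decoder` parameter are our own, and it assumes active cores are packed into modules two at a time. It reproduces the AMD FX column of the table and shows what giving each core its own 4-wide decoder does to the peak figure.

```python
import math


def peak_decode_rate(cores, decode_width, cores_per_decoder=1):
    """Peak instructions decoded per cycle across `cores` active cores.

    cores_per_decoder is a hypothetical knob: 2 models a Bulldozer/Piledriver
    module where two cores share one decoder, 1 models a dedicated decoder
    per core (Phenom II, Core i7, and Steamroller as described above).
    """
    decoders = math.ceil(cores / cores_per_decoder)
    return decoders * decode_width


# name: (decode width, cores sharing one decoder)
designs = {
    "AMD Phenom II": (3, 1),
    "AMD FX (shared decoder)": (4, 2),
    "Intel Core i7": (4, 1),
    "Steamroller (per-core decoder)": (4, 1),
}

for name, (width, shared_by) in designs.items():
    rates = [peak_decode_rate(n, width, shared_by) for n in (1, 2, 4, 8)]
    print(f"{name}: {rates}")
```

These are peak numbers only. As noted above, a 4-issue front end rarely runs anywhere near full utilization, so the doubling on paper won’t translate into a doubling of performance.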
The penalties are pretty obvious: area goes up, as does power consumption. However, the tradeoff is likely worth it, and both of these downsides can be offset in other areas of the design as you’ll soon see.
Steamroller inherits the perceptron branch predictor from Piledriver, but in an improved form for better performance (mostly in server workloads). The branch target buffer is also larger, which contributes to a reduction in mispredicted branches by up to 20%.
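AMD didn’t disclose the internals of its predictor, so the following is only a sketch of the textbook perceptron scheme described by Jiménez and Lin, not AMD’s implementation. The class name, history length and table size are all our own illustrative choices, and real hardware would also saturate the weights to a fixed bit width. Each branch hashes to a small weight vector, the prediction is the sign of the dot product between those weights and the recent global branch history, and the weights are nudged whenever the prediction is wrong or not yet confident.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative sizes only)."""

    def __init__(self, history_len=8, table_size=64):
        self.history_len = history_len
        self.table_size = table_size
        # Training threshold suggested by Jimenez and Lin: floor(1.93*h + 14)
        self.theta = int(1.93 * history_len + 14)
        # One weight vector per entry: a bias weight plus one per history bit
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history, oldest first: +1 = taken, -1 = not taken
        self.history = [-1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Return True if the branch at `pc` is predicted taken."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the actual outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output is not yet confident
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history, start=1):
                w[i] += t * hi
        self.history = self.history[1:] + [t]


# Usage: a branch that alternates taken / not taken is learned from history
predictor = PerceptronPredictor()
for i in range(200):
    predictor.update(0x40, taken=(i % 2 == 0))
print(predictor.predict(0x40))
```

The appeal of the approach is that storage grows linearly with history length, which lets the predictor correlate against longer histories than a table of saturating counters of similar size.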
Execution Improvements
AMD streamlined the large, shared floating point unit in each Steamroller module. There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes. AMD wouldn’t offer many specifics, saying only that the sharing applies to mutually exclusive MMX/FMA/FP operations and thus shouldn’t result in a performance penalty.
The reduction of pipeline resources is supposed to deliver the same throughput at lower power and area, basically a smarter implementation of the Bulldozer/Piledriver FPU.
There’s no change to the integer execution units themselves, but there are other changes that improve integer performance.
The integer and floating point register files are bigger in Steamroller, although AMD isn’t being specific about how much they’ve grown. Load operations (two operands) are also compressed so that they only take a single entry in the physical register file, which helps increase the effective size of each RF.
The scheduling windows also increased in size, which should enable greater utilization of existing execution resources.
Store to load forwarding sees an improvement. Steamroller is better than its predecessors at detecting interlocks, cancelling the load and getting the data from the store instead.
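For readers unfamiliar with the concept, here is a toy model of store to load forwarding. It is a generic illustration under our own assumptions (the `StoreQueue` class and its methods are hypothetical, addresses match exactly, and there is no notion of access size or partial overlap), and it says nothing about how Steamroller actually detects these cases. A load first checks the stores that haven’t been written back yet; if the youngest matching store has the data, the load takes it from there instead of going to memory.

```python
from collections import deque


class StoreQueue:
    """Toy store queue: pending stores are searched before memory is read."""

    def __init__(self):
        self.pending = deque()  # (address, value) pairs, oldest first

    def store(self, addr, value):
        """Buffer a store that has executed but not yet reached memory."""
        self.pending.append((addr, value))

    def load(self, addr, memory):
        """Return (value, forwarded) for a load from `addr`."""
        # Search youngest to oldest so the most recent store wins
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value, True  # forwarded from the store queue
        return memory.get(addr, 0), False  # no match: read memory

    def retire_oldest(self, memory):
        """Write the oldest pending store back to memory."""
        addr, value = self.pending.popleft()
        memory[addr] = value


memory = {0x100: 1}
sq = StoreQueue()
sq.store(0x100, 42)
print(sq.load(0x100, memory))  # served by the pending store
print(sq.load(0x200, memory))  # nothing pending, falls through to memory
```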
126 Comments
fic2 - Wednesday, August 29, 2012
I was thinking that he actually meant Sandy Bridge or possibly Ivy Bridge instead of Haswell... - it is what he should have meant anyway.
jabber - Wednesday, August 29, 2012
..."Shock as AMD chip fails to excite minority tech audience in numerous pointless synthetic benchmarks that no one with a life cares about!"
Elsewhere the rest of the world worries about important stuff and tries to make a living.
meloz - Wednesday, August 29, 2012
>Elsewhere the rest of the world worries about important stuff and tries to make a living.
And they do this by buying Intel CPUs, apparently, because Intel have over 90% marketshare and it is only increasing day by day. It would appear that having an Intel processor does not get in the way of "important stuff" and "making a living", quite the contrary.
Only large scale dumping of CPUs / APUs to OEMs at cost price -and their graphic division- has kept AMD alive these past few months. No hope on the horizon, either.
jabber - Wednesday, August 29, 2012
The follow up story - 'IT Folks constantly fail to understand irony shock!'
Aaron73 - Wednesday, August 29, 2012
"Elsewhere the rest of the world worries about important stuff and tries to make a living."
Apparently not you, as you have posted multiple pointless comments to this article. I happened to notice while on my lunch break.
Sub Zero - Thursday, August 30, 2012
The new AMD architecture is noticeably slower than the last one. My 4 core 965 is faster than the 8150 in many operations - most in fact. It's pathetic and inexcusable.
I'm so disenchanted with AMD that even with all of the integrated video stuff on Intel systems and the extra cost, I am probably going to go with Intel for every purchase in the near future. I'm going to recommend Intel to everyone who asks, especially gamers.
AMD is just not worth putting any money into.
Laststop311 - Thursday, August 30, 2012
The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand drawn designs.
-Shame they got lazy and went with an automated design when they could have had faster chips if they designed them by hand.
The architecture is still slated to debut in 2013 on GlobalFoundries' 28nm bulk process. The improvements look good on paper, but the real question remains whether or not Steamroller will be enough to go up against Haswell.
-28nm really? Intel will be on its 2nd gen of 22nm, over a year after 22nm debuts for Intel, and AMD still can't match that size. Will Steamroller be enough to go up against Haswell? That's not even a legit question; Haswell is going to obliterate Steamroller in every way imaginable. Haswell will most likely run cooler and use less power while at the same time delivering more performance. The only hope for Steamroller will be the Fusion chips, where AMD will actually beat Intel on the i-gpu. I see a good market for Steamroller HTPCs and also Steamroller gaming ultrabooks; if you can crossfire the i-gpu with another Radeon card it would also make larger 15-17" gaming notebooks attractive. You basically get a free crossfire setup with a big boost in graphics performance without actually having to have 2 power hungry, heat producing GPUs. 7970m (or 8970m if it's out) + highest level Steamroller Fusion i-gpu will produce some pretty smooth sexy graphics without needing all the room, extreme cooling and loud fans that accompany dual GPU laptops (sorry m18x, I still love you like my child since I have no children).
hapkiman - Sunday, September 2, 2012
You know I was an AMD fanboy all the way, and waited, and waited....and waited for Bulldozer, expecting a significant step over my Phenom II X6 1090T. I used to talk trash with my Intel buddies, and I stuck with AMD through some hard times. And when Bulldozer finally came out and was basically a letdown - I felt burned. AMD has got to do something with their core business model and their fabrication process. Some of these FX chips they're producing are ok, and may have a niche market, but geeze AMD, when you have Intel as your main and only competitor - you've got to step up your A game. And AMD has consistently failed to do that.
Sorry guys, but I gave up and put together an Ivy Bridge rig - and I am amazed at how much smoother and faster it is than my old 1090T rig.
If they don't hit one out of the park soon, I see AMD turning into a second rate company making low-end APUs for OEMs, and of course graphics cards.
PLEASE prove me wrong AMD. Make Steamroller something special.
mikato - Tuesday, September 4, 2012
Hmm, I have a Phenom II X4 965 (and SSD for OS and programs) and my system is completely smooth and almost all everyday tasks are instant, so I'm not sure how you could be seeing that much of a difference. Maybe your OS got bloated up or something?
AVoyeur4U - Friday, September 14, 2012
I have an Intel i7-based notebook provided by my employer, use an HP DL380 G5 (8way 32GB RAM w/ nVidia graphics) as my primary workstation instead, and have a handful of them at home. All of that compute power and I still prefer to use the system which I'm on currently, which is built on an AMD Athlon x64 x2 6400 with an nVidia GeForce 8800 GTS video card and 8GB of RAM.
Being an AMD 'fanboy' combined with how well my home desktop has held up and all the hype leading up to Bulldozer, I've held off on upgrading. Heck, I even took down my servers and have had them stacked in the garage for the last couple of years.
Each missed release date only made it that much more disappointing when Bulldozer finally released and fell far short of expectations. I actually found myself looking forward to the 2nd generation release before the 1st became available. After a few more months of waiting I finally decided to pull the plug on AMD and build an Intel-based system.
Out of pure desperation to be proven wrong by AMD I decided to check and see if there was anything in the pipeline that would persuade me to hold off on clicking the check-out button.
I read 2 other articles prior to this regarding their 3rd generation builds. All 3 the same. Consequently, I sit here asking myself WTF!? Not at AMD (this letdown was expected) but at myself. There is no other manufacturer, service provider, or producer that I would tolerate this from, so why am I accepting it from AMD?
No more... Going to go check out after posting. By the time I'm ready to purchase again it'll most likely be a choice between nVidia's upcoming CPU line and Intel - no longer hopeful that AMD will be able to become competitive again.
(I'm well aware that AMD is quite competitive, especially when it comes to price/performance; however, they are so far behind in terms of currency. They don't support the latest technologies/components and most likely won't within the next 3 years - I don't plan to upgrade again within that time so they're out.)