AMD's Steamroller Detailed: 3rd Generation Bulldozer Core
by Anand Lal Shimpi on August 28, 2012 4:39 PM EST
Posted in: CPUs, Bulldozer, AMD, Steamroller
Today at the annual Hot Chips conference, AMD’s new CTO Mark Papermaster unveiled the first details about the Steamroller x86 CPU core.
Steamroller is the third instantiation of AMD’s Bulldozer architecture, first conceived in the mid-2000s and finally brought to market in late 2011. Committed to this architecture for at least one more design after Steamroller, AMD has settled on roughly yearly updates to the architecture. For 2012 we have the introduction of Piledriver, the optimized Bulldozer derivative that formed the CPU foundation for AMD’s Trinity APU. By the end of the year we’ll also see a high-end desktop CPU without processor graphics based on Piledriver.
Piledriver saw a switch to hard edge flip flops, which allowed for a considerable decrease in power consumption at the expense of careful design and validation work. Performance didn’t change, but AMD saw a 10% - 20% reduction in active power. Piledriver also brought some scheduling efficiency improvements, with prefetching and branch prediction as the two other major areas of design improvement.
Steamroller is designed to keep the ball rolling. It takes fundamentals from the Bulldozer/Piledriver architectures and offers a healthy set of evolutionary improvements on top of them. In Intel speak Steamroller wouldn’t be a tick as it isn’t accompanied by a significant process change (28nm bulk is pretty close to 32nm SOI), but it’s not a tock either, as the architecture is enhanced rather than fundamentally changed. Steamroller fits somewhere in between those two extremes when it comes to changes.
Front End Improvements
One of the biggest issues with the front end of Bulldozer and Piledriver is the shared fetch and decode hardware. This table from our original Bulldozer review helps illustrate the problem:
Front End Comparison
| | AMD Phenom II | AMD FX | Intel Core i7 |
| Instruction Decode Width | 3-wide | 4-wide | 4-wide |
| Single Core Peak Decode Rate | 3 instructions | 4 instructions | 4 instructions |
| Dual Core Peak Decode Rate | 6 instructions | 4 instructions | 8 instructions |
| Quad Core Peak Decode Rate | 12 instructions | 8 instructions | 16 instructions |
| Six/Eight Core Peak Decode Rate | 18 instructions (6C) | 16 instructions | 24 instructions (6C) |
Steamroller addresses this by duplicating the decode hardware in each module. Now each core has its own 4-wide instruction decoder, and both decoders can operate in parallel rather than alternating every other cycle. Don’t expect a doubling of performance since it’s rare that a 4-issue front end sees anywhere near full utilization, but this is easily the single largest performance improvement from all of the changes in Steamroller.
The penalties are pretty obvious: area goes up as does power consumption. However the tradeoff is likely worth it, and both of these downsides can be offset in other areas of the design as you’ll soon see.
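The decode-rate arithmetic behind the table, and the effect of giving each core its own decoder, can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function name and parameters are hypothetical, it assumes active cores are packed two per module, and it describes peak rates only, not sustained throughput.

```python
def peak_decode_rate(cores, decode_width, cores_per_decoder):
    """Peak instructions decoded per cycle across `cores` active cores.

    Illustrative model only: assumes cores are packed into modules so
    that `cores_per_decoder` cores share a single decoder.
    """
    # Ceiling division: a partially filled module still has one decoder.
    decoders = -(-cores // cores_per_decoder)
    return decoders * decode_width


# Shared: one 4-wide decoder per two-core module (Bulldozer/Piledriver).
# Dedicated: one 4-wide decoder per core (as described for Steamroller).
for cores in (1, 2, 4, 8):
    shared = peak_decode_rate(cores, decode_width=4, cores_per_decoder=2)
    dedicated = peak_decode_rate(cores, decode_width=4, cores_per_decoder=1)
    print(f"{cores} cores: shared={shared}, dedicated={dedicated}")
```

With the shared configuration the model reproduces the AMD FX column of the table; with dedicated decoders the peak rate scales linearly with core count, as it does in the Phenom II and Core i7 columns.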
Steamroller inherits the perceptron branch predictor from Piledriver, but in an improved form for better performance (mostly in server workloads). The branch target buffer is also larger, which contributes to a reduction in mispredicted branches by up to 20%.
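For readers unfamiliar with the idea, a perceptron branch predictor can be sketched as follows. This is a textbook-style illustration, not AMD's design: the class name, table size, history length, and indexing by `pc % table_size` are all assumptions, and real hardware would use saturating weights and a more careful hash.

```python
class PerceptronPredictor:
    """Minimal perceptron branch predictor sketch (not AMD's implementation)."""

    def __init__(self, history_len=8, table_size=64):
        self.table_size = table_size
        # Training threshold heuristic from the perceptron-predictor
        # literature; treat the exact value as an illustrative choice.
        self.threshold = int(1.93 * history_len + 14)
        # One weight vector per table entry; index 0 is the bias weight.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history: +1 for taken, -1 for not taken.
        self.history = [1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the actual outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when confidence is below threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


predictor = PerceptronPredictor()
for _ in range(20):
    predictor.update(0x40, taken=True)
print(predictor.predict(0x40))
```

Each weight tracks how strongly one past branch outcome correlates with the current branch, which is why longer histories are cheap to add compared with table-based schemes.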
Execution Improvements
AMD streamlined the large, shared floating point unit in each Steamroller module. There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes. AMD wouldn’t offer too many specifics, other than to say that the shared hardware applies only to mutually exclusive MMX/FMA/FP operations and thus wouldn’t result in a performance penalty.
The reduction of pipeline resources is supposed to deliver the same throughput at lower power and area, basically a smarter implementation of the Bulldozer/Piledriver FPU.
There’s no change to the integer execution units themselves, but other changes elsewhere in the core improve integer performance.
The integer and floating point register files are bigger in Steamroller, although AMD isn’t being specific about how much they’ve grown. Load operations (two operands) are also compressed so that they only take a single entry in the physical register file, which helps increase the effective size of each RF.
The scheduling windows also increased in size, which should enable greater utilization of existing execution resources.
Store to load forwarding sees an improvement. Steamroller is better than its predecessors at detecting interlocks, cancelling the load and getting the data from the store instead.
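As a rough illustration of what store to load forwarding means, the sketch below models a queue of pending stores that a later load can read from directly. It is a simplified assumption-laden model, not a description of Steamroller's hardware: the `StoreQueue` class and its methods are hypothetical, and it only handles exact address matches, whereas real designs must also deal with partial overlaps and differing access sizes.

```python
from collections import namedtuple

Store = namedtuple("Store", ["addr", "value"])


class StoreQueue:
    """Toy model of pending stores that have not yet reached memory."""

    def __init__(self):
        self.pending = []  # oldest store first

    def store(self, addr, value):
        self.pending.append(Store(addr, value))

    def load(self, addr, memory):
        """Return (value, forwarded) for a load from `addr`."""
        # Search youngest-first so the load sees the most recent store.
        for st in reversed(self.pending):
            if st.addr == addr:
                return st.value, True  # forwarded from the store queue
        # No matching pending store: read from memory (0 if unset).
        return memory.get(addr, 0), False

    def drain(self, memory):
        """Commit all pending stores to memory in program order."""
        for st in self.pending:
            memory[st.addr] = st.value
        self.pending.clear()


memory = {0x10: 1}
queue = StoreQueue()
queue.store(0x10, 7)
print(queue.load(0x10, memory))  # value comes from the pending store
queue.drain(memory)
print(queue.load(0x10, memory))  # value now comes from memory
```

The interesting hardware problem is the detection step: the faster and more reliably a core spots that a load depends on an in-flight store, the less often it has to replay the load.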
126 Comments
StevoLincolnite - Wednesday, August 29, 2012 - link
The Desktop isn't going anywhere, neither is it shrinking, the sales rate is merely slowing down as everyone has one.
Netbooks hit the same wall a couple years ago, tablets and phones will hit the wall in due time.
The Netbook didn't kill off the laptop, the laptop didn't kill off the desktop, they all complement each other.
We have been in a post-PC world since early 2000 and in that time the sales of PCs have tripled.
Conficio - Wednesday, August 29, 2012 - link
I'd like to see you bid frequently (daily or more) and successfully on eBay items on a Nexus 7.
Not to mention the crowd of people that actually sell something on eBay. Have fun uploading multiple pictures and typing longer descriptions on an eBay item on a tablet.
Hrel - Tuesday, August 28, 2012 - link
Let me know when AMD releases a new CPU that is at least 100% faster than their last CPU. Cause that's the only time I'll consider them even being an option again. Honestly AMD, add SMT. The performance gain/watt is amazing. You can still have more cores, but have SMT too.
Taft12 - Tuesday, August 28, 2012 - link
Define faster.
CeriseCogburn - Wednesday, August 29, 2012 - link
Something that doesn't get renamed "crapdozer". LOL
nicamarvin - Thursday, August 30, 2012 - link
Ivy is only 5% faster than Sandy, let me know when Intel releases a new CPU that's at least 100% faster than their last CPU
Lepton87 - Tuesday, August 28, 2012 - link
Not by a long shot. All we can expect is for this steaming pile of shitty engineering to be competitive with Nehalem. Still worse ST performance but better MT performance. There's only so much you can do with polishing a turd.
CeriseCogburn - Wednesday, August 29, 2012 - link
But what if it becomes a petrified turd from being around so long and getting buried all the time ?
Then it seems it could be a really hard, polished up.... legendary find ?
nicamarvin - Friday, August 31, 2012 - link
good thing these are processors and not Turds, and they can and will be polished
Belard - Tuesday, August 28, 2012 - link
AMD as a whole, needs to streamline their entire consumer line. The Steamroller sounds good in every way - but we need to see it. By the time it comes out, I'll be chugging along with my Intel i5 CPU... my Core2Quad is actually holding up pretty damn good.
Many of my AMD friends and clients have gone Intel already. But, I have no problems building an AMD system as long as it provides good performance for the price... which is something the FX DID not come close to doing. There is simply NO way I can recommend any FX CPU to anyone... The A-series for low-end is fine. Windows8 is another thing to mess things up, hopefully Windows7 will be available for us IT /small tech people to continue building and selling systems.
Here are the problems I see with the AMD mess, which should hopefully be cleared up by 2013. Currently, AMD has 3 different sockets on the market. It's confusing as to what chip goes with which chipset etc etc. Socket A+ needs to die. The CPUs need to be like Core i-series, ALL of them have a GPU built in - THAT can be used as a co-processor if not used for graphics at any time. It would simplify the SKUs.
Socket FM1 is dead... Socket FM2 is currently shipping only from OEMs (HP, etc). But the bone-head thing is that FM2 is not at all compatible with FM1 - yet current FM2 motherboards use the EXACT same AMD north bridge! WTF?! FM2 doesn't support PCIe 3.0. And according to the LAST AMD roadmap I've seen, AMD won't have a PCIe 3.0 chipset until 2014? Hey, doesn't AMD sell PCIe 3.0 video cards? Yep... and you can't use them on an AMD powered computer... how stupid. FM1/FM2 chipsets are more advanced than AM3 as they have native USB 3.0 support.
AMD needs to get their butt into gear. There should only be FM2, a NEW chipset in 2013 with PCIe 3.0 support. The new Steamroller CPUs should have a whole new brand name and model number. "FX" has been poisoned. AMD ruined the name of FX from the past.
How about Athlon III X4-3400 (quad core @ 3.4GHz)?
I hope AMD does well... I'm not counting on it... but they may not be as stupid as Microsoft.