The AMD FX (Bulldozer) Scheduling Hotfixes Tested
by Anand Lal Shimpi on January 27, 2012 12:47 PM EST
Single & Heavily Threaded Workloads Need Not Apply
Remembering what these two hotfixes actually do, the only hope for performance gains comes from running workloads that are neither single threaded nor heavily threaded. To confirm that there are no gains at either end of the spectrum we first turn to Cinebench, a 3D rendering test that lets us configure how many threads are in use:
With one thread or 8 threads active, the FX-8150's performance is unchanged by the new hotfixes. I also ran TrueCrypt's encryption/decryption benchmark, another heavily threaded test that runs on all cores/modules:
Once again, there's no change in performance. Although you can argue that CPU performance is most important when utilization is at its highest, most desktops will find themselves in between full utilization of a single core and all cores. To test those cases, we need to look elsewhere.
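To make the reasoning concrete, here is a minimal sketch (not the actual Windows scheduler logic) that counts how many modules end up hosting two threads under two placement policies. The function names are hypothetical, and the "naive" policy is an assumed worst case that fills cores in index order; the real pre-hotfix scheduler is not necessarily that pessimistic. The 4-module, 2-cores-per-module layout matches the FX-8150.

```python
from collections import Counter

MODULES = 4           # FX-8150: 4 modules
CORES_PER_MODULE = 2  # 2 integer cores per module, sharing one FPU


def naive_placement(n_threads):
    """Assumed worst case: fill cores in index order (cores 0 and 1 share module 0)."""
    return list(range(n_threads))


def module_aware_placement(n_threads):
    """Use the first core of every module before touching any second core."""
    order = [m * CORES_PER_MODULE + c
             for c in range(CORES_PER_MODULE)
             for m in range(MODULES)]
    return order[:n_threads]


def shared_modules(cores):
    """Count modules that host more than one thread."""
    per_module = Counter(core // CORES_PER_MODULE for core in cores)
    return sum(1 for count in per_module.values() if count > 1)


if __name__ == "__main__":
    print("threads  naive  module-aware")
    for n in range(1, MODULES * CORES_PER_MODULE + 1):
        print(n, shared_modules(naive_placement(n)),
              shared_modules(module_aware_placement(n)))
```

In this model both policies agree at one thread (no sharing is possible) and at eight threads (every module is shared regardless), and the counts differ only from two to six threads, which is the middle ground where a scheduling change could matter.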
79 Comments
wumpus - Friday, January 27, 2012 - link
I'd have to believe that any CPU with SMT enabled will benefit. That is, unless they already have this feature. Of course, Intel has been shipping SMT processors since P4. I'd like to believe that microsoft simply flipped whatever switch to treat bulldozer cores as SMT cores, but I don't have enough faith in microsoft's scheduling to believe they ever got it right.
hansmuff - Friday, January 27, 2012 - link
At least Windows 7 (haven't tested anything else) schedules threads properly on Sandy Bridge. HT only comes into play once all 4 cores are loaded.
tipoo - Friday, January 27, 2012 - link
Windows already has intelligent behaviour for Hyperthreading. I don't think this will change anything on the Intel side.
silet1911 - Wednesday, February 1, 2012 - link
Yes, a website called Jagatreview has reviewed a 2500 + patch and there is a small performance increase:
http://www.jagatreview.com/2012/01/amd-fx-8120-vs-...
tk11 - Friday, January 27, 2012 - link
Even if a scheduler did take the time to figure out when threads shared a significant number of recent memory accesses, would that be enough information to determine that the thread would perform optimally on the same module as a related thread rather than an unused module?
Also... Wouldn't running code that performed "intelligent core/module scheduling based on the memory addresses touched by a thread" negatively impact performance far more than any gains realized by scheduling threads on cores that are merely suspected to be more optimally suited to running each particular thread?
eastyy123 - Friday, January 27, 2012 - link
could someone explain the whole module/core thing to me please
i always assumed a core was basically like a whole processor shrunk onto a die, is that basically right?
and how do the AMD modules differ?
KonradK - Friday, January 27, 2012 - link
Long sory short:
Bulldozer's module consist 2 integer cores and 1 floating point (FPU) core.
KonradK - Friday, January 27, 2012 - link
"Story" not "sory"
I'm sorry...
Ammaross - Friday, January 27, 2012 - link
"Bulldozer's module consist 2 integer cores and 1 floating point (FPU) core."
However, the 1 FPU core can be used as two single floating point cores or a single double floating point core, so it depends on the floating point data running through it.
KonradK - Friday, January 27, 2012 - link
Not sure what you are supposing.
Precision is the same, regardless of whether one or two threads are executed by the FPU core. There are single or double precision FPU instructions, but each thread can use any of them.
However, if you mean single or double performance:
If two FPU threads run on the same module, each of them will have half the performance in comparison to the same two FPU threads running on separate modules.
In the first case, one FPU is simply shared by two threads.
And that is the whole point of the hotfixes: avoiding such a situation as long as this is possible.