Electromigration: Why AMD Ryzen Current Boosting Won't Kill Your CPU

Author: Dr. Ian Cutress

by Dr. Ian Cutress on June 9, 2020 9:10 AM EST


Where there is a will to get extra performance out of a CPU, there is often a way. Either through end-user overclocking or motherboard vendors tweaking settings to improve their stock performance, at the end of the day everyone wants better performance, and for a multitude of reasons. This insatiable drive for peak performance, however, means that some of these tweaks and adjustments can start to skirt the lines of what is ‘in specification’. And as a result, we sometimes see methods of increasing processor performance that clearly deliver on their promises, but perhaps at the expense of thermals or longevity.

To this end, it has recently come to light that motherboard vendors have been taking advantage of a setting on AMD motherboards to misrepresent the current delivered to the CPU. By doing so, they are able to increase the processor's power headroom, ultimately allowing for higher performance at the cost of higher thermals. To be sure, this kind of tweaking isn’t new, but recent events have led to no shortage of confusion over what exactly is going on, and what the ramifications are for AMD Ryzen processors. So to try to clarify matters, here’s our take on the situation.

The Old Fashioned Way: Spread Spectrum, MultiCore Enhancement, PL2

One of the common themes I've noticed throughout my time at AnandTech as our motherboard editor and now our CPU editor is the lengths to which motherboard vendors will go in order to get increased performance over the competition. We were the first outlet to break out features such as MultiCore Enhancement, way back in August 2012, which led to higher-than-specified all-core frequencies, or in some cases, outright overclocks. But the history of motherboard vendors adjusting and tweaking features for performance goes further back than that, such as variations in the base frequency from 100 MHz to 104.7 MHz with Spread Spectrum, leading to increased performance on systems that can support it.

More recently, on Intel platforms, we’ve seen vendors increase their turbo power limits so that the motherboard can sustain the highest turbo for as long as the world remains in existence, just because the motherboard vendors are overengineering the power delivery in order to support it. In the past couple of weeks, we have also found examples of motherboards ignoring Intel’s new Thermal Velocity Boost requirements, which is something we'll be delving into more in a future article.

In short, motherboard vendors want to be the best, and that often means pushing the limits of what is considered the ‘base specification’ of the processor. As we’ve regularly discussed on topics like this with Intel’s turbo power limits, the differentiation between a ‘specification’ and a ‘recommended setting’ can get quite blurred – for Intel, the turbo power listed in the documents is a recommended setting, and any value the motherboard is set to is technically ‘in specification’. The point at which Intel considers it overclocking, it seems, is when the peak turbo frequency is increased.

Tweaking AM4 Above and Beyond

So now we move on to the news of the day, with motherboard manufacturers now attempting to tweak AMD-based Ryzen motherboards in order to drive higher performance. As thoroughly explained over on the HWiNFO forums by The Stilt and summarized here, AM4 platforms typically have three defined limiters: Package Power Tracking (PPT), which indicates the power threshold that is allowed to be delivered to the socket; Thermal Design Current (TDC), which is the maximum current delivered by the motherboard's voltage regulators under thermal limits; and Electrical Design Current (EDC), which is the maximum current that can be delivered by the voltage regulators at any time. Some of these values are compared to metrics derived internally in the CPU or externally in the power delivery, to see if these limits have been triggered.

In order to calculate the software-based power measurement that is compared against PPT, the power management co-processor takes the value of current from the voltage regulator management controller. This isn’t an actual value of current, but a dimensionless value (0 to 255) scaled so that 0 represents 0 amps and 255 represents the peak amps that the VRMs can handle. The power management co-processor on the CPU then performs its power calculation (power in watts = voltage in volts multiplied by current in amps).
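The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the strictly linear mapping between 0 and 255, and the 200 A full-scale figure are assumptions made for the example, not values taken from AMD or from any motherboard vendor.

```python
def telemetry_to_amps(raw: int, full_scale_amps: float) -> float:
    """Convert the dimensionless 0..255 telemetry value to amps.

    Assumes a linear scale where 255 corresponds to full_scale_amps
    (a hypothetical, board-specific calibration value).
    """
    if not 0 <= raw <= 255:
        raise ValueError("telemetry value must be in the range 0..255")
    return raw / 255 * full_scale_amps


def package_power_watts(voltage: float, raw: int, full_scale_amps: float) -> float:
    """Power in watts = voltage in volts multiplied by current in amps."""
    return voltage * telemetry_to_amps(raw, full_scale_amps)


# Hypothetical board whose VRMs are calibrated to 200 A at full scale.
print(package_power_watts(1.0, 255, 200.0))
```

The point of the sketch is that the power figure is only as trustworthy as the full-scale calibration value fed into it.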

The dimensionless value range has to be calibrated for each motherboard layout, based on the componentry used (VRMs, controllers) as well as the tracing, the board layers, and the quality of the design. In order to get an accurate scalar value for this dimensionless range, a motherboard vendor should accurately probe the correct values and then write the firmware to use that look-up table in the system power calculations.

This means that there is a potential way to fiddle with the way that the system interprets the peak power value of the processor. Motherboard vendors can reduce this dimensionless value of current in order to make the processor and the power management co-processor think that there is less power going to the CPU. As a result, the package power tracking (PPT) limit appears not to have been reached yet, and more power can be supplied. This allows the processor to turbo further than was originally intended by AMD.

This has knock-on effects. The processor will be consuming more power, mostly in the form of increased amps, leading to more heat being generated and increased thermals. Because the processor is turboing further (by being allowed to draw more power than what the software is reporting) the processor will also perform better in benchmarks.

As The Stilt points out, if you are running a CPU with a base TDP of 105 W and a PPT value of 142 W, under normal circumstances you should expect to see 142 W of power being reported by the CPU at stock settings. However, if the dimensionless current value is only 75% of its real-world current, then the real-world power consumption is actually ~190 W, which is the 142 W value divided by the 0.75 factor. Assuming that none of the other limits have been hit (TDC, EDC), the processor will report only 75% of the power it is actually drawing, which is the cause of a lot of the confusion.
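The arithmetic in that example is simple enough to check directly. The helper name below is made up for illustration; the 142 W and 0.75 figures are the ones quoted above.

```python
def real_power_watts(reported_watts: float, current_scale: float) -> float:
    """Estimate real power when the reported current is scaled by current_scale.

    current_scale is the fraction of the real current that the board reports
    (1.0 means honest reporting, 0.75 means 25% is hidden).
    """
    if current_scale <= 0:
        raise ValueError("current_scale must be positive")
    return reported_watts / current_scale


print(round(real_power_watts(142, 0.75), 1))  # 189.3
```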

Is it Out of Specification?

If we are considering PPT, TDC, and EDC to be the be-all and end-all of AMD specifications for power draw and current draw, then yes, this is out of specification. However, PPT by its very nature is going beyond TDP, so we get into this mysterious world of how to define "turbo", similar to what we’ve covered in detail with Intel.

As we’ve previously discussed, in Intel land, the peak power consumed while in a turbo mode is only provided by Intel to motherboard vendors as a ‘recommended value’. As a result, Intel chips will actually accept any value for that peak power limit, including reasonable values like 200 W or 500 W, but even unreasonable values like 4000 W. More often than not (and depending on the processor) a chip might hit other limits first; but for the high-end models, it is certainly worth tracking. Meanwhile the turbo duration, Tau, which defines the size of the bucket of energy that turbo can draw from, can also be extended: instead of the default of between 8 and 56 seconds, Tau can be drawn out to what's effectively an infinite amount of time. According to Intel, this is all within specification, if the motherboard manufacturers can build boards that can provide it.
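To make the effect of Tau concrete, here is a deliberately simplified model, not Intel's actual algorithm: the chip may draw up to its turbo limit (PL2) for as long as a running average of its power draw, smoothed over roughly Tau seconds, stays below the sustained limit (PL1). The function name, the exponential-average update, the assumption that the chip starts from idle, and the 125 W / 250 W / 28 s figures are all assumptions chosen for the example.

```python
import math


def simulate_turbo(pl1_w, pl2_w, tau_s, demand_w, seconds, dt_s=1.0):
    """Return the power drawn at each time step under a simplified turbo budget."""
    avg_w = 0.0            # running average of recent power draw, starting from idle
    alpha = dt_s / tau_s   # smoothing factor; 0.0 when tau_s is math.inf
    trace = []
    for _ in range(int(seconds / dt_s)):
        # Turbo (PL2) is allowed only while the running average is below PL1.
        limit_w = pl2_w if avg_w < pl1_w else pl1_w
        power_w = min(demand_w, limit_w)
        avg_w += alpha * (power_w - avg_w)
        trace.append(power_w)
    return trace


finite = simulate_turbo(125.0, 250.0, 28.0, demand_w=300.0, seconds=120)
forever = simulate_turbo(125.0, 250.0, math.inf, demand_w=300.0, seconds=120)
print(finite[0], finite[-1])    # starts at the turbo limit, ends at the sustained limit
print(forever[0], forever[-1])  # never leaves the turbo limit
```

With a finite Tau the simulated chip falls back to PL1 once the average catches up; with an effectively infinite Tau the average never moves, so the chip sits at PL2 indefinitely, which is the behavior motherboard vendors are enabling.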

What Intel considers out of specification is when the CPU goes beyond the frequencies listed in the turbo tables for Turbo Boost 2.0 (or TBM 3.0, or Thermal Velocity Boost). When the processor runs above the frequency defined by the turbo tables, Intel considers this overclocking, and has no obligation to honor the chip's warranty.

The problem is that while we can try and transplant the same rules to the AMD situation, AMD doesn’t really use Turbo Tables as such. AMD processors work by attempting to offer the highest possible frequency given the power and current limits at any given time. As more cores are ramped, the power per core decreases, and the overall frequency decreases. We get into the minutiae of frequency envelope tracking, which can get more complex given that AMD can work in 25 MHz steps rather than 100 MHz steps like Intel.
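The difference in step size is easy to see with a toy example. The snippet below simply rounds a frequency target down to the nearest available step; it says nothing about how either vendor actually chooses that target, and both the helper name and the 4137 MHz figure are hypothetical.

```python
def snap_down_mhz(target_mhz: int, step_mhz: int) -> int:
    """Round a frequency target down to the nearest multiple of step_mhz."""
    return (target_mhz // step_mhz) * step_mhz


target = 4137  # hypothetical frequency the power budget would allow, in MHz
print(snap_down_mhz(target, 25))   # 4125
print(snap_down_mhz(target, 100))  # 4100
```

Finer steps leave less of the available budget unused, which is also why small changes in the power budget show up as small frequency changes on AMD parts.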

AMD also uses features that push a chip's frequency above the turbo frequency listed on the specifications page. If you wanted to strictly argue about those being overclocking, then judging by the number on the box, it could very well be. AMD purposefully blurs the lines here, but the upside is often more performance.

Is My CPU At Risk?

To answer the big question right off the bat then, no, your CPU is not at risk. For regular users with enough cooling running at stock frequency, there is no issue to any degree that will matter within the expected lifetime of the product.

Most modern x86 processors come with either a three-year warranty for retail boxed parts, or are sold as OEM parts with a one-year warranty. Past those support periods, while AMD or Intel won’t replace the processor in the event of failure, most processors are expected to live well into the 15+ year range. We are still very happily able to test old CPUs in old motherboards, even though they have gone out of service for a long time (and more often than not, it is the old motherboard capacitors that tend to blow up, not the CPU).

When a CPU wafer comes off the manufacturing line, the company gets a reliability report about those processors, which helps get a sense of potential avenues for binning those CPUs. This will include elements such as voltage/frequency response, but also data relating to electromigration.

Aside from physical damage, or thermal limits being disabled and the CPU cooking itself, the main way for a modern processor to become non-functional is through electromigration. This is the act of electrons making their way through the wires on a processor and ever so slightly bumping into the atoms in that wire, moving them out of the crystal lattice. It is in itself a fairly rare event (how long have the wires been in your house, for example), however at the small scale it can effect change in how a processor works.

Adapted From "Electromigration" by Linear77, License: CC BY 3.0

By moving a metal atom from a wire out of place in a crystal lattice, the cross-section of the wire, at that point, is reduced. This increases the resistance, as resistance is inversely proportional to the cross-sectional area of the wire. If enough metal atoms are moved out of place, the wire disconnects and the processor is no longer usable. This also happens in transistors, and is commonly referred to as transistor aging, with the transistor needing a higher voltage over the lifetime of the product (voltage drift).
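The inverse relationship between resistance and cross-section comes from the standard formula R = ρL/A. As a rough illustration, the sketch below uses the bulk resistivity of copper and a made-up wire geometry; real on-chip interconnects are not bulk copper and their dimensions vary by process, so only the ratio between the two results is meaningful here.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # bulk copper near room temperature


def wire_resistance_ohms(resistivity_ohm_m: float, length_m: float, area_m2: float) -> float:
    """R = rho * L / A for a uniform wire."""
    return resistivity_ohm_m * length_m / area_m2


# Hypothetical wire segment: 1 micrometre long, 50 nm x 50 nm cross-section.
length = 1e-6
intact_area = 50e-9 * 50e-9
thinned_area = intact_area / 2  # half the cross-section lost at the damaged point

intact = wire_resistance_ohms(COPPER_RESISTIVITY_OHM_M, length, intact_area)
thinned = wire_resistance_ohms(COPPER_RESISTIVITY_OHM_M, length, thinned_area)
print(thinned / intact)  # about 2: halving the area doubles the resistance
```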

The amount of electromigration increases under certain conditions – temperature, use, and voltage. One of the main ways to get over the increased resistance is to increase the voltage, which in turn increases the temperature of the processor. It becomes a positive feedback loop, compounding into worse electrical performance over the lifetime of the processor.

With higher voltage (energy per electron), and higher current density (electrons per unit area), there are more chances for an electromigration event to occur. This can get worse at higher temperatures, and all these elements act as different factors when it comes down to the number of electrons that might have enough energy to enable an electromigration event. For anyone studying reaction kinetics, this is a similar principle to concentration but with a variable energy per incident.
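One common empirical way to put numbers on these dependencies is Black's equation, which models the median time to failure of a wire as proportional to J^(-n) · exp(Ea / kT), where J is current density and T is absolute temperature. It is not referenced in The Stilt's write-up, and the exponent and activation energy below are illustrative assumptions rather than figures for any real process, so treat this as a sketch of the shape of the relationship only. Note that voltage does not appear directly in this model; it enters through the current and heat it drives.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_lifetime(current_density_ratio, temp_old_k, temp_new_k,
                      n=2.0, activation_energy_ev=0.9):
    """Ratio of new to old median lifetime under Black's equation.

    n and activation_energy_ev are assumed, illustrative values.
    current_density_ratio is new current density divided by old.
    """
    density_term = current_density_ratio ** (-n)
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_new_k - 1.0 / temp_old_k)
    )
    return density_term * thermal_term


# 10% more current density at the same temperature shortens the modelled lifetime.
print(relative_lifetime(1.10, 350.0, 350.0))
# The same current density but 10 K hotter shortens it as well.
print(relative_lifetime(1.00, 350.0, 360.0))
```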

So this is bad, right? Well, it used to be. As processor manufacturers and semiconductor fabs have iterated through the design of logic gates in CMOS and FinFET processors, there have been active countermeasures put in place to reduce the levels of electromigration (or reduce the effect of the levels of electromigration). As we shrink process nodes, and voltages decrease, it also becomes less of an issue, although the fact that wires also decrease in cross-sectional area has the opposite effect. But as mentioned, the manufacturers now actively take steps to reduce the effect of electromigration inside a processor.

Electromigration has not been an issue for most consumer semiconductor products for a substantial time. The only time I personally have been affected by electromigration issues is when I owned a Sandy Bridge-based 2011 Core i7-2600K, which I used for overclocking competitions at 5.1 GHz under some extreme cooling scenarios. It eventually got to a point, after a couple of years, where it needed more voltage to run at stock.

But that was a processor I ran to the ragged edge. Modern day equipment is designed to run for a decade or longer. What we are seeing with these numbers, while there is an increase in thermals due to the increased power, isn’t actually a sizable shift. In The Stilt’s report, because the processor sees that it has extra power headroom, it raises the voltage slightly in order to get the +75 MHz extra that the budget will allow, which increases the average voltage from 1.32 volts to 1.38 volts during a CineBench R20 run. The peak voltage, which matters a lot for electromigration, only moves from 1.41 volts to 1.42 volts. The overall power was increased 25 W, which makes for around 30A more. That is nowhere near an order-of-magnitude change.

So if I end up with a motherboard that adjusts this perceived current value, will it brick my processor? No. Not unless you have something else seriously wrong with your setup (such as thermals). Within the given lifetime of that product, and the next decade after, it is not likely to make a difference. And as stated previously, even if this did affect electromigration on a large scale, the processor manufacturers have built in mechanisms to deal with it. The only way to actively monitor it, as an end user, would be to observe your average and peak voltage values over the course of years, and see if the processor automatically adjusts itself to compensate.

It is perhaps worth mentioning that the dimensionless current value isn’t adjustable by the end user – it is something the motherboard controls through BIOS updates. If you are a user that overclocks, you are doing more towards electromigration than this adjustment ever will. For those concerned about thermals, I suspect you are already monitoring and adjusting your BIOS limits as needed for your system.

How To Check if My Motherboard Is Doing It

First, you need to be running a stock system. Changing any of the regular PPT/TDC/EDC limits already means that the system is being adjusted, so we need to only focus on users dealing with stock systems.

Next, acquire the latest version of HWiNFO, and a test that will cause 100% load on the system, such as CineBench R20.

Inside HWiNFO, there is a metric called “CPU Power Reporting Deviation”. Observe that number while the system is at full load. A normal motherboard should say ‘100%’, while a motherboard with an adjusted current/VRM reported value will say something below 100%.

Just to clarify, this metric is only valid:

If your AMD Ryzen CPU is running at full stock settings in the BIOS. No OC, no adjustments to power or current limits.
When your CPU is running at a full 100% load, such as Cinebench.

If your system does not meet both of these requirements, then the value of the Power Reporting Deviation does not mean anything. If it does meet them and the value reads under 100%, then your motherboard is affected. Please let us know in the comments below.
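The decision logic above can be written down as a small helper. This is only a sketch of how to read the number: the function name and return strings are made up, it does not query HWiNFO in any way, and treating a value above 100% as over-reporting (a conservatively calibrated board) is an assumption rather than something the check above covers.

```python
def classify_deviation(deviation_percent: float, stock_settings: bool, full_load: bool) -> str:
    """Interpret a manually observed Power Reporting Deviation reading."""
    # The metric is only valid at full stock settings and under 100% load.
    if not (stock_settings and full_load):
        return "not meaningful"
    if deviation_percent < 100.0:
        return "under-reporting"  # board hides some of the real current
    if deviation_percent > 100.0:
        return "over-reporting"   # assumed: conservative calibration
    return "accurate"


print(classify_deviation(91.0, stock_settings=True, full_load=True))   # under-reporting
print(classify_deviation(75.0, stock_settings=False, full_load=True))  # not meaningful
```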

What Are My Options?

If your motherboard is juicing the processor, but you are happy with the thermal performance of your cooler and the power draw at the wall, then enjoy the extra performance. Even if it is only 75 MHz.

AMD doesn’t necessarily need to comment on the matter, as this is an issue with the motherboard manufacturers. Users might want to probe their motherboard manufacturer, and ask for a BIOS update. Users who want to return their motherboards will have to check with their retailer, as it might depend on where it was purchased.

Given that, while it appears to break PPT specifications, it doesn’t actually go beyond any frequency specifications (which are ill-defined), it may be similar to how motherboard manufacturers play with power limits on Intel systems, which is to say that it's something that's "just there". Though it would probably be handy to get a BIOS option to enable/disable it.

143 Comments


Dug - Wednesday, June 10, 2020 - link
Can't seem to find 'Power Reporting Deviation' in any of the most recent HWiNFO64 builds.

Wonder why they took it out? Could it have been reporting wrong information?
silverblue - Wednesday, June 10, 2020 - link
There's a beta link on their site for v6.27-4190. I have v6.27-4185 which introduced the power reporting deviation, however since then they have identified that it didn't play ball with Zen and Zen+, and as such have released a new build.
silverblue - Wednesday, June 10, 2020 - link
(which fixes PRD for Zen and Zen+; sorry, I should've stated that at the time)
K_Space - Wednesday, June 10, 2020 - link
@Dug actually it's simply because you clicked on CPU. Instead go to Sensors and scroll down. You'll find Power Deviation there.
silverblue - Wednesday, June 10, 2020 - link
Ryzen 5 3600 at stock using original Wraith Spire cooler, Gigabyte GA-AB350-Gaming 3 with F50a BIOS, 2x8GB Patriot Viper Elite DDR4 3000MHz (CL16), using 1usmus Ryzen Universal power plan - Cinebench R20 MT score of 3403, but more importantly, Power Reporting Deviation of 132% to 135% throughout the run. I can only get it to 118% using CPU-Z (which is outside the scope of this test), so perhaps it's a result of very conservative calibration - after all, on this board, the VRMs don't have a heatsink.
abufrejoval - Wednesday, June 10, 2020 - link
I don't mind at all being able to overclock a CPU, or other parts of a PC.

But I do prefer to be given a choice.

Most of the time the noise and heat a machine generates will matter to me: Ideally I'd want the ability to tell a machine: "Don't use more than x Watts, make the most of it."

Perhaps I'd also want to say something like "Use up to 6 cores, but go full-in on Watts)", because I know that a game or piece of software won't scale further anyway.

I'd love to be able to do this at run-time and I'd really want to be sure, that these limits are not overstepped. And of course, this should work the same on Windows, Linux, BSD or Qubes.

These power limits shouldn't be limited to just the CPU either. Demanding to accomodate USB devices etc. might go a little far, but GPU and memory: That should be included in the calculations or measurements.

I keep rebuilding machines and I will reuse components that are still viable. I have Gold+ rated power supplies that I try to operate at around 80% rated performance for peak usage, but that requires for the computer components to stick to their ratings or settings.

Of course, I measure to make sure, because few things are as nasty as faults induced from borderline power, but I prefer to set limits instead.

I recently tried to put a 65 max Watt appliance together, using existing mini-ITX cases with Pico-ATX PS and external 12V power bricks, but equipped with ECC RAM and ideally 64GB of it. I wanted it to use 8 or more cores, go easy on clocks when loaded, but sprint to 4GHz on single threads, as long as it would never overwhelm the power supply.

It turned out almost impossible, because Intel doesn't stick anywhere near to 35 Watts when you want them to ... unless you buy a notebook instead.

I want run-time cTDP for all the major components (CPU, RAM, GPU for starters), within the limits that they already technically support, but not expose to user control: Is that so much to ask?
Oxford Guy - Wednesday, June 10, 2020 - link
What are these special measures that manufacturers put into place to reduce electromigration?

Where is the data? Let's see some charts.
eastcoast_pete - Wednesday, June 10, 2020 - link
This whole EM topic has once again become more the subject of what is apparently a religious war, and drawn attention away from a key point regarding Ryzen CPUs, also mentioned by Ian and explained in a good write-up by the Stilt (linked to in Ian's piece). In a nutshell, AMD CPUs rely on the MB to tell them how much power they are using, and then adjust accordingly. Unfortunately, at least two major MB makers have used that to boost performance by fudging the values sent to the CPU. Now, AMD CPUs are vulnerable to that because they outsource that function, but AMD doesn't condone the fudging. The solution to that is straightforward: Anandtech and other reviewers, please call the cheaters out, and AMD, please do likewise when it comes to certifying vendors. Lastly, if EM is a major issue of Ryzen chips, why hasn't there been a class action lawsuit here in the US? How many 1st and 2nd generation Ryzens have dropped dead? Lastly, is there software that can reliably discriminate between EM and other causes of performance drops and instability? I'd really like the answer to the last one!
Haltursson - Thursday, June 11, 2020 - link
I had a K6-2 350Mhz that i ran at 430 ish at v2,9 at the time with no issues during use, replaced it after two years with a K6-III 450 i think and in removing it, the whole chip turned to dust as i removed it from the socket....
Big Nish - Thursday, June 11, 2020 - link
CPU: Ryzen 5 3600
Motherboard: ASUS X570 Crosshair VIII Impact
BIOS: 1302
Cinebench Version: R20.060
Cinebench Score (Multihread): 3669

HWiNFO Readings during run:
All core boost: 4050Mhz
CPU Die temp: 81C (Cooler - Fractal Design Celsius+ S28 Dynamic set to Auto)
CPU Package Power: 90.164W
CPU PPT: 88W
Power Reporting Deviation: 91%

This seems to indicate that ASUS has been a little loose with its lookup table.
