Core effective clocks at 100% C0 state

Timur Born · May 16, 2021

So let me simplify my main question: Are the effective clocks reduced by means of the multiplier changing quickly or by other means (with C-states disabled)? If it is the multiplier, shouldn't we be able to catch that unless the multiplier register does not report it?

Martin · May 16, 2021

The Effective Clock represents the number of core clock cycles counted while the core is in C0; the counter increments at the rate at which the core is actively running.
You can create your own sensor for reporting per-CCD sum of core values via: https://www.hwinfo.com/forum/threads/custom-user-sensors-in-hwinfo.5817/
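As a rough illustration of the counter-based definition above, here is a minimal sketch that turns two readings of a cycle counter into an effective clock. The function name and the counter values are hypothetical, chosen only for illustration; this is not how HWiNFO actually reads or names the counter.

```python
def effective_clock_from_counter(cycles_start, cycles_end, interval_ms):
    """Effective clock in MHz from two readings of a cycle counter.

    Assumes the counter only advances while the core is actively running
    in C0, so time spent stopped or running slower yields fewer counted cycles.
    """
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    delta_cycles = cycles_end - cycles_start
    # cycles per millisecond is kHz; divide by 1000 to get MHz
    return delta_cycles / (interval_ms * 1000)


# Hypothetical readings: 350 million cycles counted over a 100 ms interval.
print(effective_clock_from_counter(0, 350_000_000, 100))  # 3500.0
```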

Timur Born · May 16, 2021

So at 100% C0 state the effective clock is lowered by the core being turned off in between, akin to low C-states (C1-C3 non deep)? Wouldn't that cause on/off current spikes? Does this reduce voltages or just stop the clock?

Thanks for pointing to custom sensors, that looks useful.

Martin · May 16, 2021

It reflects the AVERAGE clock during the refresh interval.
If the clock between (t) and (t - 100 ms) was 1000 MHz for 50 ms and 500 MHz for the remaining 50 ms, the resulting effective clock is 750 MHz.
If the clock between (t) and (t - 100 ms) was 1000 MHz for 50 ms and the core entered a stopped state (higher C-State) for the remaining 50 ms, the resulting effective clock is 500 MHz.
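The two examples above are time-weighted averages. A minimal sketch, with a hypothetical helper name and a stopped core modelled as 0 MHz:

```python
def average_clock_mhz(segments):
    """Time-weighted average clock over one refresh interval.

    segments: list of (clock_mhz, duration_ms) pairs.
    A core in a stopped state contributes 0 MHz for that duration.
    """
    total_ms = sum(duration for _, duration in segments)
    if total_ms <= 0:
        raise ValueError("interval must have positive length")
    weighted = sum(clock * duration for clock, duration in segments)
    return weighted / total_ms


print(average_clock_mhz([(1000, 50), (500, 50)]))  # 750.0
print(average_clock_mhz([(1000, 50), (0, 50)]))    # 500.0
```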

Timur Born · May 16, 2021

This is understood. The question is how is the clock lowered when all C-states are disabled? If it is by multiplier, should we not be able to catch that with a high enough polling rate and long enough measurement time?

Zach · May 16, 2021

How much of a difference are we talking about between discrete clock and effective clock when all states higher than C0 are disabled?

Martin · May 16, 2021

It should be by multiplier, but I wouldn't take the discrete clock values into account as those might not catch the right peak.

Timur Born · May 16, 2021

Difference varies by load profile, but in the screenshot on page 1 it is about 0.25 MHz on CCD0.

Core effective clocks at 100% C0 state

Hello. I wonder on what basis does HWiNFO calculate "Core Effective Clocks" to be lower than core multiplier based clocks when the CPU is 100% in C0 state? And why are "Core Effective Clocks" listed as considerably lower (300 MHz) for CCX0 on my 5900X when CCX1 is considered the worse CCX with...

www.hwinfo.com

Timur Born · May 16, 2021

Martin said:
It should be by multiplier, but I wouldn't take the discrete clock values into account as those might not catch the right peak.

But no effective minimums are measured by the multiplier sensors at all. At a 50 ms polling rate and with a long enough measurement we should expect at least some multiplier drops to be caught, even if those only last 1-20 ms.

This seems to suggest that either the multiplier register (?) does not reflect the internal drops or that the drops are done by other means than the multiplier (whatever "multiplier" means in technical practice). Forum people keep calling this "clock stretching", but the whole thing is rather wishy-washy on various forums. Thus my questions here to get more insight from HWiNFO's point of view.
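The polling argument above can be put into numbers. A minimal sketch, assuming each poll is an instantaneous read whose timing is independent of the drops, and that the multiplier register would show a drop while it lasts; the 5% figure is a made-up example, not a measurement:

```python
def chance_of_catching_drop(drop_fraction, samples):
    """Probability that at least one of `samples` reads lands inside a drop.

    drop_fraction: share of total time the multiplier is assumed to be lowered.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be between 0 and 1")
    return 1.0 - (1.0 - drop_fraction) ** samples


# Hypothetical: drops cover 5% of the time, polled every 50 ms for 60 s.
samples = 60_000 // 50  # 1200 reads
print(chance_of_catching_drop(0.05, samples))
```

Under these assumptions the chance of never seeing a single drop over a minute is vanishingly small, which is why the absence of any measured minimum points to either the register not reflecting the drops or a mechanism other than the multiplier.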

Timur Born · May 17, 2021

Over at the Overclock.net forum, someone posted a link to the "Adaptive Clocking in AMD's Steamroller" article describing how AMD's clock-stretching works.

So based on even the small cost of C1 (latency + enter/exit power cost), the explanation of AMD's clock-stretching, and our discussion here, my current understanding is: "Effective Clocks" (or RyzenMaster) read out a CPU counter register, which in turn is affected by both (optionally) enabled C-states and clock-stretching. In the absence of C-states (or at 100% C0 CPU load) only clock-stretching is accounted for.

This has some implications: when C-states are enabled then "Effective Clocks" are to be taken with a grain of salt for performance/vcore measurements, because the counter may account for pauses (C-states) instead of clock-stretching.

Martin · May 17, 2021

I heard about clock stretching before, but what applied to Steamroller might be different for Zen. I tried to gather more information about this even in higher-grade documents, but no luck yet. I will try to check around even more.

Timur Born · May 17, 2021

The main reason why I am currently preferring clock stretching as an explanation for non C-state induced frequency limiting is that C1 likely is still slower and more costly than clock stretching. And even changing the multiplier could be slightly more costly, which may be the reason why AMD even came up with clock stretching in the past.

I found this: https://patents.google.com/patent/US8941420B2/en

In practice, latency (delay) can be incurred at each frequency transition as frequency-multiplier circuitry stabilizes the system clock at its new frequency following each frequency change.

This pertains to systems with broad input frequency ranges, though, while our CPUs only have to deal with one input frequency (100 MHz).

Conversely, injection-locked oscillators exhibit fast lock times, but tend to have a narrow input frequency range and thus limited frequency agility.

So unfortunately still not entirely clear for those of us who don't know how frequency multiplying is implemented in modern CPUs.

Timur Born · May 17, 2021

Here are my custom "sensors" summing up the already existing "Core X Power" sensors. This more easily demonstrates how CCD0 consumes more power under the equally distributed load (P95) compared to CCD1.

Since CCD0's effective clock is lower while consuming more power I will add average effective clocks and multipliers per CCD for even better visualization.

PS: I was surprised about the 2 ms profiling time under load, which drops to 1 ms idle. Does the custom sensor not just sum up the already available values, but poll each existing sensor again?
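The per-CCD sums and averages described above amount to grouping per-core readings by chiplet. A minimal sketch, assuming cores are listed in order with CCD0 first and 6 cores per CCD; the readings are placeholder numbers, not values from the screenshot:

```python
def group_per_ccd(core_values, cores_per_ccd):
    """Split a flat list of per-core readings into one list per CCD."""
    return [core_values[i:i + cores_per_ccd]
            for i in range(0, len(core_values), cores_per_ccd)]


def sum_per_ccd(core_values, cores_per_ccd):
    """Per-CCD sum, e.g. for "Core X Power" readings."""
    return [sum(group) for group in group_per_ccd(core_values, cores_per_ccd)]


def mean_per_ccd(core_values, cores_per_ccd):
    """Per-CCD average, e.g. for effective clocks or multipliers."""
    return [sum(group) / len(group)
            for group in group_per_ccd(core_values, cores_per_ccd)]


# Placeholder per-core power readings in watts for a 12-core, 2-CCD CPU.
core_power_w = [10.0] * 6 + [9.0] * 6
print(sum_per_ccd(core_power_w, 6))   # [60.0, 54.0]
print(mean_per_ccd(core_power_w, 6))  # [10.0, 9.0]
```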

Zach · May 17, 2021

I guess it's common for 2-CCD Zen 3 CPUs to have differently binned chiplets.
Here is an example of a 5950X with ~25% difference in power consumption between the 2 CCD cores.

10c difference between CCD1 and CDD2 on AMD 5950x

I was planing af repaste of my cpu to se if there would be any difference - and to be honest - I could not remember if I had used my new Noctua NT-H2 paste on my own cpu or only on som machine I have build during the last couple of months... Doing testing afterwards - burn in if you can call it...

www.techpowerup.com

Martin · May 17, 2021

The custom sensor does not poll the sensors again; the increased time is most likely due to polling of registry data.

Timur Born · May 17, 2021

Thanks Martin. I am currently trying to create a "Clock Stretching" sensor that calculates C-states out of effective clocks, but polling latency and rounding errors seem to be a problem.
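One way such a sensor could be reasoned about, as a minimal sketch: assume the effective clock equals the multiplier-based clock scaled by C0 residency and by whatever share of cycles is lost to stretching, then solve for that share. This model is an assumption for illustration, not something confirmed for Zen or for how HWiNFO computes its values, and the inputs below are made-up numbers.

```python
def estimated_stretch_fraction(effective_mhz, discrete_mhz, c0_residency):
    """Share of clock lost to something other than C-states.

    Assumed model: effective = discrete * c0_residency * (1 - stretch).
    c0_residency is a fraction between 0 and 1.
    Returns None when the core was never in C0 (nothing to attribute).
    """
    expected_mhz = discrete_mhz * c0_residency
    if expected_mhz <= 0:
        return None
    return 1.0 - effective_mhz / expected_mhz


# Hypothetical readings: 4000 MHz by multiplier, 100% C0, 3750 MHz effective.
print(estimated_stretch_fraction(3750, 4000, 1.0))  # 0.0625
```

Because the three inputs are sampled at slightly different moments and rounded, small or even negative results are to be expected, which matches the polling latency and rounding problems mentioned above.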

@Zach Yes, it's common, but one would expect the "better" CCD to be more power efficient, instead of the other way around. Anandtech also pondered about that.

https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5800x-and-5700x-tested/8 said:
Some users might be scratching their heads – why is the second chiplet in both of these chips using less power, and therefore being more efficient? Wouldn’t it be better to use that chiplet as the first chiplet for lower power consumption at low loads? I suspect the answer here is nuanced – this first chipet likely has cores that enable a higher leakage profile, and then could arguably hit the higher frequencies at the expense of the power.

This seems to be a per CCD correlation, though, not a per Core one (my "worst" core is not the most efficient).

Timur Born · May 17, 2021

Is it expected that HWiNFO displays different effective clocks than RyzenMaster when cores are not 100% in C0 state? RM does not offer averages over time, so it's not easy to compare, but the current readings are different. Sometimes RM is higher, sometimes lower, but for a specific load it seems consistent (mostly higher or mostly lower, not shifting much).

Martin · May 17, 2021

It can be different if the Snapshot Polling mode is not enabled. If it is, then the difference is only due to a different interval and fluctuations.

Timur Born · May 18, 2021

HWiNFO reacts more strongly to C-states than RM (which hardly reacts at all). In the following screenshot and animated GIF (attachment) both are set to a 1 second polling interval.

C0 Residency, Core 0 Effective Clock, RM Core 0

That being said, I even prefer that, but only if HWiNFO counts time spent in C-states as 0 MHz (zero). Otherwise it's impossible to work out the real amount of clock stretching (non C-state induced). But it does not seem like HWiNFO does that, or does it?

Martin · May 18, 2021

Sorry, I can't follow what you're asking.
HWiNFO is highly optimized for sensor polling and best results are obtained with the Snapshot Polling mode enabled. I also performed some additional optimizations for polling latency in the last version 7.04 released a few hours ago.
