AVX offset measured on single cores?!

Timur Born

Hello.

As far as I know, the AVX offset is applied to all cores at once, even when only a single thread on a single core invokes it. So when a single core runs a constant AVX load, all cores are clocked down according to the offset.

Example: 5 GHz all-core overclock, AVX offset 1 = multiplier x49 on all cores when any AVX load happens.

I can reproduce this using a single thread of P95 AVX load with its CPU affinity locked to a single core.
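To spell out the arithmetic in the example above, here is a minimal sketch. The function names are made up for illustration, and it assumes a 100 MHz base clock (BCLK), which the post does not state explicitly but which matches x50 = 5 GHz.

```python
def avx_multiplier(core_multiplier: int, avx_offset: int) -> int:
    """Multiplier applied while AVX load is active (hypothetical helper)."""
    return core_multiplier - avx_offset


def clock_mhz(multiplier: int, bclk_mhz: float = 100.0) -> float:
    """Core clock in MHz, assuming a 100 MHz BCLK unless told otherwise."""
    return multiplier * bclk_mhz


# Values from the post: x50 all-core overclock, AVX offset of 1.
avx_mult = avx_multiplier(50, 1)
print(avx_mult, clock_mhz(avx_mult))  # 49 4900.0
```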

So far so good. But why does HWiNFO regularly display AVX offsets happening on single cores then? For example, when I put a constant x50 (P95) *non*-AVX load on all cores that sporadically gets interrupted by AVX load (anti-virus, indexing and such), HWiNFO displays the lower offset multiplier (x49) on *single* cores instead of all cores. This is with all C-states (including C1E) disabled, and thus it confuses me.
 
HWiNFO shows what it observes. Exact AVX ratios depend on the CPU family; check the actual Fused and Resolved values in the main window under the CPU node.
 
I manually set the core and AVX ratios/offset, so that part is covered. I should clarify that I meant HWiNFO's monitoring. It is reporting single cores using the AVX multiplier when I would expect all cores to drop by the same amount once any AVX load is happening. So when my CPU is set to use x50 on all cores with an AVX offset of 1, HWiNFO reports single cores dropping to x49 when only occasional AVX load is happening. But as far as I know the AVX offset always applies to all cores at once?!
 
As I stated earlier, that also depends on the CPU. Some have a single offset for all cores, while others can have settings that depend on the number of cores utilized.
That's why I proposed the above.
I'm also wondering how do you know what type of workload the other applications (like anti-virus, indexing and such) are using and if it's really AVX.
 
Thanks for the quick replies. I am using a 9900K, for these tests the CPU is clocked to 5 GHz on all cores with an AVX offset of 1 (4.9 GHz). There is no other throttling happening. I disabled all C-states, disabled Speedshift (via Windows power profile's "Autonomous Mode" setting) and set minimum clock-rate to 100% (via power profile setting).

Here is a single thread of Prime95 AVX load running with its affinity set to a single CPU core (core 7). As you can see, this single core's permanent AVX load results in the AVX offset being applied to all other cores, too.

AVX_P95_AVX_single_core.png

Here are 16 threads of Prime95 *non*-AVX load running on all cores. As you can see, HWiNFO reports the AVX offset being applied to single cores intermittently, which seems to contradict the findings of the first test/screenshot.

AVX_P95_non_AVX.png

I'm also wondering how do you know what type of workload the other applications (like anti-virus, indexing and such) are using and if it's really AVX.

By closely watching which processes create CPU load when HWiNFO reports the AVX offset dip, and by doing some simple tests.

Here is a comparison between mostly being idle and running a Symantec Antivirus Quick Scan.

AVX_Symantec.png

Noteworthy: Symantec's online scanner is less active than Microsoft Defender, which I found to be invoking the AVX offset more often.

Here is a comparison between running Windows' file indexing and pausing it.

AVX_indexing.png
 
What does the Effective Clock report? That might be a better measure of the CPU dynamics.
 
More or less the same thing when I apply full load. Otherwise they read 0.x to 1.x GHz on most cores, so they are generally of less use for demonstrating AVX load offsets happening.

I did a different test setup, though, that revealed that the multiplier readings don't agree with the effective readings under full load. This is running one instance of P95 *non* AVX on all cores + a second instance of P95 AVX load on a single core (including single core affinity).

AVX_P95_AVX_single_core_vs_non_AVX_all_core.png
 
Thinking about the last post made me realize that with 17 threads of full load shared across 16 logical cores there is context switching happening. Using only a single decimal digit for "Effective" means that it misses the switching, while the Multiplier keeps rounding up and down because of the switching.

I have now added two decimal digits to both readings to get a better picture. This should also somewhat remedy my original concern, because the average reading will then at least indicate that the multipliers are switching faster than HWiNFO can display (even at 100 ms). At this point I put the differing multiplier readings down to the sample rate and to the multipliers being read serially, one after another, instead of all at the same time.
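A small sketch of why the second decimal digit matters for the average reading. The sample counts are invented for illustration (97 polls at x50, 3 polls at x49) and are not taken from my screenshots.

```python
# Hypothetical polling history: the multiplier reads x50 on 97 polls
# and drops to x49 on 3 polls because of short AVX bursts.
samples = [50] * 97 + [49] * 3

average = sum(samples) / len(samples)

one_decimal = f"{average:.1f}"   # what a single decimal digit would display
two_decimals = f"{average:.2f}"  # what two decimal digits would display

print(one_decimal, two_decimals)  # 50.0 49.97
```

With one decimal digit the average looks like a steady x50, while two digits reveal that some of the samples were taken at the offset multiplier.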

One feature request: please allow customizing several values at once. Going through every Effective and Multiplier value is quite tedious for CPUs with many cores. ;)
 
Thanks Martin. You are right about the switching frequency, but I still expected *all* core multipliers to be lowered at once during mixed load when AVX load hits any single core, as my constant AVX load test demonstrated. I did not consider that HWiNFO likely reads those multipliers one after another, so by the time it reads the next value, the corresponding multiplier can already have switched in a non-constant load scenario. So the software is not to blame, although the readings still suggest something different (single cores switching to the offset) from what really happens (all cores switching to the offset at once). There are limits in software, I do understand that.
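To illustrate the serial-read effect, here is a toy simulation. It is only a sketch under assumptions: the function names and all timings (10 µs between reads, a 20 µs AVX burst) are invented, and it says nothing about how HWiNFO actually polls. All cores share one multiplier, yet a snapshot read core by core shows the offset on only some of them.

```python
def package_multiplier(t_us, burst_start_us, burst_len_us, base=50, offset=1):
    """Multiplier shared by ALL cores at time t_us (hypothetical model).

    It drops by `offset` for the duration of a short AVX burst.
    """
    in_burst = burst_start_us <= t_us < burst_start_us + burst_len_us
    return base - offset if in_burst else base


def serial_snapshot(t0_us, n_cores, read_gap_us, burst_start_us, burst_len_us):
    """Read the cores one after another, `read_gap_us` apart."""
    return [
        package_multiplier(t0_us + core * read_gap_us, burst_start_us, burst_len_us)
        for core in range(n_cores)
    ]


# Invented timings: 8 cores read 10 us apart, AVX burst from 25 us to 45 us.
snapshot = serial_snapshot(t0_us=0, n_cores=8, read_gap_us=10,
                           burst_start_us=25, burst_len_us=20)
print(snapshot)  # [50, 50, 50, 49, 49, 50, 50, 50]
```

Only the cores read at 30 µs and 40 µs fall inside the burst, so the snapshot looks as if two single cores dropped to x49 even though every core had the same multiplier at every instant.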

Effective clock is of little use for AVX offset measurements in scenarios without constant loads. When I get readings below 3 GHz as "effective" clock, they tell me nothing about the AVX offset. These counters only help with the AVX offset when constant load is applied. Anyway, using 2 decimal digits helps get a better picture with both "Effective" and Multiplier (average) readings.

There is no performance limiting happening in my AVX specific tests, I made sure of that.
 