Dealing with Hyperthreaded Cores in CPU Usage Stats

I have HWiNFO for my PC and something called iStat Menus for the Mac. Both my Mac and PC have hyperthreaded cores, and this makes the CPU percentage reading seem inaccurate.

For instance, with the HT cores enabled, it's pretty hard to get the total CPU reading above 50%, even when all of the physical cores are running at 100%. The reason is that, no matter how hard certain threads are working, the HT cores bring the average way, way down, unless something that multi-threads really well, like Handbrake, is running. Given that HT cores aren't *really* additional CPUs, counting them as full-fledged CPUs mangles the total usage figures.

I always thought that, for the purposes of CPU monitoring, a better reading would count only the physical cores for 0-100%, while allowing HT to push above 100% for a potential reading of 130%, or whatever. (I read somewhere that full saturation of physical + HT cores, under very favorable conditions, can net UP TO a 30% increase. Rarely is it a 100% increase, though, and that is what counting HT cores as full procs implies.)

Has anyone come up with a good metric + customized multiply/add that you feel gives you a better idea of CPU usage for HT-enabled CPUs?

I'm sure this would be a ton of work, but just thinking off the top of my head, how hard would custom metrics be to add someday? I.e.: CustomMetric = (T0 core totals added together / physical CPUs) + ((T1 core totals added together / physical CPUs) * 0.3)
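For anyone who wants to play with the numbers outside of a monitoring tool, here is a minimal Python sketch of the custom metric proposed above. The function name, the argument names and the default weight of 0.3 are all just illustrative assumptions taken from the formula in this post; none of it is an HWiNFO or iStat Menus feature.

```python
def custom_metric(t0_usage, t1_usage, ht_weight=0.3):
    """Average of the T0 readings plus a discounted average of the T1 readings.

    t0_usage / t1_usage: one usage percentage (0-100) per physical core.
    ht_weight: assumed best-case gain from hyperthreading (0.3 = +30%).
    """
    if not t0_usage or len(t0_usage) != len(t1_usage):
        raise ValueError("expected one T0 and one T1 reading per physical core")
    physical_cores = len(t0_usage)
    t0_avg = sum(t0_usage) / physical_cores
    t1_avg = sum(t1_usage) / physical_cores
    return t0_avg + t1_avg * ht_weight


# All eight T0 threads busy and the T1 threads idle reads as 100,
# not the 50 a flat average over 16 threads would give.
print(custom_metric([100.0] * 8, [0.0] * 8))
```

With every T0 and T1 thread at 100%, the same function tops out at about 130, which matches the 0-130% scale described above.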

Sorry, not trying to set a ridiculous bar, what we have already is awesome, just a thought every now and then ;-)
 
Somewhat related: is T0 or T1 on the core considered a "primary" thread, or does each have the exact same priority? If they are equal, figuring out which thread is the "hyperthreaded" one would be somewhat moot.
 

Zach

Well-Known Member
Keep in mind, though, that the +30% for logical cores doesn’t hold on all CPUs. On some it could be 20%, on others 35%. And to make it even harder for you, one CPU doesn’t always behave the same. On some workloads the benefit from SMT could be +10%, on others 40%, and on still others 0%.

In any case, this kind of measurement (the HWiNFO default) counts the utilization of physical and logical cores. And you are talking about performance. Totally different things.

SMT (or HT, as you call it from the past) exists because in 99% of workloads, software threads are not capable of utilizing a core 100%. That’s why most of the time there’s room for another thread to run on a physical core.
And your question is... “Then why does it say 100% if there’s room for a second thread?”
This is deep programming stuff that I can’t explain well.

Let’s say that CPUs have multiple sets of internal calculators that calculate different things, along with predictors, decoders and a lot more.
If one thread fully utilizes only one set of these calculators and partially utilizes a different one inside a core, then this core would be 100% loaded, but there would still be room for another thread to run inside it, take what’s left, and maybe steal a little from the first thread too.

I know it’s a stupid explanation but I hope it makes some sense...
 
Question if I may. I've been trying to get this one working:

((("Core 0 T0 Usage"+ "Core 1 T0 Usage"+ "Core 2 T0 Usage"+"Core 3 T0 Usage"+"Core 4 T0 Usage"+"Core 5 T0 Usage"+"Core 6 T0 Usage"+"Core 7 T0 Usage")/8) + (("Core 0 T1 Usage"+"Core 1 T1 Usage"+"Core 2 T1 Usage"+"Core 3 T1 Usage"+"Core 4 T1 Usage"+"Core 5 T1 Usage"+"Core 6 T1 Usage"+"Core 7 T1 Usage")/8)))

I've moved things around, added and removed parentheses, etc., but when I run Handbrake taking all cores/threads to 90%-ish, I only see 100-110% on the sensor.

This is on an i9-9900K 8-core/16-thread CPU.

If I take either side of the T0/T1 equation, it works flawlessly, but trying to add them together is what fails.

I'm basically trying to get the average of Thread 0s:

(("Core 0 T0 Usage"+ "Core 1 T0 Usage"+ "Core 2 T0 Usage"+"Core 3 T0 Usage"+"Core 4 T0 Usage"+"Core 5 T0 Usage"+"Core 6 T0 Usage"+"Core 7 T0 Usage")/8)

The average of Thread 1s:

(("Core 0 T1 Usage"+"Core 1 T1 Usage"+"Core 2 T1 Usage"+"Core 3 T1 Usage"+"Core 4 T1 Usage"+"Core 5 T1 Usage"+"Core 6 T1 Usage"+"Core 7 T1 Usage")/8)


And add them together. So if the T0 average is 80% and the T1 average is 50%, the sensor will read 130%. The scale for the sensor is 0-200%.

I like this better than (T0[0-7] + T1[0-7]) / 16, as it seems to reflect what's going on more accurately, since 100% on all 8 T0 cores feels closer to 100% load than 50%.
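To make the comparison concrete, here is a small Python sketch of the two calculations using the 80%/50% example above. The function names are made up for illustration, and the readings are sample numbers rather than real sensor output.

```python
def sum_of_averages(t0_usage, t1_usage):
    """avg(T0) + avg(T1): a 0-200 scale where T0 alone can reach 100."""
    cores = len(t0_usage)
    return sum(t0_usage) / cores + sum(t1_usage) / cores


def flat_average(t0_usage, t1_usage):
    """Average over every logical thread: the usual 0-100 total CPU figure."""
    readings = list(t0_usage) + list(t1_usage)
    return sum(readings) / len(readings)


t0 = [80.0] * 8  # sample T0 readings, one per physical core
t1 = [50.0] * 8  # sample T1 readings, one per physical core

print(sum_of_averages(t0, t1))  # 130.0
print(flat_average(t0, t1))     # 65.0
```

One thing the arithmetic shows: as long as there are as many T1 readings as T0 readings, the first figure is always exactly twice the second, so unweighted, the two views differ only in scale.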

Thanks for any insight.
 

Martin

HWiNFO Author
Staff member
I'm sorry, but I don't understand what you're trying to achieve with the avg(T0) + avg(T1) result. You're summing up 2 average values, which is a different math than the total average of all cores/threads.
 
Hi Martin,

That's correct, it is different math and purposely so.

A machine with 100% usage on T0, but 0% usage on T1, will read 50% on the CPU meter. If all T0 threads are at 100%, however, chances are that the machine is, for practical purposes, running at more than 50% of its potential. I've hated this for 10 years, as it feels quite inaccurate, and wished that utility makers could design a better metric. Hyperthreading essentially halves your usage #s.

I know that it's highly variable, and that there is no "right" answer, but highly-optimistic predictions for Hyperthreading will give a 30% boost. Rarely does it double.

So, 100% T0 usage is, most likely, close to 100% CPU utilization for the practical purposes of gaining extra speed, with anything that hyper-threading adds simply gravy.

That's why I want my 0-200% scale, with T0 values determining the main scale and T1 being added to it.

Instead of each core's T0 and T1 having an equal share of a 100% scale, this gives me a more realistic 100% scale, yet allows me to overshoot should the situation call for it.

What I've found via testing is that when I'm running something heavy, CPU usage will settle somewhere around 100%, instead of 50% with the current scheme. If the hyperthreading is very efficiently used, I'll exceed it and get up to 120%-130%. Anything over 100% can be considered a "bonus" or "turbo" or whatever. Note that you could also do sum(T0) + sum(T1) * 0.30. I'm still playing with the formula a bit, but the goal is to not give hyper-threading full input parity.

sum(C[0-7][T0]) / 8 + sum(C[0-7][T1]) / 8 is a far better representation of what's going on CPU-wise than sum(C[0-7][T0][T1]) / 16 ... and those two totals can vary quite a bit at times.

Try it out sometime and tell me what you think.

I still have to mentally add them together via two separate sensors, though, because I've been unable to make it display as a single sensor.
 
<b>
Hi there...
I think you are confusing thread usage with "thread performance".
</b>

Unsure what you mean. The CPU has 8 cores and is hyper-threaded, showing x2 cores to the OS. This is represented as T0 and T1 for each core in HWiNFO (ostensibly a thread for each virtual core of execution).

While it looks the same to the OS, this is not functionally the same as having 16 physical cores. This being the case, CPU usage is often misleading in hyperthreaded machines. At least it has been in mine.

Each individual core (or thread, if you prefer) is measured by what percentage of cycles is spent executing the Idle process. At least that's my understanding of how it works in Windows, and I am far from a Windows expert. Since virtual cores receive a subset of a core's cycles, they can't do as much work per slice of time. They can under certain circumstances, but usually they cannot.
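As a rough illustration of that idle-time idea, here is a Python sketch that turns two samples of per-thread counters into a usage percentage. The counter names and the tick units are hypothetical, and this is only my reading of the general approach, not how Windows or HWiNFO actually implement it.

```python
def usage_percent(idle_before, total_before, idle_after, total_after):
    """Usage of one logical thread between two samples, as a percentage.

    idle_*: cumulative ticks the thread spent idle (hypothetical counter).
    total_*: cumulative ticks elapsed for the thread (hypothetical counter).
    """
    total_delta = total_after - total_before
    if total_delta <= 0:
        raise ValueError("the second sample must be taken after the first")
    idle_delta = idle_after - idle_before
    # Whatever fraction of the interval was not idle counts as "used".
    return 100.0 * (1.0 - idle_delta / total_delta)


# 250 idle ticks out of 1000 elapsed ticks means the thread was 75% busy.
print(usage_percent(0, 0, 250, 1000))  # 75.0
```

Note that a figure computed this way only says how often the thread was not idle. It says nothing about how much work got done in that time, which is where the usage-versus-performance distinction comes from.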

I think "thread performance" is moot in this case, unless one wants to attempt to write algorithms for each combination of applications.
 

Zach

Well-Known Member
Apparently you didn't catch it.
Why do you want to measure thread usage differently between T0 and T1 (between physical and logical threads)?
Usage is usage... When an app can use all available resources, then it does use them all at 100%.

I know very well that when a CPU has SMT (Simultaneous Multithreading), which doubles (x2) the threads available to the OS, the performance varies between x1 and x1.3 depending on the app.
It's nowhere near x2.
So an 8 physical core CPU is like having at best +2 cores. But this is performance and not usage. And you can't measure usage differently between physical and logical threads.
This is what I mean.
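The arithmetic behind that estimate can be written out as a tiny Python sketch. The function name is made up, and the x1 to x1.3 range is just the rough figure quoted in this thread, not a measured value.

```python
def effective_cores(physical_cores, smt_speedup):
    """Physical cores scaled by an assumed SMT throughput multiplier."""
    return physical_cores * smt_speedup


for speedup in (1.0, 1.1, 1.3):
    total = effective_cores(8, speedup)
    # The gain is how many extra "cores" of throughput SMT is worth here.
    print(f"x{speedup}: about {total:.1f} cores ({total - 8:+.1f})")
```

At the optimistic x1.3 end, 8 physical cores come out to roughly 10.4 cores' worth of throughput, which is where the "at best +2 cores" figure comes from, while all 16 threads can still show 100% usage.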
 