Is HWiNFO causing the dreaded WHEA-Logger Event ID XX Cache Hierarchy Errors and sudden reboots on AMD Ryzen systems?

Jackalito · Feb 2, 2021

Martin said:
Thanks for your feedback.
The GPU was my suspect as well, and both of you have the Navi21 (RX 6800 XT). HWiNFO version 6.40 added enhanced support for these GPUs, so maybe it has something to do with that.
Would be interesting to see if the problem happens with the GPU sensor monitoring disabled in HWiNFO.

I'm going to try that next. I've just disabled the GPU sensors (by pressing Del) and I'm going to give it a go! Hopefully, we'll find the root of the problem soon!
Thank you everyone for your feedback.

Martin · Feb 2, 2021

OK, so here is the mentioned new build: www.hwinfo.com/beta/hwi64_643_4362.zip
The Navi21 seems to have introduced some new low power mode and I assume this was causing issues in HWiNFO. It's a case similar to the one long ago with the ULPS mode.
This build might also work better in idle state so that values should not be greyed out as much as before (which meant HWiNFO wasn't able to read valid data from sensors).
Initially in idle state there might be some grey values, but it is expected to work better as soon as HWiNFO catches up.
Please try and let me know whether the problem persists or is fixed perhaps. Any observations are welcome.

Zach · Feb 2, 2021

Bloot said:
Thing is, I was using hwinfo for a while and it showed all of my 6800 XT sensors without issues. It started happening on the builds released in early January (maybe the first or second week). Maybe it's related to the added memory junction sensor on the RTX 3000 series? The 6800 XT has also shown this sensor since I bought it, maybe it's interfering with the new one?

I don't know, I will test 4.60 and let you know if this problem reproduces or not.

I believe somewhere at that point HWiNFO 6.40 (30/12/2020) added enhanced monitoring on Vega and Navi1/2 GPUs about power/thermal monitoring and limits. Maybe that was the start of the issue in conjunction with new GPU drivers.

Martin · Feb 2, 2021

Zach said:
I believe somewhere at that point HWiNFO 6.40 (30/12/2020) added enhanced monitoring on Vega and Navi1/2 GPUs about power/thermal monitoring and limits. Maybe that was the start of the issue in conjunction with new GPU drivers.

Yes, this is my assumption as well. Things wouldn't have been so hard if AMD hadn't "forgotten" to send a Navi21 sample this time...

Jackalito · Feb 2, 2021

Martin said:
OK, so here is the mentioned new build: www.hwinfo.com/beta/hwi64_643_4362.zip
The Navi21 seems to have introduced some new low power mode and I assume this was causing issues in HWiNFO. It's a case similar to the one long ago with the ULPS mode.
This build might also work better in idle state so that values should not be greyed out as much as before (which meant HWiNFO wasn't able to read valid data from sensors).
Initially in idle state there might be some grey values, but it is expected to work better as soon as HWiNFO catches up.
Please try and let me know whether the problem persists or is fixed perhaps. Any observations are welcome.

I will try the new beta as soon as I can and I'll report back my findings.
In the meantime, after disabling the GPU sensors for Navi21 in HWiNFO v6.42 Build 4360:

EDIT: By the way, I'm getting this upon launching the new beta. What was the recommendation in this case as I don't recall it?
Thanks!

Jackalito · Feb 2, 2021

Martin said:
Yes, this is my assumption as well. Things wouldn't have been so hard if AMD hadn't "forgotten" to send a Navi21 sample this time...

That's so unfair. In my humble opinion, you should have been one of the first people to get a sample of the new hardware.

Martin · Feb 2, 2021

The ASUS EC sensor is OK in most cases, but it's impossible to predict whether it might cause issues. Running it alongside another monitoring tool like ASUS AI Suite is especially problematic.

Jackalito · Feb 2, 2021

Martin said:
The ASUS EC sensor is OK in most cases, but it's impossible to predict whether it might cause issues. Running it alongside another monitoring tool like ASUS AI Suite is especially problematic.

Alright, fair enough. Thanks

Bloot · Feb 2, 2021

Zach said:
I believe somewhere at that point HWiNFO 6.40 (30/12/2020) added enhanced monitoring on Vega and Navi1/2 GPUs about power/thermal monitoring and limits. Maybe that was the start of the issue in conjunction with new GPU drivers.

I doubt the drivers have anything to do with it, as it happens with every driver released since the 6800 series launched. And it did not happen to me before January. I don't know which hwinfo build I was using, but I guess it was 6.40, which according to the version history was released on Dec-09-2020.

I've been testing 6.40 and it seems to not cause this crash, for now at least.

Will test the new build soon.

Greetings. And many thanks for the support and interest in resolving this issue.

eXSiR80 · Feb 2, 2021

Hello,
This is my first post here and I just registered to say only one thing.

HWInfo developers, please go and fuck yourself.

I have been using this shit for 2 months.
I have wasted all my time, day and night, trying to figure out why my pc kept resetting.
I disassembled my pc 3 times and tried every possible combination for the last fucking 5 days.
Now you are telling us it is a bug.
Congrats on this big news.

Shit happens? Well, of course, but releasing untested bullshit is just toying with users.
By the way, OCCT was also crashing, and I wonder why. Because it uses the same engine as yours? Maybe?

Fuck HWInfo. I was beyond angry when I saw this.

Ryzen 5 3600 (with auto OC from BIOS)
TUF Asus B450M Pro Gaming
G.Skill 2x8 GB 3600 MHz RAM
ASUS Radeon RX 5500 XT 8 GB
Thermaltake 600 Watt

For your information.
Every time my pc reset (a BSOD without the blue screen), HWInfo was on.
Using Windows without it installed, everything was perfect (I only just noticed).
With the GPU disabled or its driver uninstalled, stress tests ran fine.

Whenever I used OCCT or HWInfo, my pc crashed within 2 hours at most.

02.02.2021, started Prime95 blend testing, without HWInfo.
We will see what will happen.

Martin · Feb 2, 2021

I can understand your frustration, but things would be different if AMD were more responsive or at least provided a hardware sample.
But their total paranoid secrecy doesn't help anyone and this is one of the results. I have been telling them this for a long time, they know it, but don't do anything.
Moreover, marketing is all that counts for them, they treat engineers like ....
Anyway, this "issue" is still not confirmed and we're waiting for more test results.

eXSiR80 · Feb 2, 2021

Martin said:
I can understand your frustration, but things would be different if AMD were more responsive or at least provided a hardware sample.
But their total paranoid secrecy doesn't help anyone and this is one of the results. I have been telling them this for a long time, they know it, but don't do anything.
Moreover, marketing is all that counts for them, they treat engineers like ....
Anyway, this "issue" is still not confirmed and we're waiting for more test results.

Hi Martin,
I am sorry for using that kind of language, but as you pointed out, I am really frustrated.
For more detailed information about my problem, please see below (please keep in mind that during every single reboot, HWInfo or OCCT was on).

I have been using HWInfo for the last 2 months and, if I remember correctly, I faced a few resets before the last update, but I am not 100% sure.
I installed the latest version on 26.01.2021. Until then, I was using the previous version.
My pc reboots especially during web browsing or watching YouTube, anywhere from 30 minutes to 6 hours in.
During my trials, I reinstalled Windows over and over. One time I used just the latest official AMD chipset driver without a GPU driver and started a stress test with OCCT and Prime95. It was OK, no reset. But after installing the GPU driver, the problem reoccurred.
I once ran Ubuntu 20.04 from a live USB with no reboot for 2 hours; then I went back to Windows 10 with HWInfo on (it was in autorun mode) and the reboots started again.
I ran RAM, CPU and VRAM tests outside Windows 10 and no problem was detected.
I disassembled my pc 3 times, changed the thermal paste, used the stock cooler or a third-party one, and unplugged every single part of the PC except the CPU, RAM, SSD and GPU. It always reboots.
Every single reboot was reported as WHEA 18 with different APIC IDs. With the latest GPU driver, a Radeon WattMan failure was also reported.
I tried every single BIOS version starting from 07/2020 (when I bought my pc), including beta ones. It always reboots.
I tried all BIOS combinations, including DOCP off/on, DRAM 2133/3200/3600 with FCLK auto/non-auto, CPB off/on, PBO on/off, C-State changes, Resizable Bar on/off, Fast Boot on/off ... etc. (You get the point).
I always used only one monitoring app during tests, never HWInfo and Ryzen Master at the same time. I was aware of a possible conflict which might cause wrong reporting.
I used OCCT (CPU, RAM and VRAM tests) and Prime95 (with Large FFTs or Blend).
While using OCCT, HWInfo was off since OCCT has a built-in HWInfo engine; it always rebooted within 1 hour at most. The only run that did not reboot was the previously mentioned trial without the GPU driver installed.
During Prime95 usage, HWInfo was always on (if I am not wrong, just once I did not use HWInfo and the test was still running successfully at the end of one hour).
I never suspected HWInfo usage may cause this kind of issue.

For the last hour, while writing this post, I have been running the Prime95 blend test without HWInfo, with just Ryzen Master on for monitoring purposes: no reboot/reset (I hope it continues like this).
Please find my detailed hardware information below for further investigation.

Current Hardware:

AMD Ryzen 5 3600, 4150 MHz (41.5 x 100)
Asus TUF B450M-Plus Gaming (1 PCI-E x1, 2 PCI-E x16, 1 M.2, 4 DDR4 DIMM, Audio, Video, Gigabit LAN) Bios version 2409
G Skill RipjawsV F4-3600C18-8GVK 18-22-22-42 (CL-RCD-RP-RAS) / 64-631-469-289-9-4-44 (RC-RFC1-RFC2-RFC4-RRDL-RRDS-FAW)
ASUS Radeon RX 5500 XT 8GB EVO OC (with default settings), 27.20.14501.28009 - AMD Adrenalin 2020 20.12.1 WHQL
ATI Radeon HDMI @ AMD Navi 10 - High Definition Audio Controller
Realtek ALC887 @ AMD K17.7 - High Definition Audio Controller
CT240BX500SSD1 (240 GB, SATA-III)
WD Elements 2621 USB Device (931 GB, USB)
WDC WD1002FAEX-00Y9A0 (1 TB, 7200 RPM, SATA-III)
Asus VS278 [27" TN LCD] (H4LMQS029644)

BIOS Settings (Currently Testing)
DOCP on with 3200 MHz RAM speed and FCLK 1600
Thermal Throttle Limited to 70C (safety purposes)
CPB, PBO, fMax enhancer, all are on
Resizable Bar on
fTPM on
SVM on
Secure Boot off (default)

Martin · Feb 2, 2021

Thanks for the detailed report, I'm checking this with AMD right now too.
Can you please test this build: https://www.hwinfo.com/forum/thread...-reboots-on-amd-ryzen-systems.7041/post-28820

Jackalito · Feb 2, 2021

Martin said:
Thanks for the detailed report, I'm checking this with AMD right now too.
Can you please test this build: https://www.hwinfo.com/forum/thread...-reboots-on-amd-ryzen-systems.7041/post-28820

That's great to hear, and I hope they are being helpful.
Testing the new beta release you've shared with us. So far, so good.

Bloot · Feb 2, 2021

So it's been plenty of hours with hwinfo 6.40 stable build and my system has not crashed or rebooted. With 6.42 and previous beta builds it was crashing 2-3 times a day.

I'm going to test the new build.

eXSiR80 · Feb 2, 2021

An update: the Prime95 test without HWiNFO has run for 90 minutes with no reboots.
Normally it would not go that long.
Since I tinkered with the OS too much, I will reinstall Windows 10 and try the new beta with the latest drivers (the GPU driver was updated today, btw).

Bloot · Feb 2, 2021

I just realized I was talking all the time about the 4.60 build when I meant the 6.40 build.

Sorry, I corrected that in all my posts.

Martin · Feb 2, 2021

Thanks for the feedback, seems we're on the right track.
While build 4362 that I posted today is expected to provide a 99.9% fix for this issue, I have just received a suggestion from AMD that should be 100% safe.
I need to evaluate and test this option and will post a new build ASAP.

Jackalito · Feb 2, 2021

Martin said:
Thanks for the feedback, seems we're on the right track.
While build 4362 that I posted today is expected to provide a 99.9% fix for this issue, I have just received a suggestion from AMD that should be 100% safe.
I need to evaluate and test this option and will post a new build ASAP.

Awesome!
Thank you so much for working on this so promptly, Martin

Martin · Feb 2, 2021

If you have any more reports from build 4362 meanwhile, they are welcome...
