Yep, that's what I'm trying to figure out at the moment. I actually have another suspicion. I was using NiceHash on and off at the start of May and noticed that after mining, if the PC wasn't restarted, the GPU would stay stuck in 'compute' mode. My best guess is that it switches to this mode automatically, as there's no driver setting to enable or disable it. This would affect ReLive recording bitrate & target framerate (so a 60fps recording would come out at ~48fps until reboot). There's a possibility that 'compute mode' was having some issue with HWiNFO + ULPS + core parking (which can only be viewed through Ryzen Master), and maybe Resizable BAR, as the crash frequency _might_ coincide with times I left the PC running AFTER turning mining off without rebooting. But I wasn't logging the exact dates, AND I also had HWiNFO ~6.4 until the 18th, which has a known issue. The hierarchy crashes did persist after the 18th & I believe I did run NiceHash a couple of times around those dates. NiceHash logging only keeps a week of history, & NH had been running hours before the errors of the 24th & 25th. The last time I used the software was on the 25th at ~3am, but this still doesn't help prove anything yet.
Obviously I'd have to re-enable one thing at a time to minimize variables. NiceHash was the only abnormal use-case which, combined with HWiNFO set to 'launch on startup' & running in the notification tray, might explain or at least have contributed to the random crash behaviour, so it's worth mentioning. Still, I was using NiceHash for months without crashes before switching HWiNFO to start with Windows. Before May I would normally launch HWiNFO manually after startup, so this seems to be a key factor. ULPS has been on default since getting the GPU in March. I believe there is an interaction/conflict happening somewhere. MSI Afterburner has also been booting with the system, since I was using the HWiNFO RTSS OSD in conjunction with it over the previous few weeks, but other than those few applications the startup tab is minimal, in case the question 'how bloated could my system be' comes to mind.
Here are a few Task Manager screenshots:
Startup:
https://i.postimg.cc/qMWfD2Kt/image.png
CPU Tab:
https://i.postimg.cc/cJMVJjbY/image.png
Processes sorted by CPU:
https://postimg.cc/XZ2v3YTk
Sorted by Memory:
https://postimg.cc/kVRHMjHY
Could the 'low-level access interface' setting in Afterburner conflict with HWiNFO if set to Kernel mode? I have it on User at the moment & I honestly don't know how significant this setting might be.
On the plus side, I have done a few ~3hr videos using HWiNFO normally with no crashes, but I always turn it off when done recording. So I'm using it like I did prior to May, the way I had been using it for months without problems. The main reason I started booting it with Windows was to show VRAM temperatures in the RTSS/Afterburner OSD, as I'd sometimes forget to turn it on.
So this was my standard HWiNFO usage 2 months prior, I'd use it when stress-testing or doing a driver benchmark test (a small series on my channel) but kept it off the majority of the time:
And this is when I started running HWiNFO at startup, at first not with the OSD but for background monitoring of VRAM temperatures. Then I added it to RTSS for OSD integration in gameplay recordings, exclusively in May. So it was launching at startup from sometime around the first week of May, when the issue was frequently occurring. After updating to v7.04 on the 18th, it's interesting that there was no crash until the 24th, so with this much randomness it's very hard to be certain of anything without a good 2 weeks of testing. I also ran NiceHash around these times, though the crashes do not sync up with when NH was run: 3-4am on the 24th & 12am-12pm on the 25th (this is why I suspect the 'stuck compute mode' possibility). I did confirm that the NiceHash timeline matches my system clock, so those times are accurate. If only it went back further than a week.
Basically all I can do for now: after another week of keeping HWiNFO off, I'll re-enable HWiNFO 'start with Windows' to see if it runs fine with ULPS off. Having them both on together could have something to do with it, and that's how it was set up when the issues began after I started running HWiNFO on startup. ULPS ON by itself wasn't a problem for months prior. I also have to investigate whether NiceHash compute mode may be causing a bug with Resizable BAR & CPU core parking (I honestly have no idea HOW it would do that, but it's another possibility). But the crash was not occurring with NiceHash alone, which was used heavily in March & April. This is why I focused on HWiNFO after stumbling across Reddit threads on its involvement (though I do not believe HWiNFO is directly to blame, there is an interaction or bug happening somewhere). For now I've kept NiceHash off to avoid dealing with too many variables at once.
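For anyone wanting to track this kind of testing in a notes file, here is a rough Python sketch of the one-variable-at-a-time idea. The suspect names and the 7 day phase length are just my own placeholders, not real settings from HWiNFO, Afterburner or NiceHash, and pairs are included only because I suspect an interaction rather than a single cause:

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical labels for the suspects discussed in this post. They are
# just names for note-keeping, not real settings in any of the tools.
SUSPECTS = ("hwinfo_at_startup", "ulps_on", "nicehash_used")


@dataclass(frozen=True)
class Phase:
    enabled: tuple  # suspects switched ON during this phase
    days: int       # how long to run before moving on


def build_plan(suspects=SUSPECTS, days_per_phase=7):
    """Baseline with everything off, then each suspect alone, then each pair."""
    plan = [Phase(enabled=(), days=days_per_phase)]
    for size in (1, 2):
        for combo in combinations(suspects, size):
            plan.append(Phase(enabled=combo, days=days_per_phase))
    return plan


def describe(phase):
    """One readable line per phase for a notes file."""
    label = " + ".join(phase.enabled) if phase.enabled else "baseline (all off)"
    return f"{label}: run {phase.days} days, note every crash date/time"


if __name__ == "__main__":
    for number, phase in enumerate(build_plan(), start=1):
        print(f"Phase {number}: {describe(phase)}")
```

With three suspects that works out to 7 phases, so at a week each it is a slow process, which is about what I'd expect for an intermittent crash like this.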
It's also worth noting that I have had MSI Afterburner starting with Windows the entire time, both before and after the hierarchy errors appeared, & there was also an update on May 13th to the version I'm on now. Afterburner does take priority over HWiNFO for my daily usage & OSD needs (though I love HWiNFO's detail). So, just to make that clear: could there simply be a conflict between HWiNFO & Afterburner starting with Windows together? Currently Afterburner is 4.6.3 Beta 3, RivaTuner 7.3.2 Beta 2. Afterburner is still launching at startup as of now, with no apparent issues by itself.
So to summarize a rough timeline over the past 3 months:
March & April:
- NiceHash (heavily), system often on 24/7, MSI Afterburner & RivaTuner at startup w/ ULPS untouched, manually launching HWiNFO64 ~6.42 & leaving it running for 24hrs+, even overnight. No crashes.
- System mostly used for gaming, ReLive recording/benchmarking. Noticed the NiceHash compute issue with the ReLive framerate drop.
- AGESA 1.1.0.0.
- No hierarchy crashes with all the above.
May:
- Turned HWiNFO to 'launch at startup' and used NiceHash much less, but I can't verify the exact times. Otherwise usage was basically the same, with more 'idle' time, when the first hierarchy errors appeared in the first week of May. At first I assumed it was stability related, so I spent ~2 weeks trying to address it BIOS-side, increasing Vcore, SoC voltage etc.
- Updated AGESA on May 10th. Maybe it helped, but the issue recurred 6 days later. I hate diagnosing intermittent issues. I WAS trying to reproduce the issue, with no success.
- Updated Afterburner & HWiNFO in May as well when I saw updates were available, with 8 days until the next crash.
- Increased Vcore again, had another crash the following day. NiceHash had been running on the 24th & 25th, but HOURS beforehand & not at the time of the crashes. HWiNFO was running at these times too.
- Stumbled across Reddit and this thread investigating HWiNFO as a possible cause (& saw the 6.42 thing) & also read about GPU power being a possibility.
- So I both disabled ULPS using Afterburner & turned HWiNFO 'launch at startup' off. Currently crash free, day 4.
To top it all off, Windows has also been pushing through its experimental updates that sometimes fix themselves. There was an update on May 16th and another on May 24th, & I have yet to investigate the possibility that they also triggered crashing somehow. It may also be related to drivers beyond WHQL 21.4.1, as I've been testing the performance of the 'optional' driver releases on this system, 21.5.1 & 21.5.2.
I have yet to encounter another cache hierarchy crash as of this post & have been using the system rather heavily, with good periods of 'idle' in between, over the past few days. If you need any additional info, just ask.