rabidsmurf
New Member
I have been troubleshooting random operating system freezes/lockups on a new-ish build (most parts are a couple of years old). When this occurs, most things become unresponsive. I can open Task Manager, but the values don't update. I can open File Explorer, but I just get a blank window and a spinning cursor. If I have music playing, it repeats the same 2 seconds over and over. I never get a BSOD. This typically happens under light/moderate loads like browsing or YouTube, and I am forced to hard power cycle the PC. The System event log does not show any problems besides a generic Kernel-Power event ID 41 error. Once or twice I have seen the dreaded WHEA event ID 18 (Cache Hierarchy error), though usually not. This occurs roughly every 4-6 hours, although I have seen multiple back-to-back freezes, particularly on boot when I had HWiNFO and Rainmeter configured to auto-start.
I know this problem could be literally anything, but after days of beating my head against a wall with different BIOS versions/settings, swapping hardware around, and running tests (memtest, DiskCheckup, etc.), I began to suspect that the monitoring software I was using at the time (CPUID HWMonitor) was causing the issue. The reason: I got tired of troubleshooting and running tests, so I just ran my system without any monitoring and had zero crashes for over 36 hours. I then opened HWMonitor and had a crash within 30 minutes. Next, I installed HWiNFO to help diagnose problems and also to feed data into Rainmeter, and I experienced multiple crashes in a 4-hour period while playing with HWiNFO. I enabled snapshot polling and shared memory support, but otherwise the settings are default. At that point I grew very suspicious and decided to run a long-term test without monitoring. I am now at over 60 hours of uptime without a single issue under varying loads: a lot of browsing, working from home, and email, plus quite a bit of gaming.
At this point I strongly believe I have ruled out hardware problems through various tests and process of elimination. I don't think it's necessarily a problem with HWiNFO/HWMonitor themselves; rather, the issue may be exacerbated by the polling they do. My suspicion is that this is somehow related to BIOS/AGESA stability.
Before installing the board/CPU/RAM last week, I did not experience these issues at all (previously a 6700K with an ASUS Z170 board).
I would like to get to the bottom of this, as I want to be able to use HWiNFO. Are there logs, tests, or anything else I can do to help narrow this problem down?
System specs:
Gigabyte AORUS X570 Master rev 1.2 - New to this build - BIOS F32, AGESA 1.1.0.0 (the problem was worse with the beta F33j BIOS, AGESA 1.2.0.2); default settings besides XMP.
AMD Ryzen 7 5800X - New
Corsair LPX 2x16GB 3600 RAM - New. I actually swapped between the Intel- and AMD-optimized RAM kits (accidentally bought the wrong one) without any change in stability.
EVGA GTX 1080 Ti FE - A few years old, but has been rock-solid stable for me.
Corsair AX850i - A few years old, but has been rock-solid stable for me.
HP EX920 1TB NVMe - Boot drive - no SMART errors present
WD SN850 2TB NVMe - Storage drive - PCIe 4.0 - no SMART errors present
Windows 10 Pro 20H2 - Fresh install, all updates applied
All drivers up to date.