WHEA Count

Ganesh_AT

Member
Martin said:
And BTW, the fact that those errors are marked as "Information" means they are of the correctable type. Uncorrectable ones should be marked as Error, but these obviously cannot be caught by an application during runtime, because they result in a BSOD (or reset).
For some pre-production samples, such errors can also be caused by an erratum in the CPU or BIOS that might be fixed later - either an erratum in CPU functionality, or in the MCA reporting logic.
Martin,

Yes, please send me a link to the new build as soon as you can.

The machine is on stock Windows 10 build 10586 right now, and still producing the same WHEA error. I am planning to install Win 8.1 later tonight, so I can definitely test the new build before that.

On a side note, disabling DPS definitely brings down the CPU usage, but a process called 'System Interrupts' then begins loading the system to the tune of 8 - 10% CPU usage. Hopefully, Win 8.1 will resolve the problem.
 

Martin

HWiNFO Author
Staff member
Please try this build: www.hwinfo.com/beta/hw64_513_2767.zip
It should show more counters depending on the error type. However I think I'll extend this soon to provide even more details about the error.
It would be great if you could send me the HWiNFO Debug File from that run, so I can check internally whether it works as expected.
 

Martin

HWiNFO Author
Staff member
I have extended WHEA counters even more, so that they reflect more detailed types of errors. I can pass a new build if you wish to run.

BTW, that high load in System Interrupts might still be caused by the same thing - a lot of WHEA/MCA errors hammering the system (each MCA error causes an interrupt).
 

Ganesh_AT

Member
Martin said:
I have extended WHEA counters even more, so that they reflect more detailed types of errors. I can pass a new build if you wish to run.

BTW, that high load in System Interrupts might still be caused by the same thing - a lot of WHEA/MCA errors hammering the system (each MCA error causes an interrupt).
Yes, please provide the link to the latest build. I will be testing after I get back home from work - another 3 - 4 hours.
 

Ganesh_AT

Member
Martin said:
Please try this build: www.hwinfo.com/beta/hw64_513_2767.zip
It should show more counters depending on the error type. However I think I'll extend this soon to provide even more details about the error.
It would be great if you could send me the HWiNFO Debug File from that run, so I can check internally whether it works as expected.

Now, I am getting the 'Machine Check Error (AMD64 NB)' count also. Indicates, as expected, that it is responsible for all the WHEA errors. Debug log (zip archive) attached.

[attachment=1770]
 

Martin

HWiNFO Author
Staff member
Thanks. Sooo, that 'Machine Check Error (AMD64 NB)' is still not the precise interpretation of the error, it was rather a test version to see if my new code works. Meanwhile I have added a more detailed analysis of the errors received and will give you a new version in a few minutes.
But I had a look at the DBG file you provided and it tells me that the error is in fact a PCIe error which occurs on PCI Bus 0 : Device 28 : Function 7, which is the:
Intel Skylake PCH-H - PCI Express Root Port #8
Now you might want to check in the main window (will need to disable Sensors-only mode if used) - walk thru the PCI Bus, find that device and check what's attached there. According to the dump, the guilty device should be the: RealTek RTL8821AE Wireless LAN 802.11ac PCI-E NIC.
Try to remove it and see what happens ;)
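
For anyone matching the location above against other tools: PCI locations are commonly written as bus:device.function in hexadecimal, so decimal device 28 appears as 1c. A minimal sketch of that conversion follows, assuming the decimal values quoted in this post; the helper name format_bdf is made up for illustration and is not part of HWiNFO.

```python
def format_bdf(bus: int, device: int, function: int) -> str:
    """Format a PCI location as hexadecimal bus:device.function.

    Hypothetical helper for illustration only; not an HWiNFO API.
    """
    # Conventional PCI limits: 8-bit bus, 5-bit device, 3-bit function.
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")
    return f"{bus:02x}:{device:02x}.{function:x}"


# Bus 0 : Device 28 : Function 7, as reported in the debug file
print(format_bdf(0, 28, 7))  # prints 00:1c.7
```

The same device would therefore typically be listed as 00:1c.7 by tools that use hexadecimal notation.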
 

Martin

HWiNFO Author
Staff member
And here is the new build I mentioned: www.hwinfo.com/beta/hw64_513_2768.zip
This should give an even more precise error indication. However, it won't show details such as which exact PCI device is causing the error, simply because that information does not fit into the sensors screen.
 

Ganesh_AT

Member
Martin said:
Thanks. Sooo, that 'Machine Check Error (AMD64 NB)' is still not the precise interpretation of the error, it was rather a test version to see if my new code works. Meanwhile I have added a more detailed analysis of the errors received and will give you a new version in a few minutes.
But I had a look at the DBG file you provided and it tells me that the error is in fact a PCIe error which occurs on PCI Bus 0 : Device 28 : Function 7, which is the:
Intel Skylake PCH-H - PCI Express Root Port #8
Now you might want to check in the main window (will need to disable Sensors-only mode if used) - walk thru the PCI Bus, find that device and check what's attached there. According to the dump, the guilty device should be the: RealTek RTL8821AE Wireless LAN 802.11ac PCI-E NIC.
Try to remove it and see what happens ;)
:) I wish I had this info in the first pass..

I found out right after posting the debug log that the Realtek adapter is indeed the cause of the issue. I wiped the system and installed Windows 8.1 Professional. On fresh boot, the WHEA problem was NOT there! As soon as I installed the WLAN driver (Realtek 8821AE driver), the WHEA errors started to show. (When I installed Win 10, the driver came along with the installation, but the Win 8.1 installation ISO doesn't include a driver for the Realtek WLAN adapter.) In the video below:

https://youtu.be/clj_XJOEgO4 (you can ignore the soundtrack)

I am showing how enabling / disabling the WLAN adapter in the Device Manager can start / stop the WHEA error count.
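
The same start / stop behavior can also be cross-checked outside HWiNFO by reading recent WHEA entries from the Windows System event log. The following is only a rough sketch, assuming the events are logged under the Microsoft-Windows-WHEA-Logger provider and that the wevtutil command-line tool is available; the function names are made up for illustration.

```python
import subprocess

# Assumption: corrected-error events are logged by this provider in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_whea_query(max_events: int = 20) -> list[str]:
    """Build a wevtutil command listing the newest WHEA events as text."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",       # filter on the WHEA provider
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # human-readable output
    ]


def recent_whea_events(max_events: int = 20) -> str:
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling recent_whea_events() before and after disabling the adapter in Device Manager should show whether new entries stop appearing.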

I have asked ECS to check at their end. I think this is either a bad WLAN adapter (actually, the Wi-Fi performance is OK - I get ~150 Mbps practical throughput in the testing), or there is some compatibility issue for the adapter with the Skylake platform. I have used the Realtek 8821AE in multiple mini-PCs before (even the LIVA Core and x2) and never had this problem. So, it shouldn't be the driver itself having the problem, but probably a combination of the platform and the driver. I am looking forward to what ECS is going to tell me.

I will check out your new build tomorrow.
 

Ganesh_AT

Member
With the latest build, the errors are tagged as 'PCI/PCIe Bus Errors', which is definitely more specific than the cryptic 'Machine Check Error'.

By the way, ECS responded and we achieved closure on the issue. It looks like this is a Skylake issue, and the fix came along with the Prime95 bug fix in the BIOS. I got the latest BIOS (not yet fully qualified internally by ECS) with Intel's patches, and the WHEA problem is fully resolved now.
 