Sudden crash of HWiNFO... IPMI or WHEA 17 related?

Hi,

HWiNFO terminates suddenly from time to time, both when run from the command line and in "regular" mode. It may be related to visually inspecting the sensor values, but I'm not 100% convinced.
I tried to attach a DBG file, but I don't know if it reveals anything.

Moreover, I get random, unexplained WHEA 17 errors, sometimes one per minute and at other times none for a whole day, reported as related to NVMe drives and Primary Bus 43:0:0. I have not been able to figure out how HWiNFO can help identify the hardware on a given bus, including which NVMe it refers to, as all 7 are of the same brand, with 4 on an ASUS Hyper card. Can HWiNFO tell whether it is a PCIe problem with an NVMe on the card? However, I don't think it is related to the termination problems.

Finally, could the HWiNFO problems be related to the motherboard, which is a workstation board running IPMI/BMC monitoring in the background?

Thanks in advance,

Magnus

AMD Ryzen Threadripper PRO 5975WX 32-Cores; ASUS Pro WS WRX80E-SAGE SE WIFI; ASUS DUAL RTX 2080 Ti; Microsoft Windows 11 Professional (x64) Build 22621.1992 (22H2); 8 x 64GB DDR4-3200 / PC4-25600 DDR4 SDRAM RDIMM (Kingston)
 

Yes, this crash seems to be due to IPMI, possibly when multiple clients are using IPMI at the same time. Disable the "EC Support" option in HWiNFO to avoid this.
As for the WHEA errors, I'd need to see a DBG file that captures such an error to determine exactly which device is causing it.
 
Thanks for the info, and hopefully the EC setting will resolve the HWInfo problems.

Attached is a DBG snapshot; while it was being generated, literally hundreds of WHEA 17 warnings were logged. I had a quick look at the DBG and did not find anything interesting, but I did not really know what to look for. In the majority of cases they come via Primary Bus 0x42, but at other times 0x43 dominates. They are all in the format below. They seem to occur only after a "cold" start and not after a restart, but I'm not 100% certain.

/magnus

<Event xmlns=" ">
<System>
<Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
<EventID>17</EventID>
<Version>1</Version>
<Level>3</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2023-07-17T13:51:08.2576222Z" />
<EventRecordID>262294</EventRecordID>
<Correlation ActivityID="{fb03fb0b-679c-40fe-9264-039395c70527}" />
<Execution ProcessID="6216" ThreadID="14476" />
<Channel>System</Channel>
<Computer>Monster_Machine</Computer>
<Security UserID="S-1-5-19" />
</System>
<EventData>
<Data Name="ErrorSource">8</Data>
<Data Name="FRUId">{00000000-0000-0000-0000-000000000000}</Data>
<Data Name="FRUText" />
<Data Name="ValidBits">0xef</Data>
<Data Name="PortType">0</Data>
<Data Name="Version">0x2</Data>
<Data Name="Command">0x406</Data>
<Data Name="Status">0x10</Data>
<Data Name="Bus">0x42</Data>
<Data Name="Device">0x0</Data>
<Data Name="Function">0x0</Data>
<Data Name="Segment">0x0</Data>
<Data Name="SecondaryBus">0x0</Data>
<Data Name="SecondaryDevice">0x0</Data>
<Data Name="SecondaryFunction">0x0</Data>
<Data Name="VendorID">0x2646</Data>
<Data Name="DeviceID">0x5013</Data>
<Data Name="ClassCode">0x10802</Data>
<Data Name="DeviceSerialNumber">0x0</Data>
<Data Name="BridgeControl">0x0</Data>
<Data Name="BridgeStatus">0x0</Data>
<Data Name="UncorrectableErrorStatus">0x0</Data>
<Data Name="CorrectableErrorStatus">0x1</Data>
<Data Name="HeaderLog">010000040F01004000000742BFFFB415</Data>
<Data Name="PrimaryDeviceName">PCI\VEN_2646&DEV_5013&SUBSYS_50132646&REV_01</Data>
<Data Name="SecondaryDeviceName">PCI\VEN_1022&DEV_1480&SUBSYS_88151043&REV_00</Data>
</EventData>
</Event>
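
For anyone trying to read an event like the one above, here is a minimal Python sketch that pulls out the hex fields and decodes the ones that help locate the device. It is only an illustration, not something HWiNFO or Windows provides: the function names are made up for this example, the bit names follow the PCIe AER Correctable Error Status register layout, and it assumes that Device Manager shows the PCI location with a decimal bus number. A regex is used rather than an XML parser because the pasted event contains raw `&` characters in the device names.

```python
import re

# Trimmed copy of the relevant EventData fields from the event above.
EVENT_DATA = '''
<Data Name="Bus">0x42</Data>
<Data Name="Device">0x0</Data>
<Data Name="Function">0x0</Data>
<Data Name="VendorID">0x2646</Data>
<Data Name="DeviceID">0x5013</Data>
<Data Name="ClassCode">0x10802</Data>
<Data Name="CorrectableErrorStatus">0x1</Data>
'''

# Bit positions in the PCIe AER Correctable Error Status register.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def parse_event_data(text):
    """Return {field name: int} for every hex-valued <Data> element."""
    pattern = r'<Data Name="(\w+)">(0x[0-9A-Fa-f]+)</Data>'
    return {name: int(value, 16) for name, value in re.findall(pattern, text)}


def decode_correctable(status):
    """List the names of the correctable-error bits set in status."""
    return [label for bit, label in AER_CORRECTABLE_BITS.items()
            if status & (1 << bit)]


def summarize(text):
    """Build a small summary: decimal location, class code bytes, errors."""
    fields = parse_event_data(text)
    class_code = fields["ClassCode"]
    return {
        # Decimal bus/device/function, to compare with Device Manager
        # (assumption: the Location property there uses decimal numbers).
        "location": "PCI bus %d, device %d, function %d" % (
            fields["Bus"], fields["Device"], fields["Function"]),
        # (base class, subclass, programming interface)
        "class": ((class_code >> 16) & 0xFF,
                  (class_code >> 8) & 0xFF,
                  class_code & 0xFF),
        "errors": decode_correctable(fields["CorrectableErrorStatus"]),
    }


if __name__ == "__main__":
    print(summarize(EVENT_DATA))
```

For this event the summary gives bus 66 (0x42), class bytes (1, 8, 2), which is mass storage / non-volatile memory / NVMe, and a single correctable "Receiver Error". Events reporting bus 0x43 would correspond to bus 67. If the decimal-location assumption holds, comparing that number against the Location shown for each NVMe controller should single out the drive.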
 
