rabithole1234
Member
In the release notes, it says:
- Added monitoring of NVIDIA PCI Express Error Counters.
Hello Martin, I have the same issue: I can't see the new error counters on a 5090 FE.
Please attach the HWiNFO Debug File for analysis. It's quite possible that the 50 series works differently.
Could you please explain what "Recovery Count" means? And why does its value rise little by little?
Just throwing some data on my 5090 out there...this is after 8 hours of use. Recovery Count = 5,257 and NAKs Sent Count = 8. I'm assuming this is all normal because I haven't encountered any issues...
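To compare figures like these between systems, it can help to turn the raw totals into a per-hour rate. The snippet below is only a small sketch of that arithmetic using the numbers quoted above; the function name `per_hour` is made up for illustration and is not part of HWiNFO.

```python
def per_hour(count: int, hours: float) -> float:
    """Average number of counter increments per hour of use."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


# Figures quoted in the post above: counters read after 8 hours of use.
recovery_rate = per_hour(5257, 8)  # 657.125 per hour
nak_rate = per_hour(8, 8)          # 1.0 per hour

print(f"Recovery Count: {recovery_rate:.1f}/h, NAKs Sent: {nak_rate:.1f}/h")
```

An average like this hides bursts, so it says nothing about whether the increments cluster around particular events such as power state changes.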
It seems that NAKs Sent is also related to the memory clock changing state. When the memory clock is maxed at 1750 MHz, neither Recovery Count nor NAKs Sent increases. The moment the memory clock is not maxed, Recovery Count starts going up and NAKs Sent increases every once in a while as well. Not sure if that's anything to really be concerned about, but there are some other bugs with the 50 series related to the memory clock being too low (gray flickering on the desktop, for example, in some configurations).
Since we have the same GPU and you also get NAKs sent, I'm hoping to get some insight that helps us both judge whether it's normal:
- Do you use PCIe 5?
- Do you use DirectX11 or 12?
Do these settings help you to get rid of the NAKs sent?
- Setting ASPM off in BIOS
- Setting PCIe link state power management off in Windows power plan settings
- Setting the NVIDIA power mode to "prefer maximum performance"
Could you reproduce the errors by setting the NVIDIA power mode to "normal" and playing any game in DirectX12, watching the counters during loading screens, main menus and startup?
It would be really helpful if you can share your experience with this, thanks!
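For anyone who wants to script the Windows power plan part of the settings listed above, here is a minimal sketch. It assumes the `powercfg` aliases `SUB_PCIEXPRESS` and `ASPM` are available on your system (they can be checked with `powercfg /aliases`) and that index 0 corresponds to "Off" for link state power management. The helper names `build_powercfg_commands` and `apply_commands` are hypothetical. The BIOS ASPM option and the NVIDIA power mode are not covered and still have to be changed by hand.

```python
import subprocess
from typing import List


def build_powercfg_commands(aspm_index: int = 0) -> List[List[str]]:
    """Build the powercfg calls that set PCIe link state power management.

    aspm_index is assumed to mean 0 = Off, 1 = Moderate power savings,
    2 = Maximum power savings.
    """
    if aspm_index not in (0, 1, 2):
        raise ValueError("aspm_index must be 0, 1 or 2")
    value = str(aspm_index)
    return [
        # Setting while on AC power
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        # Setting while on battery (harmless on a desktop)
        ["powercfg", "/setdcvalueindex", "SCHEME_CURRENT", "SUB_PCIEXPRESS", "ASPM", value],
        # Re-activate the current scheme so the change takes effect
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_commands(commands: List[List[str]]) -> None:
    """Run each command; only meaningful on Windows."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Print the commands instead of running them, so they can be reviewed first.
for cmd in build_powercfg_commands(0):
    print(" ".join(cmd))
```

Calling `apply_commands(build_powercfg_commands(0))` on Windows would apply the change; printing first keeps the sketch safe to run anywhere.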
That's something I need to monitor, thanks. But it makes sense, since the clock speed also drops when the link speed is reduced. Maxing it out prevents the GPU from powering down and thus avoids NAKs. It's interesting, though, that you are getting NAKs every once in a while when the clock speed is not maxed. 8 NAKs in 8 hours does seem pretty low to me, though. I usually get many more when I have problems: typically 20-60 NAKs during one loading screen, plus about 40 more on the recovery counter, though only in DirectX12. In DirectX11, only the recovery counter increases during loading screens.
From some testing with my 4090, it seems that Recovery Count is tied to power management, as it increases every time the power state goes up or down, probably because the PCI-E link needs to be reset as it changes link speeds. It also seems that Bad TLP Count and NAKs Sent are tied together, as they always match, but I haven't been able to identify any events that coincide with them. Hopefully more documentation will be forthcoming.
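The observations in this thread (counters rising only while the memory clock is below its maximum, and Bad TLP Count always matching NAKs Sent) could be checked against logged sensor samples. The sketch below is one possible way to do that, not an HWiNFO feature: the dictionary keys are hypothetical and do not correspond to HWiNFO's actual log column names, and the sample values are invented purely to show the mechanics.

```python
from typing import Dict, List

Sample = Dict[str, int]


def split_deltas(samples: List[Sample], max_clock_mhz: int, counter: str) -> Dict[str, int]:
    """Sum the increments of `counter` between consecutive samples, grouped by
    whether the memory clock was at its maximum at the end of each interval."""
    totals = {"at_max": 0, "below_max": 0}
    for prev, cur in zip(samples, samples[1:]):
        delta = cur[counter] - prev[counter]
        key = "at_max" if cur["mem_clock_mhz"] >= max_clock_mhz else "below_max"
        totals[key] += delta
    return totals


def counters_match(samples: List[Sample], a: str, b: str) -> bool:
    """True if the two counters have the same value in every sample."""
    return all(s[a] == s[b] for s in samples)


# Invented sample values for illustration only (not measured data).
samples = [
    {"mem_clock_mhz": 1750, "recovery": 0, "naks_sent": 0, "bad_tlp": 0},
    {"mem_clock_mhz": 1750, "recovery": 0, "naks_sent": 0, "bad_tlp": 0},
    {"mem_clock_mhz": 405, "recovery": 3, "naks_sent": 0, "bad_tlp": 0},
    {"mem_clock_mhz": 810, "recovery": 7, "naks_sent": 1, "bad_tlp": 1},
    {"mem_clock_mhz": 1750, "recovery": 7, "naks_sent": 1, "bad_tlp": 1},
]

print(split_deltas(samples, 1750, "recovery"))
print(split_deltas(samples, 1750, "naks_sent"))
print(counters_match(samples, "naks_sent", "bad_tlp"))
```

Attributing each increment to the clock state at the end of the interval is a simplification; with a coarse polling interval, an increment caused by a transition could land in either group.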