Enabling ADL makes the AMD Fury crash

Szb84

Hi!

I just noticed that enabling ADL support shows additional information such as VRM temperature, voltage and VRM currents, but a few seconds later the card switches to a black screen and the fan runs at max speed.

Latest stable HWiNFO version and AMD Crimson 16.1.
The crash is almost instant and I can only recover from it with a manual shutdown.
 
Ah yes - I was expecting that ;)
These GPUs have serious issues when trying to use I2C (i.e. to access VRMs). Thus HWiNFO has intentionally disabled its own I2C methods.
I'm currently in contact with AMD about whether there is a way to get this working properly. Until then it's not recommended to use I2C on the Fiji GPUs.

EDIT: Sorry, I have only now noticed the topic name where you already said it's the Fury..
 
Ok, thanks.
Not sure how, but MSI AB and GPU-Z can read the GPU VDDC, though not the rest of the VRM-related information.
 
HWiNFO should read the GPU VDDC (core) too. But I don't think anyone is able to read anything via I2C from those GPUs.
 
I only get these without ADL:

[attachment=1784]

And here is a fast capture with ADL

[attachment=1785]

(there are more sensors, but they appear later and the crash happens almost instantly, so I can't take a screenshot)
 

Attachments: fiji.png, adl.png

Yes, it crashes because of the already mentioned problem with I2C.
I'm just wondering why VDDC is not shown, because that's not read via I2C. The HWiNFO Debug File might tell me more.
 
Attached the logs - one is with ADL and possibly incomplete due to the crash but the other one should be fine.
 

Attachments: log.rar

Thanks. I believe the VDDC reported by other tools is not a truly measured value, but rather an expected voltage based on the actual performance state.
AMD's own drivers don't seem to report the actual voltage; that might change later with new driver versions.
 
Please try the new Beta build 2781.
I'm not yet absolutely sure about the stability with this I2C access on Fiji.
Note that you might need to Reset Preferences in HWiNFO, because the GPU I2C Caching feature might otherwise prevent access to I2C devices deemed inaccessible in the past.
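
For readers wondering why a cache can hide a device: below is a toy sketch of a "negative cache" in Python. It is not HWiNFO's implementation; the class name, the `read_fn` callback and the address `0x30` are all hypothetical and only illustrate why a device that failed once stays hidden until the cache is reset.

```python
class I2CProbeCache:
    """Toy negative cache: remembers addresses that failed to respond."""

    def __init__(self):
        self._inaccessible = set()

    def probe(self, address, read_fn):
        # Skip devices that failed in the past, without touching the bus.
        if address in self._inaccessible:
            return None
        try:
            return read_fn(address)
        except OSError:
            # Remember the failure so later probes are skipped.
            self._inaccessible.add(address)
            return None

    def reset(self):
        # Rough analogue of resetting preferences: forget all past failures.
        self._inaccessible.clear()


calls = []

def flaky_read(address):
    """Hypothetical read that fails on the first attempt only."""
    calls.append(address)
    if len(calls) == 1:
        raise OSError("no response")
    return 0x42


cache = I2CProbeCache()
first = cache.probe(0x30, flaky_read)    # fails, address gets cached
second = cache.probe(0x30, flaky_read)   # skipped, read_fn is not called
cache.reset()
third = cache.probe(0x30, flaky_read)    # probed again after the reset
```

The point of the sketch is only that the second probe never reaches the device, so a device that has since become readable stays invisible until the cache is cleared.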
 
Just tried the latest version and it's stable, but there are some incorrect readings (negative or too high values for voltage, current and power). HWiNFO also starts a little slower the first time, but that's a minor issue.

Screenshot and debug file attached.
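
As an aside, anyone logging these values could drop the obviously bogus samples (negative or absurdly high) before graphing them. A minimal Python sketch follows; the limits are made-up plausibility windows for illustration, not specifications for the Fury, and the names `LIMITS` and `plausible` are hypothetical.

```python
from typing import Optional

# Hypothetical plausibility windows (assumptions, not Fiji specifications).
LIMITS = {
    "voltage_v": (0.0, 2.0),
    "current_a": (0.0, 400.0),
    "power_w": (0.0, 500.0),
}


def plausible(kind: str, value: float) -> Optional[float]:
    """Return the value if it lies inside its window, otherwise None."""
    low, high = LIMITS[kind]
    return value if low <= value <= high else None


samples = [("voltage_v", 1.2), ("voltage_v", -0.5), ("power_w", 5000.0)]
# Keep only the samples that pass the plausibility check.
kept = [(kind, v) for kind, v in samples if plausible(kind, v) is not None]
```

This only hides bad samples from a log; it does nothing about the underlying I2C read problem.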

Edit: After a few minutes the same crash (black screen and GPU fan at max) happened again, and the next time the "GPU [#0]: ATI/AMD Radeon R9 Fury: CHiL/IR PMBus - GPU Core" section had disappeared. After resetting the GPU I2C cache it is there again.
But it may not be a HWiNFO issue, because MSI AB was also running (I tried to change the GPU voltage and monitor it in HWiNFO), so it may be an incompatibility when both apps access the GPU, or I set the voltage too low.
I reverted to the latest 5.20 (non-beta) to see if it helps.

Edit2: It looks like it was an incompatibility with MSI AB. According to its developer this can happen with Fiji: http://forums.guru3d.com/showpost.php?p=5202569&postcount=2

"Please take a note that direct access to AMD SMC from multiple simultaneously running hardware monitoring applications can be unsafe and result in collisions, so similar to I2C access synchronization we introduce global namespace synchronization mutex “Access_ATI_SMC” as SMC access synchronization standard. Other developers are strongly suggested to use it during accessing AMD GPU SMC in order to provide collision free hardware monitoring"

The incorrect readings are still there, however.
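
For anyone curious what the synchronization convention in the quoted post could look like in practice, here is a rough Python sketch using the Win32 named-mutex calls via ctypes. Only the name "Access_ATI_SMC" comes from the quote; the `Global\` prefix is my reading of "global namespace", and the helper `smc_access`, its timeout, and the non-Windows fallback lock are assumptions added so the sketch runs anywhere.

```python
import contextlib
import ctypes
import sys
import threading

# Name from the quoted post; the "Global\" prefix is an assumption.
SMC_MUTEX_NAME = "Global\\Access_ATI_SMC"

WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED = 0x00000080

# Used only on non-Windows systems, where named mutexes do not exist.
_fallback_lock = threading.Lock()


@contextlib.contextmanager
def smc_access(timeout_ms=1000):
    """Hold the shared SMC mutex while the body of the with-block runs."""
    if sys.platform != "win32":
        if not _fallback_lock.acquire(timeout=timeout_ms / 1000):
            raise TimeoutError("SMC lock busy")
        try:
            yield
        finally:
            _fallback_lock.release()
        return

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateMutexW.restype = ctypes.c_void_p
    kernel32.CreateMutexW.argtypes = [
        ctypes.c_void_p, ctypes.c_int, ctypes.c_wchar_p]
    kernel32.WaitForSingleObject.restype = ctypes.c_uint32
    kernel32.WaitForSingleObject.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
    kernel32.ReleaseMutex.argtypes = [ctypes.c_void_p]
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

    # Opens the mutex if another tool already created it, else creates it.
    handle = kernel32.CreateMutexW(None, False, SMC_MUTEX_NAME)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        result = kernel32.WaitForSingleObject(handle, timeout_ms)
        if result not in (WAIT_OBJECT_0, WAIT_ABANDONED):
            raise TimeoutError("SMC mutex busy")
        try:
            yield  # the caller would do its SMC reads here
        finally:
            kernel32.ReleaseMutex(handle)
    finally:
        kernel32.CloseHandle(handle)


with smc_access(timeout_ms=100):
    pass  # placeholder for the actual hardware access
```

This only prevents collisions between tools that all take the same mutex; a tool that ignores the convention can still collide.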

Edit3: no, still crashes even without AB.
 

Attachments: HWiNFO64.DBG, amd.png

Well, it looks like the I2C protocol is still very unstable. Are you getting better results with ADL enabled or disabled? Also note that in the latest HWiNFO Beta I have changed some settings for ADL I2C, so when testing with ADL enabled it would be interesting to know which version works better for you.
Also, if possible, please attach a DBG file from a session that ended in the BSOD. The one you attached seems to be from when all was OK.
 
Ok, I will test the new beta to see if it crashes.

With ADL enabled in the new beta, performance is bad in 3D applications. I get stutters, and I can see the GPU load indicator LEDs on the card constantly jumping between half and maximum. Also, in Furmark the GPU VRM Power in/out should be above 200 W, while with ADL enabled it's only 50 W, so the card's performance is limited.

Under heavy load the readings are more accurate: I don't get negative or unrealistic values, while on the desktop the results go bad almost instantly.

I also noticed a minor issue: sometimes the HWiNFO sensors disappear and then reappear with all info.

Edit: Added a log with the crash. I was running a 3D benchmark when it happened.
 

Attachments: HWiNFO64.DBG.7z

Thanks for the additional information. I have a couple of additional questions:
1. So with the latest Beta, ADL I2C is not working well. Does that mean it worked better in v5.20? Was the performance or the accuracy better?
2. Are you getting fewer (or perhaps no) invalid results when using ADL, or without it?
3. Does a BSOD occur at any time, or is it more likely to occur under high GPU load?
 
1: It works better with 5.20+ADL: definitely fewer stutters, and the VRM power in shows 4-5x more, as it should.
2: The VRAM current and power show 0.0 right from the start, but other sensors show correct values for longer.
3: The first time it happened, the PC had been left idle on the desktop. I would say it's easier to reproduce with higher GPU load, but it will happen sooner or later anyway.
 
Thanks. In the latest Beta I'm accessing the GPU I2C via ADL at a lower speed, which should be more reliable. But as you see, it seems to be too slow, impacting the entire GPU performance.
So I'll set the ADL I2C speed back to the level used in v5.20.
2. Was that answer for ADL or non-ADL mode?
3. And are the BSODs more likely to happen in ADL or non-ADL mode? I'm afraid I probably can't do anything more about this. I'm now doing everything recommended by AMD to avoid these problems, but it still doesn't seem to be 100% reliable. Maybe a later AMD Crimson driver will work better...
 
2: It was for ADL mode with the latest Beta.
3: With ADL enabled I got the BSOD faster, but it may not be ADL related, just a coincidence.

Is there any way to disable some sensors and report only the VRM temperatures? That's the only sensor that concerns me (along with the voltage, but that shows incorrect values) and that I would like to monitor all the time. Maybe other sensors like current or power are causing the BSOD.
 
Sure, just right-click on the desired value and choose Disable Monitoring ;) Or with the latest Beta you can also hit the Del key.
Disabling most of those sensors should certainly reduce the probability of all the issues you see, but I don't think the BSOD is caused by a certain reading. I believe the cause is that I2C communication still collides with something else in the system, even though I now use the AMD-recommended synchronization mechanisms...
 