Sapphire A9RX480 HWiNFO crash

trodas

First of all, the Sapphire A9RX480 "Pure Innovation" aka "Grouper" mainboard is a VERY nice mobo indeed: http://postimg.org/image/bu2pcmy6j/full/
http://s1.postimg.org/a6d0h180f/Sapphire_A9_RX48_3.jpg

...and mind you, these are OLD pics with the BAD CAPS, which are (on top of that) ugly. The new caps look like this:
[Image: Nichicon_HZ_grouper.jpg]


That is my picture; it shows how the mobo looked AFTER the catastrophic bad-caps failure of the pre-production models. They fixed it with good-quality Nichicon caps (even the gold ones: true Nichicon HZ, which are the best electrolytic caps ever made!).

It is a Socket 939 mobo that is known to crash very badly with SpeedFan (unless you add the /NOSMBSCAN option after the executable), and it causes trouble even for CPU-Z: running CPU-Z is NOT recommended unless you change this option in the ini file from SMBus=1 to SMBus=0.
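The ini change described above is a one-line edit. As a minimal sketch in Python (assuming only that the option is stored as a plain `SMBus=1` line in CPU-Z's ini file; the function name `disable_smbus_scan` is hypothetical), it could be scripted like this, leaving every other line untouched:

```python
def disable_smbus_scan(ini_text: str) -> str:
    """Return ini_text with an "SMBus=1" line rewritten as "SMBus=0".

    Assumption: the option is stored as a plain key=value line; all other
    lines (section headers, DMI=, Sensor=, ...) are returned unchanged.
    """
    out = []
    for line in ini_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "smbus" and value.strip() == "1":
            # Keep the original line ending (\n or \r\n) when rewriting.
            ending = line[len(line.rstrip("\r\n")):]
            line = f"{key}=0{ending}"
        out.append(line)
    return "".join(out)
```

Read the file with `pathlib.Path(...).read_text()`, pass the text through the function and write it back; keeping a backup copy of the original ini first is a sensible precaution.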

So I was pretty confident that HWiNFO v4.22-1970 portable would crash, and it does on this mobo.

Unlike SpeedFan, it at least did not require resetting the BIOS settings to get back to a working state (it does require a cold start, though).

However, since I already knew that the SMBus is the root of the problem, I changed the HWiNFO preferences on my MSI PM8M3-V mobo: I reset them to defaults, then disabled SMBus support AND activated debug mode.

And this way HWiNFO works!

There is a longer delay on the nVidia sensors (dunno what is happening there: right after install the nVidia panel worked like it should, now after just a reboot or two I cannot get into it?! WTF!), but I can provide the values! I am also including an Everest report to help add support for this mobo.
 

Attachments

  • Sapphire A9RX480 Everest report.zip (30.1 KB)
  • Sapphire A9RX480 HWiNFO reports.zip (139 KB)
Thanks for the report. If you could re-enable SMBus, run HWiNFO until the crash and send the DBG file produced, I could have a look at whether there's something I can do about this problem... Maybe a certain device on the SMBus is causing trouble and I could make HWiNFO skip touching this device.
 
It is definitely a good idea not to touch it on this particular mobo, but I think you did not quite understand how hard the crash is. It is not like an application crash. The whole machine freezes, the screen freezes, everything. Not even the hardware reset works (!)...
(no kidding, this is a very serious crash)

This is not the kind of crash that allows HWiNFO to write the debug file. I ran HWiNFO in debug mode for a long time (2 MB file in the end, lol) and the file was 0 bytes long the whole time, until the exit.
Therefore I very much doubt that I will see anything other than a zero-byte file after that experiment.

I can tell you much more about which devices the scan passed and on which one it hung. There is a chance, but a debug file? I think forget it... but we'll see. And yes, after I chose to disable the nVidia tray icon, the options changed and I have to call the nVidia CPL, but it never comes up and it is also missing from the Control Panel, which is weird to say the least.
Forceware 162.65 for XP/W2k...
 
I understand well that it's a system (not application) crash. I wouldn't consider it very serious, since such crashes occur quite often and for tons of reasons (faulty drivers, unstable kernel, etc.). HWiNFO writes the debug file straight to disk at run time. You can't see the actual size of that file while HWiNFO is active, because the file is locked.
So even after a serious total system crash, the debug file is written to disk and can be retrieved after a new boot. I can then analyze it to see the last operation HWiNFO performed and take the required action.
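The idea of a log that survives a total system freeze can be illustrated with a small write-through logger. This is a sketch under assumptions, not HWiNFO's actual implementation: each record is flushed from Python's buffer and then `os.fsync()` asks the OS to commit it to the device, so whatever was logged before the freeze has a chance to be on disk after the next boot. The class name, helper name and record text are hypothetical.

```python
import os
from typing import Optional


class WriteThroughLog:
    """Append-only log pushed to disk after every record (hypothetical)."""

    def __init__(self, path: str):
        self._f = open(path, "a", encoding="utf-8")

    def write(self, message: str) -> None:
        self._f.write(message + "\n")
        self._f.flush()             # Python buffer -> OS
        os.fsync(self._f.fileno())  # OS cache -> storage device

    def close(self) -> None:
        self._f.close()


def last_record(path: str) -> Optional[str]:
    """Return the last non-empty line, i.e. the last step that was logged."""
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return lines[-1] if lines else None
```

After a crash, `last_record()` on the recovered file gives the last step that was logged, which is the kind of information used to pinpoint where a scan hung.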
 
Well, I have a different opinion on a crash that freezes the machine so hard that not even reset works. I call that a catastrophic failure.
A normal crash is when an app fails. Fine, no problemo.
When a BSOD happens, well, that is a serious crash. But still: reset and you are good to go, just don't do what you did before, lol.
But when not even reset works (or, in the extreme case, when absolutely NOTHING works until a BIOS settings reset), I become concerned and don't exactly want to repeat it...

Oh well, since you insist... I'll try to crash my poor mobo gently with the latest beta and debug mode... even though I'm convinced I'll see a 0-byte debug file. We'll see...

I think knowing the last activity that froze HWiNFO (and the machine in general) will be far more interesting and helpful.

...testing...

So, HWiNFO froze after the main window was shown, while the little information "window" showed "Detecting sensors..." in progress, in particular "THMC/FMS/NE #0"... The mouse stops moving and Ctrl+Alt+Del doesn't react, so only the HW reset button is left. And then the computer does not even start. That is why I call this a catastrophic failure: having to power off the machine at the PSU is IMHO overkill for a crash. Too harsh on the poor machine.

After power-up it at least starts without a BIOS reset (phew), and the debug file... hell yes! It does exist and it is 184 KB! Wow. I did not expect that. So let's at least hope this will be good for something and that the next crash of another alpha version will not be as catastrophic as this one was.
 

Attachments

  • HWiNFO32 debug Sapphire A9RX480.zip (22.6 KB)
It seems there's a device on the SMBus which doesn't like any access to it and crashes the system; it might be a not fully compliant SMBus device.
Since HWiNFO performs only read accesses via SMBus, I believe no device should cause such problems, so it's a poorly designed one.
There are a few mainboards I've seen so far which cause such problems (e.g. many DFI boards don't like such accesses either).
This time it crashed at address 0x2C, so you might use the "SMBus Device Exclusion" list to disable scanning of particular devices (by selecting them). You might try excluding the 0x2C-0x2F devices, but it's possible that other addresses cause such an issue too, and then it would crash at a different address.
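An exclusion list like the "SMBus Device Exclusion" option boils down to skipping certain addresses before any transaction is issued. A hypothetical Python sketch: `probe` stands in for a real SMBus read (not shown, and platform-specific), and the scan walks the 7-bit address space while never touching excluded addresses:

```python
from typing import Callable, Iterable, List, Set


def scan_smbus(probe: Callable[[int], bool],
               excluded: Iterable[int],
               first: int = 0x00,
               last: int = 0x7F) -> List[int]:
    """Return the 7-bit addresses that answered, never touching excluded ones.

    `probe` is a hypothetical callback that performs one read at an address
    and returns True if a device acknowledged it.
    """
    skip: Set[int] = set(excluded)
    found = []
    for addr in range(first, last + 1):
        if addr in skip:
            continue  # excluded device: do not issue any transaction at all
        if probe(addr):
            found.append(addr)
    return found
```

`scan_smbus(probe, range(0x2C, 0x30))` corresponds to excluding 0x2C-0x2F; note that Python's `range` end is exclusive.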
 
No argument that this should not happen in the first place :sleepy: But evidently no one beta-tested the mainboard well (just run SpeedFan on it, what more was needed?!) before it went into production, so the bug made it there...

Glad to hear that my mobo is not alone in this situation. Though I believe the severity of the crash is unparalleled. Or have you ever heard of a mobo that crashes so hard that the BIOS settings must be cleared?
I haven't, and I have been working with the damn computers since 1991, so... I don't want to brag about it, but this is something I have never seen before.

Anyway, back to the UK! (Oooops :) )
Here is a BIOS screenshot:

[Image: Sapphire_A9_RX480_bios_temps.jpg]


(no, there is really NOTHING on the mobo that is at 63°C, no no no)

And here is HWiNFO running on the Sapphire A9RX480 with the SMBus completely disabled:

[Image: HWiNFO sensor screen on the Sapphire A9RX480]


And now it's time to check what happens when I re-enable the SMBus with 0x2C-0x2F excluded.
BTW, by default the SMBus exclusion list already has 0x00-0x0F and 0x69 enabled. Why is that? Do more mainboards not like being touched there, or...?


Wow, it worked right away! :exclamation:

So, all that is needed to keep the Sapphire A9RX480 mobo from crashing is to disable SMBus access to the 0x2C-0x2F devices :cool:

Of course it is quite possible that disabling access to only, say, 0x2C-0x2E would be enough, but is that even worth testing? I doubt anything useful can be gained there, because I see more than enough values on the screen, lol.
They are just wrong in many cases, but after some tweaking I bet we'll get to a more usable state. But mainly, your program can become crash-proof on the Sapphire A9RX480 mobo (not that many guys have it, but at least it will never crash anymore, because if you detect this mobo, then God forbid you access these SMBus devices, lol).

And here is a debug file from the Sapphire A9RX480 mobo. Oh, wait, do you want the report too, now that the SMBus is enabled?


PS. You are obviously very experienced, because your tip worked right away, and you had no way of knowing whether another device would cause the same crash (which I would understand, and crash by crash we could get to a working config), but this was kinda fast, lol ;) It is likely that not many devices are that sensitive. So now I can report to the authors of CPU-Z that all that is needed on the Sapphire A9RX480 mobo is to not touch SMBus devices 0x2C-0x2F, which prevents both the crash and the need to change the ini file in order to run CPU-Z on this particular mainboard :huh:

PS2. This is how the testing config looks ATM:
 

Attachments

  • HWiNFO debug Sapphire A9RX480 SMBus on.zip (44.6 KB)
HWiNFO already has a list of mainboards causing such problems, so it avoids accessing their problematic devices. Now I'll add your model there too and skip those addresses, so that no other user will need to change any setting (they will be skipped automatically, internally).
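A per-board list like the one described here can be modeled as a lookup table merged with the default and user exclusions. In this sketch the board key, table name and helper names are hypothetical (real board detection would presumably use the BIOS/DMI strings); the default 0x00-0x0F and 0x69 entries are the ones reported earlier in this thread:

```python
from typing import Dict, FrozenSet, Iterable, Set


def addr_range(lo: int, hi: int) -> FrozenSet[int]:
    """Inclusive SMBus address range, e.g. addr_range(0x2C, 0x2F)."""
    return frozenset(range(lo, hi + 1))


# Default exclusions as observed in this thread.
DEFAULT_EXCLUDED: FrozenSet[int] = addr_range(0x00, 0x0F) | {0x69}

# Hypothetical quirk table: board identifier -> addresses never to touch.
BOARD_QUIRKS: Dict[str, FrozenSet[int]] = {
    "Sapphire A9RX480": addr_range(0x2C, 0x2F),
}


def excluded_addresses(board: str, user_excluded: Iterable[int] = ()) -> Set[int]:
    """Defaults + built-in board quirks + whatever the user selected."""
    return (set(DEFAULT_EXCLUDED)
            | BOARD_QUIRKS.get(board, frozenset())
            | set(user_excluded))
```

The point of the built-in table is that the user-selected part can stay empty: a known-bad board gets its dangerous addresses skipped without anyone touching the settings.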
I also see from the BIOS screenshot that some of the sensor values will need to be adjusted. I'll do this and release it in the next build.

EDIT: I'm not sure about Temperature 2, but I think it might be the CPU core temperature. You should be able to determine this by putting some load on the CPU and watching how the values change. It's quite common for the CPU temperature to vary a lot between the BIOS and the operating system.
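The load test suggested above amounts to comparing each sensor's idle and load readings and looking at which one rises the most. A minimal sketch, assuming the readings have been copied by hand or exported into two dictionaries keyed by sensor label (the function name is hypothetical):

```python
from typing import Dict, List, Tuple


def rank_by_rise(idle: Dict[str, float],
                 load: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort sensors by temperature rise from idle to load, largest first."""
    common = idle.keys() & load.keys()  # ignore sensors missing in either snapshot
    rises = [(name, load[name] - idle[name]) for name in common]
    return sorted(rises, key=lambda item: item[1], reverse=True)
```

A sensor that barely moves under a Prime95 load is unlikely to be the CPU core; the one with the largest rise is the best candidate.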
 
Sounds promising! :) As far as the BIOS goes, the mobo shows many interesting voltages: the NB 1.2V voltage, then a 1.8V one that is used for God knows what, etc.

Temperature 2 sure as hell cannot be the CPU. First of all, 62°C is too high. Second, while you are right that BIOS and Winblows temps do differ, they are usually higher in the BIOS, because there the CPU acts as if it is fully loaded, while in Winblows it mostly just rests. And I have a nice Zalman fan, as you can see. 62°C is way out of this world.

Including a report, and I'll try the Prime torture test and we'll see... together with Everest or even SpeedFan, so we can compare the temps :) Because IIRC SpeedFan worked well with /NOSMBSCAN :p

Testing... The "chassis" fan (in fact the CPU fan, and the ONLY fan currently on the mobo) is steady at around 1600 rpm, yet in the "Current" column alone it sometimes shows 0 rpm (no, the fan did not stop spinning, lol; the Zalman FanMate just keeps it quiet and running without any rpm changes). Interesting :) It could also be related to the problems with adjusting my table, because I disabled all columns except the "Current" values, and this one is causing the problems, lol.



Wow, it failed :s Damn, I have to check out WTF is going on there! This was a stable machine, and it is not even overclocked (yet)... damn. But you can see the temps and the rise, right?

IMHO External 1 is the CPU core temp (Everest got it right), Temperature 1 (Aux in Everest) is the CPU VRM temp, and Motherboard is the mobo sensor (Everest got it right again).
What it gets dead wrong is the Vcore voltage, same as HWiNFO: just swap Vcore with the Vccp2 voltage (that is the 1.8V one for God knows what) and all will be fine. The + voltages (3.3, 5 and 12V) seem good, but the - voltages are missing (although they are in the BIOS), VIN5 is probably Vdimm, and VIN6 is likely the Radeon Xpress 200 or HT link voltage (one of these voltages is missing, though). All these voltages can easily be identified by changing them in the BIOS and checking with HWiNFO again.
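The "change it in the BIOS and look again" method can be written down as a small comparison: take a snapshot of the voltage readings, change one voltage in the BIOS by a known amount, take a second snapshot and see which channel moved by roughly that amount. A hypothetical sketch (names and the default tolerance are assumptions); the tolerance is there because sensor readings rarely match the BIOS setting exactly:

```python
from typing import Dict, Optional


def match_changed_channel(before: Dict[str, float],
                          after: Dict[str, float],
                          bios_delta: float,
                          tolerance: float = 0.05) -> Optional[str]:
    """Return the channel whose reading changed closest to bios_delta volts.

    Returns None if no channel moved within `tolerance` of the expected change.
    """
    best, best_err = None, tolerance
    for name in before.keys() & after.keys():
        err = abs((after[name] - before[name]) - bios_delta)
        if err <= best_err:
            best, best_err = name, err
    return best
```

Raising one rail at a time keeps the result unambiguous; if two channels move together, the snapshot pair does not identify either of them.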

The 5V standby reading is way off and is not even in the BIOS, and nothing looks like the battery voltage of 3.05V.

Now, why did the damned suxxking thing fail the test...


PS. Tested: VIN6 is the Radeon Xpress 200 voltage. I upped it to 1.4V in the BIOS and it now shows 1.376V, so I got this suxxka!

PS2. Looks like SpeedFan (/NOSMBSCAN or deadly crash) gets the voltages reasonably right:


- it got the battery voltage, lol! :D
 

Attachments

  • Sapphire A9RX480 report.zip (19.5 KB)
Thanks for the new tests. OK, I'll adjust the temperature labels as you suggest.
Don't worry about the voltages, I have already figured them out - based on the BIOS screenshot it's pretty clear ;)
 
Sounds good. At least I now know that the 1.8V is for the PCIe slots, hmmm.
Interesting. Didn't AGP use something like 1.5V? The more I pushed the AGP voltage on the MSI PM8M3-V mobo, the less stable it was, lol.

A new version with support for this Sapphire A9RX480 mobo will be great; just don't rush it, better to polish off some bugs first, heh.
 
Hooray! The new build HWiNFO v4.23-1990 fixed the crash!

...and surprisingly, many voltages are now recognized correctly, great! So this thread could be marked as "solved", because the problems with the Sapphire A9RX480 mainboard are gone, hopefully for good.

There are still some minor interesting quirks, like Fan5 that suddenly gets added and shows 2600 rpm, while I have no fan except the chassis one used for the CPU, but what the hell... :)
Most importantly, HWiNFO now works, and works well, on the Sapphire A9RX480!

Thank you very much, Martin!
 

Attachments

  • Sapphire A9RX480 report NEW.zip (16.1 KB)
I just discovered that for CPU-Z the SMBus=0 change in the ini file is enough; the previously mentioned DMI=0 and Sensor=0 settings are not required at all. SMBus alone is enough.
So I just updated the first post accordingly.
I also reported your findings, Martin, to the CPU-Z authors. CPU-Z (like Everest) wrongly claims that my Vcore is 1.780V, while it is 1.400V (or a bit lower, to be precise); that should be fixed as well. Is there any "guide" or recommendation that could be given to the CPU-Z authors, so they can detect the Vcore as well as HWiNFO does?

As you can see here: http://valid.canardpc.com/show_oc.php?id=148241
There IS a reason why CPU-Z did not show any voltage before, lol. Now it shows one, but it obviously picked up the motherboard's 1.8V voltage. A fix would be great, which is why, if you can post some info here on reading the Vcore correctly, it might be very helpful :) I pointed them to this thread, so...
 
I'll tell it to Franck (author of CPU-Z) if he asks ;) We're exchanging a lot of information (the same with Tamas/AIDA64) ;)
 
Sounds great, thanks! :)

I already complained about it here: http://www.cpuid.com/contact.html
But no reply yet. I also noticed that the new version does not show the FSB on the JetWay V266B mainboard...


Oh, now I got a reply from Franck too. Not about the PI-A9RX480 mobo :( but about my reports that the new CPU-Z cannot show the FSB on old VIA chipsets:
KT266A:
http://valid.canardpc.com/wdmstd
http://valid.canardpc.com/5cdw60

KT133:
http://valid.canardpc.com/vblm4v

...so maybe the Sapphire PI-A9RX480 mobo issue will get fixed along with this? :) Dunno, we'll see. Older CPU-Z versions had no problemo detecting the FSB on these oldie systems:
http://img127.imageshack.us/img127/1087/duron700302minif1.gif
...but the modern one seems to have a problem. Not to mention that it freezes on the Soltek SL-KT600-R (VIA KT600) mobo on startup. No idea why; the mobo runs relatively stable, but truth be told, the Chemicon KZG caps are bulging and it cannot pass a Prime95 test right now, so I would not consider it stable. Still, CPU-Z startup should not be so demanding that it freezes the mobo... It freezes at "Processors" right on startup; the CPU used is a Duron 1600 :D
 
The new CPU-Z beta still crashes the Sapphire PI-A9RX480 badly (to the point of BIOS clearing!!!) :(
And it still freezes on the Soltek SL-KT600-R (VIA KT600) mobo on startup...

...but it does report the FSB on the KT133 MSI 6340 mobo... and it will probably report the FSB on the JetWay V266B one too; that is still to be tested...
 
New CPU-Z 1.67.0 does not crash on the Sapphire PI-A9RX480 mobo - hoooray! - and it does report Vcore voltage accurately - thanks to you, Martin!

Good work!
http://valid.canardpc.com/lql2fb

Nothing earth-shaking, but it works, and that is what counts :) Overclocking comes later... :)
 