MSI Big Bang x79 hang, maybe SATA related

ssateneth

Well-Known Member
I recently got my hands on the MSI Big Bang XPower II (Intel X79) and have run into some frustrating behavior whenever I open HWiNFO64.

Often when opening HWiNFO64 (1550 and 1565), it will either hang while scanning various sensors, or it'll continue (sort of) like normal. But the hang isn't the killer. What kills it is that shortly after opening HWiNFO64 (10-30 seconds), my system starts losing responsiveness and eventually locks up completely, forcing me to hard reset my PC. After restarting, either my Blu-ray drive has disappeared, which requires a long power-off and unplugging my power supply to fix, or I get an MBR Error 1, requiring me to swap SATA ports on the SSD my OS is installed on.

Before I subject myself to this scary lockup again, is there anything I can do to prevent scanning SATA devices (or SMART, if that's what is causing it)? Or are there other possible solutions before I have to resort to enabling debug mode to submit a log, causing another system hang and lockup?
 
I suspect this might be caused by accessing a certain SMBus device, but I cannot be sure until I get more information.
If you first want to avoid a potential lockup, I'd advise disabling "SMBus Support" to see if that helps. However, in this mode you'll lose the DIMM and certain sensor information displayed in HWiNFO.
Then, if you decide to go further and find a better workaround, I would need the HWiNFO Debug File from a run in which the machine locks up.
 
Ok, so I ran HWiNFO64, disabled SMBus, and enabled debug before starting the device scan it normally does at startup. I tried starting it 3 times. All 3 got hung on "detecting pci devices". I ended the task prematurely on the first two, glanced over the debug file, and noted that the PCI scan numbers seemed to be going up, possibly to 255, so on the 3rd run I let it run its course and waited for disk activity from hwinfo64 to cease. After that, I couldn't end task on hwinfo64.exe in my task manager. It wouldn't even reboot (stuck at the shutting down screen), so I had to hard reset. My Blu-ray drive has disappeared now, though I don't need it for now.
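For context on why a counter in the log could climb toward 255: the PCI address space allows up to 256 buses, each with 32 devices of 8 functions, so an exhaustive scan touches up to 65536 slots. The sketch below only illustrates that address space and the legacy "configuration mechanism #1" address encoding. It is not taken from HWiNFO, the function names are made up for illustration, and a real scanner would normally skip functions 1-7 on single-function devices.

```python
# Illustrative sketch only; names are hypothetical and not from HWiNFO.
MAX_BUSES = 256      # bus numbers 0..255
MAX_DEVICES = 32     # device numbers 0..31 per bus
MAX_FUNCTIONS = 8    # function numbers 0..7 per device


def pci_config_address(bus, device, function, register=0):
    """Build the 32-bit address used by legacy PCI configuration mechanism #1."""
    if not 0 <= bus < MAX_BUSES:
        raise ValueError("bus out of range")
    if not 0 <= device < MAX_DEVICES:
        raise ValueError("device out of range")
    if not 0 <= function < MAX_FUNCTIONS:
        raise ValueError("function out of range")
    if not 0 <= register < 256:
        raise ValueError("register out of range")
    # Bit 31 is the enable bit; the register offset is dword-aligned.
    return (0x80000000 | (bus << 16) | (device << 11)
            | (function << 8) | (register & 0xFC))


def all_pci_slots():
    """Yield every (bus, device, function) triple an exhaustive scan would probe."""
    for bus in range(MAX_BUSES):
        for device in range(MAX_DEVICES):
            for function in range(MAX_FUNCTIONS):
                yield bus, device, function
```

If each probe is slow or stalls in a driver, the 65536 possible slots add up quickly, which would be consistent with a scan that appears hung while the bus number keeps increasing.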

Here's debug logs from attempt 1 (closed early) and 3 (let it run til disk activity stopped) respectively.

http://dl.dropbox.com/u/9768004/HWiNFO64 - Copy.DBG
http://dl.dropbox.com/u/9768004/HWiNFO64.DBG

For what it's worth, I also tried AIDA64, and that program caused the same behavior: it seemed to run successfully, but 10-30 seconds after running, my PC started losing responsiveness until no window would react to my input, and I had to force a hard reset.
 
Thanks for the report and I'm sorry about the trouble it has caused.
This is very strange behavior which I don't remember having seen before. There seems to be more than one issue: one with the PCI bus scan, and there does indeed seem to be a problem with enumerating ATA drives.

I have a few suggestions if you want to run further tests.
1. You can re-enable "SMBus Support", because it seems it's not causing this problem.
2. You might try enabling the "Low-level PCI Access" option to avoid the hang during the PCI scan.
3. Try switching IDE Drive Scan from "Safe Mode" to "Low-level IO Access". HWiNFO would then no longer send specific queries to the drive controller drivers to read their parameters.

You might also want to try upgrading the storage drivers to see if that improves anything.
 
Unfortunately, there are no updated storage controller drivers as far as I know. I've checked multiple times, and there does seem to be a newer revision of RSTe, but it's only for RAID mode. The only thing that struck me as weird was that the SATA controller is called C600, as if it were a server chipset, but apparently it's -supposed- to use the server-type driver.

Anyways, I'll turn on SMBus, and switch those two to low-level access and see what happens, will submit debug log when it finishes.

edit: http://dl.dropbox.com/u/9768004/HWiNFO64_lowlevel.DBG

Same hang in scanning PCI bus, but it eventually finished. When the main window came up, there was -some- hanging, but it eventually cleared up. I was also able to cleanly close hwinfo64. Once again, though, it killed my bluray drive. My SSD seems unaffected. Also tried viewing (S)ATA devices in the hwinfo window, there wasn't anything listed.

I remember seeing something that installing the intel sata drivers prevented them from being replaced. If I installed the wrong ones, maybe that'll cause problems? Not sure. I might be due for another format (yuck) in hopes of installing all updated stuff right off the bat, or maybe the Intel RSTe drivers just plain suck. Who knows.

edit: http://dl.dropbox.com/u/9768004/HWiNFO64-1575.DBG
Here's a DBG from 1575 beta. Normal options (smbus enabled, no low levels selected, just the normal things). Same PCI scanning hang. Unplugged my bluray since it seemed to be causing problems. I also did some googling and people are reporting that SMART on C600 isn't able to be accessed (yet). Would this be a problem?
 
Well, it's hard to say, since this problem is quite hard to track down. I would need physical access to such a machine and a lot of time to determine this.
But so far I think there's something wrong with the SATA drivers. It's not OK if an application queries a driver for information and that driver causes such problems...
 
The guys on the Mushkin forums believe the SATA drivers are at fault too and suggested using the MS AHCI drivers instead. Not sure how I could go about that, though, since during setup the RSTe drivers said that installing them was non-reversible. Will edit if I somehow manage to roll back.

Edit: Drivers lied, I was able to "uninstall" the drivers from my system. Win7 automatically installed "Standard AHCI 1.0 Serial ATA Controller" 6.1.7601.17514 (date 6/21/2006) and now there are no more hangs, albeit a tiny one when scanning memory configuration (maybe 5 seconds, probably because of 8x8GB modules). I can also view SMART from my SSD and my 2TB seagate mechanical drive. Here's the debug, if it's any help.
http://dl.dropbox.com/u/9768004/HWiNFO64-msahci.DBG
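For anyone wanting to confirm which storage driver is installed before and after a swap like this, here is a rough sketch that parses the CSV output of the Windows `driverquery /fo csv` command. The column name "Module Name" assumes an English-locale system, the `msahci` and `iastor` name prefixes are assumptions about how the Microsoft and Intel modules are named, and the sample rows are made up for illustration.

```python
import csv
import io
import subprocess


def run_driverquery():
    """Return the CSV text printed by `driverquery /fo csv` (Windows only)."""
    result = subprocess.run(
        ["driverquery", "/fo", "csv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def installed_storage_modules(csv_text, prefixes=("msahci", "iastor")):
    """List module names in the CSV that start with one of `prefixes`.

    The prefixes are assumptions, matched case-insensitively.
    """
    wanted = tuple(p.lower() for p in prefixes)
    reader = csv.DictReader(io.StringIO(csv_text))
    found = []
    for row in reader:
        name = (row.get("Module Name") or "").strip()
        if name.lower().startswith(wanted):
            found.append(name)
    return found


# Made-up sample rows, shaped like driverquery's CSV output.
SAMPLE = (
    '"Module Name","Display Name","Driver Type","Link Date"\n'
    '"msahci","msahci","Kernel ","11/20/2010 10:33:58 AM"\n'
    '"disk","Disk Driver","Kernel ","7/14/2009 12:19:57 AM"\n'
)
```

On a Windows machine, `installed_storage_modules(run_driverquery())` would list the matching modules. Note that this shows what is installed, not which driver is actually bound to the controller; Device Manager remains the authoritative place to check that.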
 
Thanks for the feedback. I'm glad that you solved the problem and it wasn't a HWiNFO issue ;)
Yes, the memory scanning on these CPUs might take a bit longer, I'll try to tweak it a bit to be faster.

Btw, I suppose your sensor values in HWiNFO might also require some tweaking, since I'm not sure if HWiNFO supports this mainboard's sensors properly. Could you please attach a Report and Debug File including sensor data (open the sensors window before creating the report and closing HWiNFO)? Also, if there's a tool from MSI (or the BIOS) that shows correct sensor values, please attach a screenshot of that too.
 
I obviously know how to attach a debug, but I'm not familiar with a "report". I hope I did it right though.
http://dl.dropbox.com/u/9768004/HWiNFO64_sensors.DBG
http://dl.dropbox.com/u/9768004/reporttest.HTM
http://dl.dropbox.com/u/9768004/testhwinfo.png

FWIW, I don't have any fans attached at all except one fan on my sysfan3 header, and its real RPM was somewhere around 1900 at the time of the screenshot, but the Control Center doesn't seem to agree. I had to check the fan by sticking my fingers in the blades to make sure it wasn't failing :P. Increasing it to about 2700 RPM shows about 620 RPM in HWiNFO and Control Center, so it's probably a bug in the BIOS. Using fan header 1 shows the correct RPM.

I believe the CC voltages are correct, though.

Also, memory scanning is shortened to 1 second now. Sensor scanning seems to be exhaustive for now, taking much longer than normal, but there are no hangs, and beta is beta. I'm sure you're doing extra scans to get the right info.
 
Thanks for the data, that's what I needed.
However when I check the Control Center values, I think it might not be correct for other values as well. For example the System Agent voltage seems pretty high (higher than it should be). So I'm not sure how to properly adjust the values in HWiNFO.
Would it be possible to attach a photo of BIOS System Health screen as well? Maybe it displays different (and correct) values.
 
I left SA on auto in my BIOS. As I increased my multiplier, my SA and core volts went up too (about 1.34 with a 46x multiplier). I manually adjusted core volts to be stable at 47x, but left SA on auto. The SA you see is correct.

http://dl.dropbox.com/u/9768004/SA.png

Would you suggest manually changing SA? If so, to what? As I understand, SA = memory controller. Also, as I said before, I have 64GB (8x8GB) of RAM @ 10-10-10-30 2T
 
Martin said:
Indeed, it seems the SA voltage is correct. So I'll make HWiNFO display it the same.

EDIT:
Please check this build (sensors should show values consistent with the MSI Control Center):
www.hwinfo.com/beta/hw64_394_1579.zip

DDR CH_C/D seem to be off. I'm also not seeing an SA voltage in HWiNFO64. Lastly (not a bug), any idea where the Auxiliary temperature sensor might be? It's showing 77C for me. There's not much to compare it with. AIDA64 has even less sensor info, though it does show an SA voltage (even though it's wrong by at least half a volt).

http://dl.dropbox.com/u/9768004/HWiNFO64-1579.DBG
http://dl.dropbox.com/u/9768004/1579.png
http://dl.dropbox.com/u/9768004/repor1579.HTM
 
Thanks for the feedback.
VCCSA is the System Agent voltage in HWiNFO.
I don't know what the Aux temperature is. It might be an invalid value as well, but I think only the manufacturer knows to which input it is connected.
I think the DRAM CH voltages should be read from the Fintek chip instead, so I performed additional changes. Please try this build and let me know:
www.hwinfo.com/beta/hw64_394_1580.zip
 
The Auxiliary temperature definitely reflects something real, as its changes seem to track my room temperature. It doesn't seem to change under heavy CPU load, though. Quite possibly the sensor sits at a hotspot with little to cool it aside from passive air. Maybe the chipset? There are no fans on the Big Bang, not even on the chipset.

DRAM channels now agree with MSI Control Center. Looks like there are only 2 mystery voltages left (VIN2 and VIN3).

Any other feedback you'd like?
 
Yes, the Aux temperature might be the chipset, or something else... But it's hard to know for sure without information from the mainboard vendor. The same applies to the remaining VIN voltages.
Thank you for the feedback, I think this is all I can do for now.
 