Missing DIMM temp data

duanet

Member
Hi:

We have two identical systems; one shows the DRAM DIMM temperature data, but the other does not. I'm attaching the Debug File and log files from the system that's not reporting the temperature data.

I read in other posts that a driver might be blocking the SMBus data. Isn't the DIMM temperature data coming from the SPD chip via I2C?

Thanks,
Duane
 


Martin

HWiNFO Author
Staff member
SMBus is a protocol layered on top of I2C, and DIMM temperature comes from a dedicated TSOD (Temperature Sensor On DIMM) on the module, if present. Do both machines contain the same memory modules?
However, the attached Debug File doesn't contain sensor data. Please make sure to also open the sensors window before closing HWiNFO, then attach the new Debug File.
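For the curious, here is roughly what reading a TSOD involves. Per the JEDEC JC-42.4 thermal sensor specification, the temperature sits in register 0x05 as a 16-bit value: bits 15 to 13 are alarm flags (critical, above alarm window, below alarm window) and bits 12 to 0 hold a two's-complement temperature with 1/16 °C resolution. The sketch below assumes the word was fetched with an SMBus read-word command, which returns the low byte first while the TSOD sends the MSB first, hence the byte swap. The function names are illustrative only; this is not how HWiNFO implements it.

```python
TSOD_TEMP_REG = 0x05  # ambient temperature register (JC-42.4)


def swap_word(word: int) -> int:
    """SMBus read-word returns the low byte first; the TSOD sends MSB first."""
    return ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)


def decode_tsod_temp(raw: int) -> float:
    """Convert an MSB-first temperature register value to degrees Celsius."""
    value = raw & 0x1FFF      # bits 12..0: 13-bit two's complement
    if value & 0x1000:        # bit 12 is the sign bit
        value -= 0x2000
    return value / 16.0       # LSB = 0.0625 degC


def tsod_flags(raw: int) -> dict:
    """Extract the alarm flags stored in bits 15..13."""
    return {
        "critical": bool(raw & 0x8000),
        "above_window": bool(raw & 0x4000),
        "below_window": bool(raw & 0x2000),
    }


# Example: the TSOD sends 0x01 then 0x94; read-word hands back 0x9401.
raw = swap_word(0x9401)
print(f"{decode_tsod_temp(raw):.2f} C")
```

The flag bits are masked off before decoding, so a module that is over its alarm limit still yields a sensible temperature.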
 

duanet

Member
Thanks for getting back to me so quickly. Yes, we have the same modules in both systems.

I'm attaching the new debug file with the sensor data.
 


Martin

HWiNFO Author
Staff member
Thanks. I think I know where the problem is, but need to confirm my assumption. Can you please also attach a similar Debug File for the machine where DIMM temperature is shown?
 

duanet

Member
OK. Here's the Debug File from the system that reports the DIMM temperatures.

It's interesting that I see the opposite behavior in IPMI. There is no DIMM info in the "good" system, but there is DIMM info in the "bad" system.
 


Martin

HWiNFO Author
Staff member
OK, so the situation is as follows. On such systems (Skylake Server), the memory module EEPROMs (SPD) and DIMM thermal sensors (TSOD) are connected directly to the CPU via a dedicated SMBus.
This allows the CPU to continuously monitor memory module temperatures, which enables CLTT (Closed Loop Thermal Throttling).
When this mode is activated in the BIOS, the CPU/PCU takes ownership of the SMBus and periodically queries DIMM temperatures. This is a nice feature for power management, but the downside is that no other application can access the SMBus to retrieve SPD or TSOD data. This is what you see on the first machine, and you will also notice that memory module information is missing there.
So I believe the difference between the two machines is a BIOS setting called CLTT (or something similar related to memory thermal management).
Fortunately, this is not a showstopper. Even with CLTT activated, you should be able to see memory module temperatures in HWiNFO shown as "Memory Controller X Channel N Rank Max"; this is the same value as read from the TSOD.
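To picture what sits on that dedicated per-CPU SMBus: by JEDEC convention, SPD EEPROMs answer at 7-bit addresses 0x50 to 0x57 and TSODs at 0x18 to 0x1F, with the low three bits set by each slot's SA2..SA0 strapping pins. The sketch below assumes the board straps slot n to offset n, which is common but board-specific; `dimm_addresses` is a hypothetical helper, not part of HWiNFO or any library.

```python
SPD_BASE = 0x50   # SPD EEPROM device type 1010b -> 0x50..0x57
TSOD_BASE = 0x18  # TSOD device type 0011b -> 0x18..0x1F


def dimm_addresses(slot: int) -> tuple:
    """Return (spd_addr, tsod_addr) for a slot on one SMBus segment.

    Assumption: the board straps SA2..SA0 of slot n to the value n.
    """
    if not 0 <= slot <= 7:
        raise ValueError("SA2..SA0 allow only 8 devices per type on one segment")
    return SPD_BASE | slot, TSOD_BASE | slot


for slot in range(3):
    spd, tsod = dimm_addresses(slot)
    print(f"slot {slot}: SPD 0x{spd:02X}, TSOD 0x{tsod:02X}")
```

With CLTT enabled, the PCU is the one polling these TSOD addresses, which is why a software tool trying the same addresses comes up empty.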
 

duanet

Member
That's amazing, Martin! Thank you very much!

You're right about the temperatures showing up in "Memory Controller X Channel N Rank Max". Why does HWiNFO64 report them differently? Is it because CLTT takes over the SMBus? I would think the TSOD data would be available in either case. Also, there are only 2 channels/3 ranks reported for each CPU.
 

Martin

HWiNFO Author
Staff member
Yes, CLTT takes over the SMBus, and in that case HWiNFO can no longer access the SPD or TSOD.
 

Martin

HWiNFO Author
Staff member
Because that's an internal CPU register where the CPU/PCU stores the data read via CLTT.
 

duanet

Member
Hi Martin:

What’s different between the two systems? Why can one read the SMBus and the other can’t? Different CPU revisions or motherboard revisions?

Thanks
Duane
 

Martin

HWiNFO Author
Staff member
As I wrote earlier, I believe the difference is a BIOS setting. I'm not sure what it's called there; it might be something like DIMM Thermal Management or CLTT...
 

Martin

HWiNFO Author
Staff member
The setting most likely has a different name there, but I don't know what it's called.
Enter the BIOS menu locally, check all settings related to memory thermal management, and look for differences.
 

duanet

Member
I checked via IPMI, and the BIOS is the same revision on both. The board that doesn't show temperatures is a later revision; maybe that's it?

Can you recommend any texts? This stuff is really cool.
 