RTX 3080/3090 Memory/VRM temperature sensor

falkentyne

Could this be added in a future version of HWiNFO?
The board seems to have a way to read this, since people have been getting a "Thermal" limit flag even with a 55 C GPU temperature.


Igor's website seems to show some weird classified (NVIDIA?) tool that actually shows the RAM temperature. That blurred icon looks a lot like the NVIDIA logo.
It's also reporting other things like GPU clock and so forth. If that's an I2C/SPI bus tool, then it's clearly probing a sensor, and the VBIOS must have access to it,
since it needs some way to flag "Thermal" for VRAM or VRM.


Please help us, Martin!
 
I'm sorry, I don't have information about such a sensor :(
Do you know if it's present on all RTX 3090/3080 boards, or is this just some board-specific feature?
It's quite likely that such a sensor is accessed via I2C, but also quite likely that it's somehow restricted to internal-only NVIDIA functions. Otherwise I should be able to see such a device on the GPU I2C bus. AFAIK AIDA64 should be able to create a full GPU I2C bus dump, but if this device is somehow masked internally, it won't show up there.
 
The memory thermal sensor seems to be native to all GDDR6/X chips, but accessible using an internal interface only.
Unfortunately we don't know how NVIDIA exposes this information yet :(
 
Hi, I just came across this thread while searching, because I recently acquired a 3080 Ventus and have noticed the same issue. According to both HWiNFO64 and Afterburner, my GPU normally runs at around 60-65 degrees under heavy gaming loads, with a maximum recorded temperature of 69 degrees, yet I am still regularly getting 'Thermal limit' in HWiNFO64. So, should I be concerned and try to adjust my GPU and case fan curves to bring the temps down? Or is this essentially a false reading that will hopefully be addressed in future versions of HWiNFO64? And sorry if the last question is poorly worded, I am relatively new to PC gaming.
 
HWiNFO is reporting only what the GPU drivers report. So this is either a false thermal event, or it's due to the GPU memory temperature reaching its limit. But we can't verify that as currently no one except NVIDIA knows how to read the actual memory temperature.
 
If you're using the newest NVIDIA driver or one of the last two Vulkan beta drivers, that's a bug. It flags "thermal" and "power" for a split second at the exact same time when the load changes sharply.
The 456.98 hotfix driver never, ever did this, not once.

Unfortunately, that does make it rather problematic to see if you are really hitting a Thermal trip :(
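For anyone going through a sensor log by hand, one possible way to separate a split-second flag from a real, sustained thermal limit is to require the flag to stay set for several samples in a row. This is only a rough sketch: the list of per-sample booleans and the `min_samples` threshold are hypothetical, and nothing here is an HWiNFO or NVIDIA API.

```python
def sustained_events(flags, min_samples=3):
    """Return (start_index, length) for every run of True values
    that lasts at least min_samples consecutive samples.

    flags: hypothetical per-sample "thermal limit" booleans from a log.
    """
    events = []
    run_start = None
    for i, flag in enumerate(flags):
        if flag and run_start is None:
            run_start = i  # a new run of set flags begins here
        elif not flag and run_start is not None:
            if i - run_start >= min_samples:
                events.append((run_start, i - run_start))
            run_start = None
    # Handle a run that is still active at the end of the log.
    if run_start is not None and len(flags) - run_start >= min_samples:
        events.append((run_start, len(flags) - run_start))
    return events
```

With a polling interval of one or two seconds, a single-sample flag would be dropped, while a flag that stays set for several samples would still be reported.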
 
Interesting, haven't noticed it doing this on my system, i.e. flagging the two simultaneously. I'm curious though: how do you know that this is indeed a bug, and not the GPU memory actually experiencing a temperature spike during excessive load changes?
Either way, I have noticed that ever since I updated to the latest NVIDIA driver (460.79), I am rarely seeing 'Thermal limit' in HWiNFO after gaming, though it does still happen occasionally.
 
Because EVGA cards show the same thing happening, but they support temperature monitoring on VRM and VRAM, and they are showing 6500 C temperature blips. Hotter than the surface of the sun.
Safe to say it's a bug, because this did not happen on 456.98 and older drivers.
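Blips like that are easy to screen out of a logged temperature series before looking at maximums. A minimal sketch, assuming a plain list of readings in degrees C; the 0 to 150 C bounds are an arbitrary assumption for illustration, not a figure from NVIDIA or EVGA.

```python
def plausible_readings(samples_c, low_c=0.0, high_c=150.0):
    """Keep only readings inside a physically plausible range.

    samples_c: temperatures in degrees C from a hypothetical sensor log.
    low_c, high_c: assumed sanity bounds; adjust to taste.
    """
    return [t for t in samples_c if low_c <= t <= high_c]


def max_plausible(samples_c):
    """Maximum of the plausible readings, or None if none remain."""
    kept = plausible_readings(samples_c)
    return max(kept) if kept else None
```

A log such as `[62.0, 6500.0, 64.0]` would then give a maximum of 64.0 instead of 6500.0.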
 
I finally figured out how to report the GPU Memory Junction Temperature. It will be released today in a new version ;)
Note that this works on NVIDIA RTX 30-series with GDDR6X only and it's the internal (silicon) temperature which reaches higher values than the usual external ones. Throttling should start somewhere around 110 C.
 
Do we know what the temperature reading relates to physically? The 3080 has 10 GDDR6X chips and the 3090 has 24, so is this the value reporting the hottest of the chips? An average?
(I realise you may not actually know, since this is presumably an implementation detail of the board’s BIOS)
 
Thanks Martin, I'll keep an eye out for the new version. I made an account today to let you know your work is greatly appreciated!
 
Thanks so much! I got the newest version, and the memory junction temperature for my RTX 3080 is showing up in HWiNFO64. So far the maximum temperature I've seen is 78 degrees, which will help me sleep better at night :)
 
Hi, will there be an update that will allow users to monitor the GDDR6 memory temps on cards like the RTX 3070 and RTX 3060 Ti as well? This would be a really useful feature to have, if possible.
 
The GDDR6 ones don't seem to report memory temperature (internally in the driver), so I'm afraid this probably won't be possible.
 
Just registered to say massive kudos @Martin, this is a great and much-anticipated feature for the 30xx series cards. You have piqued my curiosity though: what is the secret sauce for reporting this? I'm aware HWiNFO is not available for Linux users, so I'm wondering if this can be extracted from nvidia-smi.
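For anyone who wants to test the nvidia-smi idea, here is a rough sketch of what a check could look like. It assumes the `temperature.gpu` and `temperature.memory` query fields of `nvidia-smi --query-gpu`; whether the memory field returns a number on RTX 30-series GeForce cards is not established in this thread, and it may simply print N/A. The helper names are made up for illustration.

```python
import subprocess

# Assumed query: GPU core temperature and memory temperature, one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,temperature.memory",
    "--format=csv,noheader,nounits",
]


def parse_temps(csv_text):
    """Parse CSV rows into tuples of ints; non-numeric fields
    (for example N/A) become None."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        rows.append(tuple(int(f) if f.isdigit() else None for f in fields))
    return rows


def read_temps():
    """Run nvidia-smi (must be installed and on PATH) and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temps(result.stdout)
```

If the second value of each tuple comes back as None, the driver is not exposing the memory temperature through this interface on that card.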
 
@Martin Thanks a ton for spending the time and effort on this program. I've been using it for years.

I have the newest update and can see the Memory Junction Temperature readout, but I have a few issues with it.

1.) I do not seem to be able to make a graph out of it.

2.) When I move it up the list, it goes back down to the very bottom of the column with other random readouts.

3.) The biggest issue I see is that it does not update live. It only shows the temperature from when I opened the program, and I have to shut the program down and restart it to get the temperature to update.

Thanks for any info! This is one of my #1 used programs!

EDIT: I have a 3090FE with an i9 9900K

EDIT 2: This is what worked for me: Settings -> Layout -> Restore Original Order
 
Try to Reset Preferences in HWiNFO, it could be caused by some glitch in the actual sensor layout configuration.
 