GPU Memory Errors Reading
#1
So whenever I try to reset the number it just doesn't do it. It flat out refuses to reset. I have to do a hard PC reset in order for it to reset the values.
Reply
#2
The reset value is valid only for min/max/average values. Current value is not reset and HWiNFO accumulates this counter because whenever the GPU goes idle, the counter is switched off.
It's sufficient to restart HWiNFO (or close the sensors window) to reset the current value.
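To illustrate the accumulation described above, here is a minimal sketch of how a monitoring tool could keep a running total of a hardware counter that restarts from zero on its own. This is not HWiNFO's actual code: the class name, the `update` method and the rule "a lower raw reading means the hardware counter restarted" are all assumptions made for the example.

```python
class AccumulatedCounter:
    """Running total of a hardware counter that may reset to zero by itself.

    Hypothetical helper for illustration; not taken from HWiNFO.
    """

    def __init__(self):
        self.total = 0        # value a tool would display as "current"
        self.last_raw = None  # previous raw hardware reading

    def update(self, raw):
        if self.last_raw is None or raw < self.last_raw:
            # First sample, or the raw value went down, which we assume
            # means the hardware counter restarted: all of `raw` is new.
            delta = raw
        else:
            delta = raw - self.last_raw
        self.total += delta
        self.last_raw = raw
        return self.total


def accumulate(readings):
    """Feed a sequence of raw readings through a fresh AccumulatedCounter."""
    counter = AccumulatedCounter()
    return [counter.update(r) for r in readings]


# Raw counter goes 0, 2, 3, drops back to 0 (GPU idle), then climbs again.
print(accumulate([0, 2, 3, 0, 1, 1, 4]))
```

In this sketch a freshly started tool takes whatever the hardware counter already holds as its starting value, so a restart only shows zero if the hardware counter itself has gone back to zero.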
Reply
#3
(07-02-2016, 07:04 PM)Martin Wrote: The reset value is valid only for min/max/average values. Current value is not reset and HWiNFO accumulates this counter because whenever the GPU goes idle, the counter is switched off.
It's sufficient to restart HWiNFO (or close the sensors window) to reset the current value.

That's the problem. It doesn't reset. The max doesn't reset, the average doesn't reset and the minimum doesn't reset. The current value just keeps steadily climbing. If I close the whole application, delete the files, download them again and install it again, it just picks up where it left off. The ONLY way to reset that sensor is a hard PC reset. The GPU I am using is an Asus R9 280. It's a custom PCB (not sure if that matters).
Reply
#4
Then it must be the GPU hardware counter that doesn't reset. It should do when the GPU goes idle and you restart HWiNFO. If not, then I'm afraid I can't do anything.
Reply
#5
(07-02-2016, 08:57 PM)Martin Wrote: Then it must be the GPU hardware counter that doesn't reset. It should do when the GPU goes idle and you restart HWiNFO. If not, then I'm afraid I can't do anything.

It's possible. Might be something Asus changed on the board design. A friend with an Asus 290 told me that for him this value was basically a random number generator.

So I take it that this means the random errors it reports are likely either false positives or a result of something Asus have changed?
Reply
#6
Hard to say exactly, but I don't think OEMs can change the way how those errors are measured, they can only do something that causes those errors (bad memory, high timings, etc.).
Are you getting the errors at stock settings (clocks+voltages) ?
How many errors are you getting, does it rise constantly ?
Do you observe screen artifacts ?
Reply
#7
(07-02-2016, 09:49 PM)Martin Wrote: Hard to say exactly, but I don't think OEMs can change the way how those errors are measured, they can only do something that causes those errors (bad memory, high timings, etc.).
Are you getting the errors at stock settings (clocks+voltages) ?
How many errors are you getting, does it rise constantly ?
Do you observe screen artifacts ?

All stock
when gaming - about 1 every 2 hours is reported
no screen artifacting. Same with my friend with his 290. He got about 54 without any artifacts.

EDIT: I am also using 2 screens so the card's memory clock is always at 1300MHz (5200MHz effective)

EDIT 2: I've also tried downclocking but that doesn't change anything. I also can't get Unigine to cause an error to be reported - only games. I've run the OCCT error checker and it doesn't find anything. I don't even understand what these errors are.
Reply
#8
That's a very low number of errors, though it should not happen at all with stock settings. Users pushing memory clock too high or with artifacts are getting hundreds to thousands of errors per second.
You might try with a single monitor, so that the memory clock drops to its lowest state, and then restart HWiNFO to see whether the counter resets to 0.
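To put the two figures side by side, here is a small sketch that turns logged counter readings into an errors-per-second rate. The function name and the `(timestamp, counter)` sample format are assumptions for the example, not something HWiNFO exports in this shape.

```python
def errors_per_second(samples):
    """Average error rate over a log of counter readings.

    samples: list of (timestamp_seconds, counter_value) pairs, oldest first.
    Hypothetical format, chosen only for this illustration.
    """
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("need at least two samples spanning a positive interval")
    return (c1 - c0) / elapsed


# About 1 error per 2 hours of gaming, as reported in this thread.
stock = errors_per_second([(0, 0), (7200, 1)])

# Lower end of "hundreds to thousands per second": 1000 errors in 10 seconds.
unstable = errors_per_second([(0, 0), (10, 1000)])

print(stock, unstable)
```

With these example numbers the second rate is several hundred thousand times higher than the first.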
Reply
#9
(07-02-2016, 10:30 PM)Martin Wrote: That's a very low number of errors, though it should not happen at all with stock settings. Users pushing memory clock too high or with artifacts are getting hundreds to thousands of errors per second.
You might try with a single monitor, so that the memory clock drops to its lowest state, and then restart HWiNFO to see whether the counter resets to 0.

I did ask a friend of mine who works in this field and he told me that any type of memory, be it HDDs or Flash, accumulates errors and that is the reason for algorithms being implemented to catch and correct them. I'm still not sure what the nature of these is but from what I've read in the only article I could find - GDDR5's EDC is a sensor that monitors the bus for corrupt data and requests a re-send when it encounters errors. Please correct me if I am wrong.

Also - yeah, when I restart HWiNFO with a single screen it resets properly. Though the "Reset" button still doesn't work. I'm also using the Portable version if that matters.

VRAM is SK Hynix, again, if that matters.
Reply
#10
Yes, that's right. This EDC counter doesn't differentiate between correctable and uncorrectable errors, so we don't know what kind the reported ones are.
Since there are only so few of them, I wouldn't worry about it. If you do, you might try to RMA the GPU.

Running with a single display confirms that the hardware counter works properly and resets when the memory clock drops to low state.
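The detect-and-resend idea discussed above can be sketched generically like this. It is only an illustration: the CRC-8 polynomial `0x07`, the function names and the simulated channel are assumptions, and the real GDDR5 EDC framing on the memory bus is not modelled.

```python
def crc8(data, poly=0x07):
    """Bitwise CRC-8 over bytes (polynomial is an assumption for this sketch)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def make_flaky_channel(corrupt_first_n):
    """Return a hypothetical channel that flips one bit in its first N transfers."""
    state = {"calls": 0}

    def channel(data):
        state["calls"] += 1
        if state["calls"] <= corrupt_first_n:
            corrupted = bytearray(data)
            corrupted[0] ^= 0x01  # single-bit error on the "bus"
            return bytes(corrupted)
        return bytes(data)

    return channel


def transfer_with_retry(payload, channel, max_attempts=4):
    """Send payload, re-sending on checksum mismatch.

    Returns (received_bytes, detected_error_count).
    """
    expected = crc8(payload)
    errors = 0
    for _ in range(max_attempts):
        received = channel(payload)
        if crc8(received) == expected:
            return received, errors
        errors += 1  # the kind of event an error counter would record
    raise IOError("payload still corrupt after %d attempts" % max_attempts)


data, detected = transfer_with_retry(b"\x12\x34\x56\x78", make_flaky_channel(1))
print(data, detected)
```

Note that the counter in this sketch goes up on every mismatch, whether or not the re-send later succeeds, so the final number alone does not say which errors were recovered.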
Reply
#11
(07-02-2016, 10:58 PM)Martin Wrote: Yes, that's right. This EDC counter doesn't differentiate between correctable and uncorrectable errors, so we don't know what kind the reported ones are.
Since there are only so few of them, I wouldn't worry about it. If you do, you might try to RMA the GPU.

Running with a single display confirms that the hardware counter works properly and resets when the memory clock drops to low state.

I doubt I can get an RMA when the system is working fine. I also tried an old DX8 game which requires compatibility mode to work fine on my system (if I don't use it I get a weird bar flickering in the middle of the screen) and the counter didn't notice that at all. Also, I tried to push the memory as hard as I could and got to 2.4GB/3GB usage and that didn't really affect the error rate in any way. And so far only 3 games have been shown to trigger it - Witcher 3, CS:GO and Divinity Original Sin - all of which I run in borderless window. I did run Shogun 2 for 2 hours and it didn't register anything.
Reply
#12
yoyo.

I'm running a Sapphire R9 390 Nitro. Memory is running at stock speed but there is a factory core overclock. I had not noticed any issues with my card until HWiNFO recently added the ability to detect GPU Memory errors.

When I am playing Battlefield 4 or World of Warcraft I also accumulate about 50-60 errors. I am assuming it is normal to have a small number of errors on these graphics cards, since you and your friend are experiencing the same issue and my card is brand new? It is disheartening to see any kind of error popping up, and I won't lie, it is giving me the 'OCD eye twitch'. I considered RMA'ing the card, but I haven't noticed any real-world performance loss and all benchmarks are consistent with the expected performance of this card, so I suppose we just have to not worry about it? Sadly there is not much solid information on this right now.
Reply
#13
Perhaps the supply circuit feeding your card does not have enough capacity, for example because the PSU wires are meant for low-current devices. The recommended allowable current density is 5 A/mm² for copper and 3 A/mm² for aluminum. This is due to the different conductivity of the metals and the losses in them. The higher the current, the greater the loss in the wires, and cheap PSU manufacturers often save on the wire cross-section.

Something else that may help: the cause of the errors may be interference on the supply line, and the input line filter will not eliminate it, because usually there is only a varistor or a Zener diode across the diagonal of the bridge (in the more expensive models), which merely limits the noise amplitude. Manufacturers rarely fit multi-stage LC filters: the inductance needed at 50 - 100 Hz is quite large, and such chokes are quite expensive. They are the better technical solution, though, since the roll-off of one LC filter stage (an oscillating circuit) outside its passband is 20 dB per decade, and the more stages there are, the greater the attenuation of the filter (multiply the number of stages by 20 dB).

You can solder high-frequency ceramic feed-through capacitors of 0.1 uF into the power wires (the colored ones), as close as possible to the load's power connector, with their bodies connected to the common wire (black). They are not cheap, though, and above all you have to look for them in the catalogs of specialized firms.

The easiest way is to buy a replacement PSU. For example, the senior (non-budget) series from Chieftec, Delta, Hipper Power or Inwin have correct circuitry and wiring; these manufacturers do not cut corners.
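As a worked example of the current-density rule of thumb quoted above, this sketch computes the smallest wire cross-section for a given current. The 5 A/mm² and 3 A/mm² limits are the figures from this post; the function names and the 10 A example load are assumptions for illustration, not a measured value for any GPU.

```python
import math

# Allowable current density in A/mm², as quoted in the post above.
MAX_CURRENT_DENSITY = {"copper": 5.0, "aluminum": 3.0}


def min_cross_section_mm2(current_a, metal="copper"):
    """Smallest conductor cross-section (mm²) for the given current."""
    return current_a / MAX_CURRENT_DENSITY[metal]


def min_diameter_mm(current_a, metal="copper"):
    """Diameter (mm) of a round solid conductor with that cross-section."""
    area = min_cross_section_mm2(current_a, metal)
    return 2.0 * math.sqrt(area / math.pi)


# Hypothetical example: 10 A through a single copper wire.
print(min_cross_section_mm2(10.0))
print(round(min_diameter_mm(10.0), 2))
```

For the 10 A example this gives 2 mm² of copper, or a round conductor roughly 1.6 mm in diameter.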
Reply
#14
There are 2 kinds of GPU Memory Errors - correctable and uncorrectable ones. Correctable ones are not as serious as uncorrectable ones. Unfortunately it's not possible to distinguish between these categories when reporting them - both are summed together.
I think that when there's no overclocking applied, there should be no GPU Memory Errors occurring.
Reply

