
Permanent thermal throttling?

#1 ·
(I'm not sure how else to phrase this and Google isn't helping.)

TL;DR I'm mining on an R9 280 and, after a few days, it will throttle to 501MHz (P2 state IIRC) and remain there indefinitely. This is fixed with a reboot or by removing and re-adding the device in Windows, but it's annoying and I want a fix to prevent it from happening.

Based on readouts from HWiNFO64, the core remains in its highest-speed P0 state until the temperature maxes out at about 102°C, and then the GPU throttles. It will quickly cool down and then proceed to boost back to the highest speed, as expected. However, the problem is that if this happens more than once, the GPU will permanently stop boosting and I have to manually reboot or otherwise restart the drivers. There's no real reason for this since, even when mining, the 501MHz core clock leads to maybe a 50°C core temperature.

Is there a way to allow the thing to boost again after overheating?
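
For anyone trying to confirm the same symptom from logged sensor data, here is a minimal sketch of how the "stuck at 501MHz while cool" condition could be detected. It only inspects (core clock, temperature) samples and does not touch the driver. The 60°C "cooled down" threshold, the five-sample window, and all names are assumptions for illustration, not anything HWiNFO64 or the AMD driver provides.

```python
from typing import Iterable, Tuple

THROTTLE_CLOCK_MHZ = 501  # clock the card drops to, as reported above
COOL_TEMP_C = 60          # assumed threshold for "has clearly cooled down" (hypothetical)
STUCK_SAMPLES = 5         # assumed number of consecutive samples required (hypothetical)


def is_stuck_throttled(samples: Iterable[Tuple[float, float]]) -> bool:
    """Return True if the card stayed at or below the throttle clock while
    cool for STUCK_SAMPLES consecutive (clock_mhz, temp_c) samples."""
    run = 0
    for clock_mhz, temp_c in samples:
        if clock_mhz <= THROTTLE_CLOCK_MHZ and temp_c <= COOL_TEMP_C:
            run += 1
            if run >= STUCK_SAMPLES:
                return True
        else:
            # Either the card boosted again or it is still hot: reset the run.
            run = 0
    return False


# A normal throttle-and-recover cycle: the clock drops while hot, then boosts again.
recovering = [(1120, 80), (501, 102), (501, 70), (1120, 75)]
# The reported problem: cool, but pinned at 501MHz.
stuck = [(1120, 80), (501, 102)] + [(501, 50)] * 5

print(is_stuck_throttled(recovering))
print(is_stuck_throttled(stuck))
```

A check like this could trigger whatever manual fix already works (a reboot or re-adding the device); it does not explain why the card refuses to boost.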
 
#2 ·
Quote:
Originally Posted by CynicalUnicorn View Post

(I'm not sure how else to phrase this and Google isn't helping.)

TL;DR I'm mining on an R9 280 and, after a few days, it will throttle to 501MHz (P2 state IIRC) and remain there indefinitely. This is fixed with a reboot or by removing and re-adding the device in Windows, but it's annoying and I want a fix to prevent it from happening.

Based on readouts from HWiNFO64, the core remains in its highest-speed P0 state until the temperature maxes out at about 102°C, and then the GPU throttles. It will quickly cool down and then proceed to boost back to the highest speed, as expected. However, the problem is that if this happens more than once, the GPU will permanently stop boosting and I have to manually reboot or otherwise restart the drivers. There's no real reason for this since, even when mining, the 501MHz core clock leads to maybe a 50°C core temperature.

Is there a way to allow the thing to boost again after overheating?
Turn your fan speed to 100% so it doesn't overheat.
 
#3 ·
Quote:
Originally Posted by nvidiaftw12 View Post

Turn your fan speed to 100% so it doesn't overheat.
Wow thanks it's like I'm not doing that.
rolleyes.gif


I've got an AIO strapped to it with two fans in push-pull. It's a piece of crap and there could very well be a loose connection, but that shouldn't be the problem. It sits at about 80°C for days on end before getting too hot and throttling.

Also the only other person I've found reporting an issue like mine is this less-than-helpful reddit post: https://www.reddit.com/r/Alienware/comments/5uxr3b/alienware_13_r3_permanent_gpu_throttling/
 
#5 ·
Quote:
Originally Posted by PR-Imagery View Post

Well, when I have that happen it's usually because of a driver crash.
Temperature seems more likely for me. I've seen it happen before when messing with modded BIOSes within a few minutes (~1100MHz core, 1.25V): it heats past 100°C, throttles, boosts when cooled, throttles again, and never boosts again until a reboot. Additionally, since HWiNFO64 records the maximum and minimum values of some measurement, I can tell that it has at some point needed to throttle.

I haven't actually experienced any driver crashes on this system I don't think.
 
#6 ·
Quote:
Originally Posted by CynicalUnicorn View Post

Wow thanks it's like I'm not doing that.
rolleyes.gif


I've got an AIO strapped to it with two fans in push-pull. It's a piece of crap and there could very well be a loose connection, but that shouldn't be the problem. It sits at about 80°C for days on end before getting too hot and throttling.

Also the only other person I've found reporting an issue like mine is this less-than-helpful reddit post: https://www.reddit.com/r/Alienware/comments/5uxr3b/alienware_13_r3_permanent_gpu_throttling/
That's honestly pretty sad. It should cool better than that. I guess the 280 is a hot card, but it can't be that much hotter than mine? I can hold 70 or lower in furmark with my little vapor chamber. Does it get that hot in furmark or does mining push it harder?



63C max, on 6 year old thermal paste in a 24C room. C'mon. Your AIO should perform at least that well.

BTW, when I did that test it blew out so much dust I had a coughing fit and had to go outside. I think it's probably time to clean this computer.
 
#7 ·
Quote:
Originally Posted by nvidiaftw12 View Post

That's honestly pretty sad. It should cool better than that. I guess the 280 is a hot card, but it can't be that much hotter than mine? I can hold 70 or lower in furmark with my little vapor chamber. Does it get that hot in furmark or does mining push it harder?

63C max, on 6 year old thermal paste in a 24C room. C'mon. Your AIO should perform at least that well.
CryptoNight is causing it to sit at a 99% load lol. I'm running stock clocks but undervolted. Stopping mining takes the UPS load from 320W to 190W, so +130W under load. I'm not sure how accurate that is though.

It's probably an issue with the cooler now that I think about it, but the first problem still stands and is hecking annoying.
 
#8 ·
Quote:
Originally Posted by CynicalUnicorn View Post

CryptoNight is causing it to sit at a 99% load lol. I'm running stock clocks but undervolted. Stopping mining takes the UPS load from 320W to 190W, so +130W under load. I'm not sure how accurate that is though.

It's probably an issue with the cooler now that I think about it, but the first problem still stands and is hecking annoying.
I'm trying to help you fix it at a deeper level. Which is that that cooler blows. What happened to the stock one?
 
#10 ·
Quote:
Originally Posted by CynicalUnicorn View Post

Quote:
Originally Posted by PR-Imagery View Post

Well, when I have that happen it's usually because of a driver crash.
Temperature seems more likely for me. I've seen it happen before when messing with modded BIOSes within a few minutes (~1100MHz core, 1.25V): it heats past 100°C, throttles, boosts when cooled, throttles again, and never boosts again until a reboot. Additionally, since HWiNFO64 records the maximum and minimum values of some measurement, I can tell that it has at some point needed to throttle.

I haven't actually experienced any driver crashes on this system I don't think.
Checked event viewer for display errors?

In any case, the heat is making the card unstable and probably causing the driver to eventually crash.
 
#11 ·
Quote:
Originally Posted by iinversion View Post

Sounds like you need a remount/reapplication of thermal paste. It definitely shouldn't be getting anywhere near that hot on a properly mounted AIO. I had 270Xs strapped to H50 AIOs in 2014 mining and they would sit around 50C.
I agree with this. Take it apart, if you know how to without voiding the warranty, and make sure things are copacetic, i.e. proper thermal paste, no missing or damaged thermal pads, and permanent impressions on the thermal pads that show some sort of contact, etc.
 
#12 ·
Quote:
Originally Posted by nvidiaftw12 View Post

I'm trying to help you fix it at a deeper level. Which is that that cooler blows. What happened to the stock one?
Quote:
Originally Posted by iinversion View Post

Sounds like you need a remount/reapplication of thermal paste. It definitely shouldn't be getting anywhere near that hot on a properly mounted AIO. I had 270Xs strapped to H50 AIOs in 2014 mining and they would sit around 50C.
Quote:
Originally Posted by EastCoast View Post

I agree with this. Take it apart, if you know how to without voiding the warranty, and make sure things are copacetic, i.e. proper thermal paste, no missing or damaged thermal pads, and permanent impressions on the thermal pads that show some sort of contact, etc.
Ugh fine. I'll fix it tomorrow I guess.

Still, annoying issue with the drivers.

I'll check event viewer and see if anything has crashed.
 
#16 ·
Quote:
Originally Posted by nvidiaftw12 View Post

Holy crap just fix it before you kill it.
mad.gif


Alright, looks like there was some problem with the cooler's contact with the GPU die. I was worried I had the older version of Tahiti where the support brace is a bit taller than the GPU die, but it looks like I got an updated version where it's flush. Cool. Took off the AIO's block, cleaned off the TIM, added new TIM (the same kind from the same tube lol), and screwed it back on tighter than I reasonably should have.

A Cryptonight load is barely above ambient right now and very, very slowly climbing.

While my problem should be fixed, the question remains: can I disable this safety feature that prevents boosting until a reboot? I remember when I had a similar problem with an old Nvidia card that it would climb to its Tjmax, throttle down about 20°, but continue to boost no matter how many times it had throttled previously.
 
#17 ·
Quote:
Originally Posted by CynicalUnicorn View Post

mad.gif


Alright, looks like there was some problem with the cooler. I was worried I had the older version of Tahiti where the support brace is a bit taller than the GPU die, but it looks like I got an updated version where it's flush. Cool. Took off the AIO's block, cleaned off the TIM, added new TIM (the same kind from the same tube lol), and screwed it back on tighter than I reasonably should have.

A Cryptonight load is barely above ambient right now and very, very slowly climbing.

While my problem should be fixed, the question remains: can I disable this safety feature that prevents boosting until a reboot? I remember when I had a similar problem with an old Nvidia card that it would climb to its Tjmax, throttle down about 20°, but continue to boost no matter how many times it had throttled previously.
https://www.newegg.com/Water-Liquid-Cooling/SubCategory/ID-575
 
#20 ·
Quote:
Originally Posted by nvidiaftw12 View Post

But are the temps fine?

Also think about it. I doubled my gpu power for $45. That's a lot better than fellas that pay $500 every year for 30% gains.
It's hard to tell. NiceHash claims it's running, HWiNFO64 claims it's a 99% load, I went back to my 1120MHz profile (31% more power), but it's been sitting under 40°C for several minutes.

I don't know if it was just mounted that badly or if it isn't mining.
headscratch.gif
 
#22 ·
Quote:
Originally Posted by nvidiaftw12 View Post

Download furmark. Run it for 5 minutes.

And you can thank me later for my continuous nagging. Apparently it works though.
ok mom lol

I think it's just running cool at this point. Turns out heatsinks work better when they touch the hot things, who knew? I'll try to push the volts and core clock a bit when I get time later, but 1200MHz is the maximum I can reasonably expect and that's at most a 7% gain assuming everything relies on the core speed. Memory is stuck at 1500MHz. I can try 1525MHz, but that's less than 2% and becomes unstable quickly.
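
The percentages above can be checked with a quick calculation. This is a minimal sketch that assumes the 1200MHz target is measured against the 1120MHz profile mentioned earlier in the thread, and that the hash rate scales linearly with clock speed (the best case, as noted). The function name is made up for illustration.

```python
def percent_gain(old_mhz: float, new_mhz: float) -> float:
    """Relative clock increase in percent, going from old_mhz to new_mhz."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


# Assumption: the 1120MHz profile is the baseline for the core estimate.
core_gain = percent_gain(1120, 1200)
# Memory: from the current 1500MHz to the 1525MHz being considered.
mem_gain = percent_gain(1500, 1525)

print(f"core: {core_gain:.1f}%  memory: {mem_gain:.1f}%")
# prints "core: 7.1%  memory: 1.7%"
```

Both numbers are upper bounds: any part of the workload that doesn't depend on that clock shrinks the real gain.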
 
#24 ·
Quote:
Originally Posted by CynicalUnicorn View Post

Quote:
There I fixed it:

Quote:
Originally Posted by CynicalUnicorn View Post

Alright, looks like there was some problem with the cooler's contact with the GPU die.
Are you happy, Mr. I-bought-a-6950-in-2017?

Cooler is fine, GPU is fine, TIM is fine, but the contact came loose somehow.
I bought a GTX 285 last year.
 