So I've got these two Sapphire R9 290s sitting around that I want to put to work as soon as I get my parts and my act together. But in testing, both of them have overheated pretty badly. One of them (we'll call it card #2) overheats even when majorly undervolted and downclocked, so I put it aside for the time being. The one that can almost run at acceptable temp levels @ 935 MHz GPU with a major undervolt was the subject of my TIM replacement.
I opened up the cooler on card #1 - the better of the two - and the first thing I noticed wrong under the hood was that a little doodad fell off the PCB! Specifically, something off a spot labeled C105. Is that a capacitor? Anyway, it just couldn't stand to live life on that card anymore and fell clean off. I lost the cap too 'cuz it's so damn small. I have no idea why it fell off or how long it had been detached from its intended solder point.
The TIM was crusty and horrible, so I replaced it with a very thin layer of CLU which stayed neatly where it was supposed to be. I put the card back together and started it up.
First thing it did was show bogus sensor readings @ idle. @ load the card was fine, and it was doing Ethereum crunching @ 1120 MHz GPU/1250 MHz RAM with a pretty big undervolt (I had previously flashed the card to a low voltage setting). Then the driver crashed and took down the system. Okay, no fun. I figured I'd just have to back off on the GPU clock a bit . . .
Well, when it came back up, all the sensors were working @ idle (yay), but now it crashes at any clockspeed and generally won't do anything. GPUPI won't run on it, it can't mine Ethereum, it's just . . . hosed.
I backed the GPU clock off to 1050 MHz with another BIOS flash (for whatever reason, Crimson isn't doing anything for it) and that didn't fix things at all, so now I'm stumped. Might be time to throw in the towel on this thing?
edit: I switched the BIOS over to the secondary BIOS and it went back to showing non-functional sensors. Only this time the sensors won't function at load like they did before. It's still causing driver crashes whenever I load up the GPU. I have also tried uninstalling and reinstalling the driver (16.2.1) to no avail.
UPDATE:
Okay, I'm resurrecting this thread to post about this particular card some more. OP updated to reflect new problems.
Basically, I thought I had the card stable as a miner, but after maybe 10 hours of mining it started crashing nonstop, and it has kept crashing ever since. It causes driver crashes frequently, sometimes even when it's sitting idle. This happens even with the stock BIOS. I tried booting into Linux, but initializing the card there prevents the OS from loading altogether (it seems to crash the desktop environment).
I can sporadically get the card to run ethminer, but it will not last more than a minute or two now, no matter what GPU voltage or clockspeed I pick. Are driver crashes more characteristic of GPU or RAM stability problems?
Also, when it gets particularly cranky, it will cause nonstop THREAD_STUCK_IN_DRIVER errors.