I know it's not good forum etiquette to answer your own post, but here are the steps I took to (finally) fix my card.
step 1: check for actual hardware defects
I found that a small SMD capacitor close to the PCI-E connector had broken off, so I soldered it back on. I'm not sure whether this had any effect.
step 2: set the card to PCI-E 2.0
This is the most important part. PCI-E 3.0 running at max bandwidth (e.g. loading textures into memory, running without an fps limit) seems to crash(?) the PLX chip for some reason, disconnecting it while running and giving you a BSOD. You shouldn't lose much performance in games (1440p) or rendering with the card running at 2.0.
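For a rough idea of what dropping to 2.0 costs on paper, here is a small sketch that works out the theoretical one-direction link bandwidth from the published per-lane transfer rates and line encodings. The x16 lane count is my assumption (adjust it for your slot), and the function name is made up for illustration. Real-world throughput will be lower than these numbers.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s (lanes=16 is an assumption)."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s * efficiency gives usable Gbit/s per lane; divide by 8 for bytes.
    return gt_per_s * efficiency * lanes / 8


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        print(f"PCIe {gen} x16: {link_bandwidth_gb_s(gen):.2f} GB/s")
```

So on paper the link bandwidth is roughly halved, which only hurts when the bus itself is the bottleneck.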
Do not use MSI Afterburner with the latest version of Windows 10 and the AMD 2019 driver. It can cause crashes.
Get the SDK package for the PLX PEX 8747 chip that connects the two GPUs to the PC. You'll want to start the software called PLX GenMon and enable logging. Some of the other programs in the package offer other useful debug functions.
Add a capacitor (15 V, 220 µF works fine) to the fan on the water cooler. This makes the pumps run much smoother and can stop the card from thermal throttling.
You can also use clock blocker and hawaii bios editor if you want to stop throttling.
Well, that's what I did. As a test, I've been running Blender for 2 days now without a crash. Previously, VRAM-intensive renders would crash the card after 1-2 rendered frames.
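If you want to repeat that kind of long test unattended, something like the sketch below could work. It is not what I actually ran, just a minimal loop under a few assumptions: the function name and log file name are made up, and the command you pass in has to exit with a non-zero code when it fails. Each pass is appended to a log file straight away, so after a BSOD and reboot you can still see how far it got.

```python
import subprocess
import time


def soak_test(command, hours, log_path="soak_log.txt"):
    """Re-run `command` until `hours` have elapsed or it fails.

    Returns the number of passes that completed successfully.
    """
    deadline = time.monotonic() + hours * 3600
    passes = 0
    while time.monotonic() < deadline:
        returncode = subprocess.run(command).returncode
        # Append and close the log on every pass so the entry is written
        # out before the next (possibly crashing) run starts.
        with open(log_path, "a") as log:
            log.write(f"{time.ctime()} pass {passes + 1} exit code {returncode}\n")
        if returncode != 0:
            return passes
        passes += 1
    return passes
```

For example, `soak_test(["blender", "-b", "scene.blend", "-f", "1"], hours=48)` would keep rendering frame 1 of a (hypothetical) `scene.blend` in background mode for two days, assuming `blender` is on your PATH.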