Sapphire RX 480 Nitro+ eventually stops mining

Hello all!

I have 3 similar rigs:
ASRock H81 PRO BTC
6x Sapphire RX 480 Nitro+ 4GB
2x PSU 830W
2x 4GB DDR3 1600MHz
6x Powered Risers
Ubuntu 16.04 with AMDGPU-Pro 16.40

Mining starts fine, without errors. But after some time (about 30 minutes to 2 hours) several GPUs stop mining.
And I can't just stop and restart the miner; I have to reboot the system to bring them back up.

I have tried swapping risers and the mobo, different miners, tweaking fans and so on, but the problem still persists. Has anyone faced a similar problem and found a fix? Or does anyone have clues about the root cause?

Thanks in advance

It would be helpful if you mentioned which mining software you are using. I have a custom Go program that watches for dead GPUs, then reboots and restarts the silentarmy miner automatically. As far as I know this is a flaw with the kernel and is a known issue.

Nice setup. It might be an issue with the mining software.
Temporary fix: set up a cron job that restarts the miner every 30 minutes.

It would be great if you could share the Sol/s rate for the RX 480 Nitro.

I have tried silentarmy and Claymore. After a GPU stops mining, only a hard reset helps. The system hangs on the "reboot" command, with zombie solver processes.

Could be a driver issue.
Did you check /var/log/syslog?
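A minimal sketch of that check, in case it helps: filter the log down to lines that mention the GPU driver. The function name `gpu_log_lines` and the keyword list are assumptions, not an official list of driver messages, so widen the pattern if nothing shows up.

```shell
#!/bin/sh
# Print log lines that mention the GPU driver.
# ASSUMPTION: the keywords below are a guess at what is relevant,
# not an official list of amdgpu messages.
gpu_log_lines() {
    grep -iE 'amdgpu|drm' "${1:-/var/log/syslog}"
}

# Typical use, right after a GPU has stopped mining:
#   gpu_log_lines | tail -n 20
```

Passing a different file as the first argument lets you run the same filter over a rotated log.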

With the silentarmy miner the rig gives about 245 Sol/s total, ~40-41 for each GPU.

A bit interesting that Claymore's kernel dies in much the same way that the open source silentarmy kernel does...

Anyway, you can use a hard reset command like this for an auto restart.

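# Create a small script that forces an immediate reboot via magic SysRq (no disk sync)
echo '#!/bin/bash
echo "b" > /proc/sysrq-trigger' > ~/tryagain
sudo chown root ~/tryagain
sudo chmod +x ~/tryagain
# Note: Linux ignores the setuid bit on interpreted scripts, so run the script as root (e.g. via sudo)
sudo chmod u+s ~/tryagain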

That's what my Go program executes when it finds a dead GPU. Restarting at a set time is a bad idea; just watch for a dead GPU.
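For anyone who wants to try the same idea without the Go program, here is a rough shell sketch of such a watchdog. It is not the Go program itself. It assumes the miner prints per-device status lines in the shape shown in the first comment, which may not match your miner, and the names `has_dead_gpu`, `watch_miner`, `ZERO_LIMIT` and `RESET_CMD` are made up for illustration. It waits for several consecutive zero readings so that a slow warm-up does not trigger a reboot.

```shell
#!/bin/sh
# Sketch of a dead-GPU watchdog. ASSUMPTION: the miner prints status lines like
#   Total 205.0 sol/s [dev0 41.0, dev1 0.0, dev2 40.5] 12 shares
# Adjust the pattern in has_dead_gpu to whatever your miner really prints.

# Succeeds if the status line on stdin shows any device at zero sol/s.
has_dead_gpu() {
    grep -qE 'dev[0-9]+ 0(\.0+)?[],]'
}

# Reads miner output on stdin. After ZERO_LIMIT consecutive status lines with a
# dead device it runs RESET_CMD once and returns 0; returns 1 if input ends first.
# ZERO_LIMIT and RESET_CMD are hypothetical names; the default reset command is
# the ~/tryagain script from the post above.
watch_miner() {
    limit="${ZERO_LIMIT:-5}"
    reset_cmd="${RESET_CMD:-sudo $HOME/tryagain}"
    zeros=0
    while IFS= read -r line; do
        case "$line" in
            *sol/s*) ;;          # status line: inspect it
            *) continue ;;       # anything else: ignore
        esac
        if printf '%s\n' "$line" | has_dead_gpu; then
            zeros=$((zeros + 1))
        else
            zeros=0
        fi
        if [ "$zeros" -ge "$limit" ]; then
            echo "dead GPU detected: $line" >&2
            $reset_cmd
            return 0
        fi
    done
    return 1
}
```

To use it, add the functions to a script and pipe the miner's output into `watch_miner`. Setting `RESET_CMD='echo RESET'` first is a safe way to check that the pattern matches your miner before letting it reboot anything.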

I'll be releasing an Ubuntu ISO with all of these scripts running if there is enough interest in it.

With this hardware you should try to copy the 1500 memory strap to 2000 in the firmware of the GPUs.
It greatly boosts the mining performance for a small power and heat cost.

Yes, I checked just now, when a GPU stopped. Nothing suspicious there...

Looks like Claymore uses the same silentarmy source code :slight_smile:

I will try your script, thank you. And of course I would be interested in the Ubuntu ISO.

@secbrain I have a similar problem with a Sapphire 470x on Ubuntu 16.04. It mines slower and slower and then it stops, and it can no longer be detected by lspci.

One has to restart the machine, and when doing so I sometimes get 8 beeps, meaning the motherboard thinks there is something wrong with the graphics card. Restarting again usually solves it. It is an 8-year-old MoBo, so I am assuming that the MoBo is somehow locking up the GPU by browning it out (low voltage), or by crashing it through wrong frequency/timing or such. But since you have new MoBos made explicitly for mining and have the same problem, and it is Sapphire cards again, maybe the problem is in the Sapphire cards?

Many times the card crashes as soon as mining starts; that is, it responds to ./silentarmy --list but crashes on e.g. ./silentarmy. I got it running with only one instance, ./silentarmy --instances=1, so maybe the card ramps something up (amperage, frequency or such) that the motherboard cannot handle gracefully. I do have a new 700W PSU though. Currently thinking about frequencies & timings, but I am a bit out of my league :smiley:
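For the "no longer detected by lspci" symptom, a small sketch of how one might count the cards still visible on the PCI bus. It reads `lspci -n` output and counts display controllers (class codes starting with 03) with AMD's vendor ID 1002. The function name `count_amd_gpus` and the `EXPECTED` variable are made up for illustration.

```shell
#!/bin/sh
# Count AMD display controllers in `lspci -n` output read from stdin.
# 1002 is AMD's PCI vendor ID; class codes starting with 03 are display
# controllers, so the cards' HDMI audio functions (class 0403) are not counted.
count_amd_gpus() {
    grep -cE '^[0-9a-f:.]+ 03[0-9a-f]{2}: 1002:'
}

# Typical use (EXPECTED is a hypothetical variable: the number of cards in the rig):
#   seen=$(lspci -n | count_amd_gpus)
#   [ "$seen" -lt "$EXPECTED" ] && echo "only $seen of $EXPECTED GPUs visible"
```

Note that this only catches cards that drop off the bus entirely; a card that stops hashing but stays visible to lspci would still be counted.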

I have the same problem with a 6-GPU RX 480 8GB rig.

Will that work with a 4GB card as well? That's what the OP has here.

I don't think it's the cards, because I have tested them separately and everything was fine. Stock or modified BIOS, it doesn't matter :slight_smile:

And the cards stop mining all at once; there is no slowly falling mining speed. But do you have one PSU or two PSUs in your rig?

I've tried silentarmy on Ubuntu 16.04, Ubuntu 14.04 (with the fglrx driver) and also the silentarmy AUR package for Arch Linux. Same problem on all of them, which makes me think it's not a driver problem but something closer to hardware/firmware. Claymore also crashes the card on 16.04 (I have not tried Claymore on all combinations). I have one PSU and only one card; I'm thinking of getting more, but first one card should work, of course. It did work splendidly for days, and then it stopped. I did poke around a bit in the device tree on 16.04:
/sys/class/drm/card0/device/hwmon/hwmon
see: here
and I guess the problems started after that poking, although I set everything back. I then flashed the BIOS of the old MoBo to a newer version, and sometimes it complains that overclocking failed, but I have not overclocked it. So I will just try another MoBo after I get the riser card and see if that works better.

Yes indeed, it works.
I use the 1500 memory strap for the 2000 setting.
It gains me about 15% more hashing power.

But the card I tested becomes unstable for gaming. :banana:

I assume AMD APP SDK v3.0 was also installed, right? And with the BIOS on the PCI "auto" setting?

@bio don't know if you are asking me, but what is the PCI "auto" setting? Is that in the motherboard BIOS for selecting which graphics card to use? The MoBo has no GPU and it seems to work fine using the Radeon, up until it's mining time. AMD APP SDK is installed.

Setting "pcie_aspm=powersave" (without the quotes) on the Linux kernel command line finally got my Sapphire RX 470 to mine, at around 115 Sol/s with optiminer. It has been stable now for 24 hours.

The problem seems to have been that the card somehow overburdened the PCIe architecture/BIOS of the 8-year-old motherboard, and pcie_aspm=powersave tells it to tone it down and downgrade a bit, afaict.

Powersave was a bad choice for me, as it pulls my cards down to idle.
Performance was the setting that worked for me.
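Since powersave and performance behave so differently here, it may be worth checking which ASPM policy is actually active after booting. A minimal sketch, assuming the kernel exposes /sys/module/pcie_aspm/parameters/policy, which lists the policies with the active one in square brackets. As far as I can tell from the kernel's parameter documentation, `pcie_aspm=` itself only takes `off` or `force`, and the named policies are set with `pcie_aspm.policy=`, so treat the boot-line form in the comments as an assumption to verify on your system. The helper name `active_aspm_policy` is made up.

```shell
#!/bin/sh
# Print the active ASPM policy from text like "default performance [powersave]".
# active_aspm_policy is a hypothetical helper name.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Typical use on a running system:
#   active_aspm_policy < /sys/module/pcie_aspm/parameters/policy
#
# To set a policy at boot on Ubuntu (ASSUMPTION: GRUB is the boot loader), add
# e.g. pcie_aspm.policy=performance to GRUB_CMDLINE_LINUX_DEFAULT in
# /etc/default/grub, then run: sudo update-grub
```

If the printed policy does not change after editing the boot line, the parameter was probably not picked up, and the stability change may have come from something else.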