I have 3 similar rigs:
ASRock H81 Pro BTC
6x Sapphire RX 480 Nitro+ 4GB
2x 830W PSU
2x 4GB DDR3 1600 MHz
6x powered risers
Ubuntu 16.04 with AMDGPU-Pro 16.40
Mining starts fine, without errors, but after some time (about 30 minutes to 2 hours) several GPUs stop mining.
I can't just stop and restart the miner; I have to reboot the system to bring them back up.
I have tried swapping risers and motherboards, different miners, tweaking fans and so on, but the problem persists. Has anyone faced a similar problem and found a fix, or does anyone have clues about the root cause?
It would help if you mentioned which mining software you are using. I have a custom Go program that watches for dead GPUs, reboots, and restarts the silentarmy miner automatically. As far as I know this is a kernel flaw and a known issue.
I have tried silentarmy and Claymore. Once a GPU stops mining, only a hard reset helps: the system hangs on the "reboot" command, leaving zombie solver processes behind.
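To see what the leftover solver processes are actually doing, one can read the state field from /proc/<pid>/stat: Z marks a zombie (exited but not yet reaped by its parent), while D marks uninterruptible sleep, which usually means the process is blocked inside the kernel, possibly in the GPU driver, and cannot be killed. A minimal sketch of the parsing step follows; procState and the sample lines (including the process name sa-solver) are illustrative only, and reading the actual /proc files is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// procState extracts the one-letter state field from the contents of
// /proc/<pid>/stat. The command name is wrapped in parentheses and may
// itself contain spaces or ')', so we search for the last ')'.
func procState(stat string) (byte, error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 || i+2 >= len(stat) {
		return 0, errors.New("malformed stat line")
	}
	return stat[i+2], nil
}

func main() {
	// Hypothetical stat lines for two stuck solver processes.
	samples := []string{
		"1234 (sa-solver) Z 1 1234 1234 0 -1",
		"1235 (sa-solver) D 1 1235 1235 0 -1",
	}
	for _, s := range samples {
		st, err := procState(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		switch st {
		case 'Z':
			fmt.Println("zombie:", s)
		case 'D':
			fmt.Println("uninterruptible (likely blocked in the kernel):", s)
		}
	}
}
```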
With this hardware, try copying the 1500 MHz memory strap to the 2000 MHz strap in the GPU firmware.
It greatly boosts mining performance at a small power and heat cost.
@secbrain I have a similar problem with a Sapphire RX 470 on Ubuntu 16.04. It mines slower and slower, then stops, and can no longer be detected by lspci.
I have to restart the machine, and when I do I sometimes get 8 beeps, meaning the motherboard thinks something is wrong with the graphics card. Restarting again usually solves it. It is an 8-year-old motherboard, so I assumed it was somehow locking up the GPU, either by browning it out (undervoltage) or by crashing it through wrong frequencies or timings. But since you have new motherboards made specifically for mining and see the same problem, again with Sapphire cards, maybe the problem is in the Sapphire cards?
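Since the card drops off the PCI bus entirely, a watchdog could count the AMD display devices that lspci -n reports and compare that with the number of installed cards. A minimal sketch follows, assuming the usual lspci -n line format (slot, class code, vendor:device). AMD's PCI vendor ID is 1002 and display controllers have class 03xx, which excludes the card's HDMI audio function (class 0403). countAMDDisplay and the expected count are hypothetical, and running lspci itself is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countAMDDisplay counts AMD (vendor 1002) display-class devices (PCI
// class 03xx) in the output of `lspci -n`, whose lines look like
// "01:00.0 0300: 1002:67df (rev c7)".
func countAMDDisplay(lspciN string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(lspciN))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 {
			continue
		}
		class := strings.TrimSuffix(f[1], ":")
		if strings.HasPrefix(class, "03") && strings.HasPrefix(f[2], "1002:") {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical lspci -n output: host bridge, two AMD GPUs, one GPU audio function.
	sample := `00:00.0 0600: 8086:0c00 (rev 06)
01:00.0 0300: 1002:67df (rev c7)
01:00.1 0403: 1002:aaf0
02:00.0 0300: 1002:67df (rev c7)`
	const expected = 2 // hypothetical: number of cards physically installed
	if got := countAMDDisplay(sample); got < expected {
		fmt.Printf("only %d of %d AMD GPUs visible on the PCI bus\n", got, expected)
	} else {
		fmt.Println("all GPUs visible")
	}
}
```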
Often the card crashes as soon as mining starts: it responds to ./silentarmy --list but crashes on plain ./silentarmy. I got it running with only one instance (./silentarmy --instances=1), so maybe the card ramps something up (current draw, frequency or the like) that the motherboard cannot handle gracefully. I do have a new 700W PSU, though. I am currently looking into frequencies and timings, but I am a bit out of my league here.
I've tried silentarmy on Ubuntu 16.04, Ubuntu 14.04 (with the fglrx driver) and the silentarmy AUR package on Arch Linux. The same problem occurs on all of them, which makes me think it is not a driver problem but something closer to hardware/firmware. Claymore also crashes the card on 16.04 (I have not tried Claymore on every combination). I have one PSU and only one card; I am thinking of getting more, but of course the first card should work first. It worked splendidly for days, and then it stopped. I did poke around a bit in the sysfs device tree on 16.04:
/sys/class/drm/card0/device/hwmon/hwmon
I guess the problems started after that poking, although I set everything back. I then flashed the old motherboard's BIOS to a newer version, and now it sometimes complains that overclocking failed, even though I have not overclocked it. So I will try another motherboard once I get the riser card and see if that works better.
@bio I don't know if you were asking me, but what is the PCI "auto" setting? Is that the motherboard option for selecting which graphics card to use? The motherboard has no onboard GPU, and it seems to work fine with the Radeon until it is time to mine. The AMD APP SDK is installed.
Setting "pcie_aspm=powersave" (without the quotes) on the Linux kernel command line finally got my Sapphire RX 470 to mine, at around 115 Sol/s with optiminer. It has been stable for 24 hours now.
The problem seems to have been that the card somehow overburdened the PCIe subsystem/BIOS of the 8-year-old motherboard, and as far as I can tell pcie_aspm=powersave tells it to tone things down a bit.
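To confirm after a reboot what actually reached the kernel, one can read /proc/cmdline and /sys/module/pcie_aspm/parameters/policy; the latter lists the available policies with the active one in square brackets. The kernel's parameter documentation lists only off and force as values for pcie_aspm=, with the policy itself selected through pcie_aspm.policy=, so checking which policy is really active seems worthwhile. The sketch below parses the contents of both files; hasKernelParam, activeASPMPolicy and the sample strings are hypothetical, and reading the files is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// hasKernelParam reports whether the kernel command line (the contents
// of /proc/cmdline) contains the exact token key=value.
func hasKernelParam(cmdline, key, value string) bool {
	want := key + "=" + value
	for _, f := range strings.Fields(cmdline) {
		if f == want {
			return true
		}
	}
	return false
}

// activeASPMPolicy returns the bracketed (active) entry from the contents
// of /sys/module/pcie_aspm/parameters/policy, e.g. "[default] performance powersave".
func activeASPMPolicy(policy string) string {
	for _, f := range strings.Fields(policy) {
		if strings.HasPrefix(f, "[") && strings.HasSuffix(f, "]") {
			return strings.Trim(f, "[]")
		}
	}
	return ""
}

func main() {
	// Hypothetical file contents.
	cmdline := "BOOT_IMAGE=/boot/vmlinuz ro quiet splash pcie_aspm=powersave"
	policy := "[default] performance powersave"
	fmt.Println("pcie_aspm=powersave on cmdline:", hasKernelParam(cmdline, "pcie_aspm", "powersave"))
	fmt.Println("active ASPM policy:", activeASPMPolicy(policy))
}
```

On Ubuntu, kernel parameters are usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot.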