I’m running a rig with 4x 1060, 2x 1070, 1x 1080, and 8x 1080 Ti. Unfortunately, after about an hour it stops completely: all GPUs drop to 0 Sol/s with the following errors:
ERROR: Looks like GPUX are stuck he not respond.
ERROR: Looks like GPUX are stopped. Restart attempt.
I have the cards spread across 6 power supplies, and I’m not entirely sure how to debug further.
I just unplugged the last 3 cards and started it again to see whether a single card is causing the problem.
Any other suggestions on how to debug this? Given that it doesn’t fail until after an hour, this won’t be fun to debug.
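
In case it helps anyone suggest next steps, here is a rough sketch of how I could check whether one card always fails first. It assumes the miner output is saved to a log file and that the X in the errors above is a numeric GPU index; the function name and the log file name are my own placeholders, not anything from the miner.

```python
import re
import sys
from collections import Counter

# Assumption: the "X" in the errors above is a numeric GPU index, e.g. "GPU3".
ERROR_RE = re.compile(r"ERROR: Looks like GPU(\d+) are (stuck|stopped)")


def tally_gpu_errors(lines):
    """Return (first_gpu, counts): the first GPU index that reports an
    error, and the number of error lines seen per GPU index."""
    counts = Counter()
    first_gpu = None
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            gpu = int(match.group(1))
            if first_gpu is None:
                first_gpu = gpu
            counts[gpu] += 1
    return first_gpu, counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally_gpu_errors.py miner.log  (hypothetical file names)
    with open(sys.argv[1], errors="replace") as log_file:
        first_gpu, counts = tally_gpu_errors(log_file)
    print("first GPU to report an error:", first_gpu)
    for gpu, n in sorted(counts.items()):
        print(f"GPU{gpu}: {n} error lines")
```

If the same index shows up first across several runs, that card (or its riser or power supply) would be the first thing to swap. If the first index changes every run, the problem is more likely shared, such as power or heat.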