Tips for Nvidia Linux miners

I've been absent from the forums as of late. I've been very busy with work, not with the rigs; they have been flawless. I haven't even thought about my rigs in a couple of months, they take care of themselves (they had a power outage a few weeks ago and restarted fine on their own). I have a couple of close mates that I wrote a Linux / Nvidia instruction manual for, and they have both been running smoothly for several months as well. I've been thinking about posting it, but a pending commercial deal prevents me from doing that.

Anyways, I am converting one rig to a deep learning AI rig via CUDA and TensorFlow. It has been a royal pain in the ass to get the Nvidia GPUs free from X11 for pure compute mode. It is possible, and I even got it working on one mining rig. To the point: the login loop that happens with Nvidia drivers and the xorg system is caused by the OpenGL drivers and the integrated Intel GPU on the mobo. You can install the Nvidia drivers without OpenGL support and the problem goes away (you also need to blacklist the nouveau drivers when you do this; use: "sudo NVIDIA-Linux-x86_64-38x.xx.run --no-opengl-files"). Mind you, once OpenGL is installed you can't uninstall it, and you have to start over with a fresh install of Ubuntu. If you force the EDID.bin X11 option you can get the integrated graphics to run video and keep the Nvidia cards in a compute-only mode. That gets you an additional 50-60 Sol on GPU0 and eliminates the login loop issues that happen occasionally. It's working great for TensorFlow, but I still see some bugs with the miner… I'm not spending much time mining on this rig as it's now an AI neural net workstation, but once I have the time I want to convert it to a mining image I can use on the rigs, and I will update.
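For anyone who wants to try it, the nouveau blacklist plus the no-OpenGL driver install boils down to something like this (the blacklist file name and installer version here are placeholders, substitute your own download):

sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<EOF
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u
sudo reboot
# after the reboot, from a text console with X stopped:
sudo sh ./NVIDIA-Linux-x86_64-3xx.xx.run --no-opengl-files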

Hope this helps.


I have never been more frustrated with Linux than when trying to set up overclocking and fan speed control. I have never reached the point where I can control the fan speed of more than one card using coolbits. It has never worked for me… and I have spent around 80 hours just trying to get it up. Now I am running Windows… I have encountered every problem you can think of: login loops, xorg resets, xorg crashes, and the best part is, following the same process 5 different times I get 5 different results… I have started to believe Albert Einstein was wrong… you can do the same things over and over and expect different results when working with Linux :wink:

I have the opposite experience. Windows has been nothing but trouble, so all my rigs run Linux. Stock Ubuntu, to be specific, where I have never had Xorg problems or issues controlling individual card clocks/fans.


Linux God loves you … I am sure…

It took me 2 days to set up my own rig for Linux, quite challenging.

sudo apt install nvidia-375
sudo nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration --enable-all-gpus

reboot
start mining


I agree, Windows is nice for testing but not for a production rig.

You need auto start and auto restart unless you want to babysit rigs 24x7. They just need to tell you when they have a critical hardware failure like a fan… the rest they should do on their own.
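A minimal way to get auto start and auto restart (just as an illustration; the miner path is a placeholder) is a wrapper script launched at boot that relaunches the miner whenever it dies:

#!/bin/bash
# restart-miner.sh -- relaunch the miner whenever it exits
while true; do
    /path/to/your/miner    # replace with your actual miner command and arguments
    sleep 10               # short pause so a crashing miner doesn't spin in a tight loop
done

Start it at boot, for example with a cron entry like: @reboot /home/miner/restart-miner.sh >> /home/miner/miner.log 2>&1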

Case in point: the temperature fluctuation in my facility is different in summer than in winter. In summer the temps get higher at night because the AC doesn't run as much, so the rigs get hotter (it's the air flow and thermostat location). They automatically down-regulate each rig's power at night so that GPU temps don't go over 70C. During the day they up-regulate (when the AC is running more). In auto overclock mode the software finds the maximum overclock possible for each GPU that keeps the GPU lockup/freeze rate below once every 24 hours. Once the cycle is complete I can get up to 800 Sol/s out of a GTX 1080 Ti.
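The regulation software itself isn't shared here, but the basic idea can be sketched with nothing more than nvidia-smi: read each GPU's temperature and lower the power limit whenever it goes over the target. The 70C target matches the post above; the wattages are just example values:

# crude temperature-based power regulation sketch
TARGET=70
for i in $(nvidia-smi --query-gpu=index --format=csv,noheader); do
    TEMP=$(nvidia-smi -i $i --query-gpu=temperature.gpu --format=csv,noheader)
    if [ "$TEMP" -gt "$TARGET" ]; then
        sudo nvidia-smi -i $i -pl 180    # down-regulate (example wattage)
    else
        sudo nvidia-smi -i $i -pl 220    # up-regulate (example wattage)
    fi
done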

Software needs to detect every rig issue and restart smoothly 100% of the time. Try to do that in Windows.


Ohh, I absolutely agree, and that's the reason I tried so hard to put Linux on my rigs… my main motivation for Linux was to reboot using the magic key… But something or other doesn't work. The best point we reached was controlling the fan on 1 card, but we were never able to control the fans on 2 cards, no matter what we tried… We used 1080 Tis and an Asus Z270-F… we followed numerous guides online to make it work but nothing worked.

Control fan as in controlling each individual fan, or each individual graphics card?

Individual cards. My system has never worked beyond 1 card. As soon as I start controlling the second fan it stops working… nothing works after the command… or it gives an attribute error…

$ nvidia-settings -a "GPUTargetFanSpeed=50" -a "[fan:0]/GPUTargetFanSpeed=60" -a "[fan:2]/GPUTargetFanSpeed=40"  

  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:0]) assigned value 50.                           
  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:1]) assigned value 50.                           
  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:2]) assigned value 50.                           
  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:3]) assigned value 50.                           
  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:4]) assigned value 50.                           
  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:5]) assigned value 50.                           

  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:0]) assigned value 60.                           

  Attribute 'GPUTargetFanSpeed' (miner4:0[fan:2]) assigned value 40.                           

$ nvidia-smi 
Thu Sep 28 01:12:50 2017                       
+-----------------------------------------------------------------------------+                
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |                
|-------------------------------+----------------------+----------------------+                
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |                
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |                
|===============================+======================+======================|                
|   0  GeForce GTX 108...  Off  | 0000:01:00.0     Off |                  N/A |                
| 60%   71C    P2   225W / 220W |    624MiB / 11171MiB |    100%      Default |                
+-------------------------------+----------------------+----------------------+                
|   1  GeForce GTX 108...  Off  | 0000:02:00.0     Off |                  N/A |                
| 50%   62C    P2   217W / 220W |    583MiB / 11172MiB |     99%      Default |                
+-------------------------------+----------------------+----------------------+                
|   2  GeForce GTX 108...  Off  | 0000:03:00.0     Off |                  N/A |                
| 40%   60C    P2   220W / 220W |    583MiB / 11172MiB |     99%      Default |                
+-------------------------------+----------------------+----------------------+                
|   3  GeForce GTX 108...  Off  | 0000:05:00.0     Off |                  N/A |                
| 50%   47C    P2   218W / 220W |    583MiB / 11172MiB |     99%      Default |                
+-------------------------------+----------------------+----------------------+                
|   4  GeForce GTX 108...  Off  | 0000:06:00.0     Off |                  N/A |                
| 50%   67C    P2   221W / 220W |    583MiB / 11172MiB |     99%      Default |                
+-------------------------------+----------------------+----------------------+                
|   5  GeForce GTX 108...  Off  | 0000:07:00.0     Off |                  N/A |                
| 50%   64C    P2   219W / 220W |    583MiB / 11172MiB |     99%      Default |                
+-------------------------------+----------------------+----------------------+

Miner start script commands to set Nvidia fan speeds and overclock in Linux are below:

nvidia-settings -a [gpu:0]/GPUFanControlState=1
nvidia-settings -a [fan:0]/GPUTargetFanSpeed=100
nvidia-settings -a [gpu:0]/GPUGraphicsClockOffset[3]=200
nvidia-settings -a [gpu:0]/GPUMemoryTransferRateOffset[3]=600
nvidia-settings -a [gpu:1]/GPUFanControlState=1
nvidia-settings -a [fan:1]/GPUTargetFanSpeed=100
nvidia-settings -a [gpu:1]/GPUGraphicsClockOffset[3]=200
nvidia-settings -a [gpu:1]/GPUMemoryTransferRateOffset[3]=600

However, nothing will work if you do not have your xorg.conf set up properly. All your cards should show up in the Nvidia control panel (but it should also work fully headless). If you can't set the fans and overclock in the control panel then the commands above will not work either. If you can only mine on the card with a monitor plugged in then you don't have your xorg.conf set up right. You need to spoof a monitor via the edid.bin option in the Screen section:
Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"

Linux is all about getting X11 configured correctly for Nvidia, the rest is easy.
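For reference, the Screen section of a working xorg.conf with the spoofed monitor looks roughly like this (the identifiers are examples, and /etc/X11/edid.bin has to be an EDID dump from a real monitor):

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "UseDisplayDevice" "DFP-0"
    Option         "ConnectedMonitor" "DFP-0"
    Option         "CustomEDID" "DFP-0:/etc/X11/edid.bin"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection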

You seem to be very well informed in the world of programming and running in Linux. I understand just about everything I see you post. However, I still need help on the how a lot of the time. Seeing as you are currently prevented from sharing your code, is there any place you could point me to learn more? I have trouble finding any teaching instructions that fall into my range. It's either too simple with too much information, or too complex with things missing. Thanks

Set cool-bits to 28; that should enable core/mem overclocking, power changes, and fan speed changes for blower-style cards (28 = 4 for manual fan control + 8 for clock offsets + 16 for overvoltage). GPUs with more than one fan aren't controllable on Linux. EVGA ones should have internal fan-speed curves that adjust themselves.


Also, you can just add this to your ~/.bashrc file:

alias setup='nvidia-settings -a GPUFanControlState=1; nvidia-settings -a GPUTargetFanSpeed=100; nvidia-settings -a GpuPowerMizerMode=1'
    
# powlvlall [optional_power_level]
powlvlall() {
    # set power level
    if [ -n "$1" ]; then
        local POWER="$1"
    else
        local POWER="250"
    fi
    sudo nvidia-smi -pl $POWER
}

# oclockcount [zero_indexed_gpu_count] [optional_graphics] [optional_memory]
oclockcount() {
    if [ -n "$1" ]; then
        # set graphics
        if [ -n "$2" ]; then
            local GRAPHICSVAL="$2"
        else
            local GRAPHICSVAL="0"
        fi
        # set memory
        if [ -n "$3" ]; then
            local MEMORYVAL="$3"
        else
            local MEMORYVAL="0"
        fi
        # loop
        for i in $(seq 0 $1)
        do
            nvidia-settings -a [gpu:$i]/GPUGraphicsClockOffset[3]=$GRAPHICSVAL
            nvidia-settings -a [gpu:$i]/GPUMemoryTransferRateOffset[3]=$MEMORYVAL
        done
    fi
}

Then, re-source bashrc:

$ source ~/.bashrc

The following would overclock a 12-GPU Pascal system to core +60 / mem +120:

$ oclockcount 11 60 120 

As ZC93 said, you need your xorg config set properly.

You would do that with:

$ nvidia-xconfig --enable-all-gpus
$ nvidia-xconfig --cool-bits=28
$ nvidia-xconfig --allow-empty-initial-configuration
$ sudo reboot

After the reboot you should be able to use the above. If you're ssh'ing in you may need to export the display:

$ export DISPLAY=:0

You can turn that into a bash alias as well if you want to call it from the setup alias (easier).

I've found that a python loop that checks statuses/power level/temps/OC using nvidia-smi and calls these aliases with os.system() is pretty effective. Maybe I'll post my python script (for 1080 Tis) if ZC93 shares more about the compute mode setup.
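Until then, something similar can be thrown together in plain bash: poll nvidia-smi and flag (or restart) any GPU whose utilization drops, which usually means the miner has stalled. The 30-second interval and 10% threshold are arbitrary examples:

while true; do
    nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits | \
    while IFS=', ' read -r IDX UTIL; do
        if [ "$UTIL" -lt 10 ]; then
            echo "$(date) GPU $IDX at ${UTIL}% -- miner may have stalled"
            # restart the miner / re-apply clocks here
        fi
    done
    sleep 30
done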

cheers

You can string arguments together:

alias setup='nvidia-settings -a GpuPowerMizerMode=1 -a GPUFanControlState=1 -a GPUTargetFanSpeed=100'
$ nvidia-xconfig -a --cool-bits=28 --allow-empty-initial-configuration

and there is no need to reboot

$ sudo systemctl restart lightdm

Yep. @anon92308673 is correct. I just haven't updated my aliases and copy/pasted.

I wonder if 'sudo systemctl restart lightdm' will restart GPUs that fall off the bus?


That just restarts Xorg. PCI initialization seems like it would be the job of the BIOS or kernel.


Hey Guys,

While setting up a Linux miner over the last two days, I really struggled to get the machine "truly" headless while running a miner and overclocking / underpowering the rig. In the end (after several reinstalls) I just accepted that the rig will need X11 and that the GPUs have a little memory reserved for Xorg.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...   On  | 0000:01:00.0     Off |                  N/A |
| 63%   54C    P2    80W /  80W |    498MiB /  3012MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...   On  | 0000:02:00.0     Off |                  N/A |
| 65%   59C    P2    80W /  80W |    490MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 106...   On  | 0000:06:00.0     Off |                  N/A |
| 66%   62C    P2    77W /  80W |    490MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 106...   On  | 0000:09:00.0     Off |                  N/A |
| 63%   55C    P2    79W /  80W |    490MiB /  3013MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1021    G   /usr/lib/xorg/Xorg                              0MiB  |
|    0      2072    C   ./dstm/zm                                     479MiB  |
|    0      2101    G   /usr/lib/xorg/Xorg                             15MiB  |
|    1      1021    G   /usr/lib/xorg/Xorg                              0MiB  |
|    1      2072    C   ./dstm/zm                                     479MiB  |
|    1      2101    G   /usr/lib/xorg/Xorg                              7MiB  |
|    2      1021    G   /usr/lib/xorg/Xorg                              0MiB  |
|    2      2072    C   ./dstm/zm                                     479MiB  |
|    2      2101    G   /usr/lib/xorg/Xorg                              7MiB  |
|    3      1021    G   /usr/lib/xorg/Xorg                              0MiB  |
|    3      2072    C   ./dstm/zm                                     479MiB  |
|    3      2101    G   /usr/lib/xorg/Xorg                              7MiB  |
+-----------------------------------------------------------------------------+

Just for efficiency (and to clean up nvidia-smi) I would love to know how to use the iGPU and prevent the Nvidia GPUs from loading Xorg. But my Linux knowledge is at its limit here…
@ZC93: Would you share a little more detail, e.g. how to blacklist the nouveau drivers and how to force the EDID.bin X11 option? (What is EDID.bin?)

Try this pre-built mining pendrive OS: https://ba.net/zcash-eth-nvidia-mining-os/