Titan V gets near 1000 sol/s.

Btw, if I compile with cuda70, the miner crashes:

OOPS4: id 63.88 vx 7f ux 75 vs 7f
OOPS4: id 63.89 vx 7f ux 75 vs 7f
OOPS4: id 63.90 vx 7f ux 75 vs 7f
OOPS4: id 63.91 vx 7f ux 75 vs 7f
OOPS4: id 63.92 vx 7f ux 75 vs 7f
OOPS4: id 63.93 vx 7f ux 75 vs 7f
OOPS4: id 63.94 vx 7f ux 75 vs 7f
OOPS4: id 63.95 vx 7f ux 75 vs 7f
GPUassert: an illegal memory access was encountered mean_miner.cu 859

thx smartbitcoin!

That’s exactly the same speed as the 1080 Ti … kinda disappointing :(

So the 1 second barrier is yet to be broken…

Download Nvidia profile inspector and turn off CUDA - Force P2 state.

Same results here. Best nonce time is 994ms.

Just wondering: what is -DEDGEBITS for?

Changing the P2 state through the inspector has no effect on the Titan V. You can overclock it slightly, but you get no boost unless you’re gaming. Under Linux I was not even able to overclock, so the 994 ms result is for the default clock speed of 1335 MHz, which is about 64% power usage. With a small overclock (which I was only able to do under Windows) it goes up to 1575 MHz, which is a noticeable gain, but there is still no boost (which leads to lower voltage), and power usage is still low at around 75%.

I’ll try to compile mean_miner.cu under Windows. That way I should be able to go under 1 sec.

Power mod, +200/+200 MHz: see “2x Power Draw: GPU Shunt Shorting Mod (Titan V)” on YouTube. Maybe with modding it reaches 1100 sol/s.

The compiler flag -DEDGEBITS=29 defines the preprocessor symbol EDGEBITS as 29.
This is the number of bits in an edge index. So there are 2^29 edges, and twice as many nodes (2^30).
The GPU code is written specifically for 2^30-node graphs and is likely to break if you redefine this symbol.
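
To make that arithmetic concrete, here is a minimal Python sketch of how the graph sizes follow from the edge bits. The names `graph_sizes`, `num_edges` and `num_nodes` are made up for illustration and are not identifiers from the miner source.

```python
def graph_sizes(edge_bits):
    """Return (num_edges, num_nodes) for a given number of edge-index bits."""
    num_edges = 1 << edge_bits   # 2^edge_bits edges
    num_nodes = 2 * num_edges    # twice as many nodes as edges
    return num_edges, num_nodes

edges, nodes = graph_sizes(29)
print(edges, nodes)  # 536870912 1073741824
```

So EDGEBITS=29 gives the roughly one billion nodes that the cuckoo30 runs refer to.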

Not sure if this is related, but I tried F@H, and the Titan V and Titan X (Pascal) give exactly the same performance, similar to what you’re measuring against the 1080 Ti. The V is not boosting, but the X is. The V is using just 40% of TDP while the X is at 80%. The X is actually hitting its thermal limit, as it’s the top card of 3.

To me it seems the V has hidden potential, but the software is not optimized for it.

I don’t think it’s possible to optimize any software to reach >= 80% of TDP.
With Cuckoo Cycle, there is simply not that much to compute, as the siphash
computations (each using just 4 64-bit ints) are very cheap, and there’s a lot to sort.

Gamers Nexus shunt-shorted the GPU and got it to pull double the power.

Anyway, the Titan V has 4x4GB of RAM installed on the chip, and one of them is disabled (many say the processor is the same as in the server model).

If anyone ever finds a way to optimize performance on these cards, Voltas are going to be the new cards for mining. I’m already considering buying one: the ROI is longer than for a 1080 Ti, but the efficiency and actual potential are priceless. Hopefully the value of these cards will stay stable for a pretty long time, and finding a second-hand buyer should take no time (if all goes wrong).

While writing a GPU tuning guide, at

I realized I can already achieve sub second times (0.98 s) on a 1080 Ti by using 128 thread blocks. I’ve now made that the new default. You may want to try the Titan again and use the tuning guide to see if performance can be further improved.

@smartbitcoin what program are you using to mine Zcash with the Titan V? I can’t get DSTM or bminer to work :(

izhekov@threadripper:~/cuckoo/src$ ./cuda30 -r 10 -b 128
TITAN V with 11GB @ 3072 bits x 850MHz
Looking for 42-cycle on cuckoo30("",0-9) with 50% edges, 128*128 buckets, 240 trims, and 128 thread blocks.
Using 2680MB bucket memory and 21MB memory per thread block (5392MB total)
nonce 0 k0 k1 k2 k3 a34c6a2bdaa03a14 d736650ae53eee9e 9a22f05e3bffed5e b8d55478fa3a606d
4-cycle found
2-cycle found
16-cycle found
26-cycle found
40-cycle found
findcycles completed on 38906 edges
Time: 904 ms
nonce 1 k0 k1 k2 k3 be6c0ae25622e409 ede28d78411671d4 74ffaa51c7aa70ac 2ab552193088c87a
4-cycle found
282-cycle found
1006-cycle found
390-cycle found
findcycles completed on 33234 edges
Time: 896 ms
nonce 2 k0 k1 k2 k3 543485efe8555e24 2d56d531df445967 821a4361f6f57e4 2eefbb55f1490553
210-cycle found
996-cycle found
findcycles completed on 36891 edges
Time: 885 ms
nonce 3 k0 k1 k2 k3 5d869ae27494696c cbd3d38a013269 ba7c8d12fef80ffc 955b4761ba671c90
10-cycle found
124-cycle found
findcycles completed on 34629 edges
Time: 888 ms
nonce 4 k0 k1 k2 k3 8f593cc09c669cb9 8f050519b946b0a 51e5183ef55c246f e2c928f1b6fcd8c0
2-cycle found
findcycles completed on 35331 edges
Time: 885 ms
nonce 5 k0 k1 k2 k3 799e44a827e86345 2fb8339b6210d9ae bc1499774d8b80d5 672972b5c29e3401
2-cycle found
28-cycle found
604-cycle found
142-cycle found
findcycles completed on 31370 edges
Time: 889 ms
nonce 6 k0 k1 k2 k3 848726caa726cb62 99ef1cb697424e37 7a588cccbdf97a19 d2322df1b4982e77
24-cycle found
58-cycle found
30-cycle found
142-cycle found
findcycles completed on 35652 edges
Time: 886 ms
nonce 7 k0 k1 k2 k3 e1a9db0a3c0febde 5217c17370e90996 876375a3b56f19cc 288821230e8bc959
4-cycle found
6-cycle found
304-cycle found
findcycles completed on 35921 edges
Time: 882 ms
nonce 8 k0 k1 k2 k3 e887f5b75a21ee32 93f2db86af87b9ce c4effacacf573e88 5a41208f467f8e41
14-cycle found
40-cycle found
findcycles completed on 31755 edges
Time: 884 ms
nonce 9 k0 k1 k2 k3 5952591849c8c5f7 fdf6b4aa3335fad5 fb11814cb76c01b4 c7405dbe8c2433a3
2-cycle found
58-cycle found
144-cycle found
findcycles completed on 34965 edges
Time: 888 ms
0 total solutions
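
For anyone who wants to summarize a run like this without eyeballing it, here is a rough Python sketch that pulls the `Time: N ms` lines out of the output and reports the best and average time. It assumes only the line format shown in the log above; the function name `solve_times` is made up for illustration.

```python
import re

def solve_times(log_text):
    """Extract per-nonce solve times in ms from lines like 'Time: 904 ms'."""
    matches = re.findall(r"^Time: (\d+) ms$", log_text, flags=re.MULTILINE)
    return [int(m) for m in matches]

# The ten times reported in the run above, rebuilt as log lines.
reported = (904, 896, 885, 888, 885, 889, 886, 882, 884, 888)
sample = "\n".join(f"Time: {t} ms" for t in reported)

times = solve_times(sample)
best = min(times)
mean = sum(times) / len(times)
print(f"best {best} ms, mean {mean:.1f} ms over {len(times)} nonces")
# best 882 ms, mean 888.7 ms over 10 nonces
```

In practice you would pass the captured miner output to `solve_times` instead of the rebuilt sample.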

Btw, I failed to compile it for Windows. There were too many Linux dependencies that I was unable to resolve.

Thanks for the results. Your 882 ms is the official world record time for solving a billion node Cuckoo Cycle. The previous record was 964 ms by an NVIDIA 1080 Ti.
So the Titan is 9.3% faster. Not quite enough to justify the extra cost, but
a record nonetheless!
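
For reference, the 9.3% figure is a throughput comparison. A small Python sketch of the arithmetic, with `percent_faster` being a name made up for illustration:

```python
def percent_faster(old_ms, new_ms):
    """Throughput gain in percent when time per graph drops from old_ms to new_ms."""
    return (old_ms / new_ms - 1.0) * 100.0

def percent_time_saved(old_ms, new_ms):
    """Reduction in time per graph, in percent of the old time."""
    return (old_ms - new_ms) / old_ms * 100.0

print(f"{percent_faster(964, 882):.1f}% faster")          # 9.3% faster
print(f"{percent_time_saved(964, 882):.1f}% less time")   # 8.5% less time
print(f"{1000 / 882:.2f} graphs per second")              # 1.13 graphs per second
```

The two percentages differ because one is relative to the new time and the other to the old time.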

Actually… after a couple of more attempts…

izhekov@threadripper:~/cuckoo/src$ ./cuda30 -r 10 -T 64 -V 384 -Y 1024
TITAN V with 11GB @ 3072 bits x 850MHz
Looking for 42-cycle on cuckoo30("",0-9) with 50% edges, 128*128 buckets, 240 trims, and 128 thread blocks.
Using 2680MB bucket memory and 21MB memory per thread block (5392MB total)
nonce 0 k0 k1 k2 k3 a34c6a2bdaa03a14 d736650ae53eee9e 9a22f05e3bffed5e b8d55478fa3a606d
4-cycle found
2-cycle found
findcycles completed on 6 edges
Time: 885 ms
nonce 1 k0 k1 k2 k3 be6c0ae25622e409 ede28d78411671d4 74ffaa51c7aa70ac 2ab552193088c87a
4-cycle found
findcycles completed on 4 edges
Time: 856 ms
nonce 2 k0 k1 k2 k3 543485efe8555e24 2d56d531df445967 821a4361f6f57e4 2eefbb55f1490553
findcycles completed on 0 edges
Time: 861 ms
nonce 3 k0 k1 k2 k3 5d869ae27494696c cbd3d38a013269 ba7c8d12fef80ffc 955b4761ba671c90
10-cycle found
findcycles completed on 46 edges
Time: 857 ms
nonce 4 k0 k1 k2 k3 8f593cc09c669cb9 8f050519b946b0a 51e5183ef55c246f e2c928f1b6fcd8c0
2-cycle found
findcycles completed on 114 edges
Time: 854 ms
nonce 5 k0 k1 k2 k3 799e44a827e86345 2fb8339b6210d9ae bc1499774d8b80d5 672972b5c29e3401
2-cycle found
28-cycle found
findcycles completed on 78 edges
Time: 858 ms
nonce 6 k0 k1 k2 k3 848726caa726cb62 99ef1cb697424e37 7a588cccbdf97a19 d2322df1b4982e77
24-cycle found
findcycles completed on 24 edges
Time: 855 ms
nonce 7 k0 k1 k2 k3 e1a9db0a3c0febde 5217c17370e90996 876375a3b56f19cc 288821230e8bc959
4-cycle found
6-cycle found
findcycles completed on 10 edges
Time: 858 ms
nonce 8 k0 k1 k2 k3 e887f5b75a21ee32 93f2db86af87b9ce c4effacacf573e88 5a41208f467f8e41
14-cycle found
findcycles completed on 32 edges
Time: 855 ms
nonce 9 k0 k1 k2 k3 5952591849c8c5f7 fdf6b4aa3335fad5 fb11814cb76c01b4 c7405dbe8c2433a3
2-cycle found
findcycles completed on 32 edges
Time: 859 ms
0 total solutions

I ordered 2x Titan V and will order two more next week. I’ve decided to rebuild my rig.

Those latest times are no good, I’m afraid. You can see that you’re finding far fewer cycles than before. I just fixed some bugs in my code related to thread syncing. If you try the latest code, you should see runtimes going back to those of your earlier runs.
Sorry about that…
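
One way to spot this from the output alone is to count the `N-cycle found` lines for the same nonce across runs. A rough Python sketch, assuming only the line format shown in the logs above; `cycles_found` is a name made up for illustration, not part of the miner:

```python
import re

def cycles_found(log_text):
    """Count lines of the form '<length>-cycle found' in miner output."""
    return len(re.findall(r"^\d+-cycle found$", log_text, flags=re.MULTILINE))

# Nonce 0 from the earlier run and from the later run, copied from the logs above.
earlier = """4-cycle found
2-cycle found
16-cycle found
26-cycle found
40-cycle found
findcycles completed on 38906 edges
Time: 904 ms"""

later = """4-cycle found
2-cycle found
findcycles completed on 6 edges
Time: 885 ms"""

print(cycles_found(earlier), cycles_found(later))  # 5 2
```

The same nonce and keys should give the same graph, so a drop in the count between runs suggests that something other than speed has changed.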

EWBF miner. The newest version that I know of is 0.3.4b.

Ahh, gotcha. Bummer, I was hoping you would say DSTM or bminer or something else. I’m getting 30-40 SOLs more with GTX 1060s and 50 SOLs more with GTX 1080s on DSTM/bminer versus EWBF. Before you responded, after DSTM and bminer both failed, I tried EWBF and it worked right away. My guess, however, is that if DSTM or bminer were streamlined for the Titan, they would give close to 100 SOLs more than EWBF.

80% power, core clock +120, memory not overclocked.

It’s very impressive. I’m ordering more.

I tested the +120 MHz overclock and it crashed after 2 hours, so I changed to +110 MHz at 80% power. Result: 1020/1040 sol/s. The final stable setting is +100 core clock, at 1010/1025 sol/s.
