Tromp's solvers


#78

Compared to xenoncat, whose methods are described in

https://github.com/xenoncat/equihash-xenon/blob/master/notes/algorithm%20description.pdf

my solver differs in having way more buckets, wasting some memory, having simpler pair compression, being multi-threaded, and supporting (144,5).

And of course in not using any assembly.

Oh, and having some cool visualization of bucket size distribution...


#79

I would prefer not to be refunded and still have access to your code. Your code is in C, so I can tinker with it. I don't know assembly. The CUDA support is also valuable.


#80

Once the xenoncat performance claims are confirmed, as I expect they will be, I'll offer full (or partial, if you'd like to support Cuckoo Cycle) refunds and open source my solvers anyway.


#81

Wow, you are an absolute legend. Thank you!


#82

Also thanks to xenoncat, whoever he or she may be (my hat off to you) ...


#83

Performance of xenoncat is confirmed, but due to an API mismatch, the correctness of the solutions found is not confirmed yet. No doubt that will happen soon, and I'm already preparing my commits...


#84

OK; I've decided to bite the bullet. Full source is available at

https://github.com/tromp/equihash

Just run

git clone git@github.com:tromp/equihash.git

cd equihash

make all

and enjoy. I will be contacting contributors in decreasing order of donation, and asking how they want to be refunded....

Whoever sent 1 BTC to OgNasty, please let him know how to handle your refund.


#85

Awesome! You rock, Mr. Tromp.


#86

What kind of dependencies are required? Haven't tried to compile yet.


#87

You have open sourced your work -- Thank you! -- please keep my small donation.


#88

No dependencies, I think?! Let me know if you find otherwise...


#89

git clone https://github.com/tromp/equihash

works for me.

I get these results:

3 solutions
3 total solutions
1.75user 0.10system 0:01.86elapsed 99%CPU (0avgtext+0avgdata 216080maxresident)k
0inputs+0outputs (0major+7298minor)pagefaults 0swaps

#90

Which CPU are you using?


#91

Intel Core i7-4790K @ 4.00 GHz


#92

Hey, I thought you were going to open source it! But from what I see, it is proprietary software that nobody else has the right to use or redistribute without prior permission from the author. :wink:

If you want a suggestion, you could add something like this:

Copyright 2016 John Tromp
You may use this package under the MIT Licence. You may use this package under the Transitive Grace Period Public Licence, version 1.0, or at your option, any later version. (You may choose to use this package under the terms of either licence, at your option.) See the file COPYING.MIT for the terms of the MIT Licence. See the file COPYING.TGPPL for the terms of the Transitive Grace Period Public Licence, version 1.0. See TGPPL.PDF for why the TGPPL exists, graphically illustrated on three slides.


#93

I am getting core dumps on 64-bit Ubuntu in VirtualBox on a 64-bit Debian host:

jank@ubuntu-modeli:~/equihash$ make all
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DATOMIC equi_miner.cpp blake/blake2b.cpp -o equi
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DSPARK equi_miner.cpp blake/blake2b.cpp -o equi1
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DJOINHT -DATOMIC equi_miner.cpp blake/blake2b.cpp -o faster
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DJOINHT equi_miner.cpp blake/blake2b.cpp -o faster1
g++ -g equi.c blake/blake2b.cpp -o verify
time ./equi -h "" -n 0 -t 1 -s | grep ^Sol | ./verify -h "" -n 0
Verifying size 512 proof for equi("",0)
Command terminated by signal 4
0.00user 0.00system 0:00.20elapsed 0%CPU (0avgtext+0avgdata 2760maxresident)k
0inputs+0outputs (0major+124minor)pagefaults 0swaps
time ./equi1
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Command terminated by signal 4
0.00user 0.00system 0:00.20elapsed 0%CPU (0avgtext+0avgdata 2692maxresident)k
0inputs+0outputs (0major+120minor)pagefaults 0swaps
Makefile:47: recipe for target 'spark' failed
make: *** [spark] Error 132
jank@ubuntu-modeli:~/equihash$ ./equi
WARNING: use of atomics hurts single threaded performance!
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ ./equi1
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ ./faster
WARNING: use of atomics hurts single threaded performance!
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ cat /proc/version
Linux version 4.4.0-42-generic (buildd@lgw01-13) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #62-Ubuntu SMP Fri Oct 7 23:11:45 UTC 2016
jank@ubuntu-modeli:~/equihash$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
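
Signal 4 is SIGILL, and make's Error 132 is 128 + 4, so every failure in the log above is an illegal instruction. One possible cause, offered here as an assumption rather than a confirmed diagnosis, is that the VirtualBox guest does not expose the AES or AVX instructions that the -maes and -mavx flags compile in. A minimal Python sketch for checking this on a Linux guest follows; the function name missing_cpu_flags is hypothetical, and only /proc/cpuinfo is consulted.

```python
def missing_cpu_flags(cpuinfo_text, required=("aes", "avx")):
    """Return the required CPU flags absent from /proc/cpuinfo-style text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux each logical CPU has a line like "flags : fpu vme ... avx".
        if line.startswith("flags"):
            present.update(line.partition(":")[2].split())
    # If no "flags" line is found, every required flag is reported missing.
    return sorted(flag for flag in required if flag not in present)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            missing = missing_cpu_flags(f.read())
    except OSError:
        print("/proc/cpuinfo not readable; cannot check CPU flags")
    else:
        print("missing flags:", ", ".join(missing) if missing else "none")
```

If aes or avx shows up as missing inside the guest but not on the host, the guest's CPU feature settings would be the first thing to look at.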

#94

Thank you, @tromp! Testing this on our "super" box, which you also have an account on and can use for testing now that the code is (almost) open source (need a license, as Zooko pointed out), the eqcuda and feqcuda sometimes fail to find solutions (and take multiple seconds to complete in that case). For example, the first time I ran them, they reported 0 solutions. Trying other nonce values, I got them to non-zero solutions, and then trying nonce 0 again finally gave the expected 3 solutions. Retrying after some other tests - and it's 0 solutions again. You probably have an uninitialized variable somewhere.

Failing run:

$ time ./eqcuda -n 0
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 3.900 seconds.
0 solutions
0 total solutions

real    0m5.344s
user    0m2.875s
sys     0m2.281s

Working run:

$ time ./eqcuda
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.096 seconds.
3 solutions
3 total solutions

real    0m1.532s
user    0m0.081s
sys     0m1.265s

0.096 would suggest 1.88/0.096 = 19.6 Sol/s, right? Per nvidia-smi, this runs on a Maxwell Titan X. The box also has an old Kepler Titan, but you don't seem to have included an option to choose the CUDA device.

I also tried CPU runs. Works great on i7-4770K, but the scaling to 32 threads on 2x E5-2670 in this "super" box is poor - perhaps running some independent instances with fewer threads each (maybe just 1 thread/instance) would be faster (but would eat up more RAM, which is fine at least for testing - got 128 GB here). Feel free to experiment with this, too.
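
The independent-instances idea could be tried with a small launcher. The Python sketch below is only an illustration: it assumes the ./equi binary built by make all and reuses the -n (nonce) and -t (threads) options seen earlier in this thread. The helper names instance_commands and run_instances are hypothetical.

```python
import subprocess


def instance_commands(binary, instances, threads_per_instance=1):
    """Build one argument list per instance, each with its own nonce."""
    return [
        [binary, "-n", str(nonce), "-t", str(threads_per_instance)]
        for nonce in range(instances)
    ]


def run_instances(commands):
    """Start all instances at once, then wait and return their exit codes."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Only print the commands here; call run_instances(...) to actually launch them.
    for cmd in instance_commands("./equi", 4):
        print(" ".join(cmd))
```

Timing run_instances(instance_commands("./equi", 32)) against a single 32-thread run would show whether separate processes scale better on the dual-socket box. Each instance allocates its own memory, as noted above.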

Edit: "-t 12288" (upping CUDA thread count in accordance with the difference between GTX 980 and GTX Titan X) somehow makes the speed slightly worse for eqcuda, but improves it for feqcuda, which now gets (also not all the time, but when it's lucky):

$ time ./feqcuda -t 12288
Looking for wagner-tree on ("",0) with 10 20-bits digits and 12288 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.076 seconds.
3 solutions
3 total solutions

real    0m1.524s
user    0m0.070s
sys     0m1.328s

This is apparently 1.88/0.076 = 24.7 Sol/s.
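
The Sol/s arithmetic in this post can be reproduced with a few lines of Python. This is a sketch under one assumption: that 1.88 is the average number of solutions per run, which is how the figure is used above. The function name sols_per_second is hypothetical.

```python
def sols_per_second(seconds_per_run, avg_solutions_per_run=1.88):
    """Estimate Sol/s from the solver's reported time for one run."""
    if seconds_per_run <= 0:
        raise ValueError("seconds_per_run must be positive")
    return avg_solutions_per_run / seconds_per_run


if __name__ == "__main__":
    # Times are the "rounds completed in ..." figures quoted in the post.
    for label, seconds in (("eqcuda", 0.096), ("feqcuda -t 12288", 0.076)):
        print(f"{label}: {sols_per_second(seconds):.1f} Sol/s")
```

Note that these figures use the solver's own reported round time, not the roughly 1.5 s wall-clock time reported by time, which also includes process and CUDA startup.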


#95

MIT LICENSE added...


#96

Thank you! Looks like blake2b.cu is third-party code (right?) - are you sure its author is OK with the code being placed under the MIT license? Was it already released under an MIT-compatible license?

// Blake2-B CUDA Implementation
// tpruvot@github July 2016

#97

There is still a bug in faster[1] with the -r option that I'll try to iron out soon.