Tromp's solvers

dear ivianoo,

The repository has installation guides for Linux, Mac OS X, and Windows.

-John

1 Like

I think I contacted all contributors of >= 0.05 BTC.
Anyone else who would like a full or partial refund, please send me a private message.

I ran a few benchmarks on an Intel(R) Core™ i7-3770 CPU @ 3.40GHz running Debian. It’s a 4-core processor with hyperthreading, accessing 4 sticks of 4GB 1600MHz DDR3 memory. Three benchmarks were run on the native machine and two in an Ubuntu VM; three used the multithreaded solver and two used multiple instances of the singlethreaded solver. Solver parameters were -n 1000 -r 200, producing 390 solutions.

  • There is no benefit from going from 4 to 5 processors (engaging hyperthreading); in the case of the equi solver it actually produces worse results. Increasing the number of processors further has a larger effect on the multithreaded solver but little on the singlethreaded one.
  • There is a large difference between VM and native solving, making native solving much preferred. On the same machine, benchmarks using zcash-cli zcbenchmark solveequihash 30 produced very similar results natively and inside the VM, around 0.17 H/s, and tests using zcash-miner connected to the Suprnova pool confirmed this. With this solver, using just 2 native processes is faster than running 4 cores inside the VM (and actually using 4 cores on the host). Running 4 cores natively outperforms running 4 cores in the VM by a factor of 1.8.
  • In all cases, running N instances of the singlethreaded solver is faster than running N threads of the multithreaded solver.
  • If this solver makes its way into a standalone miner, I’ll use 4 singlethreaded solvers. This still leaves a bit of processing power for other operations and uses half the memory compared to using 8 processes, for which the 8% gain is almost negligible. I suspect that any high-performance solver will show the same behaviour.
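For comparing numbers like these across posts, the raw totals convert to Sol/s by dividing solutions by wall time. A tiny helper (my own, not part of the solver; the elapsed time below is hypothetical, only the 390-solutions-over-200-nonces figure comes from the benchmark above):

```python
def sols_per_second(solutions, elapsed_s):
    """Convert a benchmark run's totals to a Sol/s rate."""
    return solutions / elapsed_s

# The parameters above (-n 1000 -r 200) yield 390 solutions over
# 200 nonces, i.e. an average of 1.95 solutions per nonce:
print(390 / 200)                    # 1.95
# A hypothetical run taking 100 s would then rate:
print(sols_per_second(390, 100.0))  # 3.9
```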
4 Likes

@JanKalin Are you able to integrate the equihash algo into the zcash daemon code?

If yes, please let us know the changes. I am currently working on this integration and it will save us a lot of time.

Sadly, no. Even though I have done a lot of programming, the crypto world is a completely new field to me and it would take me far too long to get familiar with it. But @str4d has announced that he will be working over the weekend towards integrating this with his standalone miner - with the caveat that he might have other work, so we shouldn't expect too much :slight_smile:

Make that “I have had other work, so it’s not getting done by me this weekend, so someone else should have a go.”

1 Like

I just committed the combined version, obsoleting the former “faster*” executables. The new “equi*” are almost as fast, and I have a few more micro-optimizations to make…

I changed the name of the thread since it is no longer about the crowdfund or Cuckoo Cycle. Just about my solvers…

When I run eqcuda I get this:

$ time ./eqcuda
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.000 seconds.
0 solutions
0 total solutions

Why isn’t it “really” computing?

@dcale try running it multiple times
or
try a different nonce with -n

./eqcuda -n 56

We will never really know its performance unless it is ported and used on beta2.

Still not working… But I have an old NVIDIA 660, maybe that’s the issue?

Try ./eqcuda -n 1000 -r 100. This should produce 188 solutions.

it “produces” them instantly… which commit are you using? I’m on 3cc4e5ae30343fd7d.

I (finally) tried changing the number of buckets used in my solver, and seem to get more than 50% speedup with 2^12 buckets instead of the former 2^16. Still cleaning up the code; should be committed soon…
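For intuition about why the bucket count matters (this is a toy illustration I wrote, not Tromp's actual code): Wagner-style Equihash solvers group hash outputs by a prefix of the current digit so that collisions are only searched within a bucket, and the number of buckets trades bucket size against bookkeeping and locality. A minimal Python sketch of the bucketing step, with invented names and hand-picked 20-bit digits:

```python
from collections import defaultdict

BUCKET_BITS = 12          # 2^12 buckets, as in the change described above
NUM_BUCKETS = 1 << BUCKET_BITS

def bucket_collisions(items, digit_bits=20):
    """Group (index, digit) pairs by the top BUCKET_BITS bits of the
    digit, then report index pairs colliding on the full digit value
    within each bucket. Toy version: real solvers collide on partial
    digits round by round and carry XORed state forward."""
    buckets = defaultdict(list)
    for idx, digit in items:
        buckets[digit >> (digit_bits - BUCKET_BITS)].append((idx, digit))
    pairs = []
    for bucket in buckets.values():
        seen = defaultdict(list)
        for idx, digit in bucket:
            for other in seen[digit]:
                pairs.append((other, idx))
            seen[digit].append(idx)
    return pairs

# Indices 0, 1 and 3 share a digit, so they collide pairwise:
items = [(0, 0x12345), (1, 0x12345), (2, 0xABCDE), (3, 0x12345)]
print(bucket_collisions(items))  # [(0, 1), (0, 3), (1, 3)]
```

With fewer, larger buckets (2^12 instead of 2^16) each bucket holds more items, which can pay off in cache behaviour and per-bucket overhead, consistent with the speedup reported above.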

And done…

Next on to-do list: port recent improvements over to CUDA…

7 Likes

Here are the updated benchmarks


and the table

+--------+-----------+-----------+----------+----------+----------+---------+
|   n    | e1_nat_MP | f1_nat_MP | f_nat_MT | e_nat_MT | f1_VM_MP | f_VM_MT |
+--------+-----------+-----------+----------+----------+----------+---------+
|   1.00 |     2.99  |     2.51  |    1.98  |    1.71  |    1.39  |    1.06 |
|   2.00 |     5.77  |     4.63  |    3.73  |    3.22  |    2.48  |    2.04 |
|   3.00 |     8.25  |     6.41  |    5.24  |    4.53  |    3.48  |    2.89 |
|   4.00 |    10.11  |     7.76  |    6.21  |    5.32  |    4.27  |    3.63 |
|   5.00 |    10.04  |     7.77  |    6.24  |    5.13  |     nan  |     nan |
|   6.00 |    10.62  |     8.05  |    7.19  |    5.77  |     nan  |     nan |
|   7.00 |    10.83  |     8.30  |    8.01  |    6.22  |     nan  |     nan |
|   8.00 |    10.76  |     8.45  |    8.27  |    6.35  |     nan  |     nan |
+--------+-----------+-----------+----------+----------+----------+---------+

The latest solver is e1_nat_MP; the others, and the test machine, have been described a few posts back.
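The hyperthreading observation can be quantified as parallel efficiency: throughput at N processes divided by N times the single-process throughput. A quick check against the e1_nat_MP column of the table above (helper name is mine):

```python
def efficiency(rate_n, rate_1, n):
    """Parallel efficiency: measured throughput vs ideal linear scaling."""
    return rate_n / (n * rate_1)

# e1_nat_MP figures from the table: 1, 4 and 8 processes.
e1_nat_mp = {1: 2.99, 4: 10.11, 8: 10.76}

print(round(efficiency(e1_nat_mp[4], e1_nat_mp[1], 4), 2))  # 0.85
print(round(efficiency(e1_nat_mp[8], e1_nat_mp[1], 8), 2))  # 0.45
```

So scaling is about 85% efficient up to the 4 physical cores and collapses to roughly 45% at 8 processes, matching the conclusion that hyperthreading adds almost nothing here.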

2 Likes

I benched my code with callgrind and it shows 33% of the time being spent on blake2b hash computations. I wish I could just plug in xenoncat’s asm-optimized code for that :-)
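For anyone wanting to sanity-check a solver's hashing stage against a reference: Python's hashlib exposes BLAKE2b with the personalization and digest size that Zcash's Equihash (n=200, k=9) uses, namely "ZcashPoW" plus little-endian n and k, with 50-byte outputs. This reflects the Zcash parameters as I understand them (the helper name and the exact input packing below are my own simplification; verify against zcashd before relying on it):

```python
import hashlib
import struct

N, K = 200, 9
# Personalization: "ZcashPoW" || le32(n) || le32(k) = 16 bytes,
# the maximum BLAKE2b allows.
PERSON = b"ZcashPoW" + struct.pack("<II", N, K)
# Each BLAKE2b call packs 512 // N = 2 Equihash inputs, so the digest
# is (512 // N) * N / 8 = 50 bytes for n=200.
DIGEST_SIZE = (512 // N) * N // 8

def equihash_hash(header, index):
    """BLAKE2b over header plus a little-endian 32-bit index
    (simplified input layout for illustration)."""
    h = hashlib.blake2b(digest_size=DIGEST_SIZE, person=PERSON)
    h.update(header)
    h.update(struct.pack("<I", index))
    return h.digest()

print(len(equihash_hash(b"", 0)))  # 50
```

Given that a third of the runtime is in this hash, an asm-optimized BLAKE2b like xenoncat's would bound the possible speedup at roughly that 33%.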

3 Likes

It’s really great news, gentlemen. Mr. Tromp, if I understand correctly, your solver utilizes the CUDA subsystem of NVIDIA cards?

If anybody has modern NVIDIA GPU cards, could you please share your results in Sol/s using Tromp’s current solver?

Yep, I’m also interested. If someone could share a binary or tell me how I can test that…, I could provide a speed test on my own GTX 780.