Tromp's solvers

dear ivianoo,

The repository has installation guides for Linux, Mac OS X, and Windows.

-John

1 Like

I think I contacted all contributors of >= 0.05 BTC.
Anyone else who would like a full or partial refund, please send me a private message.

I ran a few benchmarks on an Intel(R) Core™ i7-3770 CPU @ 3.40GHz running Debian. It’s a 4-core processor with hyperthreading, accessing 4 sticks of 4GB 1600MHz DDR3 memory. Three benchmarks were run on the native machine and two in an Ubuntu VM; three used the multithreaded solver and two used multiple instances of the singlethreaded solver. Solver parameters were -n 1000 -r 200, producing 390 solutions.

  • There is no benefit from going from 4 to 5 processors (engaging hyperthreading); in the case of the equi solver it actually produces worse results. Increasing the number of processors further has a larger effect on the multithreaded solver but little on the singlethreaded one.
  • There is a large difference between VM and native solving, making native solving much preferred. On the same machine, benchmarks using zcash-cli zcbenchmark solveequihash 30 produced very similar results natively and inside the VM, around 0.17 H/s, and tests using zcash-miner connected to the Suprnova pool confirmed this. With this solver, using just 2 native processes is faster than running 4 cores inside the VM (and actually using 4 cores on the host). Running 4 cores natively outperforms running 4 cores in the VM by a factor of 1.8.
  • In all cases, running N instances of the singlethreaded solver is faster than running N threads of the multithreaded solver.
  • If this solver makes its way into a standalone miner, I’ll use 4 singlethreaded solvers. This still leaves a bit of processing power for other operations and uses half the memory compared to using 8 processes, for which the 8% gain is almost negligible. I suspect that any high-performance solver will show the same behaviour.
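For comparing numbers like these across posts, the raw totals convert to Sol/s by dividing solutions by wall time. A tiny helper (my own, not part of the solver; the elapsed time below is hypothetical, only the 390-solutions-over-200-nonces figure comes from the benchmark above):

```python
def sols_per_second(solutions, elapsed_s):
    """Convert a benchmark run's totals to a Sol/s rate."""
    return solutions / elapsed_s

# The parameters above (-n 1000 -r 200) yield 390 solutions over
# 200 nonces, i.e. an average of 1.95 solutions per nonce:
print(390 / 200)                    # 1.95
# A hypothetical run taking 100 s would then rate:
print(sols_per_second(390, 100.0))  # 3.9
```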
4 Likes

@JanKalin Are you able to integrate the equihash algo into the zcash daemon code?

If yes, please let us know the changes. I am currently working on this integration and it will save us a lot of time.

Sadly, no. Even though I have done a lot of programming, the crypto world is a completely new field to me and it would take me far too long to get familiar with it. But @str4d has announced that he will be working over the weekend towards integrating this with his standalone miner - with the caveat that he might have other work, so we shouldn't expect too much :slight_smile:

Make that “I have had other work, so it’s not getting done by me this weekend, so someone else should have a go.”

1 Like

I just committed the combined version, obsoleting the former “faster*” executables. The new “equi*” are almost as fast, and I have a few more micro-optimizations to make…

I changed the name of the thread since it is no longer about the crowdfund or Cuckoo Cycle. Just about my solvers…

When I run eqcuda I get this:

$ time ./eqcuda
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.000 seconds.
0 solutions
0 total solutions

Why isn’t it “really” computing?

@dcale try running it multiple times
or
try a different nonce with -n

./eqcuda -n 56

We will never really know its performance unless it is ported and used on beta2.

Still not working… But I have an old NVIDIA 660, maybe that’s the issue?

Try ./eqcuda -n 1000 -r 100. This should produce 188 solutions.

it “produces” them instantly… which commit are you using? I’m on 3cc4e5ae30343fd7d.

I (finally) tried changing the number of buckets used in my solver, and seem to get more than 50% speedup with 2^12 buckets instead of the former 2^16. Still cleaning up the code; should be committed soon…
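For intuition about why the bucket count matters (this is a toy illustration I wrote, not Tromp's actual code): Wagner-style Equihash solvers group hash outputs by a prefix of the current digit so that collisions are only searched within a bucket, and the number of buckets trades bucket size against bookkeeping and locality. A minimal Python sketch of the bucketing step, with invented names and hand-picked 20-bit digits:

```python
from collections import defaultdict

BUCKET_BITS = 12          # 2^12 buckets, as in the change described above
NUM_BUCKETS = 1 << BUCKET_BITS

def bucket_collisions(items, digit_bits=20):
    """Group (index, digit) pairs by the top BUCKET_BITS bits of the
    digit, then report index pairs colliding on the full digit value
    within each bucket. Toy version: real solvers collide on partial
    digits round by round and carry XORed state forward."""
    buckets = defaultdict(list)
    for idx, digit in items:
        buckets[digit >> (digit_bits - BUCKET_BITS)].append((idx, digit))
    pairs = []
    for bucket in buckets.values():
        seen = defaultdict(list)
        for idx, digit in bucket:
            for other in seen[digit]:
                pairs.append((other, idx))
            seen[digit].append(idx)
    return pairs

# Indices 0, 1 and 3 share a digit, so they collide pairwise:
items = [(0, 0x12345), (1, 0x12345), (2, 0xABCDE), (3, 0x12345)]
print(bucket_collisions(items))  # [(0, 1), (0, 3), (1, 3)]
```

With fewer, larger buckets (2^12 instead of 2^16) each bucket holds more items, which can pay off in cache behaviour and per-bucket overhead, consistent with the speedup reported above.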

And done…

Next on to-do list: port recent improvements over to CUDA…

7 Likes

Here are the updated benchmarks


and the table

+--------+-----------+-----------+----------+----------+----------+---------+
|   n    | e1_nat_MP | f1_nat_MP | f_nat_MT | e_nat_MT | f1_VM_MP | f_VM_MT |
+--------+-----------+-----------+----------+----------+----------+---------+
|   1.00 |     2.99  |     2.51  |    1.98  |    1.71  |    1.39  |    1.06 |
|   2.00 |     5.77  |     4.63  |    3.73  |    3.22  |    2.48  |    2.04 |
|   3.00 |     8.25  |     6.41  |    5.24  |    4.53  |    3.48  |    2.89 |
|   4.00 |    10.11  |     7.76  |    6.21  |    5.32  |    4.27  |    3.63 |
|   5.00 |    10.04  |     7.77  |    6.24  |    5.13  |     nan  |     nan |
|   6.00 |    10.62  |     8.05  |    7.19  |    5.77  |     nan  |     nan |
|   7.00 |    10.83  |     8.30  |    8.01  |    6.22  |     nan  |     nan |
|   8.00 |    10.76  |     8.45  |    8.27  |    6.35  |     nan  |     nan |
+--------+-----------+-----------+----------+----------+----------+---------+

The latest solver is e1_nat_MP; the others, and the test machine, have been described a few posts back.
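The hyperthreading observation can be quantified as parallel efficiency: throughput at N processes divided by N times the single-process throughput. A quick check against the e1_nat_MP column of the table above (helper name is mine):

```python
def efficiency(rate_n, rate_1, n):
    """Parallel efficiency: measured throughput vs ideal linear scaling."""
    return rate_n / (n * rate_1)

# e1_nat_MP figures from the table: 1, 4 and 8 processes.
e1_nat_mp = {1: 2.99, 4: 10.11, 8: 10.76}

print(round(efficiency(e1_nat_mp[4], e1_nat_mp[1], 4), 2))  # 0.85
print(round(efficiency(e1_nat_mp[8], e1_nat_mp[1], 8), 2))  # 0.45
```

So scaling is about 85% efficient up to the 4 physical cores and collapses to roughly 45% at 8 processes, matching the conclusion that hyperthreading adds almost nothing here.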

2 Likes

I benched my code with callgrind and it shows 33% of the time being spent on blake2b hash computations. I wish I could just plug in xenoncat’s asm-optimized code for that :-)
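For anyone wanting to sanity-check a solver's hashing stage against a reference: Python's hashlib exposes BLAKE2b with the personalization and digest size that Zcash's Equihash (n=200, k=9) uses, namely "ZcashPoW" plus little-endian n and k, with 50-byte outputs. This reflects the Zcash parameters as I understand them (the helper name and the exact input packing below are my own simplification; verify against zcashd before relying on it):

```python
import hashlib
import struct

N, K = 200, 9
# Personalization: "ZcashPoW" || le32(n) || le32(k) = 16 bytes,
# the maximum BLAKE2b allows.
PERSON = b"ZcashPoW" + struct.pack("<II", N, K)
# Each BLAKE2b call packs 512 // N = 2 Equihash inputs, so the digest
# is (512 // N) * N / 8 = 50 bytes for n=200.
DIGEST_SIZE = (512 // N) * N // 8

def equihash_hash(header, index):
    """BLAKE2b over header plus a little-endian 32-bit index
    (simplified input layout for illustration)."""
    h = hashlib.blake2b(digest_size=DIGEST_SIZE, person=PERSON)
    h.update(header)
    h.update(struct.pack("<I", index))
    return h.digest()

print(len(equihash_hash(b"", 0)))  # 50
```

Given that a third of the runtime is in this hash, an asm-optimized BLAKE2b like xenoncat's would bound the possible speedup at roughly that 33%.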

3 Likes

It’s really great news, gentlemen. Mr. Tromp, if I understand correctly, your solver utilizes the CUDA subsystem of NVIDIA cards?

If anybody has modern NVIDIA GPU cards, could you please share your results in Sol/s using Tromp’s current solver?

Yep, I’m also interested. If someone could share a binary or tell me how I can test that…, I could provide a speed test on my own GTX 780.