Z7 speeds and feeds

There might be aspects of Equihash that can be processed in parallel, as well as techniques for mitigating memory contention during those portions of the sorting, but my guess is that adapting zcashd to efficiently use the available cores and memory channels to run additional instances would be the more attainable of those two goals.

From my understanding, Equihash was intentionally designed to significantly limit how much a single solution can benefit from multithreading. I would assume Zcash simply runs many separate instances rather than letting multiple cores collaborate on a single solution. It'd be nice, though, for Zcash to provide some metric of the total throughput of all the individual instances of the hashing function.

When a single core is fast enough to consume all or even most of the memory bandwidth, attempting to put other cores to work in that space is probably not worth the effort. But for slower cores (ARM and other low-power CPUs), it's possible that it might take more than one core to saturate the memory bandwidth.

Looks like the multithreading is certainly performing better than a single core. In the same period of time, that 64-core system solved 20 blocks, while my 6-core 5820k @ 4 GHz solved 3 blocks. That is roughly in line with the expected relative performance based on my solutions-per-second numbers, although the difficulty's jump from 60 to 160 while I was running the 64-core system could have thrown in some screwballs.

That's awesome. All your data agrees with what I would expect. I had estimated there were about 120 cores on the testnet over the last few days, so 17 out of 36 blocks is consistent with you using 64 cores.
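
As a rough way to check whether a block count like that is plausible, here is a minimal sketch that treats each block as an independent draw won with probability my_cores / total_cores (a binomial model). The function name `block_share_check` is hypothetical, and it assumes the ~120-core testnet estimate already includes the 64-core machine and that all cores solve at the same rate.

```shell
# Hypothetical helper: is winning K of N blocks plausible for a miner holding
# my_cores out of total_cores? Assumes every core solves at the same rate and
# that total_cores already includes my_cores (binomial model).
block_share_check() {
  local my_cores=$1 total_cores=$2 blocks=$3 won=$4
  awk -v m="$my_cores" -v t="$total_cores" -v n="$blocks" -v k="$won" 'BEGIN {
    p = m / t                    # expected fraction of blocks won
    mean = n * p                 # expected number of blocks won
    sd = sqrt(n * p * (1 - p))   # binomial standard deviation
    printf "expected=%.1f sd=%.1f z=%.2f\n", mean, sd, (k - mean) / sd
  }'
}

block_share_check 64 120 36 17
```

Under those assumptions the expected count is about 19 blocks with a standard deviation of about 3, so 17 sits well within one standard deviation of normal luck.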

Does it matter whether it is DDR4 2400 or DDR4 2133? Is higher better? And what about the CL (CAS latency) of the RAM?

Testing different RAM speeds and latencies on a consistent (overclockable) system is something I've been meaning to do... and I encourage everyone to try it.

On the topic of memory bandwidth saturation (a single thread versus multiple threads slowing each other down), my very limited testing shows it isn't a terribly large problem. I'll test this on a higher-core-count system (like the EC2 box I played around with earlier) at a later point.

Test system: Ubuntu 14.04 x64 VM running in VirtualBox on a Windows 7 host; i7-5820k @ 4.0 GHz, 32 GB of DDR4 (quad-channel) @ 2133 MHz (CL=15 clocks). 1-core and 6-core clock speeds are the same (no boosting higher for single-core workloads).

When running a benchmark thread on an otherwise idle system, I get an average running time of 44.591 seconds. When running a benchmark thread on a system that already has 5 threads performing Equihash mining, I get a running time of 53.386 seconds (average of three samples for both).

So effectively (on this machine, with its hardware profile), 6 cores should mine about 5.012 times faster than a single core (6 × 44.591 / 53.386). That's not linear scaling, but they certainly aren't choking one another horribly.
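
The 5.012 figure follows from assuming every loaded core runs at the speed measured for the benchmark thread under load, so N cores deliver N × t1 / tN single-core equivalents. A minimal sketch of that arithmetic, with a hypothetical function name and not part of zcbenchmark:

```shell
# Hypothetical helper: effective multi-core scaling from two timings.
#   t1 = seconds per solve with the machine otherwise idle
#   tn = seconds per solve while the other cores are busy mining
# Assumes every busy core runs at the loaded speed, so N cores give
# N * t1 / tn single-core equivalents.
effective_scaling() {
  local cores=$1 t1=$2 tn=$3
  awk -v n="$cores" -v a="$t1" -v b="$tn" 'BEGIN {
    per_core = a / b   # fraction of idle-system speed each core keeps
    printf "per-core=%.1f%% scaling=%.3fx\n", per_core * 100, n * per_core
  }'
}

effective_scaling 6 44.591 53.386
```

With the numbers above this prints about 83.5% per core and 5.012x overall. The same formula applied to the x1.32xlarge timings reported later in the thread (`effective_scaling 64 60 63`) gives about 95.2% per core.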

I monitored CPU-Z during these tests; clock speeds stayed at 4.0 GHz the whole time.

If the CPU is fast enough, I believe 2400 might be better by a factor of 2400/2133 (about 12.5%). I doubt it's going to be worth the cost.
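
For comparing modules on paper, a small sketch of the usual back-of-the-envelope figures may help with both the speed and the CL question. It assumes a 64-bit (8-byte) channel and that the "MHz" number on the module is really the transfer rate in MT/s, so the I/O clock is half of it. The function name `ddr_specs` is hypothetical, and the DDR4-2400 CL17 and DDR3-1333 CL9 timings are illustrative picks, not measurements from this thread.

```shell
# Hypothetical helper: theoretical peak bandwidth and CAS latency in ns.
# Assumptions: 8 bytes per transfer per channel, and the module rating
# (e.g. DDR4-2133) is the transfer rate in MT/s; I/O clock = MT/s / 2.
ddr_specs() {
  local mts=$1 cl=$2 channels=${3:-1}
  awk -v r="$mts" -v c="$cl" -v ch="$channels" 'BEGIN {
    gbps = r * 8 * ch / 1000   # MT/s * 8 bytes * channels -> GB/s
    cas_ns = c * 2000 / r      # CL cycles at the (r / 2) MHz I/O clock
    printf "peak=%.2f GB/s cas=%.2f ns\n", gbps, cas_ns
  }'
}

ddr_specs 2133 15     # DDR4-2133 CL15, one channel
ddr_specs 2400 17     # illustrative DDR4-2400 CL17
ddr_specs 1333 9      # DDR3-1333, CL9 assumed
ddr_specs 2133 15 4   # DDR4-2133 CL15, quad-channel
```

Peak bandwidth scales exactly with the transfer rate (19.20 vs 17.06 GB/s is the 2400/2133 ratio), while the absolute latency of a faster module with a higher CL can end up nearly the same. DDR3-1333 works out to about 10.66 GB/s per channel, matching the 10.6 GB/s figure quoted in the thread. Single-stream `dd` copies through the kernel will normally land well below these theoretical peaks.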

To check your RAM speeds on Linux, you can use this to measure read and write throughput:

mkdir RAM_test
sudo mount tmpfs -t tmpfs RAM_test/
cd RAM_test
dd if=/dev/zero of=data_tmp bs=1M count=512
dd if=data_tmp of=/dev/null bs=1M count=512
dd if=/dev/zero of=data_tmp bs=1M count=512
dd if=data_tmp of=/dev/null bs=1M count=512
cd ..
sudo umount RAM_test
rmdir RAM_test

I get about 2 to 3 GB/s for writes and 4 to 6 GB/s for reads on various DDR3 1333 MHz systems. The claimed peak for that technology is 10.6 GB/s.

You'll have to work out how to get it running on 6 threads at a time.
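
One possible way to get several streams going at once is to start one `dd` reader per core in the background and `wait` for all of them. This is a hypothetical sketch, not a tool from the thread: the function name `parallel_dd_read` and its arguments are made up, and it assumes GNU dd plus a tmpfs directory (by default /dev/shm, which is tmpfs on most Linux distributions) with room for threads × size_mb megabytes.

```shell
# Hypothetical helper: run several dd read streams concurrently so multiple
# cores compete for memory bandwidth. Assumes GNU dd and that $dir is a tmpfs
# mount (default /dev/shm) with room for threads * size_mb megabytes.
parallel_dd_read() {
  local threads=${1:-6} size_mb=${2:-512} dir=${3:-/dev/shm}
  local i
  # One file per stream so the readers do not share pages.
  for i in $(seq 1 "$threads"); do
    dd if=/dev/zero of="$dir/ramtest_$i" bs=1M count="$size_mb" 2>/dev/null
  done
  # Start all readers in the background; each dd prints its own
  # throughput summary on stderr when it finishes.
  for i in $(seq 1 "$threads"); do
    dd if="$dir/ramtest_$i" of=/dev/null bs=1M &
  done
  wait   # block until every background reader has exited
  # Remove the test files.
  for i in $(seq 1 "$threads"); do
    rm -f "$dir/ramtest_$i"
  done
}
```

For example, `parallel_dd_read 6 512` would run six 512 MB read streams at the same time; adding up the per-stream rates gives a rough aggregate figure to compare against the single-stream numbers.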

With DDR4 2133 MHz at CL=15, I get 4.4 GB/s write and 7.4 GB/s read.

See the GitHub issue zcash/zcash #1147, "Make `zcbenchmark solveequihash` take a number-of-threads argument", for progress on this.

Thanks!

Did some more digging with that x1.32xlarge EC2 instance.

When running a single benchmarking thread, I averaged around 60 seconds of runtime for the benchmark.
When running 63 Equihash mining threads plus 1 benchmarking thread (all physical cores in use), the benchmark thread averaged around 63 seconds.

The numbers in between lined up about as I'd expect, with performance decreasing very gradually as the number of concurrent mining threads increased.

So for that machine at least (1952 GB DDR4, 64 physical cores), it seems that when all 64 cores are doing Equihash mining, the memory bottleneck exists but is insignificant.

That's interesting. I've been assuming that an i5 or i7 class CPU core is capable of using all the bandwidth of a single memory channel while mining Zcash... It was fairly clear to me that an AMD A10 CPU I did some testing on wasn't using all the available memory bandwidth. Still, bear in mind that you're testing on a virtualised system, so it's not as easy to know exactly what is going on...

Where can I download Z7 or Z8?

Check out the installation instructions here.