Multi thread mining cannot be used

Multi-threaded mining does not work for me.
My CPU has 16 cores, but mining only uses one of them.

I refer you to this post: Zcash mining pool - #12 by daira

And I invite you to do better if you think compiling for foreign systems isn’t challenging.

@lug You might be interested in this issue from the Zcash github: Make "zcbenchmark solveequihash" take a number-of-threads argument · Issue #1147 · zcash/zcash · GitHub

The standard -genproclimit should work, the same as upstream. So e.g. zcashd -gen -genproclimit=4 for 4 threads, or zcashd -gen -genproclimit=-1 for as many threads as you have cores. Note that this corresponds to issue #1147 linked above, and runs multiple instances of a single-threaded Equihash solver; the solver itself is not yet multithreaded.
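As a rough illustration of the two invocations above, here is a minimal Python sketch that builds the zcashd argument list for a given thread count. The helper names `zcashd_mining_args` and `effective_threads` are hypothetical and exist only for this example. The only flags used are `-gen` and `-genproclimit` from the post. Rejecting 0 and using `os.cpu_count()` to estimate what -1 resolves to are assumptions of this sketch, not a statement about zcashd internals.

```python
import os
from typing import List


def zcashd_mining_args(threads: int) -> List[str]:
    """Build a zcashd command line that mines with `threads` solver instances.

    Hypothetical helper: only -gen and -genproclimit come from the post.
    -1 means "as many threads as you have cores".
    """
    # Assumption of this sketch: treat 0 and values below -1 as mistakes.
    if threads == 0 or threads < -1:
        raise ValueError("threads must be -1 or a positive integer")
    return ["zcashd", "-gen", f"-genproclimit={threads}"]


def effective_threads(threads: int) -> int:
    """Estimate how many single-threaded solver instances would be started."""
    if threads == -1:
        # os.cpu_count() can return None, so fall back to one thread.
        return os.cpu_count() or 1
    return threads


print(" ".join(zcashd_mining_args(4)))
print(" ".join(zcashd_mining_args(-1)))
print("solver instances for -1 on this machine:", effective_threads(-1))
```

Each of those instances is a separate single-threaded Equihash solve, so the count printed on the last line is the number of solvers, not the number of threads inside one solver.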

When the solver itself is multi-threaded will it be able to solve faster per core?

It should be faster per solver, yes, although not directly corresponding to the number of threads. There will also of course be a trade-off in the number of solvers that can be run.

From my understanding, each thread does an entirely separate equihash solve.

Try ./zcash-cli setgenerate true 16 to mine on 16 independent threads. The design of Equihash punishes multithreaded implementations which work on the same solution collaboratively, so Zcash solves multiple solutions (1 per thread) simultaneously instead.
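To make the "one solution per thread" idea concrete, here is a toy Python sketch. It is not Equihash: `toy_solve` is a hypothetical stand-in that searches for a SHA-256 digest with a given prefix, and `run_independent_solvers` is likewise an illustrative name. The point is only the structure, where each worker runs its own complete search and shares no state with the others.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def toy_solve(worker_id: int, prefix: str = "00") -> Tuple[int, int, str]:
    """Find a nonce whose SHA-256 digest starts with `prefix`.

    Toy stand-in for one solver run (NOT Equihash). Each worker hashes
    its own input, so nothing is shared between workers.
    """
    nonce = 0
    while True:
        data = f"worker-{worker_id}:{nonce}".encode()
        digest = hashlib.sha256(data).hexdigest()
        if digest.startswith(prefix):
            return worker_id, nonce, digest
        nonce += 1


def run_independent_solvers(n_workers: int) -> List[Tuple[int, int, str]]:
    """Run one independent toy solve per worker and collect the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the worker ids.
        return list(pool.map(toy_solve, range(n_workers)))


results = run_independent_solvers(4)
for worker_id, nonce, digest in results:
    print(worker_id, nonce, digest[:16])
```

Python threads do not run CPU-bound code in parallel because of the interpreter lock, so this shows the layout of the work rather than any speed-up.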

@lug

I’m very interested in this thread. I’m on vacation at the moment, but when I get home I’m planning on putting together a few test miners.

All of us would love to know if @Vorksholk’s post allowed you to run multiple threads

That is incorrect; the command prompt cannot find the file.

@lug If you are in the root folder of the Zcash project, you have successfully built Zcash, and zcashd is running, try:
./src/zcash-cli setgenerate true 16

z8 is using up half of my read bandwidth and hardly any of my write bandwidth. If the 4 cores are colliding on reads half the time and that is costing 50% of CPU capability, then I think ideal threaded code would give 2x better performance. From my experiments, the CPU seems to be half of the holdup, and if that is not due to read collisions, ideal threaded code might give only 50% better performance without a faster CPU. A sort that trades reads for writes might do better. These are my guesses based on no knowledge of these things.

Cache coherence protocols in modern systems are pretty sophisticated, and read-mostly workloads on blocks of memory that are disjoint between cores are what they are optimised for. Don’t worry too much about loss of performance due to false sharing; this is likely a small factor relative to missing optimisations in the code. At some point we’ll get to doing a deeper analysis of cache usage with VTune or some similar tool.


Not likely. How fast memory can be shuffled around depends on how the allocated memory locations map to the memory channels. The algorithm is designed so that the solve rate is limited by memory speed, not by the CPU or the amount of RAM. I have run the miner for about 9 hours so far on a machine with a single 1333 DDR3 DIMM, and it didn’t do that badly compared to desktop motherboards with dual and quad channel at 1600. This was also with only one core. One core on my fitlet was maxed out, and maybe I could squeeze another 20-30% out of it by running two. I currently don’t have a wired internet connection, so I am not keen to do a lot of mining until I do.

If you ran the Fitlet with only 1 core (genproclimit = 1 and not -1 or 4) and got more than 3 blocks in 9 hours, I would be surprised. I think your average will be about 1 block every 12 to 24 hours with 1 core. Slow computers saturate the RAM bandwidth faster, or at least my measurement of it, for reasons I do not understand. My 2010 PC bogs it down from 5 GB/s to 1 GB/s with 4 cores. My 2013 PC may bring it down from 5 GB/s to 3 GB/s, getting twice as many blocks. Methods of measuring RAM bandwidth require CPU time too, so my measurements may be an illusion, and the miners may not be utilizing more than 1/6 of the bandwidth, which is why future parallel code for GPUs is a threat.

So what system would be faster in general mining?

System 1: CPU 3,2 GHz with 4 threads, 8GB RAM
System 2: CPU 2,6 GHz with 8 threads, 8GB RAM

The motherboard and the other components are the same on both systems, same generation.

If the 8 threads are from hyperthreading on a regular 4-core CPU, then probably the 1st option.

Does anyone know why I am consistently getting lower solveequihash average results with genproclimit set to 1 rather than 4 or -1?

(System 1: CPU 2.3 GHz with 4 cores, 8 threads, 12GB DDR3 RAM)

If your motherboard has more than one memory channel, it’s possible that mining on more than one core consumes enough memory to force the benchmark to occur in a less contested region of memory with a separate memory channel.

@Voluntary: I think that by “lower”, @Hawkeye means faster results.

Note that the solveequihash benchmark is making a separate measurement that does not directly use genproclimit, but if other threads are mining in the background then they will be taking memory bandwidth, and so it’s expected that the benchmark will be slower the more background threads are running.
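When comparing benchmark runs made with different numbers of background mining threads, it helps to reduce each run to a few summary figures. The Python sketch below assumes the benchmark output is a JSON array of objects with a `runningtime` field, as in the output posted in this thread. The function name `summarize_running_times` is hypothetical, and the sample values are made up for illustration, not measurements.

```python
import json
import statistics

# Illustrative values only, NOT measurements from this thread.
sample_output = """
[
  {"runningtime": 40.0},
  {"runningtime": 50.0},
  {"runningtime": 60.0}
]
"""


def summarize_running_times(raw_json: str) -> dict:
    """Summarize the runningtime values (in seconds) from benchmark JSON."""
    times = [entry["runningtime"] for entry in json.loads(raw_json)]
    return {
        "runs": len(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "min": min(times),
        "max": max(times),
    }


summary = summarize_running_times(sample_output)
print(summary)
```

Lower running times mean faster solves, so a lower mean with no background mining would be consistent with the explanation above.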

zcash-cli setgenerate true 16

Does this really enable you to mine 16 different instances of the solver on each core or is it still mining the same instance on all cores since the miner is not multi-threaded yet? Or is it? I am putting together a server with a ton of cores and want to make sure I am configured to mine optimally on each core. Any suggestions for my config file? Also how do I find blocks I have solved on your block explorer web page? Sorry for all the questions, I am new to Zcash but not new to crypto.

Thanks!
Jarid

I just ran this with network difficulty at 28, are these numbers any good? I don’t really have a reference point yet of what they should look like…

jarid@Zcash:~/zcash/src$ ./zcash-cli zcbenchmark solveequihash 10
[
    {
        "runningtime" : 247.51932900
    },
    {
        "runningtime" : 173.48537800
    },
    {
        "runningtime" : 38.90601900
    },
    {
        "runningtime" : 111.30959600
    },
    {
        "runningtime" : 180.96685900
    },
    {
        "runningtime" : 244.15432200
    },
    {
        "runningtime" : 141.82862500
    },
    {
        "runningtime" : 200.55349100
    },
    {
        "runningtime" : 35.43115600
    },
    {
        "runningtime" : 50.09029600