Equihash - ongoing memory requirement adjustments?


#1

I see from https://z.cash/blog/why-equihash.html that Zcash's PoW algorithm will be Equihash, set to require about 1GB of RAM per thread.

I've only skimmed the Equihash paper thus far, but my understanding is that there's a non-linear relationship between compute time and memory; i.e., with more memory, hashing gets non-linearly faster. Great. First question: If the target is set at 1GB, is there an advantage to having >1GB per thread, or is the target actually more or less optimal? (I assume the latter).

Second question: Are there any plans to bake scaling of the optimal memory amount into the protocol? A few years from now, the math for building ASICs with 1GB of memory available per core will probably be very different. Shouldn't the memory target scale roughly with Moore's law? If the consensus is yes, please consider building such scaling in early (e.g., take a lesson from Bitcoin's ridiculous stagnation/politicization on blocksize and don't just assume the rational course of action will be taken years from now).

Thanks.

[edited slightly for clarity]


#2

I'm no expert, but I was mining for a while yesterday and I noticed that Zcash did not use all of my memory; there was plenty unused. So I guess there is no advantage even if you can afford >1GB per mining thread.

I share the concern behind your second question. Memory is getting cheaper. Even if they set a high RAM requirement now, it will become low-end within several years, and the PoW would then no longer be ASIC-resistant or botnet-resistant.

Memory does not follow Moore's law. Also, it is not actually a law, just a prediction. The best way would be a mechanism to adjust PoW parameters dynamically without hard forking, but I guess that's impossible.


#3

First question: The target memory is roughly optimal. Note that the testnet currently uses parameters with a much lower target memory than we expect to use on the production network (if we had used the latter with the current unoptimized Equihash implementation, mining would have required several gigabytes and each Equihash solution would have taken a long time to find, which would have posed problems for testing and granularity of difficulty adjustment).

Edit: the target memory is roughly optimal taking into account the algorithm binding. Without the algorithm binding, it would be possible to use extra memory to perform an efficient multicollision attack against the Generalized Birthday Problem. This is undesirable because the optimal strategy would then be to use much more memory than general-purpose CPUs and PCs are able to connect to. So the algorithm binding is an important aspect of Equihash's security.


#4

Second question: It's difficult to predict in advance how the memory target should change over the long term. After we have experience with this PoW in production, we can consider changing that with a hard fork. Our attitude toward the risks of hard forks is somewhat different to that of the Bitcoin Core devs, so I wouldn't expect that to be an undue obstruction.


#5

So mining on a 4-core CPU would take 4 GB?

That's fair I guess, even my cheap laptop has 16 GB of RAM.


#6

Yes, running 4 simultaneous mining instances would take 4 times the memory. This might not require 4 physical cores; I suspect it is possible to mine reasonably well on an SMT thread. The optimal number of mining instances for a given computer will depend partly on memory bandwidth, as well as the amount of memory.


#7

I tried to understand this. :dizzy_face:

So if the target memory is 1GB, having more than 1GB of RAM per CPU core doesn't give you any extra reward, right?

I guess this is the same question:
Which one will mine more coins if the target memory is 1GB?
(1) One PC with a single-core CPU and 2GB of memory
(2) Two PCs, each with a single-core CPU and 1GB of memory

Is it (2)? Am I right?


#8

Yes that's correct, (2).


#9

Thanks for the details. I'm encouraged to hear that the ZCash team's attitude toward hard-forks is different to that of Bitcoin Core (presumably, you're much more willing to do them), but it seems pertinent to note that Bitcoin's attitude toward change/improvement was the same way when Satoshi was still around. Thus, taking a lesson from Bitcoin, I think it's prudent to start planning now for a time when the current ZCash team may not be involved, or at least, may not have outsized influence on the community.

In 2011/2012, it was unthinkable that the Bitcoin community wouldn't just raise or remove the 1MB blocksize anti-DoS hack once we started to get anywhere close to the limit, yet 4 years later, it's an ugly political mess, with people trying to use it as an economic throttle to incentivize different use of blockspace. Ask nearly anyone who was active in the Bitcoin community 3 or 4 years ago, and they'd find the present-day situation absurd.

tldr: It seems important to bake scaling means, even if somewhat inelegant, into the protocol early.


#10

Daira,

Thanks. Good to be right. :slight_smile:

I have two more questions about this.
1. What is the current target memory amount?
2. If the PC has a multi-core CPU, does the program automatically use (target memory x number of cores) of memory? For example, if my PC has two cores, does it use 2GB of memory without any extra settings when the target memory is 1GB?


#11

Hey guys, great discussion. While reading through this I realized one of the most important questions with regard to a system build would be: what is the memory bandwidth usage (GB/s) of 1 core running with 1 GB of RAM?

Let's take a quick look at the implications of the question. The 6th generation Intel chips have a max memory bandwidth of 34.1 GB/s across 2 memory channels. If zcash gobbles up all 34.1 GB/s from 2 cores with 2 GB of RAM, then an i7-6700 quad core would have no benefit over an i3-6300 dual core. With memory bandwidth as the bottleneck, adding more memory or adding more cores (and running zcash on them) effectively does squat and could potentially even slow you down.

Now, if zcash uses 8 GB/s of memory bandwidth per instance (1 core with 1 GB of RAM), you could run on an i7-6700 using all 4 cores and double the performance of the i3-6300, unless zcash can utilize hyperthreading. In that case you could max out (or come close to maxing out) your total memory bandwidth of 34.1 GB/s with the i3-6300's 2 cores and 4 threads, and we're back to the i7-6700 showing no substantial performance boost over the i3-6300.
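
The two scenarios above can be sketched as a small calculation. This is only an illustration: the 34.1 GB/s ceiling is the figure quoted in this post, the per-instance bandwidth values are the hypothetical ones being discussed (nobody in this thread has measured them), and `max_useful_instances` is a made-up helper, not part of zcashd.

```python
def max_useful_instances(total_bandwidth_gbs, per_instance_gbs, hw_threads):
    """Mining instances that fit before memory bandwidth saturates.

    All inputs are assumptions for illustration; per_instance_gbs in
    particular is a guess, not a measurement.
    """
    if per_instance_gbs <= 0:
        raise ValueError("per_instance_gbs must be positive")
    # How many instances the memory bus can feed at full speed.
    bandwidth_limit = int(total_bandwidth_gbs // per_instance_gbs)
    # An instance also needs a hardware thread to run on.
    return min(hw_threads, bandwidth_limit)


# Scenario 1: each instance hypothetically eats about half the bus.
print("i7-6700 (8 threads):", max_useful_instances(34.1, 17.0, 8))
print("i3-6300 (4 threads):", max_useful_instances(34.1, 17.0, 4))

# Scenario 2: each instance hypothetically uses 8 GB/s.
print("i7-6700 (8 threads):", max_useful_instances(34.1, 8.0, 8))
print("i3-6300 (4 threads):", max_useful_instances(34.1, 8.0, 4))
```

In both scenarios the two chips end up with the same instance count once hyperthreads are counted, which is the point being made above. Real behaviour will also depend on cache and latency effects that this simple model ignores.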

I have a feeling the answer is somewhere close to 8 GB/s. The equihash white-paper shows the following for a quad-core 1.8GHz processor with proof of 500MB, n=144, k=5:

Threads   Solve Time
1         67 sec
2         49 sec
4         33 sec
8         31 sec

This would indicate that an i7-6700 (8 threads) would only slightly outperform an i3-6300 (4 threads). That reduction of 2 seconds may not be worth the $200 chip upgrade.
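
To put numbers on that, here is a short sketch that turns the solve times quoted above into speedup and per-thread efficiency. The times are the ones listed in this post; `speedup_table` is just an illustrative helper name.

```python
# Solve times quoted above (threads -> seconds per solve).
solve_time_s = {1: 67, 2: 49, 4: 33, 8: 31}


def speedup_table(times):
    """Return {threads: (speedup vs. 1 thread, efficiency per thread)}."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


for n, (speedup, eff) in speedup_table(solve_time_s).items():
    print(f"{n} threads: {speedup:.2f}x speedup, {eff:.0%} efficiency")
```

Going from 4 to 8 threads barely moves the speedup while the per-thread efficiency roughly halves, which is what you would expect if something other than core count (such as memory bandwidth) is the limit.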

The question is also relevant if you're installing on an existing build. I have an i7-4771 quad-core with hyper-threading and wonder if I am actually shooting myself in the foot by setting genproclimit = -1. Perhaps the most effective usage would be 4 of my 8 threads?

Running zcbenchmark solveequihash 20 8 (comparing to 20 4) doesn't seem to work like I expected. Is it adding the solve time across 8 cores 1 core at a time?

Anyway, I thought perhaps someone has already been giving this some thought. Let me know what you guys think.

Thanks!


#12

On my 2013 CPUs with DDR3 1600 MHz, 4 cores give 2.74x the throughput of 1 core. Yes, hyperthreading reduces results. Set genproclimit=4, not -1.


#13

I ran the benchmark with 1, 2, 4, and 8 threads with the following results:

8 threads - 305 seconds per solve;
4 threads - 98 seconds per solve;
2 threads - 56 seconds per solve;
1 thread - 38 seconds per solve.

I'm running on a 3.5 GHz i7-4771 quad-core with hyper-threading (8 threads), 2 memory channels with a max bandwidth of 25GB/s, and 16GB of DDR3-1600MHz, with Ubuntu 16.04 as the native OS, on a 300/20 Mbps connection with 11ms latency.

So, either the zcbenchmark command isn't functioning as expected (running the additional threads consecutively rather than concurrently) or the inclusion of additional threads just clogs up the memory channels and slows everything down.

The other possibility is that I have a bottleneck somewhere else that's causing the issue. I've only had the opportunity to compare notes with one other beta user and he is experiencing almost identical results with a 4.0 GHz AMD on 8 GB of RAM.

Any thoughts?


#14

I don't think the benchmark for multiple threads is correct. Your 1 thread is as it should be. 4 cores will get you 2.74x more blocks if it turns out similar to my older DDR 1600 MHz system. Your hash rate for 4 cores will be about

1.88 / 38 x 2.74 = 0.136 H/s

Attempting hyperthreading will lower the hash rate. The 1.88 (from Daira and Tromp) is more correct than the 2 everyone has been using, so use 2 only when comparing to other people's numbers. I do not know if getnetworkhashps is based on 2 or 1.88.
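
For anyone who wants to plug in their own numbers, the arithmetic above can be written as a tiny helper. This is a sketch under the assumptions stated in this post (1.88 solutions per solver run, 2.74x scaling for 4 cores on this particular DDR3-1600 system); the function name is made up for illustration.

```python
def estimated_solution_rate(solve_time_s, solutions_per_run=1.88, core_scaling=1.0):
    """Estimated solutions per second.

    solve_time_s      -- single-thread benchmark time per solver run
    solutions_per_run -- average solutions per run (1.88 per this thread)
    core_scaling      -- measured multi-core speedup (2.74 for 4 cores here)
    """
    if solve_time_s <= 0:
        raise ValueError("solve_time_s must be positive")
    return solutions_per_run / solve_time_s * core_scaling


rate = estimated_solution_rate(38, core_scaling=2.74)
print(f"{rate:.3f} H/s")  # 0.136 H/s
```

Pass solutions_per_run=2 to get a figure comparable with people who use the rounded value.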


#15

The zcbenchmark command currently gives misleading results for multiple threads. If you know how to use git then you can check out the current master branch shortly (after this PR has merged), which will fix this problem.


#16

getnetworkhashps returns the estimated total solution rate of the network, computed from the block time and difficulty over the past nPowAveragingWindow = 17 blocks. It doesn't use the same median computation as the difficulty algorithm so it is more prone to error if block times are inaccurate. However, it should correspond to the actual (estimated based on difficulty) solution rate rather than a guess based on the number of solution runs.
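
As a rough illustration of deriving a rate from block times and difficulty, here is a minimal sketch. It is not the zcashd code: the function name, the (timestamp, cumulative work) input format, and the sample numbers are all hypothetical, and "work" stands in for whatever difficulty-derived quantity the node tracks.

```python
def estimate_network_rate(blocks):
    """Estimate network solution rate from a window of consecutive blocks.

    blocks -- list of (timestamp_s, cumulative_work), oldest first.
    Returns work units per second, or 0.0 if it cannot be estimated.
    """
    if len(blocks) < 2:
        return 0.0
    times = [t for t, _ in blocks]
    # Raw min/max timestamps (no median filtering), so a single bad
    # block time skews the estimate.
    elapsed = max(times) - min(times)
    if elapsed <= 0:
        return 0.0
    work_done = blocks[-1][1] - blocks[0][1]
    return work_done / elapsed


# Hypothetical window: 17 block intervals, 150 s apart, 1500 work units each.
window = [(i * 150, i * 1500) for i in range(18)]
print(estimate_network_rate(window))  # 10.0
```

Because the estimate divides by the raw time span, an inaccurate timestamp at either end of the window changes the result directly, which matches the caveat above.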


#17

Thanks for the info daira; it was very helpful.

Can I confirm that hyper-threading indeed reduces performance?

If so, when genproclimit is set to 4, will zcash use all 4 cores (on a quad-core) or will it use 4 threads on 2 cores?


#18

I would not expect using hyperthreads (i.e. more threads than the number of cores) to get much, if any, speed-up. There will be too much resource contention.

But yes, genproclimit will use as many threads as is set, regardless of the number of cores.

