Breaking equihash in Solutions per GB second


#1

As some of you may know, I've been busy developing a miner for Equihash.
For now, I can share a single data point for comparing my miner
with the default one.

I measured the latter by hacking str4d's standaloneminer.cpp to
iterate over nonces and report number of solutions found for each one.
On a Core i7 machine, it took 1520 seconds to find 97 solutions, for a solution rate of roughly 0.063 S/s. It also seemed to use about 540MB on average
(I may be quite wrong about this; if anyone has a better estimate of memory use averaged over runtime, then please let me know and I'll update my figures. I could also redo the numbers in terms of peak memory, but while I know mine, I don't know that of the default miner...).

That gives it an estimated single-threaded time*space performance of
0.063 S/s / 0.54GB ~ 0.12 S/GBs (using S as shorthand for solutions).
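
For anyone who wants to reproduce this arithmetic, here is a minimal Python sketch. The function name is made up for illustration, and the 0.54 figure is the rough average-memory estimate from above, not a precise measurement.

```python
def solutions_per_gb_second(solutions, seconds, avg_memory_gb):
    """Time*space performance: solution rate divided by memory used."""
    rate = solutions / seconds        # solutions per second (S/s)
    return rate / avg_memory_gb       # solutions per GB-second (S/GBs)

# Figures quoted for the default miner: 97 solutions in 1520 s at ~0.54 GB.
rate = 97 / 1520
perf = solutions_per_gb_second(97, 1520, 0.54)
print(f"{rate:.4f} S/s, {perf:.2f} S/GBs")
```

The unrounded rate is slightly higher than the 0.063 S/s quoted above, but the time*space figure still rounds to 0.12 S/GBs.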

I can report that my miner achieves an estimated 5.1 S/GBs.
Running 6 threads boosts that to 18.7 S/GBs.

I'm currently porting the pure C code to CUDA to see how much faster a GPU might be, but that will take a while to complete...


#2

Wow…it was always stated that the Zcash miner was not optimized, but that is a more substantial improvement than I expected. Congrats.

Did you get a sense of whether the memory bus was near-saturated when you upped the threads to 6? The 3.67x multiplier would indicate that there's some kind of competition for memory…no?
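
The 3.67x multiplier can be turned into a per-thread scaling efficiency with a short Python sketch. The function name is hypothetical, and the sketch assumes the memory footprint is the same in both runs, so that the ratio of S/GBs figures equals the ratio of solution rates.

```python
def scaling(single_thread_perf, multi_thread_perf, threads):
    """Return (speedup, per-thread efficiency) of a multi-threaded run."""
    speedup = multi_thread_perf / single_thread_perf
    efficiency = speedup / threads    # 1.0 would be perfect linear scaling
    return speedup, efficiency

# Figures quoted in post #1: 5.1 S/GBs on one thread, 18.7 S/GBs on six.
speedup, efficiency = scaling(5.1, 18.7, 6)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency well below 100% is consistent with contention for a shared resource such as memory bandwidth, but it does not by itself prove which resource is the bottleneck.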


#3

Thank you for the report, John!

Folks, let's standardize on a common measurement. Unless someone has something better, I propose that we standardize on John's "solutions per gigabyte * seconds", written "S/GBs". A "gigabyte" is 10⁹ bytes, and it is often confused with 2³⁰ bytes, so it might be helpful to spell that out prominently or just to dodge the issue by writing it as "S/10⁹b*s".

I'd love to see https://benchmark.minezcash.com/index.php?title=Main_Page extended to show the measurement as "S/10⁹b*s" so that I could easily compare John's data point from this thread with the measurements on that wiki page.


#5

dear Zooko,

According to Wikipedia, what I was using can be denoted unambiguously as a gibibyte.

Of particular importance is "In the context of computer memory, Gigabyte and GB are customarily used to mean 1024^3 (2^30) bytes".
So it's OK to stick with GB, which is also a hell of a lot easier to type:-)

If the minezcash page is extended, it should also have a column to indicate which miner program was used (complete with revision, if relevant).


#6

@tromp The wiki can be edited by anyone, please feel free to do so whichever way you guys decide.

And it may actually be better if you have one or two specific calculations you can explain for anyone to easily add.


#7

Please don't use "GB" to mean 2³⁰ bytes. That's at best ambiguous, and at worst simply incorrect. If you want to use gibibytes, please abbreviate it "GiB" or, like I said, find an unambiguous spelling like "2^30" for GiB or "billion" for GB or anything unambiguous.

I don't mind if we standardize on GiB for this, but I mind if we use incorrect, ambiguous, or contested spellings or abbreviations.

Here's my argument for why to use GB (10⁹) instead of GiB (2³⁰):

It was cute to use "kilobytes" to mean 2¹⁰, back when 2¹⁰ was what we were usually working with, and it was only 2.5% wrong. If you saw 420,213, and you mentally truncated 3 digits to approximate it as 420 KiB then you were about 2.5% off (depending on the interaction of rounding-down with the inaccuracy of using KB as an approximation for KiB). Then when we got up to millions of bytes, this became 5% off. If you see "25,420,213" bytes and you mentally approximate that as 25 MiB you're about 5% off.

Now that we're up to billions of things, you're about 7.5% off if you see "9,125,420,213" and you approximate that as 9 GiB. It is pretty close to 9 GB, but it is 7.5% further from 9 GiB.
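
As a quick check of these percentages, a small Python sketch can compute how much larger each binary unit is than its decimal counterpart. The function name is made up for illustration; the byte count is the example from the paragraph above.

```python
def binary_vs_decimal_gap(power):
    """Relative amount by which 1024**power exceeds 1000**power."""
    return 1024 ** power / 1000 ** power - 1

for name, power in (("KiB vs KB", 1), ("MiB vs MB", 2), ("GiB vs GB", 3)):
    print(f"{name}: {binary_vs_decimal_gap(power):.1%}")
# prints 2.4%, 4.9% and 7.4% respectively

# The example byte count from above, expressed in both units.
n = 9_125_420_213
print(f"{n / 10**9:.2f} GB, {n / 2**30:.2f} GiB")
```

The exact gaps are close to the rounded 2.5%, 5% and 7.5% used in the argument.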

Note that normal non-mathy users are never going to learn to do anything other than truncation for this approximation. I also don't like using cute jargon that signals that we know the traditions / in-jokes but is confusing for outsiders.

Let's get off this train and go back to using convenient base-10 like the rest of the SI system. :slight_smile:

But if not, fine, I don't have the time to argue about it, but don't call it "GB" if it's not 10⁹ B's.


#8

Sol/s and Sol/(GiB•s) are two distinct metrics. Both are important.

Equihash mining has four constrained resources:

  • Compute units
  • Memory space
  • Memory bandwidth
  • Power

Of those four, memory space and memory bandwidth requirements tend to be mostly proportional to each other, and memory space is easier to measure, so I would be comfortable ignoring the bandwidth metric most of the time. Compute units and power are both important and need to be independently accounted for.

Trying to use a single metric to encompass all three/four of these resources would result in lost information, which would make accurate comparisons difficult. Not everybody has the same electricity price, for example, so different people will want to make different tradeoffs between power and compute performance.

Sol/(GiB•s) is a useful metric, but we also need Sol/(s) and Sol/J (or J/Sol). Occasionally seeing Sol/(GiB•s^2) as a measure of bandwidth efficiency would be cool too.
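
One way to keep these metrics side by side is to record the raw quantities of a run and derive each metric from them. The Python sketch below does that; the class name, field names, and input numbers are all invented for illustration and are not measurements from this thread.

```python
from dataclasses import dataclass

@dataclass
class MinerRun:
    solutions: float    # solutions found during the run
    seconds: float      # wall-clock duration of the run
    memory_gib: float   # memory used, in GiB (2**30 bytes)
    joules: float       # energy consumed during the run

    @property
    def sol_per_s(self):
        return self.solutions / self.seconds

    @property
    def sol_per_gib_s(self):
        return self.sol_per_s / self.memory_gib

    @property
    def sol_per_joule(self):
        return self.solutions / self.joules

# Made-up inputs, purely to show the three metrics together.
run = MinerRun(solutions=100, seconds=1000, memory_gib=0.5, joules=50_000)
print(run.sol_per_s, run.sol_per_gib_s, run.sol_per_joule)
```

Keeping the raw inputs lets each reader weigh speed, memory, and energy according to their own constraints instead of relying on a single combined number.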


#9

With 1 core running and 4 cores running, my average was 300 MiB to 315 MiB per core. DDR3 1600 MHz. i5 3rd generation.

[ edit: this is with free and top, which show MiB. Let's stick with GiB, not GB, because that is what top shows ]


#10

Using zawy's estimate of default miner memory usage, we get

str4d miner : 0.063 S/s / 0.3GB ~ 0.21 S/GBs

tromp miner : 5.1 S/GBs

Still a 24x improvement.

PS:
Zooko's argument above, while relevant for the practices of disk manufacturers, ignores my quoted "In the context of computer memory".
When Zooko buys a 4GB laptop, is he pleasantly surprised to find he got an extra 294967296 bytes?

PPS:
I just realized there is some advantage to using GiB; it's pronounceable!
S/GiBs makes for some nice solgibs :slight_smile:


#11

My "8 GB" card is showing 8047524 KiB in top. My numbers above were MiB, which I just corrected.

As a reminder, my 4 cores are 2.74x as fast at getting blocks on betanet as my 1 core, and my 1 core was 0.044 S/s on the benchmark.

1 core: 0.147 S/GiBs
4 core: 0.100 S/GiBs


#12

These are great results, @tromp. Regarding the metrics, I'm with @jtoomim - we can't use just one, and in fact many (or most?) people with CPUs and GPUs will gladly trade a lot of memory (as long as they have it in the system anyway) for a little speedup - e.g., 2x speedup when going from 1 GB to 20 GB usage would be a good tradeoff for someone who has 32 GB RAM installed. This is especially true for GPU cards, where the memory would usually be wasted if not used. Yet another metric to consider is time between sets of solutions - if some hypothetical highly parallel implementation produces 1 million solutions every 1000 seconds, it will be mostly unusable because 1000 seconds is beyond the target time between blocks.

I've been playing with the reference implementation of Equihash lately, and it's also quite a bit faster than zcashd's for the current 200,9 parameters (this wasn't the case for 144,5, which Zcash used before), but it needs some hacks to make it work for those (and the way it uses BLAKE2b hashes is different from Zcash's, so it's not a drop-in). Dmitry committed a fix correcting a bug with the initial list size a few days ago - with that fix already in, you only have to increase MAX_N from 32 to 40 and increase LIST_LENGTH and FORK_MULTIPLIER somewhat, such as to 10 and 5, respectively, to have it find 90%+ (but not 100%) of solutions for 200,9.

With these settings, it's about 10 seconds at about 1.7 GiB (trivial to halve that?) on one core in i7-4770K (stock clocks, dual-channel DDR3-1600, 4x 8 GB DIMMs installed). Running 4 instances on the i7-4770K, it's 13 seconds per set of solutions per instance. Running 8 instances, it's 23 seconds per set of solutions per instance.

We need to multiply this by, say, 1.8 solutions (on average) per instance per invocation (ideally, it's closer to 1.9 solutions, but like I mentioned this implementation doesn't find 100% of them with these settings). This gives 8*1.8/23 = 0.63 S/s, which is 10x faster than what you quote for zcashd's (but different CPU maybe). If we factor in the 8*1.7 GiB, of course it'd be a lot worse in terms of S/GiB*s, but many people would not care. (And it'd be better for only 4 instances concurrent in terms of that metric.) I am listing peak memory usage by the processes here; average is less (I think you used average, which would matter for optimal de-sync of multiple instances?)
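
The throughput arithmetic for 1, 4, and 8 concurrent instances can be sketched in Python as follows. The constant and function names are made up for illustration; the timings, the 1.8 solutions per run, and the 1.7 GiB peak per instance are the figures quoted in this post, with total memory assumed to be simply the per-instance peak times the number of instances.

```python
SOLUTIONS_PER_RUN = 1.8        # average solutions per instance per invocation
PEAK_GIB_PER_INSTANCE = 1.7    # peak memory of one instance, in GiB

# Number of concurrent instances -> seconds per set of solutions per instance.
SECONDS_PER_RUN = {1: 10, 4: 13, 8: 23}

def throughput(instances):
    """Aggregate solution rate in S/s for the given number of instances."""
    return instances * SOLUTIONS_PER_RUN / SECONDS_PER_RUN[instances]

def throughput_per_gib(instances):
    """Aggregate rate divided by total peak memory, in S/(GiB*s)."""
    return throughput(instances) / (instances * PEAK_GIB_PER_INSTANCE)

for n in SECONDS_PER_RUN:
    print(f"{n} instance(s): {throughput(n):.2f} S/s, "
          f"{throughput_per_gib(n):.3f} S/(GiB*s)")
```

Under these assumptions the raw S/s figure is highest with 8 instances, while the per-GiB figure falls as instances are added, which matches the remark that 4 instances do better than 8 on that metric.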


#13

Can you share the S/s figures as well, please? As long as memory permits, I mostly don't care about its usage, except as a metric relevant to other scenarios and reuses of the algorithm/code. Thanks.

Can you also share more details about your setup, please? CPU, memory channels, etc. Thanks.


#14

I do not want to share any more figures until it's clear whether Zcash can be persuaded to compensate me for open sourcing it (I've already decided against submitting to the contest).


#15

Well, there will be the contest, I hope you get those 30K$ and the world gets your miner :slight_smile:


#16

That's a pity. How could Zcash possibly provide a yes/no answer re: possible compensation when you're not even willing to share the full performance figures? (No idea whether they'd consider it either way, but your position sounds like it makes this particularly unlikely.)


#17

Be careful here. When working in parallel we'll often be accessing the same memory space with multiple processors -- possibly over the same bus. So in some cases the memory bandwidth requirements may be a function of the number of processors working on a given memory space -- and not simply a constant multiple of the memory space requirements.


#18

Then you better hope what you have is better than what wins the contest... or else you're giving up on any chance of being compensated at all.


#19

Wait, why wouldn't they instead run 2 or 4 or 32 miner processes in parallel?


#20

By "2x speedup", I was referring to throughput increase achieved between two miner versions/settings when either is running whatever number of instances is optimal for the hardware (CPUs and memory channels, if it's a typical computer) in terms of throughput alone. IOW, yes, they'll run multiple instances, but this was already factored in. Also, they won't run more instances than the number of logical CPUs (e.g., 8 for a quad-core Core i7).


#21

When I suggested that Zcash would get much more mileage out of the contest if people can see and build upon my ideas as well, they asked what I wanted in return for open-sourcing it.

I proposed the following:


You may be aware of the bounties I have on

https://github.com/tromp/cuckoo

for doubling various measures of my miner's performance.
I'd love to make these bounties really substantial, without
losing any sleep over it.

I'd like to reward each 2^k improvement by k * 10BTC.

The funds would be put in escrow, with a timeout
of 4 years (when they can flow back to zcash).
Both me and zcash would need to sign for a bounty payout,
after verifying that the conditions have been met.

Chances are that very few such bounties would be claimed.
Since I have no way to apply the funds to myself, I don't lose
sleep over it. You can cap the commitment at 100BTC,
allowing for an extremely unlikely 2^10 improvement of Cuckoo Cycle.

You can also let the community chip in for the Cuckoo Cycle Bounty Fund,
and make money if Cuckoo Cycle mostly holds up for 4 years.

In summary, I want to be able to advertise 100BTC in bounties for
breaking Cuckoo Cycle without losing any sleep over it.

Maybe in future, you might even decide to add Cuckoo Cycle as an
alternative PoW in Zcash, and then all this money is just something
you would have wanted to commit anyway to gain more assurance of its
being optimization free.

As long as Cuckoo Cycle is unused, people are much less likely to
forego a huge bounty to pursue a lucrative private miner sale.

Even if zcash cannot commit funds for 4 years, you may want to see if
one of your investors is interested in providing this "insurance", for the sake
of having a much more level mining playing field.


With Zcash having declined, I'd be happy to extend this offer to any other party interested in seeing this miner as open source.

It is pure C code, multithreaded, and has a peak memory usage 47% over the average. The CUDA gpu miner is half finished, mostly needing the blake2b primitives and some tuning. Both cpu and gpu miner are standalone engines and need integration with stratum.