Easy to follow description of the equihash algorithm

I have to say, I'm unable to figure out what the equihash algoirhtm is. I've read the white paper. I'm a programmer by trade and I'm still not sure what the algorithm is.

Is there not a simple consolidated description of what the algorithm is doing? Anywhere? Can someone spend 10 minutes to describe the naive implementation of what is being solved? I'd love to know if I could potentially optimize any of the current solver code, but I have no idea what it is doing, so it's rather difficult to take a stab at.

4 Likes

second that .... :))))))

Ok, I’ll try.

Part I, the overall mining scheme since megacrypto has asked for it.

To mine you get a “work” which is first part of final block, consisting of version (4 bytes), prevhash (32), merkle root (32), reserved (32), time (4), bits (4) and nonce (32) total 140 bytes. When pool mining, you will get the first part of nonce from the pool, and only the second part you will choose by yourself. Otherwise if you sole mine then all nonce is yours.

Then you make 2**20 = 1048576 hashes, each is 50 bytes of BLAKE2b function of 140 bytes of “work” concatenated with 4 bytes of hash index. Then you split each hash in the middle to two strings of equal size, getting 2**21 = 2097152 strings of 25 bytes (200 bits) each. This requires at least 50 megabytes of memory, effectively kicking out modern FPGA and ASIC approaches to high performing mining for other coins.

Note, number 21 is equal to N / (K + 1) + 1 and number 200 is equal to N, where N = 200 and K = 9 are Equihash parameters chosen for Zcash.

Next you solve Wagner's algorithm. What is it? You must peek 2**K = 512 different strings from those 2097152, such as binary XOR of them is zero. Moreover, some other properties of your selection must be assured. First, on each split by two of your indexes the left ones must be ordered, and second, on each split by two of your indexes resulting XOR of strings must have 20*height zero bits from the left. More on algorithm is in the part II.

When you find some solution (typical nonce provide around of two solutions on average), you fill the rest of block with solution length (3 bytes in special encoding always the same value 1344) and solution itself, which is encoded by 21 bits per index, thus 21*512 bits that is exactly 1344 bytes.

There is a target, which means the difficulty barrier. You make sha256 hash of sha256 hash of your total block of 140+3+1344=1487 bytes, and then your compare your double sha256 with the target. Below the target is good, and then you can propagate your block if sole or submit your job results if you do pool mining.

Caveat of bit ordering and start of indexing everywhere.

Part II, the Wagner’s algorithm, which is the core of Zcash mining.

You will make K+1=10 steps, and in the beginning you have 2097152 original strings of 25 bytes each (200 bits). On each step you will be interested in next 20 bits from those 200 bits strings, i.e. bits from 0 to 19 on first step, 0 to 39 on second etc (counting from zero from left). On each step you try to combine pairs of strings from previous step and keep only the pairs that have common interesting bits (in other words, XOR of which is zero in interesting bits). So, on each step you will get growing binary trees of indexes in the original strings: one index on step 1, two indexes on step 2, four indexes on step 3 etc. Note that all indexes in one tree must be unique. At the final step you will have 512 unique indexes combined into desired properties of the algorithm.

There are some optimizations implemented in reference miner and more optimizations in winners of miner contest. Basically it’s about peeking correct pairs. Note that you don’t need to try each index to each, you can first sort them and then try to combine only while interested bits are the same (reference miner), or you can put strings in the buckets indexed by interesting bits (other miners), and keep those buckets compact to fit into processor caches (that’s the key of speed). Also, you can do things in parallel to use multicore CPU and GPU advantages.

11 Likes

@mtve ... I just saw that now ... thanks :))))

since the forum interface changed, i'm not catching a lot of posts !!!

Edit: now that i read it all .... this will need some good decrypting to understand what you just explained :)))))

You may find these helpful:

https://github.com/xenoncat/equihash-xenon/blob/master/notes/algorithm%20description.pdf
https://github.com/davidad/libeqh/blob/master/doc.pdf
http://www.openwall.com/articles/Zcash-Equihash-Analysis

These are, in order: algorithm description from the CPU winner of the Challenge, problem description from another Challenge submitter team (who unfortunately started late and never completed their solver), and the article I wrote (on Zcash Co's request and with their support) on topics including Equihash optimization potential.

3 Likes

hi @mtve could you please explain this “solution length (3 bytes in special encoding always the same value 1344)” ?

how can do this ? what do you mean with special encoding ?
Cheers

ZCash block header structure is described in https://github.com/zcash/zips/blob/master/protocol/protocol.pdf paragraph 6.3

Solution length is always 1344 bytes and this number is encoded (serialized) in so-called compactSize encoding from bitcoin.

For the value in the range [254;65535] it’s a three bytes (as intended for this field in the header) 0xfd 0x40 0x05, where the first is the marker (253 in decimal) and the second and the third are 1344 decimal in little-endian format (so 0x0540=0d1344):
https://github.com/zcash/zcash/blob/master/src/serialize.h#L254
https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer

See also my answer here:

Hey im a new dev in the scene and im wondering if i can work with someone to figure out why my solutions are too low difficulty. Im pretty sure im finding the collisions 20 bits at a time per round but the end product doesnt seem to be working for the local pool i setup. Any help would be appreciated.

Thanks!

you may try to post here your calculations, full block header including nonce, some steps of your algorithm including results, so hopefully somebody will try to check it.

if you implementing standard Equihash algorithm you may take any working miner, add printing some debug outputs to it on each step, and then compare its output to yours, just to figure out where you start to differ.

if you have some original algorithm and don’t want to disclosure it, just post your input and your solutions.

Link to input and solution

Ok here is the input and results I have. Maybe we can discuss how i can put the input into a miner and see the results they get as well.

@mtve
here is the logic i planned out by round:

N - 12 bit row (4096 rows)
B - 8 bit bin

Round 0

Source

xxxx xxxx xxxx xxxx xxxx xxxx 0000 0000 xi6 v3.s0
0000 0000 0000 0000 0000 0000 0000 0000 xi5 v2.s4567
0000 0000 0000 0000 0000 0000 0000 0000 xi4 v2.s0123
0000 0000 0000 0000 0000 0000 0000 0000 xi3 v1.s4567
0000 0000 0000 0000 0000 0000 0000 0000 xi2 v1.s0123
0000 0000 0000 0000 0000 0000 0000 0000 xi1 v0.s4567
0000 0000 0000 0000 0000 NNNN NNNN NNNN xi0 v0.s0123

Destination - Actual memory

0000 0000 0000 0000 0000 0000 0000 0000 xi5 v2.s567, v3.s0
0000 0000 0000 0000 0000 0000 0000 0000 xi4 v2.s1234
0000 0000 0000 0000 0000 0000 0000 0000 xi3 v1.s567 v2.s0
0000 0000 0000 0000 0000 0000 0000 0000 xi2 v1.s1234
0000 0000 0000 0000 0000 0000 0000 0000 xi1 v0.s567 v1.s0
0000 0000 0000 0000 0000 0000 0000 NNNN xi0 v0.s1234

Round 1

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi5
0000 0000 0000 0000 0000 0000 0000 0000 xi4
0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 NNNN NNNN NNNN BBBB BBBB RRRR xi0 //bin = xi0 >> 4 & 0xFF, row = xi0 >> 12 & 0xFFF

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi5
0000 0000 0000 0000 0000 0000 0000 0000 xi4
0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 NNNN NNNN NNNN BBBB BBBB RRRR xi0

Round 2

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi5
0000 0000 0000 0000 0000 0000 0000 0000 xi4
0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 NNNN NNNN NNNN xi1
BBBB BBBB RRRR RRRR RRRR BBBB BBBB RRRR xi0 //bin = xi0 >> 24

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi4
0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 0000 0000 NNNN NNNN NNNN xi0 //row = xi0 & 0xfff

Round 3

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi4
0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
NNNN NNNN NNNN BBBB BBBB RRRR RRRR RRRR xi0//bin = xi0 >> 12 & 0xff, row = xi0 >> 20

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 0000 0000 0000 0000 0000 xi0

Round 4

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 NNNN NNNN NNNN BBBB BBBB xi0

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 NNNN NNNN NNNN BBBB BBBB xi0

Round 5

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi3
0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 NNNN NNNN xi1
NNNN BBBB BBBB RRRR RRRR RRRR BBBB BBBB xi0

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 0000 0000 0000 NNNN NNNN xi0

Round 6

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 NNNN NNNN NNNN BBBB BBBB RRRR RRRR xi0

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 NNNN NNNN NNNN BBBB BBBB RRRR RRRR xi0

Round 7

Source

0000 0000 0000 0000 0000 0000 0000 0000 xi2
0000 0000 0000 0000 NNNN NNNN NNNN BBBB xi1
BBBB RRRR RRRR RRRR BBBB BBBB RRRR RRRR xi0

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 0000 xi1
0000 0000 0000 0000 NNNN NNNN NNNN BBBB xi0

Round 8

Source

0000 0000 0000 0000 0000 0000 0000 NNNN xi1
NNNN NNNN BBBB BBBB RRRR RRRR RRRR BBBB xi0

Destination - Actual Memory

0000 0000 0000 0000 0000 0000 0000 NNNN xi0

Round 9

Source

0000 0000 0000 0000 0000 BBBB BBBB RRRR xi0

round 0 input:
Blake 2b (128 bytes):
07633EA25BDF0121

should be 50 bytes divided by two parts, so 25 bytes each string on input. you say 128 bytes and show 8 bytes.

for this header input strings should be

    0 302b73bb4f258cf918ad9d9555321f5f7a9e88392a1a4b674e
    1 4c420c5ecba665b5c412edda230dc93a4fce4956bc8f8dce74
    2 016ee597094962b6410ecd9d23eaf4f513d84746e956facfae
    3 9ce8cb1147f17ac3c44113d59e471b9d4c05f5f431894a9f91
    4 ed141f007be5df50dc2bd9160eac96768257196a35b1361200
    5 5d83aa700ee391015da178ff48e2a6797610891772b9896cfe
    6 aaeae37ac85690fb3d523a9f8620555f945711416bfa54c3e9
    7 ce78ce97072f416ab1654d1671d004cebba0e6fdaaf642379a
    8 a594b8f6d3c71658a668256510423d66d6a1eba2bd572350fb
    9 94399347c41cf339fbb347aeed0222bf97f78d91d0ff9a5e7b
    ...

would you please redone from here? your following notations are not clear for me.

1 Like

The 128 bytes were split up into 16 lines of 8 bytes because I thought it would be easier to read. You see I pass in 25 ulongs (ulong Blake[25]) as input into round one.

I’ll take more time to make sure what I post makes sense.

Thanks

@mtve looks like when i called BlockHeader.GetHash it was returning the wrong values, which was causing the wrong solution to be sent to the pool and potentially correct solutions being dropped.

Also the way i was doing the collision’s was not the “same” as other algorithms and I dont know if that caused also the wrong solutions to be found. I have yet to revert back to the old algo to see if that works or not.

Thanks for all your help and tips about how to debug this.

-maztheman

1 Like

Please, somebody explain this part for Dummies.
I can not figure out what the tree should look like.
How to store values and indexes …

You may find some clues in the presentation https://www.cryptolux.org/images/3/3b/Equihash-ndss.pdf slides 15-19, and in the other resources from https://www.cryptolux.org/index.php/Equihash

1 Like

Hello, I’m new to this forum. I’am a programmer with 20 years of experience and I’am very good in code optimizations. Just started to write my own Equihash solver. I am writing it in other less common language Pascal (Lazarus). Pascal is manny times much more faster than other languages in case of speed, so I want to try my coding skills. Firstly I started to write methods to communicate with pool, it is done now, secondly i have written my own Blake2b algo in pure assembler. It is faster or at least as fast as other implementations. Now I’m attempting to write Wagner’s part of Equihash but stuck on Blake2b result. I have read many of papers about Equihash but they are not clear for me or there are some omissions in these publications.

Can someone direct me to proper way of executing Blake2b?

Let’s assume that I have an input given on this page:
040000007003225F30E1015EE14609B222E6036B7572963E18355688A966BB0E00000000E53B8DC142BBF69584F26B5F1D2125B79F238E77D04A37CDFAE64D3CFE9B2E730000000000000000000000000000000000000000000000000000000000000000C85BDA595DB6121C4FFFFFFF05000000000000000000000000000000000000000000000000000000

@mtve says that input should be concatenated with 4 Bytes of hash index. So I assume it will be 144 Bytes long (140 base + 4 index) giving input to Blake2b like this?:
040000007003225F30E1015EE14609B222E6036B7572963E18355688A966BB0E00000000E53B8DC142BBF69584F26B5F1D2125B79F238E77D04A37CDFAE64D3CFE9B2E730000000000000000000000000000000000000000000000000000000000000000C85BDA595DB6121C4FFFFFFF0500000000000000000000000000000000000000000000000000000000000001

Should index strat with 0 or 1 (above is 1) ?
One of parameters in Blake2 method is output length. What should I set in this param?
If it is Blake2b-256, should I set 32 (32 bytes x 8 bits = 256). Am I right to this point?
Can someone provide result of fist Blake2b first iteration with given header?

I understand from papers that I have to truncate result to 25 Bytes? Am I right? Truncate left or right?

Can someone give me a proper explanation to this point what should be done with header, parameters to Blake2B and finally result of this function? Maybe someone has a EXE with debug mode giving also results of Blake2b?

Thanks in advance.
Lukas.

4 bytes of index should be in little-endian order, “01000000” for index 1.

index starts with 0.

BLAKE2b output length should be 50, and don’t forget so-called “personal” parameter of BLAKE2b instance which is string “ZcashPoW” and Zcash parameters N=200 and K=9 again as 4-bytes intereges in little-endian order.

Resulting 50 bytes of each BLAKE2b hash you split in the middle, which gives you two 25 bytes strings.

I believe first correct strings for this blockhead are given earlier in this thread, i can recheck if you get another strings.

1 Like

I managed to retrieve good hashes. Thanks for help. Moving on forward with miner.