Pedersen Hash Collision Resistance

The Pederson hash function Pederson(D,M) is defined on a personalization input, D, and a the message, M, that we want to hash. According to the zcash protocol document the Pedersen hash has the following collision properties:

I would like to first confirm my understanding of the the above statement:

  1. There is no way to cause a collision between Pedersen(D,M) and Pedersen(D’,N) where D not equal D’ for all M,N
  2. If someone knew the personalization input D he can use use a message N with a different length than M to cause a collision (i.e Pedersen(D,M) = Pedersen(D,N)

Suppose we had a commitment scheme where the Pedersen hash is used to hash the note into a commitment. In this scheme, a user would prove ownership of the commitment on the Merkle tree by proving that he has a note that hashes to the commitment. Would it not be possible in such a scheme for an attacker to falsely prove that he has a note that hashes to a commitment on the Merkle tree. He would need to create a fake note with a different length than the real note and by #2 he can cause a collision.

If the above is possible how does the zcash protocol protect against such an attack?

2 Likes

1 and 2 are correct. 1 is a stronger property than the spec claims. The spec claims nothing about collisions across different personalisations. If the hash were allowed to take zero-length inputs, then those would have the same hash for different personalisations, but actually it’s only defined on nonzero-length inputs. It is possible to prove --although it isn’t proven in the spec-- that for such inputs, property 1 holds based on hardness of discrete log in the random oracle model (for BLAKE2b as the random oracle).

The Zcash protocol never uses the hash with a variable-length input.

3 Likes

Note that if you want to use this hash for variable-length inputs, it is sufficient to prefix (not suffix) the input with a fixed-length encoding of the length.

3 Likes

Thank you for your reply Daira,

You say, “The Zcash protocol never uses the hash with a variable-length input” but would it be possible for an attacker to somehow modify his input to the Pedersen hash to change its length. Is there a length check in the proof protocol to prevent this?

2 Likes

No that would not be possible. There doesn’t need to be an explicit check, because all inputs to the Pedersen hash and to the Pedersen commitments are constructed from fixed-length field encodings.

As an additional question : where I can find the Pedersen implementation used in the Zcash circuits ?

Maybe here ?

2 Likes

How is discrete log related for finding a collison? I m meaning finding a logarithmic relation between which numbers and how.

This is because I found an implementation that use 62 bits numbers for outputing 124 bits hashes (62 bits X and 62 bits Y). That seems weak to me but that s still too strong to bruteforce (without conterexamples the scheme is assumed resistent).

That sounds extremely weak to me too. If the output of the hash is a curve point with 62-bit coordinates, there are, depending on the curve, at most ~2^63 valid hash output points (because for each valid x there are at most two possible valid values of y, one positive and one negative), so the hash would be vulnerable to a birthday attack requiring at most ~2^32 effort. It should be trivial to find a collision.

2 Likes

Question, why are you square rooting that 2^62? If 2^32 is the real effort here we will assist to some collision very soon.

If a hash function’s output space is n bits, then you can find a collision in 2^(n/2) work using a birthday attack. The intuition for why this works is that given k hash function outputs, there are around k^2 pairs between them, and each pair has a 1/2^n chance of colliding. So given 2^(n/2) outputs it’s likely that there will be a collision somewhere in that set of outputs, and you can use pollard’s rho algorithm to find a collision without using much memory.

2 Likes

No, as the packed coordinate forms a 124 bits hash (62 bits x and 62 bits y on the BabyJubjub curve with a 63 bits finite field). So that the system works in an environment dealing with 128bite max Integers.

Finding the discrete log seems to be the tool as it would be too large for a birthday attack. But if I find the discrete log over BabyJubjub between Generators with that by the way 4 bits window implementation, how to use it in order to create a collision on a same bits length number (496 bits)?

A 124-bit hash produced by using 62-bit x and y coordinates would not actually have the same security as a 124-bit hash, because there are far less than 2124 points on such a curve, as explained above. I’m a bit confused because BabyJubjub’s coordinates are ~254 bits, not 62 bits.

Perhaps you’re looking at an implementation of BabyJubjub that uses four 64-bit integers to represent a field element? Or it’s taking some part of the coordinate to use in the hash?

Feel free to DM me the name of the project and I’ll check if there’s a problem and report any I find to the developers.

There is a relation between x and y. There would be at best 63 bit of entropy.

1 Like

Yes, I was talking about hash length, not strengh. But for birthdays attack there s far more than 2**32 as a result.

Just explain me what I can do with the discrete logarithm relation between the generators in order to build a collison.

Yes, but that doesn t change to blind bruteforce the packed coordinate. Except for negative cases.

The birthday attack should still work even if the hash length is 124 bits. I’d need to see a precise spec of the hash function to know how to turn a discrete log relation into a collision for it.

4 bits Windows were the flipping bit is the 4th bit and shorter chunks per generators (more exactly, each chunks use 10 Windows from input).
I m meaning each independant chunks is seeded by a starting point on the babyjubjub curve (x,y). Then babyjubjubadd those resulting chunks in order to get the final point to be packed.
Each chunk is independent in the way that bits from that aren t part of the chunk don t influence the result/coordinate of the chunk.

Those seeding points are genererated pseudorandomly where what matters is the coordinate has to be on the curve (then used statically in the code).

By the way, even if it is that time impossible, I don t understand how finding the discrete log on the Pedersen generators of Zcash can affect Zcash too.