Pedersen Hash Collision Resistance

musi00 · May 23, 2019, 5:46pm

The Pederson hash function Pederson(D,M) is defined on a personalization input, D, and a the message, M, that we want to hash. According to the zcash protocol document the Pedersen hash has the following collision properties:

I would like to first confirm my understanding of the the above statement:

There is no way to cause a collision between Pedersen(D,M) and Pedersen(D’,N) where D not equal D’ for all M,N
If someone knew the personalization input D he can use use a message N with a different length than M to cause a collision (i.e Pedersen(D,M) = Pedersen(D,N)

Suppose we had a commitment scheme where the Pedersen hash is used to hash the note into a commitment. In this scheme, a user would prove ownership of the commitment on the Merkle tree by proving that he has a note that hashes to the commitment. Would it not be possible in such a scheme for an attacker to falsely prove that he has a note that hashes to a commitment on the Merkle tree. He would need to create a fake note with a different length than the real note and by #2 he can cause a collision.

If the above is possible how does the zcash protocol protect against such an attack?

daira · May 24, 2019, 9:26am

1 and 2 are correct. 1 is a stronger property than the spec claims. The spec claims nothing about collisions across different personalisations. If the hash were allowed to take zero-length inputs, then those would have the same hash for different personalisations, but actually it’s only defined on nonzero-length inputs. It is possible to prove --although it isn’t proven in the spec-- that for such inputs, property 1 holds based on hardness of discrete log in the random oracle model (for BLAKE2b as the random oracle).

The Zcash protocol never uses the hash with a variable-length input.

daira · May 24, 2019, 1:03pm

Note that if you want to use this hash for variable-length inputs, it is sufficient to prefix (not suffix) the input with a fixed-length encoding of the length.

musi00 · May 24, 2019, 3:03pm

Thank you for your reply Daira,

You say, “The Zcash protocol never uses the hash with a variable-length input” but would it be possible for an attacker to somehow modify his input to the Pedersen hash to change its length. Is there a length check in the proof protocol to prevent this?

daira · May 25, 2019, 5:32am

No that would not be possible. There doesn’t need to be an explicit check, because all inputs to the Pedersen hash and to the Pedersen commitments are constructed from fixed-length field encodings.

ytrezq · July 11, 2023, 10:22pm

As an additional question : where I can find the Pedersen implementation used in the Zcash circuits ?

james_katz · July 11, 2023, 11:03pm

Maybe here ?

ytrezq · August 27, 2023, 4:47pm

How is discrete log related for finding a collison? I m meaning finding a logarithmic relation between which numbers and how.

This is because I found an implementation that use 62 bits numbers for outputing 124 bits hashes (62 bits X and 62 bits Y). That seems weak to me but that s still too strong to bruteforce (without conterexamples the scheme is assumed resistent).

earthrise · August 28, 2023, 12:27am

That sounds extremely weak to me too. If the output of the hash is a curve point with 62-bit coordinates, there are, depending on the curve, at most ~2^63 valid hash output points (because for each valid x there are at most two possible valid values of y, one positive and one negative), so the hash would be vulnerable to a birthday attack requiring at most ~2^32 effort. It should be trivial to find a collision.

Cris83 · August 28, 2023, 8:08am

Question, why are you square rooting that 2^62? If 2^32 is the real effort here we will assist to some collision very soon.

earthrise · August 28, 2023, 3:39pm

If a hash function’s output space is n bits, then you can find a collision in 2^(n/2) work using a birthday attack. The intuition for why this works is that given k hash function outputs, there are around k^2 pairs between them, and each pair has a 1/2^n chance of colliding. So given 2^(n/2) outputs it’s likely that there will be a collision somewhere in that set of outputs, and you can use pollard’s rho algorithm to find a collision without using much memory.

ytrezq · September 21, 2023, 6:01pm

No, as the packed coordinate forms a 124 bits hash (62 bits x and 62 bits y on the BabyJubjub curve with a 63 bits finite field). So that the system works in an environment dealing with 128bite max Integers.

Finding the discrete log seems to be the tool as it would be too large for a birthday attack. But if I find the discrete log over BabyJubjub between Generators with that by the way 4 bits window implementation, how to use it in order to create a collision on a same bits length number (496 bits)?

earthrise · September 21, 2023, 10:41pm

A 124-bit hash produced by using 62-bit x and y coordinates would not actually have the same security as a 124-bit hash, because there are far less than 2¹²⁴ points on such a curve, as explained above. I’m a bit confused because BabyJubjub’s coordinates are ~254 bits, not 62 bits.

Perhaps you’re looking at an implementation of BabyJubjub that uses four 64-bit integers to represent a field element? Or it’s taking some part of the coordinate to use in the hash?

Feel free to DM me the name of the project and I’ll check if there’s a problem and report any I find to the developers.

hanh · September 21, 2023, 11:14pm

There is a relation between x and y. There would be at best 63 bit of entropy.

ytrezq · September 22, 2023, 2:49pm

Yes, I was talking about hash length, not strengh. But for birthdays attack there s far more than 2**32 as a result.

Just explain me what I can do with the discrete logarithm relation between the generators in order to build a collison.

ytrezq · September 22, 2023, 2:54pm

Yes, but that doesn t change to blind bruteforce the packed coordinate. Except for negative cases.

earthrise · September 22, 2023, 8:43pm

The birthday attack should still work even if the hash length is 124 bits. I’d need to see a precise spec of the hash function to know how to turn a discrete log relation into a collision for it.

ytrezq · September 23, 2023, 7:01am

4 bits Windows were the flipping bit is the 4th bit and shorter chunks per generators (more exactly, each chunks use 10 Windows from input).
I m meaning each independant chunks is seeded by a starting point on the babyjubjub curve (x,y). Then babyjubjubadd those resulting chunks in order to get the final point to be packed.
Each chunk is independent in the way that bits from that aren t part of the chunk don t influence the result/coordinate of the chunk.

Those seeding points are genererated pseudorandomly where what matters is the coordinate has to be on the curve (then used statically in the code).

By the way, even if it is that time impossible, I don t understand how finding the discrete log on the Pedersen generators of Zcash can affect Zcash too.

Topic		Replies	Views
Batching Cipolla-Lehmer-Müller's square root algorithm with hashing to elliptic curves Applications	15	845	May 5, 2023
Hiring programmer to modify Python code for different n,k parameters Mining	2	970	February 17, 2020
Pairing crate for Rust Community Collaborations	0	1687	October 12, 2017
Zcash VS Bytecoin Technology	4	5344	February 4, 2016
[RFC] Explicit UTXO Selection in z_sendmany for Greater Control zcashd client	4	112	March 11, 2025

Pedersen Hash Collision Resistance

Related topics