I’m hoping someone can help me with this. I’ve written the code twice, two different ways but still getting the same problem. i.e. Way too many duplicates! Is it normal that after every round the resulting hashes increase? I’ve tried various ways to cut down on duplicates but nothing seems to work. Either I end up with 2 million solutions at the end or no solutions or trillions of solutions.
Just to confirm in case I’m making a fundamental error… after sorting by the first 20 bits, say you have 4 colliding hashes: A, B, C, D. Out of those 4 I’m making 6 index tuples e.g.
A.B
A.C
A.D
B.C
B.D
C.D
That’s correct right?
I’ve tried knocking out collisions based on a histogram table ( > 13) but it doesn’t help much. I’ve also tried knocking out those that result in solution too early. Again doesn’t solve the problem. I ether get too many solutions or it runs out of hashes before it gets to step 9.
You should only combine A.B where the indices in A and B are disjoint.
Since testing index disjointness is somewhat involved, you can instead use the following trick.
// To eliminate trees with duplicated indices,
// we simply test if the last 32 bits of the xor are 0,
// and if so, assume that this is due to index duplication
// In practice this works very well to avoid bucket overflow
// and produces negligible false positives
Thanks for the tip! Just to be clear, the “last 32 bits of the xor” includes the 20 bit word? So in other words after the xor, you expect the 20 bits to be zero but if there are an additional 12 bits zero then assume duplicate indices and discard?
No, it doesn’t include the 20 bit word. This is the code testing whether the final 32 bits of the xor will be 0 (the current hashes have prevhtunits many words):
// test whether two hashes match in last TREEBITS bits
bool equal(const htunit *hash0, const htunit *hash1) const {
return hash0[prevhtunits-1].word == hash1[prevhtunits-1].word;
}
Appreciate the clarification. One more question, how effective is this method at getting rid of false positives due to duplicates? e.g. If a particular block header has only one correct solution, is it likely that only one solution will be produced? I’m fiddling with my code and thanks to your help I’m getting 8 solutions (as opposed to millions). I’m wondering if a few false positives are inevitable? (requiring checking every solution).
You’ll of course need to do a full index disjointness test in the final round. But that will only eliminate a tiny fraction, as most of what you have at that point are real solutions.
I’m getting 8 solutions
That’s just one instance. Try running a thousand different instances and you’ll see the average number of solutions is closer to 1.89