Long-term technology roadmap for scalable financial privacy

Cryptographer James A. Donald, who back in 2008 became the first person to comment on Satoshi’s proposal for Bitcoin, recently shared his thoughts on making financial privacy scalable. I thought it would be interesting to share an edited version here.

Does the Zcash community have a similar vision of scalability for privacy-enabling blockchains? If so, how much does it have in common with Donald’s?

For full scalability, we want to end with a proof that a merkle root is the root of the set of valid transaction outputs, with no one person having the full set of valid transaction outputs, nor the full set of transactions necessary to prove that those outputs are valid. Rather each full peer keeps around those branches of the merkle tree that contain the outputs and transactions that he cares about.

[For that], you need a proof generation and verification engine that operates on a virtual machine.

And you need a programming language for this virtual machine, so that you can do actual operations on immense piles of data, and produce a concise proof that you did the operation on a pile of data presented by the root of a merkle tree, that the merkle tree has a certain property – even though no one party has the full, and potentially vast, merkle tree, rather every party has those branches of the merkle tree that he cares about.

We want to prove that each transaction output is an output from a transaction that is facially valid, its inputs are equal to its outputs, that each transaction input was consumed in one, and only one transaction, and that this merkle root represents a collection of transactions with a certain weight of stake or weight of work, so that if there are competing merkle roots, which should happen from time to time due to one merkle root containing one branch of a double spend, and another merkle root containing the other branch of a double spend, everyone takes the heaviest root as authoritative, and building the next merkle root on the heaviest merkle root of which he is aware, so that the total weight grows forever, each merkle root having its own weight, plus the weight of all the roots on which it is built.

Fundamentally, ZK-Starks prove that the prover knows a solution to an NP-complete problem. Unfortunately the proof is quite large, and the relationship between that problem, and anything that anyone cares about, extremely elaborate and indirect.

So you need a language that will generate such a relationship. And then you can prove, for example, that a hash is the hash of a valid transaction output, without revealing the value of that output, or the transaction inputs.

But if you have to have such a proof for every output, that is a mighty big pile of mighty big proofs, costly to evaluate, costly to store the vast pile of data.

So, rollups. You produce a proof that you verified a pile of proofs. You organize the information about which you want to prove stuff into a merkle tree, and the root of the merkle tree is associated with a proof that you verified the proofs of the direct children of that root vertex. And proof of each of the children of that root vertex proves that someone verified their children. And so forth all the way down to the bottom of the tree, the origin of the blockchain, proofs about proofs about proofs about proofs.

And then, to prove that a hash is a hash of a valid transaction output, you just produce the hash path linking that transaction to the root of the merkle tree. So with every new block, everyone has to just verify one proof once. All the child proofs get thrown away eventually.

Which means that peers do not have to keep every transaction and every output around forever. They just keep some recent roots of the blockchain around, plus the transactions and transaction outputs that they care about. So the blockchain can scale without limit.

The root language for zero-knowledge rollups needs to be a language that drives a virtual machine operating on merkle trees, and written in this root language, there needs to be a domain specific language for each particular blockchain, which blockchain must be structured as a roughly balanced merkle tree, rather than as a chain.