Can Zcash scale to a million users?

I started replying to the quote below:

and then realised that topic was mis-named (it is actually proposing a specific solution, not discussing the general question), so I’m creating this topic for the general discussion.

Someone asked me some similar questions a few weeks ago:

If 10 million people tried at the same time to make a transaction what would happen?

I responded with the following rough arithmetic, which assumes 2-in 2-out transactions (400 bytes for transparent, 2800 bytes for Sapling), as that seems more reasonable than 1-in 2-out (since it allows for merging notes as well as splitting them). I assume that all 10 million transactions are either only-transparent, or only-Sapling.

  • A block can fit 5000 transparent txs, or around 714 Sapling txs. So that’s around 67 transparent txs per second, or 9.5 Sapling txs per second.
  • If 10 million transactions were created at once, and sent to a node with the default for -mempooltxcostlimit, all but 20,000 transactions would be dropped.
    • The mempool size limit exists for DoS mitigation, per ZIP 401. Individual node operators can bump this limit, but the default is set such that the mempool can hold 40 blocks' worth of transactions, which is the default expiry height.
    • At these transaction sizes, both transparent and Sapling txs have the minimum mempool cost, under current cost weightings (that were selected to ensure Sapling transactions were not evicted from the mempool preferentially over transparent transactions).
    • If spread uniquely across the 150-200 good zcashd nodes we have, that could be up to 3-4 million txs in the global mempool.

Now assume no mempool size restrictions; all 10 million transactions are in the mempool.

  • If they were all Sapling, it would take around 14,000 blocks to mine them all (over 12 days).
    • The default transaction expiry is 40 blocks, and almost all Sapling transactions use the default, so we could realistically only mine around 28,500 transactions before the remainder expired. This is intentional behaviour, per ZIP 203.
  • If they were all transparent, it would take around 2000 blocks to mine them all (a bit under 2 days).
    • AFAIK transparent transactions usually don’t bother to set a transaction expiry (because they are created with Bitcoin libraries for which expiry is a foreign concept, and we didn’t make transaction expiry a required field), so the funds would be stuck in the mempool until the transactions were mined.
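The arithmetic behind these numbers can be sketched in a few lines of Python. The 2 MB block size, 75-second block interval, and ZIP 401 cost parameters are assumptions consistent with the figures quoted above, not values read out of zcashd:

```python
# Back-of-envelope throughput figures, assuming 2 MB blocks, a
# 75-second block target, and 2-in 2-out transactions (400 bytes
# transparent, 2800 bytes Sapling).

BLOCK_SIZE = 2_000_000       # bytes (assumed max block size)
BLOCK_INTERVAL = 75          # seconds (assumed target spacing)
TRANSPARENT_TX = 400         # bytes, 2-in 2-out
SAPLING_TX = 2_800           # bytes, 2-in 2-out

txs_per_block_t = BLOCK_SIZE // TRANSPARENT_TX   # 5000 txs/block
txs_per_block_s = BLOCK_SIZE // SAPLING_TX       # 714 txs/block

tps_t = txs_per_block_t / BLOCK_INTERVAL         # ~67 tx/s
tps_s = txs_per_block_s / BLOCK_INTERVAL         # ~9.5 tx/s

# ZIP 401 mempool limit: at these sizes every tx is charged the
# minimum cost, so the default limit caps the mempool at 20,000 txs.
MEMPOOL_COST_LIMIT = 80_000_000    # assumed ZIP 401 default
MIN_TX_COST = 4_000                # assumed ZIP 401 cost floor
mempool_capacity = MEMPOOL_COST_LIMIT // MIN_TX_COST   # 20,000 txs

# With no mempool limit, clearing 10 million txs takes:
blocks_s = 10_000_000 / txs_per_block_s          # ~14,000 blocks
days_s = blocks_s * BLOCK_INTERVAL / 86_400      # ~12 days
blocks_t = 10_000_000 / txs_per_block_t          # 2000 blocks
days_t = blocks_t * BLOCK_INTERVAL / 86_400      # just under 2 days

# The 40-block default expiry (ZIP 203) caps mined Sapling txs at:
expiry_window = 40 * txs_per_block_s             # ~28,500 txs
```

Rounding aside, these reproduce every figure in the list above.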

You are correct: because detection of inbound notes cannot currently be performed without trial-decrypting every output in the chain, the time to sync a wallet to its fully-accurate balance is linear in the number of new Sapling outputs (and thus will increase both with length-of-time since last sync, and with more Sapling transactions per block).
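As a concrete illustration of why this is linear, a wallet's detection loop is essentially the following sketch, where `try_decrypt_note` is a hypothetical stand-in for the real Sapling note-decryption routine (the actual protocol is considerably more involved):

```python
# A wallet has no index telling it which shielded outputs are hers,
# so it must attempt decryption on every output in every block.

def scan_chain(blocks, incoming_viewing_key, try_decrypt_note):
    """Trial-decrypt every shielded output; O(total outputs)."""
    my_notes = []
    for block in blocks:
        for tx in block["transactions"]:
            for output in tx.get("shielded_outputs", []):
                note = try_decrypt_note(incoming_viewing_key, output)
                if note is not None:       # decryption succeeded,
                    my_notes.append(note)  # so this output is ours
    return my_notes
```

Every output must be touched whether or not it belongs to the wallet, which is exactly the linear scan discussed above.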

One way around this is to move detection out-of-band, by having the sender tell the recipient where on-chain the transaction is mined (and optionally give them the note directly, though the recipient still needs to have the chain data to create a witness). But that requires an interactive channel, and doesn’t help for non-interactive payments.

Another way around this, at least from a usability perspective, is to not require wallet balances to always be fully-accurate. I have a design I’m working on for a way to sync wallets that should be orders of magnitude faster in the average case, at the expense of a slightly more complex UX (which I believe @geffen can handle :upside_down_face:). It still doesn’t avoid the linear scan for detecting new notes, however.

Moving to ideas that require network upgrades: if we can figure out a way to make detection keys usable, this would allow light clients to offload detection to a third party. The third party would learn which transactions belong to the user (possibly with some noise depending on the protocol), but would not learn transaction contents. If the protocol were efficient enough and suitably compatible, it might be possible to execute it in a PIR scheme to greatly reduce what the third party learns.

None of this addresses the on-chain scaling issue: there’s a limit to how much data we can shove into blocks, and how long a block can take to verify. I’m not going to go into that right now though, because it’s 1:20am :sweat_smile:

7 Likes

I spend about half the post discussing and then answering the question, then propose a solution for how Zcash can handle that scale and more with greater ease (imo). So I think the post is aptly titled :slight_smile: But happy to accept alternate title suggestions if it helps keep the forum better organized.

This could be interesting to have as a “best case scenario” option for users. I’m imagining a service like Signal, where the username is the user’s shielded address and messages are encrypted directly to the address as they are with shielded memos. A “sealed sender” type of feature would protect sender anonymity. The user’s wallet will receive the notification right away or next time they turn their device on/connect to the internet, and know exactly where to find their transactions. Then as you say, for users who don’t have such an interactive channel they will have to use existing protocols (or others under development) to get their wallet balances.

I’ve been wondering if anyone is working on PIR for wallet balance info, good to know some folks are thinking about it!

Thanks for the reply @str4d, interesting ideas to think about here.

Awesome post. I think many people forget that Zcash's privacy tech isn't really the scaling limitation for Zcash; it's just the standard consensus problems.

SNARKs might be slow to prove, but they are fast to verify, and some software optimizations can make them way faster. Moreover, even in Bitcoin, transaction verification is not remotely the bottleneck.

For Zcash: you do need to remove the need to scan the blockchain to detect payments, e.g. with payment URLs sent over Signal/WhatsApp/any messenger, or maybe we build a mixnet into Zcash. But after that, things are simple.

To show Zcash can scale, I think it's useful to consider scaling Zcash in the dumbest way possible: bigger blocks. Not because this is a good idea, but because it's so simple we can see what breaks and extrapolate to find what we need to solve. So what does Zcash with, say, 100x bigger blocks look like (or if that's too big, 10x)?

Well, we should probably make the txs as small as possible, so we will drop the memo.
Now look at data usage, tx verification time, etc.

It's 1 AM, so we can wait and do the math later. But the answer here is illuminating. The answer, I think, is that it mostly works on reasonable hardware. Bandwidth usage goes up, and it gets much harder to validate the chain from zero as a new node. But we have checkpoints, and FlyClient lets us probabilistically check the chain. So then the question is what breaks, and what we can learn from this thought experiment.

6 Likes

This is a really useful way to think about it.

Another way is, in the Tor ecosystem, there are several chat apps like Ricochet that use Tor onion services to send messages. And it seems like adding Tor is a prudent next step for network layer privacy in Zcash anyway.

Once we add Tor and most Zcash wallets have a Tor daemon running, a user’s Zcash address can become “z-address@onion-url”. If both users are online, delivering the transaction via Tor is fast and easy.

Keeping an iOS app permanently connected to Tor is not possible yet, but there’s work happening on this and there’s a path forward. And in the meantime you could have some centralized sealed-sender + push notification scheme like the one above that would reveal the recipient to the server but not the sender.

Or you could have some dumping ground for undeliverable messages that was more efficient to sync than "download and check all messages" and relied more on Tor for protecting the recipient's privacy.

Having an in-band way of sending p2p messages privately also makes it really easy to retire the memo field.

@secparam — we were talking about the syncing bandwidth requirements of a client that was capable of receiving all tx’s out-of-band. What were they? Like 1 MB / day?

2 Likes

With a depth-32 Merkle tree, 64 bytes per hash, and 175-second blocks:
(32 × 64) bytes per block × 1/175 blocks per second × 1 day ≈ 1.01 MB.
This is a stylized-facts view of Zerocash/Zcash. Implementation details will vary, but not by more than an order of magnitude. So worst case, it uses as much bandwidth as visiting Twitter's front page 5 times.

And that holds no matter how big you make the blocks (you'll just fill up the Merkle tree faster and need to switch to a new one sooner if they get stupidly big), and whether you use Groth16 or Halo for proofs.
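The back-of-envelope bandwidth figure is easy to check (a sketch using the stylized numbers above; note that a 75-second block target, which I believe is what mainnet uses post-Blossom, would roughly double it):

```python
# Daily bandwidth for a client that receives its txs out-of-band and
# only needs per-block Merkle tree data to maintain witnesses.

DEPTH = 32            # Merkle tree depth (stylized)
HASH_BYTES = 64       # bytes per hash (stylized)
BLOCK_INTERVAL = 175  # seconds, as in the post above
SECONDS_PER_DAY = 86_400

bytes_per_block = DEPTH * HASH_BYTES                       # 2048 bytes
bytes_per_day = bytes_per_block * SECONDS_PER_DAY / BLOCK_INTERVAL
mb_per_day = bytes_per_day / 1_000_000                     # ~1.01 MB
```

Notably, `bytes_per_block` is independent of block size, which is why the figure holds even with much bigger blocks.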

And @john-light, you are describing exactly what payment URLs enable. We don't even need Signal to support Zcash to do it.

Yep, I also had Ricochet in mind when I wrote that comment. And Cwtch, which I think also has features for asynchronous messaging using central servers.

Yes, I like the idea :slight_smile: Ideally there will be some standard e2e-encrypted and anonymized way to send the payment URL to the recipient, so that people don't revert to non-private methods of sending the URLs, thus defeating the purpose of the shielded tx.

I see what you’re saying, but surprisingly, it’s neither needed nor possible. You want anonymous communication between a seller and a buyer, but Zcash isn’t a communications tool; it’s a payment tool with a notification feature.

If you chat back and forth with someone about buying their WoW account and then pay them in shielded Zcash, are you anonymous? Well, if you used SMS for the chat, no: the seller learns 203-555-1234 bought the WoW account from the chat, and they can link that to the shielded payment they received even if you didn’t give it to them over SMS. If you chatted over an anonymous channel, then you are good: they can still link the channel to the payment, but they don’t know who you are. So all Zcash needs to do is be compatible with however you are communicating. If that’s anonymous, great; if not, well, at least on-chain there is privacy, and that’s the best you could do.

1 Like

The reason I propose an anonymous communications channel is not to keep the sender anonymous from the seller (though for certain payments, like Alice visiting a website via Tor Browser to purchase an eBook or donate to a nonprofit, that could be possible) but rather to keep the transaction private from everyone else. It would be unfortunate if Verizon or Facebook or Google or whoever ends up with copies of all of Alice’s payment URLs, or in the case of non-anonymous e2e encrypted messaging providers, metadata showing that Alice sent a message to QuestionableMerchant or ControversialNonprofit.

1 Like

Ah, I see. So ideally, yes. But again, if you don’t talk to the seller over an anonymous channel, Verizon or Facebook or Google or whoever ends up being able to infer this from metadata in many cases.
So it may be acceptable to say: here is a payment URL. If you care about privacy, you should only talk to merchants over anonymous channels, and send it over those.

Scaling is fine, but what about the blockchain size? That’s gonna be humongous.

Not as much of a problem as you think. We have checkpoints, so no one needs to download and verify from zero. You can probabilistically check blocks before the checkpoint using FlyClient, so we still get verification of the checkpoint. Beyond that, you can start giving compact proofs that blocks are correct. For Pollard/Halo, you’d do this with a recursive proof. For Sapling you can easily do this with inner pairing product proofs, which are insanely compact. Just to use the numbers I have from a research paper I’m working on: ~16 thousand Groth16 proofs are ~5 MB (300 bytes * 2^14). The aggregate proof over them is about 10 times smaller, and this gets better as you add more proofs. It takes ~150 ms to verify, as opposed to about a minute.
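The quoted sizes work out as follows (plain arithmetic only; the 10x aggregation ratio and the verification times are figures from the post above, not something I've recomputed):

```python
# Size arithmetic for aggregating Groth16 proofs with inner pairing
# product arguments, using the rough figures quoted in this thread.

GROTH16_PROOF_BYTES = 300        # approximate serialized proof size
N_PROOFS = 2 ** 14               # ~16 thousand proofs

naive_bytes = GROTH16_PROOF_BYTES * N_PROOFS   # ~4.9 MB if sent raw
naive_mb = naive_bytes / 1_000_000             # the "~5 MB" above

# Per the post, the aggregate proof is about 10x smaller:
aggregate_bytes = naive_bytes / 10             # ~500 KB
```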

7 Likes

I didn’t realize that. In what ZIP are they described?

Or do you mean the assumevalid parameter inherited from Bitcoin, which lets the developers bless a specific block as "you can trust all signatures leading up to this block (i.e. skip their validation), but you must still do all other checks on its history (e.g. no inflation)"?

1 Like

I will defer to @str4d for all technical details, but my understanding is they started out as the Bitcoin-style ones, but we 1) keep updating them, and 2) have gotten much more aggressive about how little you check before a checkpoint, now up to just trusting it. What the default level of checking is, I am unsure.

But the point is, ideologically, Zcash is not wedded to "everyone must check all history from genesis", and we have checkpoints. So, used slightly more aggressively, we can deal with massively larger blocks and avoid having to download the entire chain. You simply use the FlyClient trees to probabilistically check some blocks before the checkpoint, but otherwise trust the checkpoint. This makes you an order of magnitude or more scalable than Bitcoin, because their main limitation is not individual block size, but initial block download (IBD).

Now your problem is when either 1) the p2p network cannot handle the tx volume, or 2) people can’t handle checking whole blocks. If whole blocks are the problem, we can add auditing proofs or allow probabilistic checking inside a block.

2 Likes

Does that mean a substantial number of devs attest to having personally fully verified the history up to that block and sign for it with their well publicized public key, as is the case for Bitcoin?

Good question. Trusting the checkpoints is the same as trusting the binary, since the binary defines what counts as a legitimacy check. So it comes down to who signs the builds, and that I don’t know; @zebambam would.

As I said, the interesting thing is we can actually (if we trust the binary / when we have multiple clients) use FlyClient to probabilistically check blocks before the latest checkpoint. If everyone checks some fraction at random and announces failures, you get the whole chain audited and incentivize people to store it.

FlyClient only lets you verify the cumulative difficulty claimed by some block. It doesn’t have anything to do with transaction validity. That’s why Bitcoin’s assumevalid still processes all previous blocks: so that it can build the correct UTXO set.

How do you propose to get the correct UTXO set without processing the entire history leading up to a checkpoint?

Good question,

So my understanding is that FlyClient, in order to let you check PoW hashes, also ended up letting you efficiently check that a given block is in the chain. So this lets you audit the correctness of individual blocks you pull down, modulo double spends.

If we are talking about the Zcash UTXO set, blocks commit to the Merkle root over it. So you can check the correctness of consecutive blocks, which means if everyone pulls down a random number of (block_i, header(block_i+1)) pairs, then we collectively get assurances that 1) blocks are correct and 2) the UTXO set is correct.
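The sampling scheme described above might look something like this sketch, where `fetch_block`, `fetch_header`, and `apply_block` are hypothetical stand-ins (no such APIs exist in zcashd), and the header is assumed to commit to the state root produced by applying the previous block:

```python
import random

# Each client samples random heights, pulls down (block_i,
# header(block_i+1)), and checks that applying block_i yields the
# state root the next header commits to. If everyone samples a
# fraction at random and announces failures, the chain as a whole
# gets audited.

def probabilistic_audit(chain_length, samples, fetch_block,
                        fetch_header, apply_block):
    """Return True iff all sampled consecutive pairs link correctly."""
    for height in random.sample(range(chain_length - 1), samples):
        block = fetch_block(height)
        next_header = fetch_header(height + 1)
        # The successor header is assumed to commit to the state
        # (e.g. note commitment tree root) after this block.
        if apply_block(block) != next_header["prev_state_root"]:
            return False   # announce the failure to the network
    return True
```

As noted below, this only covers "half" of the shielded state; checking nullifier non-membership would need headers to commit to the nullifier set as well.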

However, that is, morally speaking, “half” the UTXO set in Zcash, since you need the nullifier set to check double spends. So we’d need block headers to commit to that as well, plus some auxiliary data for non-membership proofs (which are 1 KB per tx) for checking (that need not be in blocks).

You can do something similar with the UTXO set for transparent txs if you added the Merkle trees, but I’m not sure of the best way to do it. Hopefully by the time we get scale, transparent txs like that are gone.

Ok, so with some tweaks we get a full probabilistic audit of shielded; right now we get a partial probabilistic one. Thanks for pointing that out.

Now the question is: what’s the value of a partial audit? This is hard.

What’s the value of checking from genesis at all?

  1. If you trust the software, you should trust the checkpoints to be honest, since it’s the same devs. Now, the devs/checkpoints could be incorrect, but again, if they are produced from the software you trust… you’d need a persistent multi-month network partition and massive amounts of mining power, which is both unlikely and catastrophic.

  2. If you find an error early in the chain, say a double spend that created 100k BTC or ZEC, and it’s been spent, what are you going to do about it? At least in BTC you could theoretically undo the txs and take everyone’s money, even as it fanned out to touch huge numbers of UTXOs. I think this is utterly unrealistic given the amounts of money involved now. But for Zcash, you can’t even do that in theory for private txs. So your only choice is to reset back to before the bug, which even for Zcash today is close to impossible, and certainly for anything operating at large scale.

  3. There’s some existential value in making people keep the blocks around so they can check there wasn’t inflation. Per 1 and 2, I think this is mostly a myth. But regardless, Zcash doesn’t have the same mythos as BTC: we aren’t the church of “there must be no inflation, and we will sacrifice on-chain privacy, fees, and scale for a higher assurance of that”. (And for checkpoints, I question how much assurance avoiding them actually gets you.)

It’s not just a notification feature, is it? You actually need the data in the “notification” to be able to spend the money, right?

I think what john-light is saying here is really important from a product standpoint given user expectations from how Bitcoin works. Bad things are gonna happen if you just let them share the payment via whatever mobile app is on their phones.

But if it’s built into another communication tool that the user trusts then yes, you don’t have to worry about it. You could make a new Zcash SDK, that is way more functional and scalable than anything we have now, for use in apps that have their own messaging layer.

Or you could picture breaking off groups of people to work on messaging layers using whatever the most promising approaches were: Tor/Ricochet/Cwtch or mixnets.

Somebody should apply for a ZOMG grant for this!

2 Likes