After finding the MITM certificate vulnerability in a lite client last week, I finally got around to looking at the actual lite wallet protocol implementation today. There is one high-level concern that I think can be addressed, but it requires some consideration and discussion before any implementation is attempted.
The primary problem is that, practically, the anonymity set for fetching interesting transactions (i.e. to get the memo) is too small. Many blocks contain only a handful of transactions and, regularly, only 1 or 2 shielded transactions. This means that the server can link a small set of transactions to a client. Given prior research showing that shielded transactions can be linked based on usage, it seems clear that the current strategy of per-client random transaction fetches to mask the target transaction isn't robust.
Part of this issue is caused by the lite client implementations not using an anonymizing network to communicate with the server, as assumed by the original spec. Effort might be better directed at moving in that direction, which would reduce the ability of a rogue server to link requests.
Widening the threat model, however, I think moving towards a mechanism where all lite wallet clients download all transactions in blocks containing small numbers of transactions (for the sake of argument, let's call it <= 3), regardless of interest, should go some way towards mitigating this with minimal bandwidth cost, and should increase the practical anonymity set.
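A minimal sketch of that fetch rule, assuming a hypothetical client-side view in which each block is just a height plus a list of txids. `Block`, `txids_to_fetch`, and `SMALL_BLOCK_MAX` are illustrative names, not part of any existing lite wallet API:

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Assumed threshold, "for the sake of argument"; the right value is an open question.
SMALL_BLOCK_MAX = 3


@dataclass
class Block:
    height: int
    txids: List[str]


def txids_to_fetch(
    blocks: Iterable[Block],
    interesting: Set[str],
    small_block_max: int = SMALL_BLOCK_MAX,
) -> List[str]:
    """Return the txids a client would request in full from the server."""
    wanted: List[str] = []
    for block in blocks:
        if len(block.txids) <= small_block_max:
            # Small block: fetch everything, whether or not it is ours.
            wanted.extend(block.txids)
        else:
            # Larger block: fetch only the transactions the client cares about.
            wanted.extend(t for t in block.txids if t in interesting)
    return wanted
```

Since every client would make the same requests for small blocks, those requests on their own should not tell the server which client is interested in which transaction.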
The above definitely needs more formal treatment before being implemented, so I thought I would open a discussion.
Walking Onions will definitely help make Tor a more attractive option for mobile applications by minimizing the bandwidth use associated with the Tor consensus. With that scalability improvement on the horizon it makes sense, in my opinion, to expose lite wallet servers as Tor onion services - which would go a long way to improving the privacy properties in practice.
As I understand it, this would only solve part of the privacy problem as it relates to lite clients. Have you given any thought to how one might preserve the privacy of receivers in this case?
Most of these are thoughts on preserving the privacy of receivers, as they are the ones who directly fetch transactions from the lite servers and remain most at risk both of having individual transactions linked to a known identifier (like an IP address) and of having sets of transactions linked together (e.g. by fetching two isolated transactions from two separate blocks via the same connection). Receivers are why I proposed that all lite clients download all transactions in small blocks - to maximize the practical anonymity set.
Widening the threat model however, I think moving towards a mechanism where all lite wallet clients download all transactions in blocks containing smaller numbers of transactions (for the sake of argument, let's call it <= 3)
Found some time this morning to gather some data on this. In the last 60 days, ~27% of blocks contained just 1 transaction. Another ~20% contained 2 transactions and an additional 14% contained 3.
This means that requiring a lite client to download all transactions in blocks with small numbers of transactions results in a requirement to download 7% / 17% / 28% of all transactions, depending on the transaction count parameter (i.e. all transactions in blocks with only a single transaction, with up to 2, or with up to 3, respectively).
The question then becomes: does requiring all lite clients to download all transactions in blocks with just a single transaction (27% of blocks, 7% of all transactions) impose a bandwidth requirement that is insurmountable for lite clients?
Rough math, assuming 2.5 kB per transaction and ~576 blocks per day, with 27% of them containing only a single transaction, gives a rough estimate of +400 kB per day. Extending this to blocks containing 3 or fewer transactions roughly quadruples it (28% of all transactions rather than 7%), to about +1.6 MB per day per client.
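The arithmetic can be reproduced with a short script. This is a back-of-envelope sketch that uses only the figures quoted in this thread (block shares over the last 60 days, 2.5 kB per transaction, ~576 blocks per day); the function name is made up for illustration:

```python
# Figures quoted in this thread; everything else is derived from them.
BLOCKS_PER_DAY = 576
TX_SIZE_KB = 2.5
# Share of blocks containing exactly n transactions.
BLOCK_SHARE = {1: 0.27, 2: 0.20, 3: 0.14}


def extra_kb_per_day(max_txs: int) -> float:
    """Daily download if a client fetches every tx in blocks with <= max_txs txs."""
    # Expected number of "must download" transactions per block.
    txs_per_block = sum(
        n * share for n, share in BLOCK_SHARE.items() if n <= max_txs
    )
    return BLOCKS_PER_DAY * txs_per_block * TX_SIZE_KB
```

With `max_txs=1` this comes to about 389 kB per day, in line with the ~400 kB estimate. Raising the threshold weights each block by the number of transactions it contains, so the cost grows faster than the share of blocks covered.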
That seems fairly low bandwidth usage for providing all lite clients with a much more robust anonymity set - especially considering the alternative, which is that lite clients have a 1/4 chance of having their transactions trivially linkable if they are not behind a proxy (practically probably much higher depending on the number of lite clients and the external information available to an adversary).
Yeah, shielded usage is not supposed to stay that way, and trends in increasing shielded usage suggest it won't. It is plausible to download all memos as a stopgap now, and that may be a field-expedient solution.
After that, I think the simplest idea is PIR and a tweak (that I think is in the next upgrade already) to have a single bit indicating if there's a memo to retrieve. Then you do the PIR query. SealPIR scales reasonably well, I believe.
Moving my reply to the current memo discussion over to here since I think it is relevant to the wider discussion.
The problem is, it's not going to be a few months for a "real" fix. It took 2 years to ship a wallet. Paralysis will happen. Trust me, I've been around here for a very long time. In the meantime, we've broken privacy by default for everyone. This isn't acceptable.
This behavior has been around for many months already in zecwallet-lite. I agree it is unacceptable from a privacy perspective (hence this thread and attempts to mitigate it somewhat) but it is currently the default state of the ecosystem and has been for months. I think, as outlined in this thread, that there are mechanisms which can provide reasonable probabilistic guarantees about privacy in the short term - not perfect, but given the limitations of light clients perhaps the best possible given all constraints.
Instead we can at least get privacy for whoever doesn't use memos and can tell them not to use them on mobile if they care. And for a wallet that is marketed as being especially stealthy, we kinda should at least have a way to give people the option. It is not ideal. There's a risk it's the only solution we build, I agree. But the alternative is worse.
People are using memos. I'll echo my previous question again: "Do you have data which suggests most zcash users don't use memos?" Binding memo collection to a manual process (and thus a human-action pattern of life) has the potential to do as much harm as it prevents.
Further, this statement is concerning:
Moreover, the sad reality is memos don't scale and are going to have to die in their current form. You can't scan blocks to find memos or detect payments. And scalable anonymous communication is actually far harder than hiding on-chain metadata. So we are either going to have to move to a mixnet (I'm skeptical of that ever getting off the ground given the history of mixes) or use out-of-band messaging to notify you of a payment. The latter is far more likely and makes memos moot.
Given the number of ecosystem projects being promoted/funded by ECC and ZF that effectively make use of memos as a platform for higher level applications (Zboard, Circle, Zbay etc.) this statement surprises me (not the scale question, but the suggested future path). If that is really the medium-term future view of memos then there is a very real risk that a lot of the current and planned ecosystem investment will be rendered useless. Building privacy tools on a sandcastle that will be washed away at an arbitrary future date is worse than not building them at all, and if that is the case then it needs to be communicated properly to the ecosystem.
If we go that route, then it might be worth considering adding the memo directly into the compact block. Especially given Sarah's point about memos becoming increasingly essential.
We already have a single byte that indicates presence of a memo field: the first byte of empty memos is 0xf6, and any initial byte lower than that indicates the presence of a text memo. It makes the ciphertext field of CompactOutput one byte longer, which would be the same for a single-bit field (as the unit-of-granularity for ciphertexts is bytes), but in exchange lets us only expose txids to the lightwalletd server for transactions we know have memos (which leaks a different, strictly smaller set of information). I argued early on that we should be downloading that extra byte per output in CompactBlocks, but was outvoted due to concerns about the increased bandwidth usage in the limit of high shielded usage.
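The client-side check described here might look like the following sketch. It assumes the client has already decrypted the extended compact ciphertext with its own keys and holds the first memo byte in plaintext. The function names are hypothetical, and values above 0xF6 are simply treated as "no text memo" in this sketch:

```python
from typing import Iterable, List, Tuple

# First byte of an empty memo, as stated in the post above.
EMPTY_MEMO_MARKER = 0xF6


def has_text_memo(first_plaintext_byte: int) -> bool:
    """True if the decrypted first memo byte indicates a text memo.

    Follows the rule given above: any initial byte lower than 0xF6.
    """
    if not 0 <= first_plaintext_byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    return first_plaintext_byte < EMPTY_MEMO_MARKER


def txids_worth_fetching(outputs: Iterable[Tuple[str, int]]) -> List[str]:
    """Pick txids whose decrypted outputs appear to carry a text memo.

    `outputs` holds (txid, first_plaintext_byte) pairs for outputs the
    client was able to decrypt.
    """
    return [txid for txid, first in outputs if has_text_memo(first)]
```

Only the txids returned by `txids_worth_fetching` would then need to be requested in full from lightwalletd.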
We could start caching and serving it from lightwalletd (ProtoBuf arrays are variable-length), and this would be backwards-compatible for older light clients. It will make the logic slightly more complex for newer light clients than if we'd fetched it from the start, as we need to handle older lightwalletd servers that don't provide it, but it's doable. We could similarly add the entire memo field to outputs, or have a client flag to the server requesting one of the three combinations.
Isn't the first byte of a decrypted memo less than 0xF6? How would lightwalletd know if a memo is present without decrypting the memo or having access to the client's view keys?
My understanding is that the memo issue was introduced in the lite client protocol in order to cut down bandwidth usage by 70%. As a user I would personally rather just use ~3x more bandwidth if it means I don't have to worry about a privacy leak to the lite client server. Why not just have all lite client wallets download the full z txs with memos included for the next year or two until there is a more bandwidth friendly protocol?
Exactly. If we add just that first byte to the current compact block definition, then clients could focus on enhancing only those transactions whose memo does not start with 0xF6. More importantly, they can surface this in the UI so users don't have to guess which transactions to expand.
I'm probably misunderstanding something fundamental here.
How will Lightwalletd know if a Sapling Output contains a memo or not to add that byte to CompactOutput? From Lightwalletd's perspective, the memo is encrypted, so the first byte can be anything, irrespective of whether there is a memo there or not.
Only the client has the keys to decrypt the memo, so only it can tell whether there's a memo present, but to do that, it needs to get the full 512 bytes first so it has to download the full Sapling Output anyway.
Curious - instead of this, would you just run your own lightwalletd instance (on an AWS instance or even your own RPi)? That way, you get full and complete privacy, and also get all the bandwidth savings on your mobile device.
Another option is to run a full node on your desktop and use the Zecwallet Companion App. Again, you'll get full privacy (if you connect directly), or if you're behind a NAT, you can use wormhole for even more drastic bandwidth savings.
From your perspective, what are the pros and cons of these 2 options vs downloading everything via the liteclient?
I myself would probably opt to run my own node and connect with a companion app. I suspect a lot of other users would not have this desire or technical know-how and would choose to use more bandwidth rather than have to even think about memo usage leaking some of their data.
I'd like to see a calculation of how full of z2z txs Zcash blocks would have to be before mobile app users noticed the extra bandwidth usage. I suspect the answer is that we are not yet even close, but I'd really like to know for sure whether there is 10x, 100x, or 1000x more room for z2z txs before we hit that bottleneck.
I would argue we're already there. Let's remember that most of the world doesn't have the high-bandwidth 4G connections we have. ZecWallet on mobile downloads on the order of several MB per week right now, and that is already a lot of data.
Just as a data point: currently, the ZecWallet companion app has 4x the usage of the ZecWallet lite client app, even though it is available only on Android, because the companion app uses only a few kB of data.
If size is being used to identify transaction types, what if they were all different?
Webservers compress responses on-the-fly with gzip, so if that method was used the overall size would become variable. Smaller transactions could also be padded (by size I mean the amount of data exchanged).
(as usual, "ChileBob, that's a stupid idea" is a perfectly acceptable answer... it's just me thinking out loud)
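For what it's worth, the padding half of that idea could be sketched roughly as below. This is only an illustration: the 4096-byte bucket and the 4-byte length prefix are arbitrary assumptions, and nothing here reflects what lightwalletd actually does.

```python
LENGTH_PREFIX_BYTES = 4  # assumed framing: big-endian payload length


def pad_to_bucket(payload: bytes, bucket_size: int = 4096) -> bytes:
    """Pad a response so its total length is a multiple of bucket_size.

    The length prefix lets the receiver strip the padding again.
    """
    if bucket_size <= LENGTH_PREFIX_BYTES:
        raise ValueError("bucket_size must be larger than the length prefix")
    framed = len(payload).to_bytes(LENGTH_PREFIX_BYTES, "big") + payload
    # Number of zero bytes needed to reach the next bucket boundary.
    padding = (-len(framed)) % bucket_size
    return framed + b"\x00" * padding


def unpad(padded: bytes) -> bytes:
    """Recover the original payload from a padded response."""
    length = int.from_bytes(padded[:LENGTH_PREFIX_BYTES], "big")
    return padded[LENGTH_PREFIX_BYTES:LENGTH_PREFIX_BYTES + length]
```

This only addresses what an observer can infer from response sizes. It does not change what the server learns from the request itself.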
Good point. Maybe we need to let users decide what trade-offs they want to make between privacy and bandwidth, if they are in a region where bandwidth is a serious constraint.
For regions where bandwidth is no issue, I think a large majority of users would just say "use the bandwidth and max out my privacy protections".
I'll be curious to know how these stats change over time, since the iOS wallet was just released yesterday in the App Store.