Private Programmability in Zcash - Research Results and Community Discussion

bgillespie · June 20, 2024, 5:18pm

Dear Zcash Community,

Today we are excited to share with you the results of a year-long research effort, funded by the Zcash Foundation, to survey and systematize technologies for programmable privacy in the blockchain space, aiming in particular to provide analysis and recommendations pertinent to programmability for the Zcash ecosystem. Collaborators on the project are @LeCryptoMath and myself from Inversed Tech, @therealyingtong from Geometry Research, and @amiller from UIUC.

Our goal in this post is a) to summarize our main research findings, b) to give our team’s recommendations for potential paths forward for enabling expressive programmability on Zcash with strong native privacy primitives, and c) to open the floor to the community for an ongoing discussion about the future of privacy-oriented programmability in the Zcash ecosystem.

The following points summarize our main findings and recommendations for Zcash at an overview level; we’ll discuss these recommendations in more detail later in the post.

Programmable privacy can be enabled in Zcash with limited consensus changes, mainly support for generic zk-SNARKs and flexible data commitments. Commit-nullify privacy remains state-of-the-art for off-chain/on-chain computation, and can be directly extended to a programmable architecture.
A major design axis that requires discussion by the broader Zcash community is the level of on-chain resources such as computation and data storage that the protocol makes available to application developers.
Programmable Zcash should provide a Domain Specific Language (DSL) and high-level tooling to codify and enforce protocol-specific requirements for application logic, and to shield developers from sharp edges and implementation details of underlying cryptographic machinery.
Programmable Zcash should include first-class support for a Mediated Computation layer – intermediate computations which make use of cryptographic techniques and/or secure hardware to allow synchronous privacy-preserving computations over private data inputs from multiple users – and the Zcash community should consider developing an enshrined default implementation.

To learn more about our findings beyond the Zcash-specific conclusions and recommendations found in this post, feel free to also check out our SoK manuscript with detailed framework and analysis, SoK: Programmable Privacy in Distributed Systems, as well as recent presentations by @therealyingtong and myself at ZconV, The Path to Shielded Programmability, and by @therealyingtong and @LeCryptoMath at ZKProof 6 in Berlin, SoK: Private Programmability on Blockchain Systems.

Finally, we’d like to thank several individuals for valuable conversations and feedback over the course of this project: Pablo Kogan (@PabloK) from QEDIT on the design of the ZSA Swap functionality, Chris Bender and Joey Kraut from Renegade on their novel MPC-based dark pool protocol, and @teor for input on recommendations for the Zcash ecosystem.

Research Findings

During this project, our group studied a wide selection of decentralized blockchain protocols supporting complex application logic with user data confidentiality. Our aim in studying these protocols was to improve our understanding of the concrete mechanisms and cryptographic primitives available for implementing privacy-enabled applications, and to formulate a common framework for organizing and analyzing protocol designs allowing reasonable comparisons between different approaches.

While analyzing various protocols, we identified a common organizing structure that seems to apply broadly across applications and domains. An application aims to periodically produce global state updates, which fairly and systematically integrate inputs from various users. These inputs may be shielded both from other users and from protocol operators. Each such state update “epoch” is subdivided into three phases with different privacy and synchrony properties:

Phase 1: Independent Computation – protocol participants encode private computations over their own data plus public data to produce transaction inputs/application intents
Phase 2: Mediated Computation – network operators ingest user intents in a privacy-preserving way to conduct computation over private data from multiple different participants
Phase 3: Global Computation – public computation outcomes are executed against the global application state

These phases roughly correspond with different classes of techniques used to conduct computations on distributed blockchains. Global Computation corresponds with transparent state updates on-chain, Independent Computation corresponds with single-party cryptography like digital signatures and zk-SNARK proofs, and Mediated Computation corresponds with more sophisticated multi-party cryptography like Secure Multiparty Computation (SMPC) and Fully Homomorphic Encryption (FHE), or hardware solutions like Trusted Execution Environments (TEEs). As a modeling tool, in Section 3 of our SoK Paper we describe a modular ideal functionality representing this three-phase epoch model which can be used to organize and analyze the privacy and security properties of concrete protocols in terms of this framing.

Notably, chains’ support for programmable privacy seems to lie in a rough progression with the following main stages:

Non-private programmability – implements fully programmable Global Computation phase, plus limited Independent Computation functionality (examples: Ethereum, Solana)
Asynchronous private programmability – implements fully programmable Global and Independent Computation phases, omits Mediated Computation phase (examples: Mina zkApps, Aztec, Polygon Miden)
Synchronous private programmability – implements fully programmable Global, Independent, and Mediated Computation phases (example: Secret Network)

The now-common design of “programmable VM plus zk-SNARKs” is typical for the second of these types, and achieves a lot of what is needed from a fully programmable privacy solution. The distinction between the second and third types is that private application logic executed off-chain and validated by single-party cryptographic primitives like zk-SNARKs leaves a gap in the way that private data can be processed in the network. Such primitives require the off-chain processor to have direct access to computational inputs, and as such do not allow for application logic which requires synchronous processing of private inputs from multiple users. This limitation can sometimes be overcome by careful design of application logic and incentive structures for specific use cases (especially those with low interactivity), but for some cases where there is a strong need for synchronous integration of private inputs, a Mediated Computation phase is needed.

One of the most common applications benefiting from a Mediated Computation protocol is the limit order auction, in which parties submit private bids to trade different types of assets, specifying requirements for the clearing prices of executed trades. In such contexts, trading information is generally strategic, so participants prefer an environment in which their bids can be algorithmically matched without revealing price information to counterparties. In particular, in a trustless and anonymous environment, it is difficult to ensure that an intermediate compute node with access to such information does not then choose to become a counterparty themselves, with access to privileged trading information. A Mediated Computation protocol allows computations of this type to take place while preventing node operators from accessing confidential inputs.

Design Considerations and Recommendations

A few major design areas come up frequently when studying the architectures of different privacy protocols. Some of these areas have tended to settle on established best-practices, while others highlight design spaces exhibiting significant engineering trade-offs that need to be considered thoughtfully.

Application Development Model

At a high level, protocols need to decide how widely they make their programmability layer available to developers. This relates to a trade-off space between “low barriers to entry and a dynamic application development ecosystem” and “well-vetted walled garden with more limited applications of a consistent high standard of security and privacy”. At one end of the spectrum, a protocol can provide generic public tools for application development along with mechanisms for permissionless deployment on chain which allow anyone anywhere to deploy application logic of their choosing. On the other hand, a protocol could choose to support only applications which are deployed in a controlled or centralized manner, by protocol maintainers or some formalized system of governance, with processes in place to ensure quality.

Recommendation. Both sides of this design space are relevant to Zcash’s community values, so the question of where Zcash programmability should lie definitely should be discussed. Open development and permissionless deployment are aligned with the basic ethos of a fully decentralized and censorship-resistant privacy protocol, but on the other hand, Zcash has high standards for base level privacy and security properties on the protocol, which protocol users have come to expect. An option to consider is an open development model along with some formal mark of community approval for applications satisfying particularly high standards of formal security and privacy.

On-Chain Data Storage

The typical difficulty of on-chain state growth is amplified both by generic application programmability and by standard privacy techniques, so an architecture chosen carefully to account for long-term sustainability should be a big priority.

The most common design for representing shielded state on a public ledger remains the now-classic Zcash Commit/Nullify design, which provides excellent privacy and flexibility, but introduces an essential cost trade-off: due to the fact that validators cannot tell when specific data commitments can be pruned from the commitment tree, a full historical commitment tree needs to be maintained for all shielded data in order to allow Merkle inclusion proofs in future transactions. Opening the shielded pool to arbitrary application data accelerates the growth of this commitment tree; and if a protocol also provides in-band encrypted storage of shielded data commitments, then this further increases the storage requirements for validator nodes.

While there does not appear to be consensus on a best solution to this issue, a few classes of approaches have been considered in different contexts:

Restrict on-chain storage in favor of off-chain or external data storage services, allowing reference to Merkle roots or similar commitment mechanisms with primary storage provided elsewhere
Allow applications to manage commitment trees which can incorporate application specific logic for private data retention, at the risk of fragmentation of the commitment anonymity set
Incorporate time-limited and metered data storage, to reflect the fact that data storage is an ongoing cost for network participants

Recommendation. Zcash may want to consider a mechanism for time-limited and metered data storage, where data commitments are preserved for a specified time, and can be refreshed to be re-inserted into the active commitment pool if they become stale. An interesting R&D problem would be to design a proof scheme allowing users to authenticate the validity of a stale but unspent note commitment against appropriate historical data.

Programmable Zcash should additionally support some amount of transparent data storage for applications. This allows for representing data meant to be visible in plaintext to all network participants, and is more efficient in particular for data that needs to be updated frequently, which would otherwise require nullifying and replacing a data commitment for each update under a pure shielded model.

On-chain Computation

Aside from storing state, a lot of variation can be seen in what computational tasks are taken on by validator nodes. A privacy chain can choose to provide a full public VM for generic on-chain computation by validators, or can restrict the types of computations delegated to validators to more specialized tasks like checking digital signatures or executing zk-SNARK verification protocols. On the one hand, providing a public VM for application developers to leverage makes development more accessible with lower infrastructure requirements; but on the other hand, it adds additional protocol complexity and increases the hardware requirements for running a validator node.

Recommendation. Support for general on-chain computation available to application developers is a common and useful feature for modern programmable blockchains, and Zcash should strongly consider including this capability in an upcoming roadmap. The biggest challenges for supporting this capability are in balancing compute availability against validator node resource requirements, calibrating gas costs against concrete validator compute costs, and choosing a VM environment which supports these goals, allows efficient execution, and is accessible to developers.

A number of modern protocols make use of WebAssembly (WASM) as a compilation target for on-chain computations due to its computational efficiency, substantial prior investments in runtime environments and developer tooling, and structural characteristics which make it amenable to static analysis. See for instance this presentation by Ed Felten of Offchain Labs about Arbitrum Stylus, a language for the Arbitrum protocol which makes use of WASM as a high-efficiency computation alternative to the EVM.

Off-chain Private Computation

Almost every blockchain implements some kind of off-chain private computation, even if it is as straightforward as users providing a transaction signature validating the authority to spend a UTXO. Programmable zk-SNARK proofs are a core component of the computational stacks of most modern privacy chains, providing the means to execute generic single-party application logic privately off-chain, while providing an efficient means for the network to verify the correctness of these computations. In particular, SNARK proofs provide the key functionality enabling the commit/nullify scheme for demonstrating the validity of private data witnessed on-chain, and supporting the use of such data in subsequent off-chain computations.

More recent advances in SNARK proof protocols have improved performance to the point that moderately complex application logic can be proved on widely available consumer hardware, meaning that deployment is feasible in the context of a widely distributed public network with heterogeneous end-users. Additionally, newer protocol designs such as those of Aztec and Mina zkApps demonstrate off-chain contract composability by the use of recursive SNARK constructions.

Recommendation. To fully enable private programmability, Zcash should provide first-class support for general off-chain computation using a zk-SNARK proof protocol. This would require support for user-provided dynamic verification keys, implementation of high-level tooling supporting application development by engineers without heavy cryptographic background (see also subsection “Developer Tooling” below), and most likely one level of recursive composition to keep verification at the consensus level generic across all programs.

Rolling out a full programmable SNARK toolchain is a nontrivial undertaking, but as it forms the core primitive of most programmable privacy chains, it calls for significant investment by the Zcash community. Some existing SNARK languages that may be worth referencing in the design include Aztec’s Noir, Mina’s O1.js, Aleo’s Leo, StarkWare’s Cairo, and Lurk.

Mediated Computation Phase

A mediated computation phase allowing privacy-preserving computations over private inputs from multiple parties serves as the “special sauce” of many newer privacy protocols supporting concurrent DeFi functionalities such as limit order auctions and multilateral trade credit set-off.

Support for mediated computation by an L1 protocol can be as simple as enabling L1 validation of certain SNARK proofs or signature schemes depending on the nature of the mediated computation techniques in question. This approach means that application developers are required to roll their own secure mediated computation scheme or make use of an external provider, and to report the computational outputs to the network with corresponding validity proofs.

Alternatively, protocols can choose to provide a default option for mediated computation, which at the minimum provides a well-vetted software library which can be used by developers to easily deploy a mediated computation layer for their application, or more comprehensively may provision compute nodes to execute a generic mediated computation protocol for applications in exchange for additional gas fees. These more integrated approaches have several advantages, including better security assurances for users due to development resources dedicated to an intra-protocol design, the ability to leverage the base protocol’s consensus mechanism for identifying mediated computation participants, and clean integration of mediated computation functionality into a protocol’s DSL and tooling for a more uniform and less bug-prone developer experience. All of this of course comes at a cost in terms of added development resource requirements and overall system complexity taken on by the core protocol team.

Recommendation. Zcash should aim to support mediated computation techniques in some form. In the short- to medium-term, this likely should be in the form of limited support for on-chain validation of mediated computation outputs of different types (e.g. collaborative SNARK proofs demonstrating validity of off-chain SMPC computations, or verification of TEE attestations for computations conducted in secure hardware enclaves), leaving the implementation details of concrete mediated computations to development teams who need this functionality. In particular, this approach requires only minimal consensus changes in line with those required for implementing zk-SNARK support for private off-chain computations, and does not significantly alter the basic responsibilities of Zcash block proposers or validator nodes.

For this short-term goal of supporting external mediated computation techniques, some preliminary research would be needed to identify a reasonable set of concrete primitives which would support the most common methods of implementing mediated computations.

In the long-term, Zcash may consider introducing first-class support for a mediated computation layer, either as a compatible software library that individual application developers can deploy to their own network of mediated computation nodes, or as a built-in protocol functionality which provisions mediated computation participants from the network, either leveraging the network consensus layer (most compatible with a proof-of-stake architecture, where block proposers can be punished for equivocation or inactivity) or introducing a separate protocol along the lines of an application-specific L2 for coordinating nodes for this purpose.

For this more extensive support for built-in mediated computation functionality, additional research would be needed to identify the most suitable concrete method of implementing it. The agenda for this research can roughly be broken down into two parts:

What technique should be used to conduct the computation?
What types of computations should be supported, and with what interface?

The latter point is of particular interest because of the following observation: in general, mediated computations are more expensive, are harder to implement securely, and have more complicated security assumptions than private independent computations or public global computations. As such, it is advantageous in a design to minimize the computation that has to happen in a mediated computation phase, preferring instead to push as much computation to private single-party methods or public on-chain execution. Based on our study, we believe that it would be especially worthwhile to investigate SMPC-based designs for a lean “computational kernel” which can support general mediated computation functionality while presenting a minimal surface for security challenges and application design complexity.

Developer Tooling

A nearly universal feature of privacy chains supporting general application programmability is that they provide developers with comprehensive tooling enabling streamlined interaction with the full computational pipeline provided by the protocol. This typically includes a Domain Specific Language (DSL) for specifying application logic, along with testing and deployment utilities.

Because of the complexity and cryptographic sophistication of programmable privacy protocols, the design of the DSL in particular plays a very important role in the developer experience. A well-designed DSL can provide:

An integrated view of application logic in the independent, mediated, and global computation phases, which abstracts away cryptographic details
Language primitives which clarify the privacy properties of different data fields
Interfaces which steer developers away from common security footguns and towards good design best-practices

The particulars of how such a DSL is structured depends heavily on the details of the protocol it is targeting, such as the execution, data, and privacy models of the different computational phases, and the cryptographic requirements of the privacy-preserving techniques employed.

Recommendation. Particular attention should be paid to the design of a DSL once the underlying private programmability architecture has solidified. Some important questions to consider include how data privacy is represented, what interfaces are provided for Private and Mediated computation phases, how the differences between computational phases are presented by the language, and how language structure can encourage or require application logic to adhere to security best practices for underlying cryptographic schemes. For inspiration, it may be helpful to review design decisions made in DSLs developed for other privacy protocols, and to consider ideas from academic efforts such as the zkay and ZeeStar languages.

Moving Forward

We would like this post to be a place where a conversation can get started on the future of programmability in the Zcash ecosystem. Some of the biggest questions we believe are worth discussing include:

What functionalities, in general terms, represent currently unmet needs of community participants? How could a programmability layer support implementation and deployment of such functionalities?
What level of core protocol support should be provided, in terms of:
- On-chain public computation, e.g. an on-chain VM (execution model, gas and fees)
- Application data storage (amounts, duration, costs)
- Mediated computation layer (light interoperability support, or full protocol-supported reference implementation)
- Permissionless application deployment (fully permissionless, white-list, or protocol-level deployment only)
- Integrated DSL and other developer tooling
In the shorter term, how can programmable privacy designs be integrated with existing plans for ZSA swap functionality? Is there a limited model of privacy-enabled programmability which would support this functionality, but would require a minimum of changes to Zcash network consensus rules?

Let us know what you think! We invite all community members to weigh in on this important topic of conversation. In addition, in the coming weeks we will be planning to host a community call to help aggregate ideas, to get a picture of overall sentiment, and to make plans for upcoming work towards designing and implementing a suitable programmability layer for Zcash. Thanks for reading!

hanh · June 20, 2024, 8:58pm

Do you have a recommendation about how to include the verifier part on chain without making the nodes spend too much resources?
In the commitment/nullifier scheme, what is the mechanism regarding the maintenance of the trees from the nodes point of view? i.e. who pays for the storage, etc.

Thanks

Edit: I think this is what zcash should be putting all the funding into.

Milton · June 21, 2024, 3:48pm

Do you have any information about “functional encryption” and how it could be used to get rid of the need to have a common public encryption key for protocols doing computation on encrypted inputs? Rand Hindi, the CEO of Zama, said that while multi-key FHE was infeasible, we could in the future use functional encryption to get rid of the need to have a MPC protocol to share an encryption key, but he didn’t explain how it works or elaborate on it further.

bgillespie · June 21, 2024, 3:55pm

Thanks for the question! So the short answer to this is that Zcash already uses SNARK proofs along with the commit/nullify data representation for shielded notes, so extending this to more general application logic would not be a huge leap in system complexity. The main technical change to consensus would be a revision to the SNARK scheme/validator logic to support proofs representing more general computations instead of the current hard-coded computations, which are along the lines of “I know a plaintext shielded note with the following nullifier whose commitment is represented in the following Merkle root from block number N, which produces the following value commitment that has been included in this transaction; and I also know the key authorizing me to spend this note”. In general, SNARK verification is an inexpensive operation, so we would not expect that enabling SNARK verification supporting general application logic would impose significant on-chain computation costs to validators.

The biggest challenge with adopting a general commit-nullify scheme for application privacy, in terms of validator resource usage, is likely the storage of shielded data. In the current shielded pools, it is impossible to tell which notes have been spent (i.e. a nullifier revealed), so you can’t truncate the network state without the possibility of losing data representing valid, unspent notes. This remains an issue when you allow more general applications to make use of the data commitment pool, except that the demand for this kind of storage is increased. Given that state bloat is already an issue of concern for Zcash even without this extra demand, we do think it is important to devise a mechanism which controls the growth in size of the active network state.

This question seems to be an open area of research, but we have some ideas. To throw out one possibility I’ve been thinking about lately:

Data commitments are stored in large Merkle trees as usual, but with a new base Merkle tree introduced every 6 months
Merkle trees are periodically rotated out, so that only the most recent 2 years of commitments are considered to be part of the “active” set
An inexpensive operation allows data commitments from older trees to be nullified and re-inserted into the most recent tree to “refresh” the data’s lifetime
Commitments from old Merkle trees are not considered to be active, and cannot be referenced in new notes, but to “refresh” an out-of-date commitment, a more computationally expensive SNARK proof can be generated by referencing archived historical commitment data to prove that a) a note existed in an old commitment tree which was part of the current chain’s history, and b) that note’s nullifier did not appear in historical nullifier sets after its creation

These ideas would need to be hammered out in a lot more detail to verify they are technically feasible and to ensure the system provides a positive user experience, but I personally don’t see any fundamental barrier at a high level to implementing a scheme in approximately this shape. There probably are other good methods that would provide a suitable solution to this issue as well, but the big picture is that it’s likely quite important to address state bloat if we want to support general applications making use of an on-chain shielded note/data commitment pool.

bgillespie · June 21, 2024, 4:24pm

This is a great question @Milton – I’ve done a little reading in the direction of functional encryption, and my take-away has been that there are practical methods for doing computations in this way only for very limited types of computations (e.g. dot products), and methods that have been devised for more general computations are not yet efficient enough to use in real-world deployments. It could be a really interesting project to see if there are any techniques out there or being developed which change this outlook though – I would not say that my review of this topic was anywhere near comprehensive.

My thoughts on this question in general are that the k-of-n style security assumptions of MPC or FHE schemes for mediated computation are eminently usable for current design needs, with a few specific points in mind:

These assumptions typically only control the privacy of data in the mediated computation, and not the correctness, since correctness issues can typically be sorted out by SNARK proofs validated on-chain
As a general design concept, it is often possible to minimize the data that is exposed to the mediated computation mechanism to just what is required to execute the application logic, e.g. for an order book functionality, only the asset, amount, and price of an order need to be encoded, but not the source of input assets or the recipient of proceeds
Careful selection of the parties involved in executing privacy-preserving MPC or threshold FHE decryption can provide greater assurances for the integrity of the mediated computation privacy – you only need to trust enough of the operators to faithfully follow the protocol in order to gain those assurances. As an example, a committee of operators might incorporate a selection of several well-known organizations chosen periodically by a community vote, along with several operators chosen in a decentralized, randomized way via opt-in ZEC staking.

Milton · June 21, 2024, 4:33pm

If you could make functional encryption practical, would the private computation be completely trustless?

bgillespie · June 21, 2024, 9:25pm

Thinking about this a bit, it seems to me that the primitive doesn’t actually provide a way to conduct computation over private data of multiple parties trustlessly, even if general techniques were further along. The primitive allows an authority with a “master secret key” msk to generate individual secret keys sk_f allowing computation of a function f for encrypted values. This allows the holder of msk to delegate access to data encrypted to them in a restricted way (i.e. the delegate can retrieve only a function of the originally encrypted value, not the value itself). However, this doesn’t seem to provide the ability to combine encrypted inputs from multiple parties, which is what’s needed for a mediated computation layer. (For general reference, the first page of Functional Encryption: Definitions and Challenges by Boneh, Sahai and Waters gives a nice high-level description of functional encryption.)

Regarding the comment you mentioned from Rand Hindi at Zama, I thought about it a little and it’s not clear to me what he might have had in mind. If anyone else has ideas about how FE could be used to bypass the need for an MPC protocol for decentralized generation of public key and distributed threshold private keys for use in a homomorphic encryption scheme, I also would be interested to hear.

Milton · June 21, 2024, 10:00pm

Rand talked about it on this podcast from 37:05-39:05

Milton · June 24, 2024, 7:35pm

I have another question:
Do any of the surveyed protocols that do the computations on-chain support function privacy? If not, do you know if it is practical or even possible to encrypt the smart contracts themselves?

bgillespie · June 28, 2024, 3:14pm

Though I’m not certain to what extent function privacy has been prioritized in a lot of current designs, this can be accomplished at least for single-party computations using SNARK recursion: a SNARK proof is generated which proves “knowledge of a SNARK verifier key possibly with some specified public commitment, and of an associated SNARK proof, such that verification of the SNARK proof by the verifier key with some specified public inputs passes”. So essentially the SNARK proof provided is a “proof of proper SNARK proof validation”, while the ZK properties of the scheme shield the nature of the computation being conducted.

A minor downside of the approach is that it takes place in the Independent computation phase, allowing only private inputs from a single user, because typical SNARK schemes require the prover to have full access to all function inputs. As such, out of the box, the technique can’t be used generically to provide function privacy for Mediated computations which need to process private inputs from multiple parties. There is some relatively recent work on collaborative SNARK proving schemes allowing multiple parties to generate a SNARK proof using their various private inputs (see: the paper Experimenting with Collaborative zk-SNARKs: Zero-Knowledge Proofs for Distributed Secrets by Dan Boneh and Alex Ozdemir, and docs for the Renegade dark pool protocol, https://docs.renegade.fi/core-concepts/mpc-zkp#collaborative-snarks), but these techniques are not as optimized as traditional single-party SNARK proof schemes, so it’s probably an open question of whether they would be suitable for a general programmability framework.

decentralistdan · June 28, 2024, 5:33pm

decentralistdan · July 3, 2024, 5:50pm

hanh · July 10, 2024, 8:31am

Hi @bgillespie , I was referring to the need to have app specific merkle trees. For instance, if an app has coin age as secret data and needs to prove it in zkp, the classic method would be to have a tree commitment of coin commitment+height.
But this means the nodes have to keep track of the root since it is an instance column of the circuit.
Is there a strategy for dealing with these types of scenarios?
Another example would be the need of a nullifier sparse tree.