Dear Zcash Community,
We are excited to share with you this milestone deliverable for a project we started to survey existing ZK programmability protocols to better understand how to add programmability to Zcash in the long run. Sponsored by the Zcash Foundation and worked on by @bgillespie @therealyingtong and myself.
You can find the full write up in this link -
Or read below a shorter version (limited in characters!)
In any case we are looking to have conversations with people who have opinions, knowledge or experience in this topic. Please reach out to any of us, or reply here directly, to engage
In early 2023, the Zcash Foundation commissioned the authors to write a systemization of knowledge paper on the topic of privacy-preserving blackchain programmability solutions, in order to inform discussions amongst the Zcash community about adding programmability to Zcash. This project is to be delivered in three phases:
- Milestone 1: Research of Protocols and Parameters
- Milestone 2: Analysis and Comparison of the Protocols
- Milestone 3: Paper write up, and publication of the final SoK paper
In this write-up, corresponding to the milestone 1 deliverable, we establish a structural outline consisting of four main components:
- Framework — a general protocol model, and common state models and execution models
- Parameters — semi-formal definitions for privacy, security, efficiency, and data availability used in comparing different protocols
- Applications — classes of common applications supported by the above protocols
- Protocols — overview of contemporary protocols, analyzed in terms of the above parameters and designs
In the final write-up for this project, an additional section will be included discussing conclusions on protocol design, and recommendations for useful implementation paradigms.
To provide a formal basis for defining notions of privacy and security, we start by describing a uniform underlying general model which we will use to interpret the properties and behaviors of network protocols.
One fundamental idea which is important in many areas of analysis is that of identity in a distributed protocol. For our purposes, there will be two related notions of identity:
- Keypairs — public/private keypairs representing a pseudonymous logical identity in a protocol
- Nodes — interactive computational elements with history and private memory representing a network participant
Here, keypairs provide an algorithm-level notion of identity allowing private communication and authentication, and which usually can be created inexpensively by random sampling. Nodes provide a protocol-level notion of identity, and are the abstraction which generates and makes use of keypairs.
Another important abstraction is the specification of a distributed protocol in terms of distinct operations defining network updates initiated by one or more nodes. Specifically, an operation is defined by its:
- Transaction Data Schema — a data structure which is submitted to the network, giving a public view of an operation’s output
- Generation Protocol — one or more algorithms or interactive protocols which produce a transaction output from a specified set of inputs; these are the “prescribed” methods of producing transactions by honest nodes
- Validation Procedure — an algorithm used by network nodes to check that a transaction data structure is valid according to requirements of the protocol
- Execution Procedure — an algorithm used by network nodes to update the persistent network state based on the parameters of a valid transaction
The operation of a network protocol is then modeled as a collection of nodes which produce transactions associated with operations as defined above, which are interpreted using the relevant validation and execution procedures to update a persistent global network state. A sequence of transactions is called valid if each transaction is valid with respect to the state obtained by executing all of the transactions of its prefix in the sequence.
A UTXO-based state model typically consists of:
- Commitment set — an append-only set representing resources in the state, some of which may be spent
- Nullifier set — an append-only set representing spent resources
A transaction spends a private input commitment to produce a public nullifier and a public output commitment. Importantly, the public state change cannot be linked to the input commitment which produced it: even if a third party knew the owner and value of a certain input commitment, they would be unable to detect if it was spent.
In this model, an application’s state may be represented as a set of commitments gated by specific validity predicates. To update its state, it spends the commitments containing the old state, and produces commitments containing the updated state. This enables high concurrency in applications with sharded state: each user can update their own commitments on-chain without affecting state owned by other users.
In the account-based model, application state is stored in a persistent location associated with the application’s account, and updates to this state are applied in-place. This makes it possible to tell which application is being updated, even if the contents of the update remain private.
Unlike in the UTXO-based model, each private state update affects the entire application state, even if the application data can be logically sharded. To achieve concurrency, the Mina protocol [Min23] introduces an action/reducer pattern, where commutative actions are posted on-chain, and can be processed (”reduced”) at timely intervals. Although actions are public, they can be made privacy-preserving through clever design, such as homomorphically encrypting each action to some authorized reducer service.
Transaction execution proceeds through three stages:
- Intent pre-authorization: the user authorizes the spending of her resources, conditional on the fulfilment of some desired end-state. This is completed locally off-chain, using a signature as a binding commitment.
- Transaction creation: partial intents are matched to specify a complete execution trace of a transaction. All conditions of all intents must be satisfied in a transaction. For instance, a trade intent can be matched with that of a counterparty, or batched with other trades into a single aggregate trade. This process can happen off-chain, on-chain, or with some combination of both.
- Settlement: finally, transactions are executed and state transitions are effected. This always happens on-chain, meaning it can be replayed by every consensus node in the network.
Most protocols surveyed in this SoK perform all intent matching off-chain, and simply settle completed transactions on-chain. In effect, each application on the network functions as its own “rollup”: network consensus nodes cannot access user intents, and can instead only validate fully-specified execution paths. Put another way, the result of the transaction execution must be known before it is submitted to the network. As a consequence, privacy properties in this execution model are mostly dependent on application design, and not protocol design.
Single-party transaction creation
In the special case where a transaction involves only a single party, there is no need to match against a counterparty, and the execution path can be specified without help from a third party.
A zero-knowledge proof is the simplest option for privacy-preserving transaction creation in the single-party case. Since no additional processing or computation can be done on the intents encoded in the zero-knowledge proof, all that remains is for the network to check the validity of the execution path.
For instance, in the Zcash protocol, a single party is aware both of the notes they are spending, as well as the resulting nullifiers and outputs. This is in contrast to the Penumbra protocol [Pen23], where only the spend is known at transaction creation time, and outputs are computed by the network consensus nodes.
Multi-party transaction creation
A group of users may choose to perform a multi-party computation (MPC) to match their intents off-chain. A successful matching round produces a complete transaction, which is publicly known to all involved participants; an unsuccessful round preserves the privacy of each user’s input. Due to the highly interactive nature of these protocols, requiring all parties to be on-line for many rounds of communication, MPC intent-matching can practically only be realized off-chain.
Figure 1: MPC-based backrunning between a user and a searcher. (Figure taken from [Ann23].)
Examples of multi-party intent-matching applications include: backrunning [Ann23] (Figure 1); block building [Has23]; inventory matching [Pol et al., 2023]; and peer discovery [WG20].
To achieve on-chain intent matching, intents must be expressed in a structure-preserving way, that still allows third parties to blindly perform computations on them. This requires either:
- homomorphic encryption, where computation can be done blindly on encrypted data; or
- encryption to a trusted execution enclave, where decryption can be done securely.
Note that solutions for on-chain transaction creation can also be used off-chain, by sequencers and matching services; however, the converse is not true.
Homomorphic encryption (HE)
In an HE-enabled execution model, the network’s consensus nodes are able to execute all state updates blindly on encrypted state. State is encrypted to some threshold of the consensus nodes, in effect reducing privacy guarantees to the network’s base consensus assumptions (Figure 2). To ensure that the ciphertext encodes some valid state, this must also be accompanied by a proof of knowledge of correct encryption.
Figure 2: In an HE-enabled execution model, state updates are blindly performed on-chain. The Zama FHE library is being used to build a private smart contract network [Zam23]. (Figure taken from [Zam23].)
Intents may then be matched by arbitrary algorithms in smart contracts to form complete execution paths. Some of these algorithms may also be “hard-coded” in the network’s state transition rules: for instance, in Penumbra [Pen23], users submit encrypted swap values to validators, who batch the values before executing them.
State transitions are executed directly on the ciphertext, with each validator being able to do a partial decryption of the output. The network then engages in an aggregation protocol to recover the full decrypted value, and re-encrypt it to the user.
Trusted execution enclave (TEE)
A TEE-enabled execution model provides a similar user experience as the FHE-enabled model. However, computation is not done in public over encrypted data; instead, it is done on the decrypted plaintext, inside a secure enclave.
For instance, in Secret Network [Sec23], each validator runs an SGX enclave containing the network private key. (This design can also be compartmentalized such that only a few master nodes hold the entire key, while worker nodes receive only contract-specific keys.) The network state is encrypted to this key.
User intents can be encrypted using a shared secret derived from the user’s key and the network key, and decrypted inside the secure enclave. Transactions can be created and executed in plaintext inside the enclave. Afterwards, transaction outputs can be encrypted and posted on-chain. If a quorum of the network’s consensus nodes agree on the results, the chain state is successfully updated.
To allow for a generic notion of privacy, we need to provide a way to express what differences in information make it possible to efficiently distinguish between transactions added to a protocol ledger. Define a transaction signature to be a pair of data structures: a table associating one or more data elements with sets of keypairs, and a table associating one or more data elements with sets of nodes. Informally, a transaction signature specifies the minimal collections of keypairs or computational nodes with the ability to discern a given data element from a transaction.
We aim to formalize a definition which generalizes the notion of ledger indistinguishability developed in [Ben-Sasson et al., 2014], which will provide a method of defining a signature of a ledger, such that different ledgers with identical signature from the perspective of an adversary are computationally indistinguishable. We are still developing the formalisms that will usefully describe this functionality. Under such a definition, different notions of privacy may be described by properties of signatures derived from the ledger:
- Anonymity — a collection of one or more public keys associated with an operation type are not revealed by the signature for an adversary lacking knowledge of any private keys from the collection
- Confidentiality — a collection of one or more data fields associated with an operation type are not revealed by the signature for an adversary lacking knowledge of any private keys from an authorized collection
- Unlinkability — for a distinguishing piece of data (public key, commitment, etc.) which is associated with multiple operations in a ledger, the distinguishing data is not revealed by the signature of more than one such operation for an adversary lacking knowledge of any private keys from an authorized collection
- Function Privacy — a function associated with an operation is not revealed by the signature for an adversary lacking knowledge of any private keys from an authorized collection
Beyond the concept of “information hidden cryptographically” is the question of what can be determined from the information that is included in a ledger signature.
- Plausible Deniability — it is infeasible to identify a unique node which participated in constructing a ledger transaction based on information revealed by the ledger signature
- Statistical Privacy — specified information revealed by the signature of a ledger transaction follows the same (or at least a computationally indistinguishable) distribution regardless of the inputs or method used to construct it
A notion related to privacy which is often important for contemporary blockchain applications is the following:
- Auditability — an authorized node or group of nodes is able to produce a convincing proof of a property of the ledger signature such as the value of a private data field or the output of a computational predicate
A number of additional technical considerations come into play when analyzing the privacy guarantees provided by distributed protocols. A major component of protocol security is described by the security model and trust assumptions underlying the model’s claimed properties. Secure computation relies on reduction of desired properties to the assumptions provided by an underlying security model, which can come in different forms. The security model can play a significant role in how implausible it is (practically or in perception) for an adversary to overcome a privacy scheme. Some common features that are incorporated by security models include the following.
Cryptographic Security and Assumptions
Mathematical constructions used to ensure protocol security often rely on the difficulty of solving certain computational problems to guarantee their effectiveness. Some problems such as hardness of discrete logarithms or preimage resistance of well-known cryptographic hash functions are called “standard assumptions” and typically have risk profiles with known parameters, and are usually acceptable as a foundation for production cryptosystems. More exotic cryptographic primitives may rely on other computational problems which have not been subjected to extensive analysis or battle-tested in production systems, often called “non-standard assumptions”. Such assumptions, though potentially valid, have underexplored risk profiles, and so their use as a foundation for protocol security in real-world applications is typically discouraged.
Notably, the distinction between “standard” and “non-standard” assumptions is fuzzy and often debated by practitioners in the field; as such, it is useful for protocol specifications to be precise about what cryptographic assumptions are required for their correct functioning and desired properties.
Corruption Models and Economic Security
Protocol security often depends on the degree to which different participants are willing to act dishonestly. A protocol in which nobody is acting according to the rules has no ability to constrain the computational outputs, but it may be possible for a single honest participant in a protocol to still produce valid results due to careful design. In other cases, validity of protocol results may rely on a fixed positive proportion of participants to behave honestly, or at least only to behave dishonestly in constrained ways. The description of which protocol participants are assumed to behave dishonestly, and in what ways, is called the “corruption model” of the protocol.
One approach to ensuring a required level of honest participation in a protocol is to tie economic incentives to honest participation. For instance, a protocol may be designed such that bad behavior is possible, but that once this behavior is detected, an amount of a network currency token which was escrowed in order to participate is taken away and distributed to the remaining honest participants. Such schemes are called “economic security”, since the desired properties rely on the financial/economic incentives of the participants.
A layer of protocol security can be provided by requiring the use of hardened secure computational modules implementing a Trusted Execution Environment or TEE. Such modules run computations within a secure computational enclave with specified, controlled interfaces for input and output, and provide hardware-enforced assurances of validity and privacy. Under the assumption that a TEE module is physically capable of resisting attacks from a determined adversary, then computations executed using a TEE can be treated as though they were run by a “trusted” computational entity which is assumed to produce correct results and leak no information beyond what is specified by the controlled interface.
The availability of a real-world instantiation of a trusted protocol participant can greatly simplify the design of secure protocols. However, given the prevalence of subtle side-channel attacks on real-world hardware, as well as instances where cryptographic secrets have been exfiltrated from TEE modules which were thought to be secure (see for instance [Van Schaik et al., 2022]), there is some disagreement on the feasibility of producing TEE modules which truly are capable of resisting sophisticated attacks. Nevertheless, even if such concerns are warranted, hardware-based measures may still be valuable for contexts where security does not have a single point of failure (e.g. compromising a single TEE module breaks protocol security), but rather, depends only on a “significant portion” of participants making use of TEE modules as intended.
A further aspect of protocol security which is important to consider, but which is broadly outside the scope of this work, is that the overall guarantees provided by a protocol which admits user programmable logic may be dependent on multiple layers of specifications. The underlying global protocol/consensus layer will generally provide some guarantees, while the design of individual applications may provide others. In some cases even the specifics of user input may play into what level of privacy is provided in a particular distributed environment. In our analysis, we will treat protocol level security and privacy as the primary focus, and we may make remarks when warranted about ways in which application or user level security may interact with the protocol level in subtle ways.
Another important consideration when analyzing protocols is how formally a security model has been linked to the protocol’s concrete implementation. Most protocol specifications provide at least an informal justification for why the spec produces some desired security and privacy properties, but due to the subtle nature of layered and intertwined computational and cryptographic procedures in a multi-participant protocol, it is often possible for unforeseen vulnerabilities to slip through such informal justification.
To avoid such inadvertent vulnerabilities, a security proof is a valuable component of a protocol specification. This is a mathematical argument that the protocol’s desired properties reduce to the assumptions made by an underlying security model. For our analysis of different protocols, we will assess three characteristics:
- Does the protocol specification include a security proof?
- What model does the security proof make use of (e.g. ad-hoc constructions, a composable security framework like UC, etc.)?
- Does the security proof apply to the concrete protocol, or to a simplified precursor such as what might be given in an academic paper on which a protocol was based?
Concrete efficiency and scalability is an important aspect of the design of protocols meant to be deployed in real-world/production environments, but one which will not be a central focus of our formal analysis. Efficiency of different protocols and cryptographic primitives is generally well-studied in the literature and in surveys, and the analysis of protocol components along this axis doesn’t typically interact in subtle ways with the privacy characteristics of the protocol. In general our discussion of protocol efficiency will address the following qualitative features:
- Is a protocol or protocol component concretely efficient enough to be used in real-world systems
- If efficiency is currently an issue, either as a bottleneck or as a blocker for certain use cases, are improvements expected
That is, we will be interested in whether protocols or techniques are efficient enough for practical use, either now or in the near future. We will also note where efficiency considerations play a role in the design of a system when such considerations have an impact on functionality, privacy, or scalability.
Data availability describes the method by which protocol state data is stored and made available to network participants. A protocol may simultaneously specify multiple distinct methods for providing data availability for different types of data and with different guarantees of availability. Some possible variations for the scope include:
- Arbitrary user data
- Fixed data per account, transaction, or contract
- No user data
Variations for the mode of data availability layers include:
- Consensus data availability — data is treated as part of the consensus network state; data is reproduced by all network nodes
- Separate data availability — data availability is provided by a separate related protocol; data redundancy (number of times a specific piece of data is reproduced across network nodes) may vary depending on the protocol
- No public data availability — maintaining local state data is the responsibility of network participants
A common design paradigm for minimalistic blockchain protocols is to provide very limited data availability allowing the recording of one or a small number of state roots of a commitment structure such as a Merkle tree.
Data availability is an important design consideration for distributed computation protocols, but interestingly is largely orthogonal to data privacy — any data meant to remain private can be stored on a public data availability layer in encrypted form so that only those with access to authorized credentials is able to decrypt the data for use.
An occasional mechanism at the data availability layer which is used to reduce future damage in case of compromised credentials is to add an authentication mechanism to the data availability layer itself, so that honest nodes will only distribute data to nodes after providing valid credentials, preventing adversaries in many cases from retrieving and storing encrypted data for future cryptanalysis.
We define three categories of common applications, grouped according to their minimum required kinds of coordination. We note that this categorization is not formal, nor final, as we find that each application can be implemented in several ways, potentially falling in different categories. With this section we would like to emphasize the fact that there are some meaningful differences between applications that enable for interesting comparisons between protocols. We hope to have a more formal categorization in Phase 2 of the project.
In coordination-free applications, each party’s actions are causally independent of other parties’. Here, the output of the whole transaction is known in advance by the party, and a third-party cannot cause any conflict or side-effect. Applications in this category include:
- Asset transfer and issuance — being able to create and transfer (custom) assets in a privacy-preserving manner
- Asset based features — any asset specific features that are added to the protocol, such as:
- supply conservation and other global invariants;
- proving ownership or specific attributes of the tokens owned;
- Set membership — proving membership of a token in some on-chain global state; also used in implementing access controls and identity management systems.
Certain applications require the ability to aggregate user actions and perform a batch state update. Here, many actions are “reduced” by a single agent (could be a smart contract, a liquidity pool, an exchange, etc.); the aggregate result may then be published or used in a further action.
- Voting — votes are cast by participants and then aggregated on-chain in order to generate the final output.
- Auctions — bids in an auction may be received in any order as long as they are received prior to the conclusion of the auction. After this point, the result of the auction is independent of the order in which bids were received
- Batch swaps — each user commits an asset, to be swapped for a desired amount of some other asset. The network aggregates the asset commitments, and executes them in a single batch against a liquidity pool
A more complex class of applications requires interactivity and atomic resource matching between one or more mutually distrusting parties. These parties must be able to interact in a privacy-preserving way to jointly construct a complete transaction. Examples of such applications include:
- Atomic swaps — two or more parties each pre-authorise a partial trade, encumbered on the fulfilment of a certain desired outcome. To form a complete trade, all outcomes in each partial trade must be fulfilled.
- Backrunning — a user and a searcher jointly compute a backrunning transaction on the user’s original transaction. In case of failure to create a transaction, both the user’s transaction and the searcher’s search strategy are kept confidential.
- Block creation — a group of transaction owners jointly compute a maximally profitable set of transactions for inclusion in a block.