Grant Application - zaino - Stability, Performance & Testing

hanh · June 24, 2026, 6:19am

github.com/ZcashCommunityGrants/zcashcommunitygrants

Grant Application - zaino - Stability, Performance & Testing

opened 03:46AM - 19 Jun 26 UTC

### Terms and Conditions

- [x] I agree to the [Grant Agreement](https://9ba4718…c-5c73-47c3-a024-4fc4e5278803.usrfiles.com/ugd/9ba471_f81ef4e4b5f040038350270590eb2e42.pdf) terms if funded
- [x] I agree to [Provide KYC information](https://9ba4718c-5c73-47c3-a024-4fc4e5278803.usrfiles.com/ugd/9ba471_7d9e73d16b584a61bae92282b208efc4.pdf) if funded above $50,000 USD
- [x] I agree to disclose conflicts of interest
- [x] I agree to adhere to the [Code of Conduct](https://forum.zcashcommunity.com/t/zcg-code-of-conduct/41787) and [Communication Guidelines](https://forum.zcashcommunity.com/t/zcg-communication-guidelines/44284)
- [x] I understand all milestone deliverables will be validated and accepted by their intended users or their representatives, who will confirm that the deliverables meet the required quality, functionality, and usability for each user story.
- [x] I agree that for any new open-source software, I will create a `CONTRIBUTING.md` file that reflects the high standards of Zcash development, using the [`librustzcash` style guides](https://github.com/zcash/librustzcash/blob/main/CONTRIBUTING.md#styleguides) as a primary reference.
- [x] I understand when contributing to existing Zcash code, I am required to adhere to the project specific contribution guidelines, paying close attention to any [merge](https://github.com/zcash/librustzcash/blob/main/CONTRIBUTING.md#merge-workflow), [branch](https://github.com/zcash/librustzcash/blob/main/CONTRIBUTING.md#branch-history), [pull request](https://github.com/zcash/librustzcash/blob/main/CONTRIBUTING.md#pull-request-review), and [commit](https://github.com/zcash/librustzcash/blob/main/CONTRIBUTING.md#commit-messages) guidelines as exemplified in the `librustzcash` repository.
- [x] I agree to post request details on the [Community Forum](https://forum.zcashcommunity.com/c/grants/33)
- [x] I understand it is my responsibility to post a link to this issue on the [Zcash Community Forums](https://forum.zcashcommunity.com/c/grants/33) after this application has been submitted so the community can give input. I understand this is required in order for ZCG to discuss and vote on this grant application.

### Application Owners (@Octocat, @Octocat1)

hanh

### Organization Name

hanh

### How did you learn about Zcash Community Grants

Long time contributor

### Requested Grant Amount (USD)

50000

### Category

Infrastructure

### Project Lead

```project-lead.yaml
Name: hanh
Role: dev
Background: zcash dev
Responsibilities:
```

### Additional Team Members

```team-members.yaml
- Name: N/A
  Role:
  Background:
  Responsibilities:
```

### Project Summary

Zaino is a Zcash lightwalletd-compatible indexer. It speaks the lightwalletd gRPC protocol to serve compact blocks, transactions, and chain data to Zcash light wallets. It reads from a full Zcash node (zebrad or zcashd) and builds local indices so that compact blocks can be served efficiently.

This proposal covers three areas of work:

1. **Stability**: replace manual locking and deep-clone snapshots with compiler-enforced immutability. The result is a codebase where concurrency bugs are not representable in the types, not just prevented by convention.
2. **Performance**: lock-free reads, O(1) block ingestion under normal operation, and fair scheduling so that thousands of concurrent clients see minimal slowdown relative to a single client.
3. **Testing**: a suite of four tools that verify Zaino behaves correctly under load, across reorgs, and against the lightwalletd protocol specification.

### Project Description

Light wallets need compact blocks to synchronize with the Zcash blockchain without downloading the full chain.
A compact block is a slimmed-down version of a full block: it contains only the transaction data relevant to the wallet's shielded pools (Sapling and Orchard), plus enough transparent transaction metadata to track UTXOs. The lightwalletd protocol defines exactly how these compact blocks are requested and streamed.

Zaino currently serves this protocol by maintaining three tiers of state:

- A **mempool** (in-memory, DashMap-backed) that tracks unconfirmed transactions and provides streaming updates to subscribers.
- A **non-finalized state** (in-memory, deep-cloned on every update) covering the top ~100 blocks (the reorg zone).
- A **finalized state** (on-disk LMDB) holding everything below the reorg zone, with BLAKE2b-256 checksums for corruption detection.

Queries route through a subscriber layer that takes a snapshot of non-finalized state, resolves against the finalized DB, and streams compact blocks back to clients. The whole thing plugs into either a zebrad ReadStateService or a JSON-RPC connection to any Zcash validator.

Code reviews have surfaced concerns with the internal design, specifically around the deep-clone snapshot pattern, the CAS ceremony, and in-place reorg rewiring. These are areas where the type system could carry more of the correctness burden.

### Proposed Problem

### Manual locking and deep-clone snapshots

The non-finalized state uses `ArcSwap<NonfinalizedBlockCacheSnapshot>`. On every sync cycle the writer deep-clones all ~100 blocks in the window (about 464 KB), mutates the working copy (adding new blocks, rewiring height indices on reorg), then publishes via compare-and-swap. Readers see a consistent snapshot because they hold the old `Arc`.

This pattern relies on several runtime invariants. The writer must not publish a half-updated snapshot; doing so would expose readers to a partially rewired chain.
Publishing the updated snapshot requires a `compare_and_swap`: the writer must hold onto the old `Arc` as a sentinel until the swap completes, and handle a failure path that is unreachable in normal operation but can't be removed because the CAS API demands it. A reader that takes two snapshots in one request must reuse the same `Arc` or risk seeing inconsistent state across the two calls. On reorg, the height index is rewired in-place on the working copy; a panic mid-update leaves the snapshot in an inconsistent state. The current code handles these correctly, but the compiler can't catch a mistake here.

### Reorg handling

When the validator's chain tip moves to a different branch, Zaino walks backward from both tips to find the fork point, then rewires the height index and rebuilds forward along the new branch. This happens in-place on the working copy: the walk is async (it fetches missing blocks from the validator), and between await points the working copy is partially rewired: `best_tip` points to the new chain, but some heights still map to the old one. The working copy isn't published until the walk completes, so readers are safe today. But the incoherent intermediate state is visible to any code that runs during the walk, and nothing in the types prevents accidentally publishing it early.

### Other components

The mempool has a related class of issues: shared mutable state behind a `DashMap` with per-subscriber tracking and a custom broadcast abstraction. Correctness depends on the sync loop and subscribers following a protocol that the compiler can't enforce. The mempool is not part of this proposal, but the approach described below applies to it as well. If the mempool becomes a bottleneck or a source of bugs, it can be updated in a later phase following the same design principles.
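The backward walk described under "Reorg handling" can be sketched as a pure function over a hash-addressed header map. This is only an illustration under stated assumptions: `Header`, `find_fork_point`, and `example_headers` are hypothetical names, hashes are shortened to `u64`, and a missing parent simply returns `None` where the real code would fetch the block from the validator.

```rust
use std::collections::HashMap;

/// Hypothetical header with only the fields the walk needs.
/// `u64` stands in for a 32-byte block hash.
#[derive(Clone, Debug, PartialEq)]
pub struct Header {
    pub hash: u64,
    pub prev_hash: u64,
    pub height: u32,
}

/// Walk back from both tips until the cursors meet; the meeting block is the fork point.
/// Returns `None` if a tip or a parent is absent from `headers`.
pub fn find_fork_point(headers: &HashMap<u64, Header>, tip_a: u64, tip_b: u64) -> Option<Header> {
    let mut a = headers.get(&tip_a)?;
    let mut b = headers.get(&tip_b)?;
    // Bring the higher cursor down to the height of the lower one.
    while a.height > b.height {
        a = headers.get(&a.prev_hash)?;
    }
    while b.height > a.height {
        b = headers.get(&b.prev_hash)?;
    }
    // Step both cursors back in lockstep until they point at the same block.
    while a.hash != b.hash {
        a = headers.get(&a.prev_hash)?;
        b = headers.get(&b.prev_hash)?;
    }
    Some(a.clone())
}

/// Example data: old branch 1 <- 2 <- 3 <- 4, new branch 2 <- 13 <- 14 <- 15.
pub fn example_headers() -> HashMap<u64, Header> {
    let rows: [(u64, u64, u32); 7] = [
        (1, 0, 0), (2, 1, 1), (3, 2, 2), (4, 3, 3),
        (13, 2, 2), (14, 13, 3), (15, 14, 4),
    ];
    rows.iter()
        .map(|&(hash, prev_hash, height)| (hash, Header { hash, prev_hash, height }))
        .collect()
}
```

Because the function only reads from the map and returns a value, there is no partially rewired state to observe between steps; the rewiring itself can then be built as a new value from the returned fork point.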
### Proposed Solution

### Compiler-enforced immutability

The core idea is simple: instead of deep-cloning mutable snapshots, store blocks in **persistent data structures** that share structure with their predecessors. The new `zaino-store` crate provides three types that work together:

**`Phm`**: a persistent hash map from block hash to block data. Wraps `Arc<im::HashMap<BlockHash, Block>>`, a HAMT (Hash Array Mapped Trie). Clone is O(1) (an `Arc` refcount bump). Insert returns a new root sharing all unchanged subtrees with the old one; only the path through the trie to the new entry is allocated.

**`HeightDeque`**: a persistent deque mapping height to best-chain block hash. Wraps `Arc<im::Vector<BlockHash>>`. Push back (ingestion) and pop front (freeze) both allocate only the changed spine; the rest is shared.

**`ChainStateInner`**: the immutable root holding `blocks: Phm`, `heights: HeightDeque`, and `tip: BlockHash`. The writer builds new `Phm` and `HeightDeque` roots (each an `Arc` clone + one structural allocation along the changed path), then swaps them under a brief write lock. Readers clone the two `Arc`s under the read lock (two pointer bumps, nanoseconds) and are then fully independent: no lock is held during block iteration.

```
Reader view:
  ChainStream { blocks: Phm, heights: HeightDeque, cursor }
  ─── two Arcs, four integers, ~48 bytes ───

Writer path:
  ingest(block)
    → new_blocks  = blocks.insert(hash, block)   // HAMT path alloc
    → new_heights = heights.push_back(hash)      // deque spine alloc
    → RwLock swap:
        inner.blocks  = new_blocks
        inner.heights = new_heights
        inner.tip     = hash
```

Only the HAMT path and the deque spine are allocated per block, a few hundred bytes regardless of how many blocks are in the store. The deep clone of the entire window disappears.
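The writer and reader paths above could look roughly like the following. This is a minimal sketch, not the `zaino-store` implementation: it uses only `std`, so plain `HashMap` and `Vec` stand in for the `im` types and building a new root copies the collection instead of sharing structure. The `ChainState` wrapper, the `snapshot` method, and the error strings are hypothetical, and a single writer is assumed.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

pub type BlockHash = [u8; 32];

#[derive(Clone, Debug, PartialEq)]
pub struct Block {
    pub height: u32,
    pub prev_hash: BlockHash,
    pub data: Vec<u8>, // opaque payload, never inspected by the store
}

/// Immutable roots. With the `im` crate these would be persistent structures;
/// here they are std collections behind `Arc` (an assumption of this sketch).
#[derive(Clone)]
pub struct ChainStateInner {
    pub blocks: Arc<HashMap<BlockHash, Arc<Block>>>,
    pub heights: Arc<Vec<BlockHash>>, // index 0 is the lowest in-memory height
    pub tip: BlockHash,
}

/// Hypothetical wrapper holding the current roots behind a lock.
pub struct ChainState {
    inner: RwLock<ChainStateInner>,
}

impl ChainState {
    pub fn new(genesis_hash: BlockHash, genesis: Block) -> Self {
        let mut blocks = HashMap::new();
        blocks.insert(genesis_hash, Arc::new(genesis));
        ChainState {
            inner: RwLock::new(ChainStateInner {
                blocks: Arc::new(blocks),
                heights: Arc::new(vec![genesis_hash]),
                tip: genesis_hash,
            }),
        }
    }

    /// Reader path: clone the roots under the read lock, then release it.
    pub fn snapshot(&self) -> ChainStateInner {
        self.inner.read().unwrap().clone()
    }

    /// Writer path: validate, build new roots outside the lock, then swap.
    pub fn ingest(&self, hash: BlockHash, block: Block) -> Result<(), String> {
        let old = self.snapshot();
        let tip_height = old.blocks.get(&old.tip).map(|b| b.height).ok_or("tip block missing")?;
        if block.prev_hash != old.tip || block.height != tip_height + 1 {
            return Err(format!("block at height {} does not extend the tip", block.height));
        }
        let mut blocks = (*old.blocks).clone(); // a HAMT would share this instead of copying
        blocks.insert(hash, Arc::new(block));
        let mut heights = (*old.heights).clone();
        heights.push(hash);
        let next = ChainStateInner { blocks: Arc::new(blocks), heights: Arc::new(heights), tip: hash };
        *self.inner.write().unwrap() = next; // lock held only for the assignment
        Ok(())
    }
}
```

A snapshot taken before an `ingest` keeps its own roots, so it never observes the new block; swapping the std collections for persistent ones changes the allocation cost, not this behaviour.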
More importantly, the concurrency hazards disappear because none of them are representable in the new types:

- `Phm` and `HeightDeque` are **never mutated**: `insert` and `push_back` return new values. There is no working copy to accidentally corrupt.
- **No `compare_and_swap`**. The writer takes a standard `RwLock` write guard, swaps three fields, and drops the guard. No CAS failure path.
- A reader clones two `Arc`s under the read lock, then releases it. The resulting `ChainStream` sees a **consistent snapshot forever**, regardless of concurrent writes, including reorgs that rewrite the best chain. Stability under reorg comes from the `HeightDeque`: it is a persistent data structure, so the writer's reorg creates a new deque mapping heights to the new best-chain hashes, while the reader's clone of the old deque still maps those same heights to the old best-chain hashes. Both deques resolve through the same shared `Phm` (**blocks are hash-addressed and archived, not discarded**: popped from the deque on freeze, but persisted to LMDB). The snapshot isolation is the deque's persistence.
- The `ChainIndexSnapshot` enum and the `NonFinalizedSnapshot` trait disappear entirely. The old design forced every query to take a snapshot through a ceremony that returned either `NonFinalizedStateExists` or `StillSyncingFinalizedState`, a global flag that serialized on the sync state of the entire index. During initial sync, every reader saw `StillSyncingFinalizedState` and either failed or fell back to the validator, even for heights the finalized DB could serve. In the new design there is no snapshot ceremony: a reader clones the current `Phm` and `HeightDeque` under the read lock and proceeds. If a block isn't in the in-memory `Phm`, the reader transparently falls through to LMDB. The sync status of the tip doesn't gate reads on blocks that are already stored: **the server is always available. It never waits for a "synced" state to serve the data it has**.
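To make the reorg and fall-through behaviour concrete, here is a small sketch of how a reader might resolve a height from its captured roots. Everything in it is an assumption for illustration: `Snapshot`, `Cold`, `block_at`, and `reorg` are hypothetical names, hashes are shortened to `u64`, LMDB is modelled as an in-memory `BTreeMap` keyed by height, and std collections stand in for the persistent ones.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

pub type Hash = u64; // stand-in for a 32-byte block hash

/// A reader's captured view of the hot tier.
#[derive(Clone)]
pub struct Snapshot {
    pub base_height: u32,                    // height stored at heights[0]
    pub heights: Arc<Vec<Hash>>,             // best-chain hash per in-memory height
    pub blocks: Arc<HashMap<Hash, Vec<u8>>>, // hash-addressed, keeps every branch
}

/// Cold tier stand-in: frozen blocks keyed by height.
pub type Cold = BTreeMap<u32, Vec<u8>>;

/// Resolve a height: hot tier via height -> hash -> block, otherwise the cold tier.
pub fn block_at(snap: &Snapshot, cold: &Cold, height: u32) -> Option<Vec<u8>> {
    if height >= snap.base_height {
        let idx = (height - snap.base_height) as usize;
        let hash = snap.heights.get(idx)?;
        return snap.blocks.get(hash).cloned();
    }
    cold.get(&height).cloned()
}

/// Build the post-reorg roots as a new value; `snap` itself is left untouched.
/// Heights up to and including `fork_height` are kept, then the new branch is appended.
pub fn reorg(snap: &Snapshot, fork_height: u32, new_branch: &[(Hash, Vec<u8>)]) -> Snapshot {
    let keep = ((fork_height + 1).saturating_sub(snap.base_height) as usize).min(snap.heights.len());
    let mut heights: Vec<Hash> = snap.heights[..keep].to_vec();
    let mut blocks = (*snap.blocks).clone(); // old-branch blocks stay addressable by hash
    for (hash, data) in new_branch {
        heights.push(*hash);
        blocks.insert(*hash, data.clone());
    }
    Snapshot { base_height: snap.base_height, heights: Arc::new(heights), blocks: Arc::new(blocks) }
}
```

A reader that captured the old `Snapshot` before the reorg keeps resolving the same heights to the old branch, while a reader that starts afterwards sees only the new branch.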
None of these guarantees require the programmer to follow a convention or remember a rule. You can't mutate a `Phm`: `insert` returns a new one. You can't see an inconsistent chain: the `HeightDeque` you cloned defines exactly one chain, and the writer's reorg doesn't touch it. You can't corrupt a reader: the writer never holds a reference to the reader's data. **The types do the work. The full functionality of the store has been modelled and proven in Lean 4.**

With persistent data structures, the finalized vs. non-finalized boundary becomes a storage-tiering decision, not a semantic one. Currently the two tiers differ in mutability, representation, and write path. The new design unifies them: both are immutable. The `Phm` resolves blocks by hash, and LMDB resolves them by height (with a hash→height map maintained alongside). The only difference is where they live: hot blocks in memory (Phm + HeightDeque), cold blocks on disk (LMDB). The `freeze` operation pops from the front of the deque and writes to LMDB; the in-memory roots are updated, but readers holding an old deque snapshot still see the frozen heights.

Because both tiers are immutable and never mutate in place, the guarantees are the same whether a reader resolves a block from the in-memory `Phm` or from LMDB. A `ChainStream` that spans the freeze boundary transparently reads hot blocks from its captured `Phm` snapshot and cold blocks from LMDB. The cold blocks are frozen by height, and heights are never reorged below the freeze horizon, so LMDB reads are as stable as in-memory reads. The reader doesn't know or care which tier served a given block.

### Natural snapshot isolation

A side effect of the design is that streaming responses are **naturally consistent even when reorgs happen mid-stream**. The gRPC handler calls `stream_range(start, end)` which clones the `Phm` and `HeightDeque` under the read lock (two `Arc` bumps) and returns a `ChainStream` cursor.
If a reorg arrives while the stream is in flight, the writer takes the write lock, swaps in new roots, and publishes, but the in-flight `ChainStream` continues from its snapshots. The client gets a consistent view of a single chain.

Meanwhile, ingestion is cheap: unlike the previous design, which deep-cloned the entire ~100-block window on every cycle, `Phm::insert` and `HeightDeque::push_back` are **O(1) and lock-free**. The new roots are **built outside any lock**; the write lock is held only for the final pointer swap. **The writer never waits for readers, and readers never wait for the writer.**

### Modular design

The rewrite doesn't touch the parsing, serialization, or gRPC serving layers. `zaino-store` deals only with data storage. Its `Block` type is **opaque** (`{ height: u32, prev_hash: [u8; 32], data: Vec<u8> }`): the store never looks inside the payload. It enforces consistency by validating that each ingested block's `prev_hash` matches the current tip and its `height` follows sequentially. It doesn't parse Zcash transactions or build compact block protobufs. Those concerns stay in `zaino-state`, reused as-is. The gRPC server, JSON-RPC server, config system, and validator connectors are unchanged.

Because the store is a small, self-contained set of operations (insert a block, extend a chain, find a fork, truncate, freeze), **the full functionality has been modelled and proven in Lean 4**. Every operation preserves the invariants: a chain built by extending a valid tip is itself valid, reorgs produce valid sibling chains, and no sequence of operations can produce a broken prevHash link.

### Performance

Block ingestion allocates only what changed: one HAMT path through the `Phm` (log n nodes, effectively constant for practical sizes) and one spine node in the `HeightDeque`: **a few hundred bytes per block regardless of store size**. The writer holds the `RwLock` write guard just long enough to swap three `Arc` fields (nanoseconds).
**No deep-clone of the entire window. No CAS retry loop.**

Reads are nearly lock-free. A reader takes the `RwLock` read guard just long enough to clone two `Arc`s (two pointer bumps, nanoseconds), then releases it. All subsequent iteration through the `ChainStream` cursor is plain hashmap and vector lookups with **no contention**. A thousand concurrent clients each hold their own `ChainStream` and read independently: **no read locks held during iteration, no contention on the hot path.**

Fairness comes from the fact that **the writer never blocks readers**. The writer takes the write lock, swaps three `Arc` pointers, and releases. Readers holding old `Arc` snapshots continue unimpeded. The only synchronization point is the `RwLock`, whose read side is a single atomic operation.

In practice, the fetch backend imports **1000+ blocks per second**. This makes the direct zebra-DB reader backend unnecessary, a path that relied on undocumented zebra implementation details and tied Zaino to a specific validator. With the fetch path alone sufficient for ingestion speed, Zaino can talk to any validator (zcashd, zebrad, or another zainod) over the stable JSON-RPC interface.

### Testing

The proposal includes four testing tools. Each addresses a different dimension of correctness:

**`zaino-compare`** takes two running servers and compares blocks retrieved from A with blocks retrieved from B, flagging any mismatch. This is used to verify that Zaino behaves identically to the reference lightwalletd implementation (same compact blocks, same nullifier sets, same transaction data) across the full range of the chain.

**`zaino-check`** issues many random range queries against a single instance and verifies that the results are always internally consistent: no chain breaks, no height gaps, every block's prevHash matches the previous block's hash. This catches reorg-handling bugs, off-by-one errors in range streaming, and corruption in the block store.
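The consistency rule that `zaino-check` applies to each range can be expressed as a small pure function. This is a sketch of the check only, not the tool's code: `BlockInfo`, `ChainError`, and `check_range` are hypothetical names, the three fields mirror the `height`, `hash`, and `prevHash` fields of a lightwalletd compact block, and the range is assumed to be streamed in ascending height order.

```rust
/// Hypothetical view of a streamed compact block: just the linkage fields.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockInfo {
    pub height: u64,
    pub hash: Vec<u8>,
    pub prev_hash: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ChainError {
    /// The next block's height is not the previous height plus one.
    HeightGap { expected: u64, found: u64 },
    /// The block at `height` does not point at the hash of the block before it.
    BrokenLink { height: u64 },
}

/// Verify that an ascending range has no height gaps and no broken prevHash links.
pub fn check_range(blocks: &[BlockInfo]) -> Result<(), ChainError> {
    for pair in blocks.windows(2) {
        let (prev, cur) = (&pair[0], &pair[1]);
        if cur.height != prev.height + 1 {
            return Err(ChainError::HeightGap { expected: prev.height + 1, found: cur.height });
        }
        if cur.prev_hash != prev.hash {
            return Err(ChainError::BrokenLink { height: cur.height });
        }
    }
    Ok(())
}
```

Running a check like this over many random ranges, including ones that straddle the reorg window, is what would surface off-by-one errors and mid-stream chain switches.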
**`zaino-concurrent`** stages thousands of concurrent clients against the same instance, each querying different overlapping block ranges. It verifies that every client gets correct results in a timely fashion and reports statistics on latency distribution, throughput, and any failures. This catches contention bugs, starvation, and performance regressions under load.

**`zaino-grpc-test`** systematically tests every method in the CompactTxStreamer gRPC interface. For each method it sends valid requests, invalid requests (wrong types, out-of-range heights, missing hashes), and edge cases (empty ranges, tip boundaries, reorg windows). It reports which methods pass, which fail, and which are not yet implemented. This identifies gaps between the implementation and the lightwalletd protocol specification.

These tools run in CI and can be pointed at any Zaino instance (local, staging, or production). They don't depend on internal APIs; they exercise the same gRPC interface that light wallets use.

## Scope

The validator connectors, LMDB finalized store, gRPC and JSON-RPC transport servers, config system, and `zainod` CLI are all unchanged. From the outside, Zaino still speaks the same protocol and connects to the same validators.

What goes away: the `NonFinalizedSnapshot` trait, the deep-clone sync path, the CAS retry logic, and several error variants that corresponded to code paths that no longer exist. The net line count drops while the test coverage increases.

### Solution Format

The work is delivered as a series of small, isolated commits on a feature branch, each self-contained and straightforward to review and merge:

1. **Split `ChainIndexer` into three traits.** The monolithic indexer trait is decomposed into a validator-abstraction trait (hides zcashd vs. zebrad differences behind a single interface), a mempool trait (tracks unconfirmed transactions and broadcasts to subscribers), and a compact-block-indexer trait (stores blocks and serves compact-block streams).
Each trait is its own commit, with no behaviour changes; only the trait boundary is drawn. This makes the subsequent commits easy to land because the seam already exists.
2. **Develop `zaino-store`.** The `zaino-store` crate is introduced with `Phm`, `HeightDeque`, and `ChainStateInner` as described above. It is built and tested in isolation: no callers yet, no integration with the rest of the system. The commit ships the crate, its unit tests, and the Lean 4 formalisation of its invariants.
3. **Replace calls into `zaino-state`.** One commit per trait from step 1: the compact-block-indexer implementation switches from the deep-clone / CAS snapshot pattern to `zaino-store`; the mempool is updated if it is in scope for the current phase. The validator abstraction and gRPC serving layer are unchanged. Old code is deleted after the new path is wired in, so each commit compiles and passes tests independently.
4. **Add test tools.** `zaino-compare`, `zaino-check`, `zaino-concurrent`, and `zaino-grpc-test` land as separate commits, each with its own CI wiring. They are introduced after the store replacement so that they exercise the new implementation.
5. **Validate.** A final integration commit adds a CI job that runs the full test-tool suite against a Zaino instance backed by both zebrad and zcashd on a common testnet checkpoint. This commit gates the merge: the branch does not land until all tools pass against both validators.

### Dependencies

Zaino

### Technical Approach

Most of the current code is unchanged, as described under Solution Format.

### Upstream Merge Opportunities

- PR for the ChainIndex abstraction. It's a mechanical interface refactor: make it implement a trait.
- PR for zaino-store
- PR for test tools

### Hardware/Software Costs (USD)

0

### Hardware/Software Justification

N/A

### Service Costs (USD)

0

### Service Costs Justification

N/A

### Compensation Costs (USD)

50000

### Compensation Costs Justification

Cover design & development costs

### Total Budget (USD)

50000

### Previous Funding

Yes

### Previous Funding Details

Various projects

### Other Funding Sources

No

### Other Funding Sources Details

_No response_

### Implementation Risks

None; this is mostly retroactive funding. Everything is working and tested.

### Potential Side Effects

None

### Success Metrics

The new code is easier to work with because of its modularity, interfaces, and invariants. It performs better and has better concurrency. Finally, it has been formally verified.

### Startup Funding (USD)

0

### Startup Funding Justification

N/A

### Milestone Details

```milestones.yaml
- Milestone: 1
  Amount (USD): 50000
  Expected Completion Date: 2026-06-20
  User Stories:
    - "As a lightwallet user, I can scan and synchronize my wallet against a Zaino indexer that responds correctly and performantly under concurrent load"
    - "As a zaino operator, I can ingest the blockchain at a rate comparable to lightwalletd while running Zaino unattended, with confidence that the server remains available during reorgs and initial sync"
  Deliverables:
    - "lean 4 proof: a formal model of the `zaino-store` persistent data structures (`Phm`, `HeightDeque`, `ChainStateInner`) and their core operations, with machine-checked proofs of snapshot isolation and structural sharing invariants"
    - "zaino block store: a production `zaino-store` crate implementing those persistent data structures in Rust, backed by the `im` crate (HAMT + persistent vector), integrated into Zaino's ingestion and query paths, replacing the deep-clone snapshot pattern"
    - "zaino test tools: a suite of integration tests covering concurrent read/write under load, reorg correctness (fork detection, height rewiring, snapshot stability across reorg boundaries), and freeze-to-LMDB round-trip integrity"
    - "The changes will be submitted as a pull request to the upstream Zaino repository. We will work with the maintainers through the review process, respond to feedback, and iterate as needed. While we will make every effort to achieve a merge, the final decision rests with the project maintainers and we cannot guarantee acceptance. Should the PR not be merged after a good-faith effort to address all review feedback, we will maintain a public fork carrying the completed work, documented and kept in sync with upstream so that it remains usable by the Zcash light-wallet ecosystem."
```

dismad · June 24, 2026, 2:19pm

I love this format of explaining your grant, hanh

hanh · June 25, 2026, 6:15pm

Thanks

If someone wants to try it out, I made a docker image.

Synchronizing the full mainnet blockchain (assuming you already have Zebra synced) should take ~20 minutes instead of ~24h.

Docker Usage

Image: hhanh00/zaino:latest

Binaries in the image: zainod, zaino-admin, zaino-check,
zaino-compare, zaino-concurrent-test, zaino-grpc-test.

Volumes

| Mount point | Purpose |
|---|---|
| `/app/config` | Config directory |
| `/app/data` | LMDB data directory |

Config file

Minimal zainod.toml (place in the directory you mount to /app/config):

backend = 'fetch'
network = 'Mainnet'
block_store_max_concurrency = 8
start_height = 0

[grpc_settings]
listen_address = '127.0.0.1:9067'

[validator_settings]
validator_jsonrpc_listen_address = '127.0.0.1:8232'
validator_user = 'xxxxxx'
validator_password = 'xxxxxx'

[storage.database]
path = '/app/data'
size = 384

Setup

Create the data directory:

mkdir ./data

Bootstrap

Load blocks from a local Zebra RocksDB into the zaino LMDB store:

docker run --network host -v ./data:/app/data -v .:/app/config -v ~/.cache/zebra:/app/zebra --entrypoint zaino-admin hhanh00/zaino:latest bootstrap /app/zebra

Run the server

docker run -d --network host -v ./data:/app/data -v .:/app/config hhanh00/zaino:latest -c /app/config/zainod.toml start

The default entrypoint runs `zainod` and forwards all arguments, so `-c /app/config/zainod.toml start` is equivalent to `zainod -c /app/config/zainod.toml start`.

Running other tools

Override --entrypoint with the binary name:

For example, to compare blocks from height 3300000 to the tip between the local zaino and zec.rocks:

docker run --network host -v ./data:/app/data -v .:/app/config --entrypoint zaino-compare hhanh00/zaino:latest --start 3300000 http://localhost:9067 https://zec.rocks

Note: the container runs in host network mode (`--network host`), so it can reach the ports of the host directly. It could be run in bridge mode, but the network configuration is more complex.

