ZCG Grant Proposal – Auto-Failover Toolkit v2

Proposal Summary: Auto-Failover Toolkit v2 for Zcash

Reference: Zcash Community Grants Issue #132

Author: Emilio983 (The Social Mask Development Team)

Status: Open Proposal

Total Budget: $39,500 USD

Estimated Duration: 25 weeks

Executive Summary

This proposal introduces the “Auto-Failover Toolkit v2,” an infrastructure solution designed to eliminate single points of failure in Zcash wallet and application connections. The primary goal is to ensure that if a lightwalletd server fails, the application automatically switches to another operating server in less than 3 seconds (p95), without the end user perceiving any interruption.
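To make the latency target concrete, here is a minimal sketch of how a p95 figure could be computed from measured failover times. It uses the nearest-rank method, which is an assumption; the proposal does not say how p95 will be calculated. The function name and the sample values are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1] as number;
}

// Hypothetical failover times in milliseconds from 20 test runs.
const failoverMs: number[] = [
  800, 950, 1100, 1200, 1250, 1300, 1400, 1500, 1550, 1600,
  1700, 1800, 1900, 2000, 2100, 2200, 2400, 2600, 2900, 4500,
];

const p95 = percentile(failoverMs, 95);
const meetsTarget = p95 < 3000; // target from the proposal: under 3 seconds at p95
console.log(`p95 = ${p95} ms, meets target: ${meetsTarget}`);
```

With these made-up samples the single slow run (4500 ms) falls outside the 95th percentile, so the target would still count as met. That is the practical meaning of a p95 goal: it tolerates a small share of slow outliers.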

Unlike a previous proposal that was rejected for being too rigid, this v2 version focuses on flexibility, offering three distinct integration paths to accommodate both novice developers and established wallet teams.

The Problem

Currently, many Zcash wallets and services rely on a single lightwalletd server. If this server goes down or falls out of sync, the application stops working, preventing users from viewing balances or sending transactions.

This forces every development team to build their own custom reconnection scripts, which is costly and creates a significant barrier to entry for new developers coming from web environments (JS/TS/Python) who lack Rust experience.

The Proposed Solution

The project offers a modular toolkit under the MIT license that requires no key custody and avoids vendor lock-in. It is divided into three main components:

  1. Ready-to-Use SDK (JS/TS)

A library designed for developers looking for a quick solution (“npm install”). The SDK automatically manages latency measurement between servers and handles connection switching transparently in the event of failure.

  2. Directory API (HTTP/JSON)

A lightweight web service that allows querying the best available server at any given moment based on criteria like latency or geographic location. Ideal for applications in Python, Go, or environments where installing the full SDK is not desired.

  3. DIY Templates (Do It Yourself)

Detailed guides and example code for teams that need total control and prefer to implement their own switching logic based on the best practices documented by this project.
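As a rough illustration of the "best available server by latency" idea shared by the SDK and the Directory API, the sketch below ranks servers from probe results. The `ProbeResult` shape, the function name, and the example.com URLs are all hypothetical; the proposal does not define the real data format or selection rules.

```typescript
// Result of probing one lightwalletd endpoint (hypothetical shape).
interface ProbeResult {
  url: string;
  latencyMs: number | null; // null means the probe failed or timed out
}

// Return the URLs of reachable servers, fastest first.
function rankByLatency(results: ProbeResult[]): string[] {
  const reachable: { url: string; latencyMs: number }[] = [];
  for (const r of results) {
    if (r.latencyMs !== null) {
      reachable.push({ url: r.url, latencyMs: r.latencyMs });
    }
  }
  reachable.sort((a, b) => a.latencyMs - b.latencyMs);
  return reachable.map((r) => r.url);
}

// Placeholder endpoints, not real servers.
const ranked = rankByLatency([
  { url: "https://lwd-a.example.com:443", latencyMs: 180 },
  { url: "https://lwd-b.example.com:443", latencyMs: null },
  { url: "https://lwd-c.example.com:443", latencyMs: 95 },
]);
console.log(ranked);
```

Unreachable servers are dropped rather than sorted last, so an application can treat the first entry as its primary and the remaining entries as fallbacks.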

Key Differentiators

• No Central Dependency: If the project API goes down, the system can continue functioning via a decentralized signed JSON registry.

• Flexibility: Does not impose a “one-size-fits-all” solution; developers can choose to use the full SDK, just the API, or simply copy the design patterns.

• Open Source: All software will be MIT licensed and self-hostable.

• Security: It never touches private keys at any point.

Commitments and Deliverables

The team commits to delivering:

• Client libraries (SDK).

• Public API infrastructure and server registry.

• Technical and educational documentation.

• Primary server (VPS) maintenance for 12 additional months after delivery.

• Security audits and stress testing.

The proposal aims to facilitate the entry of new developers into the Zcash ecosystem by reducing the complexity of network infrastructure management.
https://github.com/ZcashCommunityGrants/zcashcommunitygrants/issues/132

I am also attaching the link to an earlier forum post about the tool, where we discussed the project: What if Zcash never broke again because of a single lightwalletd server? - General - Zcash Community Forum

I support this as a hotfix for the here and now, and for mobile environments where CPU is limited. Ultimately I think lightwalletd should be integrated as a mode of the full node code (don’t download the chain, don’t broadcast as a peer, request only blocks that the wallet has asked for).

😄 Thanks so much for the support!

It would be worth getting in touch with existing developers and getting feedback directly from them first; this would depend on the Rust bindings/implementation that each developer uses and would need coordination with them. I think it would make more sense to handle lightwalletd server failover on the Rust side and just pass an array of servers from Node.js/React Native.

Thanks, @1337bytes. You and @hanh are spot on: the architectural ideal is definitely native Rust integration.

But as @fireice_uk noted, this is a necessary hotfix for the ‘here and now’.

We can’t ask web developers to wait for protocol upgrades or learn Rust just to keep their apps online.

This toolkit fills that gap immediately, delivering reliability via npm install today while paving the way for the protocol to catch up later.

Could you point to any Zcash web developers that are being blocked because of lightwalletd server stability issues? I believe zec.rocks already has a proxy failover system on their end, and I’m not sure what you mean by a hotfix that is not on the Rust side? Here is how zingolib handles the connection (zingolib/zingolib/src/grpc_client.rs at ee0a776e1e053744bc42a50fbf4814c430d93db9, zingolabs/zingolib on GitHub):

let config = match zingolib::load_clientconfig(server, Some(data_dir), chaintype, false)

I believe this is a static constant; I don’t quite see how error handling on the JavaScript side could catch an error and change the server URI.

About your question of “which web devs are blocked”: I am not thinking only about wallet teams like Zingo that already did the hard work in Rust. I am more worried about the “small builders” who use JavaScript or TypeScript and just want to make a simple thing on top of Zcash: a donation button, a small bot, an internal tool, a hackathon demo, and so on.

What I see in practice is something like this:

  • They use one lightwalletd endpoint.
  • They get connection errors or the height gets stuck for some users.
  • Then they realize that to fix this they need to learn Rust or do their own failover logic with more infra.
  • At that point many people just move the idea to Ethereum, Algorand, etc., where the SDK already hides this part.
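The "height gets stuck" symptom in the list above can be detected by comparing the block heights that different servers report. The following is only a sketch under stated assumptions: the `HeightReport` shape, the `maxLag` threshold, and the example.com URLs are hypothetical, and how the heights are fetched from each lightwalletd server is left out.

```typescript
// Height reported by one server (hypothetical shape).
interface HeightReport {
  url: string;
  height: number;
}

// A server is treated as stale when it is more than `maxLag` blocks behind
// the highest height reported by any server in the set.
function findStaleServers(reports: HeightReport[], maxLag: number): string[] {
  let best = 0;
  for (const r of reports) {
    if (r.height > best) best = r.height;
  }
  const stale: string[] = [];
  for (const r of reports) {
    if (best - r.height > maxLag) stale.push(r.url);
  }
  return stale;
}

// Placeholder endpoints and made-up heights.
const stale = findStaleServers(
  [
    { url: "https://lwd-a.example.com:443", height: 2500000 },
    { url: "https://lwd-b.example.com:443", height: 2499998 },
    { url: "https://lwd-c.example.com:443", height: 2499100 },
  ],
  10,
);
console.log(stale);
```

Using the maximum as the reference is the simplest choice, but a single server reporting a wrong, inflated height would make every other server look stale. Comparing against the median of the reported heights would likely be more robust; which rule fits best is a design decision this sketch does not settle.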

With my own project (Social Mask) I also felt this problem. When one node goes down, the app looks “broken” for users, even if the rest of the system is fine. This creates less trust and makes it harder to keep people on Zcash.

For me, Zingo is actually a good example of why this problem is real: you already added logic to choose between lightwalletd servers on your side, because having only one hardcoded endpoint hurt real users. What I want is to bring that kind of idea to people who do not run a full Rust wallet codebase, but still want to build things with Zcash using JS or simple backends.

About the “hotfix not on the Rust side”: I agree with you that the best long term solution should live closer to the core, in Rust or at protocol level. If one day lightwalletd or the common Rust libraries include this logic in a clean way, that would be great and I will be happy to adapt to that.

The reason I am proposing a JS-side toolkit is more practical: changing Rust libraries and doing releases takes time and coordination, and a JS or Node dev today cannot just npm install something and get basic multi-endpoint behavior in a few minutes. Either they write it themselves, or they give up on Zcash. I want to offer a simple option so they do not have to give up.

About the code you linked:

let config = match zingolib::load_clientconfig(server, Some(data_dir), chaintype, false)

You are right: once this is called, the chosen server is basically static for that client. I am not saying that JavaScript can change a static constant inside Rust. The idea is different:

  • From JS or Node you still choose which server string you pass.
  • If a health check fails, the toolkit can create a new client with another endpoint from a list.
  • The app dev only passes a list of servers and gets simple logic like “try A, if it fails try B, then C”.
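The "try A, if it fails try B, then C" logic from the list above could look roughly like this. It is a minimal sketch, not the toolkit's actual API: `withFailover`, `fakeGetHeight`, and the example.com URLs are hypothetical, and the callback stands in for whatever the app does against a server (for example, creating a new client and running a health check).

```typescript
// Try each endpoint in order and return the first successful result.
// `operation` is supplied by the app; it should reject (throw) on failure.
async function withFailover<T>(
  endpoints: string[],
  operation: (endpoint: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const endpoint of endpoints) {
    try {
      return await operation(endpoint);
    } catch (err) {
      // Remember why this endpoint failed, then move on to the next one.
      errors.push(`${endpoint}: ${String(err)}`);
    }
  }
  throw new Error(`all endpoints failed: ${errors.join("; ")}`);
}

// Stand-in for a real lightwalletd call: server A is "down", the others work.
async function fakeGetHeight(endpoint: string): Promise<number> {
  if (endpoint === "https://lwd-a.example.com:443") {
    throw new Error("connection refused");
  }
  return 2500000;
}

// Placeholder endpoints, not real servers.
const servers = [
  "https://lwd-a.example.com:443",
  "https://lwd-b.example.com:443",
  "https://lwd-c.example.com:443",
];

withFailover(servers, fakeGetHeight).then((height) => {
  console.log(`chain height: ${height}`);
});
```

Nothing inside an existing client is modified here: each attempt goes through the callback with a different server string, which matches the point that the JS side only chooses which endpoint to pass. A real implementation would likely also need per-attempt timeouts, which this sketch omits.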

So I see this proposal as something that works together with what you already did in Zingo, not against it. It is a bridge so small builders can have better reliability now, while deeper Rust or protocol changes are discussed in parallel.

Basically, you can think of this solution, and of this whole conversation, as a battle between Rust and reality.

Hi Emilio, do you know people who work with Zcash in JS or TS like you describe? I ask because I am not aware of a JS library that would enable someone to connect to a lightwalletd directly.

There is webzjs, but it does not support connecting to lightwalletd. It needs to go through a grpc-web proxy.

With my own project (Social Mask) I also felt this problem. When one node goes down, the app looks “broken” for users, even if the rest of the system is fine.

What’s the architecture of your app?

Hi @hanh, I was at an event; sorry for the late reply.

1. Developer Adoption & The “Easy vs. Better” Trade-off

I speak from experience engaging with builders in Discord, hackathons, and grant programs. I previously developed an Algorand wallet, and I must admit, the integration was incredibly seamless.

That taught me a vital lesson: “Easy” doesn’t always mean “better” technology, but “easy” brings the people. And having people builds the ecosystem.

Maybe those same people will eventually improve these systems and build even better tools later. But they need the basics first to make everything flow. If we give them that entry point now, it becomes much easier for them to grow into the ecosystem rather than leaving for simpler chains.

2. Technical Scope & Architecture Clarification

To avoid confusion: when I say JS SDK in this proposal, I am referring specifically to the Node.js environment.

The architecture of Social Mask illustrates exactly why this tool is needed:

• Backend Infrastructure: I use a Node.js backend connecting to lightwalletd via native gRPC.

• The Stability Challenge: Even in this backend environment, connection stability is currently manual. If the specific node I configure creates friction or goes down, the app looks “broken” to the user.

My proposal abstracts this failover logic into a standard library. This ensures new developers get reliability out of the box via npm install, rather than having to engineer infrastructure code from scratch. If you have any other questions, just ask and I will answer. To clarify, I am talking about Node.js because it is a tool that a lot of devs use to build their applications.

I beg to disagree. It seems you built a tool/platform based on your previous experience with other coins and ran into several pain points. This is because the Zcash ecosystem uses different tools and languages. Have you considered how to build Social Mask using our ecosystem? It seems you took an architecture that works for Algorand and expected it to work for Zcash.
I agree that working with Zcash is harder than most other coins, but I don’t think that the main issue is the lack of failover for lightwalletd.

Hi @hanh, I understand your point. I know Zcash is unique and more complex due to its shielded nature, and I am not trying to force patterns from other chains blindly.

But the practical hurdle I faced was this:

Connecting to a single public node meant my app would break if that node went down. To fix it, I was forced to run my own full node, which made the project much more expensive and harder to maintain.

That is exactly the gap I want to fill. I want to give builders reliable uptime using public infrastructure, so they don’t have to pay for and manage a dedicated node just to launch an MVP.

Look, here is the core issue:

To use Zcash, you simply need to connect to a node. In theory, that is easy.

But ensuring the app continues to work if that node goes down requires extra logic that takes time to build. That is exactly the gap this toolkit fills.

Many builders here simply don’t have the budget to run and maintain a dedicated node for every project. But not being able to afford a private node shouldn’t mean we have to risk our apps stopping whenever a public node fails.