RFP - Zcash Lightwalletd Infrastructure Development and Maintenance

Ywallet never had a fork… I don’t quite understand the distinction you are pointing out.

2 Likes

All - It’s clear that there are some strong off-topic feelings floating around. Let’s please keep the discussion here constructively focused on the RFP and its responses.

Responses to concerns below:

It actually does not mention the software version and its update policy.

“Offering ongoing maintenance, active monitoring, and prompt upgrades to the lightwalletd service to ensure it remains compatible with Zcash network updates”

It’d be great to also understand the budget allocation to pay for the cloud services vs the compensation for the team.

I have a surplus of server hardware racked up at two datacenters in the USA. It’s hard to estimate what their fair monthly lease value would be since I do not normally operate as a hosting company.

I tried my best to come in with the lowest rate that covers fixed costs, and I believe that $3,750/month for six dedicated servers managed by an experienced site reliability engineer is an incredible value for this community.

One way of viewing the budget allocation would be:

  • Free: 2 managed dedicated servers in USA, 1gbit unmetered
  • $1,875/month: 2 managed dedicated servers in LATAM, 1gbit unmetered
  • $1,875/month: 2 managed dedicated servers in Europe, 1gbit unmetered
  • Free: Open sourced Helm charts for anyone to deploy Lightwalletd on Kubernetes, with documentation informed by lessons learned during our first year of hosting this infrastructure. (educated hardware requirements, load balancing and anti-DDoS suggestions, etc)
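
As a quick sanity check on the figures above, here is a minimal Python sketch that totals the line items. The dictionary keys are labels chosen purely for illustration; only the dollar amounts and the server count come from the breakdown itself.

```python
# Monthly line items from the budget breakdown above, in USD.
# The keys are illustrative labels, not official names.
monthly_costs = {
    "usa_2_servers": 0,          # already-owned hardware, offered free
    "latam_2_servers": 1875,
    "europe_2_servers": 1875,
    "helm_charts_and_docs": 0,   # open sourced, offered free
}

servers = 6  # two managed dedicated servers in each of three regions

monthly_total = sum(monthly_costs.values())
annual_total = monthly_total * 12
per_server_monthly = monthly_total / servers

print(f"Monthly total: ${monthly_total:,}")
print(f"Annual total: ${annual_total:,}")
print(f"Per managed server per month: ${per_server_monthly:,.2f}")
```

Spreading the two paid line items across all six servers is just one way to read the numbers; the free USA servers could equally be left out of the per-server figure.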

The infrastructure in your plan, how many light clients can it service?

We propose six dedicated servers hosting Lightwalletd infrastructure, two in each region, each with at least 16 cores and 128GB of RAM running as Kubernetes nodes in regional clusters. Our existing hardware in Texas is 24 core, 128GB RAM, with 4TB of NVMe storage per physical server. We plan to adjust and further optimize the hardware as we learn more about the load profile of hosting the Lightwalletd application for the Zcash community’s use patterns.

I am uncomfortable estimating how many light clients the proposed infrastructure will be able to service. That number is highly variable, depending on what types of requests the clients make to the server infrastructure (their wallet age, how many transactions they have received, their connection speed, etc.).

If there are existing benchmark or load simulation tools for Lightwalletd I am happy to point them at our existing hardware to try to ballpark. Perhaps an existing Lightwalletd maintainer can chime in.

We will transparently share metrics and lessons learned while deploying and maintaining this infrastructure. Once it is all live and observed for a period of time we should be able to much better understand what hardware to recommend when others are launching their own Lightwalletd nodes.

Are there more costs incurred by your infrastructure based on the type of light client connecting to your service (i.e. iOS app, Android app, desktop app, cloud machine)?

The proposed capacity and costs are fixed ($3,750/month for six dedicated servers managed by myself and anyone I choose to recruit for help). There is a remote risk that our infrastructure becomes overloaded and needs to be scaled further, which is why I mentioned potential scaling costs in the risk category. I would prefer that many diverse teams launch Lightwalletd infrastructure using the standardized Kubernetes Helm charts we are developing, and that the number of available Lightwalletd nodes scales in a decentralized fashion rather than by scaling our infrastructure too far.

What type of activities from the connecting clients cost your infrastructure more money?

None. The costs are fixed and bandwidth is unmetered. In terms of which activities slow the infrastructure down: we will learn about this using observability tools and report our findings openly to the community as part of open sourcing our Kubernetes Helm charts and documenting the recommended server infrastructure for others who want to launch their own Lightwalletd nodes.

Observing the other two lightwalletd service providers in the ecosystem what methods will you deploy to make your solution more cost competitive?

I am not very familiar with the infrastructure approaches that existing providers are using or their costs and would love to engage in open dialogue with them.

I am a strong believer in self-hosting away from public clouds. This reduces costs by over 10x in many cases, especially for bandwidth, and ensures that a community is more resilient in the event of major hosting providers banning cryptocurrency-related network traffic. By using owned and/or leased dedicated servers, hardware can be custom-tailored to the needs of the applications being deployed (Lightwalletd + Zcash full nodes). A primary benefit of Kubernetes is that engineers can attain public-cloud-level reliability on commodity hardware when thoughtfully-architected.

I hope that many teams apply for this grant, that more than one team receives it, and that in a few months a large list of running Lightwalletd servers exists so users can diversify their sync options.

7 Likes

Yeah… Well, this doesn’t say which fork it is going to use…

Could you tell us the specs of these machines?

Thanks
–h

2 Likes

Yeah… Well, this doesn’t say which fork it is going to use…

I’m happy to host whichever fork the grantor prefers. The container image is configurable in the Helm values file. If someone prefers to host a different fork with the same chart they can edit the file. The default I am working with now is the “electriccoinco/lightwalletd:v0.4.7” image.

I will try to use zebrad wherever possible but am not sure how production-ready the lightwalletd+zebrad combination is right now. If it’s stable and supported by the light wallets that this grant aims to provide infrastructure for, then we will focus on zebrad from the start. If not I’ll make sure that by the end of the year we have transitioned over to the latest recommended lightwalletd hosting approach, to the extent possible at the time.

Could you tell us the specs of these machines?

Our existing hardware is predominantly using AMD EPYC 7443P CPUs. I have not finalized a vendor for the non-USA datacenters yet but aim to spec each cluster to start with the same amount of aggregate compute power across its nodes. If CPU options are lacking in a region we will use more nodes there to target the same capacity.

1 Like

I am sorry but I still can’t get a clear understanding of the hardware specs that you are going to use.
Could you tell us in simple terms what the config is and how much the vendor is going to charge you?

For example:
image

1 Like

Hi Hanh, below is a summary of what I have outlined above:

USA: Two physical servers: AMD EPYC 7443P CPU, 128GB RAM, 4TB NVMe
LATAM: TBD of equivalent compute to the USA cluster, at least two physical servers
Europe: TBD of equivalent compute to the USA cluster, at least two physical servers

The USA servers are already owned and are currently racked up in a great datacenter. I am the vendor. LATAM and Europe datacenters will be evaluated, picked, and deployed as part of the grant. I will size those clusters appropriately to, at minimum, match the compute power of the USA cluster.

If the grant requestor prefers that I research and select the exact global datacenters ahead of time I can, but I’d prefer to focus on developing repeatable Helm charts that are confirmed to be production-ready in the USA before rolling the charts out globally (see timeline proposed above: 3 weeks of dev time before USA deploys, 6 weeks for global).

2 Likes

That’s way overpowered. By 4x at least. It’d be more economical to use cheaper hardware and have more locations.

5 Likes

Hanh, please re-read my proposal. It’s hardware that I already own. I’ll scale each region appropriately to the load I observe. In my experience with provisioning self-hosted Kubernetes it’s better to over-provision than to be sorry, because public cloud conveniences such as autoscaler “infinite scale in seconds” are not a thing.

3 Likes

I thought you said that you would have each region be at least equal to what you have in the USA.

I noticed that it’s hardware you already own. I pointed out the capacity needed today (from what I observed) because the proposal costs are based on what you have.
Anyway, I understand the value proposition now.
Thanks for the clarification and good luck.

4 Likes

Hi all, I am curious what the process and decision timeline is for proposals here?

I went ahead and wrote basic Helm charts and deployed them to our owned hardware in the USA, and to Vultr Kubernetes for now, for the global regions. I am in the process of testing the global load balancer architecture proposed above. For now the infrastructure is running zcashd. I will upgrade it to support zebra before releasing the chart.

I am going to submit a revised proposal that would use a mixture of owned servers and cloud hardware across more regions as suggested. It would be helpful to know the process by which proposals are evaluated so that I can prioritize my time.

Testing and feedback is welcomed on these endpoints running on Kubernetes:

https://zec.rocks:443 - Global endpoint (closest region automatically selected)
https://na.zec.rocks:443
https://sa.zec.rocks:443
https://eu.zec.rocks:443
https://ap.zec.rocks:443

All endpoints support IPv6 which I believe is very important for users in the developing world.
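
For anyone who wants to confirm the IPv4/IPv6 claim from their own network, a minimal Python sketch using only the standard library follows. The function names are made up for this example. Note that it only reports which address families DNS returns for a host; it does not prove that the service is actually reachable over each family.

```python
import socket


def summarize_families(addrinfos):
    """Report which IP families appear in a list of getaddrinfo() results."""
    families = {info[0] for info in addrinfos}
    return {
        "ipv4": socket.AF_INET in families,
        "ipv6": socket.AF_INET6 in families,
    }


def address_families(host, port=443):
    """Resolve host and report whether it has IPv4 and/or IPv6 addresses."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name did not resolve at all from this network.
        return {"ipv4": False, "ipv6": False}
    return summarize_families(infos)
```

Calling `address_families("zec.rocks")`, or the same function on any of the regional hostnames listed above, returns a small dictionary such as `{"ipv4": ..., "ipv6": ...}` reflecting what the local resolver sees.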

2 Likes

Hi @emersonian. We don’t have a set timeline for making a decision, but we will be discussing your proposal in a meeting tonight. If we’re unable to reach a decision, we’ll provide you with an update on the status. Thanks.

2 Likes

Note that there are privacy weaknesses in Zcash’s current light wallet protocol that essentially let the servers figure out “who is paying who” among the users of the server. The wallet also reveals its transparent address to the server.

The servers are high-value targets for anyone wishing to de-anonymize shielded Zcash transactions or transparent addresses, so they should not be considered low risk. Even though all of the data stored on them is public information, the access patterns to that data are sensitive and need to be protected.

An ideal architecture for security should:

  1. Make it an easy and regular process to re-build the server instances from known-good state. This way if the instance is compromised, the compromise is short-lived because the instance will be rebuilt from a secure state after not too long. (e.g. use Docker)
  2. Prevent even the system administrators from accessing the production instances, e.g. it should not be possible for anyone to SSH into or run commands on the production instances once they have been started, or to obtain the TLS private key. If that’s necessary to do for debugging in emergencies, the SSH private key should be kept offline (in an air gap) until it’s needed and then rotated after use.
  3. Run on hardware that’s not shared with other tenants in a cloud environment, as a defence against cache side-channel attacks.

Security will become even more important if support for detection keys is added.

7 Likes

I think a core question is whether we want to be a community of many nodes or of a few relatively centralized nodes.

There are many opportunities to improve user education around privacy and expectations when connecting to a light wallet server.

My end goal here is to write and release production-proven Helm charts which make it straightforward for hundreds of nodes to be launched by hundreds of individuals. A humbling reference is monero.fail - I don’t think we have more than 20 lightwalletd servers currently running globally for our entire Layer 1.

This enables a future where it’s easier for users to run their own nodes for the best privacy. Everyone should run their own node ultimately.

This is understood in the Bitcoin community regarding electrs/Electrum, and I believe it will be better understood here as we simplify the lightwalletd self-hosting process and educate users.

Sure, server access controls are important. But going down a path of Zcash leadership micromanaging node operator security practices, auditing their security claims, and possibly KYC’ing server providers is a centralized path that still has many flaws. In my opinion, unauthorized server access is not the most likely way that user privacy could be compromised.

I am for the decentralized path of hundreds of lightwalletd nodes running, with straightforward charts and documentation on how to run your own.

Let’s launch and prove the reliability of open source Kubernetes infrastructure-as-code, a standard many other projects are using that will be familiar to server-savvy newcomers to our community in the years ahead.

6 Likes

I support this proposal

I think you have made strong observations about the limitations of the current Zcash lightwalletd infrastructure, you have proposed reasonable solutions, and you have a strong contact in @zancas (a wallet developer using lightwalletd for Zingo).

It would be great to produce a short video of your lightwalletd solution for future users, to be distributed on zechub.xyz? @dismad

It would be great to conduct an interview on free2z live to explain the importance and limitations of lightwalletd infrastructure. @skyl

I hope you write a strong introduction and basic demo document to be added to the Zcash Read the Docs site?

I would expect you to continue to contribute to the zcash community by posting regular updates to the community.

Good Luck @emersonian

3 Likes

@emersonian at the most recent meeting, the @ZcashGrants Committee voted to approve your proposal, granting one year of support at $45,000. The committee has requested that you provide monthly updates via the forum and officially post your proposal to Submittable.

6 Likes

Thank you for the support! Very excited to ramp up my contributions after many years of ZEC use. I will post updates here and post on Submittable as requested.

@kworks Great suggestions. I’m open to contributing in those ways, let’s chat about it in DM/Signal.

4 Likes

Gentle reminder regarding your proposal submission on Submittable. If you have any issues or need assistance with the submission process, please don’t hesitate to reach out. We’re here to help!

1 Like

Hi all - here is the Kubernetes helm chart that we are using to deploy and maintain our lightwalletd infrastructure.

I am working to improve the documentation and write a few guides on how to deploy lightwalletd on popular clouds, as well as on self-hosted hardware. Look for a forum post soon encouraging people to help beta test these charts to run their own nodes.

Feedback welcomed!

@ZcashFoundation our Submittable application is here, thank you: Gallery View: Zcash Community Grants Program

3 Likes

Hi all, apologies for the delayed update here. I am recovering from a rough sickness.

Here’s an update on how the Zec.rocks project is progressing.

Completed tasks

  • Global and regional endpoints are live and available over IPv4, IPv6, and experimentally on Tor.
    • All endpoints are running Zebra. One standby zcashd node is running in case of chaos, but is not currently serving any user requests.
    • They are currently load balanced using HAProxy hosted at Fly.io, reverse proxied to Kubernetes nodes running on Vultr Kubernetes Engine.
  • A status page is available at https://status.zec.rocks
    • Caveat: it currently only checks for successful TLS connections. It does not yet make status check requests using gRPC. We are working towards that goal to get a more robust view of lightwalletd’s availability.
  • Our infrastructure is deployed using Kubernetes Helm charts which are open sourced at: GitHub - emersonian/zcash-stack
  • A lightwalletd gRPC ping utility was built and open sourced to assist with troubleshooting connectivity issues, and for validating the experimental functionality of lightwalletd Tor hidden services: GitHub - emersonian/zecping
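
To make the status page caveat above concrete, here is a minimal Python sketch of what a TLS-only check amounts to. It is not the code behind status.zec.rocks or zecping, and the function name is made up for this example. A check like this confirms that something is terminating TLS on the port, but says nothing about whether lightwalletd behind it is answering gRPC requests, which is why a gRPC-level check gives a more robust view.

```python
import socket
import ssl


def tls_handshake_ok(host, port=443, timeout=5.0):
    """Return True if a TCP connection and a verified TLS handshake succeed."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw_sock:
            with context.wrap_socket(raw_sock, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # ssl.SSLError (which is a subclass of OSError).
        return False
```

For example, `tls_handshake_ok("zec.rocks")` would report whether the global endpoint completes a handshake from the caller's network.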

Endpoints:

  • zec.rocks (global anycast, proxies to the nearest cluster to the user)
  • na.zec.rocks (EWR, LAX)
  • sa.zec.rocks (GRU, SCL)
  • eu.zec.rocks (AMS, CDG)
  • ap.zec.rocks (SIN, BOM)
  • testnet.zec.rocks (EWR)

Works in progress

  • We are slightly behind schedule on moving our primary clusters to dedicated hardware. Next week we are finalizing work on launching Kubernetes clusters on dedicated server hardware provided by Hivelocity, a hosting provider which accepts Bitcoin. We will then point the Fly load balancers to the dedicated hardware. Vultr will remain as backup infrastructure since it can scale-out quickly.
  • The Fly.io edge piece of our infrastructure which uses HAProxy to proxy requests to a user’s nearest Kubernetes cluster is not yet open sourced. This should be released before Z|ECC.
  • We currently do not log any requests. It would be nice to know how much this infrastructure is used to help with capacity planning, and to provide supporting data for metrics-driven decisions made by our community.
    • We intend to solicit feedback on whether minimal logging of basic metrics per region (request counts per gRPC method) is something that people are comfortable with. (of course with no logging of method arguments/parameters or user identifiers such as IP addresses)
  • Third-party vendors are currently paid using Bitcoin to support a circular crypto-native economy, except for Fly.io, which does not yet have a cryptocurrency payment option. Once a DEX is reliably available, we intend to use it to swap ZEC to BTC when necessary.
    • Hivelocity is working towards accepting Zcash and communicated that generally they are open to accepting any cryptocurrency that Coinbase supports.
    • Vultr did not respond to our request to accept Zcash, they use Bitpay for their payments and are likely limited by Bitpay’s options.
  • We are working on making the infrastructure available to Nym and Tor users as custom services and hidden services respectively.
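
As a talking point for the logging question raised above, here is a minimal Python sketch of what "request counts per gRPC method, per region" could look like. The class name is hypothetical, and the full method path used in the usage note is an assumed example of how a lightwalletd gRPC method is addressed. The recording function accepts only a region label and a method name, so request arguments and client IP addresses never enter the data structure.

```python
from collections import Counter


class RegionMethodCounter:
    """Counts requests by (region, gRPC method) and nothing else."""

    def __init__(self):
        self._counts = Counter()

    def record(self, region, full_method):
        # Keep only the trailing method name, e.g. "GetLatestBlock",
        # dropping the service prefix of a full gRPC method path.
        method = full_method.rsplit("/", 1)[-1]
        self._counts[(region, method)] += 1

    def snapshot(self):
        """Return a plain dict copy of the current counts."""
        return dict(self._counts)
```

For instance, calling `record("eu", "/cash.z.wallet.sdk.rpc.CompactTxStreamer/GetLatestBlock")` twice yields a snapshot of `{("eu", "GetLatestBlock"): 2}`.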

Challenges

  • To our knowledge only Zashi and Zingo have added zec.rocks endpoints to their applications. It is not clear if other applications are working towards adding zec.rocks to their server lists.
  • There is a lack of tooling to troubleshoot lightwalletd connectivity issues.
    • As an example, a reliability issue experienced by zec.rocks users in the Czech Republic was not repeatable by our team.
    • We are working on a tool which will make a gRPC request from all of Fly’s regions as a form of “global ping” to help with troubleshooting these types of regional user reports of adverse performance.
  • It is not clear if a standardized behavior exists across wallet applications when connections fail to a lightwalletd server. Do connections retry, how many times, and are additional server connections attempted if a primary server is offline? Other light wallets in the industry
  • It would be nice if more lightwalletd infrastructure providers existed, and if they used our Kubernetes Helm charts to help us further battle-test them in production.
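
To make the retry question above concrete, here is a minimal Python sketch of one possible client policy: retry each server a few times with exponential backoff, then fail over to the next server in the list. It does not describe how any existing wallet behaves. The function name, its parameters, and the `connect` callable are all hypothetical.

```python
import time


def connect_with_failover(servers, connect, retries_per_server=2,
                          base_delay=0.5, sleep=time.sleep):
    """Try each server in order; return (server, connection) on first success.

    `connect` is any callable that takes a server string and either returns
    a connection object or raises OSError on failure.
    """
    attempts = 0
    for server in servers:
        for attempt in range(retries_per_server + 1):
            attempts += 1
            try:
                return server, connect(server)
            except OSError:
                # Back off before retrying the same server; move on to the
                # next server immediately once its retries are used up.
                if attempt < retries_per_server:
                    sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"all servers failed after {attempts} attempts")
```

With the defaults, a wallet configured with a primary and a backup server would make three attempts against the primary (waiting 0.5 s and then 1 s between them) before trying the backup.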

8 Likes

If anyone is interested in contributing to lightwalletd infrastructure, the Zcash Community Grants (ZCG) program is still accepting proposals. You can reach out to ZCG directly or submit a grant proposal to get involved. This is a great opportunity to help enhance the network while utilizing Kubernetes Helm charts in a production environment.

2 Likes