Establish a formal release process

Many entities run software developed by the Zcash community.

Currently it is hard to know when the latest software is released, and what is considered “stable”.

New releases of Zebra, Zcashd, and Lightwalletd may be announced in any number of places: on the forum, on GitHub, on Docker Hub, on Discord, or on scheduled video calls.

Announcements need to be “push” rather than today’s “pull” model, which requires sysadmins to check many different places.

As an example of how chaotic this can be for server operators, the most recent Zebra and Zcashd releases were not announced on the forum. Let’s collaborate to clean this up.

I propose that we standardize the release process by using an email mailing list, a standard many other open source projects rely on. Urgent security updates should also be announced via email in my opinion.

We also need to define what is “stable”. When upgrading zec.rocks’ server infrastructure, it’s never been clear to me if a new release should be gradually rolled out, or should be considered stable and production-ready.

Let’s formalize our community’s software release process and stick to it.

Here’s a brain dump of recent issues I’ve run into regarding unclear releases:

  1. Zebra and Zcashd were released on GitHub and Docker Hub, but I did not see the new versions mentioned in a typical forum thread, so I did not know about the releases. I suspect that sysadmins at large companies running Zcash (exchanges) would love an email list. Maintaining a list would also help us know who to contact for urgent issues, and let us gather user feedback about which features would help the people running large-scale deployments (for example, a standardized log format, Prometheus metrics, readiness states, etc.).
  2. Lightwalletd has not been released in a very long time. It’s not clear if it has a maintainer. Recent commits are essential for serious node operators, greatly reducing startup time. We have to run our own fork and images.
  3. Lightwalletd has Docker Hub releases that are not listed in its GitHub Releases.
  4. The wrong Lightwalletd build was published to a Docker Hub release a few months ago, then force-updated. Kubernetes clusters that had already pulled the incorrect tag kept it, so we had to use image hashes in our configurations to deploy the corrected image. CI needs to be the only way that things are released.
  5. It’s not clear who to contact or who the maintainer of each project in the Zcash ecosystem is. The person, not the entity.
  6. Zebra v2.0.1 introduced a breaking change for node operators, cookie authentication enabled by default, that broke the zcash-stack Helm charts. I believe that teams building Zcash software should all be running and familiar with our production environment, Kubernetes, so that updating the Helm charts used to run Zcash in production environments will be understood as clearly in-scope for any release.
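
To illustrate the image-hash workaround from item 4, here is a minimal sketch of a check that flags image references relying on a mutable tag. It assumes the usual `name@sha256:<64 hex characters>` digest form; the function names and the image names in the usage note are hypothetical and not taken from any Zcash repository.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned to an immutable digest."""
    return DIGEST_RE.search(image_ref) is not None


def unpinned_images(image_refs):
    """Return the references that still depend on a mutable tag."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]
```

For example, `unpinned_images(["example/lightwalletd:v1.2.3"])` returns that reference unchanged, because a tag such as `v1.2.3` can be force-updated while a digest cannot.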

I propose that:

  1. We culturally shift into “push” notifications of releases and security bulletins, rather than expecting busy operators to “pull” announcements by regularly checking the forum, GitHub, and Docker Hub and attending video calls, as is currently necessary.
  2. Zcashd, Zebra, and Lightwalletd releases are always announced to an email mailing list, cross-posted to the forum, posted to GitHub Releases, and released to Docker Hub using CI instead of any manual pushes.
  3. Releases across all projects must happen regularly. Commits should not float around unreleased. If a maintainer is unsure whether the current state of a repository is “release-ready,” there should be a bias towards releasing anyway, with a release candidate indicator appended to the tag (“-rc1”, etc.) to signal that stability is uncertain.
  4. Updating the zcash-stack Helm charts must be part of the release process.
  5. The zcash-stack Helm charts should run pre-release software in a staging environment to catch any integration issues before a wide “stable” release, both on Testnet and Mainnet (and any other networks that are expected to be maintained).
  6. Every release should indicate whether the build is expected to be stable in production and is safe to roll out widely.
  7. Every project needs a single source of truth. We should move all Zcash source code into one GitHub organization managed by ZCG, which decides maintainers. This simplifies leadership transitions when grants are won and lost. I firmly believe that if a project is grant-funded, the granting body should control all repositories, artifact registries (Docker Hub, etc.), and properties related to the project (domain names, Google/Apple deployment accounts), to prevent old software from circulating.
  8. An appeal process should exist for pull requests that are rejected or sit idly for long periods of time. If a maintainer does not want to merge a pull request, there should be a clear chain of command that a contributor can make their case to.
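
As a rough sketch of proposals 2, 3, and 6 taken together, a release could be modelled as a small record that carries its stability flag and the channels it has been announced on. Everything here is an assumption made for illustration: the `Release` class, the channel names, and the `v<version>-rc<n>` tag format are hypothetical, not an existing tool or an agreed convention.

```python
from dataclasses import dataclass, field

# Channels named in proposal 2; the identifiers themselves are made up.
REQUIRED_CHANNELS = frozenset(
    {"mailing_list", "forum", "github_releases", "docker_hub"}
)


@dataclass
class Release:
    project: str
    version: str
    stable: bool          # proposal 6: every release states its stability
    rc_number: int = 1    # only used when the release is not stable
    announced_on: set = field(default_factory=set)

    @property
    def tag(self) -> str:
        """Stable releases get a plain tag; uncertain ones get an -rcN suffix."""
        if self.stable:
            return f"v{self.version}"
        return f"v{self.version}-rc{self.rc_number}"

    def missing_channels(self):
        """Return the required channels that have not been announced on yet."""
        return sorted(REQUIRED_CHANNELS - self.announced_on)
```

A CI job could refuse to mark a release as complete while `missing_channels()` is non-empty, which keeps manual, partial announcements from slipping through.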
7 Likes

I like all these ideas

I don’t think this is necessary or ideal. We just need better communication.

I’ve been trying hard to communicate in the most active channels, but I can certainly do a better job in the forums. Great callout!

4 Likes

This is very useful feedback, thank you.

That’s a great idea, we are looking into it.

For Zebra, anything that’s not a release candidate is considered stable. That being said, if you run multiple nodes it seems wise not to update everything at the same time, regardless of whether the release is stable.
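
A minimal sketch of that rule and of a staggered update, assuming release candidates carry a suffix such as `-rc1` or `-rc.0` at the end of the tag. The suffix pattern, the function names, and the node names are assumptions for illustration, not Zebra’s actual tooling.

```python
import re

# Assumed release-candidate suffix: "-rc" plus a number, optionally dot-separated.
RC_SUFFIX = re.compile(r"-rc\.?\d+$")


def is_stable(tag: str) -> bool:
    """Treat any tag without a release-candidate suffix as stable."""
    return RC_SUFFIX.search(tag) is None


def rollout_batches(nodes, canary_count=1):
    """Split nodes into a small first batch and the remainder.

    Update the first batch, watch it for a while, then update the rest.
    """
    nodes = list(nodes)
    return nodes[:canary_count], nodes[canary_count:]
```

Even when `is_stable(tag)` is true, updating the first batch from `rollout_batches(...)` ahead of the rest limits the impact of an unexpected breaking change.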

This release had a bunch of issues and we’re looking into ways to ensure that they don’t happen in the future.

  • Cookie authentication enabled by default was introduced in 2.0.0, which should be OK since 2.0.0 is expected to be a breaking release. But it should have been in the release candidate first, or failing that, it should have been introduced disabled by default.
  • We had to delete the 2.0.0 release due to a bug, so the 2.0.0 changelog was copied into 2.0.1, but it was not clear that that happened (we’ll edit it shortly). This gave the impression that the cookie auth was introduced in 2.0.1 when it wasn’t.
  • The release notes could give more details on how the cookie auth works and how it could be disabled if required (we’ll edit those in).

I don’t think that updating your Helm charts could be in scope; we can’t be expected to be aware of how everyone in the ecosystem deploys their nodes. But I agree the whole thing could be handled better regardless.

4 Likes