Zcash Ecosystem Monorepo

skyl · May 11, 2022, 6:12am

I’ve been thinking a lot about Zcash governance and would like to share my big idea.

Let’s put all code related to Zcash into a polyglot monorepo with a single trunk

I’m a monorepo enthusiast. I love monorepos. I can’t stand switching context between repos and I hate wasting time managing dependencies and releases across multiple repos.
For a basic primer about why a project would choose to keep all of their code in a single repository, I invite you to read Google’s wonderful article in the ACM from 2016 (my happy place):

There is also a nice talk that is worth watching: Why Google Stores Billions of Lines of Code in a Single Repository - YouTube

Some advantages in short:

Easy to discover code, find canonical versions
Simplified dependency management!!
Atomic changes, large-scale refactoring
Easy to share code and collaborate
Flexible ownership defined in the repo
Developer ergonomics, standardized workflows and dev envs
More eyes, more reviewers, more knowledge shared
Value tangibly accumulated and distilled
Code can be more modular because the overhead of repo boundaries is removed (just create a new directory/file!)
Possibility to do integration tests across multiple projects - changes to dependencies can trigger integration tests in their dependents

How

We focus initially on Rust, TypeScript, Python and also documentation and official websites. We bring repos together into one repo (zecosystem - zecOS - Zecosys); packages are separated by language/build system and/or by project/team/ownership. Merge rules are set up so PRs can only be merged with adequate acceptance by the owning team - e.g., zebra requires approval from a ZF member and zcash requires approval from ECC. Whatever rules there currently are can be replicated by using directories instead of repos. Using CODEOWNERS and teams could be more transparent than the current fairly opaque ownership of the ECC and ZF repos (1). Rules can also be added so that downstream integration tests must pass before a merge is allowed.
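As a rough illustration of the directory-based ownership idea, here is a minimal Python sketch. It is a simplified model, not GitHub's actual CODEOWNERS matching, and the directory names and team names (`zf-core`, `ecc-core`, `community-web`) are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set

# Hypothetical ownership table: directory prefix -> owning team.
# The paths and team names are illustrative, not a real repo layout.
OWNERS = {
    "zebra": "zf-core",
    "zcash": "ecc-core",
    "ts/react": "community-web",
}


def owner_for(path: str) -> Optional[str]:
    """Return the team owning `path`, using the longest matching directory prefix."""
    parts = PurePosixPath(path).parts
    # Try the longest prefix first so nested directories can override parents.
    for end in range(len(parts), 0, -1):
        prefix = "/".join(parts[:end])
        if prefix in OWNERS:
            return OWNERS[prefix]
    return None


def missing_approvals(changed_files: Iterable[str], approving_teams: Set[str]) -> Set[str]:
    """Return the owning teams that have not yet approved a change to their files."""
    required = {owner_for(f) for f in changed_files}
    required.discard(None)  # paths without an owner need no specific team
    return required - approving_teams
```

With this table, a PR touching both `zebra/` and `zcash/` that only has approval from `zf-core` would still be waiting on `ecc-core`, which is the kind of rule that would be publicly readable in the repo itself.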

Individual packages can be published out of the monorepo to the various package managers for people not fortunate to be working in the Zcash monorepo.

Reusable react components for zecpages, free2z, etc could be at ts/react, python libraries could be published out of a top-level py/ directory. All of the zips and books could be in the same repo and the boilerplate and build systems for docs wouldn’t have to be pasted from repo to repo.

We could utilize a build system such as bazel to maintain explicit dependency graphs.

The repos

ZF has 47 repositories on GitHub:

zcash (ECC?) has 36:

ECC and ZF produce a lot of code in a lot of repos. There is even an additional repo that appears to be used to figure out the cross-repo dependencies.

Is the canonical version of this repo the one under the zcash org, zcash/developers or the one under ZcashFoundation org, ZcashFoundation/developers? Many if not most of the repos in ZcashFoundation and zcash are forked into the other github org. It’s sometimes hard to figure out which version is source of truth. Keeping track of all of the dependencies between different repos seems to be a chore in itself. Merging between individual forks and juggling all the versions is work that can be essentially removed in a single trunk.

The two elephants are zebra and zcash. Zebra is already set up in a monorepo format and it looks great to my eye.

I’m sure someone out there feels strongly that combining these repos is a bad idea or impossible. I’d like to hear those arguments. I can tell you that it is not impossible technically. Some (or even most) developers at ECC and ZF will have plenty of reasons why they like many repos and many folks will probably have a knee-jerk reaction that it’s impossible because of reasons like “ECC controls zcashd and ZF controls zebrad”. But, I ask if this is a real argument against a monorepo and if things need to be the way they are. Why is the separation considered necessary or desirable? Use your imagination and consider what could be possible.

We can talk about decentralization all we want. But, the truth is that the permissions to the canonical repos, the dependency graph and the trust placed in those who are allowed to merge and release are a huge (and probably underrated) part of what governance really is in practice. In a monorepo, these arrangements could be more transparent - encoded in the IAM and merge rules on the single main branch. I feel like right now we have a lot of the downside of centralization but are missing out on the possible benefits of centralization, while some of the efforts at decentralization might be uncharitably characterized as “decentralization theatre”.

There are probably important people who will start with a flat HELL NO and stick to it. But, think about it anyways. I sincerely believe that utilizing a monorepo could boost productivity by an order of magnitude by dispensing with unnecessary coordination between repos and allowing that effort to go into fruitful integration and knowledge sharing across the entire ecosystem.

Maybe this can just be food for thought and stimulate ideas about how we could radically change the Zcash governance and software ecosystem for the better. But, I could help work on this idea for real, if people are interested. It could possibly start out as two-way subtrees between the existing repos and the prospective monorepo. But, the real advantages would only start to accrue if significant contributors really wanted to commit to it.

GGuy · May 11, 2022, 6:27am

I do like the idea of more open discussions being possible on GitHub between parties…

conradoplg · May 11, 2022, 2:27pm

In my experience working with Zebra I don’t see any benefits of using a monorepo. Rust crates provide an easy way to manage dependencies regardless of which repository they reside in.

If we had a lot of C++ code that would make more sense, since C++ dependencies are a pain to manage.

Regarding the other issues of discoverability, I think we could improve that by simply having an index page of sorts mapping the different components of the ecosystem. IMO that would be even better, because opening a gigantic repository with a bunch of folders would not bring clarity to most of the Zcash users.

The DAGs do not keep track of dependencies between software components; they keep track of dependencies between tasks (ticket/issues) that have been planned.

pitmutt · May 11, 2022, 2:55pm

I have not heard of something like this. I am not sure I see the benefit; I can’t imagine cloning the repo and downloading everything just to make a contribution to a small part of one particular project.

However, some sort of “lay of the land” page showing the projects in the ecosystem and their repos, tagging them as active/inactive and whatnot, could be quite useful.

nuttycom · May 11, 2022, 3:26pm

I strongly dislike monorepos. They inevitably result in horrifically coupled code, they’re a pain to merge stuff to when a change in one part of the codebase has to be immediately reflected everywhere else, and they overall inhibit reasoning about parts of a system in isolation and discourage encapsulation, which are critical to security.

Moreover, I think that a monorepo is antithetical to the idea of decentralization. It is vitally important that components be independent and only interoperate in terms of high-level APIs; we’ve got too much coupling in the zcashd codebase as it is, simply because the zcashd wallet and full node share code. It’s a maintenance burden.

It’s hard for me to overstate the degree to which I detest monorepos. Basically, I won’t work that way.

str4d · May 11, 2022, 4:17pm

The article you linked to also described various disadvantages:

These costs and trade-offs fall into three categories:

Tooling investments for both development and execution;

Codebase complexity, including unnecessary dependencies and difficulties with code discovery; and

Effort invested in code health.

And this is why I am generally against full monorepoization. It works at Google, because Google is Google, and has the size and resources to dedicate multiple full-time teams solely to developing the tooling and providing the support necessary to maintain the monorepo.

At the kind of scale we are operating at, I prefer (what I will refer to as) “localised monorepoization”, where we take advantage of the benefits of cross-area workspaces where it makes sense, while minimising the interactions between more distant parts through repo separation. As you’ll see below, we are already doing this.

Let’s break these down:

Active repositories

GitHub - zcash/zcash: Zcash - Internet Money
- Structure inherited from Bitcoin Core; I’m not particularly keen to change that while we retain an underlying dependency on that upstream.
- Already a localised monorepo: certain dependencies (the ones that upstream would modify regularly) are vendored internally. We additionally pin all our dependencies (unlike upstream) but do not vendor sources of those dependencies (because that makes the maintenance burden significantly higher).
GitHub - zcash/librustzcash: Rust-language assets for Zcash
- The main collection of Zcash-specific Rust crates, focused on pure-Rust APIs.
- Already a localised monorepo: it has 11 Rust crates within a single Cargo workspace.
GitHub - zcash/orchard: Implementation of the Zcash Orchard Protocol
- Standalone repo from the main Rust crate repo, I suspect (not speaking for ECC) because it made the licensing situation easier (ECC holds copyright on the repo, and requires CLAs to contribute).
GitHub - zcash/halo2: The Halo2 zero-knowledge proving system
- Originally a standalone repo like orchard, now already a localised monorepo: it has 3 separate crates for proofs, gadgets and (eventually) recursion.
Standalone Rust crates that, while maintained by ECC, are not Zcash-specific:
- GitHub - zcash/pasta_curves: Rust implementation for zcash/pasta
- GitHub - zcash/incrementalmerkletree: An append-only merkle tree which is always pruned, along with incremental, fast-forwarding witnesses
GitHub - zcash/zips: Zcash Improvement Proposals
- Already a localised monorepo: it manages both the protocol spec (in all its various NU versions), and the ZIPs.
Other assorted repos:
- GitHub - zcash/zcash-seeder: Network bootstrapping for the Zcash cryptocurrency via DNS
  - Standalone tool, fork of upstream bitcoin-seeder.
- GitHub - zcash/protocol.z.cash: Source for the Zcash Protocol website
- GitHub - zcash/developers
- GitHub - zcash/zcash-gitian: Deterministic build environment for Zcash
- GitHub - zcash/zcash.github.io
  - This one definitely can’t be part of a monorepo, as it’s for GitHub Pages.

Mobile wallet / SDK repositories

I can’t speak much to the structure of these, other than to say that managing iOS and Android toolchains is a lot of work, and in the past we have generally only had a single developer for each of iOS and Android.

Repos for generated content

GitHub - zcash/gitian.sigs: Gitian signatures for Zcash
- The structure of this repo, and its separate existence, is inherited from upstream Bitcoin Core.
GitHub - zcash/rpc

Historic repositories

Of these, the repos that I have interacted with on a daily basis within the past few months are:

zcash/zcash
zcash/librustzcash
zcash/orchard
zcash/halo2
zcash/zips (not making changes, but referring to its contents)

Now that zcashd 5.0.0 is out, I expect changes to zcash/orchard to become much less frequent. The others are all localised monorepos.

I agree with the general idea of having the full context for whatever it is that I need to work on, locally available. With the current repository structures, I have that for the large majority of the time, and in the cases I don’t (integrating an API change from a Rust crate into zcashd for example), there is a simple pathway to connecting them (cargo patches).

I don’t know for certain why ZF forked zcash/developers, but I suspect it’s because a) they saw the benefit of the DAG and wanted to tune their local view, and b) I was travelling for conferences and while away accidentally broke the GitHub Pages renderer (because ZenHub does not allow more than one API key per user sigh), and they weren’t using locally-rendered versions for agility like we’ve been doing at ECC. I’d like to resolve this fork at some point.

As @conradoplg pointed out above, this is a misunderstanding of the DAG. Even if we used a monorepo, we would still use the DAG. Because the DAG is not about coordinating inter-repo dependencies, but inter-issue dependencies; all a monorepo would do here is change the URLs to the issues.
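To make the distinction concrete, here is a minimal sketch of an inter-issue DAG using Python's standard `graphlib` module (Python 3.9+). The issue identifiers are made up for illustration; this is not how the zcash/developers tool is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical issues: each key is blocked by the issues in its set.
# The identifiers are invented; a real DAG would use tracker URLs or IDs.
BLOCKED_BY = {
    "zcash#3": {"librustzcash#2", "orchard#1"},
    "librustzcash#2": {"orchard#1"},
    "orchard#1": set(),
}


def work_order(blocked_by):
    """Return issues in an order where every blocker comes before what it blocks."""
    return list(TopologicalSorter(blocked_by).static_order())
```

Note that the nodes are issues, not repositories or crates. Moving every project into one repository would only change the labels on the nodes, while the edges between planned tasks would stay the same.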

The DAG is great. All other planning tools are projections of the DAG, mere shadows of its perfection. The DAG is what helped us to finish zcashd 5.0.0. Al̴l̶ ̷h̶à̴̡͍͔i̵̼͒͂l̸̠̱̄ ̵̺̝͆͐ͅẗ̴͔̤̟́͑ȟ̸̺̩͎̪͈͂̑ė̵̖̯̹͙̋̆ ̶̖̙̅̔͑̿D̴̠͇͚̱̫̿̉̒A̵̟̬͂̇̔̍̈G̶̨̹̮͍͓̽̋.̶̧̈́͒̕͠

skyl · May 11, 2022, 11:54pm

Hey, thanks a lot for the explanation @str4d! Perfect level of detail to help me understand what’s going on. Thanks for correcting my misunderstanding of the DAG project. It does look really awesome. Al̴l̶ ̷h̶à̴̡͍͔i̵̼͒͂l̸̠̱̄ ̵̺̝͆͐ͅẗ̴͔̤̟́͑ȟ̸̺̩͎̪͈͂̑ė̵̖̯̹͙̋̆ ̶̖̙̅̔͑̿D̴̠͇͚̱̫̿̉̒A̵̟̬͂̇̔̍̈G̶̨̹̮͍͓̽̋.̶̧̈́͒̕͠!!!

I appreciate your time here and your acknowledgment of the benefits and trade-offs and not just saying “I hate monorepos”. I like your idea of “localized monorepos”. Fewer is better IMHO. If the same small group of people has to bump tags in repo A to bump the tag and rebuild repo B so that repo C can bump the tags/revision hashes (all manually checked and reviewed by the same people in all the repos, commits just to move version pins …), there might be a case to bring A and B into C - while still keeping A and B as separate as desired and still granularly publishing as many small composable packages as makes sense. Paradoxically, it’s easier to publish more, smaller packages from a single repo because the overhead of the extra repo is replaced by an extra directory.

I don’t want to belabor this thread or start a holy war. But, I would like to respond to a few things as food for thought.

This is a confusing statement for me @conradoplg. You made this single PR that bumped the external versions for ~12 packages with a single unified changelog. Without a monorepo, I think this task would have been about 12X more work? In a larger monorepo you might not have had to bump these versions at all - 1.0.0-beta.9? Since everything would be integrated in trunk, you might be able to hold off on formally publishing intermediate release candidates to yourself.

github.com/ZcashFoundation/zebra

v1.0.0-beta.9 release

ZcashFoundation:main ← ZcashFoundation:v1.0.0-beta.9-release

opened 06:08PM - 06 May 22 UTC

conradoplg

+106 -27

---
name: Release Checklist Template
about: Checklist of versioning to create a taggable commit for Zebra
title: ''
labels:
assignees: ''
---

## Versioning

### Which Crates to Increment

To check if any of the top-level crates need version increments, go to the zebra GitHub code page: https://github.com/ZcashFoundation/zebra and use the last modified dates of each crate. Alternatively you can use the github compare tool and check the `main` branch against the last tag ([Example](https://github.com/ZcashFoundation/zebra/compare/v1.0.0-alpha.15...main)). `git diff --stat <previous-release-tag> origin/main` is also useful to see what's changed.

- [x] Increment the crates that have new commits since the last version update
- [x] Increment any crates that depend on crates that have changed
- [x] Keep a list of the crates that haven't been incremented, to include in the PR

### How to Increment Versions

Zebra follows [semantic versioning](https://semver.org). Semantic versions look like: `MAJOR`.`MINOR`.`PATCH[`-`TAG`.`PRE-RELEASE]`

#### Pre-Release Crates

Pre-Release versions have a `TAG` like "alpha" or "beta". For example: `1.0.0-alpha.0`

1. Increment the `PRE-RELEASE` version for the crate.

#### Unstable Crates

Unstable versions have a `MAJOR` version of zero. For example: `0.1.0`

1. Follow stable crate versioning, but increment the `MINOR` version for breaking changes

#### Stable Crates

For example: `1.0.0`

Increment the first version component in this list, and reset the other components to zero:

1. MAJOR versions for breaking public API changes and removals
   * check for types from dependencies that appear in the public API
2. MINOR versions for new features
3. PATCH versions for bug fixes
   * includes dependency updates that don't impact the public API

### Version Locations

Once you know which versions you want to increment, you can find them in the:

- [x] zebra* `Cargo.toml`s
- [x] tower-* `Cargo.toml`s
- [x] `zebra-network` protocol user agent: https://github.com/ZcashFoundation/zebra/blob/main/zebra-network/src/constants.rs
- [x] `README.md`
- [x] `book/src/user/install.md`
- [x] `Cargo.lock`: automatically generated by `cargo build`

#### Version Tooling

You can use `fastmod` to interactively find and replace versions.

For example, you can do something like:

```
fastmod --extensions rs,toml,md --fixed-strings '1.0.0-alpha.12' '1.0.0-alpha.13'
fastmod --extensions rs,toml,md --fixed-strings '0.2.9' '0.2.10' tower-batch
fastmod --extensions rs,toml,md --fixed-strings '0.2.8' '0.2.9' tower-fallback
```

### Reviewing Version Bumps

Check for missed changes by going to: `https://github.com/ZcashFoundation/zebra/tree/<commit-hash>/`

Where `<commit-hash>` is the hash of the last commit in the version bump PR.

If any Zebra or Tower crates have commit messages that are **not** a version bump, we have missed an update.

Also check for crates that depend on crates that have changed. They should get a version bump as well.

## README

As we resolve various outstanding known issues and implement new functionality with each release, we should double check the README for any necessary updates.

We should check and update if necessary:

- [x] The "Known Issues" section to ensure that any items that are resolved in the latest release are no longer listed in the README.

## Change Log

**Important**: Any merge into `main` deletes any edits to the draft changelog. Once you are ready to tag a release, copy the draft changelog into `CHANGELOG.md`.

We follow the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

We use [the Release Drafter workflow](https://github.com/marketplace/actions/release-drafter) to automatically create a [draft changelog](https://github.com/ZcashFoundation/zebra/releases).

To create the final change log:

- [x] Copy the draft changelog into `CHANGELOG.md`
- [x] Delete any trivial changes. Keep the list of those, to include in the PR
- [x] Combine duplicate changes
- [x] Edit change descriptions so they are consistent, and make sense to non-developers
- [x] Check the category for each change
  - prefer the "Fix" category if you're not sure

#### Change Categories

From "Keep a Changelog":

* `Added` for new features.
* `Changed` for changes in existing functionality.
* `Deprecated` for soon-to-be removed features.
* `Removed` for now removed features.
* `Fixed` for any bug fixes.
* `Security` in case of vulnerabilities.

## Create the Release

### Create the Release PR

After you have the version increments and the updated changelog:

- [x] Push the version increments and the updated changelog into a branch (name suggestion, example: `v1.0.0-alpha.0-release`)
- [x] Create a release PR by adding `&template=release-checklist.md` to the comparing url ([Example](https://github.com/ZcashFoundation/zebra/compare/v1.0.0-alpha.0-release?expand=1&template=release-checklist.md)).
- [ ] Add the list of deleted changelog entries as a comment to make reviewing easier.
- [ ] Also add the list of not-bumped crates as a comment (can use the same comment as the previous one).
- [x] While the PR is being reviewed, turn on [Merge Freeze](https://www.mergefreeze.com/installations/3676/branches) to stop other PRs merging

### Create the Release

- [x] Once the release PR has been approved and merged, create a new release using the draft release as a base, by clicking the Edit icon in the [draft release](https://github.com/ZcashFoundation/zebra/releases).
- [x] Set the release title to `Zebra ` followed by the version tag, for example: `Zebra 1.0.0-alpha.0`
- [x] Copy the final changelog of this release to the release description (starting just _after_ the title `## [Zebra ...`)
- [x] Set the tag name to the version tag, for example: `v1.0.0-alpha.0`
- [x] Set the release to target the `main` branch
- [x] Mark the release as 'pre-release' (until we are no longer alpha/beta)
- [x] Publish the release

## Final Testing

- [x] After tagging the release, test that the exact `cargo install` command in `README.md` works (`--git` behaves a bit differently to `--path`)
- [ ] Turn off [Merge Freeze](https://www.mergefreeze.com/installations/3676/branches) to start merging PRs again

If the build fails after tagging:

1. fix the build
2. increment versions again, following these instructions from the start
3. update `README.md` with a **new** git tag
4. update `CHANGELOG.md` with details about the fix
5. tag a **new** release
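The "How to Increment Versions" rules in the checklist above can be sketched in a few lines of Python. This is only an illustration of those rules as written, not tooling that Zebra uses; the function name `bump` and the change kinds `"breaking"`, `"feature"` and `"fix"` are assumptions made for the example.

```python
import re

# Matches MAJOR.MINOR.PATCH with an optional -TAG.PRE-RELEASE suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([A-Za-z]+)\.(\d+))?$")


def bump(version: str, change: str) -> str:
    """Bump `version` for a change of kind 'breaking', 'feature' or 'fix'."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    if change not in ("breaking", "feature", "fix"):
        raise ValueError(f"unknown change kind: {change!r}")

    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    tag, pre = match.group(4), match.group(5)

    if tag is not None:
        # Pre-release crates: only the PRE-RELEASE counter moves.
        return f"{major}.{minor}.{patch}-{tag}.{int(pre) + 1}"
    if change == "breaking":
        if major == 0:
            # Unstable crates: breaking changes bump MINOR instead of MAJOR.
            return f"0.{minor + 1}.0"
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    # Bug fixes (including dependency updates that keep the public API intact).
    return f"{major}.{minor}.{patch + 1}"
```

For instance, `bump("0.2.9", "fix")` gives `"0.2.10"`, matching the `fastmod` example in the checklist.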

This may be just a difference in preference. But, I imagine pulling this imaginary repo and having a devcontainer with all of the pinned versions you need to do anything - develop zebra, zcashd, zecwallet-light, build the rpc docs locally, have the zips conveniently on hand, the zebra book, the zcash book, build and start lightwalletd, start a new website or wallet, start an interactive python session that can idiomatically speak to a local RPC interface with little or no setup - the host machine only needs something that can run the devcontainer (docker).

cd contrib/ts/websites/zecpages
npm install; npm start

cd zcash/zcash
./zcutil/build.sh -j$(nproc)

cd zcash/rpc
./build.sh

…

The thought of it makes me happy.

A jump-off page for developers new to the ecosystem could help some, or maybe even better a “metarepo” with sub{modules|trees|repos} could provide some of the power/convenience of a monorepo but also allow 2-way push/pull with the many repos so that developers who hate monorepo could keep their normal many-repo workflows. Metarepo could be cool but it’s more complex than a straight monorepo and you don’t get the true atomic changes across multiple packages and consequently you don’t get as much help with the version-bump dependency hassle of many repos. I think I noticed a bit of this type of hassle yesterday? Discord

Ah @nuttycom. I have so much respect for you, I’m sorry you feel this way. You are much more important to the code in question by about ∞, so, I guess that settles it! But, I’m going to respond just in case!

I don’t agree. Monorepos allow over-dependence at the source level but that’s not inevitable. In a mature system at scale, owners of different code paths will declare what is “public” and allowed to be depended on by other packages/apps. Monorepos also allow much more modular granularity. A monorepo with 1000 small components independently published is no big deal compared to 1000 interdependent repos with version pins. Further, having some UI components down/one/path and some rust crates down/another - these things would be hard to couple together even if you tried! But, if you have a server in one language and a client in another - you want these things to always work together. Having them in the same repo opens possibilities of code generation and integration that are harder to cultivate across separate repos.

If a change to an upstream dependency breaks something downstream, I’d rather know before merging it than have to find out when the version pin is eventually tried in some dependent-yet-unintegrated downstream repo. The trunk model does require some commitment to staying close to trunk and continuously taking in changes from the main branch. A large, long-running, stale branch can be a pain to merge indeed. But, this can also provide information that the changes are too disruptive and too much work - when everything is “independent” (dependent, yet unintegrated), you have less information about how much downstream pain aggressive changes will cause.

Monorepos give you flexibility to make a spaghetti monolith; but they don’t make you or discourage anything. In fact, you are much more free to break things down into smaller components. You start a package and put stuff into it and later you realize it should be 5 packages so you break it into 5 directories atomically without breaking any other dependents in the repo. But, then you decide it’s more elegant as 2 packages instead of 5. So, you make the change and change all dependents in another single atomic PR. With many repos, you would just keep the 1 package that you initially decided on because creating 5 isolated and encapsulated repos would be too much hassle with all the additional version pins to bump everywhere.

This I think I have to agree with - in theory. I am basically advocating for centralization! Or, rather advocating that we take more advantage of the centralization that is already intrinsic to Zcash governance. On paper, multiple repos look more decentralized. But, in practice, it’s the permissions at the org level that centralize Zcash - not the number of repos in the orgs. Those permissions are actually mysterious and opaque right now. I’m pretty sure there is no way to see who has what rights in which repo for common zcashers who are not in the central circle. In a monorepo, where potentially many people/teams/orgs interact with the code, the definitions of the merge rules would have to be defined publicly - anything in zcash/zcash would need a review by teamX consisting of {members}, any changes in websites/zecpages would need to be approved by teamY. This would force a kind of transparency that is not available today to my knowledge. Today, who exactly can merge to the master branch of zcash/zcash and by what rules - what governance?

As I think about it, I actually have an even worse idea - all code funded by the dev fund should go into the same centralized monorepo! Of course, decentralization is an overall goal of the project. So, how do I reconcile this? Well, there are still branches and forks and we still publish packages, crates, etc (granularly and MIT-licensed) so no freedom or flexibility is lost. People are still allowed to deviate from the centrally controlled integration and go their own way. Right now, for the most part, it’s the same small teams controlling all of this code. So, there is not much decentralization at this stage, regardless of how many repos.

Say these components were more decoupled. I’m curious how you resolve this dependency issue - would you have a copy of the same code in both places or would you have a shared library? In my world, we would pull out the shared stuff into well-defined and separately tested foundational components that multiple different things could depend on.

@str4d - thanks again for the high-resolution explanation of the individual repos. I can see that your approach overall makes a lot of sense given the current context. Sounds like it’s a moot point anyways. I achieved an “I’m muting this thread now” from @dconnolly which was liked by @zooko. I think between that and the “I detest monorepos” from @nuttycom it’s pretty much settled.

I would still love to have a “whole world at your fingertips” workflow with monorepo and devcontainer… Maybe I’ll begin it someday anyways starting with sub{repos|modules|trees} … a devcontainer with submodules could be a cool start …

Does any one of these repos have a devcontainer with tools installed or do most people install various versions of things on their host machines? (eg rust, python, go, npm …) … or?

I guess for now I’ll mind my own business and go work on my own beloved monorepos.

conradoplg · May 12, 2022, 12:24am

Honestly I never thought about the Zebra repo as a monorepo since it’s all Zebra to me, but of course, it actually contains multiple crates. You’re right in pointing out that this makes some tasks much easier!

But I still don’t think including the entire Zcash ecosystem in a monorepo would be beneficial. I guess I also favor the “localized monorepos” that @str4d explained much better than I could.

skyl · May 12, 2022, 12:36am

Maybe a less fraught word for the zebra repo would be something like “multipackage repo” - but, to me, it’s an example of a monorepo. All of the best software is in a monorepo.

Curious if anyone here can point out a counterexample for the following claim: “Every programming language is in a monorepo”.

Think about it - why would you want version A of one part of the language and version B of another part of the language? Interested to hear if anyone has a counter though.

nuttycom · May 12, 2022, 3:14am

One way in which the zcash/zcash monorepo is already incredibly problematic is build times - it takes over an hour for a single CI run. zcash/librustzcash is, as @str4d mentioned, also monorepo-ish, and its build times are in the tens of minutes. Having separate repositories means that you don’t have to run what amount to full, cross-module integration tests on every PR. TBH, this is a reason that I think we should fragment our repositories more than we already do; we should start by moving everything in zcash/librustzcash that is depended upon by zcash/orchard out into its own repository, so that we no longer have a cyclic dependency between repositories.

That cyclic dependency arose, of course, because zcash/librustzcash is a monorepo. But it absolutely makes things like patch version dependencies harder to manage.

skyl · May 12, 2022, 8:32pm

Thanks for the concrete problem to ponder. In the unlikely future where everything is in a giant polyglot monorepo, a tool like bazel would absolutely be needed to only run tests that need to be run for each PR, cache results and potentially run tests with massive parallelization. You’re right that just pulling everything together and running all the tests for every PR wouldn’t work. BUT, OTOH, you do want to know if your changes to a dependency break a dependent. Cutting the line so that you can change the dependency and simply not know that you broke the dependent is not ideal either. “If you’re not using a monorepo, you’re not doing continuous integration, you’re doing frequent integration at best”.
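The idea of only running the tests a change can actually affect can be sketched as a walk over the reverse dependency graph. This is a minimal Python illustration of the concept, not how bazel implements it (bazel works on much finer-grained targets and adds caching); the package graph below is hypothetical and deliberately simplified.

```python
from collections import deque

# Hypothetical package graph: package -> packages it depends on.
DEPENDS_ON = {
    "orchard": [],
    "librustzcash": ["orchard"],
    "zcashd": ["librustzcash", "orchard"],
    "zecpages": [],
}


def affected_packages(changed, depends_on):
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a dependency to its dependents.
    dependents = {name: set() for name in depends_on}
    for package, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(package)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

Under these assumptions, a PR that only touches `zecpages` would run just its own tests, while a PR touching `orchard` would also run the tests of everything built on top of it.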

I like a challenge; so, I’m going to take a look at the existing zcash/zcash and try to understand why the tests take so long and what might be able to be done to speed them up where they are.

I’m looking at Development Guidelines — Zcash Documentation 5.2.0 documentation

I notice the URL says latest but the embed says 4.6.0. I’m working off of the zcash/zcash:master though - has anything changed? One thing I notice is that qa/pull-tester/rpc-tests.sh doesn’t exist. I guess at some point this was changed to qa/pull-tester/rpc-tests.py. The first thing I would like to change here is the development_guidelines.html to get them up-to-date. But, unfortunately … the source of development_guidelines.html is not in zcash/zcash. Spending a few minutes to find where these docs come from, it appears from https://github.com/search?q=%22Add+unit+tests+for+Zcash%22&type=Code that … the canonical reference for Development Guidelines — Zcash Documentation 5.2.0 documentation … is … zcash/development_guidelines.rst at 811fcdbeed394a0117dcb02e86aba0be91d30981 · AngeloSegreto/zcash · GitHub

BUT, when I check out this repo, it looks like the latest there is 4.2.0 - zcash/conf.py at a4b2c9ec383a71966aa56bd3ffcd3c14ef75f426 · AngeloSegreto/zcash · GitHub

So, after about 15 minutes I can’t find the source to https://zcash.readthedocs.io/en/latest/rtd_pages/development_guidelines.html to improve it.

This is a microcosm, a case in point of the barriers and problems with many repos. In the monorepo case, everything can be atomically bumped together and artifacts like the RTD website can be pushed out together so things don’t come out of sync. With a monorepo, I could use regular unix tools or my editor to find the source for the RTD site offline instead of flailing around with the search on GitHub …

But, anyways. Do you have any hints on what are the main reasons why the tests take so long and what a newb might start looking at to find relatively low-hanging fruit for making them faster?

skyl · May 12, 2022, 8:44pm

Ha, maybe I’m just an idiot ;9 “Edit on gitlab” … still

skyl · May 12, 2022, 9:09pm

For the size of everything in the Zcash ecosystem, probably less than a gigabyte total, I’d personally be happy to clone it all and be able to work offline since I don’t have super good internet all of the time. But, for extremely large repos (all of Zcash ecosystem wouldn’t probably be considered extremely large), Microsoft has done some good work since all of Windows is in a single git repo. The initial clone doesn’t require pulling everything.
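For what it's worth, upstream Git has partial clone and sparse checkout, which give a similar "don't pull everything up front" effect. Below is a rough sketch that just builds the commands. The repository URL, destination and directory names are made up for illustration, and I'm assuming Git 2.25 or newer for `--sparse` and `sparse-checkout`.

```python
import subprocess


def partial_clone_commands(url, dest, directories):
    """Build git commands for a blobless, sparse clone limited to `directories`."""
    return [
        # --filter=blob:none defers downloading file contents until they are needed.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Restrict the working tree to the directories we actually want to edit.
        ["git", "-C", dest, "sparse-checkout", "set", *directories],
    ]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical monorepo URL and layout, printed rather than executed.
    for command in partial_clone_commands(
        "https://example.com/zecosystem.git", "zecosystem", ["rust/zebra", "docs"]
    ):
        print(" ".join(command))
```

Calling `run(...)` on the result would perform the clone. The sketch only prints the commands so it can be tried without network access.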


skyl · May 13, 2022, 4:30am

github.com/QED-it/zcash

Switching to QED-it orchard Rust crate

QED-it:zsa1 ← QED-it:conn_to_qedit_orchard

opened 05:57AM - 10 May 22 UTC

vivek-arte

+50 -2

This PR allows us to use the QED-it orchard crate. Building locally would require using `CONFIGURE_FLAGS=--enable-online-rust ./zcutil/build.sh -j8`, as that allows using the dependencies specified in Cargo.toml instead of those in Cargo.lock. Note that when the orchard crate is updated, it will be required to update the Cargo.toml file to reflect any newer commits.

skyl · May 13, 2022, 7:20am

I promise I’m not cherry-picking things or trying to start trouble. But, this is the exact kind of hassle I’m talking about.

The zcashd docs, wherever they get pushed to (RTD, PDF, GitHub Pages, etc.), should really just live in zcash/zcash and be pushed out as an artifact IMHO.

What is the use of having two repos with the same docs diverging?

skyl · May 13, 2022, 8:31am

This sounds nice

https://blog.coinbase.com/bootstrapping-the-coinbase-monorepo-575cf981c859

nuttycom · May 13, 2022, 5:42pm

We’re absolutely aware that the readthedocs.io site information is out of date. And, ironically, this is one place where a “monorepo” approach makes a lot of sense to me - we’re hoping to move all this documentation back into the zcashd book so that it can more easily be kept up-to-date as the source code changes.

This documentation all originally lived in the zcash/zcash repository; a previous generation of ECC developer relations folks thought that it would be easier to maintain outside of the repository, but the overhead of having a separate repo, combined with the departure of the folks who made those changes and the ECC core protocol team being buried in NU5 development, made for some rot in those docs. Suffice it to say that we’re working on it, and community PRs to aid this process would be greatly appreciated. Using mdbook documentation has been really effective for us in the past year; for example, the halo2 book has stayed up to date through this period.

nuttycom · May 13, 2022, 5:54pm

There was one previous effort I’m aware of to create a Bazel alternative to the zcashd build system: Port build system to Bazel · Issue #2811 · zcash/zcash · GitHub. The problem, as usual, is how to allocate development resources - while using Bazel is appealing for a number of reasons, none of the ECC core team who would be responsible for maintaining this system have Bazel experience and we’ve been too busy working on the protocol to consider switching. All of our existing infrastructure is built based upon the upstream Bitcoin build systems, and there’s a ton of specialized knowledge involved in maintaining that (the depends system in particular). It’s just not clear how to get from here to there with all of the higher-priority work that needs to be done.

skyl · May 15, 2022, 12:37am

Cool! I’ll try to find time to pick up on that work if people would still be potentially interested.

I threw out the zany big idea (that all code from zebra to zcashd to zecpages to free2z to ZWL … all get merged into the biggest, baddest repo) more as a thought experiment than a practical call-to-action on what the top priority should be immediately. I know it’s 1000s of hours and there are other priorities. BUT, I like the idea of taking small practical steps forward with some of the advantages of monorepo in mind. For example, I love the idea of putting the docs that describe how to build zcashd in the same repo as the code/scripts that those docs reference.

I also really like bazel after using it at a high level for a few years and am pretty excited to see how far someone got with it a few years ago. But, also a little sad that it didn’t make it in. That was a lot of work!

github.com/zcash/zcash

Add Bazel build system (Linux x86-64 only for now)

zcash:master ← per-gron:bazel-review

opened 10:11AM - 22 Jan 18 UTC

per-gron

+39869 -432

This is a very large PR, I know. But you'll be glad to know that the core of the build system is "just" 7,227 lines! Most of the code in this PR is generated by included scripts. I think the size of this PR warrants a thorough PR description, so here it comes:

# What

This PR adds a Bazel based build system that is close to being fully hermetically sealed: All that is needed from the host system is Bazel, Perl, Python and a glibc less than a few years old (this means that on most Linuxes Bazel is the only dependency that's not included). It produces a bit-for-bit deterministic build and builds and runs virtually all of the tests.

This is the bulk of #2811.

Each test file is run independently. This allows for running tests in parallel, which in my experience results in a substantial improvement in the runtime of the test suite.

# How to use

**WARNING: Until the PR is reviewed, be careful about actually running the scripts here. They download and run binaries built by me, so if I am evil or if someone evil has messed with my computer your computer could be compromised.**

First, [install Bazel](https://bazel.build). To build everything, run `bazel build ...`. To test everything run `bazel test ...`. It is probably not needed but I recommend running the build from a fresh Zcash repo because the current build system writes header files and other random things in the source tree which can make things confusing.

Notes:

* The build will download a bunch of stuff, including GCC and the Sprout public parameters. This is not optional and is necessary for deterministic builds.
* By default the "fastbuild" build configuration is used: It is quick to build but not good when debugging and does not produce optimized builds. For debug builds, add `-c dbg`, for optimized builds add `-c opt`.
* Configure the number of parallel jobs / test runs with the `--jobs` flag, for example `bazel build --jobs 16`.
* 1GB of RAM per parallel job is not enough to build (you will see terrible swapping and/or crashes). 2GB seems sufficient.
* The tests need even more RAM. I haven't done a lot of testing but it seems like the tests are stable at ~10GB of RAM per parallel job.

# Development workflow

For day-to-day development, the main difference between the current build system and Bazel is that it needs the full dependency graph to be specified in the `BUILD` files. Having to write this down explicitly can be good for awareness of dependencies but in practice it is also a pain in the neck.

Keeping `deps` up to date is similar to the problem of keeping a correct set of `#include`s: Everything that is used in this file should be `#include`d and nothing more. Like C++, Bazel does not (currently) detect if a library depends on a header that is only indirectly depended on in the `deps` declaration, so it is possible to end up with both too many and too few entries in `deps`.

To make it easier to maintain this, I have written a script `./tools/build_cleaner.sh`. Inspired by (but much less smart than) a tool with the same name that exists internally at Google, it goes through all C++ rules in the `BUILD` files and compares their `deps` with what is actually `#include`d in their sources and adjusts the `BUILD` files accordingly.

It has been very nice to have that while writing all these `BUILD` files and I think it can be useful in the future too. However, it is not fool-proof and it will likely require a bit of maintenance to keep it working in case different kinds of header search paths are introduced or similar. Its goal is that if the `deps` are right, it should do nothing, but there will be cases where it does not properly fix broken dependencies. It is also a bit slow, especially the first time it's run (have patience). It's up to you if you want to use `build_cleaner.sh` or not.
Bazel also strictly forbids circular dependencies between libraries; if you have a circular dependency you have to either remove it or put both source files in the same library, which has negative consequences for caching and build times. Personally, I think this is a good thing, especially in this code base which has plenty of them. It is good that something complains about it.

# Linking

This build system uses static linking more aggressively than Zcash's main build system: `zcashd`, `zcash-cli`, `zcash-tx` and `libzcashconsensus.so` are only dynamically linked against glibc. All other dependencies, including libstdc++ and libgcc are linked statically, to aid portability (it should for example eliminate the error in #2866).

The static linking of `libgcc` is the thing that I suspect can be controversial, see for example http://micro.nicholaswilson.me.uk/post/31855915892/rules-of-static-linking-libstdc-libc-libgcc

Here is the actual reasoning why they made it dynamic by default: https://gcc.gnu.org/ml/gcc/2000-04/msg00610.html

This static linking is safe because all C++ code used in the executables are built with exactly the same toolchain. It is easy to see how this guarantees safety for libstdc++, and for libgcc this is also true because the only thing that can cause issues across different compilation units is its exception handling, which C does not have. This is safe also for `libzcashconsensus.so` because it has a C API which does not expose any C++ type or exceptions.

glibc 2.19 (released 2014-02-07) is used, which means the produced binaries work back to Debian Jessie (initially released on 2015-04-26) and any other Linux distribution with glibc that is at least as recent.

# The GCC toolchain

To ensure deterministic builds, the scripts in this PR download and use a specific build of a specific version of GCC.
Although it is technically possible to not download binaries and build GCC from source, that would increase the build time a lot so I think downloading binaries is necessary.

Downloading binaries is a bit scary from a security point of view. I tried to find trustworthy prebuilt GCC binaries but couldn't find any that I was happy with (they have to work across Linux distributions, be of a recent version and support some not-super-common features like OpenMP and provide static versions of libgcc and libstdc++). The closest I could find was from [Android](https://android.googlesource.com/platform/prebuilts). I think they could be used, but GCC 4.8 is over two and a half years old.

To get a trustworthy precompiled GCC toolchain, I wrote [scripts based on Vagrant and crosstool-ng](https://github.com/per-gron/zcash-toolchain) that compiles GCC and generates a bit-for-bit deterministic tarball that can be downloaded and used by Bazel to build this project. The scripts are in a separate repo because I wanted to use Github's "releases" feature to host the tarball. If the Zcash maintainers can host this file outside of Github the scripts can be moved to zcash.git.

# Third-party dependencies

Zcash depends on a bunch of third-party libraries, most of which do not offer Bazel build scripts. One way to make this work is to link against prebuilt versions of these libraries, but that is difficult to do in a verifiably secure way and is also quite tricky to get to work properly with debug symbols, optimization modes, code coverage etc.

Instead, like the current build system, this PR will build all these libraries from source. To ensure that the build stays deterministic they are built with Bazel, which requires Bazel files for the dependencies. Some of the dependencies have fairly complicated build systems (for example OpenSSL and GMP), and I wanted to make sure that it is possible to upgrade these dependencies in the future.
To make that easier, I wrote scripts that, when run in the source directory of the libraries, create a `BUILD` file that can be used. See for example `tools/depends/generate_openssl_bazel.py` and the generated `tools/depends/generated/openssl.bazel`. Some of these generated Bazel files include generated config.h headers and similar things, which makes them quite large (this is why this PR adds almost 40k lines of code).

When running these scripts, make sure to set the `CC` environment variable to the path to the `gcc` of the toolchain used in the build, otherwise the `./configure` scripts will find libraries on the machine the script is run on that won't be there when actually building.

# How to review

1. Review the [toolchain build scripts](https://github.com/per-gron/zcash-toolchain) and build it. Compare the produced `zcash_cc_toolchain.tar.xz` against [the one that's used by the build](https://github.com/per-gron/zcash-toolchain/releases/download/v0.1.0/zcash_cc_toolchain.tar.xz). If they are different, please see if the uncompressed .tar files are different. If they are, please extract both archives and run `diff -r` on the directories to find which files are different, run `diffoscope` on those files and let me know the results.
2. Review all but the last two commits in the PR: They make changes to zcash repo that the Bazel build needs to work.
3. Review the "Add Bazel build system" commit; this is where the actual build system is.
4. Review the generated Bazel files in the "Add generated Bazel files" either by looking at the files or by running the `generate_*_bazel.py` scripts yourself and comparing the results.
If you want to verify build reproducibility, here are checksums of the binaries produced when I build with `-c dbg`, both on my computer and on a Google Cloud instance (as of 113edcf89e367ad3bd42a475756da929d0f8b60a):

```
$ shasum -a 256 < bazel-bin/src/zcashd
26cc68f88c6b8f6aaf48a3c1bfedf127db663b676bec340e1fc1e242f68cd38f  -
$ shasum -a 256 < bazel-bin/src/zcash-cli
830782280a2b57aedfd33dfa4ad582df343647985e066f8f64860d48e7421857  -
$ shasum -a 256 < bazel-bin/src/zcash-tx
6d4aa061424acc109f59af7651cc02ddffcf3225d05bbba7433ffa1cf2c74022  -
```

# Third-party licenses

The Bazel build system does not add any additional dependencies to the built binaries but it does have some new build-time dependencies. In addition it explicitly downloads some libraries that were implicitly depended on before, for example procps. Here is a list of these (not including libraries that are already downloaded by the current build or included in the repo), including what licenses they are under:

* https://github.com/bazelbuild/rules_rust Apache 2
* https://github.com/nelhage/rules_boost Apache 2
* https://github.com/wahern/hexdump MIT
* http://haddonthethird.net/m4 BSD 3-clause
* https://gitlab.com/procps-ng/procps GPL 2
* https://github.com/ActiveState/appdirs MIT
* https://github.com/PyCQA/baron LGPL 3
* https://github.com/dchest/pyblake2 Apache 2 (is one of several options)
* https://github.com/PyCQA/redbaron LGPL 3
* https://github.com/alex/rply BSD 3-clause
* https://www.python.org/ Its own permissive license
* GCC GPL 3
* Rust MIT/Apache 2
* https://github.com/rust-lang/libc MIT/Apache 2

You may want to consider adding these to some kind of NOTICES file. I don't see one in the repo at the moment.

# Additional notes

* `pyflakes` is not run by Bazel, it is probably best to run it as a separate CI step.
* Bazel does not accept Python modules with a dash in their search path, so I had to rename `rpc-tests` to `rpc_tests`. This will require reconfiguring CI bots, at least for `pyflakes`.
* The `no-dot-so` test is not included here. With Bazel you have better control over linking than when building third party libraries with their own build scripts so I don't think it's needed.
* I have made one change to libsnark which should probably be ported upstream, see "libsnark: Don't (implicitly) rely on other tests initializing the public params"
* Unlike the current build system, GMP is built with the `--enable-fat` configuration option.

# Future work

I think this is a good start: It builds zcash and proves that it works by running tests. It is not enough to replace the current build system though. Missing pieces include:

* It does not run the sec-hard test (it's not hard to add though).
* It does not use Rust code, because the official [Bazel rules for Rust](https://github.com/bazelbuild/rules_rust) do not support using Rust code from C++ code. This should be possible to do but is difficult enough that I think it makes sense to do that as a follow-up.
* It refers to on-the-fly generated tarballs from Github that are not guaranteed to not change. It would be better to host these files separately. I did not do this because I think it's better if the Zcash developers add it to some official mirror than if I do this.
* It does not generate a Debian package for the official release. This should be straightforward to add; Bazel has support for this.
* It does not support OS X or Windows. The current build system doesn't either, but I don't think it would be a good idea to remove the current build system until the Bazel build supports these platforms, otherwise the unofficial ports would be broken.
* There is no code coverage support. Bazel can be hacked to support this but does not officially do it yet; it's supposed to work before the 1.0 release in June though.

Any indication of why it didn’t make it? I guess other things were moving so fast that it was hard to keep such a large PR fresh? Add Bazel build system (Linux x86-64 only for now) by per-gron · Pull Request #2891 · zcash/zcash · GitHub

It’s hard to estimate how much would be gained in terms of test time using bazel. But, it’s not hard to imagine cutting 90% or more from the CI run time for most runs. Did per-gron or anyone else report any comparison of running the tests with bazel instead of the existing toolchain?
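As a rough illustration of the "each test file is run independently" point from the PR description, here is a minimal sketch of running test scripts in separate processes, several at a time, and timing each one. This is not how bazel or the existing zcash test runner works. The function names are my own, and the test files are assumed to be standalone Python scripts that exit non-zero on failure.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_test_file(path):
    """Run one test script in its own process; return (path, passed, seconds)."""
    start = time.monotonic()
    result = subprocess.run([sys.executable, path], capture_output=True)
    return path, result.returncode == 0, time.monotonic() - start


def run_all(paths, jobs=4):
    """Run every test file independently, at most `jobs` at a time.

    Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_test_file, paths))


if __name__ == "__main__":
    # Usage: python run_parallel.py test_a.py test_b.py ...
    for path, passed, seconds in run_all(sys.argv[1:]):
        status = "PASS" if passed else "FAIL"
        print(f"{status} {path} ({seconds:.1f}s)")
```

Comparing the wall-clock time of `run_all(paths, jobs=1)` against a higher `jobs` value on the same machine would give a first, crude number for how much parallelism alone buys, before any caching.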

skyl · May 16, 2022, 2:15am

I have a little kanban board here to keep notes.

I’m going to be concentrating mostly on free2z feature work and stuff in the coming months. But, I’m stoked to try to work on some of these ideas at some point. I kinda’ want to open source free2z and put it in the monorepo. I have some psychological deficit where having more than one repo, more than one editor window open, really bums me out. I want to just pull the world and go from package to package, directory to directory without any kind of repo friction.

I think the recent zcashd → lightwalletd → zecwallet lite problem illustrates how important integration is. In theory these things should be decoupled and independent. But, in practice, the final integration of all of the important components in the ecosystem is what matters most.

This is not a criticism of all of the hard work that everyone is doing. I’m super impressed by everything that is going on and I’m basically just dreaming over here without having made any significant contributions. But, I still envision a potential future where we are able to integrate all of these important components together to find downstream problems and collaborate on solutions at much earlier points in the lifecycle.
