Zcash Ecosystem Monorepo

I’ve been thinking a lot about Zcash governance and would like to share my big idea.

Let’s put all code related to Zcash into a polyglot monorepo with a single trunk.

I’m a monorepo enthusiast. I love monorepos. I can’t stand switching context between repos and I hate wasting time managing dependencies and releases across multiple repos.
For a basic primer about why a project would choose to keep all of their code in a single repository, I invite you to read Google’s wonderful article in the ACM from 2016 (my happy place):

There is also a nice talk that is worth watching: “Why Google Stores Billions of Lines of Code in a Single Repository” (YouTube).

Some advantages in short:

  • Easy to discover code, find canonical versions
  • Simplified dependency management!!
  • Atomic changes, large-scale refactoring
  • Easy to share code and collaborate
  • Flexible ownership defined in the repo
  • Developer ergonomics, standardized workflows and dev envs
  • More eyes, more reviewers, more knowledge shared
  • Value tangibly accumulated and distilled
  • Code can be more modular because the overhead of repo boundaries is removed (just create a new directory/file!)
  • Possibility to do integration tests across multiple projects - changes to dependencies can trigger integration tests in their dependents

How

We focus initially on Rust, TypeScript, Python and also documentation and official websites. We bring repos together into one repo (zecosystem - zecOS - Zecosys :smiley: ); packages are separated by language/build system and/or by project/team/ownership. Merge rules are set up so PRs can only be merged with adequate acceptance by the owning team - e.g., zebra requires approval from a ZF member and zcash requires approval from ECC. Whatever rules there currently are can be replicated by using directories instead of repos. Using CODEOWNERS and teams could be more transparent than the current fairly opaque ownership of the ECC and ZF repos (1). Rules can also be added so that downstream integration tests must pass before a merge is allowed.
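
As a rough illustration of directory-based ownership, here is a minimal Python sketch. The directory layout and team names are hypothetical, invented for the example. It uses simple longest-prefix matching, which is a simplification: GitHub's real CODEOWNERS file uses gitignore-style patterns where the last matching pattern takes precedence.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical ownership map: directory prefix -> team that must approve.
# Paths and team names are illustrative assumptions, not an existing layout.
OWNERS = {
    "rust/zebra": "zf-core",
    "cpp/zcash": "ecc-core",
    "ts/react": "web-maintainers",
    "docs": "docs-maintainers",
}


def owning_team(changed_file: str) -> Optional[str]:
    """Return the team owning the longest directory prefix of the file."""
    path = PurePosixPath(changed_file)
    matches = [prefix for prefix in OWNERS if path.is_relative_to(prefix)]
    if not matches:
        return None
    return OWNERS[max(matches, key=len)]


def required_approvals(changed_files: list[str]) -> set[str]:
    """Collect every team whose approval a PR touching these files needs."""
    teams = {owning_team(f) for f in changed_files}
    teams.discard(None)  # files without an owner add no requirement here
    return teams
```

A PR touching both `rust/zebra/...` and `cpp/zcash/...` would then need sign-off from both hypothetical teams, which is the same rule the separate repos enforce today, just written down in one place.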

Individual packages can be published out of the monorepo to the various package managers for people not fortunate enough to be working in the Zcash monorepo.

Reusable React components for zecpages, free2z, etc. could be at ts/react; Python libraries could be published out of a top-level py/ directory. All of the ZIPs and books could be in the same repo, and the boilerplate and build systems for docs wouldn’t have to be pasted from repo to repo.

We could utilize a build system such as Bazel to maintain explicit dependency graphs.

The repos

ZF has 47 repositories on GitHub:

zcash (ECC?) has 36:

ECC and ZF produce a lot of code in a lot of repos. There is even an additional repo that appears to be used to figure out the cross-repo dependencies.

Is the canonical version of this repo the one under the zcash org, zcash/developers, or the one under the ZcashFoundation org, ZcashFoundation/developers? Many if not most of the repos in ZcashFoundation and zcash are forked into the other GitHub org. It’s sometimes hard to figure out which version is the source of truth. Keeping track of all of the dependencies between different repos seems to be a chore in itself. Merging between individual forks and juggling all the versions is work that can be essentially removed in a single trunk.

The two elephants are zebra and zcash. Zebra is already set up in a monorepo format and it looks great to my eye.

I’m sure someone out there feels strongly that combining these repos is a bad idea or impossible. I’d like to hear those arguments. I can tell you that it is not impossible technically. Some (or even most) developers at ECC and ZF will have plenty of reasons why they like many repos and many folks will probably have a knee-jerk reaction that it’s impossible because of reasons like “ECC controls zcashd and ZF controls zebrad”. But, I ask if this is a real argument against a monorepo and if things need to be the way they are. Why is the separation considered necessary or desirable? Use your imagination and consider what could be possible.

We can talk about decentralization all we want. But, the truth is that the permissions to the canonical repos, the dependency graph and the trust placed in those who are allowed to merge and release are a huge (and probably underrated) part of what governance really is in practice. In a monorepo, these arrangements could be more transparent - encoded in the IAM and merge rules on the single main branch. I feel like right now we have a lot of the downside of centralization but are missing out on the possible benefits of centralization, while some of the efforts at decentralization might be uncharitably characterized as “decentralization theatre”.

There are probably important people who will start with a flat HELL NO and stick to it. But, think about it anyways. I sincerely believe that utilizing a monorepo could boost productivity by an order of magnitude by dispensing with unnecessary coordination between repos and allowing that effort to go into fruitful integration and knowledge sharing across the entire ecosystem.

Maybe this can just be food for thought and stimulate ideas about how we could radically change the Zcash governance and software ecosystem for the better. But, I could help work on this idea for real, if people are interested. It could possibly start out as two-way subtrees between the existing repos and the prospective monorepo. But, the real advantages would only start to accrue if significant contributors really wanted to commit to it.

I do like the idea of more open discussions being possible on GitHub between parties… :thinking:

In my experience working with Zebra I don’t see any benefits of using a monorepo. Rust crates provide an easy way to manage dependencies regardless of which repository they reside in.

If we had a lot of C++ code that would make more sense, since C++ dependencies are a pain to manage.

Regarding the other issues of discoverability, I think we could improve that by simply having an index page of sorts mapping the different components of the ecosystem. IMO that would be even better, because opening a gigantic repository with a bunch of folders would not bring clarity to most of the Zcash users.

The DAGs do not keep track of dependencies between software components; they keep track of dependencies between tasks (ticket/issues) that have been planned.

I have not heard of something like this. I am not sure I see the benefit; I can’t imagine cloning the repo and downloading everything just to make a contribution to a small part of one particular project.

However, some sort of “lay of the land” page showing the projects in the ecosystem and their repos, tagging them as active/inactive and whatnot, could be quite useful.

I strongly dislike monorepos. They inevitably result in horrifically coupled code, they’re a pain to merge stuff to when a change in one part of the codebase has to be immediately reflected everywhere else, and they overall inhibit reasoning about parts of a system in isolation and discourage encapsulation, which are critical to security.

Moreover, I think that a monorepo is antithetical to the idea of decentralization. It is vitally important that components be independent and only interoperate in terms of high-level APIs; we’ve got too much coupling in the zcashd codebase as it is, simply because the zcashd wallet and full node share code. It’s a maintenance burden.

It’s hard for me to overstate the degree to which I detest monorepos. Basically, I won’t work that way.

The article you linked to also described various disadvantages:

These costs and trade-offs fall into three categories:

  • Tooling investments for both development and execution;
  • Codebase complexity, including unnecessary dependencies and difficulties with code discovery; and
  • Effort invested in code health.

And this is why I am generally against full monorepoization. It works at Google, because Google is Google, and has the size and resources to dedicate multiple full-time teams solely to developing the tooling and providing the support necessary to maintain the monorepo.

At the kind of scale we are operating at, I prefer (what I will refer to as) “localised monorepoization”, where we take advantage of the benefits of cross-area workspaces where it makes sense, while minimising the interactions between more distant parts through repo separation. As you’ll see below, we are already doing this.

Let’s break these down:

Active repositories

Mobile wallet / SDK repositories

I can’t speak much to the structure of these, other than to say that managing iOS and Android toolchains is a lot of work, and in the past we have generally only had a single developer for each of iOS and Android.

Repos for generated content

Historic repositories


Of these, the repos that I have interacted with on a daily basis within the past few months are:

  • zcash/zcash
  • zcash/librustzcash
  • zcash/orchard
  • zcash/halo2
  • zcash/zips (not making changes, but referring to its contents)

Now that zcashd 5.0.0 is out, I expect changes to zcash/orchard to become much less frequent. The others are all localised monorepos.

I agree with the general idea of having the full context for whatever it is that I need to work on, locally available. With the current repository structures, I have that for the large majority of the time, and in the cases I don’t (integrating an API change from a Rust crate into zcashd for example), there is a simple pathway to connecting them (cargo patches).

I don’t know for certain why ZF forked zcash/developers, but I suspect it’s because a) they saw the benefit of the DAG and wanted to tune their local view, and b) I was travelling for conferences and while away accidentally broke the GitHub Pages renderer (because ZenHub does not allow more than one API key per user sigh), and they weren’t using locally-rendered versions for agility like we’ve been doing at ECC. I’d like to resolve this fork at some point.

As @conradoplg pointed out above, this is a misunderstanding of the DAG. Even if we used a monorepo, we would still use the DAG, because the DAG is not about coordinating inter-repo dependencies but inter-issue dependencies; all a monorepo would do here is change the URLs to the issues.
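
To make the distinction concrete, here is a small sketch of an inter-issue dependency graph using Python's standard `graphlib` module. The issue names are made up for illustration and this is not how the actual zcash/developers tooling is implemented. The keys are issues, not repositories, so the graph would look the same whether the issues lived in one repo or many.

```python
from graphlib import TopologicalSorter

# Hypothetical issues: issue -> set of issues that must be finished first.
# In practice the keys would be issue URLs; moving to a monorepo would only
# change those URLs, not the shape of the graph.
BLOCKED_BY = {
    "release-checklist": {"wallet-api", "consensus-change"},
    "wallet-api": {"consensus-change"},
    "consensus-change": set(),
}


def work_order(blocked_by: dict[str, set[str]]) -> list[str]:
    """Return the issues in an order that respects every 'blocked by' edge."""
    return list(TopologicalSorter(blocked_by).static_order())
```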

The DAG is great. All other planning tools are projections of the DAG, mere shadows of its perfection. The DAG is what helped us to finish zcashd 5.0.0. Al̴l̶ ̷h̶à̴̡͍͔i̵̼͒͂l̸̠̱̄ ̵̺̝͆͐ͅẗ̴͔̤̟́͑ȟ̸̺̩͎̪͈͂̑ė̵̖̯̹͙̋̆ ̶̖̙̅̔͑̿D̴̠͇͚̱̫̿̉̒A̵̟̬͂̇̔̍̈G̶̨̹̮͍͓̽̋.̶̧̈́͒̕͠

Hey, thanks a lot for the explanation @str4d! Perfect level of detail to help me understand what’s going on. Thanks for correcting my misunderstanding of the DAG project. It does look really awesome. Al̴l̶ ̷h̶à̴̡͍͔i̵̼͒͂l̸̠̱̄ ̵̺̝͆͐ͅẗ̴͔̤̟́͑ȟ̸̺̩͎̪͈͂̑ė̵̖̯̹͙̋̆ ̶̖̙̅̔͑̿D̴̠͇͚̱̫̿̉̒A̵̟̬͂̇̔̍̈G̶̨̹̮͍͓̽̋.̶̧̈́͒̕͠!!!

I appreciate your time here and your acknowledgment of the benefits and trade-offs and not just saying “I hate monorepos” :smiley: . I like your idea of “localized monorepos”. Fewer is better IMHO. If the same small group of people has to bump tags in repo A to bump the tag and rebuild repo B so that repo C can bump the tags/revision hashes (all manually checked and reviewed by the same people in all the repos, commits just to move version pins …), there might be a case to bring A and B into C - while still keeping A and B as separate as desired and still granularly publishing as many small composable packages as makes sense. Paradoxically, it’s easier to publish more, smaller packages from a single repo because the overhead of the extra repo is replaced by an extra directory.

I don’t want to belabor this thread or start a holy war. But, I would like to respond to a few things as food for thought.

This is a confusing statement for me, @conradoplg. You made this single PR that bumped the external versions for ~12 packages with a single unified changelog. Without a monorepo, I think this task would have been about 12X more work? In a larger monorepo you might not have had to bump these versions at all - 1.0.0-beta.9? Since everything would be integrated in trunk, you might be able to hold off on formally publishing intermediate release candidates to yourself.

This may be just a difference in preference. But, I imagine pulling this imaginary repo and having a devcontainer with all of the pinned versions you need to do anything - develop zebra, zcashd, zecwallet-light, build the rpc docs locally, have the zips conveniently on hand, the zebra book, the zcash book, build and start lightwalletd, start a new website or wallet, start an interactive python session that can idiomatically speak to a local RPC interface with little or no setup - the host machine only needs something that can run the devcontainer (docker).

cd contrib/ts/websites/zecpages
npm install; npm start
cd zcash/zcash
./zcutil/build.sh -j$(nproc)
cd zcash/rpc
./build.sh

The thought of it makes me happy.

A jump-off page for developers new to the ecosystem could help some, or maybe even better a “metarepo” with sub{modules|trees|repos} could provide some of the power/convenience of a monorepo but also allow 2-way push/pull with the many repos so that developers who hate monorepos could keep their normal many-repo workflows. A metarepo could be cool but it’s more complex than a straight monorepo and you don’t get the true atomic changes across multiple packages and consequently you don’t get as much help with the version-bump dependency hassle of many repos. I think I noticed a bit of this type of hassle yesterday? Discord

Ah @nuttycom. I have so much respect for you, I’m sorry you feel this way :laughing:. You are much more important to the code in question by about ∞, so, I guess that settles it! But, I’m going to respond just in case!

I don’t agree. Monorepos allow over-dependence at the source level but that’s not inevitable. In a mature system at scale, owners of different code paths will declare what is “public” and allowed to be depended on by other packages/apps. Monorepos also allow much more modular granularity. A monorepo with 1000 small components independently published is no big deal compared to 1000 interdependent repos with version pins. Further, having some UI components down/one/path and some rust crates down/another - these things would be hard to couple together even if you tried! But, if you have a server in one language and a client in another - you want these things to always work together. Having them in the same repo opens possibilities of code generation and integration that are harder to cultivate across separate repos.

If a change to an upstream dependency breaks something downstream, I’d rather know before merging it than have to find out when the version pin is eventually tried in some dependent-yet-unintegrated downstream repo. The trunk model does require some commitment to staying close to trunk and continuously taking in changes from the main branch. A large, long-running, stale branch can be a pain to merge indeed. But, this can also provide information that the changes are too disruptive and too much work - when everything is “independent” (dependent, yet unintegrated), you have less information about how much downstream pain aggressive changes will cause.

Monorepos give you flexibility to make a spaghetti monolith; but they don’t make you or discourage anything. In fact, you are much more free to break things down into smaller components. You start a package and put stuff into it and later you realize it should be 5 packages so you break it into 5 directories atomically without breaking any other dependents in the repo. But, then you decide it’s more elegant as 2 packages instead of 5. So, you make the change and change all dependents in another single atomic PR. With many repos, you would just keep the 1 package that you initially decided on because creating 5 isolated and encapsulated repos would be too much hassle with all the additional version pins to bump everywhere.

This I think I have to agree with - in theory. I am basically advocating for centralization! Or, rather advocating that we take more advantage of the centralization that is already intrinsic to Zcash governance. On paper, multiple repos look more decentralized. But, in practice, it’s the permissions at the org level that centralize Zcash - not the number of repos in the orgs. Those permissions are actually mysterious and opaque right now. I’m pretty sure there is no way to see who has what rights in which repo for common zcashers who are not in the central circle. In a monorepo, where potentially many people/teams/orgs interact with the code, the definitions of the merge rules would have to be defined publicly - anything in zcash/zcash would need a review by teamX consisting of {members}, any changes in websites/zecpages would need to be approved by teamY. This would force a kind of transparency that is not available today to my knowledge. Today, who exactly can merge to the master branch of zcash/zcash and by what rules - what governance?

As I think about it, I actually have an even worse idea :joy: - all code funded by the dev fund should go into the same centralized monorepo! Of course, decentralization is an overall goal of the project. So, how do I reconcile this? Well, there are still branches and forks and we still publish packages, crates, etc (granularly and MIT-licensed) so no freedom or flexibility is lost. People are still allowed to deviate from the centrally controlled integration and go their own way. Right now, for the most part, it’s the same small teams controlling all of this code. So, there is not much decentralization at this stage, regardless of how many repos.

Say these components were more decoupled. I’m curious how you resolve this dependency issue - would you have a copy of the same code in both places or would you have a shared library? In my world, we would pull out the shared stuff into well-defined and separately tested foundational components that multiple different things could depend on.


@str4d - thanks again for the high-resolution explanation of the individual repos. I can see that your approach overall makes a lot of sense given the current context. Sounds like it’s a moot point anyways. I achieved an “I’m muting this thread now” from @dconnolly which was liked by @zooko. I think between that and the “I detest monorepos” from @nuttycom it’s pretty much settled :joy:.

I would still love to have a “whole world at your fingertips” workflow with monorepo and devcontainer… Maybe I’ll begin it someday anyways starting with sub{repos|modules|trees} … a devcontainer with submodules could be a cool start …

Does any one of these repos have a devcontainer with tools installed or do most people install various versions of things on their host machines? (e.g. rust, python, go, npm …) … or?

I guess for now I’ll mind my own business and go work on my own beloved monorepos.

Honestly I never thought about the Zebra repo as a monorepo since it’s all Zebra to me, but of course, it actually contains multiple crates. You’re right in pointing out that this makes some tasks much easier!

But I still don’t think including the entire Zcash ecosystem in a monorepo would be beneficial. I guess I also favor the “localized monorepos” that @str4d explained much better than I could.

Maybe a less fraught word for the zebra repo would be something like “multipackage repo” - but, to me, it’s an example of a monorepo. All of the best software is in a monorepo :wink:

Curious if anyone here can point out a counterexample for the following claim: “Every programming language is in a monorepo”.

Think about it - why would you want version A of one part of the language and version B of another part of the language? Interested to hear if anyone has a counter though.

One way in which the zcash/zcash monorepo is already incredibly problematic is build times - it takes over an hour for a single CI run. zcash/librustzcash is, as @str4d mentioned, also monorepo-ish, and its build times are in the tens of minutes. Having separate repositories means that you don’t have to run what amount to full, cross-module integration tests on every PR. TBH, this is a reason that I think we should fragment our repositories more than we already do; we should start by moving everything in zcash/librustzcash that is depended upon by zcash/orchard out into its own repository, so that we no longer have a cyclic dependency between repositories.

That cyclic dependency arose, of course, because zcash/librustzcash is a monorepo. But it absolutely makes things like patch version dependencies harder to manage.
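
As a sketch of the problem, a cycle in a repo-level dependency graph can be detected with Python's standard `graphlib` module. The repo names below are placeholders rather than the real dependency lists; the second graph shows how extracting the shared code into its own repository (an assumption about how the split would look) removes the cycle.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_cycle(depends_on: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        # graphlib reports the cycle as the second element of err.args;
        # the first and last node in that list are the same.
        return err.args[1]
    return None


# Placeholder graph: two repos that depend on each other.
cyclic = {"repo-a": {"repo-b"}, "repo-b": {"repo-a"}}

# Same repos after moving the shared code into a hypothetical third repo.
split = {"repo-a": {"shared"}, "repo-b": {"shared"}, "shared": set()}
```

Here `find_cycle(cyclic)` returns a list that starts and ends on the same repo, while `find_cycle(split)` returns `None`.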

Thanks for the concrete problem to ponder. In the unlikely future where everything is in a giant polyglot monorepo, a tool like Bazel would absolutely be needed to only run tests that need to be run for each PR, cache results and potentially run tests with massive parallelization. You’re right that just pulling everything together and running all the tests for every PR wouldn’t work. BUT, OTOH, you do want to know if your changes to a dependency break a dependent. Cutting the line so that you can change the dependency and simply not know that you broke the dependent is not ideal either. “If you’re not using a monorepo, you’re not doing continuous integration, you’re doing frequent integration at best”.
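
The "only run tests that need to be run" idea can be sketched in a few lines of Python. This is just the graph walk, not Bazel's actual implementation (which also relies on hashing and caching), and the package names and edges are hypothetical: given what a PR changed, walk the reverse dependency edges to find everything downstream that should be retested.

```python
from collections import deque

# Hypothetical packages: package -> packages it depends on.
DEPENDS_ON = {
    "core-lib": [],
    "node": ["core-lib"],
    "light-server": ["node"],
    "wallet": ["light-server"],
    "website": [],
}


def affected_by(changed: list[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the graph so we can walk from a dependency to its dependents.
    dependents: dict[str, set[str]] = {pkg: set() for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

With this graph, a change to `core-lib` would retest `node`, `light-server` and `wallet`, while a change to `website` would retest only `website`.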

I like a challenge; so, I’m going to take a look at the existing zcash/zcash and try to understand why the tests take so long and what might be able to be done to speed them up where they are.

I’m looking at Development Guidelines — Zcash Documentation 4.6.0 documentation

I notice the URL says latest but the embed says 4.6.0. I’m working off of the zcash/zcash:master though - has anything changed? One thing I notice is that qa/pull-tester/rpc-tests.sh doesn’t exist. I guess at some point this was changed to qa/pull-tester/rpc-tests.py. The first thing I would like to change here is the development_guidelines.html to get them up-to-date. But, unfortunately … the source of development_guidelines.html is not in zcash/zcash. Spending a few minutes to find where these docs come from, it appears from https://github.com/search?q=%22Add+unit+tests+for+Zcash%22&type=Code that … the canonical reference for Development Guidelines — Zcash Documentation 4.6.0 documentation … is … zcash/development_guidelines.rst at 811fcdbeed394a0117dcb02e86aba0be91d30981 · AngeloSegreto/zcash · GitHub

BUT, when I check out this repo, it looks like the latest there is 4.2.0 - zcash/conf.py at a4b2c9ec383a71966aa56bd3ffcd3c14ef75f426 · AngeloSegreto/zcash · GitHub

So, after about 15 minutes I can’t find the source to https://zcash.readthedocs.io/en/latest/rtd_pages/development_guidelines.html to improve it.

This is a microcosm, a case in point of the barriers and problems with many repos. In the monorepo case, everything can be atomically bumped together and artifacts like the RTD website can be pushed out together so things don’t come out of sync. With a monorepo, I could use regular unix tools or my editor to find the source for the RTD site offline instead of flailing around with the search on GitHub …

But, anyways. Do you have any hints on what are the main reasons why the tests take so long and what a newb might start looking at to find relatively low-hanging fruit for making them faster?

Ha, maybe I’m just an idiot ;9 “Edit on gitlab” … still :smiley:

For the size of everything in the Zcash ecosystem, probably less than a gigabyte total, I’d personally be happy to clone it all and be able to work offline since I don’t have super good internet all of the time. But, for extremely large repos (the whole Zcash ecosystem probably wouldn’t be considered extremely large), Microsoft has done some good work since all of Windows is in a single Git repo. The initial clone doesn’t require pulling everything.

I promise I’m not cherry-picking things or trying to start trouble. But, this is the exact kind of hassle I’m talking about.

The zcashd docs, wherever they get pushed to - RTD, PDF, GitHub Pages, etc. - should really just live in zcash/zcash and be pushed out as an artifact IMHO.

What is the use of having two repos with the same docs diverging?

This sounds nice

We’re absolutely aware that the readthedocs.io site information is out of date. And, ironically, this is one place where a “monorepo” approach makes a lot of sense to me - we’re hoping to move all this documentation back into the zcashd book so that it can more easily be kept up-to-date as the source code changes.

This documentation all originally lived in the zcash/zcash repository; a previous generation of ECC developer relations folks thought that it would be easier to maintain outside of the repository, but the overhead of having a separate repo, combined with the departure of the folks who made those changes and the ECC core protocol team being buried in NU5 development made for some rot in those docs. Suffice it to say that we’re working on it, and community PRs to aid this process would be greatly appreciated. Using mdbook documentation has been really effective for us in the past year; for example, the halo2 book has stayed up to date through this period.

There was one previous effort I’m aware of to create a Bazel alternative to the zcashd build system: Port build system to Bazel · Issue #2811 · zcash/zcash · GitHub. The problem, as usual, is how to allocate development resources - while using Bazel is appealing for a number of reasons, none of the ECC core team who would be responsible for maintaining this system have Bazel experience and we’ve been too busy working on the protocol to consider switching. All of our existing infrastructure is built based upon the upstream Bitcoin build systems, and there’s a ton of specialized knowledge involved in maintaining that (the depends system in particular). It’s just not clear how to get from here to there with all of the higher-priority work that needs to be done.

Cool! I’ll try to find time to pick up on that work if people would still be potentially interested.

I threw out the zany big idea (that all code from zebra to zcashd to zecpages to free2z to ZWL … all get merged into the biggest, baddest repo) more as a thought experiment than a practical call-to-action on what the top priority should be immediately. I know it’s 1000s of hours and there are other priorities. BUT, I like the idea of taking small practical steps forward with some of the advantages of monorepo in mind. For example, I love the idea of putting the docs that describe how to build zcashd in the same repo as the code/scripts that those docs reference.

I also really like bazel after using it at a high level for a few years and am pretty excited to see how far someone got with it a few years ago. But, also a little sad that it didn’t make it in. That was a lot of work!

Any indication of why it didn’t make it? I guess other things were moving so fast that it was hard to keep such a large PR fresh? Add Bazel build system (Linux x86-64 only for now) by per-gron · Pull Request #2891 · zcash/zcash · GitHub

It’s hard to estimate how much would be gained in terms of test time using bazel. But, it’s not hard to imagine cutting 90% or more from the CI run time for most runs. Did per-gron or anyone else report any comparison of running the tests with bazel instead of the existing toolchain?

I have a little kanban board here to keep notes.

I’m going to be concentrating on free2z feature work and stuff mostly in the coming months. But, I’m stoked to try to work on some of these ideas at some point. I kinda’ want to open source free2z and put it in the monorepo :smiley: . I have some psychological deficit where having more than one repo, more than one editor window open, really bums me out :slight_smile:. I want to just pull the world and go from package to package, directory to directory without any kind of repo friction.

I think the recent zcashd → lightwalletd → zecwallet lite problem illustrates how important integration is. In theory these things should be decoupled and independent. But, in practice, the final integration of all of the important components in the ecosystem is what matters most.

This is not a criticism of all of the hard work that everyone is doing. I’m super impressed by everything that is going on and I’m basically just dreaming over here without having made any significant contributions. But, I still envision a potential future where we are able to integrate all of these important components together to find downstream problems and collaborate on solutions at much earlier points in the lifecycle.