Ziggurat: the Zcash Network Stability Framework

A Ziggurat is a rectangular stepped tower that uses precise measurements to ensure that each of the foundational platforms can support the layers above.

Based on Equilibrium’s experience stress testing projects such as Rust IPFS and Aleo, this metaphor can be applied to network testing by defining three layers, each one building on the one below it.

Ziggurat will start by testing conformance, making sure that each tested node adheres to the network protocol. With only zcashd in play this was perhaps simple, but with zebra coming out, enforcing a specification that a bona fide Zcash node must satisfy becomes critical. Once that foundation is established, Ziggurat will stress test performance using an arbitrary number of test nodes in various topologies. Finally, it will test resistance to bad actors by simulating malicious behavior.
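
To make the conformance layer concrete, here is a minimal sketch of the shape such a test takes (an illustration only, not actual Ziggurat code; the address and port are assumptions about a local setup):

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

// Trivial conformance-style check: the node under test (zcashd or zebrad)
// should accept an inbound connection on its P2P port. Real conformance
// tests then exercise the version/verack handshake and the rest of the
// network protocol over this connection.
#[test]
fn node_accepts_inbound_connection() {
    // 8233 is the default Zcash mainnet P2P port; adjust for your setup.
    let addr: SocketAddr = "127.0.0.1:8233".parse().unwrap();
    let stream = TcpStream::connect_timeout(&addr, Duration::from_secs(5))
        .expect("node under test should accept inbound P2P connections");
    drop(stream);
}
```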

Read on, and we’ll be happy to answer any questions (likely on EU time).

9 Likes

Just a quick message to say THANK YOU to ECC, ZOMG, and the community for funding our proposal. We’re really looking forward to engaging with people as we move through the process of enriching the Zcash network.

7 Likes

Hey folks! :wave:

We believe we have reached the successful end of Milestone 1, and thus I’d like to post an update on the Ziggurat project.

First, we inspected the code and runtime behavior of both zebra and zcashd, and then we drafted a proposal for a series of tests to batter the nodes with. After initial review and feedback from the core devs, we are proud to announce that we have made the eqlabs/ziggurat repository public, and that you can now read the first draft of the Ziggurat spec.

So far we have:

  • 17 conformance tests
  • 2 performance tests
  • 6 resistance tests

There’s still plenty of potential for additional tests, and plenty of work to do even beyond implementation. There are also a few open questions on which we would now like the community’s feedback, particularly from those with experience running zcashd and zebra:

  • Any particular considerations or requirements around node setup and teardown (CI, caches, test data or preloaded state)?
  • Any notable known differences between Zcashd and Zebra nodes, especially with respect to the network protocol or assumptions connected to it?
  • Any relevant complex peering or sync cases to give particular attention to?
  • Any particular malicious angles, design compromises, or potential problem areas in need of extra test coverage?
  • If you do have experience running a node, how do you think we should define “reasonable load” and “heavy load” for load testing? This can be in terms of number of peers, message frequency, or any metric you select.

We would love to hear from people out in the wild, and use your experiences to inform our future work.

Thank you so much, grantors and community alike.

4 Likes

Thank you for the update!

Can you elaborate on where the feedback/conversation loop with the ECC and ZFND is taking place so ZOMG can better follow along?

Sure, it’s mostly in the #testing channel of the Zcash Dev Discord.

1 Like

Can you please add some details on what preloaded state means?

Let’s say I want to speed up the initial sync and I already have a “trusted” copy of the full chain that I can import (think S3 or a different machine). Are you considering adding a test for validating that state?

1 Like

Hi vamsi, thank you for the question. With preloaded state, we are referring to testing a node which isn’t starting from scratch with its chain state. This could be useful to test block propagation and more broadly the chain syncing mechanism. We haven’t currently planned a test specifically for validating a full chain’s state beyond a few blocks, though we’re open to suggestions regarding more complex scenarios. The one you mention could definitely be envisaged (assuming a full chain is available—this should also probably be capped in size to avoid overly long running tests).
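
To illustrate, a rough sketch of how a test harness might start the node under test against a pre-populated data directory instead of an empty one (the path is a placeholder, and this is not the actual Ziggurat harness):

```rust
use std::process::{Child, Command};

// Sketch: launch zcashd against a previously synced data directory so tests
// of block propagation / chain syncing don't have to start from genesis.
// The snapshot should be a copy of a data dir taken while that node was
// fully stopped.
fn start_preloaded_node(snapshot_dir: &str) -> std::io::Result<Child> {
    Command::new("zcashd")
        .arg(format!("-datadir={snapshot_dir}"))
        .arg("-testnet")
        .spawn()
}
```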

2 Likes

Hi aphelionz,

Welcome! Total Rust newb here, but this looks like an exciting project! :nerd_face: :crab: :zcash: :zebra:

Wanted to pass along some information regarding your questions.

“Any particular considerations or requirements around node setup and teardown (CI, caches, test data or preloaded state)?”

CI
If you intend to use Docker for CI-related tasks, Docker Hub has all the builders for various platforms (any image named zcashd-build-* will build for the default Linux host). Note these do not have Python layers to run RPC tests, but we will be adding those soon as zcashd-worker-*. On most platforms this is trivial to add, but to avoid issues on older platforms I typically recommend folks use the zcashd Ubuntu 20.04 image (Docker Hub).

Caches
It is strongly encouraged to cache the output from fetch-params.sh somewhere you operate, as the default mirror is rate limited. We have used a few options to cache these and other artifacts; IPFS has worked well for this, depending on the requirements.
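
For example, a harness can check for a cached copy before touching the network at all; a rough sketch (assumes a Linux host and the standard ~/.zcash-params location):

```rust
use std::path::PathBuf;
use std::process::Command;

// Sketch: only download the zk-SNARK parameters when no cached copy is
// present (e.g. restored from a CI cache or a mirror you operate), since
// the default mirror used by fetch-params.sh is rate limited.
fn ensure_zk_params() -> std::io::Result<()> {
    let home = std::env::var("HOME").expect("HOME is not set");
    let params_dir = PathBuf::from(home).join(".zcash-params");

    if params_dir.join("sapling-spend.params").exists() {
        return Ok(()); // cache hit: skip the download entirely
    }

    // Cache miss: fall back to the upstream script shipped with zcashd.
    let status = Command::new("./zcutil/fetch-params.sh").status()?;
    assert!(status.success(), "fetch-params.sh failed");
    Ok(())
}
```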

Test Data or Preloaded State
In general, I never preserve cache between tests, and the majority of the underlying scripts “should” gracefully clean this up for you. However, there are ways to disable this if you want to archive the test cache for other purposes.

Preloading the node with a given chain can save a TON of time, depending on your ISP and system hardware. It is recommended to cache these snapshots in something you operate/manage per your requirements. Also, depending on your test requirements, it is generally recommended to have two chain copies per network. For example, on mainnet, keep one chain built from a node without txindex=1 in zcash.conf and another built with txindex=1. This spares the operator from having to reindex/rescan preloaded nodes that need the chain metadata txindex=1 provides. When generating these chain snapshots from the blocks and chainstate directories, it is important to ensure the node is completely stopped; otherwise you risk corrupting the snapshots.
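
A rough sketch of that snapshot step (zcash-cli and the directory names are real, but the paths and the crude wait are placeholders):

```rust
use std::process::Command;
use std::{thread, time::Duration};

// Sketch: stop the node cleanly before copying the blocks/ and chainstate/
// directories, so the snapshot does not end up corrupted.
fn snapshot_chain(datadir: &str, dest: &str) -> std::io::Result<()> {
    // Ask the running node to shut down, then wait for it to exit.
    Command::new("zcash-cli")
        .arg(format!("-datadir={datadir}"))
        .arg("stop")
        .status()?;
    thread::sleep(Duration::from_secs(60)); // crude; poll the process in practice

    for dir in ["blocks", "chainstate"] {
        let status = Command::new("cp")
            .arg("-r")
            .arg(format!("{datadir}/{dir}"))
            .arg(dest)
            .status()?;
        assert!(status.success(), "copying {dir} failed");
    }
    Ok(())
}
```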

Any notable known differences between Zcashd and Zebra nodes, especially with respect to the network protocol or assumptions connected to it?

I’m not familiar enough with zebrad or Rust to speak to this.

Any relevant complex peering or sync cases to give particular attention to?

Operating nodes on the expected best chain is generally fairly straightforward. There are some minor issues with operating nodes with Tor configurations. If you intend to spin up your own testnets operating N nodes, there is a whole other layer of cases to consider.

Any particular malicious angles, design compromises or potential problem areas in need of extra test coverage

I can’t speak to specific malicious angles or design compromises; other core devs and/or security folks could potentially provide this information. We are slowly getting the majority of the pieces together to finish up the last mile of longer-running tests, but we have yet to overlap all the items mentioned above with zebrad, so this is uncharted territory.

A couple of tools that can aid tremendously for the scope of work you mentioned:

If you do have experience running a node, how do you think we should define “reasonable load” and “heavy load” for load testing? This can be in terms of number of peers, message frequency, or any metric you select.

I typically baseline this with a default zcash.conf on a 2 CPU / 8 GB system, as this is the minimum hardware needed to build/run a Zcash node. From there you can start to model some of the bounds based upon the test criteria to better understand “reasonable”, “average”, and “heavy” load. Then it is clearer to model the given load per system/network config as it scales up or down. This also helps to isolate peering/network issues that may come up in the wild with these nodes, if they aren’t in an isolated environment.
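
For what it’s worth, one way to make those tiers concrete in a harness is to parameterise load explicitly; a sketch (the numbers are placeholders, not proposed definitions):

```rust
/// Sketch: express load as a peer count plus a per-peer message rate, and
/// baseline the tiers against the minimum hardware above (2 CPU / 8 GB,
/// default zcash.conf). The numbers here are illustrative only.
struct LoadProfile {
    peers: usize,
    messages_per_peer_per_sec: u32,
}

const REASONABLE: LoadProfile = LoadProfile { peers: 8, messages_per_peer_per_sec: 10 };
const HEAVY: LoadProfile = LoadProfile { peers: 100, messages_per_peer_per_sec: 100 };
```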

Please let us know if you have any other questions. For whatever reason my Discord is not functioning, so I am unable to message in that portal :frowning_face:

1 Like