Welcome! Total Rust newb here, but this looks like an exciting project!
Wanted to pass along some information regarding your questions.
“Any particular considerations or requirements around node setup and teardown (CI, caches, test data or preloaded state)?”
If you intend to use Docker for CI-related tasks, Docker Hub has all the builders for various platforms (any image named `zcashd-build-*` will build for the default Linux host). Note these do not have Python layers to run RPC tests, but we will be adding those soon as `zcashd-worker-*`. On most platforms this is trivial to add, but to avoid issues on older platforms I typically recommend folks use the zcashd Ubuntu 20.04 image from Docker Hub.
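If it helps, wrapping a build in one of those images can look roughly like this. This is a sketch; the `electriccoinco/zcashd-build-ubuntu2004` image name is an assumption on my part, so check Docker Hub for the actual `zcashd-build-*` repository/tag for your platform:

```shell
# Sketch: run the in-tree zcashd build inside a Docker builder image.
# The default image name below is hypothetical; verify the real
# zcashd-build-* tag on Docker Hub before relying on it.
zcashd_build_in_docker() {
  local image="${1:-electriccoinco/zcashd-build-ubuntu2004}"  # assumed tag
  # Mount the current checkout and build with the image's toolchain.
  docker run --rm \
    -v "$(pwd)":/zcashd -w /zcashd \
    "$image" ./zcutil/build.sh -j"$(nproc)"
}
```

Run `zcashd_build_in_docker` from the root of a zcashd checkout; pass a different image name as the first argument to target another platform.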
It is strongly encouraged to cache the output of `fetch-params.sh` somewhere you operate, as the default mirror is rate limited. We have used a few options to cache these and other artifacts; IPFS has worked well for this, depending on the requirements.
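As a rough illustration of that caching pattern (a sketch: the mirror URL is a placeholder, and the choice of which param files to check is my assumption, not a zcashd convention):

```shell
# Sketch: prefer a self-hosted params mirror, falling back to the stock
# fetch-params.sh against the rate-limited default mirror.
PARAMS_DIR="${PARAMS_DIR:-$HOME/.zcash-params}"
MIRROR="${MIRROR:-https://params.example.internal}"  # hypothetical mirror URL

params_cached() {
  # Treat the cache as warm when the Sapling params are already present.
  [ -f "$PARAMS_DIR/sapling-spend.params" ] &&
  [ -f "$PARAMS_DIR/sapling-output.params" ]
}

fetch_params() {
  params_cached && { echo "params cached in $PARAMS_DIR"; return 0; }
  mkdir -p "$PARAMS_DIR"
  for f in sapling-spend.params sapling-output.params sprout-groth16.params; do
    curl -fsSL "$MIRROR/$f" -o "$PARAMS_DIR/$f" && continue
    # Mirror miss: fall back to the upstream fetch script.
    ./zcutil/fetch-params.sh
    return
  done
}
```

An IPFS gateway URL slots into `MIRROR` the same way if you pin the param files there.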
Test Data or Preloaded State
In general, I never preserve cache between tests, and the majority of the underlying scripts “should” gracefully clean this up for you. However, there are ways to disable this if you want to archive test cache for other purposes.
Preloading the node with a given chain can save a TON of time, depending on your ISP and system hardware. It is recommended to cache these somewhere you operate/manage per your requirements. Also, depending on your test requirements, it is generally recommended to have two chain copies per network: for example on mainnet, one chain built from a node without `txindex=1` in zcash.conf, and another with `txindex=1`. This lets the operator avoid reindexing/rescanning for preloaded nodes that need the chain metadata `txindex=1` provides. When generating these chain snapshots from `chainstate`, it is important to ensure the node is completely stopped; otherwise, you risk corrupting these snapshots.
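The stop-then-copy flow might look something like this (a sketch, assuming a standard `~/.zcash` datadir layout; the output path is a placeholder):

```shell
# Sketch: snapshot blocks/ and chainstate/ only after zcashd has fully
# exited, to avoid corrupting the copy.
snapshot_chainstate() {
  local datadir="${1:-$HOME/.zcash}"
  local out="${2:-zcash-chain-$(date +%Y%m%d).tar.gz}"

  zcash-cli -datadir="$datadir" stop
  # "stop" returns before shutdown completes; wait for the process to exit.
  while pgrep -x zcashd >/dev/null; do sleep 1; done

  tar -czf "$out" -C "$datadir" blocks chainstate
  echo "wrote $out"
}
```

Restoring is the reverse: extract the archive into a fresh datadir before starting the node.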
“Any notable known differences between Zcashd and Zebra nodes, especially with respect to the network protocol or assumptions connected to it?”
I’m not familiar enough with zebrad or Rust to speak to this.
“Any relevant complex peering or sync cases to give particular attention to?”
Operating nodes on the expected best chain is generally fairly straightforward. There are some minor issues with operating nodes with Tor configurations. If you intend to spin up your own testnets operating N nodes, there is a whole other layer of cases to consider.
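For the N-node case, a local regtest pair is the cheapest starting point. A sketch, assuming `zcashd` is on PATH; the ports, datadirs, and credentials are all arbitrary placeholders:

```shell
# Sketch: spin up two regtest zcashd nodes, the second connecting only
# to the first. Regtest keeps this entirely off mainnet/testnet.
spawn_regtest_pair() {
  local d1=/tmp/zc-node1 d2=/tmp/zc-node2
  mkdir -p "$d1" "$d2"
  # Minimal conf for both nodes; credentials are placeholders.
  printf 'regtest=1\nserver=1\nrpcuser=u\nrpcpassword=p\n' |
    tee "$d1/zcash.conf" > "$d2/zcash.conf"

  zcashd -datadir="$d1" -port=18344 -rpcport=18345 -daemon
  zcashd -datadir="$d2" -port=18346 -rpcport=18347 \
         -connect=127.0.0.1:18344 -daemon
}
```

Scaling this to N nodes is mostly a loop over datadirs and ports, but topology (who `-connect`s to whom) is where the interesting peering cases start to show up.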
“Any particular malicious angles, design compromises, or potential problem areas in need of extra test coverage?”
I can’t speak to specific malicious angles or design compromises; other core devs and/or security folks could potentially provide this information. We are slowly getting the majority of the pieces together to finish up the last mile of longer-running tests, but we have yet to overlap all the items mentioned above with zebrad, so this is uncharted territory.
A couple of tools that can aid tremendously for the scope of work you mentioned:
“If you do have experience running a node, how do you think we should define ‘reasonable load’ and ‘heavy load’ for load testing? This can be in terms of number of peers, message frequency, or any metric you select.”
I typically baseline this with a default zcash.conf on a 2-CPU, 8 GB system, as this is the minimum hardware needed to build/run a zcash node. From there you can start to model some of the bounds based upon the test criteria to better understand “reasonable”, “average”, and “heavy” load. It is then clearer to model the given load per some system/network config as it scales up or down. This also helps to isolate peering/network issues that may come up in the wild with these nodes, if they aren’t in an isolated environment.
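One way to turn that into numbers is to poll the node's own peer stats. A sketch: `getpeerinfo` is a standard zcashd RPC, but the tier thresholds below are illustrative assumptions on my part, not canonical definitions of load:

```shell
# Sketch: count connected peers and bucket them into rough load tiers.
peer_count() {
  # Each entry in the getpeerinfo JSON array carries one "id" field.
  zcash-cli getpeerinfo | grep -c '"id"'
}

classify_load() {
  local peers="$1"
  # Thresholds are illustrative; calibrate against your baseline box.
  if   [ "$peers" -le 8 ];  then echo "reasonable"
  elif [ "$peers" -le 32 ]; then echo "average"
  else                           echo "heavy"
  fi
}
```

Sampling `peer_count` on an interval during a test run gives you a time series to correlate with CPU/memory on the baseline 2-CPU, 8 GB box.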
Please let us know if you have any other questions. For whatever reason my Discord is not functioning, so I am unable to message in that portal.