Authors: Evan Forbes, Dev Ojha (@ValarDragon)
Valar Group
Summary
A key question when choosing PoW block times is how stale-block rates and fork/reorg rates respond. Shorter block times improve UX for users and market makers by giving them faster confirmations for small transactions. However, shorter block times increase the stale-block rate, because block propagation delay takes up a larger share of the block time. We want to understand how stale-block rates behave on very decentralized, Zebra-only networks at a block time of 25s.
We empirically measure block propagation delay, block times, stale heights, and reorg rates in experiments using 100 geographically distributed Zebra nodes. The nodes are split across 19 regions, including the US, Europe, India, Australia, and Singapore. They are also split across cloud providers. Hashpower is evenly distributed across these nodes. With properly configured TCP connections, the experiment falls within the safe operating range measured here: a sub-5% reach-based expected stale-block rate, a sub-5% observed stale-height rate, and a sub-0.5% observed higher-cumulative-work branch-switch rate. This keeps us at roughly the same stale rate that Ethereum PoW had on mainnet with its 12.5-second blocks. A mainnet comparison is consistent with the expectation that mainnet should have somewhat lower stale and fork rates than these deliberately decentralized devnets.
In tandem we build a theoretical model for how to estimate stale rate, and validate it with the experimental data.
Stale-block rates and fork/reorg rates can be modeled from the time it takes a new block to propagate through the network. In an experiment with 100 geographically distributed Zebra nodes, measured block propagation, block times, stale-height events, and higher-cumulative-work branch switches all remained in the expected safe operating range: a sub-5% expected stale-block rate, a sub-5% observed stale-height rate, and a sub-0.5% observed higher-cumulative-work branch-switch rate.
This leads us to conclude that it is safe for NU7 to decrease the target spacing to 25 seconds, pending a large portion of the network adjusting their TCP configurations.
Theory
Note: This model is a baseline for honest propagation-induced stale blocks. It assumes miners publish blocks when found, mine the best tip they currently know, and that the target block rate is approximately stationary over the time window of interest.
Core Model
The core quantity is tau_eff, the network’s work-weighted old-tip mining time after a new block is found. “Target spacing” is the average block time as defined in Zebra:
expected stale blocks per accepted block = tau_eff / target spacing
or:
phi = tau_eff / T
where:
- T is the target spacing
- phi is the expected stale blocks per accepted block
- tau_eff is measured in old-tip work-seconds
For discrete miners, if miner i has work share w_i and receives and verifies the block after d_i seconds:
tau_eff = sum_i(w_i * d_i)
A miner with 10% of expected block-producing work that keeps mining the old tip for 8s adds 0.8s to tau_eff. A miner with 30% of expected block-producing work that keeps mining the old tip for 2.5s adds 0.75s. Add those weighted times and divide by target spacing to get expected stale blocks per accepted block.
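The weighted sum above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the third miner entry (60% of work switching tips instantly) is an assumption added so that the shares sum to 1.

```python
def tau_eff_from_miners(miners):
    """Return tau_eff = sum_i(w_i * d_i) in old-tip work-seconds.

    `miners` is a list of (work_share, delay_seconds) pairs, where
    work_share is a fraction of total block-producing work.
    """
    return sum(share * delay for share, delay in miners)


def stale_rate(tau_eff, target_spacing):
    """Return phi = tau_eff / T, the expected stale blocks per accepted block."""
    return tau_eff / target_spacing


# The two miners from the text, plus an assumed remainder (60% of work)
# that switches to the new tip with zero delay.
miners = [(0.10, 8.0), (0.30, 2.5), (0.60, 0.0)]

tau_eff = tau_eff_from_miners(miners)           # 0.8 + 0.75 + 0.0
phi = stale_rate(tau_eff, target_spacing=25.0)

print(f"tau_eff = {tau_eff:.2f}s, phi = {phi:.2%}")
```

Under these illustrative inputs, the two slow miners alone account for 1.55 old-tip work-seconds per block.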
[Figure: tau_eff_intuition.png]
Equivalently, let u(t) be the fraction of total work still mining the old tip t seconds after a new block is found. Then:
tau_eff = integral_0^inf u(t) dt
So tau_eff compresses the full propagation curve into one effective old-tip mining time. The model is linear up to this point: if one miner keeps mining the old tip twice as long, that miner’s contribution to tau_eff and phi doubles.
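To make the equivalence between the integral and the weighted sum concrete, the sketch below integrates a step-shaped u(t) exactly. The miner list is hypothetical (the same illustrative shares and delays as a discrete example would use), and the helper names are not from the companion model.py.

```python
def old_tip_fraction(miners, t):
    """u(t): fraction of work still mining the old tip t seconds after a block."""
    return sum(share for share, delay in miners if delay > t)


def tau_eff_from_curve(miners):
    """Integrate the step function u(t) exactly, one segment per distinct delay."""
    area = 0.0
    previous = 0.0
    for delay in sorted({d for _, d in miners}):
        # u(t) is constant on [previous, delay), so evaluate it at the left edge.
        area += old_tip_fraction(miners, previous) * (delay - previous)
        previous = delay
    return area


# Hypothetical (work_share, delay_seconds) pairs.
miners = [(0.10, 8.0), (0.30, 2.5), (0.60, 0.0)]

tau_from_integral = tau_eff_from_curve(miners)
tau_from_sum = sum(share * delay for share, delay in miners)

print(tau_from_integral, tau_from_sum)
```

Both routes give the same area because each miner contributes a rectangle of height w_i and width d_i under u(t).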
What The Model Predicts
Under the standard Poisson approximation, if S is the number of stale competing blocks caused by one accepted block, then S ~ Poisson(phi). This gives the following quantities:
| quantity | formula |
|---|---|
| expected stale blocks per accepted block | E[S] = phi = tau_eff / T |
| probability of at least one stale competing block | P[S >= 1] = 1 - e^(-phi) |
| probability of exactly one stale competing block | P[S = 1] = e^(-phi) * phi |
| probability of two or more stale competing blocks | P[S >= 2] = 1 - e^(-phi) * (1 + phi) |
Throughout, stale rate means expected stale blocks per accepted block: E[S] = phi. The probability of one or more stale competing blocks is a different metric. The two are close when phi is small because 1 - e^(-phi) ~= phi, but P[S >= 1] is bounded by 1 while E[S] continues to grow linearly.
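The table translates directly into code. The following is a minimal sketch with a hypothetical function name; it only evaluates the Poisson formulas listed above.

```python
import math


def stale_probabilities(phi):
    """Evaluate the Poisson(phi) quantities from the table for one accepted block."""
    p_zero = math.exp(-phi)  # P[S = 0]
    return {
        "expected_stale": phi,                       # E[S]
        "p_at_least_one": 1.0 - p_zero,              # P[S >= 1]
        "p_exactly_one": p_zero * phi,               # P[S = 1]
        "p_two_or_more": 1.0 - p_zero * (1.0 + phi)  # P[S >= 2]
    }


# Small phi: E[S] and P[S >= 1] are close. Large phi: they diverge.
small = stale_probabilities(0.05)
large = stale_probabilities(2.0)

print(small)
print(large)
```

For small phi the gap between E[S] and P[S >= 1] is roughly phi^2 / 2, which is why the two metrics are nearly interchangeable in the sub-5% range.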
[Figure: stale_rate_probability_model.png]
Fork And Reorg Boundary
A stale block is a valid discovered block that does not end up on the eventual canonical chain. In the simplest propagation race:
H
|-- A1
|-- B1
A1 and B1 are competing siblings. Whichever branch later loses leaves the other block stale.
A fork is the temporary state where different parts of the network are mining different valid tips. A fork can produce stale blocks, but a stale block is an outcome, while a fork is the competing-branch state that exists before the network converges. In the experimental section below, we call the observable fraction of canonical heights with at least one competing non-canonical block the stale-height rate. That is the binary event P[S >= 1].
A reorg happens when a node switches from the branch it had accepted to a different branch with more cumulative work. With roughly constant difficulty, “more cumulative work” is approximately “more blocks.” For example:
H
|-- A1
|-- B1 -- B2
A node that had accepted A1 will switch to B2 once it learns the B1 -> B2 branch. That creates a one-block reorg for that node, because A1 is removed from its active chain. It is a two-block competing branch, but not a two-block reorg.
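The distinction between reorg depth and competing-branch length can be sketched as follows. Chains are represented here as plain lists of block labels, which is a simplification for illustration and not how Zebra stores chain state.

```python
def fork_summary(active_chain, new_chain):
    """Return (blocks_removed, blocks_added) when switching chains.

    blocks_removed is the reorg depth for the node: how many blocks leave
    its active chain. blocks_added is the length of the competing branch
    measured from the common ancestor.
    """
    shared = 0
    for old_block, new_block in zip(active_chain, new_chain):
        if old_block != new_block:
            break
        shared += 1
    return len(active_chain) - shared, len(new_chain) - shared


# The example from the text: the node holds H -> A1 and learns H -> B1 -> B2.
active = ["H", "A1"]
candidate = ["H", "B1", "B2"]

removed, added = fork_summary(active, candidate)
print(f"reorg depth = {removed}, competing branch length = {added}")
```

Here one block (A1) is removed and two blocks (B1, B2) are added, matching the description of a one-block reorg caused by a two-block competing branch.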
So modeling reorg depth requires more than phi = tau_eff / T. The stale-block model only counts sibling blocks found during the old-tip mining window. A fork or reorg model must track the branch race after the sibling exists: which miners know about each branch, which tip they are mining, pairwise propagation delays, tie-breaking, work shares, and cumulative-work selection. For honest propagation-induced forks, the natural next model is an event-driven network simulation. The phi model remains the baseline input for how often the first competing sibling appears; the fork/reorg model adds the branch race that follows.
Block-Time Expectations
For block times themselves, PoW block discovery is modeled as a Poisson process, so the waiting time B until the next block is exponential:
P[B <= t] = 1 - e^(-t / T)
With target spacing T, healthy PoW block times have:
- E[B] = T
- stddev(B) = T
- median(B) = T * ln(2) ~= 0.693 * T
So a healthy chain targeting 25s blocks should have a median block time around 17.3s. A median below the target spacing is normal; it is a consequence of exponential waiting times, not evidence that blocks are arriving too quickly.
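A quick simulation illustrates these expectations. This is a sketch that samples idealized exponential waiting times with a fixed seed; it ignores difficulty adjustment and is not derived from the experiment data.

```python
import math
import random
import statistics

T = 25.0  # target spacing in seconds

# Draw exponential inter-block times with mean T (rate 1 / T).
rng = random.Random(0)
samples = [rng.expovariate(1.0 / T) for _ in range(100_000)]

expected_median = T * math.log(2)
sample_mean = statistics.fmean(samples)
sample_stddev = statistics.pstdev(samples)
sample_median = statistics.median(samples)

print(f"expected median: {expected_median:.1f}s")
print(f"sample mean:     {sample_mean:.1f}s")
print(f"sample stddev:   {sample_stddev:.1f}s")
print(f"sample median:   {sample_median:.1f}s")
```

The sample mean and standard deviation both land near T, while the median sits near 0.693 * T.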
[Figure: block_time_and_stale_expectations.png]
Reach-Based Proxy
The preferred input is per-miner propagation data, because it estimates tau_eff directly:
[{time_seconds: d_i, work_share: w_i}, ...]
where each entry says that miners with work share w_i kept mining the old tip for d_i seconds. The companion model.py includes stale_rate_expectation_from_propagation_points(...) for that form. It normalizes work_share, so callers may pass fractions, percentages, or expected-work weights.
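As an illustration of that input form, the sketch below normalizes work_share and computes phi. It is written in the spirit of the helper described above but is not the actual model.py code; the function name and the sample points are hypothetical.

```python
def stale_rate_from_points(points, target_spacing):
    """Return phi from [{time_seconds, work_share}, ...] propagation points.

    work_share is normalized by its total, so fractions, percentages, or
    raw expected-work weights all give the same result.
    """
    total = sum(point["work_share"] for point in points)
    if total <= 0:
        raise ValueError("work_share values must sum to a positive number")
    tau_eff = sum(
        point["time_seconds"] * point["work_share"] / total for point in points
    )
    return tau_eff / target_spacing


# Hypothetical points, expressed as percentages and then as fractions.
as_percent = [
    {"time_seconds": 8.0, "work_share": 10},
    {"time_seconds": 2.5, "work_share": 30},
    {"time_seconds": 0.5, "work_share": 60},
]
as_fraction = [
    {"time_seconds": p["time_seconds"], "work_share": p["work_share"] / 100}
    for p in as_percent
]

phi_percent = stale_rate_from_points(as_percent, target_spacing=25.0)
phi_fraction = stale_rate_from_points(as_fraction, target_spacing=25.0)
print(f"{phi_percent:.2%} {phi_fraction:.2%}")
```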
When per-miner propagation data is unavailable, we can use node-level block distribution measurements as a fallback proxy. Here reach_90 means the time it takes for a block to be distributed to 90% of measured network nodes. For example, reach_90 = 2.5s says that 90% of measured nodes had received the block by 2.5s. It does not say that 90% of block-producing work had received it, and it does not mean all work kept mining the old tip for 2.5s.
Let D(t) be the fraction of measured nodes that have received the block by time t. Then 1 - D(t) is the fraction of measured nodes that do not yet have the block. Under the proxy assumption that measured nodes are representative of where block-producing work receives and validates blocks:
tau_eff_proxy ~= integral_0^inf (1 - D(t)) dt
and therefore:
expected stale blocks per accepted block ~= tau_eff_proxy / T
This is the same area-under-the-curve idea as tau_eff = integral u(t) dt, but with node-weighted distribution data used as the fallback input.
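When a sampled distribution curve D(t) is available, the proxy integral can be approximated with the trapezoidal rule. The sketch below assumes linear interpolation between samples and uses made-up sample points; neither the function name nor the data comes from the experiment.

```python
def tau_eff_proxy_from_reach(points):
    """Approximate integral of (1 - D(t)) dt with the trapezoidal rule.

    `points` is a list of (time_seconds, fraction_reached) samples sorted by
    time. It should start at (0, 0) and end where D(t) reaches 1, otherwise
    the tail beyond the last sample is silently ignored.
    """
    area = 0.0
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        # Average height of the "not yet reached" curve times segment width.
        area += ((1.0 - d0) + (1.0 - d1)) / 2.0 * (t1 - t0)
    return area


# Hypothetical reach curve: 50% at 1s, 90% at 2s, 100% at 4s.
reach_curve = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.9), (4.0, 1.0)]

tau_eff_proxy = tau_eff_proxy_from_reach(reach_curve)
phi_proxy = tau_eff_proxy / 25.0
print(f"tau_eff_proxy = {tau_eff_proxy:.2f}s, phi = {phi_proxy:.2%}")
```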
[Figure: reach_proxy_tau_eff.png]
If only reach_90 = r_90 is available, tau_eff is not determined by that single number, so we need a shape assumption. The headline examples below assume the first 90% of measured nodes receive the block evenly between 0 and r_90. In that case D(t) rises linearly from 0 to 0.90, so 1 - D(t) falls linearly from 1.00 to 0.10. The area through r_90 is:
((1.00 + 0.10) / 2) * r_90 = 0.55 * r_90
If the remaining 10% receives the block almost immediately after r_90, then tau_eff_proxy ~= 0.55 * r_90. If those nodes remain without the block longer, add the tail area:
tau_eff_proxy ~= 0.55 * r_90 + tail_area
Different tail assumptions give different stale-rate estimates. That is why reach_90 should produce a range, not one definitive stale-rate number.
The headline examples use target spacing T = 25s, assume measured nodes are a reasonable proxy for block-producing work, assume linear distribution from 0% at 0s to 90% at reach_90, and assume no material tail beyond reach_90:
| reach_90 | tau_eff_proxy ~= 0.55 * reach_90 | expected stale rate |
|---|---|---|
| 2.5s | 1.375s | 5.50% |
| 2.0s | 1.100s | 4.40% |
| 1.0s | 0.550s | 2.20% |
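The headline rows, and the effect of a tail, can be reproduced with a short sketch. The linear tail shape used here (the last 10% of nodes receiving the block evenly over `tail_seconds`) is an assumption chosen for illustration, and the function name is hypothetical.

```python
def tau_eff_proxy_from_reach_90(reach_90, tail_seconds=0.0):
    """Return 0.55 * reach_90 plus an assumed linear tail for the last 10%.

    With a linear tail, 1 - D(t) falls from 0.10 to 0 over tail_seconds,
    which adds a triangle of area 0.10 * tail_seconds / 2.
    """
    body_area = 0.55 * reach_90
    tail_area = 0.10 * tail_seconds / 2.0
    return body_area + tail_area


T = 25.0  # target spacing in seconds
rows = []
for reach_90 in (2.5, 2.0, 1.0):
    no_tail = tau_eff_proxy_from_reach_90(reach_90) / T
    # Assumed tail: the last 10% takes as long again as the first 90%.
    with_tail = tau_eff_proxy_from_reach_90(reach_90, tail_seconds=reach_90) / T
    rows.append((reach_90, no_tail, with_tail))
    print(f"reach_90={reach_90}s  no tail={no_tail:.2%}  with tail={with_tail:.2%}")
```

Reporting both columns is one way to present reach_90 as a range rather than a single stale-rate number.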
Experimental Results
To compare a Zebra network against the above theory, we ran a 100-miner Zebra-only PoW network. Each node:
- ran a single CPU miner
- 8 vCPU
- 16 GB RAM
- 1-3 Gbps connection
- geographically distributed over 19 regions, including Australia, Singapore, India, Europe, and the US
- modified TCP parameters, per "feature: add warnings and script to configure tcp for greatly improved full block propagation" (Issue #10511, ZcashFoundation/zebra, GitHub)
The propagation sample uses blocks of at least 1MiB. The stale-height, stale-block, block-time, and branch-switch analysis uses all observed canonical heights in the stabilized window.
For this comparison, the analysis uses canonical heights 400 through 1468 after waiting for block times to stabilize, with target spacing T = 25s.
[Figure: experiment_block_propagation.png]
The mean time to reach 90% of measured nodes was 1.48s, with median 1.44s. Using the simple reach proxy:
tau_eff_proxy ~= 0.55 * reach_90
the mean effective old-tip mining time is:
tau_eff_proxy ~= 0.55 * 1.48s ~= 0.82s
With T = 25s, that implies:
phi_proxy ~= 0.82 / 25 ~= 3.26%
and the corresponding probability of one or more stale competing blocks is:
1 - e^(-phi_proxy) ~= 3.21%
This gives two theory-side quantities to compare against the trace:
- phi_proxy ~= 3.26%, the expected stale blocks per accepted block
- 1 - e^(-phi_proxy) ~= 3.21%, the expected fraction of heights with at least one competing stale block
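The two theory-side quantities can be recomputed from the rounded inputs quoted in the text. This sketch uses the published 1.48s mean and T = 25s; small last-digit differences from the quoted percentages can arise from rounding of the inputs.

```python
import math

T = 25.0              # target spacing in seconds
reach_90_mean = 1.48  # mean time to reach 90% of measured nodes, as quoted

# Simple reach proxy: linear distribution to 90%, no material tail.
tau_eff_proxy = 0.55 * reach_90_mean
phi_proxy = tau_eff_proxy / T
p_stale_height = 1.0 - math.exp(-phi_proxy)

print(f"tau_eff_proxy  ~= {tau_eff_proxy:.3f}s")
print(f"phi_proxy      ~= {phi_proxy:.2%}")
print(f"P[S >= 1]      ~= {p_stale_height:.2%}")
```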
The experiment saw 52 heights with at least one competing non-canonical block out of 1069 observed canonical heights:
52 / 1069 ~= 4.86%
There were 52 total non-canonical competing blocks, so the observed stale blocks per accepted block were:
52 / 1069 ~= 4.86%
So the reach-based theory proxy underestimates the observed stale-height event rate in this run, but both the proxy and the observed rate remain below 5%:
| metric | theory/proxy | observed |
|---|---|---|
| block-time mean (seconds) | 25.0 | 26.2 |
| block-time median (seconds) | 17.3 | 18.0 |
| stale blocks per accepted block | 3.26% | 4.86% |
| higher-cumulative-work branch-switch events | 0.25% | 0.37% |
In count terms, the proxy predicts about 34.3 stale-height events and about 34.9 stale blocks over 1069 heights. The trace observed 52 stale-height events and 52 stale blocks. The reach proxy is intentionally simple: it uses one percentile, assumes no material tail after 90% reach, and treats measured nodes as a proxy for block-producing work. The stronger conclusion is that the propagation-based estimate and observed stale behavior remain in the same sub-5% operating range.
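The count comparison can be sketched as below. The inputs are the rounded percentages and counts quoted in this section, so the predicted counts may differ from the quoted ones in the last digit.

```python
heights = 1069               # observed canonical heights (400 through 1468)
observed_stale_heights = 52  # heights with at least one competing block
observed_stale_blocks = 52   # total non-canonical competing blocks

phi_proxy = 0.0326           # rounded expected stale blocks per accepted block
p_stale_height = 0.0321      # rounded 1 - e^(-phi_proxy)

predicted_stale_blocks = heights * phi_proxy
predicted_stale_heights = heights * p_stale_height

observed_stale_block_rate = observed_stale_blocks / heights
observed_stale_height_rate = observed_stale_heights / heights

print(f"predicted stale heights ~= {predicted_stale_heights:.1f}")
print(f"predicted stale blocks  ~= {predicted_stale_blocks:.1f}")
print(f"observed stale-height rate = {observed_stale_height_rate:.2%}")
print(f"observed / predicted blocks = "
      f"{observed_stale_blocks / predicted_stale_blocks:.2f}")
```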
For the higher-cumulative-work branch-switch row, split the estimate into the probability of an initial competing sibling and the conditional branch-race outcome:
P[cumulative-work branch switch] ~= P[B1 exists] * P[B2 wins | B1 exists]
The reach-based stale model supplies the first term:
P[B1 exists] ~= 1 - e^(-phi_proxy) ~= 3.21%
The trace supplies the branch-race term. Among the 52 stale-height events, 4 produced a switch to a higher-cumulative-work, higher-height branch, so:
P[B2 wins | B1 exists] ~= 4 / 52 ~= 7.69%
If we use stale blocks rather than stale heights as the denominator, the branch race term is also 4 / 52 ~= 7.69%, because every stale height in this trace had one competing stale block.
That gives:
P[cumulative-work branch switch] ~= 3.21% * 7.69% ~= 0.25%
Over 1069 canonical heights, the model therefore expects about:
1069 * 0.25% ~= 2.63 higher-cumulative-work branch-switch events
The experiment observed 4, or:
4 / 1069 ~= 0.37%
Using the observed stale-height rate for the first term instead of the reach-based proxy gives 4.86% * 7.69% ~= 0.37%, exactly four events over the window. That is a trace-conditioned check, not an independent prediction, but it shows that the observed higher-cumulative-work switch events are consistent with the branch-race model.
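The decomposition above can be checked with a few lines of arithmetic. The sketch uses the rounded 3.21% proxy value and the trace counts quoted in this section; variable names are illustrative.

```python
heights = 1069
stale_heights = 52
higher_work_switches = 4

# First term from the reach-based proxy (rounded value from the text).
p_sibling_proxy = 0.0321
# Second term from the trace: how often the competing branch won the race.
p_branch_wins = higher_work_switches / stale_heights

# Proxy-based prediction.
p_switch_proxy = p_sibling_proxy * p_branch_wins
expected_switches = heights * p_switch_proxy

# Trace-conditioned check: the stale-height count cancels, leaving 4 / 1069.
p_switch_trace = (stale_heights / heights) * p_branch_wins
p_switch_observed = higher_work_switches / heights

print(f"P[B2 wins | B1 exists] ~= {p_branch_wins:.2%}")
print(f"proxy prediction       ~= {p_switch_proxy:.2%} "
      f"({expected_switches:.1f} events)")
print(f"observed               ~= {p_switch_observed:.2%}")
```

The trace-conditioned product equals the observed rate by construction, which is why it is a consistency check and not an independent prediction.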
The difference between 3.26% and 4.86% is expected directionally: the reach_90 shortcut estimates the old-tip work area from one percentile, assumes no material tail after 90% reach, and uses only the subset of blocks with reach-90 propagation data. It also treats measured nodes as a proxy for block-producing work. The trace includes the full branch outcome over the observed height window.
The block-time data is also consistent with a Poisson process targeting 25s. The mean inter-block time was 26.2s, while the median was 18.0s, close to the expected healthy median of 25 * ln(2) ~= 17.3s.
[Figure: experiment_block_times.png]
The direct stale-height view is the count of competing blocks at each height. Each stale height in this trace had one competing block, so the stale-block count (52) is equal to the number of heights with a stale block (52).
[Figure: experiment_competing_blocks_per_height.png]
The stale-height-rate plot below separates the aggregate observed stale-height rate from the reach-based approximation. It does not use the rolling-window line from the earlier combined script. The orange line is the per-block approximation of P[S >= 1] from each block’s reach_90; the red line is the observed experiment-wide stale-height rate.
[Figure: experiment_stale_height_rates.png]
The trace-level fork-switch data adds more detail. It contained 56 unique switch episodes: 52 equal-work, same-height switches, and 4 higher_work_higher_height episodes. These higher-cumulative-work episodes switched from heights 408 to 409, 936 to 937, 1266 to 1267, and 1335 to 1336. These are the events counted in the 0.37% observed higher-cumulative-work branch-switch rate above.
Mainnet Comparison
When comparing runs with similar block sizes and block times, mainnet appears to have lower stale and fork rates than the devnets. Comparable devnet runs show stale-block rates around 1%, while recent mainnet observations are closer to 0.1%.
| environment | observed stale/fork rate |
|---|---|
| comparable devnet runs | ~1% |
| mainnet | ~0.1% |
There are two likely reasons for the difference.
Network Topology: Mainnet’s high mining concentration (<20 pools) minimizes propagation latency compared to the intentionally fragmented devnet. This aligns with observed devnet trends where reduced miner counts lower the fork rate by decreasing total old-tip mining time.
Measurement Gap: Devnet has 100% observability, whereas mainnet measurement is subject to selection bias. Because Zebra nodes only gossip the winning tip, a stale block has an approximately 50% chance of remaining invisible to any single observation point. Observed mainnet fork rates are therefore a lower bound, even though they come in below the devnet results.
Together, these suggest that mainnet will likely have a lower fork rate than our “worst case” experiments above.
Conclusion
The main result is that the propagation model and the experiment agree at the scale that matters for safety. With the modified TCP configuration, the expected stale-block rate was 3.26%, the observed stale-height rate was 4.86%, and the observed higher-cumulative-work branch-switch rate was 0.37%. Mainnet shows a meaningfully lower stale-block rate than expected, although some of this could be due to a more centralized network and to the inability to measure every stale block without access to every miner.
Most importantly, even in the worst case, with blocks full of Orchard transactions and a highly geographically distributed network, we observe stale-block rates below 5%. This suggests that it is safe to move forward with a block-time reduction, pending network-wide TCP configuration changes.
Reproducing the Plots
The experiment data referenced in this report can be downloaded from this Google Drive folder.
Run the commands below from the repository root:
python3 archive/plot_tau_eff_intuition.py
python3 archive/plot_stale_rate_explainers.py
python3 plot_experiment_results.py
The scripts require Python with matplotlib, numpy, and pandas available.
The archived explainer scripts regenerate:
- tau_eff_intuition.png
- stale_rate_probability_model.png
- block_time_and_stale_expectations.png
- reach_proxy_tau_eff.png
The experiment plotting script regenerates:
- experiment_block_propagation.png
- experiment_block_times.png
- experiment_competing_blocks_per_height.png
- experiment_stale_height_rates.png
- experiment_stale_height_rate_proxy.csv
- experiment_results_summary.json
plot_experiment_results.py defaults to experiment data at /home/evan/src/zcash/experiments/valar-1/data. To use a different checkout or exported data directory (such as the downloaded data linked above), set POW_MODELING_DATA_DIR:
POW_MODELING_DATA_DIR=/path/to/valar-1/data python3 plot_experiment_results.py







