Project Summary
This project implements the missing Zebra Prometheus metrics for fork heights and fork lengths tracked in Zebra issue #5297, enabling production operators to monitor and alert on abnormal fork behavior using their existing Prometheus setup. It delivers upstream code and regression tests (plus brief metric definitions) and does not introduce any new monitoring platform or service.
Project Description
This project closes a specific observability gap in Zebra by implementing the missing Prometheus metrics for fork heights and fork lengths tracked in Zebra issue #5297. These metrics are needed so production operators can monitor and alert on fork and reorg behavior using Zebra’s existing /metrics endpoint and their current Prometheus setup.
When fork behavior becomes abnormal, infrastructure teams often have to spend extra time diagnosing what is happening because the most relevant fork signals are not directly available as Prometheus time series in the form requested in #5297. Zebra can compute fork-related values, but the missing work is exporting them as stable, operator-friendly metrics that can be used consistently for dashboards and alert rules.
The goal of this project is to implement the exact missing metric families described in issue #5297:
- A Prometheus histogram for fork heights across recent forks
- A Prometheus histogram for fork lengths, computed as tip height − fork height, across recent forks
- Any “best chain only” gauges referenced in the issue, if they are not already exported
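Here is a minimal sketch of the fork-length definition above. It assumes heights are plain `u32` values; Zebra has its own block height type, and the `fork_length` name is hypothetical. Checked subtraction surfaces inconsistent inputs, such as a fork height above the tip, instead of silently wrapping:

```rust
/// Hypothetical helper: fork length = tip height − fork height.
///
/// Plain `u32` heights are an assumption of this sketch. Returns `None`
/// when the fork height is above the tip, which indicates inconsistent
/// inputs rather than a real fork.
fn fork_length(tip_height: u32, fork_height: u32) -> Option<u32> {
    tip_height.checked_sub(fork_height)
}
```

For example, a fork height of 2,000,000 with a tip height of 2,000,003 gives a fork length of 3.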
Implementation will be upstream-first in ZcashFoundation/zebra and will follow safe metric design practices by avoiding high-cardinality labels such as block hashes, peer IDs, or other unbounded strings. We will add regression tests that exercise controlled fork scenarios (using Zebra’s existing test harness patterns) and verify that the metrics update correctly, so the behavior remains reliable over time. We will also include a short metric definition note and a small set of example PromQL queries to support operator adoption.
This project is intentionally narrow. It does not change consensus rules, chain selection, mining logic, or network behavior. It does not create a hosted monitoring service or a separate observability platform. Success is measured by upstream PR(s) merged (or approved and queued for merge), CI tests covering the new metrics, and the new metric families being visible in Zebra’s /metrics output.
Proposed Problem
When fork and reorg behavior becomes abnormal, infrastructure teams lose time on diagnosis and risk making the wrong operational calls because they don’t have a direct, queryable signal that explains what the chain is doing. In practice, that means slower incident response for exchanges, custodians, RPC providers, and explorers, and more uncertainty around confirmation safety during stressful periods. Zebra can compute fork-related values, but a key observability gap remains: the missing Prometheus tracking for fork heights and fork lengths (computed as tip height − fork height) across recent forks.
This gap is already documented upstream. Zebra issue #5297 states these fork observability metrics were requested by ZIP editors and are still missing from Prometheus export, and it is labeled E-help-wanted, indicating the work is needed and suitable for an external contribution.
Public evidence: ZcashFoundation/zebra#5297
More broadly, the Zcash community has been actively discussing the need for production-grade monitoring and better bottom-up metrics inside core infrastructure, because operators cannot rely on “manual investigation” during incidents.
Public context: What if Zcash stopped flying blind and had production-grade monitoring?
Proposed Solution
This project solves the problem by implementing the missing Zebra Prometheus metrics for fork heights and fork lengths described in Zebra issue #5297, so operators can monitor and alert on fork behavior directly from Zebra’s existing /metrics endpoint using their current Prometheus setup.
- Implement the missing metric families upstream in ZcashFoundation/zebra
  Add Prometheus histogram tracking for fork heights across recent forks and fork lengths across recent forks, computed as tip height − fork height. Add any “best chain only” gauges referenced in the issue if they are not already exported.
- Ensure the metrics are operator-safe and stable
  Use low-cardinality metric design by avoiding labels that can grow without bound, such as block hashes, peer IDs, or other dynamic identifiers. Provide short metric definitions so operators understand exactly what each metric represents.
- Add regression tests to keep the signals trustworthy
  Add tests that exercise controlled fork scenarios using Zebra’s existing test patterns and verify that the metrics are exported and update correctly. This reduces the chance of silent regressions in future refactors.
- Support immediate adoption without building a monitoring platform
  Include a small set of example PromQL queries that operators can adapt for alerts and dashboards, while keeping the scope strictly to upstream metrics, tests, and brief documentation.
Solution Format
The solution will be delivered as upstream open-source changes in Zebra, with tests and brief operator-facing documentation.
Deliverables
- Zebra PR(s): Prometheus metrics for fork heights and fork lengths (computed as tip height − fork height) for recent forks, plus any referenced “best chain only” gauges that are currently missing.
- Tests: Regression tests that exercise controlled fork scenarios and verify the new metrics are exported and update correctly.
- Brief documentation: Short metric definitions and a few example PromQL queries to support operator adoption.
Dependencies
Technical dependencies
- Zebra’s existing Prometheus metrics endpoint and internal fork-tracking data structures used to compute fork height and fork length values.
Resource dependencies
- Standard CI resources to run unit/integration tests for the new metrics (no special infrastructure required).
Collaboration dependencies
- Upstream review and merge coordination with Zebra maintainers to confirm metric naming/labels and test approach and to land the PRs.
Technical Approach
We will implement this as a small, upstream-first change in ZcashFoundation/zebra, scoped to the missing metrics in issue #5297 and the tests needed to keep them correct.
1) Validate current state against #5297
- Identify the exact fork observability values Zebra already computes and confirm which Prometheus exports are missing relative to the metric list in #5297.
- Propose metric names and a bucket strategy consistent with Zebra’s existing Prometheus conventions before finalizing the implementation.
2) Implement the missing Prometheus metric families
- Add Prometheus histograms for:
  - fork heights across recent forks
  - fork lengths across recent forks, computed as tip height − fork height
- Add any “best chain only” gauges referenced in #5297 only if they are not already exported.
- Ensure metric updates occur on fork-related state changes rather than being recomputed on every scrape.
- Use low-cardinality design: no labels that can grow without bound (no hashes, peer IDs, or dynamic strings).
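This sketch shows the bounded, event-driven shape described in this step, under stated assumptions. A hand-rolled fixed-bucket histogram stands in for whatever metrics facade Zebra already uses, and `Histogram`, `ForkMetrics`, and `on_fork` are hypothetical names. Observations are recorded only when a fork is reported. Apart from the `le` bucket label that every Prometheus histogram carries, no labels are attached:

```rust
/// Hypothetical fixed-bucket histogram. Bounds are fixed at construction,
/// so the number of exported series never grows with the number of forks.
struct Histogram {
    bounds: Vec<f64>, // ascending upper bounds (`le`), excluding +Inf
    counts: Vec<u64>, // per-bucket counts; the last slot is the +Inf bucket
    sum: f64,
    count: u64,
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let slots = bounds.len() + 1;
        Histogram { bounds, counts: vec![0; slots], sum: 0.0, count: 0 }
    }

    /// Adds one observation to the first bucket whose bound is >= `value`,
    /// matching Prometheus' "less than or equal" bucket semantics.
    fn observe(&mut self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|&bound| value <= bound)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += value;
        self.count += 1;
    }

    /// Cumulative count up to and including bucket `idx`, as Prometheus exports it.
    fn cumulative(&self, idx: usize) -> u64 {
        self.counts[..=idx].iter().sum()
    }
}

/// Hypothetical recorder: updated from the fork-detection path,
/// so a scrape only reads state and never recomputes it.
struct ForkMetrics {
    fork_height: Histogram,
    fork_length: Histogram,
}

impl ForkMetrics {
    /// Bucket bounds are parameters because they are to be agreed upstream.
    fn new(height_bounds: Vec<f64>, length_bounds: Vec<f64>) -> Self {
        ForkMetrics {
            fork_height: Histogram::new(height_bounds),
            fork_length: Histogram::new(length_bounds),
        }
    }

    /// Called once per observed fork with the current tip height and fork height.
    fn on_fork(&mut self, tip_height: u32, fork_height: u32) {
        // Skip inconsistent inputs instead of recording a wrapped value.
        if let Some(length) = tip_height.checked_sub(fork_height) {
            self.fork_height.observe(f64::from(fork_height));
            self.fork_length.observe(f64::from(length));
        }
    }
}
```

Because the bounds are fixed when the histogram is created, a family with N finite bounds and no other labels exports N + 1 bucket series (including `+Inf`) plus `_sum` and `_count`. That is N + 3 series, however many forks occur.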
3) Add regression tests that exercise fork scenarios
- Add tests using Zebra’s existing test harness patterns to create controlled fork conditions and verify:
  - the new metric families are present in /metrics
  - histogram observations change as forks appear and resolve
  - fork length observations match tip height − fork height for the scenario
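The regression tests would drive real fork conditions through Zebra's test harness. The sketch below illustrates only the deterministic assertion pattern. It replaces that harness with a scripted list of (tip height, fork height) pairs and an in-memory sink that records each observation. `RecordingSink`, `record_fork`, `run_scenario`, and the `zebra_state_fork_*` metric names are hypothetical:

```rust
/// Hypothetical test double: records every histogram observation in memory,
/// so a test can assert on exact values without a live scrape, timers, or peers.
#[derive(Default)]
struct RecordingSink {
    observations: Vec<(&'static str, f64)>,
}

impl RecordingSink {
    fn observe(&mut self, name: &'static str, value: f64) {
        self.observations.push((name, value));
    }

    /// All values recorded under `name`, in recording order.
    fn values(&self, name: &str) -> Vec<f64> {
        self.observations
            .iter()
            .filter(|(n, _)| *n == name)
            .map(|(_, v)| *v)
            .collect()
    }
}

// Hypothetical metric names; the real names are to be agreed on #5297.
const FORK_HEIGHT: &str = "zebra_state_fork_height";
const FORK_LENGTH: &str = "zebra_state_fork_length";

/// Hypothetical instrumentation hook under test.
fn record_fork(sink: &mut RecordingSink, tip_height: u32, fork_height: u32) {
    if let Some(length) = tip_height.checked_sub(fork_height) {
        sink.observe(FORK_HEIGHT, f64::from(fork_height));
        sink.observe(FORK_LENGTH, f64::from(length));
    }
}

/// Replays a fixed, scripted scenario of (tip height, fork height) pairs.
/// Nothing here depends on timing or network input.
fn run_scenario(forks: &[(u32, u32)]) -> RecordingSink {
    let mut sink = RecordingSink::default();
    for &(tip, fork) in forks {
        record_fork(&mut sink, tip, fork);
    }
    sink
}
```

A test built this way asserts exact recorded values in order. For example, a scripted fork with tip height 105 and fork height 100 must record a fork length of 5.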
4) Ship minimal operator-facing definitions
- Add a short in-repo note that defines each metric, the unit for each value, and a few example PromQL queries operators can adapt for dashboards and alerts.
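The definition note and example PromQL should only name metrics that Zebra actually exports. One way to keep them in sync is to compare the documented names against a local /metrics scrape. This sketch assumes the scrape body uses the Prometheus text exposition format. It collects family names from the `# TYPE` lines and reports any documented name that is missing. The function names and metric names are hypothetical:

```rust
use std::collections::BTreeSet;

/// Collects metric family names from `# TYPE <name> <type>` lines
/// in Prometheus text exposition output.
fn exported_families(exposition: &str) -> BTreeSet<String> {
    exposition
        .lines()
        .filter_map(|line| line.strip_prefix("# TYPE "))
        .filter_map(|rest| rest.split_whitespace().next())
        .map(str::to_owned)
        .collect()
}

/// Returns documented names that do not appear in the scrape output;
/// a non-empty result means the docs and the exported names disagree.
fn missing_from_scrape(exposition: &str, documented: &[&str]) -> Vec<String> {
    let exported = exported_families(exposition);
    documented
        .iter()
        .filter(|name| !exported.contains(**name))
        .map(|name| name.to_string())
        .collect()
}
```

The check reads `# TYPE` lines rather than sample lines. This avoids stripping the `_bucket`, `_sum`, and `_count` suffixes that histogram samples carry.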
Upstream Merge Opportunities
Which upstream repositories we plan to modify
- Primary: ZcashFoundation/zebra
All work is upstream-first in Zebra. No long-lived fork is planned.
What changes we plan to make
- Implement the missing Prometheus metric families listed in issue #5297:
  - histograms for fork heights across recent forks
  - histograms for fork lengths across recent forks, computed as tip height − fork height
- Add any referenced “best chain only” gauges only if they are currently missing.
- Add regression tests that exercise controlled fork scenarios and verify the new metrics are exported and update correctly.
- Add a short in-repo metric definition note with a few example PromQL queries.
How these changes benefit the wider Zcash ecosystem if merged upstream
- Zebra operators can monitor and alert on fork behavior using standard Prometheus tooling.
- The metrics ship with Zebra and are reusable across different operator monitoring stacks.
- Tests reduce the chance of regressions in these signals over time.
Coordination needed with upstream maintainers
- Confirm metric naming and label policy (low-cardinality) and the preferred test approach.
- Agree on PR structure (single PR vs split PRs) to match review preferences.
Timeline considerations for upstream merges
- Week 1: Post a short design note on #5297 and open a draft PR for early feedback.
- Weeks 2–5: Implement metrics, tests, and docs; iterate based on review until merge-ready.
Hardware/Software Costs (USD)
$0
Hardware/Software Justification
n/a
Service Costs (USD)
n/a
Service Costs Justification
n/a
Compensation Costs (USD)
$20,000
Compensation Costs Justification
Compensation covers the engineering and QA work required to deliver and land the upstream Zebra changes:
- Implement Prometheus metrics for fork heights and fork lengths (computed as tip height − fork height) as specified in issue #5297.
- Add regression tests that create controlled fork scenarios and verify the metrics are exported and update correctly.
- Prepare merge-ready PRs, iterate on maintainer review feedback, and finalize a short in-repo metric definition note with example PromQL queries.
Total Budget (USD)
$20,000
Previous Funding
No
Previous Funding Details
No response
Other Funding Sources
No
Other Funding Sources Details
No response
Implementation Risks
- Upstream review and merge timing
  The deliverables must be merged into ZcashFoundation/zebra. Maintainer review timelines may affect when the changes land. Mitigation: keep PRs small, get early sign-off on metric names/labels, and iterate quickly on review feedback.
- Metric correctness under fork scenarios
  Fork height and fork length must be recorded accurately as forks appear and resolve. Mitigation: implement metrics directly alongside existing fork tracking logic and add regression tests that exercise representative fork conditions.
- Safe metric design (cardinality and performance)
  Metrics must avoid high-cardinality labels and be updated efficiently to prevent monitoring overhead or performance regressions. Mitigation: use low-cardinality labels only (or none), bounded histograms, and update metrics only on fork-related events.
- Test harness complexity
  Creating controlled fork scenarios for tests can be non-trivial, and tests must remain reliable in CI. Mitigation: use deterministic, CI-friendly fork scenarios (no timing- or network-dependent assertions) and validate both metric presence and expected updates in /metrics.
Potential Side Effects
- Minor runtime overhead during fork events
  Updating histogram observations may add a small amount of CPU work and memory bookkeeping when forks or reorgs occur. The instrumentation is observational only and is designed to be lightweight.
- Slightly larger /metrics output and more Prometheus time series
  New metric families slightly increase the /metrics payload and the Prometheus storage and ingestion load. This is minimized by a low-cardinality design (no high-cardinality labels) and bounded histogram buckets.
- Potential alert noise if thresholds are configured too aggressively
  Operators may see noisy paging if alert thresholds are set too tight. This is mitigated by clear metric definitions and conservative example PromQL queries.
Success Metrics
- Upstream acceptance: PR(s) implementing the fork-height and fork-length Prometheus metric families are merged into ZcashFoundation/zebra (or maintainers explicitly mark them approved / merge-ready, if the final merge timing is outside the project’s control).
- Metrics exposed with correct semantics: Zebra’s /metrics endpoint exports the new metric families with the finalized names/units, including:
  - fork height histogram (recent forks)
  - fork length histogram, defined as tip_height − fork_height (recent forks)
  The design uses a low-cardinality approach (no per-fork-hash or per-peer labels).
- Regression coverage in CI: Deterministic tests are added and pass in Zebra CI, validating that:
  - the metrics are present in /metrics, and
  - the metrics update as expected under controlled fork scenarios.
- Operator usability: Minimal in-repo documentation is merged that defines the metrics (meaning, units, and interpretation) and includes a small set of example PromQL queries that are validated against a local Zebra run.
Startup Funding (USD)
$0
Startup Funding Justification
n/a
Milestone: 1
Amount (USD): 4,000
Expected Completion Date: 2026-02-27
User Stories:
- “As a Zebra maintainer, I want the proposed fork-height and fork-length metric names/units/labels reviewed early, so that implementation PRs follow existing conventions and are straightforward to review.”
- “As a Zcash node operator, I want clear definitions for what these fork metrics represent (including what Zebra treats as a ‘recent fork’), so that I can interpret the data correctly.”
Deliverables:
- Post a concise design note on Zebra issue #5297 proposing: metric names, units, label policy (low-cardinality), and a histogram bucket strategy (or an explicit note to follow existing Zebra histogram patterns).
- Ask maintainers whether they prefer one PR or two PRs (metrics first, tests/docs second) and record the guidance.
- Open a draft PR (or PR-ready branch) showing the planned metric family definitions/wiring location(s) and a deterministic test plan outline (no high-cardinality labels; no dashboards/services).
- Audit existing Zebra metrics to confirm whether any relevant best-chain-only gauges are already exported; if any are clearly missing, list exactly which ones would be added (names + one-line definitions).
Acceptance Criteria:
- Maintainer feedback is recorded on #5297 or the draft PR (approval or requested changes).
- Draft PR/branch builds cleanly and CI passes for any included changes.
- The proposed metrics explicitly avoid high-cardinality labels (e.g., no per-fork hash / per-peer labels).
Milestone: 2
Amount (USD): 9,000
Expected Completion Date: 2026-03-20
User Stories:
- “As a Zcash node operator, I want fork height and fork length distributions exported via Prometheus, so that I can alert on and investigate abnormal reorg behavior using my existing monitoring stack.”
Deliverables:
- Implement a Prometheus histogram for fork heights (recent forks), per Zebra #5297.
- Implement a Prometheus histogram for fork lengths (recent forks), defined as tip_height − fork_height.
- Add any best-chain-only gauge(s) only if confirmed missing in Milestone 1 (additive + low-cardinality).
- Submit merge-ready PR(s) with CI passing.
Acceptance Criteria:
- New metric families are visible in Zebra’s /metrics output with the agreed names/units.
- Metric design remains low-cardinality (no per-fork/per-peer labels).
- CI passes, and maintainer review does not identify any blocking semantic or design issues (or blocking issues are clearly enumerated for Milestone 4).
Milestone: 3
Amount (USD): 5,500
Expected Completion Date: 2026-04-03
User Stories:
- “As a Zebra maintainer, I want deterministic regression tests for these metrics, so that future changes don’t silently break fork observability.”
- “As a Zcash node operator, I want minimal metric documentation and example PromQL, so that I can use the new metrics immediately after upgrading.”
Deliverables:
- Add deterministic fork-scenario regression tests that validate that:
  - the new metrics are present in /metrics, and
  - the metrics update as expected across controlled fork scenarios.
- Add minimal in-repo documentation covering: metric meaning, units, interpretation, label policy, and a small set of example PromQL queries.
- Validate example PromQL queries against a local Zebra run (sanity check that metric names match docs).
Acceptance Criteria:
- Tests run in Zebra CI and pass reliably (no timing- or network-dependent assertions).
- Documentation is included in-repo and matches the exported metric names/units.
- Maintainer review does not identify missing test coverage for the intended fork scenarios.
Milestone: 4
Amount (USD): 1,500
Expected Completion Date: 2026-04-10
User Stories:
- “As a Zebra maintainer, I want review feedback fully addressed, so that the change can be merged without follow-up work.”
- “As a Zcash node operator, I want the metrics shipped upstream, so that I can rely on them in production after upgrading Zebra.”
Deliverables:
- Address all maintainer review feedback (naming, semantics, performance, test adjustments, and doc clarifications).
- Bring the PR(s) to merged status, or to explicitly approved / merge-ready status if merge timing is outside the project’s control.
- Update Zebra #5297 with final status and links to the PR(s) (and close it if maintainers prefer).
Acceptance Criteria:
- PR(s) are merged into ZcashFoundation/zebra, or maintainers explicitly mark them approved / merge-ready in PR review.
- Issue #5297 is updated with the outcome and references to the landed work.