Hi Cosmos community! ![]()
I’ve built a tool to help operators detect flaky RPC endpoints before they cause issues.
The Problem
We’ve all been there: RPC endpoints that work 95% of the time but fail when it matters. Traditional monitoring only checks `/health` or `/status`, missing query-specific issues.
The Solution: cosmos-flake-detector
A Rust CLI tool that:
- Tests specific query paths (not just `/health`)
- Measures latency with microsecond precision (HDR histogram)
- Calculates flakiness scores (0-100)
- Exports JSON for CI/CD integration
- Runs concurrent load tests
Example Usage
cosmos-flake-detector \
--endpoints "https://rpc1.com,https://rpc2.com" \
--duration 120 \
--output results.json
Features
- Query-specific testing (abci_info, status, genesis, etc.)
- p50/p95/p99 latency metrics
- Flakiness scoring algorithm
- Concurrent testing
- JSON export
- Open source (MIT)
GitHub
saadaltafofficial/cosmos-flake-detector
Feedback and contributions welcome!
Use Cases
-
Validator operations (pre state-sync testing)
-
Chain indexers (CosmWasm endpoint testing)
-
CI/CD health checks
-
Continuous monitoring
Would love to hear if this solves a pain point for you!