[Tool] cosmos-flake-detector: Find Unreliable RPC Endpoints

Hi Cosmos community! 👋

I’ve built a tool to help operators detect flaky RPC endpoints before they cause issues.

The Problem

We’ve all been there: RPC endpoints that work 95% of the time but fail when it matters. Traditional monitoring usually checks only `/health` or `/status`, so it misses failures that affect only specific queries.

The Solution: cosmos-flake-detector

A Rust CLI tool that:

  • Tests specific query paths (not just `/health`)
  • Measures latency with microsecond precision (HDR histogram)
  • Calculates flakiness scores (0-100)
  • Exports JSON for CI/CD integration
  • Runs concurrent load tests
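
To give a feel for what a 0-100 flakiness score can look like, here is a minimal sketch. It is not the tool's actual algorithm, which this post does not spell out. The weighting (a failed request counts fully, a slow one counts half), the `slow_threshold_us` parameter, and the `flakiness_score` name are all assumptions made for illustration.

```rust
/// One probe result: latency in microseconds, or `None` if the request failed.
type Sample = Option<u64>;

/// Hypothetical flakiness score in 0..=100 (higher means flakier).
/// Assumed weighting: a failure costs 1.0, a slow success costs 0.5.
fn flakiness_score(samples: &[Sample], slow_threshold_us: u64) -> u32 {
    if samples.is_empty() {
        return 100; // no data at all: treat the endpoint as unreliable
    }
    let failures = samples.iter().filter(|s| s.is_none()).count();
    let slow = samples
        .iter()
        .flatten() // keep only successful probes
        .filter(|&&latency| latency > slow_threshold_us)
        .count();
    let penalty = failures as f64 + 0.5 * slow as f64;
    (100.0 * penalty / samples.len() as f64).round() as u32
}

fn main() {
    // 8 fast responses, 1 slow response, 1 failure.
    let mut samples: Vec<Sample> = vec![Some(20_000); 8];
    samples.push(Some(900_000));
    samples.push(None);
    println!("flakiness = {}", flakiness_score(&samples, 500_000));
}
```

A score like this penalises an endpoint that answers but answers slowly, which a plain up/down health check would not catch.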

Example Usage

cosmos-flake-detector \
  --endpoints "https://rpc1.com,https://rpc2.com" \
  --duration 120 \
  --output results.json
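
For readers wondering how a comma-separated `--endpoints` value might turn into query-specific probes, here is a rough sketch. It assumes each endpoint is paired with a fixed set of CometBFT RPC routes (`status`, `abci_info`, `health`). The `probe_urls` function and `QUERY_PATHS` constant are hypothetical names, and the tool itself may build its requests differently.

```rust
/// Routes to probe on every endpoint (assumed set, for illustration only).
const QUERY_PATHS: &[&str] = &["status", "abci_info", "health"];

/// Split a comma-separated endpoint list and pair each endpoint with each route.
fn probe_urls(endpoints: &str) -> Vec<String> {
    endpoints
        .split(',')
        .map(str::trim)
        .filter(|e| !e.is_empty()) // tolerate trailing commas
        .flat_map(|e| {
            let base = e.trim_end_matches('/');
            QUERY_PATHS.iter().map(move |path| format!("{base}/{path}"))
        })
        .collect()
}

fn main() {
    for url in probe_urls("https://rpc1.com,https://rpc2.com") {
        println!("{url}");
    }
}
```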

Features

  • Query-specific testing (abci_info, status, genesis, etc.)
  • p50/p95/p99 latency metrics
  • Flakiness scoring algorithm
  • Concurrent testing
  • JSON export
  • Open source (MIT)
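
As a quick illustration of the p50/p95/p99 metrics, the sketch below computes nearest-rank percentiles from a sorted list of latencies. This is a simplification: the tool is described above as using an HDR histogram, whereas this version just sorts raw samples. The `percentile` helper is a hypothetical name.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
/// `pct` is a whole percentage in 1..=100; returns `None` for empty input.
fn percentile(sorted_us: &[u64], pct: usize) -> Option<u64> {
    if sorted_us.is_empty() {
        return None;
    }
    // rank = ceil(pct * n / 100), done in integer arithmetic
    let rank = (pct * sorted_us.len() + 99) / 100;
    Some(sorted_us[rank.clamp(1, sorted_us.len()) - 1])
}

fn main() {
    // 100 synthetic latencies: 1 ms, 2 ms, ..., 100 ms (in microseconds).
    let mut latencies_us: Vec<u64> = (1..=100).map(|i| i * 1_000).collect();
    latencies_us.sort_unstable();
    for pct in [50, 95, 99] {
        println!("p{pct} = {:?} us", percentile(&latencies_us, pct));
    }
}
```

Sorting every sample is fine for short runs. A histogram becomes the better choice for long-running or highly concurrent tests because its memory use stays bounded.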

GitHub

saadaltafofficial/cosmos-flake-detector

Feedback and contributions welcome!

Use Cases

  • Validator operations (pre state-sync testing)
  • Chain indexers (CosmWasm endpoint testing)
  • CI/CD health checks
  • Continuous monitoring

Would love to hear if this solves a pain point for you!

This would probably work best as a stand-alone site with an API.

Yes, no doubt. Thanks for the advice.