Backing up Validator Server (Physical Data Center)

Some notes on this approach:

  • Your NFS server will be a single point of failure, and running a highly available NFS cluster (or anything that touches storage, really) is a science in itself. You now have two separate interdependent HA clusters to care about instead of just one (the validator and NFS).

  • A highly available enterprise SAN is very expensive and there’s still a chance of failure.

  • NFS is very latency-sensitive, so you can’t distribute it across multiple data centers. The same goes for a SAN: there are mechanisms for cross-data center mirroring, but they’re asynchronous, so the remote copy can lag behind the validator’s latest signing state (and is therefore useless here).

  • You will need a bullet-proof failover mechanism like Pacemaker with an odd number of nodes to ensure that there is never more than one validator process running. Otherwise, you will end up double signing. Pacemaker and friends aren’t designed for cross-data center operation, either, and are finicky to operate.

  • Failover will be rather slow and you will miss blocks.

  • By sharing the disk storage, you effectively have a single failure domain: there are a number of failure scenarios that you can’t recover from, like corrupted files, a filled-up disk or filesystem corruption.

  • Most importantly: This setup does not reliably prevent a split-brain/double-signing scenario. There are plenty of edge cases. If your active validator crashes at just the right time, you will double sign. Write barriers are hard enough with local storage, and even harder with any network file system (we believe we just found a Tendermint bug while verifying this).

With Tendermint/Cosmos, you’re always going to want to sacrifice availability for consistency (a “CP” system in terms of the CAP theorem - any reliable distributed system needs to be partition-tolerant). The penalty for double signing is much harsher than missing blocks.
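As a rough illustration of what choosing consistency over availability means for failover, the sketch below shows the majority rule that quorum-based cluster managers rely on. The function names and the `holds_lock` flag are hypothetical and not part of Pacemaker or Tendermint; real cluster managers add fencing and many other safeguards on top of this.

```python
def has_quorum(reachable_peers, cluster_size):
    """True if this node plus the peers it can reach form a strict majority."""
    return (reachable_peers + 1) * 2 > cluster_size


def should_run_validator(reachable_peers, cluster_size, holds_lock):
    """CP rule: run only with quorum AND the cluster-wide lock, else stay down."""
    return holds_lock and has_quorum(reachable_peers, cluster_size)


if __name__ == "__main__":
    # Three-node cluster, network partition splits it 1 / 2.
    print(has_quorum(0, 3))  # isolated node: no quorum, must not sign
    print(has_quorum(1, 3))  # majority side: may elect an active validator
    # Four-node cluster split 2 / 2: neither side has a strict majority.
    print(has_quorum(1, 4))
```

An even-sized cluster can split into two halves with no majority on either side, which is why an odd number of nodes is recommended. A node that cannot prove it is in the majority deliberately stays down and misses blocks, since that penalty is far milder than the one for double signing.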

In practical terms, this means that unless you have a solid distributed systems background and operational experience, you might be better off running a single node on highly redundant enterprise hardware rather than building an HA setup on your own.
