Solution for running active-active validator nodes

Many validators want to run two active validator nodes in production. We have been running this setup without any problems for over a month, so I want to share it with you.

1. Servers: two validator nodes + one tmkms server.

We didn't use sentry nodes in this test, but you can add them for production.

2. Steps:

2.1 Install the tmkms server: see the tmkms repository for more information.

Make sure your version of tmkms is >= v0.6.3, otherwise you risk double signing.
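If you haven't installed it yet, a minimal sketch of a typical install from crates.io (any version >= v0.6.3 is fine; the yubihsm feature is needed for the setup below):

cargo install tmkms --features=yubihsm
tmkms version   # should report v0.6.3 or newer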

2.2 Set up tmkms.toml:

# the chain tmkms signs for, with its bech32 key prefixes
[[chain]]
id = "kava-testnet-2000"
key_format = { type = "bech32", account_key_prefix = "kavapub", consensus_key_prefix = "kavavalconspub" }

# one [[validator]] section per node; tmkms dials out to each of them
[[validator]]
addr = "tcp://validator-1-ip:26658"
chain_id = "kava-testnet-2000"
secret_key = "/data/test.key"

[[validator]]
addr = "tcp://validator-2-ip:26658"
chain_id = "kava-testnet-2000"
secret_key = "/data/test.key"

# the YubiHSM holding the consensus key (signing key ID 11, auth key 4)
[[providers.yubihsm]]
adapter = { type = "usb" }
auth = { key = 4, password = "kms-validator-password-1y58g2...." }
keys = [{chain_ids = ["kava-testnet-2000"], key = 11}]
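If the consensus key hasn't been provisioned into the YubiHSM yet, you can generate and list it with tmkms itself; a sketch (run where your tmkms.toml lives, or pass -c; key ID 11 matches the config above):

tmkms yubihsm keys generate 11
tmkms yubihsm keys list   # verify the key is present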

2.3 Start the tmkms server.

The tmkms server will then try to connect to both validator nodes.
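For example (the config path here is illustrative):

tmkms start -c /path/to/tmkms.toml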

2.4 Edit config.toml on each validator:

# TCP or UNIX socket address for Tendermint to listen on for
# connections from an external PrivValidator process
priv_validator_laddr = "tcp://0.0.0.0:26658"

With this set, the node uses the external signer (tmkms) instead of its local priv_validator_key.json.

2.5 Start the validator server.
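The exact command depends on the chain's daemon; for this Kava testnet it would be something like (binary name assumed to be kvd):

kvd start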

If it succeeds, you will see tmkms logs like this:

04:46:58 [info] [kava-testnet-2000@tcp://47.101.10.160:26658] signed PreVote: 2FC0C142C5 at h/r/s 34499/0/6 (102 ms)
04:46:59 [info] [kava-testnet-2000@tcp://kava-test.ping.pub:26658] signed PreVote: 2FC0C142C5 at h/r/s 34499/0/6 (123 ms)
04:46:59 [info] [kava-testnet-2000@tcp://kava-test.ping.pub:26658] signed PreCommit: 2FC0C142C5 at h/r/s 34499/0/6 (102 ms)
04:46:59 [info] [kava-testnet-2000@tcp://47.101.10.160:26658] signed PreCommit: 2FC0C142C5 at h/r/s 34499/0/6 (199 ms)
04:47:00 [info] [kava-testnet-2000@tcp://kava-test.ping.pub:26658] signed PreVote: F4F042F8EB at h/r/s 34499/1/6 (123 ms)
04:47:01 [info] [kava-testnet-2000@tcp://47.101.10.160:26658] signed PreVote: F4F042F8EB at h/r/s 34499/1/6 (123 ms)
04:47:01 [info] [kava-testnet-2000@tcp://kava-test.ping.pub:26658] signed PreCommit: F4F042F8EB at h/r/s 34499/1/6 (156 ms)
04:47:01 [info] [kava-testnet-2000@tcp://47.101.10.160:26658] signed PreCommit: F4F042F8EB at h/r/s 34499/1/6 (212 ms)
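On each validator you can also confirm that the node is actually listening on the priv_validator_laddr port; a quick check, assuming a Linux host:

ss -tlnp | grep 26658   # should show a listener on 26658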

That's all.

Thanks @tarcieri for providing such good software.


@ping, Thanks for sharing!

Do the two validators need to share the same secret key?

A quick note on this: while this functionality is available in the latest releases of KMS, we’re still worried about validators running this configuration in a steady state. See this blog post on the topic:

Notable excerpt:

For this reason we recommend validators don't run this sort of configuration in perpetuity, but use the functionality to failover between validators. Based on this incident, and others we'll describe below, we in fact recommend you only run in this configuration on testnets for now because we think this operational mode needs a lot more testing to be safe.


I don't think the secret key has to be the same for both validators.

B-Harvest is researching the use of multiple consensus keys for one validator operator (this requires upgrading tendermint/cosmos-sdk).

The concept is that only one of the consensus keys can propose blocks, but any of the listed keys can vote in consensus.

Possible advantages of this solution:

  1. achieve an active/active setup without KMS
  2. achieve active/active with genuinely decreased latency (the validator server nearest to the proposer votes), resulting in a shorter global block time, possibly under 2 seconds, without losing any of the network's geographical resilience.

You're talking about some sort of load balancing on a blockchain, right?
We would be interested in the research, as we are using the cosmos-sdk to register a large number of transactions in a short period of time.
How do you deal with staking? Does every consensus key get the same voting power?

Thanks for asking.
In our draft design, one validator operator possesses multiple consensus keys.

  1. It is a very restricted kind of "load balancing" + "co-location" of validator operation, because a larger number of consensus keys implies more block header cost (header size and computational cost). In practice it will therefore only allow a few consensus keys per validator operator (realistically 3: America/Europe/Asia).

  2. I expect network throughput to increase significantly, allowing the network to handle more transactions in a given time, because most validator servers participating in a block round will be located on the same continent (closely located servers usually have much better connectivity).

  3. From a staking perspective, there is no difference: delegators delegate tokens to the "validator operator", not to each consensus key.

  4. Not all consensus keys of a validator participate in consensus. Only the one whose vote arrives earliest (from the proposer's point of view) is counted; the rest are ignored, or never arrive because of the short timeout.

For example:

  1. validator A has its proposing validator in SF (San Francisco) and voting validators in FF (Frankfurt) and KR (Seoul)
  2. validator B has its proposing validator in FF and voting validators in SF and KR
  3. validator C has its proposing validator in KR and voting validators in SF and FF
  • when validator A proposes, the validators in the SF region reply earliest, so they participate
  • validators in the other regions arrive later, so the proposer ignores them
  • when validator B proposes, the same thing happens in the FF region

So, if most validators adopt such a co-location setup, we can comfortably reduce the blockchain's timeout settings and get a network with much better performance and latency.
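To make the proposer/voter key distinction concrete, here is a toy Go sketch. This is not B-Harvest's actual design (which is still a draft); the types, names, and keys are all invented for illustration:

package main

import "fmt"

// Hypothetical: a validator operator registers several consensus keys,
// e.g. one per region (SF, FF, KR). Only one of them may propose blocks.
type Operator struct {
	ConsensusKeys []string // any of these keys may sign votes
	ProposerKey   string   // only this key may propose
}

// CanVote: a vote is acceptable if signed by any listed key.
// (In the design above, only the earliest-arriving vote per round counts;
// later duplicates from the same operator would be ignored.)
func (o Operator) CanVote(key string) bool {
	for _, k := range o.ConsensusKeys {
		if k == key {
			return true
		}
	}
	return false
}

// CanPropose: proposals are restricted to the single designated key.
func (o Operator) CanPropose(key string) bool {
	return key == o.ProposerKey
}

func main() {
	a := Operator{
		ConsensusKeys: []string{"key-sf", "key-ff", "key-kr"},
		ProposerKey:   "key-sf",
	}
	fmt.Println(a.CanVote("key-kr"))    // true: any listed key can vote
	fmt.Println(a.CanPropose("key-kr")) // false: only key-sf can propose
}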
