Resuming from 34% halt attack


#1

If a validator or group of validators gets just over 1/3 of the voting power (e.g. 34%) and then goes offline, the network should halt.
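A minimal sketch of the arithmetic, assuming Tendermint's rule that a block commits only when validators holding strictly more than 2/3 of the total voting power sign it. The validator names and powers are made up for illustration; integer comparison avoids floating-point rounding at the boundary.

```python
def can_commit(powers: dict[str, int], offline: set[str]) -> bool:
    """Return True if the online validators hold strictly more than 2/3 of total power."""
    total = sum(powers.values())
    online = sum(p for v, p in powers.items() if v not in offline)
    # 3 * online > 2 * total  <=>  online / total > 2/3, without floats.
    return 3 * online > 2 * total


powers = {"a": 34, "b": 33, "c": 33}
print(can_commit(powers, set()))   # everyone online
print(can_commit(powers, {"b"}))   # 33% offline: 67% online is still a quorum
print(can_commit(powers, {"a"}))   # 34% offline: 66% online is not
```

With 34% offline, the remaining 66% can never exceed the 2/3 threshold, so no block can be committed no matter how long the others wait.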

If they never come back online, how can the network be resumed?


#2

I believe they get slashed until their voting power falls below 1/3 of the network.


#3

If that validator goes offline, there should be no consensus and the network halts. That validator won’t be slashed.


#4

I think the other validators have to fork the network with a new genesis file to kick that offline validator out. However, this raises a problem: what if that validator starts a new chain with a new genesis file as well? That chain would have the same old blocks and app state from before the chain stopped. Would it become an ETC vs. ETH case again?


#5

Revoking offline validators based on UTC timestamps could help resume from a 34% halt attack, IMO.


#6

This is incorrect. In the event that 1/3 of the voting power is offline, consensus has halted, so no blocks will be produced. We cannot slash here, as on-chain this can’t be distinguished from the internet going out for one third of the validators.
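A simplified sketch of why downtime can’t be punished during a halt, assuming liveness tracking works roughly like a signed-blocks window: the chain only learns who signed by reading the commit of each block that actually finalizes. `DowntimeTracker`, `window`, `min_signed` and `simulate` are hypothetical names for illustration, not the Cosmos SDK API.

```python
from collections import deque


class DowntimeTracker:
    """Hypothetical, simplified liveness tracker. It is only ever updated from the
    signatures recorded in a block that was actually committed."""

    def __init__(self, window: int, min_signed: int) -> None:
        self.window = window          # number of recent committed blocks to look at
        self.min_signed = min_signed  # signatures required within a full window
        self.history: dict[str, deque] = {}

    def on_committed_block(self, validators: set[str], signers: set[str]) -> set[str]:
        """Record one committed block; return validators below the signing threshold."""
        flagged = set()
        for v in validators:
            h = self.history.setdefault(v, deque(maxlen=self.window))
            h.append(v in signers)
            if len(h) == self.window and sum(h) < self.min_signed:
                flagged.add(v)
        return flagged


def simulate(powers: dict[str, int], offline: set[str], rounds: int,
             tracker: DowntimeTracker) -> set[str]:
    """Attempt `rounds` blocks and return every validator the tracker flagged."""
    total = sum(powers.values())
    online = sum(p for v, p in powers.items() if v not in offline)
    flagged: set[str] = set()
    for _ in range(rounds):
        if 3 * online <= 2 * total:
            # No quorum, so no block commits and there is no on-chain record to act on.
            continue
        flagged |= tracker.on_committed_block(set(powers), set(powers) - offline)
    return flagged


powers = {"a": 34, "b": 33, "c": 33}
print(simulate(powers, {"a"}, 100, DowntimeTracker(10, 5)))  # chain halted: nobody flagged
print(simulate(powers, {"b"}, 100, DowntimeTracker(10, 5)))  # chain live: "b" flagged
```

The asymmetry is the point: a 33% validator that goes offline can be caught because the other 67% keep committing blocks, while a 34% validator stops the very blocks that would record its absence.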

The only ways to resolve this are to continue waiting until they come back online (e.g. if part of the world’s internet really did go out) or to appeal to social consensus (as @kwunyeung noted). You have to find some way to obtain social consensus about which fork should be the real one. The ideal scenario is to debug why 1/3 went offline, or to wait.

If delegators are responsive, we only need to worry if the 1/3 are colluding, or if a single correlated failure knocked out one third of the nodes. The point of being decentralized is that a single failure shouldn’t be able to knock out 1/3 of the nodes. So once the validator you delegate to goes offline for a non-negligible period, you should redelegate. Since we support instant redelegation, this alleviates the chain-halting fear. (Though it is an interesting point of analysis whether or not you should redelegate: you may get a liveness slash if you redelegated, but if the chain halts and then they come back online, you probably won’t.)
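A toy model of why responsive redelegation helps, under loudly simplified assumptions: redelegation is instant and free, every delegator of an offline validator moves to one hypothetical `target`, and slashing is ignored. The redelegation is itself a transaction, so it only works while the chain is still live, i.e. before the offline power reaches 1/3. All names and amounts are made up.

```python
from fractions import Fraction

Delegation = tuple[str, str, int]  # (delegator, validator, amount)


def power_by_validator(delegations: list[Delegation]) -> dict[str, int]:
    """Sum bonded stake per validator."""
    power: dict[str, int] = {}
    for _, validator, amount in delegations:
        power[validator] = power.get(validator, 0) + amount
    return power


def offline_fraction(delegations: list[Delegation], offline: set[str]) -> Fraction:
    """Exact fraction of total voting power held by offline validators."""
    power = power_by_validator(delegations)
    total = sum(power.values())
    return Fraction(sum(p for v, p in power.items() if v in offline), total)


def halts(delegations: list[Delegation], offline: set[str]) -> bool:
    # Online power is no longer strictly above 2/3 once offline power reaches 1/3.
    return offline_fraction(delegations, offline) >= Fraction(1, 3)


def redelegate_away(delegations: list[Delegation], offline: set[str],
                    target: str) -> list[Delegation]:
    """Hypothetical: every delegator bonded to an offline validator moves to `target`."""
    return [(d, target if v in offline else v, amt) for d, v, amt in delegations]


ds = [("alice", "a", 20), ("bob", "b", 20), ("carol", "c", 20),
      ("dave", "d", 20), ("erin", "e", 20)]
print(halts(ds, {"a"}))            # 20% offline: chain stays live
print(halts(ds, {"a", "b"}))       # 40% offline without redelegation: halt
moved = redelegate_away(ds, {"a"}, "c")  # alice moves while only "a" is down
print(halts(moved, {"a", "b"}))    # "b" then fails, but only 20% is offline
```

Ordering matters here: alice’s redelegation helps only because it is committed while `a` alone (20%) is down. Once two validators are down together, the chain has already halted and no redelegation transaction can land.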

If it’s a colluding 1/3, or a correlated failure, we’re SOL and must appeal to social consensus via “twitter, riot, forums, etc.”. After a correlated failure, though, I imagine delegators will truly value decentralization more, so it shouldn’t happen again.


#7

So the likelihood of halting attacks has been hotly debated for years. I’m excited to see what happens in the real world.

If this problem is unlikely, I think falling back to forming social consensus will be a perfectly adequate response.

It’s possible to design changes to Tendermint that let it self-heal against these attacks by automatically introducing stronger synchrony assumptions.

Requiring 2/3 of the voting power to be online for liveness comes from our goal of maximizing the asynchrony we can tolerate; it’s possible to parameterize this threshold and have the protocol adjust it dynamically.

This is effectively how CBC Casper works.
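A sketch of the trade-off that such a parameter controls, using standard quorum-intersection reasoning under asynchrony. It assumes a quorum means strictly more than a fraction `q` of total voting power; `tolerances` is a hypothetical helper, and this models neither Tendermint’s actual code nor how CBC Casper or any synchrony-based mechanism would choose `q`.

```python
from fractions import Fraction


def tolerances(q: Fraction) -> tuple[Fraction, Fraction]:
    """For quorums of strictly more than q of total power (1/2 <= q < 1), return
    (safety_bound, liveness_bound):
      - safety: two conflicting commits need every validator in the overlap of two
        quorums to double-sign; that overlap exceeds 2q - 1, so safety holds while
        Byzantine power is at most 2q - 1.
      - liveness: a quorum can still form only while offline power is strictly
        below 1 - q.
    """
    if not (Fraction(1, 2) <= q < 1):
        raise ValueError("q must be in [1/2, 1)")
    return 2 * q - 1, 1 - q


for q in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)):
    safety, liveness = tolerances(q)
    print(f"q={q}: safe while Byzantine <= {safety}, live while offline < {liveness}")
```

At `q = 2/3` both bounds equal 1/3, which is why 2/3 is the natural choice when no synchrony is assumed. Lowering `q` lets the chain survive more offline power but weakens the safety bound, so a protocol could only do it safely by leaning on extra assumptions, such as the stronger synchrony assumptions mentioned above.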