Enabling Opt-in and Mesh Security with Fraud Votes

We’re putting this on the forum to start a discussion about whether it makes sense to create a new type of governance proposal on the Hub that would allow votes to slash validators or delegators for attacking consumer chains with incorrect execution.

Several things are important to note:

  • This new proposal type would only apply to validators who had opted into an Opt-in consumer chain, or delegators who had opted into a Mesh consumer chain.
  • It is a temporary measure until fraud proof technology and surrounding systems are mature enough.
  • This type of proposal is to be used only to slash validators for incorrect execution. It is not to be used to slash validators for downtime on a consumer chain (even if they maliciously cause it to halt), or for any other offense. If we implement this proposal type, we should add text to every such proposal stating the above.
  • Even if the Cosmos Hub does not use this proposal type to enable Mesh Security, it is likely that other Mesh provider chains will.

Unlike Replicated Security, both Opt-in Security and Mesh Security allow a subset of the stakers on a chain to contribute to securing another chain. While the ability to secure other chains is a very powerful tool, it can also have some pitfalls. It must be possible to punish these stakers for attacking the chains they secure.

What stops attacks on blockchains?

Currently, proof of stake protocols punish validators for double signing. Double signing is dangerous for a chain when a ⅓ cartel of stakers sign two conflicting blocks at the same block height. As an example, this could be used in an attack where one victim is led to believe that they have received some money and another victim is led to believe that they have received the same money. Each history is valid on its own, but together they are invalid.

Defending against double signing is important, but it is not sufficient to secure a chain. The other form of attack that must be defended against is known as an incorrect execution attack. In this attack, a ⅓ cartel of stakers signs a block that simply breaks the rules of the chain. For example, a chain might have a rule that says “no tokens can be transferred from one account to another without a signature from the sending account”. In an incorrect execution attack, a cartel of stakers might sign a block that transfers everyone’s tokens into their own wallets. People running full nodes would know that something was wrong, but light clients, such as those used in IBC bridges, would have no idea.

So what stops cartels of stakers from performing an incorrect execution attack today? There are many factors, but one of the most important is known as token toxicity. If a cartel of stakers on the Cosmos Hub were to perform this attack to empty all of the Hub’s IBC bridges, it would crash the value of Atom, since the Hub’s security would have been shown to be worthless.

This dynamic holds for Replicated Security. RS consumer chains are solely secured by their provider chain, in this example the Cosmos Hub. So from a token toxicity standpoint, a ⅓ cartel of Atom stakers compromising a consumer chain is exactly like this cartel compromising the Hub itself. Either way, such an attack would make the security of the Hub worthless, and this acts as an incentive for validators not to perform these attacks.

Why token toxicity doesn’t work in Opt-in and Mesh Security

With Opt-in and Mesh Security, it’s not so clear that token toxicity will keep consumer chains safe. This is because the responsibility for an attack could be much more diffuse. Let’s look at some examples.

Imagine an Opt-in Security consumer chain has a $20m TVL, and is secured by Cosmos Hub validators with $70m in stake. This is theoretically secure against double signing. The ⅓ cartel which could attack this consumer chain has $23.3m staked and slashable[1], so stealing the $20m doesn’t make sense.
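
In other words, a double signing attack is unprofitable whenever the cartel’s slashable stake exceeds what it can steal; here roughly ⅓ × $70m ≈ $23.3m of stake is at risk against at most $20m of TVL.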

It is, however, not secure against incorrect execution. If there is no way to slash for incorrect execution on a consumer chain, then we rely on token toxicity. But this $70m of stake is a small fraction of the total stake on the Cosmos Hub. It’s not clear that the malfeasance of this small fraction of validators would crash the Atom price. After all, the vast majority of Atom stakers in this scenario are honest, and the attack did not affect the Hub itself or any other consumer chains. Of course, it’s impossible to predict what price movements would happen, but the case for a complete loss of Atom’s value in this scenario is a lot weaker. I don’t think we can rely on token toxicity.

Let’s look at a similar scenario with Mesh Security. I would argue that the responsibility is even more diffuse in this case. Imagine a Mesh Security chain with $20m in TVL whose total stake is $70m, with $50m coming from a variety of provider chains. An attacker with $23.3m could deploy this capital across the provider chains to gain control of a ⅓ cartel of validators on the consumer chain. They could then perform the same attack.

In this scenario, it’s very unlikely that anyone would even think about blaming the provider chains. Since there are several of them, the attacker may only control a very small portion of the stake on each.

Slashing for incorrect execution

Both of these examples involve stakers with power over a consumer chain committing incorrect execution and not being slashed for it. The chain’s security must therefore rely on token toxicity, which for Opt-in and Mesh Security probably does not work.

The way to solve this is to avoid relying on token toxicity. If the attacker in these examples could be slashed for incorrect execution, this would solve the problem. Slashing for double signing is relatively trivial. In principle, it simply requires looking for two signatures on different blocks at the same height from the same validator. Slashing for incorrect execution is less trivial, because it requires a concept of what the “correct” execution is, which depends on the state and the execution dynamics. This requires something called “fraud proofs”.
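
As an aside on the double signing case mentioned above, here is a minimal Go sketch of how mechanical equivocation detection is, assuming a simplified Vote type. The type and function names are illustrative only, not actual CometBFT or Cosmos SDK APIs; real evidence handling also verifies signatures, rounds, and chain IDs.

```go
package main

import "fmt"

// Vote is a minimal stand-in for a consensus vote: which validator signed
// which block hash at which height. Signatures, rounds, and chain IDs are
// omitted to keep the sketch short.
type Vote struct {
	Validator string // consensus address of the signer
	Height    int64
	BlockHash string
}

type voteKey struct {
	validator string
	height    int64
}

// findDoubleSigns returns pairs of votes in which the same validator signed
// two different blocks at the same height: the evidence needed to slash for
// equivocation.
func findDoubleSigns(votes []Vote) [][2]Vote {
	seen := make(map[voteKey]Vote)
	var evidence [][2]Vote
	for _, v := range votes {
		k := voteKey{v.Validator, v.Height}
		if prev, ok := seen[k]; ok && prev.BlockHash != v.BlockHash {
			evidence = append(evidence, [2]Vote{prev, v})
			continue
		}
		seen[k] = v
	}
	return evidence
}

func main() {
	votes := []Vote{
		{Validator: "valA", Height: 100, BlockHash: "0xaaa"},
		{Validator: "valA", Height: 100, BlockHash: "0xbbb"}, // conflicts with the vote above
		{Validator: "valB", Height: 100, BlockHash: "0xaaa"},
	}
	fmt.Println(findDoubleSigns(votes)) // one conflicting pair from valA
}
```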

Fraud proofs

Fraud proofs are an ongoing area of research and primarily intended for use with roll-ups, which are similar to consumer chains. A fraud proof allows you to prove that a validator signed an incorrect state transition, without needing to run a full node for the chain involved. A provider chain could accept proof that an incorrect state transition was signed by a validator and slash the stakers involved. This would enable Opt-in and Mesh security to function as intended.

However, fraud proof implementations are perpetually six months from completion. It’s a very hard technical problem. Currently there is no fraud proof framework that will work for Cosmos chains using Opt-in or Mesh security.

There are also a lot of problems that need to be solved around the edges. For example, in a naive implementation, a validator who accidentally ran the wrong binary during an upgrade might be slashed for fraud. Technically, it could be proved that they had signed an incorrect state transition if they were accidentally running the wrong binary. There needs to be a framework to handle these scenarios safely.

Additionally, fraud proofs (and ZK validity proofs) currently require a system called a “DA layer”, which is essentially another blockchain where all transactions must be posted for the fraud or validity proof to even work. If this DA layer is compromised, it may become impossible to slash for incorrect execution. In some sense, security is provided by the DA layer, and what security Opt-in or Mesh Security actually provides in this scenario needs to be examined more closely.

All of these challenges can and will be solved, but as the Cosmos community, we need to ask ourselves if we want to wait for that.

Fraud votes

To be able to launch Opt-in and Mesh Security while work continues on cutting-edge fraud proof research, we can turn to a simple mechanism: the fraud vote. This would be a type of governance proposal. If (and only if) a staker or validator had opted in to stake on an Opt-in or Mesh consumer chain, they would become eligible to be slashed by this kind of governance proposal. If an attack involving incorrect execution happened, proof could be submitted in this proposal. Voters could then sync up full nodes for the chain in question and verify the incorrect execution for themselves. This is certainly not as elegant as a fully automatic and optimized fraud proving system, but it should have much the same effect.
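
For concreteness, here is a rough Go sketch of the information such a fraud vote proposal might carry. All type and field names are assumptions for the sake of illustration; this is not an existing Cosmos SDK or Interchain Security message type.

```go
package main

import (
	"errors"
	"fmt"
)

// FraudVoteProposal is a hypothetical proposal payload. The fields mirror
// what voters would need in order to sync a full node and check the claim.
type FraudVoteProposal struct {
	Title            string   // short summary of the alleged incorrect execution
	Description      string   // human-readable account of the incident
	ConsumerChainID  string   // chain on which the incorrect execution happened
	InfractionHeight int64    // consumer block height of the offending block
	BlockHash        string   // hash of the block alleged to break the chain's rules
	Offenders        []string // consensus addresses of the validators who signed it
	SlashFraction    string   // e.g. "1.00" for a full slash, decided per proposal
	EvidenceURI      string   // pointer to data voters can use to replay execution
}

// ValidateBasic mirrors the stateless checks Cosmos proposal types usually do.
func (p FraudVoteProposal) ValidateBasic() error {
	if p.ConsumerChainID == "" {
		return errors.New("consumer chain id must be set")
	}
	if p.InfractionHeight <= 0 {
		return errors.New("infraction height must be positive")
	}
	if len(p.Offenders) == 0 {
		return errors.New("at least one offending validator must be listed")
	}
	return nil
}

func main() {
	p := FraudVoteProposal{
		Title:            "Fraud vote: incorrect execution on example-consumer-1",
		ConsumerChainID:  "example-consumer-1",
		InfractionHeight: 1234567,
		Offenders:        []string{"examplevalconsaddr"},
		SlashFraction:    "1.00",
	}
	fmt.Println(p.ValidateBasic()) // prints <nil> when the proposal is well-formed
}
```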

This also allows us to capitalize on an advantage that Cosmos has over rivals. Ethereum-based shared security systems such as Eigenlayer are under development. Cosmos has the lead for now, but we need to be prepared for the much larger stake on Ethereum to enter the market for shared security. However, Ethereum lacks an on-chain governance system, so it could not use an analogous mechanism. Perhaps a contract like Eigenlayer could use some sort of multisig or council, but this is clearly very centralized. They need to rely on fraud proofs. The possibility of fraud votes on Cosmos allows us to innovate and get out ahead while fraud proofs are still under development.

The YOLO scenario (hard fork slashing)

It is of course possible to run Opt-in or Mesh Security without fraud proofs or fraud votes. In this case, if validators or delegators were to cause incorrect execution, the only option for slashing them would be for the provider chain to hard fork to a version of the state where the offenders were slashed. This would accomplish essentially the same thing as the governance process proposed here, but through a potentially frantic, back-channel hard fork. It could work, but it seems much better for something like this to be done in a controlled and intentional manner with a fraud vote, as proposed here.

Arguments against fraud votes

Efficiency/spam argument

The core of fraud proof technology is the ability of the fraud proving framework to prove fraud without needing to sync all of the data that a full node needs. Our fraud votes would not have this ability, so voters would need to sync up full nodes, which would be somewhat expensive. An attack exploiting this shortcoming could proceed as follows:

  • The attacker would spam the provider chain with fraud vote proposals.
  • Voters on the provider chain would become tired of the expense of syncing up full nodes to verify them.
  • The attacker would then commit an actual instance of incorrect execution on a consumer chain, and nobody would bother to vote on the fraud vote proposal.

To prevent this, it would be worth looking into raising the deposit requirements for fraud vote proposals and perhaps expanding the range of scenarios under which such a deposit is burned. It also seems unlikely that this would be a huge problem in real life. Even if the provider chain were flooded with spam fraud vote proposals, an actual instance of theft from a consumer chain would be a big event and would have victims. Human communication outside of the blockchain could lead voters to the real fraud vote proposal, which they could then verify and vote on.
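
As a rough sketch of the knobs this mitigation implies, the following Go struct shows one way such parameters could be grouped. None of these names exist in the Cosmos SDK gov module; they are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"time"
)

// FraudVoteParams is a hypothetical parameter set for fraud vote proposals,
// assuming they get their own deposit and burn rules separate from ordinary
// governance proposals.
type FraudVoteParams struct {
	MinDeposit      int64         // in uatom; e.g. several times the normal proposal deposit
	BurnOnRejection bool          // burn the deposit whenever the proposal fails
	BurnOnNoQuorum  bool          // burn the deposit if quorum is never reached
	VotingPeriod    time.Duration // shorter than the unbonding period (see protocol design notes)
}

func main() {
	p := FraudVoteParams{
		MinDeposit:      5_000_000_000, // purely illustrative figure
		BurnOnRejection: true,
		BurnOnNoQuorum:  true,
		VotingPeriod:    7 * 24 * time.Hour,
	}
	fmt.Printf("%+v\n", p)
}
```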

This may break down in advanced scenarios. In a world with thousands or millions of consumer chains, or one where consumer chains are lightweight and ephemeral, voters might not care enough to vote on any fraud vote proposals. But for the use cases that Opt-in and Mesh Security are currently being built for, with hundreds of chains securing relatively large amounts of value, fraud votes should work just fine. Once we reach these more advanced scenarios in the future, real fraud proofs should be ready.

Contentiousness argument

Another potential argument against fraud votes is that contentious scenarios may arise. The example I have used so far is trivial in that it is obvious to everyone that the validators committing incorrect execution have done so with the intent to steal. However, imagine the following scenario:

  • An Opt-in or Mesh consumer chain experiences an event that some consider an attack on the protocol and others consider to be an example of the attack’s “victims” trading badly and trying to socialize their losses by calling it an attack. Similar events have happened in the past involving flash loans, hostile DAO takeovers, and even the Luna depeg, and they happen frequently in the traditional finance world as well.
  • Validators on the consumer chain apply an emergency upgrade to stop the “attack”.
  • In the aftermath, those who feel that it wasn’t actually an attack submit a fraud vote proposal to the Cosmos Hub. Technically, it can be argued that the validators applying the upgrade committed incorrect execution because once they were running the emergency upgrade, they were no longer following the protocol originally specified by the consumer chain when it launched or in its last governance upgrade.
  • Now Cosmos Hub governance needs to answer this question.

Vitalik Buterin has written about wanting to avoid this type of scenario on Ethereum. It should be noted that with real fraud proofs, the validators applying the emergency upgrade would be slashed automatically. This is one of the issues that makes real fraud proofs tricky. For example, Eigenlayer, a system similar to Interchain Security that is intended to be used with real fraud proofs, faces this issue. They have built in a backdoor via a multisig “comprised of prominent members of the Ethereum and EigenLayer community” as a temporary solution (section 3.4.2).

This highlights an important issue: it’s very hard to avoid emergency upgrades, hardforks, and contentious events in real life. Ethereum may try, and it’s a lofty goal, but it’s not always possible. We saw this at the beginning of Ethereum with The DAO hack and hardfork. Even today, many Ethereum projects, like Eigenlayer, include a backdoor controlled by “prominent community members”. Cosmos should lean into its advantages, one of which is a robust and frequently used governance system built into the code and culture at a low level. In the case of the fraud vote system proposed here, this will allow us to increase our lead in the shared security space.

[1] In reality, on most Cosmos chains, the double signing slash fraction is set to 5%, which means that under this analysis the economic security is only 5% of the quoted number. This should probably be changed, but that’s outside the scope of this example.

Protocol design notes

In the Cosmos governance system, votes take a fixed amount of time and only have an effect after the vote is concluded. If a fraud vote proposal is submitted to the provider too late, then the stakers in question will be fully unbonded before the voting period ends and will be able to escape slashing. With standard settings of a 2 week voting period and a 3 week unbonding period, this means that there is only one week during which a fraud vote can be submitted and successfully slash the stakers.

pu: provider unbonding period
pv: provider voting period 
epu: effective provider unbonding period

pu - pv = epu

If we reduce the voting period for fraud votes to something shorter, perhaps one week, the window during which a fraud vote proposal can be created and still take effect in time grows correspondingly. It is probably not a good idea to reduce the voting period to less than one week, given the seriousness of a fraud vote.
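
To make the numbers concrete: with the standard settings above, epu = 3 weeks - 2 weeks = 1 week. Shortening the fraud vote voting period to one week would give epu = 3 weeks - 1 week = 2 weeks, doubling the window in which a fraud vote can be submitted early enough to slash before the offenders finish unbonding.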

11 Likes

I am in support of this because I think that what we have done best in Cosmos is governance.

I am not aware of other systems routinely holding votes at the scale that we do in cosmos.

Obviously, there’s potential for abuse here, in particular of the “minority rights protection” kind (think of the State of Delaware courts, for example). That said, this is a sensible and sober solution to the issues at hand that will allow us to grow.

Yeah I think that the YOLO scenario is best avoided. “The slasher” has always been a key differentiator of cosmos.

Risk vs Reward

Risks:

  • This system could be used outside of cases of clear fraud.
  • This system could be used to punish “fraud” along political lines (a relevant tweet was linked here).

Rewards:

  • Mathematical fraud proofs for incorrect execution are dizzyingly hard, and to this day I am not aware of a hard slash that boils down to a malicious validator.
  • This could possibly replace the hard slash mechanism altogether, ensuring that we don’t have yet another case of delegators losing 5-10% of their stake because of a configuration error on the part of their validator.
  • Growth

therefore

Let’s do it, as cautiously as we can, but let’s do it.

3 Likes

I support this initiative. One of Cosmos’ strengths is governance, so playing to those strengths and fast-tracking opt-in/mesh (before validity proofs are technically ready) is advantageous. I doubt RS will scale beyond a handful of chains, so for the Hub to stay at the center of shared security in Cosmos, it will need opt-in/mesh. Clear slashing conditions should also make the Hub more attractive as a mesh provider: consumers have stronger guarantees that Hub validators will be slashed if they incorrectly execute blocks, compared to other chains with similar market caps.

So putting additional slashing conditions on validators who opt in to provide security makes sense: validators must be held accountable for correct execution, and the slashing conditions make the Hub a more attractive security provider. However, broad social slashing is risky. Clearly defining a social contract up front, around exactly which conditions lead to a slash and what the magnitude of the slash will be, is better than having governance reason about whether a validator committed an offense, and how much it should be slashed, after the offense has happened.

I do think social slashing is fraught (for example, social slashing can undermine property rights - very bad!), so clearly defining the social contract early and having a plan to migrate to in-protocol proofs is essential.

3 Likes

Sounds interesting. I’m a bit concerned about the requirement for voters to sync up full nodes, which is expensive. I’m also curious about what the current landscape for fraud proof systems looks like; are there any major developments that I can read about (possibly some papers)?

In this nascent phase of ICS v1, I feel like if we provide an opt-out too soon, many top validators may start opting out of all consumer chains because they don’t want the slashing risk without much incentive, and that will negatively affect the narrative of inheriting $3B worth of economic security.
Apart from that I think the idea is cool.

This is what makes me absolutely love working with Stride.

What I mean is, I’ve always thought that they really think things through.

Overall, as I stated above, and for all of the same reasons that Aiden supports this effort and @jtremback is proposing it, I’m in support.

2 Likes

Yeah, actually right now I’m kind of feeling negative about the opt-out.

My reasoning is that I view the lower slots on the Cosmos Hub as training wheels, and so those validators should not be excluded; they should instead be exposed to the full difficulty of running a full Cosmos Hub validator.

1 Like

The primary issue is not with lower ranked validators opting out. It’s with top ranked validators opting out. Let’s say the top 15 validators opted out of the next consumer chain. The security gets halved, since now only 50% of the stake is securing that chain.
There is a secondary issue with lower ranked validators opting out, and it’s not about seeing them as training wheels. It’s that their delegators don’t get rewards from consumer chains, and as a result a delegator will get more rewards if they stake with a higher ranked validator that can afford to opt in to all consumer chains. This will lead to stake centralization.

I believe that if the top 15 validators opt out, security will only go down for a short time; in that case a flow of smaller validators, never seen before, would enter the set =) In the case where the entry price grows instead, the situation is, in my opinion, alas worse.

1 Like

Opt-in Security used to be called ICS v2 and Mesh Security ICS v3. With ICS v2, consumer chains can launch with one permissionless transaction rather than going through the governance voting process. This means that many consumer chains may launch with ICS v2/Opt-in Security, but validators will then perform in-depth due diligence and only opt in to those consumer chains promising or providing rewards higher than the cost of running their nodes. This leads to competition between consumer chains, and only the best ones will get most of the Cosmos Hub’s security. With Replicated Security, the competition is very low, meaning that any serious project putting up a governance proposal to become a consumer chain will likely be approved to enhance the narrative of the ATOM economic zone, even if it brings validators only costs and no rewards.

The biggest issue with Replicated Security is that all validators have to run consumer chain nodes, even at an increasing loss, as is currently the case as new consumer chains join. We have to say a big thank you to @jtremback for introducing the soft opt-out option for the bottom 5% of validators; otherwise, after Stride and Duality, many of these validators would likely have been forced to stop operating on the Cosmos Hub. Moreover, since validators above the bottom 5% of voting power who do not run consumer chain nodes face only jailing (there is no downtime slashing, and double sign slashing goes through a governance process), we can see several validators above the bottom 5% not running Neutron nodes and constantly unjailing after they are jailed. This means that even above the bottom 5% of voting power there is demand for the soft opt-out, and since it is not available, validators are using other strategies to avoid running consumer chain nodes.

I think that if fraud votes allow ICS v2/v3 to be implemented in Cosmos before fraud proofs are ready, getting ahead of Ethereum and other projects, this seems like a great idea.

1 Like

I really like this idea @jtremback.

Getting out in front of this issue with a system that works, leverages Cosmos’s strengths, and allows us to build more automated systems is a fantastic approach.

Controversially, I think we should drop the soft opt-out.
