Slashing updates in replicated security

jtremback · February 1, 2023, 4:12am

One of our goals at the Informal Systems Cosmos Hub team is to be open and responsive to feedback from the community on our work. For this reason, we aim to get all important governance proposals in front of the community on the governance forum well in advance of voting so that we can modify them with feedback received.

We put up the draft Replicated Security proposal on the forum in mid-December 2022. Since then we have received a lot of feedback. This post deals with one piece of feedback in particular, and our modifications to RS in response to this feedback.

Slashing in RS

Replicated Security ensures that consumer chains have the same exact security and liveness as the Cosmos Hub itself. Part of this is that validators must be punished for consensus infractions on consumer chains. In the current RS implementation, this is done by having the consumer chain send a “slash packet” to the provider chain reporting on any misbehavior by a validator. Once this happens, the slash packets go into a queue known as the “slash throttle” so that validators representing more than a few percent of voting power cannot be slashed at once (more on that later). Once the packets get out of this queue, they trigger a punishment of the validator committing the infraction.
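To make the queueing behaviour concrete, here is a minimal toy model of a throttle. It is a sketch, not the Replicated Security implementation: the class names, the basis-point units, and the `allowance_bp` parameter are all assumptions made for illustration. Packets are handled in arrival order while a voting-power meter is positive, and the meter is topped up once per period.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SlashPacket:
    consumer_chain: str  # chain that reported the infraction
    validator: str       # validator being reported
    power_bp: int        # validator's share of Hub voting power, in basis points


class SlashThrottle:
    """Toy throttle: packets wait in a FIFO queue until the meter allows them."""

    def __init__(self, allowance_bp: int):
        # Hypothetical parameter: voting power that may be handled per period.
        self.allowance_bp = allowance_bp
        self.meter = allowance_bp
        self.queue = deque()

    def enqueue(self, packet: SlashPacket) -> None:
        self.queue.append(packet)

    def replenish(self) -> None:
        # Called once per period; the meter never exceeds the allowance.
        self.meter = min(self.meter + self.allowance_bp, self.allowance_bp)

    def process(self) -> list:
        """Handle queued packets in arrival order while the meter is positive."""
        handled = []
        while self.queue and self.meter > 0:
            packet = self.queue.popleft()
            self.meter -= packet.power_bp  # may go negative, delaying later packets
            handled.append(packet)
        return handled
```

In this sketch, three packets of 300 basis points each against a 500 basis point allowance would see two handled immediately and the third held until the next `replenish()` call.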

Just as on a normal Cosmos chain, if the infraction is double signing, the punishment is tombstoning, a permanent removal of the validator from the validator set, and a loss of 5% of the validator’s funds. This loss of funds is known as slashing. If the infraction is downtime, the punishment is removal from the validator set for 10 minutes and slashing of only 0.01% of the validator’s funds.
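As a quick worked example of the two percentages above, the following sketch computes the funds lost for each infraction. The function name and the stake figure in the usage note are made up for illustration; only the 5% and 0.01% fractions come from the text.

```python
from fractions import Fraction

# Fractions quoted above: 5% for double signing, 0.01% for downtime.
SLASH_FRACTION = {
    "double_sign": Fraction(5, 100),
    "downtime": Fraction(1, 10_000),
}


def slash_amount(stake: int, infraction: str) -> Fraction:
    """Return the amount of stake lost for the given infraction."""
    return stake * SLASH_FRACTION[infraction]
```

For a hypothetical validator with 1,000,000 tokens at stake, `slash_amount(1_000_000, "double_sign")` is 50,000 tokens, while `slash_amount(1_000_000, "downtime")` is 100 tokens.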

These punishments occur on the Cosmos Hub, so misbehavior on a consumer chain impacts all nodes (Hub and consumer chain) run by that validator.

Community concerns

We received feedback from some validators and community members that it would be too risky to slash based solely on information transmitted from the consumer chain. The concern is that some malicious code on a consumer chain could send fake slash packets and slash a validator that had not committed any infractions.

This is a concern that we have as well, and we’ve built mitigations into the design already. The slash throttle that I mentioned earlier prevents too many validators from being slashed at once, a scenario which could harm the Cosmos Hub. We also recommend that all consumer chains be fully audited before being approved by governance, to avoid loss of user funds through normal vulnerabilities, as well as malicious code crafted to send fraudulent slash packets.

We also have updates to replicated security in the works which will allow the Cosmos Hub to verify double signing and downtime evidence on its own. However, this is not trivial, and will require more work. This is known as the “untrusted consumer chain protocol”.

Our response

In response to these concerns, we have decided to curtail the abilities of consumer chain code to slash validators on the Hub, at least in the first release of Replicated Security. This is temporary, until we release the untrusted consumer chain protocol.

Cosmos Hub validators will not be slashed or tombstoned for double signing on consumer chains.

Instead, instances of double signing on consumer chains will be logged. We are collaborating with the team at Ignite on a new type of governance proposal which can be used to slash and tombstone validators who equivocate on consumer chains. Rather than being slashed and tombstoned immediately, a validator who double signs on a consumer chain will be punished only after a governance vote passes. This modification adds an extra layer of safety, while still punishing validators who violate the rules of consensus. Double signing is extremely rare in practice.

Cosmos Hub validators will also not be slashed for downtime on consumer chains.

They will, however, still be jailed for downtime. We have determined that jailing is essential to provide liveness guarantees for consumer chains, and that the governance process is too slow to provide the same guarantees if every single instance of downtime must pass through a 2-week voting process.

In the scenario where malicious code on a consumer chain tries to jail every single Cosmos Hub validator at once, the throttling code mentioned above will take several days to jail them all. During this time every validator who is jailed will unjail themselves quickly (most have scripts to do this automatically already), resulting in the attack not taking more than a few percent of validation power out of the set at once. Most likely it won’t get very far at all though, since the offending consumer can be removed with an emergency upgrade.
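A rough back-of-the-envelope version of the "several days" estimate can be sketched as follows. It assumes the throttle lets through a fixed share of voting power per replenish period. Both the share and the period are hypothetical parameters here, not the values used on the Hub, and the sketch ignores validators unjailing themselves in the meantime.

```python
def hours_to_jail_everyone(allowance_bp: int, period_hours: int) -> int:
    """Rough time needed to push 100% of voting power through the throttle.

    allowance_bp: voting power allowed through per period, in basis points
                  (hypothetical parameter).
    period_hours: length of one replenish period, in hours (hypothetical).
    """
    total_bp = 10_000  # 100% of voting power, in basis points
    periods = -(-total_bp // allowance_bp)  # ceiling division
    return periods * period_hours
```

With a made-up allowance of 1% of voting power per hour, `hours_to_jail_everyone(100, 1)` gives 100 hours, a little over four days, which leaves ample time for jailed validators to unjail and for an emergency upgrade to remove the offending consumer.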

universe · February 2, 2023, 7:54am

Are you guys aware that the planned update will put ANY operator of Cosmos validators at significant legal risk if they are forced (due to jailing) to validate ALL consumer chains, even if a regulator (think SEC) considers them securities?

At the latest, the moment any of these consumer chains pays out validators in its own token and this token is considered a security token, we are all doomed…

ala.tusz.am · February 2, 2023, 9:47am

@universe

Doesn’t this fall under the domain of governance rather than technical infrastructure?

If I’m not mistaken, the technical infrastructure to off-board consumer chains already exists via the “remove consumer chain” proposal.

Maybe a better place to discuss it is here: Preparing for Replicated Security

LeonoorsCryptoman · February 2, 2023, 7:48pm

We also recommend that all consumer chains be fully audited before being approved by governance, to avoid loss of user funds through normal vulnerabilities, as well as malicious code crafted to send fraudulent slash packets.

Are there also safety measures in case the initial release of the code is ok and approved, but malicious code is integrated through upgrades?

They will, however, still be jailed for downtime.

One interesting thought is that being removed from the active set is already quite a punishment, even without having a x% of delegations slashed. Because that effectively puts your commission at risk and your reputation. So a simple but effective measure might be to make the period longer after which you can re-enter the active set. That way you already have a simple but effective trigger to make sure your uptime is ok.

In the scenario where malicious code on a consumer chain tries to jail every single Cosmos Hub validator at once, the throttling code mentioned above will take several days to jail them all.

How will this work when it happens on multiple chains at once? Will that mean that we will get a long queue of waiting slashings, which will take a long time to clear?

jacobgadikian · February 2, 2023, 10:16pm

Is this mechanism only for use in the case of equivocation and downtime?

Or is it broader scope?

It makes sense from the perspective of protecting the hub from a Byzantine consumer chain.

However equivocation is very serious.

How would we get a double signer out of consensus fast?

serejandmyself · February 3, 2023, 7:43pm

Reading this. I think the role of a validator needs an addition - growing ba**s to build for decentralization

jtremback · February 3, 2023, 10:10pm

Are there also safety measures in case the initial release of the code is ok and approved, but malicious code is integrated through upgrades?

All upgrades should be audited as well. However, the audit to determine whether a chain has malicious slashing code in it is a lot easier to do than the audit to determine whether the chain itself is secure against attackers. But in general, the overhead of auditing all upgrades is a big reason we are trying to move towards an untrusted consumer chain paradigm as soon as possible.

One interesting thought is that being removed from the active set is already quite a punishment, even without having a x% of delegations slashed. Because that effectively puts your commission at risk and your reputation. So a simple but effective measure might be to make the period longer after which you can re-enter the active set. That way you already have a simple but effective trigger to make sure your uptime is ok.

Jailing already is effectively the main punishment for downtime, not the slashing. This is why we were able to remove the slashing component for greater safety while still disincentivizing downtime.

How will this work when it happens on multiple chains at once? Will that mean that we will get a long queue of waiting slashings, which will take a long time to clear?

Yes, the throttle has one queue for slashings from all consumer chains. What it effectively does is prevent jailings from consumer chains from changing the power balance on the hub radically in a short time, since this is the scenario we are trying to prevent.

Under normal circumstances, the throttle will have very little effect on day to day jailings. Sometimes if several large validators have downtime at the same time, one of them will have to wait an hour before being jailed.

jtremback · February 3, 2023, 10:22pm

Is this mechanism only for use in the case of equivocation and downtime?

The social slashing mechanism is only to be used for equivocation, as this evidence can be verified by anyone participating in governance. Validators will still be jailed automatically for downtime on consumer chains.

However equivocation is very serious.
How would we get a double signer out of consensus fast?

To actually execute an attack using double signing, an adversary would need to control 2/3s of the stake on the Hub. At this point it would be impossible to remove them using any form of slashing, since they would control everything. Slashing is designed more to enforce the ground rules of PoS, to avoid a scenario where it is economically rational for every validator to validate on a million different forks of the chain.

Social slashing works just as well to prevent this scenario. So while perhaps a double signer deserves to be removed from the set as soon as possible, there is no immediate security risk if they are removed after 2 weeks. Indeed, the unbonding period on Cosmos chains is designed to tolerate evidence of equivocation being submitted weeks after it happened.

Jcook_14 · February 4, 2023, 4:39am

Would it be feasible or beneficial to propose a specific new form of emergency governance proposal for consumer-chain-related governance? Essentially, social slashing/jailing in the case of a double sign or malicious code could be automatically proposed and dealt with through governance, but with a reduced time frame and automatic execution. The reduced voting period could still provide enough time to find out whether malicious code was involved or whether a double sign was legit.

Example: (X) validator, commits a double sign on (Y) consumer chain. If these types of double signs can be logged in an on-chain queue, maybe this queue could automatically trigger some form of emergency proposal to be automatically sent to on chain governance. This same governance, emergency prop, could automatically slash (X) validator, if the proposal passes.

If for example, there could be a standard slashing packet module in both the Hub and Consumer chains, mandatory for each consumer chain to utilize for the specific purpose of social slashing. This slashing packet module would log the double sign in a queue, and automatically trigger the emergency proposal, which could go for say, 5 days. Once the vote has been finished and if it passed, the module can automatically execute the slash of (X) validator.

However, this would also give 5 days of advanced notice if a large scale slashing is going to take place, due to malicious code on a consumer chain. And the voters then have the ability to stop the large scale slash, via social coordination by voting No or NWV on the proposal.

Not sure if this is even feasible, nor if it would realistically help at all. It could move the slash time from 14 days to 5 (or whatever amount), and make the process automatic, with no reliance on any specific parties to execute the slash, rather it executes automatically, based fully off of the social consensus.

Just thought of that, and wanted to comment, in case the idea is actually possible or beneficial in any way.

jacobgadikian · February 6, 2023, 11:57am

Then it is easy for me to support these updates to replicated security.

I have been absentmindedly concerned about byzantine consumer chains, and this is a good way of ensuring that a team doesn’t launch “SlashChain” which somewhere in its code has buried:

Report xyz vals for equivocation and rekt them.

Overall I think the likelihood of an unjustified social slash is lower than the likelihood of a SlashChain.

jtremback · February 6, 2023, 5:48pm

This is essentially what we have, except the proposal must be submitted by a human. This is not likely to change the security properties.

Jcook_14 · February 6, 2023, 6:27pm

Gotcha, thanks for the clarification on that.

effortcapital · February 14, 2023, 1:05am

I understand wanting to have social slashing in the event a consumer chain has a major bug that slashes a large set of the Hub’s validators, but I have concerns.

We have already seen governance takeover for Prop 82 where off-chain agreements were likely made to kill the prop. Can’t this exact same issue happen to slash specific validators as a form of retaliation (not like it’s too far fetched given how divided parts of the community are on the vision of the Hub)?

Validators need to be held accountable to do deep due diligence on the code.

With that being said, why not do the exact opposite. Slash if a validator does something seemingly malicious, but instead of burning funds it’s held in a separate pool. We then create a bipartisan/neutral governance council (voted on by Hub community) that decides whether that slashing event was warranted (was it bad consumer chain code everyone missed, or was it actually malicious validator behavior). If the former, return funds back to slashed validator. If the latter, burn the funds.

Eigenlayer is going down this path. It puts the onus on the validators to still audit code, it creates a neutral third party (voted in by governance and maybe rotated every quarter) that decides the slashed funds’ fate, and it potentially removes issues around social retaliation/governance capture.

It’s probably too late to make this change, but I’m sure I’m not the only one with concerns around social slashing.

jtremback · February 14, 2023, 1:53am

We have already seen governance takeover for Prop 82 where off-chain agreements were likely made to kill the prop. Can’t this exact same issue happen to slash specific validators as a form of retaliation (not like it’s too far fetched given how divided parts of the community are on the vision of the Hub)?

Just to be clear, the governance-gated slashing in RS cannot be used for general-purpose slashing of validators. It is not possible to create an equivocation slashing proposal without a slash packet having been received for that validator from a consumer.

So given the very limited use-case here of governance acting as nothing more than another layer of safety to approve slashing events that have already been transmitted by a consumer chain, I don’t think that concerns about slashing being misused are relevant in this case.
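The gating rule described here can be sketched in a few lines. This is only an illustration of the constraint, with hypothetical names (`EquivocationLog`, `record_slash_packet`, `can_propose_slash`), not the actual module code: a slashing proposal for a validator is only admissible if some consumer chain has already reported that validator.

```python
class EquivocationLog:
    """Toy record of double-sign reports received from consumer chains."""

    def __init__(self):
        # Maps a validator to the set of consumer chains that reported it.
        self._reports = {}

    def record_slash_packet(self, consumer_chain: str, validator: str) -> None:
        """Log a double-sign report instead of slashing immediately."""
        self._reports.setdefault(validator, set()).add(consumer_chain)

    def can_propose_slash(self, validator: str) -> bool:
        """A proposal is admissible only if a report exists for this validator."""
        return validator in self._reports
```

Under this sketch, governance cannot target an arbitrary validator: `can_propose_slash` stays false until a slash packet naming that validator has been logged.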

jtremback · February 14, 2023, 1:57am

We’ve written an in-depth analysis of the effects that governance-gating will have here: Informal Blog — Governance-Gated vs. Automatic Equivocation Slashing

TL;DR: The only concrete changes are that evidence will need to be submitted within a week for the governance vote to finish in time, and consumer chains will need to reduce their IBC trusting periods to 5 days.
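One way to read the one-week figure is as the gap between the unbonding period and the governance voting period. The sketch below assumes a 21-day unbonding period, which is not stated in this thread, together with the two-week voting period mentioned earlier. Both constants are assumptions for illustration.

```python
from datetime import timedelta

UNBONDING_PERIOD = timedelta(days=21)  # assumption, not stated in this thread
VOTING_PERIOD = timedelta(days=14)     # the two-week governance vote


def latest_submission_delay(unbonding: timedelta = UNBONDING_PERIOD,
                            voting: timedelta = VOTING_PERIOD) -> timedelta:
    """Longest delay after an infraction at which a vote can still finish
    before the offending stake has unbonded."""
    return unbonding - voting
```

With these values, `latest_submission_delay()` is 7 days: evidence submitted later than that would leave the vote finishing after the stake could already have unbonded.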

effortcapital · February 14, 2023, 2:29am

Appreciate the quick response! Thanks for the clarification.

All systems Go!

hub.tusz.mod · August 11, 2023, 12:57pm

Associated discussion:

jacobgadikian · August 14, 2023, 7:15am

Associated Discussion:
