[Proposal] [Draft] Proportional Slashing

Proposed Proposal Body:

In the Cosmos Hub, centralization of consensus power amongst a small set of validators can cause harm to the network due to increased risk of censorship, liveness failure, fork attacks, etc.

However, while this centralization harms the network, it is a negative externality that is not felt directly by the stakers who delegate to already-large validators. We would like a way to pass the cost of this externality on to large validators and their delegators.

There have been many discussions in the Cosmos community, across different mediums, about disincentivizing delegations that centralize the network. However, many of the proposed mechanisms are vulnerable to sybil attacks, in which their effect is negated by validator operators running multiple validator accounts.

We propose adopting a procedure called Proportional Slashing which is resistant to such sybil attacks. In this system, instead of the slashing percent being equal for all slashes, the percentage a validator gets slashed for a fault is proportional to the validator’s percent of consensus power. This way, larger validators face harsher slash amounts thus incentivizing risk-managing delegators to delegate to smaller validators. There may also be minimum and maximum bounds on the size of the slash percentages.

To defend against sybil attacks, we make a validator’s slash depend not only on its own voting power percent, but also on the other validators who fault within a short time period (the exact time period can be an on-chain governable parameter), since faults that occur close together suggest correlation. Specifically, the slash depends on both the total percentage of voting power held by validators who fault in a time period and the number of validators who fault within that period. This mechanism disincentivizes operators from splitting into multiple validators, because if those validators fault in a correlated way, they increase their own slashing percent.
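To make the sybil argument concrete, here is a minimal sketch. It assumes the square-root form discussed further down in this thread, where every validator in a group that faults together is slashed (sqrt(p_1) + ... + sqrt(p_n))^2. The function names are hypothetical, and the ADR remains the reference for the exact formula.

```python
import math

def correlated_slash(powers):
    """Slash fraction applied to every validator in a correlated fault group.

    Assumes the square-root form discussed in this thread:
    (sqrt(p_1) + ... + sqrt(p_n))^2, with each p_i a fraction of total power.
    """
    return sum(math.sqrt(p) for p in powers) ** 2

def split_operator(total_power, n):
    """Hypothetical operator splitting its stake evenly across n validators."""
    return [total_power / n] * n

# One operator with 9% of voting power, run as one validator or as three.
single = correlated_slash([0.09])
split = correlated_slash(split_operator(0.09, 3))

print(f"single validator: {single:.2%}")
print(f"three sybils faulting together: {split:.2%}")
```

Under this assumption, an operator that splits evenly into n validators and then faults on all of them at once faces a slash n times larger than it would as a single validator, since (n * sqrt(p/n))^2 = n * p.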

This also creates a secondary benefit: it incentivizes validators to decorrelate their setups from those of other validators. Because the mechanism only looks at validators faulting within a short time period, it does not differentiate between faults that are correlated because a single operator controls the validators and faults that are correlated for other reasons. Validators who share a setup with others therefore risk a higher slash. So, for example, operators should avoid using the same data centers, popular cloud hosting platforms, or Staking as a Service providers. This will lead to a more resilient and decentralized network.

For details and explanation on the exact form of the function to calculate the slash percent, as well as a technical summary of implementation method, please see this proposed ADR to the Cosmos SDK repository: https://github.com/cosmos/cosmos-sdk/blob/sunny/prop-slashing-adr/docs/architecture/adt-014-proportional-slashing.md

Would love some feedback and discussion on this before I post it on-chain!


Edits:

  1. Removed reference to specific suggested “recent slash period” time lengths (@crainbf)
  2. Added minimum and maximum slash bounds (@bharvest)
  3. Removed reference to Gini Coefficient (@Gavin)
12 Likes

For reference, I also have an implementation here:

I think this is a good idea and agree that proportional slashing should be adopted.

One thing that does stand out to me from this proposal is that the criteria to determine correlation seems a bit overly broad, at least for double signing.

An entire unbonding period (three weeks) is very long. Wouldn’t one expect that if two validators ran on the same infra and had such a failure, they’d double sign simultaneously or within minutes of each other?

Is there some way to do this objectively?

Another key thing will be to choose the parameters appropriately. Maybe it would be reasonable that a validator with 10% voting power would have 2x the slashing penalty of a small one? Or what kind of numbers do you think are reasonable?

Yeah, those times were just examples. Those are mostly upper bounds on how long it can be, because we can’t slash past the unbonding period. You’re probably right, it should likely be lower. Not sure what it should be for double sign, maybe an hour? I feel for liveness, it should be at least the length of the liveness tracking window.

These should probably be governance-controlled parameters that can be changed with ParamChange proposals. I should probably just remove the examples from the proposal, as they might hang people up.
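As a rough illustration of how per-fault-type windows could work, here is a sketch. Everything in it is an assumption for discussion: the `Fault` type, the `WINDOWS` values (one hour for double signs as floated above, an arbitrary six hours for liveness), and the choice to look only backward in time. It also uses the square-root form discussed later in this thread, and it is not taken from the ADR or the implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Fault:
    validator: str
    power: float   # fraction of total voting power at fault time
    kind: str      # "double_sign" or "liveness"
    time: float    # seconds since an arbitrary reference point

# Hypothetical window lengths; on-chain these would be governance parameters.
WINDOWS = {"double_sign": 3600.0, "liveness": 6 * 3600.0}

def correlated_faults(fault, history):
    """Faults of the same kind inside the window ending at fault.time.

    The fault itself is part of the group when it is present in history.
    """
    window = WINDOWS[fault.kind]
    return [f for f in history
            if f.kind == fault.kind and 0 <= fault.time - f.time <= window]

def slash_for(fault, history):
    """Square-root slash over the correlated group (assumed formula)."""
    group = correlated_faults(fault, history)
    return sum(math.sqrt(f.power) for f in group) ** 2

history = [
    Fault("val1", 0.02, "double_sign", 0.0),
    Fault("val2", 0.04, "double_sign", 1800.0),   # 30 minutes after val1
    Fault("val3", 0.01, "double_sign", 90000.0),  # more than a day later
]
print(f"val2: {slash_for(history[1], history):.2%}")  # grouped with val1
print(f"val3: {slash_for(history[2], history):.2%}")  # faults alone
```

This sketch does not address whether an earlier fault (val1 here) should be re-slashed once a later correlated fault shows up; that is a separate design question.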


Another key thing will be to choose the parameters appropriately. Maybe it would be reasonable that a validator with 10% voting power would have 2x the slashing penalty of a small one? Or what kind of numbers do you think are reasonable?

For the k constant multiplier mentioned in the ADR, I did make that a governance parameter in my implementation. So you think we might want the slash percent to be sublinear in validator size rather than linear?

The current design Sunny came up with is based on a square root formula. Basically, the slashing percentage for a single validator is equal to its voting power, and in case multiple validators (n) are at fault, it becomes (sqrt(voting power validator 1) + sqrt(voting power validator 2) + … + sqrt(voting power validator n))^2 for each of them.

So for two validators:
Validator 1: 2% of the voting power
Validator 2: 4% of the voting power

The original double sign slashing percentages are 2% and 4% respectively. When both get slashed within the same timeframe for double sign, it becomes (sqrt(0.02) + sqrt(0.04))^2 ≈ 11.66% for each of them.

If we take another example with the hypothesis of 3 validators having 1% voting power each:
(sqrt(0.01) + sqrt(0.01) + sqrt(0.01))^2 = 9%

and another one with 3 validators having 4% of voting power each:
(sqrt(0.04) + sqrt(0.04) + sqrt(0.04))^2= 36%

I like how elegant the design is. I think we can put upper and lower limits on the slashing percentage. @bharvest talked about starting at 1%, which could be a good idea. I would also suggest stopping at 50% or 33% so that the edge case of 100% slashing can be avoided.
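For anyone who wants to play with the numbers, here is a small sketch of the square-root formula with a floor and a cap. The `min_slash` and `max_slash` names and their defaults (1% and 50%) are just the values floated in this thread, not agreed parameters, and the k multiplier from the ADR is left out.

```python
import math

def slash_fraction(faulting_powers, min_slash=0.01, max_slash=0.5):
    """Square-root slash for a group of co-faulting validators, clamped.

    min_slash and max_slash are illustrative values taken from the
    suggestions in this thread (1% floor, 50% cap), not agreed parameters.
    """
    raw = sum(math.sqrt(p) for p in faulting_powers) ** 2
    return min(max(raw, min_slash), max_slash)

examples = {
    "2% and 4% together": [0.02, 0.04],
    "three at 1%": [0.01, 0.01, 0.01],
    "three at 4%": [0.04, 0.04, 0.04],
    "single 0.04% validator": [0.0004],
}
for label, powers in examples.items():
    print(f"{label}: {slash_fraction(powers):.2%}")
```

The first three cases are the examples above and sit inside the bounds; the last one would be 0.04% on its own and is lifted to the 1% floor.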

1 Like

Great proposal!

For reference, this is the current design and rationale for correlated/proportional slashing in Eth2.0: https://notes.ethereum.org/@vbuterin/rkhCgQteN?type=view#Slashing-and-anti-correlation-penalties

Here it states that in Eth2.0 the plan is to slash 3*voting power of other validators that got slashed within the length of what seems to be similar to a Cosmos unbonding period (so a linear increase and a long-ish consideration period).

Intuitively, it seems to me that the square root formula might punish correlated slashings of separate validators too hard in comparison to a single high VP validator (though I agree that there should be some factor that discourages validator sybils). Maybe there could be another parameter for single validators (let’s call it j) so that slash_amount = j * power (in Ethereum’s case j appears to be 3).

I also agree with the minimum slash; otherwise it may become extremely cheap for small validators to double-sign (e.g. currently 0.04% for the 100th validator). I also agree with a maximum to avoid 100% slashing (I don’t know what a reasonable limit would be here).
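To compare the two shapes side by side, here is a quick sketch. The linear form is my reading of the Eth2.0 note (j times the total power that faulted in the period, with j = 3) and should be treated as an assumption; the function names are made up for illustration.

```python
import math

def sqrt_slash(powers):
    """Square-root form discussed earlier in the thread."""
    return sum(math.sqrt(p) for p in powers) ** 2

def linear_slash(powers, j=3):
    """Linear form: j times the total power that faulted in the period.

    j = 3 follows the reading of the Eth2.0 note given above; treat it as
    an assumption rather than a confirmed parameter.
    """
    return j * sum(powers)

scenarios = {
    "one validator at 3%": [0.03],
    "three validators at 1%": [0.01, 0.01, 0.01],
}
for label, powers in scenarios.items():
    print(f"{label}: sqrt={sqrt_slash(powers):.2%} linear={linear_slash(powers):.2%}")
```

With these numbers the linear form gives the same slash in both scenarios, so it neither rewards nor penalizes splitting, while the square-root form triples the slash when the same 3% is split across three validators that fault together.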

While I like the proposal, I have some doubts about its real effect.
As we saw in Hyung’s statistical analysis of delegators, very few of them are aware of the risks of staking all their tokens with just one validator. On average, they do not care about risks at all.
What they DO really care about, according to that great analysis, is profits.

My opinion is: as long as we do not modify the code in order to alter profits, we won’t see any substantial decentralization effects.

The proposal assumes that delegators act rationally; we have seen in practice that they do not. I am in favour of the proposal even if it does not have the intended effect, as it might at least help decentralize the stake of rational delegators. Explorers should adapt and display the slashing penalty percentage a delegator should expect.

2 Likes

@asmodat We at RNS Solutions are developing an explorer with some extra metrics at Antlia explorer.
We will add the expected slashing percentage for each validator.
Excellent proposal again from @sunnya97. Thank you.

IMO this proposal is PLACEBO :no_mouth:

I don’t think this proposal would have the effect of flattening the voting power distribution. In fact, it might have the opposite effect, and drive further consolidation of stake to a smaller number of large, well capitalized entities.

Increasing slashing penalties for larger validators favors sophisticated and well capitalized entities who can afford to build infrastructures that are very unlikely to fault. Smaller validators are less able to invest in high quality infrastructure and technical operations, and are viewed as higher risk operations. A simple example is the use of HSM based key management, which even in a simple configuration reduces the risk of a double sign by a considerable degree. Many small operators are not able to afford (or choose not to pay) the cost of the physical infrastructure required, and instead use local software signing with plain text keys on disk. Because the risk of a slashing event can be mitigated by the larger operators, even with a higher cost of a fault their relative risk of loss may be lower than that of a smaller and less sophisticated operator. It is rational to select a 1% risk of a 10% loss over a 3% risk of a 5% loss (numbers for illustration only).
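The arithmetic behind that last sentence, as a tiny sketch. It assumes a risk-neutral delegator who only compares expected loss (fault probability times slash fraction); `expected_loss` is a made-up helper, and the inputs are the illustrative numbers above, not measured values.

```python
def expected_loss(fault_probability, slash_fraction):
    """Expected fraction of stake lost: probability of a fault times its slash."""
    return fault_probability * slash_fraction

# Illustrative numbers from the post above, not measured values.
large = expected_loss(0.01, 0.10)  # 1% risk of a 10% loss
small = expected_loss(0.03, 0.05)  # 3% risk of a 5% loss

print(f"large operator: {large:.3%}")
print(f"small operator: {small:.3%}")
```

The expected loss is 0.1% of stake for the large operator against 0.15% for the small one, so the larger slash can still be the better deal when the fault probability is low enough.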

The need to introduce an anti sybil measure further disadvantages small operators. For many reasons it is difficult, perhaps impossible, to reason about the risk of correlated slashing faults between validators. Small operators are more likely to have similar infrastructure, deployed in similar configurations, with identical software stacks. Many small operators do not use hardware based key management, leaving them all vulnerable to similar risks, which will correlate across diverse cloud infrastructures. While small operators can make claims about their infrastructures and operational skills, they are less able to invest in things like 3rd party audits to verify claims. If a delegator can not confidently assess the risk of correlated faults causing larger slashing events, they will assign higher risk to smaller operators.

Finally, large operators will be better able to insure against slashing losses. As markets mature, sophisticated and well funded operators are likely to be able to acquire third party insurance against slashing, negating the increased slashing penalties. The cost structures inherent in offering financial products of this type advantage larger operators, despite larger slashing penalties. It will be more difficult for smaller operators to qualify for and afford such coverage, leaving them less able to compete. Well capitalized entities, such as centralized exchanges, can self insure and provide full guarantees against loss. Very few operators other than large centralized exchanges have the available capital to offer meaningful guarantees of this nature.

In addition to the negative consequences of the anti sybil measure, it is also unlikely to actually work. The incremental cost to large operators to split their operation into a number of smaller validators is small, even if they do so on diverse infrastructure. Many large operators already operate hardware in multiple physical locations and spread cloud based operations across multiple providers. In the context of the high operating costs of these entities, the incremental cost to split their operations would be small.

1 Like

Thanks for your work on this, @sunnya97

A few initial questions about your design choices:

  1. Are you relying upon slashing events to drive behavioural changes? This solution only appears to make delegators directly feel the negative externality of centralization in the case that their validator gets slashed.
  2. How can participants understand and manage risk, given a variable slashing rate that may change often and/or very quickly?
  3. I’m having a hard time seeing how this is Sybil-resistant. Is equivocation a correlated fault? I.e., if I’m running multiple validators and one of them double-signs, is there an increased likelihood that my other validators will do the same?

I find the different points you are raising interesting.
Given that I am working on a slashing insurance product, maybe I can answer one of them:

DeFi and products built on the blockchain can open the door for all validators to access “sophisticated products”. With the current formula, small validators would actually be advantaged: they would have a lower slashing percentage, and even if the cost of insurance is high for them compared to larger validators, they would still be able to offer insurance at a much lower cost than bigger validators could.
Take the example of the 100th validator at 0.04%: his slashing percentage would be 0.04% when slashed alone. When insuring against this risk, the gross cost will be lower for the 100th validator than for one with a 0.5% slashing percentage (taking into account that there won’t be a 10x difference between the risk price of different validators). But once the 100th validator moves to a higher position, the gross cost will increase.

So from a level 1 slashing risk perspective, I think the current solution empowers the weak ones.

For level 2 slashing (2 validators being slashed at the same time), I would consider it the small validators’ responsibility to decorrelate: once they have the means to gain more voting power, they should commit to their role and make sure they can secure the network.
Indeed, it comes with more investment, but at least these investments have a better chance of paying off in the long run than in the current situation.

1 Like

If a small validator’s fault is correlated with a large validator, as I understand it the small validator would be subjected to a large slash.

E.g. validator A has 0.1% and validator B has 10%. If they have a correlated fault:
(sqrt(0.001) + sqrt(0.1))^2 = 12.1%

This small validator just suffered a very large slash. This seems to be the opposite of the stated goal. Or am I misunderstanding?
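To put numbers on the asymmetry, here is a sketch that compares each validator's slash alone and in the correlated case. It assumes the same square-root form as above; `correlated_slash` is a hypothetical helper, not code from the ADR.

```python
import math

def correlated_slash(powers):
    """Square-root form from this thread: (sum of sqrt(power))^2."""
    return sum(math.sqrt(p) for p in powers) ** 2

small, large = 0.001, 0.10  # validator A at 0.1%, validator B at 10%
joint = correlated_slash([small, large])

for name, power in (("A (0.1%)", small), ("B (10%)", large)):
    alone = correlated_slash([power])
    print(f"validator {name}: alone {alone:.2%}, correlated {joint:.2%}, "
          f"x{joint / alone:.1f}")
```

Validator A's slash goes up by a factor of about 121, while validator B's goes up by a factor of about 1.21.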

:smiley:
Love the way you think. Every time I am on this forum I get so excited about the COSMOS network.

1 Like

Yes, this is true, and this math should force small validators to decorrelate.

Basically if this proposal passes and both delegators and validators don’t take it into account (meaning delegators don’t start redelegating to smaller validators and small validators don’t decorrelate) then a case like this might happen.
But I get your point regarding the small validator moving from a very low slashing percentage to a very high one while the big validator sees a smaller change.

1 Like

Hey @sunnya97, curious to know where you’re seeing the Gini coefficient of validator voting power.
I just went through my records, and I’m seeing somewhat the opposite, i.e. consensus power appears to have decentralized a bit.

I’ve shown the Lorenz curve for April, July, August, September, and now October. Let me know if you’d like to see more in-depth data. Here are the table summaries:

[images not preserved: Lorenz curves and table summaries]

As an aside, it appears that only Sikka’s power has been increasing steadily and rapidly. Using gov power charts because gov power is equivalent to consensus power:
[images not preserved: gov power charts]

2 Likes

Basically if this proposal passes and both delegators and validators don’t take it into account (meaning delegators don’t start redelegating to smaller validators and small validators don’t decorrelate) then a case like this might happen.

I don’t think it’s been shown that small validators can effectively decorrelate, and even if they could, they have no way to reliably signal this to delegators. Since it is impossible to accurately estimate correlation risk, a rational delegator will evaluate the slashing risk as though validators are highly correlated.

But I get your point regarding the small validator moving from a very low slashing percentage to a very high one while the big validator sees a smaller change.

Given this, the proposal fails to achieve its stated intent.

2 Likes

I have to agree with @mattharrop here. It’s difficult not only to decorrelate but also to signal and externalize this. Amongst smaller validators, there are only so many varying infrastructure setups and cloud providers.

Great point. Admittedly, I didn’t actually calculate the Gini change over time. I guess I meant more that there is a high Gini coefficient, rather than necessarily an increasing one. I removed reference to Gini coefficient altogether from the proposal.

1 Like