Slashing logic for vote signing

I think I learned something today, so I’ll share here. Please tell me if I got it wrong :)

The absent vote signing slashing logic behaves differently than I would have guessed. You are first evaluated on a fixed window, and then it switches to a rolling window. I think this is because of a tradeoff in the implementation, which uses a bit array and increments or decrements a single counter per validator.

Vote signing slashing is triggered by missing a percentage of the votes in a total window.

So if the total window is 40,000 votes, you get slashed if you miss 50% of 40,000 (20,000 votes missed). However, the implementation is such that if you miss the first 20,000 votes, you are not slashed until vote 40,000 occurs.

After the initial 40,000 fixed window, then you are evaluated on a rolling window basis. So you could get slashed at 40,001. You do not have to wait until 80,000 to get evaluated again.

The slashing logic simplifies down to:

if num_votes < min_required_votes then slash

Where min_required_votes = 20,000. Link to real code below.

So you can only do that comparison after the initial fixed window has elapsed, but then you can just keep it rolling.

So I think slashing starts with a fixed window and then switches to a rolling window. Due to the implementation, there is no short-circuit logic in the initial fixed window.
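To make the fixed-then-rolling behavior concrete, here is a minimal Python sketch of the bit array plus counter approach as I understand it. It is not the real code: the names (`SigningInfo`, `record_vote`, `first_slash_height`) are made up for illustration, the window is shrunk to 10 votes with a minimum of 5 so the numbers are easy to follow, and the strict `height > min_height` guard is my assumption about how the initial window is enforced.

```python
from dataclasses import dataclass, field


@dataclass
class SigningInfo:
    """Per-validator state: one bit per vote in the window plus one counter."""
    window: int = 10          # stands in for the 40,000-vote window
    min_signed: int = 5       # stands in for min_required_votes (20,000)
    start_height: int = 0     # height at which tracking began
    index: int = 0            # number of votes recorded so far
    signed_count: int = 0     # signed votes currently inside the window
    bits: list = field(default_factory=list)

    def __post_init__(self):
        if not self.bits:
            self.bits = [False] * self.window


def record_vote(info, height, signed):
    """Record one vote; return True if the slashing condition holds."""
    slot = info.index % info.window
    info.index += 1
    previous = info.bits[slot]
    # Only touch the counter when the overwritten bit actually changes.
    if previous and not signed:
        info.signed_count -= 1
    elif signed and not previous:
        info.signed_count += 1
    info.bits[slot] = signed
    # Assumed guard: no slashing until one full window has elapsed.
    min_height = info.start_height + info.window
    return height > min_height and info.signed_count < info.min_signed


def first_slash_height(votes):
    """Feed votes for heights 1, 2, ... and return the first slashing height."""
    info = SigningInfo()
    for height, signed in enumerate(votes, start=1):
        if record_vote(info, height, signed):
            return height
    return None


# Missing everything from the start: nothing happens until the window is full.
print(first_slash_height([False] * 30))
# Signing the whole first window, then going offline: slashed part-way through
# the second window, without waiting for it to complete.
print(first_slash_height([True] * 10 + [False] * 30))
```

With these toy parameters the first call prints 11 (just after the initial window closes) and the second prints 16 (rolling behavior, no need to wait for height 20).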

Did I get it right?


Thanks for sharing!


I think this analysis is right - I’m not a dev, so may be reading it wrong. However it does open an attack vector - which can be used once - that could be exploited by adversarial validators…

It seems that the slashing is applied to the stakes in place at block 40,000. Therefore there is a window between blocks 20,001 and 39,999 where validators could remove their own Atoms and incentivise unaware others to add theirs in place… resulting in innocents having their Atoms slashed.

For this to happen, the unbonding time of delegated Atoms would need to be shorter than the time it takes to complete blocks 20,001 to 39,999. In some cases it would be obvious that the slashing will occur - 0% uptime since block 1 - but in more marginal cases it could be very difficult to tell…
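The timing condition can be written out as a quick calculation. This is only a sketch: the function names are mine, and the block time and unbonding periods below are hypothetical numbers picked for illustration, not actual network parameters.

```python
def exit_window_seconds(block_time_s, first_block=20_001, last_block=39_999):
    """Wall-clock length of the window in which stake could be withdrawn."""
    return (last_block - first_block + 1) * block_time_s


def can_exit_before_slash(unbonding_s, block_time_s):
    """True if unbonding would complete before the slash at block 40,000."""
    return unbonding_s < exit_window_seconds(block_time_s)


# Hypothetical values: 5-second blocks, unbonding periods of 1 day and 21 days.
DAY = 24 * 60 * 60
print(exit_window_seconds(5))               # 99995 seconds, a bit over a day
print(can_exit_before_slash(1 * DAY, 5))    # True
print(can_exit_before_slash(21 * DAY, 5))   # False
```

So under these made-up numbers the exit only works if the unbonding period is very short compared to the window.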

Edge case maybe, but likely an unintended consequence of the fixed initial window. (Sorry if technical terms are used incorrectly, still learning.)


I’ve been thinking that we should probably use a window based on the blockHeader time delta rather than a block Height based window for deciding when to slash…

Interesting point. As far as I understand, the reason this logic only begins to apply after minHeight is that before that height you have not been signing whatsoever. This means that if we simply adapted the logic to drop the height > minHeight condition, the validator would always get slashed immediately after bonding (at their first block as a validator they will not have met the minimum votes, because they just started!). One way to mitigate this would be to perform a different calculation for new validators at different increments (i.e. percentage calculations). The major downside is that this calculation is extremely sensitive to the number of computations, as it must be run once per block PER validator (i.e. 100 times per block).
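A rough Python sketch of both points, using the 40,000 / 20,000 numbers from this thread. The function names and the `check_every` parameter are hypothetical; the pro-rated check is just one way the "percentage at different increments" idea could look, not something that exists in the code.

```python
WINDOW = 40_000
MIN_SIGNED = 20_000


def slash_without_guard(signed_count):
    """Threshold check with the height guard removed."""
    return signed_count < MIN_SIGNED


def slash_proportional(signed_count, blocks_seen, check_every=1_000):
    """Hypothetical alternative: compare against a pro-rated threshold,
    evaluated only at fixed increments while the first window is filling."""
    seen = min(blocks_seen, WINDOW)
    if seen < WINDOW and seen % check_every != 0:
        return False
    # Integer form of: signed_count / seen < MIN_SIGNED / WINDOW
    return signed_count * WINDOW < MIN_SIGNED * seen


# A brand-new validator that signed its first and only block so far:
print(slash_without_guard(1))          # True: slashed right after bonding
print(slash_proportional(1, 1))        # False: not at a checkpoint yet
# At the first 1,000-block checkpoint:
print(slash_proportional(400, 1_000))  # True: 40% signed is below 50%
print(slash_proportional(600, 1_000))  # False: 60% signed is above 50%
```

Once a full window has been seen, the pro-rated check reduces to the plain `signed_count < MIN_SIGNED` comparison, but it costs an extra multiplication and branch per validator per block.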

I ultimately don’t think this is an attack vector, because we’ve discussed that fee rewards/provisions should only begin to accrue after this liveliness period (40,000 blocks in this example). That means that if you were not signing and then unbonded your tokens, you would not be at any advantage, because you would not have been able to collect any rewards in that period.


Essentially, yes. The desired semantic is that a validator which has failed to sign 20,000 out of the last 40,000 blocks which it ought to have signed is slashed.

When there haven’t yet been 40,000 blocks, this is underdefined, and presently we elect to do nothing - to always wait until at least height 40,000 - before slashing anyone. As you identify, there is an unusual edge case where a validator who fails to sign the first 20,000 blocks (or any 20,000 before block 40,000) will not be slashed until block 40,000, even though we know at the point 20,000 are missed that they will have fulfilled the criteria.

The logic is implemented in this way so that unrevoking a validator doesn’t require clearing the signature array (an expensive operation). When a validator is unrevoked and again eligible to sign blocks, we don’t want them to be immediately again slashed and unbonded (which they would be if we did nothing, since they must have missed signatures to have been unrevoked in the first place), so we reset StartHeight - meaning that only after another 40,000 blocks have passed and all their array values have been overwritten will they be again possibly slashed for downtime.
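As an illustration of why resetting StartHeight is enough, here is a simplified Python model (not the actual implementation). The helper names are hypothetical, the window is shrunk to 10 blocks with a minimum of 5, and the `reset_start` flag exists only to compare the two behaviors.

```python
WINDOW = 10       # stands in for 40,000
MIN_SIGNED = 5    # stands in for 20,000


class SigningInfo:
    def __init__(self, start_height=0):
        self.start_height = start_height
        self.index = 0
        self.signed_count = 0
        self.bits = [False] * WINDOW


def handle_signature(info, height, signed):
    """Update the array and counter; return True if the validator is slashed."""
    slot = info.index % WINDOW
    info.index += 1
    if info.bits[slot] != signed:
        info.signed_count += 1 if signed else -1
        info.bits[slot] = signed
    return height > info.start_height + WINDOW and info.signed_count < MIN_SIGNED


def unrevoke(info, height, reset_start=True):
    """Re-admit the validator. The bit array and counter are left untouched."""
    if reset_start:
        info.start_height = height


def next_slash_after_unrevoke(reset_start):
    info = SigningInfo()
    height = 0
    # Offline from the start until the first slash.
    while True:
        height += 1
        if handle_signature(info, height, signed=False):
            break
    unrevoke(info, height, reset_start)
    # Back online and signing every block from here on.
    for height in range(height + 1, height + 3 * WINDOW):
        if handle_signature(info, height, signed=True):
            return height
    return None


print(next_slash_after_unrevoke(reset_start=True))   # None
print(next_slash_after_unrevoke(reset_start=False))  # 12
```

Without the reset, the stale missed-block bits get the validator slashed again on its very first block back (height 12 in this toy run), even though it signed that block. With the reset, the stale bits are overwritten one by one before the check can fire again.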

It would still be possible to implement a special check for this edge case: checking the absolute height, and slashing the validator if they have failed to sign 20,000 blocks prior to block 40,000. Doing so, however, would add a small amount of overhead to slashing checks in all future blocks, which is not ideal.
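Such a check could look roughly like the sketch below. The function name and signature are hypothetical. One caveat: if the comparison is the strict "fewer than 20,000 signed", the outcome only becomes certain at the 20,001st missed block rather than the 20,000th; which boundary applies depends on that assumption.

```python
def slash_is_inevitable(blocks_seen, signed_count,
                        window=40_000, min_signed=20_000):
    """True once the validator can no longer reach min_signed within the
    first window, even by signing every remaining block."""
    remaining = max(window - blocks_seen, 0)
    return signed_count + remaining < min_signed


print(slash_is_inevitable(20_000, 0))       # False: could still reach exactly 20,000
print(slash_is_inevitable(20_001, 0))       # True: at most 19,999 signatures possible
print(slash_is_inevitable(30_000, 10_001))  # False: 20,001 still reachable
```

The extra subtraction and comparison would run for every validator in every block, which is the overhead mentioned above.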

Although possible, I think this is quite unlikely - and the slashing-fraction-for-downtime may be zero initially anyways; missing out on rewards is already a penalty.

Still, I agree that this merits discussion. Perhaps one option would be to implement an explicit “grace period” for some number of blocks where validators are only jailed & unbonded, not slashed, for downtime.
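To sketch what such a grace period might mean in practice (purely illustrative: `downtime_penalty`, `grace_end` and the 1% fraction are hypothetical and nothing like this exists yet):

```python
from fractions import Fraction


def downtime_penalty(stake, height, grace_end, slash_fraction):
    """Return (jailed, remaining_stake) for a validator that tripped the
    downtime check at the given height."""
    jailed = True                      # jailing applies in both cases
    if height <= grace_end:
        return jailed, stake           # inside the grace period: no slash
    return jailed, stake - int(stake * slash_fraction)


# Hypothetical numbers: grace period ends at block 500, 1% downtime slash.
print(downtime_penalty(1000, 100, 500, Fraction(1, 100)))  # (True, 1000)
print(downtime_penalty(1000, 501, 500, Fraction(1, 100)))  # (True, 990)
```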

Reading through your post has also led me to think of some other edge cases with validators unrevoking, being pushed out of the validator set, and then being slashed when they come back in, which I do think we need to address… followup coming soon.


This discussion makes me realize I don’t yet grok some key terms related to Validator events:

  • Unbond
  • Revoke/Unrevoke
  • Jail
  • Slash

My current guesses: Slashing seems like a penalty, but doesn’t directly affect your status as a Validator. Slashing may happen in conjunction with an unbond/revoke/jail which use similar logic.

Is there a guide for these Validator events? Are there other events?

Slashing is a penalty to your bonded atoms and, in the current implementation, always occurs together with a revoke transaction. Once revoked, you can think of yourself as “jailed”: you cannot simply send an unrevoke transaction the very next block and have it go through. You must wait a “jailed-period”, counted from the date of being revoked, before being let back in via an unrevoke transaction.
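In code terms, the jail check amounts to something like the following sketch. The function name is hypothetical and the one-day period is an assumed value for illustration only.

```python
from datetime import datetime, timedelta


def can_unrevoke(revoked_at, now, jail_period):
    """An unrevoke only goes through once the jailed-period has elapsed."""
    return now - revoked_at >= jail_period


revoked_at = datetime(2018, 6, 1, 12, 0)
jail_period = timedelta(days=1)   # assumed length, for illustration only
print(can_unrevoke(revoked_at, revoked_at + timedelta(seconds=5), jail_period))  # False
print(can_unrevoke(revoked_at, revoked_at + timedelta(days=1), jail_period))     # True
```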


From my experience, unbonding means the delegated tokens are not providing voting power to the validator. Slashing is the penalty when a validator has done something wrong. You may be slashed but not revoked. For example, if slashing happens because of double signing, the validator may not be revoked. Another example is slashing over continuously missed blocks: for every 1000 blocks a validator misses, it is slashed by a certain percentage, and it is revoked once it has missed 10000 blocks.

@ajc See explanation of various states here.

Reading through your post has also led me to think of some other edge cases with validators unrevoking, being pushed out of the validator set, and then being slashed when they come back in, which I do think we need to address… followup coming soon.

Follow-up here.

Sorry for the split discussion on GitHub and the forums; we haven’t yet delineated them as clearly as we’d like. Let’s keep this on GitHub for now, unless you have a proposal to radically change the slashing semantics.