Eliminate the downtime slash and reduce downtime jail

Logic and impact analysis

Fully eliminate downtime slashes

In order to protect liveness, what we should be aiming for is to remove inactive validators from the set (jail them). The slash really just complicates things in the end and does not seem to have a meaningful effect on network liveness, with Osmosis being a good example.

Reduce signed_blocks_window to 2500

The current SignedBlocksWindow is too long and can be used in time-based exploits of the Cosmos Hub. Attackers know that they have 10000 blocks in which to work toward a halt. If a chain is in a degraded state, blocks are produced more slowly, extending the duration of this window. In the worst-case scenario, the window is extended indefinitely because the attacker has taken more than 33% of voting power offline.
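
As a rough illustration of how a degraded chain stretches the window in wall-clock terms, here is a minimal sketch. The 6-second block time and the 4x slowdown are assumptions chosen for illustration, not measured Hub values.

```python
ASSUMED_BLOCK_TIME_S = 6.0  # assumption: average block time on a healthy chain


def window_hours(signed_blocks_window: int, block_time_s: float) -> float:
    """Wall-clock length of the signing window, in hours, at a given block time."""
    return signed_blocks_window * block_time_s / 3600


for blocks in (10000, 2500):
    healthy = window_hours(blocks, ASSUMED_BLOCK_TIME_S)
    # Hypothetical degraded state: blocks arrive four times more slowly.
    degraded = window_hours(blocks, 4 * ASSUMED_BLOCK_TIME_S)
    print(f"{blocks} blocks: {healthy:.1f} h healthy, {degraded:.1f} h degraded")
```

Because the window is counted in blocks rather than in time, any slowdown in block production lengthens it proportionally, and a full halt makes it unbounded.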

Reduce downtime_jail_duration to 30 seconds

Even 10 minutes is actually too long for a solution with maximum resilience. We want voting power up, provided that it is signing and in consensus with the chain. It should always, or almost always, take longer than 30 seconds to do an unjail tx anyhow.

Parameters

current:

{
  "signed_blocks_window": "10000",
  "min_signed_per_window": "0.050000000000000000",
  "downtime_jail_duration": "600s",
  "slash_fraction_double_sign": "0.050000000000000000",
  "slash_fraction_downtime": "0.000100000000000000"
}

new:

{
  "signed_blocks_window": "10000",
  "min_signed_per_window": "0.050000000000000000",
  "downtime_jail_duration": "30s",
  "slash_fraction_double_sign": "0.050000000000000000",
  "slash_fraction_downtime": "0.000000000000000000"
}

Edit log:

  • reduce slash_fraction_downtime to zero 10/15/2023
  • add bits on downtime jail duration 10/15/2023
  • correct amusing typographical error 10/17/2023
  • reduce to only eliminating the downtime slash to ensure it is a clear vote 10/17/2023
  • create a new governance proposal with changes to signed_blocks_window and min_signed_per_window 10/17/2023

I am in favor of reducing the downtime slash to zero.

I believe reducing the signed_blocks_window to 2500, approximately 4 hours, might exert excessive pressure on certain teams. Mistakes are more likely to occur under heightened time pressure, especially when unexpected issues arise.

I would like to hear an in-depth explanation of the time-based exploits you are talking about, though.

So that’s actually why I want to reduce the slash to zero.

I don’t want that pressure, which we all know has caused double signs.

Do you think that I should modify the proposal to reduce the slash to zero?

The key thing is to improve resilience when some validators aren’t signing. If they are quickly removed from the set, and can quickly rejoin, no big deal. Like you, I believe that the slash here has had unintended negative consequences.

@Flo

I have modified the proposal based on your feedback

Thank you for modifying it, and I understand your point. Validators might still be pressured because they fear losing delegations if they get jailed. I am probably too concerned, so other opinions on this topic would be appreciated.

A maintenance mode for operators would probably help with what you described. If you have a problem or planned maintenance, you switch to maintenance mode and don’t have to sign while you are in it (it needs a time limit), but you won’t leave the active set either.

I do not think that decreasing the blocks window is a good idea. For a lot of validators and delegators, getting jailed is scary not because it burns some delegators’ tokens, but because it leaves a mark on a validator’s reputation and leaves the validator’s delegators without staking rewards for some time. 4 hours IMO is way too short for maintenance, or for a situation such as a datacenter outage. This would only lead to way more validators getting jailed, and I do not think that this is any good or that it would solve any of the problems.

I do agree with the rest of the points (or rather, I have nothing to say against them).

Proposal-wise, I suggest not combining all of these points in a single proposal, as I am quite sure there would be some people (myself included) who would support some of these points but disagree with others (as for myself, if the proposal comes up on chain with all of these topics combined, I’d vote against, despite supporting some of the points).

Also, on a slightly different topic, but related to slashing/blocks window: @jacobgadikian have you investigated the possibility of increasing the blocks window, but also increasing the min_signed_per_window? For example, having a 20k blocks window but 0.5 min_signed_per_window would leave validators the same 10k blocks/16h window to fix their nodes, but would jail validators with bad performance more often.
I haven’t investigated the consequences of having something similar on the Hub, but at first sight it seems like a nice idea.

If we did as you describe, it would increase the likelihood of a chain halt, since chain halts are produced by ensuring that 1/3rd are not signing.

So I prefer that we eliminate the slash (remember the proposal is now to eliminate the downtime slash fully) and reduce the signed_blocks_window to like 4 hours. Because then validators who cannot contribute to consensus, aren’t needed to make consensus happen.
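
For readers following the halt argument, here is a minimal sketch of the threshold involved: a Tendermint-style chain commits a block only when strictly more than 2/3 of the active set's voting power signs. The voting power figures below are hypothetical round numbers, not Hub data.

```python
def can_commit(signing_power: int, total_active_power: int) -> bool:
    """True when strictly more than 2/3 of active voting power is signing."""
    return 3 * signing_power > 2 * total_active_power


TOTAL = 100  # hypothetical total voting power of the active set

# 30% offline: the chain still commits, but with very little margin left.
print(can_commit(TOTAL - 30, TOTAL))  # True

# 34% offline: the chain halts, so the jailing window stops advancing too.
print(can_commit(TOTAL - 34, TOTAL))  # False

# Once the offline 30% are jailed they leave the active set, so the
# remaining validators hold all of the active voting power again.
print(can_commit(TOTAL - 30, TOTAL - 30))  # True
```

The last case is the point being made here: validators that are removed from the set quickly are no longer counted against the 2/3 threshold.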

If we did as you describe, it would increase the likelihood of a chain halt, since chain halts are produced by ensuring that 1/3rd are not signing.

Please elaborate?
Imagine the following situation: now we have 10k blocks window and min_signed_per_window = 0.05 (so a validator can skip 9500 blocks in a row without going to jail, so around 16 hours of acceptable downtime).
Now imagine if we have 0.5 min_signed_per_window and 19k blocks window.
This way any validator still has 9500 blocks of acceptable downtime, but now if a validator has really bad hardware and skips, let’s say, 70% of all blocks, they are going to get jailed, compared to what we have now. This can be made even better if we increase both min_signed_per_window and blocks window even further.
I do not see any downsides to that: it still allows validators quite an extended downtime, while punishing those validators who are running on bad hardware and making the chain suffer. What do you think?
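
The numbers above can be checked with a small sketch. It uses a simplified model in which a validator is jailed once it has missed more than window × (1 − min_signed_per_window) blocks within the window; the function names are made up for illustration, and the real slashing module tracks misses in a sliding window with more detail than this.

```python
from decimal import Decimal


def max_missed_blocks(signed_blocks_window: int, min_signed_per_window: str) -> int:
    """Blocks a validator may miss within one window before jailing (simplified)."""
    min_signed = int((Decimal(min_signed_per_window) * signed_blocks_window).to_integral_value())
    return signed_blocks_window - min_signed


def would_be_jailed(signed_blocks_window: int, min_signed_per_window: str, skip_rate: str) -> bool:
    """Steady-state check for a validator that misses a fixed share of all blocks."""
    missed = int(Decimal(skip_rate) * signed_blocks_window)
    return missed > max_missed_blocks(signed_blocks_window, min_signed_per_window)


current = (10000, "0.05")
suggested = (19000, "0.5")

for name, (window, min_signed) in (("current", current), ("suggested", suggested)):
    budget = max_missed_blocks(window, min_signed)
    jailed = would_be_jailed(window, min_signed, "0.7")
    print(f"{name}: may miss {budget} blocks, 70% skipper jailed: {jailed}")
```

Both settings leave the same budget of 9500 missed blocks, but only the suggested one jails a validator that steadily skips 70% of blocks.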

So I prefer that we eliminate the slash (remember the proposal is now to eliminate the downtime slash fully) and reduce the signed_blocks_window to like 4 hours.

As I was saying above, I agree with the first part and disagree with the second part. And as I was also saying above, I strongly suggest making it two different proposals, as validators can support one point here but reject another, making it unnecessarily complicated to decide what to choose if there’s a combined proposal.

:clap:

Truly excellent idea and truly excellent explanation. I’ve been working about 16 hours a day since amulet did the whole publishing against my will thing, but your concept is better and I’ll get it in tomorrow.

About separate proposals, do you mean:

  1. adjust downtime slashing to zero
  2. adjust signed_blocks_window and min_signed_per_window?

I think I am basically done w/thinking for today. Great explanation.

About separate proposals, do you mean:

  1. adjust downtime slashing to zero
  2. adjust signed_blocks_window and min_signed_per_window?

Sounds nice. One thing: I think that we need way more validators to chime in here though to provide their thoughts on that, before even considering putting it on chain, as for now it’s only 3 people discussing the topic that could be quite life-changing for the whole active set.

Ah so, this will be here for a week and then I intend to put it on chain.

We can’t help that they choose not to participate and it’s not hard to check the forum.

@lexa and the hypha crew even made sure there’s an app for mobile phones. Validators should install it.

At some point, I imagine that delegators will figure that kind of thing out and those of us who actually discuss things here and in other places will rise up.

The silent validators’ decision to stay silent and not participate must not stop the hub from moving forward, or worse, place it at risk of halting.

I kinda like the idea of reducing the downtime slash to 0. A slashing event leaves a mark on the validator either way, and punishing delegators always sounded like “too much” for the case.
I also support @freak12techno’s vision of raising the % of signed blocks along with raising signed_blocks_window.

100 that should go on chain separately. 1st prop - slash fraction downtime elimination; 2nd - window, min % of signed and jail duration.

The only thing I want to mention: I’d rather stick to ~8-10 hrs of allowable continuous downtime, as it’s an optimal time for a human to sleep, so if something happens to your node just after you go to sleep, you would still be able to wake up and do something before you get slashed. (Speaking as a person from a smaller team with 1-2 admins.)

so my vision on slashing conditions would be

"signed_blocks_window": "12000",
"min_signed_per_window": "0.500000000000000000"

That would give around 10 hrs of continuous downtime, assuming that you signed everything before “something” happened to your node(s).
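
A quick sketch of where the ~10 hrs comes from, assuming an average block time of about 6 seconds (an assumption, not a measured Hub figure) and the simplified rule that a validator may miss window × (1 − min_signed_per_window) blocks:

```python
ASSUMED_BLOCK_TIME_S = 6.0  # assumption: average block time in seconds


def allowed_downtime_hours(signed_blocks_window: int,
                           min_signed_per_window: float,
                           block_time_s: float = ASSUMED_BLOCK_TIME_S) -> float:
    """Continuous downtime budget in hours for a validator that signed everything before."""
    max_missed = signed_blocks_window - round(signed_blocks_window * min_signed_per_window)
    return max_missed * block_time_s / 3600


print(allowed_downtime_hours(12000, 0.5))    # settings suggested in this post
print(allowed_downtime_hours(10000, 0.05))   # current settings
```

With these assumptions the suggested settings allow 6000 missed blocks, which is 10 hours, against roughly 16 hours today.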

I’d say 8h is still too short. Imagine you go to sleep at 00:00 and wake up at 08:00, and your node goes down at, let’s say, 00:05. You wake up at 08:00, see that your node is down, and only have 5 minutes to avoid getting jailed. It’s almost physically impossible to sync a node’s 8 hours of lag in 5 minutes or move to another server, so you get jailed anyway.
IMO it should be at least 12h, so a validator can be more relaxed about getting their node fixed, as it’s crucial to avoid making any mistakes here (which can sometimes lead to a double sign, which is extra hurtful for a validator and for the chain in general).
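
The scenario above as a small worked example. The dates are arbitrary placeholders; only the times of day matter.

```python
from datetime import datetime, timedelta


def time_left_to_recover(outage_start: datetime,
                         noticed_at: datetime,
                         allowed_downtime: timedelta) -> timedelta:
    """Time remaining before jailing once the operator notices the outage."""
    return outage_start + allowed_downtime - noticed_at


outage_start = datetime(2023, 10, 18, 0, 5)  # node goes down at 00:05
noticed_at = datetime(2023, 10, 18, 8, 0)    # operator wakes up at 08:00

print(time_left_to_recover(outage_start, noticed_at, timedelta(hours=8)))
print(time_left_to_recover(outage_start, noticed_at, timedelta(hours=12)))
```

With an 8h budget the operator has 5 minutes left; with 12h there are 4 hours and 5 minutes to resync or move servers.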

100 that should go on chain separately. 1st prop - slash fraction downtime elimination; 2nd - window, min % of signed and jail duration.

Actually jail duration is IMO not connected directly to the other points, so I also suggest having it changed in a different proposal, as well as slashing factor. min_signed_per_window and blocks_window should go together though, as they both affect the validator’s allowed downtime. So ideally I’d love to see the following proposals on chain:

  1. slash_fraction_downtime to zero
  2. downtime_jail_duration to 30s
  3. min_signed_per_window and blocks_window as agreed upon in either this or a different thread.

Ok man, you’re good at this I’ll keep it up…

Downtime jail prop to forum SoonTM

Thanks for developing the idea with the human nature of operators in mind, I kinda agree with you on that direction.

The only thing I’m thinking of now is that Jacob’s original idea of making the chain more resilient to halts and more competitive to validate becomes kinda neglected if we just turn off punishment for downtime slashing and lower allowable downtime from 16 to only 12 hrs… Of course there is still a change in %-signed which kinda balances that assumption, but I still have a feeling that something is not aligned with the original intention :thinking:

As for separating the props and changes: I’d say changing downtime jail duration in a separate prop is more about putting on a good face without the need; it will just double the amount of deposit needed. But whatever.

As for separating the props and changes: I’d say changing downtime jail duration in a separate prop is more about putting on a good face without the need; it will just double the amount of deposit needed. But whatever.

I have a feeling that these two (downtime_jail_duration and slash_fraction_downtime) are not tied together that much, unlike min_signed_per_window + blocks_window, and people may vote differently on them. If they agree with one and not the other, having these params changed in the same proposal would make them either vote yes (and accept the change they do not want), vote no (and reject the change they do want), or abstain (and not contribute to the tally of either). A real-life non-crypto example: in 2020 we had a vote in my country on changing the main law of the country, with both really good changes and really bad ones removing all freedom of speech and other things. People voted yes for the good things, eventually accepting it and also accepting all the bad parts. This is something I’d love to avoid here, and splitting them into different proposals would make it fairer.

Know how you feel bro, agree, should go separate.
What do you think of the other point, where we are kinda losing the original intention with such mild adjustments?

Tbh I do not see a problem here; the number of validators (or rather the share of voting power) that needs to go down for a chain to degrade or halt is quite big. Almost all the chains I am validating have 16h or more of allowed downtime (the only exception is Decentr with 8h) and they work quite okay.
I’ve seen an issue, though, where some validators (especially on chains that are resource intensive, like the Hub) constantly skip a lot of blocks and make the chain slower. What I suggested should jail these validators more often, so they’d be more motivated to move to better hardware.

Also, I’d rather take small steps towards it, as the changes initially proposed are going to influence the validator experience in quite a big way. I’d say what I am proposing (increasing both params so the allowed downtime stays about the same) is quite a no-brainer, as it changes almost nothing for validators except that it punishes those who are validating on bad hardware. So I suggest doing this and seeing later how it influences the active set, whether the initial problem is still there, and whether we want to make it even harder for validators and if it’s worth it.

Will there be another thread for min_signed_per_window and blocks_window? Or should we continue to discuss here?

We have this one: Adjust min_signed_per_window to 80% - #2 by freak12techno, we can follow up there I guess. I don’t mind either staying here or moving to that one or having another thread.

Reducing the allowed downtime is a terrible idea. For example, we are in the process of fully moving to bare metal. It makes 0 economic sense to reduce downtime at this stage. Those measures are economic. They are free market tools to regulate competition and promote good work. At this stage the Hub does not need it IMO. Easy proof: the number of nodes outside the active set. In other words, the demand to become a Hub validator is higher than the supply available, even with these parameters.
