A formalized process for issuing **emergency** Cosmos SDK software updates

Hi again, Cosmos community! Figment Networks is looking for community consideration and feedback on our draft governance proposal for a formalized process for issuing emergency Cosmos SDK software updates.
Thanks again! :hugs:

As I understand it, this proposal only concerns emergency upgrades, which are described as cases where a chain halt occurs. I think a clear distinction between “critical” and “emergency” is required here, since the two terms can be read as interchangeable (“critical” is covered in this other draft proposal - A formalized process for issuing **critical** Cosmos SDK software updates).

I agree that this edge case should be pre-empted.

AiB may not be the first organisation to find a solution if such a case occurs, so I think the proposal should explicitly place AiB as the gatekeeper for vetting and issuing any upgrades that are required in such a case.

I am not sure where AiB stands with regard to holding this responsibility.

Another option here is to set up a cross-community “emergency team” that is delegated the vetting and issuance of such upgrades. The discussion would then revolve around the election, remit and precise powers of such a team. One could argue that the set of validators already is this team. Even so, a proposal with a clear definition could be devised, and delegators could vote for or against it accordingly.

This proposal and the other AiB-specific one seem to have a lot of downside and little upside.

Downsides:

  • Public perception of centralization
  • Possible legal issues for AiB?
  • Unhealthy reliance on AiB

Is this really necessary? Presumably, in an emergency or critical scenario, the top validators would indeed choose to implement a patch from AiB if it solved the problem. A consensus would emerge around this. Or not. Maybe a better option would be written by someone else. We shouldn’t be reliant on AiB.

Chain halts are an interesting subset of emergencies. But even in this case, the problem is better solved by making sure there is a good channel for validators to signal their choices based on last voting power, independent of new blocks being produced.
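To make the signaling idea concrete, here is a minimal sketch of how such a channel could tally choices against the last known voting power, without needing any new blocks. Everything in it is an assumption for illustration: the `Signal` type, the `tally_signals` function, the validator names and the patch labels are hypothetical, and the signals are assumed to be already authenticated (signature checking is out of scope). The more-than-two-thirds threshold is borrowed from Tendermint's commit rule, but nothing in this thread fixes that number.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Signal:
    validator: str  # hypothetical validator identifier
    choice: str     # e.g. a label or hash for the patch the validator backs


def tally_signals(last_power, signals, threshold=Fraction(2, 3)):
    """Weight each validator's latest signal by its last known voting power.

    last_power: dict mapping validator -> voting power at the last block.
    signals:    iterable of Signal, assumed already authenticated.
    Returns (weights per choice, winning choice or None).
    """
    total = sum(last_power.values())
    if total == 0:
        return {}, None

    # Keep only the most recent signal per known validator.
    latest = {}
    for s in signals:
        if s.validator in last_power:
            latest[s.validator] = s.choice

    # Sum voting power behind each choice.
    weights = {}
    for validator, choice in latest.items():
        weights[choice] = weights.get(choice, 0) + last_power[validator]

    # A choice wins only with strictly more than `threshold` of total power.
    winner = None
    for choice, weight in weights.items():
        if Fraction(weight, total) > threshold:
            winner = choice
    return weights, winner


# Hypothetical example data.
last_power = {"val1": 40, "val2": 35, "val3": 25}
signals = [
    Signal("val1", "patch-a"),
    Signal("val2", "patch-a"),
    Signal("val3", "patch-b"),
    Signal("stranger", "patch-b"),  # not in the last validator set, ignored
]
weights, winner = tally_signals(last_power, signals)
print(weights, winner)
```

Exact fractions are used so that a choice sitting at exactly two thirds does not pass because of floating-point rounding.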

What are the exact lemmas in the liveness guarantee for Tendermint?

This article gives a reasonable rundown of how a BFT protocol deals with the liveness problem: in contrast to Nakamoto consensus, BFT has a strong guarantee of safety with a weak guarantee of liveness.

However, given the 100 nodes starting out in the network, a total network stop effectively requires every single one to go offline. If there are two left, it will stay live; it just won’t process transactions very fast. I am unsure what curve this follows; I’d suspect it’s logarithmic, or maybe even factorial. What this means is that long before the whole system has a liveness failure, it will get dramatically and alarmingly slower, but it won’t just grind to a halt unless the whole internet somehow magically turns off.

The odds of this are at least under 1:10! (factorial), based on the number of large trunks connecting the continents and the number of satellite links, should the scenario be, for example, a solar flare the likes of which even God has not seen.

Besides all else, the hub grinding to a snail’s pace won’t necessarily kill the network anyway, especially if there are reasonably sized hubs connected to it. To further mitigate this problem, they can also interconnect.

My guess is that as the ecosystem develops, the hub will not be a central point of failure in the long run anyway. Once enough money is wrapped up in it, there will probably be multiple intersecting hubs, all running on slightly different or even fairly different codebases, since they only have to conform to IBC and the other requirements for interoperability between chains, the hub and other hubs.

Hey Jehan, thanks for the feedback.

We’d like to establish advance directives to use the governance process for any changes to the Cosmos Hub. We don’t want to be prescriptive, just establish a process that reflects the current reality, the idea being that until there are more core protocol contributors, validators will likely accept AiB’s implementations.

Our plan is to rework this a bit, now that we’ve chatted with Jessy Irwin and experienced a critical update event.

We just got to see this play out last week with the vulnerability disclosure. What happened was that the AiB team created a new Telegram group and invited people to it. Some were forgotten.

Two cool ideas came out of it though: 1. ability to send messages (potentially encrypted) to validator addresses. 2. ability to sign a message to vote stake offline.

So the problem somewhat decomposes into messaging and signaling.

Not sure how best to implement this, but formalizing emergency response, even if purely socially, seems useful.

Could we also formalize the “standard” upgrade procedure, not just the “emergency” one? That would kill two birds with one stone.

The upgrade procedure is so vaguely defined that it effectively does not exist. Proposals say X and reality is Y, like the last time with hub-1 -> hub-2. Also, 3 days’ notice over a weekend, for something that was supposedly planned for months, with unclear timelines and unclear channels for communication and coordination, is definitely not the way to go and will discourage not only a lot of individuals but also companies.

My only objection is naming AiB (or any other single entity) in “official” pronouncements.
