We are prepared to upgrade the Cosmos Hub to cosmoshub-3 based upon:
a. Commit hash: 2f6783e298f25ff4e12cb84549777053ab88749a;
b. The state export from cosmoshub-2 at Block Height 2902000;
c. Genesis time: 60 minutes after the timestamp at Block Height 2902000.
We are prepared to relaunch cosmoshub-2:
a. In the event of:
i. A non-trivial error in the migration procedure; and/or
ii. A need for ad-hoc genesis file changes; and/or
iii. The failure of cosmoshub-3 to produce two (2) blocks by 180 minutes after the timestamp of Block Height 2902000;
b. Using:
i. The starting block height: 2902000
ii. Software version: Cosmos SDK v0.34.6+ https://github.com/cosmos/cosmos-sdk/releases/tag/v0.34.10
iii. The full data snapshot at export Block Height 2902000;
c. And will consider the relaunch complete after cosmoshub-2 has reached consensus on Block 2902001.
The upgrade will be considered complete after cosmoshub-3 has reached consensus on Block Height 2 within 120 minutes of genesis time.
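The time limits above all derive from a single timestamp, so they can be computed mechanically. A minimal sketch in Python, assuming the timestamp of the export block is already known; the function name `upgrade_timeline` and the example timestamp are hypothetical and not part of the proposal:

```python
from datetime import datetime, timedelta, timezone


def upgrade_timeline(export_block_time: datetime) -> dict:
    """Derive the proposal's deadlines from the export block's timestamp."""
    # Genesis time: 60 minutes after the export block's timestamp.
    genesis_time = export_block_time + timedelta(minutes=60)
    return {
        "genesis_time": genesis_time,
        # Upgrade complete: Block Height 2 within 120 minutes of genesis time.
        "upgrade_complete_by": genesis_time + timedelta(minutes=120),
        # Relaunch trigger: fewer than two blocks by 180 minutes after
        # the export block's timestamp.
        "relaunch_trigger_at": export_block_time + timedelta(minutes=180),
    }


# Hypothetical placeholder timestamp, for illustration only.
example_block_time = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
timeline = upgrade_timeline(example_block_time)
for name, moment in timeline.items():
    print(name, moment.isoformat())
```

Note that 120 minutes after genesis time and 180 minutes after the export block's timestamp are the same instant, so the completion deadline and the relaunch trigger coincide.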
Hi Gavin, thanks for setting up this proposal. I’ve posted my response on the document. Would like to suggest a longer upgrade time window, something like 2-3 hours, just in case.
Before running with and publishing a 5th upgrade proposal, and possibly restricting ourselves to a timeline, we should independently work out the process of fully testing the migration and upgrade, as B-Harvest has already initiated.
I think getting this right is more important than performing the upgrade asap, as it’s beneficial to work out and actually go through a thorough test. We can always go over the steps later on, for a 2nd time, for the proposal’s sake.
I think it’s too early for that proposal. First we need a version that fixes the migration error; as far as I know, version 2.0.1 will produce the same error. Also, we should probably use a gaia version with an upgraded cosmos-sdk (since there was a security issue in cosmos-sdk v0.37.1).
When we have a version we could use (I’m not sure what the state is) we should probably do a dry-run on a testnet. FrancescoSVC already mentioned that B-Harvest initiated that.
After this is done we could think about a proposal; right now seems too early.
@Gavin Sorry I completely missed that reply.
As for feedback:
The Gaia Version which should be [TBD]. (This tripped me up when reading the proposal, hence my first response.)
Also what irks me a bit is the notification of failure. It seems like a centralized approach. Who’s going to post that notification? If it just means that validator operators can signal their status over there, that’s fine I guess, but maybe it should be worded differently.
Agreed, it is a centralized approach. AiB (so Jack, Bez, or Zaki) would likely post the notification. Let’s get thinking about how to decentralize this or at least make this less centralized.
My concern is that there can be a lot of noise in the other channels. I wonder if we could create an off-chain interface for signalling intention based upon voting power.
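As a rough illustration of what voting-power-weighted signalling could look like, here is a minimal Python sketch. Everything in it is an assumption: the function names, the validator names and the two-thirds threshold are hypothetical, and a real interface would also need to authenticate each signal (for example, with a signature from the validator's key).

```python
from fractions import Fraction
from typing import Iterable, Mapping


def signalled_share(voting_power: Mapping[str, int],
                    signalled: Iterable[str]) -> Fraction:
    """Return the fraction of total voting power that has signalled."""
    total = sum(voting_power.values())
    if total == 0:
        return Fraction(0)
    # Count each validator at most once and ignore unknown names.
    ready = sum(voting_power[v] for v in set(signalled) if v in voting_power)
    return Fraction(ready, total)


def exceeds_threshold(share: Fraction,
                      threshold: Fraction = Fraction(2, 3)) -> bool:
    """True if the signalled share is strictly above the threshold.

    The two-thirds default is an assumption made for this sketch.
    """
    return share > threshold


# Hypothetical validator set, for illustration only.
example_power = {"validator-a": 50, "validator-b": 30, "validator-c": 20}
example_share = signalled_share(example_power, ["validator-a", "validator-b"])
print(example_share, exceeds_threshold(example_share))
```

Weighting by voting power rather than counting posts means a noisy channel does not distort the result: each validator's signal counts once, in proportion to its stake.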
The testnet with the fixed migration software should be launched today. The proposal should include the testing results and information of the testing process for reference and verifications.
Thanks for continuing to manage this @Gavin. I feel it’s a bit premature to review this in depth, because many of the details are TBD at the moment.
Here are a few high-level thoughts in the meantime -
1 - The correct s/w version seems to be in flux. The current recommended production version is v0.34.9. However it’s not expected to last long, as another bug has been discovered and a fix is under development.
2 - We should fully test the upgrade process on a testnet, before a vote takes place to upgrade mainnet. This way we can have more confidence in the s/w and our vote. I also suggest following a process like @b-harvest organized.
3 - I feel the upgrade should only be considered successful after maybe 1000 blocks. 1 block isn’t enough. For example, the network hung after a block for many validators during the independent testnet @b-harvest organized. Also, in the past, testnets crashed on a regular basis within approximately the first 30 mins of the network coming online.
Hey @FrancescoSVC, what do you envision us doing that is different from @b-harvest’s initiative? Looking for feedback about how to make this process more predictable for everyone involved.
Quick update: Bez has frozen Gaia’s code at v2.0.3. Hyung (B-Harvest) is organizing a testnet upgrade using that code in the next ~7 hours. Thanks to Chris Remus for those suggestions.
Let’s start thinking about a date for the upgrade. I’m thinking Dec 3 or Dec 4, and I’m looking for feedback.
You should know that migrating increases the likelihood of equivocation, which currently results in a 5% stake slashing. In other words, there’s an increased risk of double-signing, which results in 5% of the validator’s stake-backing (including delegator stake) being destroyed.
In order to mitigate that risk, ensure that you are using the correct genesis file by comparing its hash with those posted by other validators after the launch of Cosmos Hub 3. If 67% of the voting power is online, you don’t have to have your validator online at genesis; in fact, it is safer to wait until you are ready to launch and have taken any necessary precautions.
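For the hash comparison, a minimal Python sketch follows. It assumes validators post SHA-256 digests of the genesis file (the text above does not name the hash function); the path `genesis.json` and both function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_of_file(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_all_posted(local_hash: str, posted_hashes: Iterable[str]) -> bool:
    """True only if at least one hash was posted and all equal the local one."""
    posted = [h.strip().lower() for h in posted_hashes]
    return bool(posted) and all(h == local_hash.lower() for h in posted)


# Usage (hypothetical path and hashes):
#   local = sha256_of_file("genesis.json")
#   if not matches_all_posted(local, hashes_posted_by_other_validators):
#       raise SystemExit("genesis hash mismatch: do not start the validator")
```

An empty list of posted hashes is deliberately treated as a mismatch, so the check fails closed when there is nothing to compare against.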
Note the recovery scenario, for which you should be prepared in case Cosmos Hub 2 needs to be relaunched.