Testnet Wednesday Reports

We ran our first-ever game day :game_die: on the testnet today!

In this game day, we investigated two new ideas on a brand new chain:

  1. Cancelling a software upgrade proposal using a governance-gated cancel-upgrade proposal
  2. Skipping an upgrade using the --unsafe-skip-upgrades <block-height> flag

We had 39 validators participating today on this new gameday01 chain.

cancel-upgrade

For our first event, we submitted a proposal to upgrade from Gaia v13.0.2 to v14.0.0-rc0 and then cancelled it using a cancel-upgrade proposal which validators had to vote on.

31/39 validators voted successfully on the cancel-upgrade proposal and we avoided upgrading to v14.0.0-rc0.

However – this is not a realistic scenario for the Hub. Because the Cosmos Hub voting period is 2 weeks and we usually upgrade roughly 1 week after an upgrade proposal passes, there is no time for a cancel-upgrade proposal before the upgrade height arrives.

On mainnet, it’s much more practical to just skip the upgrade height.
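The timing argument above is easy to check with a little arithmetic. Here is a minimal Python sketch, assuming the cancel-upgrade proposal is submitted the moment the upgrade proposal passes, runs for the full voting period, and that any deposit period can be ignored. The function name is made up for illustration.

```python
from datetime import timedelta


def cancel_can_finish_in_time(voting_period: timedelta,
                              time_until_upgrade: timedelta) -> bool:
    """True if a cancel proposal submitted now ends voting before the upgrade height."""
    return voting_period < time_until_upgrade


# Rough Hub numbers from this post: 2-week voting period,
# upgrade height roughly 1 week after the upgrade proposal passes.
print(cancel_can_finish_in_time(timedelta(weeks=2), timedelta(weeks=1)))  # False
```

With these numbers the cancel proposal would still be in its voting period when the upgrade height arrives.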

unsafe-skip-upgrades

For our second event, we pretended that the issue had been resolved and we were ready to upgrade to v14.0.0 via a software upgrade proposal.

As upgrade height approached, validators were informed that this upgrade was faulty as well, and everyone would need to restart their nodes using the --unsafe-skip-upgrades 31156 flag to keep running their node with v13.0.2.
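For operators who script their restarts, the command described above could be assembled as in this sketch. It assumes the node is started directly with `gaiad start`; setups that go through cosmovisor or a systemd unit would need to put the flag wherever their start command lives. The helper name `skip_upgrade_command` is hypothetical.

```python
import shlex
from typing import List


def skip_upgrade_command(upgrade_height: int, binary: str = "gaiad") -> List[str]:
    """Build the argv for restarting a node so it skips one upgrade height."""
    if upgrade_height <= 0:
        raise ValueError("upgrade_height must be a positive block height")
    return [binary, "start", "--unsafe-skip-upgrades", str(upgrade_height)]


# Upgrade height used on gameday01 in this event.
print(shlex.join(skip_upgrade_command(31156)))
# gaiad start --unsafe-skip-upgrades 31156
```

The list form can be passed to `subprocess.run` as-is if you want the script to launch the node itself.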

Validators who successfully skipped the upgrade height would continue signing blocks as usual on gameday01, while validators who did not skip the upgrade height would apphash and stop signing. Of the 39 validators, 34 successfully applied the flag in time and were able to keep signing without interruption.

In a real-world scenario, it is perfectly fine for some validators to halt so long as validators holding more than 2/3 of the voting power successfully apply the flag.
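To make the threshold concrete, here is a small Python sketch of the check. It assumes, purely for illustration, that every validator has equal voting power, which is not true on a real chain where power is stake-weighted. The function name is made up.

```python
from fractions import Fraction


def chain_keeps_running(signing_power: int, total_power: int) -> bool:
    """True if strictly more than 2/3 of the voting power is still signing."""
    if total_power <= 0:
        raise ValueError("total_power must be positive")
    # Fractions avoid floating-point surprises right at the 2/3 boundary.
    return Fraction(signing_power, total_power) > Fraction(2, 3)


# Game day numbers: 34 of 39 validators kept signing (equal power assumed).
print(chain_keeps_running(34, 39))  # True
# Exactly 2/3 is not enough; it has to be strictly more.
print(chain_keeps_running(26, 39))  # False
```

In practice you would plug in the bonded voting power of the validators that applied the flag rather than a head count.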

Lessons learned

  • 39 validators participating
  • 23 validators voting on prop #1 (not a TIP criterion because we didn’t announce it in time :sweat_smile:)
  • 31 validators voting on prop #2 (TIP criteria)
  • 34 validators voting on prop #4 (TIP criteria)
  • 34 validators still signing after the upgrade height
  • Exactly 1 hour event run-time :tada:
  • Shout-outs to James from Lavender.Five for snapshots and to Benjamin from Blockscape for helping identify a potential cosmovisor bug, now being investigated here.

Some useful things that we learned along the way:

1. A list of voters can be obtained even for a proposal that is no longer active using a block height:

REST API:
curl "<API URL>/cosmos/gov/v1beta1/proposals/<proposal id>/votes?pagination.limit=1000" -H "x-cosmos-block-height: <height>"

Gaia CLI:
gaiad q gov votes <proposal id> --height <height>
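If you want to script the REST query, something like the Python sketch below should work. It only uses the endpoint and header shown above; the function names, the example host, and the sample addresses are hypothetical, and it assumes the response carries a top-level `votes` list whose entries have a `voter` field. It also assumes a single page of up to 1000 votes is enough and that the node still has state for the requested height.

```python
import json
import urllib.request
from typing import List


def build_votes_request(api_url: str, proposal_id: int, height: int) -> urllib.request.Request:
    """Prepare the votes query pinned to a historical block height."""
    url = (f"{api_url.rstrip('/')}/cosmos/gov/v1beta1/proposals/"
           f"{proposal_id}/votes?pagination.limit=1000")
    return urllib.request.Request(url, headers={"x-cosmos-block-height": str(height)})


def voters_from_response(body: str) -> List[str]:
    """Extract voter addresses from the JSON body (assumed shape, see above)."""
    return [vote["voter"] for vote in json.loads(body).get("votes", [])]


def fetch_voters(api_url: str, proposal_id: int, height: int) -> List[str]:
    """Run the query against a live API node (not called in this example)."""
    request = build_votes_request(api_url, proposal_id, height)
    with urllib.request.urlopen(request, timeout=30) as response:
        return voters_from_response(response.read().decode("utf-8"))


# Offline demo with a made-up response body and placeholder addresses.
sample = '{"votes": [{"voter": "cosmos1aaa"}, {"voter": "cosmos1bbb"}]}'
print(voters_from_response(sample))  # ['cosmos1aaa', 'cosmos1bbb']
```

Comparing the returned addresses against your validator list is then a simple set operation.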

2. For unusual proposal types, sometimes block explorers don’t render the information correctly.

We were using an old version of Ping Pub to display the cancel-upgrade proposal, but this issue is fixed in the newer Ping Pub. Still, it’s important to know how to view proposal info using the CLI as well!

3. How would this work on mainnet?

Testnet is a critical place not only for testing code, but also for working out best practices and rehearsing emergency situations.

On the Hub, if a critical bug is detected in an upcoming upgrade, Hypha’s recommendation is:

  1. Communicate to all validators to restart their nodes using the --unsafe-skip-upgrades <block-height> flag. This information should be pushed via the Discord #validators-verified channel, the email list, and personal Telegram, Discord, and Signal contacts.
  2. As upgrade height approaches, advise validators to take snapshots of their nodes before the upgrade height. In case something goes wrong, having a recent snapshot can be critical to recovering the chain.
  3. After upgrade height, validators who have successfully continued signing should take another snapshot to share with the network if necessary.

Snapshots can be incredibly resource intensive, so we would also recommend that validators run more powerful machines as upgrade height approaches.

TIP period 2 ends

This Testnet Wednesday also marks the end of TIP Period 2. A more thorough report will be posted to the Testnet Incentive Program thread as payments are sent out :slight_smile:

As before, eligibility can be tracked here.
