Findings from Game of Chains and Beyond

lexa · February 8, 2023, 3:26pm

Introduction

Game of Chains has wrapped up and we’re excited to present the results of approximately two months of testing on the technology behind Replicated Security. The findings from Game of Chains are the result of 90 validator teams and many core team members working together to stress-test the code behind Interchain Security.

Many testing methods (model-based testing, code audits, etc.) require the dev team to imagine all the things that could go wrong, but there can still be a big gap between a dev team’s imagination and the kind of chaos that can ensue in the real world.

Testnets are run under conditions that are as close to the real world as we can approximate. Instead of running a local testnet on a single machine and simulating various stress conditions in a clean environment, Game of Chains allowed the actual users of the tech to work together on testing the code. Not only did participants have to handle the specific scenarios designed for testing, they also had to contend with timeouts, unexpected edge cases, and interactions that no one could have predicted.

Replicated Security (formerly known as ICS v1) has now been tested via:

Private testnet: A developer testnet used for testing while initial development was happening.
Game of Chains: An incentivized testnet of 90 validator teams in which participants were rewarded for stress-testing and finding bugs.
Replicated Security persistent testnet: An ongoing testnet involving many Cosmos Hub validators and consumer chains looking to onboard onto the Hub.

Bug reports from Game of Chains

Three major bugs were identified throughout this incentivized testnet:

The iterator bug
The parameter oversight
The double iterator bug

The iterator bug

The v0.2.1 Game of Chains patched release contains fixes for this bug.
An iterator in the codebase was being used incorrectly: it returned ‘true’, which caused the iteration to stop early. We refactored all of the code in the RS module to eliminate the possibility of this type of bug.
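
To illustrate this class of bug, here is a minimal Python sketch. It is not the RS module's code (which is written in Go). The `iterate` helper, the store contents, and the convention that a callback returns True to stop are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def iterate(store: Dict[str, int], callback: Callable[[str, int], bool]) -> None:
    """Visit entries in key order; stop as soon as the callback returns True.

    Hypothetical helper: the 'return True to stop' convention is an assumption.
    """
    for key in sorted(store):
        if callback(key, store[key]):
            break


def collect_keys_buggy(store: Dict[str, int]) -> List[str]:
    keys: List[str] = []

    def callback(key: str, value: int) -> bool:
        keys.append(key)
        return True  # bug: True means "stop", so only the first entry is seen

    iterate(store, callback)
    return keys


def collect_keys_fixed(store: Dict[str, int]) -> List[str]:
    keys: List[str] = []

    def callback(key: str, value: int) -> bool:
        keys.append(key)
        return False  # keep going until every entry has been visited

    iterate(store, callback)
    return keys


# Hypothetical store contents, for illustration only.
example_store = {"val-a": 1, "val-b": 2, "val-c": 3}
buggy_result = collect_keys_buggy(example_store)
fixed_result = collect_keys_fixed(example_store)
```

In this sketch the buggy version silently processes a single entry, which is why this kind of mistake can be hard to spot without stress testing.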

The parameter oversight

Info on this issue can be found in: ConsumerAdditionProposal must contain most consumer params (such as consumer unbonding period) #532.
Fixes can be found in: update consumer addition proposal #558.
We realized that we could not test timeout parameters because they were set to two weeks (too long for the testnet conditions), and the parameters could only be modified through a convoluted sequence of governance actions on both chains (i.e. provider governance to launch, then consumer governance to alter the params). To fix this, we modified the consumer chain code so that all parameters of the consumer chain’s RS module are specified in the consumer chain proposal on the provider chain.
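
The idea behind the fix can be sketched roughly as follows. This is an illustration in Python, not the actual proposal format: the class names, field names, chain ID, and durations are hypothetical and chosen only to show the parameters travelling with the proposal.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ConsumerParams:
    # Hypothetical field names for two of the consumer module's parameters.
    unbonding_period: timedelta
    ccv_timeout_period: timedelta


@dataclass(frozen=True)
class ConsumerAdditionProposal:
    chain_id: str
    consumer_params: ConsumerParams  # parameters are part of the proposal itself


def build_consumer_genesis(proposal: ConsumerAdditionProposal) -> dict:
    """Derive the consumer's starting parameters directly from the proposal."""
    return {"chain_id": proposal.chain_id, "params": proposal.consumer_params}


# A testnet proposal can now pick short timeouts in one provider governance step.
# All values below are made up for the example.
testnet_proposal = ConsumerAdditionProposal(
    chain_id="example-consumer-1",
    consumer_params=ConsumerParams(
        unbonding_period=timedelta(hours=1),
        ccv_timeout_period=timedelta(minutes=30),
    ),
)
genesis = build_consumer_genesis(testnet_proposal)
```

With everything in one proposal, a single provider governance vote is enough to launch a consumer chain with testable timeouts.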

The double iterator bug

Info on this bug can be found in: Fix double iterator bug #605
Fixes can be found in: Refactor: Convert iterators to array getters #596.
While testing the slash throttle, the provider chain crashed and could not be restarted because the code opened two iterators at once, which crashed the database. After this was discovered, we stopped opening two iterators at once in our code, and the Cosmos SDK team fixed the root cause of the bug. We also undertook a general refactor of the codebase to greatly reduce our use of iterators, in favor of slightly less efficient but less error-prone methods of accessing the database.
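
The "array getter" approach can be sketched like this. The example is a toy model in Python, not the refactored Go code: `ToyStore`, its counters, and the `slash/` and `vsc/` key prefixes are hypothetical names used only to show that a getter closes its iterator before the caller does anything else.

```python
from typing import Dict, Iterator, List, Tuple


class ToyStore:
    """In-memory stand-in for a key-value store that counts open iterators."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self.open_iterators = 0
        self.max_open = 0  # highest number of iterators open at the same time

    def iterator(self, prefix: str) -> Iterator[Tuple[str, str]]:
        self.open_iterators += 1
        self.max_open = max(self.max_open, self.open_iterators)
        try:
            for key in sorted(self._data):
                if key.startswith(prefix):
                    yield key, self._data[key]
        finally:
            self.open_iterators -= 1  # runs once the iterator is exhausted

    def get_all(self, prefix: str) -> List[Tuple[str, str]]:
        # Array getter: drain the iterator into a list before returning.
        return list(self.iterator(prefix))


def pairs_with_nested_iterators(store: ToyStore) -> List[Tuple[str, str]]:
    # The inner iterator is opened while the outer one is still open.
    return [(a, b) for a, _ in store.iterator("slash/") for b, _ in store.iterator("vsc/")]


def pairs_with_getters(store: ToyStore) -> List[Tuple[str, str]]:
    # Each getter finishes (and closes its iterator) before the next one starts.
    slashes = store.get_all("slash/")
    vscs = store.get_all("vsc/")
    return [(a, b) for a, _ in slashes for b, _ in vscs]


example_data = {"slash/1": "x", "slash/2": "y", "vsc/1": "z"}
nested_store = ToyStore(example_data)
nested_pairs = pairs_with_nested_iterators(nested_store)
getter_store = ToyStore(example_data)
getter_pairs = pairs_with_getters(getter_store)
```

Both functions return the same pairs, but the getter version never holds more than one iterator open, at the cost of copying the entries into memory first.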

Non-bug findings

While testnets are a powerful way to find and squash bugs, there are also lots of other benefits to testnets that contribute directly to a successful full release!

A major takeaway from Game of Chains is the amount of coordination required to successfully bring a consumer chain online. Consumer chains need to coordinate with one another about launch timelines, validators need to work together to come online in the same window, and the Hub team needs to support both groups.

We also learned a lot about the range of workable default parameters for consumer chains, such as spawn times and timeout periods. These are parameters that we couldn’t test in any other way - there’s too much randomness in a chain launch to rely on audits or testing without seeing how it goes in a real world environment.

By participating in Game of Chains, dozens of validators got to learn about the infrastructure and technical needs of Replicated Security. This knowledge will lead to smoother launches for consumer chains on the Hub when the time comes. Validators got experience with the following tasks (among others):

Launching consumer chains via governance proposals (both dummy and dev versions of real projects)
Stopping consumer chains via governance proposals
Running relayers for consumer chains
Performing due diligence on consumer chain launch proposals
Getting unjailed after being jailed for downtime

Validators who participated in the incentivized testnet are now more confident about the technology and experienced in the level of coordination required for a successful launch - and they were rewarded for their time and effort.

Replicated Security persistent testnet

Following Game of Chains, Hypha launched the Replicated Security Persistent Testnet to finish testing the slash throttle and provide an ongoing testing environment.

The persistent testnet has only been active for three weeks but already has nearly a dozen validators running and a line-up of consumer chains looking to onboard. You can view the block explorer here.

This testnet builds on the success of Game of Chains and enables real-world consumer chain projects to onboard and test their tech with all the participating Hub validators. This will add even more confidence to Replicated Security features as we get to see them interact with the exact binaries that Hub consumer chains will be launching.

Further issues have been identified in the Replicated Security persistent testnet:

Spawn timeout
Slashing concerns

Spawn timeout

Info on this issue can be found in: Consumer client created with initial consensus state with wrong timestamp #690
Due to the many different timeouts involved in both IBC (interblockchain communication) and ICS (interchain security), special care needs to be taken to set parameters correctly when starting up a consumer chain.
These issues were identified during testing, and can be avoided by setting parameters correctly in the consumer chain proposal, which will be checked by Hypha and the Informal team.

Slashing concerns

The issue is described here: Slashing updates in replicated security.
Fixes can be found in: disable consumer initiated slashing #692.
Relying solely on a consumer chain’s transmitted slash packets means that malicious code on a consumer chain could endanger the provider’s security by slashing too many validators at once. While development continues on a system for a provider to verify misbehaviour, we have removed the consumer chain’s ability to slash validators directly. Instead, only jailing will be possible.
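
As a rough illustration of the new behavior, here is a Python sketch of a provider handling a slash packet by jailing only. The `Validator` class and `handle_slash_packet` function are hypothetical names for the example and do not reflect the real module's types.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    address: str
    tokens: int
    jailed: bool = False


def handle_slash_packet(validator: Validator) -> None:
    """Hypothetical provider-side handler: jail the validator, leave stake untouched."""
    validator.jailed = True
    # No tokens are removed here: consumer-initiated slashing is disabled in this sketch.


# Example values are made up for illustration.
example_validator = Validator(address="val-a", tokens=1_000_000)
handle_slash_packet(example_validator)
```

Even if a consumer chain sent slash packets for many validators at once, in this model the worst outcome is jailing rather than loss of stake.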

Conclusion

Game of Chains built on the success of previous incentivized testnets, with two months and 90 validator teams dedicated to testing Interchain Security under near real-world conditions. That work has also gone towards the persistent Replicated Security Testnet, where upcoming consumer chains will launch and improve their projects amongst Hub validators.

The testnet work has given validators, consumer chain teams, and Hub developers major insight into the launch process, governance, due diligence, and troubleshooting issues relating to Replicated Security.

We identified and fixed several bugs, as well as logic issues that might otherwise have caused problems in a mainnet launch, and we addressed slashing concerns raised by the community.

By now, the code behind Interchain Security has undergone several rounds of testing, auditing, and testnet investigation. The draft proposal for the v9 Lambda upgrade (with Replicated Security) has been on the forum since December 2022 and will be going on-chain in February 2023.

We welcome further comments and questions about Replicated Security on the forum post!

Validator reports

Several validators who participated in Game of Chains have also published their own reports on Replicated Security, Game of Chains, and the results.

If we’re missing any, please link them in the comments and we’ll update the post - reporting can only be improved by more diverse voices chiming in.

VK_S16 · February 14, 2023, 4:19am

Thank you for the report.

serejandmyself · February 15, 2023, 4:12pm

Hey @lexa, I like your reports. Do you wanna come on one of our streams, the citizen odyssey format? The idea of the stream is to get people familiar with particular targets and goals of a project / validator / hackathon. For example, we had one for Game of Chains. I see you do a lot of work for the chain. Maybe we could open this up a bit more on a stream dedicated to the work you do, especially revolving around governance, etc.
