How to upgrade or maintain a validator on a live mainnet

ping · July 7, 2018, 9:39am

Can we discuss how to upgrade or maintain a live validator without stopping gaiad?

As we know, stopping the service will get atoms slashed, but we have to upgrade or fix bugs sometimes.

Do you have any ideas?

kwunyeung · July 9, 2018, 9:19am

The mainnet should have a longer downtime threshold so that validators can fix downtime issues. If a validator can bring itself up again within the threshold, it should not be slashed. So you can upgrade software and restart your service in that period. We expect to get more experience with this on gaia-7000.

We have been discussing an autoscaling idea in another thread, to keep uptime without restarting the service. You may refer to the usage of the /dial_peers endpoint here.

ping · July 9, 2018, 2:22pm

Thanks.
Can we put the validator node and sentry nodes into a cluster such as Kubernetes or Docker Swarm?
Then we could update each instance one by one.
But I am not sure whether this would cause double signing.

kwunyeung · July 9, 2018, 2:56pm

If you put them inside Kubernetes, I believe you still have to give each pod its own IP to connect to; you can’t treat them as one single node. Using Kubernetes is good for deploying and upgrading everything at once, but maintaining them as separate nodes may not be appropriate. It seems @aurel is using Kubernetes.

jack · July 9, 2018, 9:59pm

@ping Running in a dynamic environment like that would be difficult. To bring up nodes with full data you would need snapshots of the validator which would be difficult with current kubernetes APIs.

One way to do this would be to have a “warm backup” gaiad that you move the validator key over to after shutting down your running validator.
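A minimal sketch of that handover order, assuming the validator key lives in a file named priv_validator.json under a config directory in each node's home. The function name, the directory layout, and the primary_running flag are hypothetical and only illustrate the idea: the key is moved, never copied, and only after the running validator has been shut down.

```python
import shutil
from pathlib import Path


def hand_over_key(primary_home: Path, backup_home: Path, primary_running: bool,
                  key_name: str = "priv_validator.json") -> Path:
    """Move the validator key from the stopped primary to the warm backup."""
    if primary_running:
        # Two nodes signing with the same key is what leads to a double sign.
        raise RuntimeError("stop the primary validator before moving its key")

    src = primary_home / "config" / key_name  # assumed layout, adjust as needed
    dst = backup_home / "config" / key_name

    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        # Keep whatever key the backup node was running with, under a .bak name.
        dst.replace(dst.with_name(dst.name + ".bak"))

    # Move rather than copy, so the key exists in exactly one place.
    shutil.move(str(src), str(dst))
    return dst
```

The caller is responsible for actually confirming that the primary process has exited before passing primary_running=False; the backup would then be restarted so that it picks up the moved key.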

ping · July 10, 2018, 1:02am

I agree that a warm backup is one way to do it.
We will try this later and share the results with everyone.

aurel · July 10, 2018, 5:32am

To my knowledge, liveness slashing occurs after 5000 missed blocks. That should be enough for any kind of update/upgrade.
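To put that number in wall-clock terms, here is a small back-of-the-envelope sketch. The 5000 figure is the one quoted in this thread; the 5 second average block time is an assumption for illustration and should be replaced with the value observed on the actual network.

```python
def downtime_window_minutes(missed_blocks: int, block_time_seconds: float) -> float:
    """Approximate time a validator can be offline before hitting the threshold."""
    return missed_blocks * block_time_seconds / 60


# 5000 comes from the thread; 5.0 seconds per block is an assumed average.
print(downtime_window_minutes(5000, 5.0))
```

Under those assumptions the window is just under 7 hours. Faster blocks shrink it proportionally.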

katernoir · July 10, 2018, 8:40am

I agree that this should be enough for every single validator to update/upgrade. However, what happens if a new update is released and a majority of validators try to update at the same time? In that case the chain will halt, because we go below the threshold, right? This could raise issues if every major update makes the chain halt for a few thousand blocks.
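The threshold in question is that Tendermint needs more than 2/3 of the total voting power online to commit a block. A minimal sketch of that check follows; the validator names and power values in the usage note are made up for illustration.

```python
def chain_can_commit(voting_power: dict[str, int], offline: set[str]) -> bool:
    """Return True if the remaining online validators hold more than 2/3 of the power."""
    total = sum(voting_power.values())
    online = sum(power for name, power in voting_power.items() if name not in offline)
    # Integer comparison avoids floating point rounding at the 2/3 boundary.
    return 3 * online > 2 * total
```

With hypothetical powers {"a": 40, "b": 30, "c": 20, "d": 10}, taking "d" offline leaves 90 of 100 online and blocks continue, while taking "a" offline leaves only 60 of 100 and the chain stalls until enough power returns.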

ping · July 10, 2018, 8:51am

If a validator node can miss 5000 blocks without being slashed, that is enough!

@katernoir Yes, that could happen, so there should be an update plan for validators.

kwunyeung · July 10, 2018, 11:50am

If validators are diversified enough, they should be spread over different time zones and have different maintenance hours. The effect should be minimal, as an update should be done within 15 minutes, unless an individual validator has been delegated too many tokens.

Or, can validators temporarily unbond themselves before the upgrade? There are new commands in gaiacli for unbonding. Not sure if they are related.

katernoir · July 10, 2018, 12:07pm

I agree that, in theory, they should have different maintenance hours. However, I think that large validators will want to upgrade their nodes as fast as possible. Therefore a new release could create unexpected downtime of the network.

Maybe it’s best to communicate scheduled updates between validators. If we coordinate this over the chat/forum, we can prevent downtime. Or maybe the cosmos team has already figured out a way smarter option to do this.
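One way such coordination could look, as a rough sketch only: group validators into upgrade batches so that the voting power offline at any one time stays below 1/3 of the total, which keeps more than 2/3 online. The function name and the greedy strategy are my own illustration, not an existing tool.

```python
def plan_upgrade_batches(voting_power: dict[str, int]) -> list[list[str]]:
    """Greedily group validators so each batch holds less than 1/3 of total power."""
    total = sum(voting_power.values())
    batches: list[list[str]] = []
    batch_power: list[int] = []

    # Place the largest validators first; small ones fill the remaining room.
    for name, power in sorted(voting_power.items(), key=lambda kv: kv[1], reverse=True):
        if 3 * power >= total:
            # This validator cannot go offline without stalling the chain.
            raise ValueError(f"{name} alone holds 1/3 or more of the voting power")
        for i, used in enumerate(batch_power):
            if 3 * (used + power) < total:
                batches[i].append(name)
                batch_power[i] += power
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([name])
            batch_power.append(power)
    return batches
```

Each batch would upgrade in its own agreed time slot. The ValueError covers the case mentioned earlier in the thread, where a single validator has been delegated too many tokens.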
