The mainnet should have a longer downtime threshold, so that validators have time to fix downtime issues. If a validator comes back up within the threshold, it should not be slashed, so you can upgrade software and restart your service during that period. We expect to get more experience with this on gaia-7000.
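To make the idea concrete, here is a rough sketch of a downtime threshold check. It is a simplification for illustration only, not the actual slashing logic: it assumes the threshold is counted as consecutive missed blocks, and the names `would_be_slashed` and `max_missed` are made up for this example.

```python
from typing import Iterable


def would_be_slashed(signed: Iterable[bool], max_missed: int) -> bool:
    """Return True if the validator misses more than max_missed blocks in a row.

    `signed` holds one flag per block: True if the validator signed it.
    Assumption: the threshold counts consecutive misses; the real rule may differ.
    """
    missed_in_a_row = 0
    for did_sign in signed:
        if did_sign:
            # The validator is back up, so the downtime counter resets.
            missed_in_a_row = 0
        else:
            missed_in_a_row += 1
            if missed_in_a_row > max_missed:
                return True
    return False
```

With a threshold of 3 blocks, a validator that misses 2 blocks during a restart and then signs again is fine, while one that stays down for 5 blocks is not. A longer threshold simply widens the restart window.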
In another thread we have been discussing an autoscaling idea for keeping uptime without restarting the service. You may refer to the usage of the /dial_peers endpoint here.
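As a rough illustration, a request URL for /dial_peers could be built like this. This is only a sketch under assumptions: the `peers` and `persistent` query parameters, the RPC address, and the peer address format are my understanding of the Tendermint RPC and may differ in your version, and the unsafe RPC endpoints usually have to be enabled in the node config first. The helper name `build_dial_peers_url` is made up for this example.

```python
import json
from typing import List
from urllib.parse import urlencode


def build_dial_peers_url(rpc_addr: str, peers: List[str], persistent: bool = False) -> str:
    """Build a /dial_peers request URL without sending it.

    Assumption: the endpoint takes `peers` as a JSON list of "id@host:port"
    strings and `persistent` as true/false. Check your node's RPC docs.
    """
    query = urlencode({
        "peers": json.dumps(peers),
        "persistent": str(persistent).lower(),
    })
    return f"{rpc_addr.rstrip('/')}/dial_peers?{query}"
```

You could then open the resulting URL with curl or `urllib.request.urlopen` against a node you control, for example to point a freshly started node at existing peers without restarting the others.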