[ANN] Tendermint KMS v0.6: multitenancy, concurrent YubiHSM admin, better logging

Tendermint KMS v0.6.0-rc0 is out. Update: the final release is out!

We are presently running it in production here at @iqlusion on both Cosmos Hub and Terra.

Expect a final release sometime next week. In the interim, if you are feeling brave, we’d love to get feedback about the upgrade before the final release.

This release includes a number of improvements, bugfixes, and new features. Here are some quick highlights:

  • Better logging: the previous release's logging was not very useful in many respects. It was tricky to get it to log correctly at all, and even when it did, useful information was only visible at the “debug”/verbose log level, where it was buried among debugging details. This release should surface the information you care about at the default log level.
  • Better CLI usage/help: the previous release's CLI was arcane and did not provide good usage/help output, especially compared to something like gaiad. The new CLI help should be a significant improvement.
  • Multitenancy: this release supports concurrent operation on multiple Tendermint networks, as well as concurrent connections to multiple validators on the same network. We don’t (yet) recommend using the latter feature permanently, as it has not been well tested, but it should be useful to enable temporarily before failing over between validators on the same Tendermint network.
  • yubihsm-connector compatibility: the YubiHSM backend now has an optional yubihsm-server feature which allows the KMS to expose a local HTTP service compatible with Yubico’s yubihsm-connector service. This can be used with either yubihsm-shell or the tmkms yubihsm CLI commands to administer YubiHSMs while Tendermint KMS is running.
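Concurrent connections to several validators on the same network are generally only safe if those connections share double-signing state. Below is a minimal, std-only Rust sketch of one way to arrange that, offered as an illustration rather than a description of tmkms internals: a hypothetical `Registry` hands out one shared `Arc<Mutex<ChainState>>` per chain ID, and the hypothetical `try_sign` performs its check and update under that single lock, so two connections racing to sign the same height cannot both succeed. The type names, the height-only check, and the chain IDs `chain-a`/`chain-b` are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-chain signing state: the last height this KMS signed.
#[derive(Debug, Default)]
struct ChainState {
    last_signed_height: u64,
}

/// Hypothetical registry mapping chain IDs to shared, lock-protected state.
/// Every validator connection for the same chain receives a clone of the same Arc.
#[derive(Default)]
struct Registry {
    chains: HashMap<String, Arc<Mutex<ChainState>>>,
}

impl Registry {
    fn state_for(&mut self, chain_id: &str) -> Arc<Mutex<ChainState>> {
        Arc::clone(self.chains.entry(chain_id.to_string()).or_default())
    }
}

/// Attempts to sign at `height`, refusing anything not strictly above the
/// last signed height. The check and the update happen under one lock.
fn try_sign(state: &Mutex<ChainState>, height: u64) -> bool {
    let mut guard = state.lock().unwrap();
    if height <= guard.last_signed_height {
        return false;
    }
    guard.last_signed_height = height;
    true
}

fn main() {
    let mut registry = Registry::default();

    // Two validator connections on the same network share one state...
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let state = registry.state_for("chain-a");
            thread::spawn(move || try_sign(&state, 100))
        })
        .collect();
    let signed: usize = handles.into_iter().map(|h| h.join().unwrap() as usize).sum();
    // ...so exactly one of them is allowed to sign height 100.
    println!("signatures at height 100 on chain-a: {}", signed);

    // A different network gets independent state.
    let other = registry.state_for("chain-b");
    println!("chain-b can sign height 100: {}", try_sign(&other, 100));
}
```

The design choice being illustrated is that the lock covers the whole "check, then record" sequence; checking under one lock and recording under another would reopen the race.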

Please see CHANGES.md for detailed release notes.

Upgrade Notes

Below are some items to pay attention to when upgrading from Tendermint KMS v0.5:

state_file syntax changes

The validator state file syntax is incompatible with Tendermint KMS v0.5.
It has been changed to match the conventions used by the rest of Tendermint,
where integer values are stored in strings rather than as JSON integers.

When upgrading, you will need to either delete existing state files
(they will be recreated automatically), or ensure the integer height and
round fields contained within these files are quoted in strings, e.g.
{"height":"123456","round":"0",...}.
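If you would rather keep your state files than delete them, the quoting can be done by hand or with a small script. Below is a minimal, std-only Rust sketch of such a script. The function name `quote_integer_fields` is hypothetical and is not part of tmkms. It performs plain string matching rather than real JSON parsing, and assumes keys appear as `"name":` with no space before the colon. Back up the file and inspect the result before using it.

```rust
/// Wraps bare integer values of the named keys in quotes, e.g.
/// `"height":123` becomes `"height":"123"`. Already-quoted values are left
/// alone, so running it twice is harmless. This is string matching, not
/// JSON parsing: it would also rewrite matching keys in nested objects.
fn quote_integer_fields(json: &str, fields: &[&str]) -> String {
    let mut out = json.to_string();
    for field in fields {
        let key = format!("\"{}\":", field);
        let mut result = String::with_capacity(out.len() + 2);
        let mut rest = out.as_str();
        while let Some(pos) = rest.find(key.as_str()) {
            // Copy everything up to and including `"name":`.
            let after_key = pos + key.len();
            result.push_str(&rest[..after_key]);
            // Keep any whitespace between the colon and the value.
            let tail = &rest[after_key..];
            let value = tail.trim_start();
            result.push_str(&tail[..tail.len() - value.len()]);
            // Quote a run of ASCII digits; a quoted value is copied untouched.
            let digits = value.bytes().take_while(|b| b.is_ascii_digit()).count();
            if digits > 0 {
                result.push('"');
                result.push_str(&value[..digits]);
                result.push('"');
            }
            rest = &value[digits..];
        }
        result.push_str(rest);
        out = result;
    }
    out
}

fn main() {
    // A hypothetical v0.5-style state file with bare integers.
    let legacy = r#"{"height":123456,"round":0}"#;
    // prints {"height":"123456","round":"0"}
    println!("{}", quote_integer_fields(legacy, &["height", "round"]));
}
```

Fields other than `height` and `round` pass through unchanged. Deleting the file and letting the KMS recreate it remains the simpler option.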

Unknown fields now disallowed in tmkms.toml

The previous parser for tmkms.toml ignored unknown attributes in the
config file. This means it would often ignore syntax errors, spelling mistakes,
or attributes in the wrong location when parsing files.

This has been changed to explicitly reject such fields. However, please be aware
that if your config file contained invalid syntax, it will now be rejected by the
parser and the KMS will no longer boot.

We suggest validating the configuration in a staging or other noncritical
deployment of the KMS in order to ensure your configuration does not contain
accidental misconfigurations which were previously uncaught.

See #282 for more information.
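In Rust, serde-based config structs typically get reject-rather-than-ignore behaviour from serde's `#[serde(deny_unknown_fields)]` attribute. The std-only sketch below illustrates the same idea without dependencies, which may help when reviewing a config by eye. It is not a TOML parser and not tmkms's own validation. The allowlist (`chain_ids`, `key_format`, `path`) and the function `check_known_fields` are hypothetical.

```rust
/// Error returned when a config line uses a key outside the allowlist.
#[derive(Debug, PartialEq)]
struct UnknownField {
    line: usize,
    key: String,
}

/// Checks every `key = value` line against `allowed`. Blank lines,
/// `#` comments and `[section]` headers are skipped. Only the
/// reject-instead-of-ignore behaviour is illustrated here.
fn check_known_fields(text: &str, allowed: &[&str]) -> Result<(), UnknownField> {
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with('[') {
            continue;
        }
        let key = line.split('=').next().unwrap_or("").trim();
        if !allowed.contains(&key) {
            return Err(UnknownField { line: idx + 1, key: key.to_string() });
        }
    }
    Ok(())
}

fn main() {
    // Hypothetical allowlist for a softsign provider section.
    let allowed = ["chain_ids", "key_format", "path"];
    // `pth` is a typo that a lenient parser would silently ignore.
    let config = "[[providers.softsign]]\nchain_ids = [\"chain-a\"]\npth = \"/path/to/signing.key\"\n";
    match check_known_fields(config, &allowed) {
        Ok(()) => println!("config OK"),
        Err(e) => println!("rejected: unknown field `{}` on line {}", e.key, e.line),
    }
}
```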


Really great work, v0.5 has been super stable so far.

We’ve released Tendermint KMS v0.6.0-rc1. This is the last RC we intend to publish before the final release, which will likely happen early next week.

You can view a list of the changes since v0.6.0-rc0 earlier this month, and also the full list of changes since v0.5.0.

Notable changes since the last release are a number of bugfixes and improvements for users of the softsign backend, including a new tmkms softsign command which now contains the keygen subcommand, as well as a new import subcommand for converting key formats. Additionally, tmkms.toml now has a key_format option in the softsign section, allowing it to use a raw binary key, a key in the priv_validator.json format, or a key in the Base64 format, which is now the default for tmkms keygen.
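To make the key_format option concrete, here is a minimal, std-only Rust sketch of how a softsign-style loader might branch on the configured format. Several things here are assumptions rather than tmkms's actual code: the option strings (`raw`, `base64`, `json`), the names `KeyFormat` and `load_seed`, and the assumption that the key is a 32-byte Ed25519 seed. Decoding the priv_validator.json format is deliberately left out, and the small Base64 decoder is included only so the example has no dependencies.

```rust
use std::str::FromStr;

/// Key encodings mentioned in the release notes; the accepted strings are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KeyFormat {
    Raw,
    Base64,
    Json,
}

impl FromStr for KeyFormat {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "raw" => Ok(KeyFormat::Raw),
            "base64" => Ok(KeyFormat::Base64),
            "json" => Ok(KeyFormat::Json),
            other => Err(format!("unknown key_format: {}", other)),
        }
    }
}

/// Maps one standard-alphabet Base64 character to its 6-bit value.
fn b64_value(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a' + 26) as u32),
        b'0'..=b'9' => Some((c - b'0' + 52) as u32),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Minimal standard Base64 decoder: ignores whitespace and stops at '='.
fn decode_base64(input: &str) -> Result<Vec<u8>, String> {
    let data: Vec<u8> = input.bytes().filter(|b| !b.is_ascii_whitespace()).collect();
    let end = data.iter().position(|&b| b == b'=').unwrap_or(data.len());
    let mut out = Vec::new();
    let (mut buf, mut bits): (u32, u32) = (0, 0);
    for &c in &data[..end] {
        let v = b64_value(c).ok_or_else(|| format!("invalid Base64 byte {:?}", c as char))?;
        buf = (buf << 6) | v;
        bits += 6;
        if bits >= 8 {
            // Emit the top 8 buffered bits and keep only the remainder.
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Ok(out)
}

/// Loads an (assumed) 32-byte Ed25519 seed from key file contents.
fn load_seed(format: KeyFormat, contents: &[u8]) -> Result<[u8; 32], String> {
    let bytes = match format {
        KeyFormat::Raw => contents.to_vec(),
        KeyFormat::Base64 => {
            let text = std::str::from_utf8(contents).map_err(|e| e.to_string())?;
            decode_base64(text)?
        }
        KeyFormat::Json => return Err("priv_validator.json is not handled in this sketch".to_string()),
    };
    if bytes.len() != 32 {
        return Err(format!("expected a 32-byte seed, got {} bytes", bytes.len()));
    }
    let mut seed = [0u8; 32];
    seed.copy_from_slice(&bytes);
    Ok(seed)
}

fn main() {
    let format: KeyFormat = "base64".parse().unwrap();
    // 43 'A's followed by one '=' is the Base64 encoding of 32 zero bytes.
    let encoded = format!("{}=", "A".repeat(43));
    let seed = load_seed(format, encoded.as_bytes()).unwrap();
    println!("{:?}: decoded {} bytes", format, seed.len());
}
```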

At iqlusion we have just updated our production KMS cluster to this RC without problems. We would also love for some early adopters to test this RC and provide feedback prior to the final release, especially with respect to the softsign changes. Otherwise, expect a final release next week.


The final release of Tendermint KMS v0.6.0 is out 🎉

https://crates.io/crates/tmkms/0.6.0

We are now running this release in production here at @iqlusion without issues.

The main changes since v0.6.0-rc1 are logging improvements, including more information about the height/round/step of blocks being signed as well as their block IDs. This is useful for understanding the behavior of multiple validators operating concurrently on the same network.

Full changelog available here: https://github.com/tendermint/kms/pull/329


We’ve published a point release, v0.6.1, to address some issues people have experienced. Changelog here:

Notably, one user of v0.6.0 encountered double signing (fortunately on a testnet) while operating two validators concurrently, owing to the handling of <nil> block IDs during prevotes. Additionally, another validator encountered a brief outage due to a deadlock in the signal handler. Both of these issues should be addressed in this release.
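The <nil> block ID issue is easiest to see in the check a signer performs before producing a vote. The following is a hypothetical, std-only Rust sketch of such a check, not the actual v0.6.1 fix. It compares height/round/step lexicographically, refuses any regression, and at the same height/round/step only permits re-signing when the block ID matches exactly. `None` represents a <nil> block ID, so a nil prevote and a prevote for a concrete block at the same height and round are treated as conflicting. The names `SignState`, `Step`, `SignError`, and `check_and_update`, and the use of `String` for block IDs, are assumptions for illustration.

```rust
use std::cmp::Ordering;

/// Consensus steps within a round, ordered as they occur.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Step {
    Propose,
    Prevote,
    Precommit,
}

/// Last signed (or requested) position; `block_id: None` means <nil>.
#[derive(Debug, Clone, PartialEq)]
struct SignState {
    height: u64,
    round: u64,
    step: Step,
    block_id: Option<String>,
}

#[derive(Debug, PartialEq)]
enum SignError {
    Regression,
    DoubleSign { previous: Option<String>, requested: Option<String> },
}

/// Decides whether signing `req` is safe given the last signed state and
/// records it if so. `Option` equality makes `None != Some(_)`, so a nil
/// vote and a vote for a block at the same height/round/step conflict.
fn check_and_update(last: &mut Option<SignState>, req: SignState) -> Result<(), SignError> {
    if let Some(prev) = last.as_ref() {
        let order = (prev.height, prev.round, prev.step).cmp(&(req.height, req.round, req.step));
        match order {
            Ordering::Greater => return Err(SignError::Regression),
            Ordering::Equal if prev.block_id != req.block_id => {
                return Err(SignError::DoubleSign {
                    previous: prev.block_id.clone(),
                    requested: req.block_id.clone(),
                });
            }
            // Same position and same block ID (including both nil): allowed here.
            Ordering::Equal => return Ok(()),
            Ordering::Less => {}
        }
    }
    *last = Some(req);
    Ok(())
}

fn main() {
    let mut last = None;
    let nil_prevote = SignState { height: 10, round: 0, step: Step::Prevote, block_id: None };
    let block_prevote = SignState { block_id: Some("ABCD".to_string()), ..nil_prevote.clone() };
    // The first prevote (for nil) is accepted and recorded.
    println!("{:?}", check_and_update(&mut last, nil_prevote));
    // A prevote for a concrete block at the same height/round is refused.
    println!("{:?}", check_and_update(&mut last, block_prevote));
}
```

Treating "nil" as a real value rather than as "no information" is the key design choice: if a nil entry were skipped during comparison, a second, conflicting prevote at the same height and round would slip through.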

We are running v0.6.1 in production here at @iqlusion on both Cosmos Hub and Terra.


We encountered an outage this morning related to bugs in Tendermint KMS v0.6.1. You can read our postmortem here:

We recommend updating to Tendermint KMS v0.6.3. See the postmortem for more details.