[ANN] Tendermint KMS v0.2: Validator Signing Support


#6

This will get autocreated when it doesn’t exist (we should probably add a full tmkms init command ala gaiad though).

Regarding this whole section of the config:

[[providers.softsign]]
id = "gaia-8000"
path = "path/to/signing.key"

This is a testing-only, software-backed signer for when you want to test tmkms but don’t have a YubiHSM2. If you do have one, you can simply delete this whole section of the config.

If you do actually want to generate a software-backed key though, the command is tmkms generate path/to/signing.key

The command to run a test is: tmkms yubihsm test 1 (you had an extra keys there)


#7

Thanks! Everything works well.


#8

This will get autocreated when it doesn’t exist

@iqlusion I am setting this up to test with a local testnet. Where will the secret key be created?

Thanks!


#9

The secret key will be created at the path you specify, provided the parent directories exist.


#10

Ah… I see! Thank you! That’s very clever.


#11

Got it working, thanks for the instructions!

Just want to check: is it intentional that the secret_connection file contains all zeroes?

xxd /opt/tmkms/config/secret_connection.key
00000000: 0000 0000 0000 0000 0000 0000 0000 0000 …
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 …

BR,
Martin


#12

That is definitely NOT intentional, and is a high-severity bug. I have opened an issue here:

https://github.com/tendermint/kms/issues/118


#13

Yes I get the same.

BTW, what capabilities are required for tmkms? I want to create an authkey which has only the capabilities needed for gaiad.


#14

I have reproduced this problem on Linux. The problem does not seem to occur on macOS. I am continuing to investigate.

The only capability presently needed is asymmetric_sign_eddsa, however this may change as tmkms evolves.

I plan on writing a (customizable) provisioning tool that can generate a multi-level account hierarchy including administrative, operational, auditing, and application roles. When that happens, it will consider the full set of capabilities needed for each role.


#15

I’ve put out a small point release, v0.2.1, which should hopefully address all known bugs. That includes a fix for another feature which shipped in the original release but was buggy, so I didn’t announce it:

priv_validator.json import support

If you’ve already registered a key in the genesis.json for an upcoming testnet, such as gaia-9002, you can import it into a YubiHSM2 using the new tmkms yubihsm keys import command:

$ tmkms yubihsm keys import --path ./priv_validator.json 9002

This will import the Ed25519 key from the ./priv_validator.json file into slot 9002 of the configured YubiHSM2 (you will need to have tmkms.toml configured in advance).

Secret Connection key mini-post-mortem

A quick aside: the format for SecretConnection keys (i.e. secret_key under the [[validator]] section) is now Base64. This means any previously generated keys will not load and are unusable, but that’s ok because any keys generated by a release build were all zeroes! If you see an error like this, please delete the offending key to ensure it’s regenerated:

config error: error loading SecretConnection key from tmkms.key: invalid encoding

The root cause of the issue was a bug in the subtle-encoding library used to provide constant-time encoding/decoding of secret keys. This library performed the encoding/decoding operations inside a debug assertion (i.e. debug_assert_eq!), so the code that did the actual work was compiled out of release builds:

https://github.com/iqlusioninc/crates/pull/126
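To illustrate this class of bug, here is a minimal sketch. It is not the actual subtle-encoding code: copy_into, encode_buggy, and encode_fixed are hypothetical names invented for the example. It only shows how work placed inside debug_assert_eq! disappears from release builds and leaves the output buffer zeroed.

```rust
/// Hypothetical stand-in for an encoder: writes `src` into `dst` and
/// returns the number of bytes written.
fn copy_into(src: &[u8], dst: &mut [u8]) -> usize {
    let n = src.len().min(dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// Buggy: the call that does the work sits inside `debug_assert_eq!`.
/// With debug assertions disabled (the default for release builds) the
/// call is never executed, so `out` stays all zeroes.
fn encode_buggy(src: &[u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    debug_assert_eq!(copy_into(src, &mut out), src.len());
    out
}

/// Fixed: do the work unconditionally, then assert on the result.
fn encode_fixed(src: &[u8; 4]) -> [u8; 4] {
    let mut out = [0u8; 4];
    let written = copy_into(src, &mut out);
    debug_assert_eq!(written, src.len());
    out
}

fn main() {
    let key = [1u8, 2, 3, 4];
    // Debug build: [1, 2, 3, 4]. Release build: [0, 0, 0, 0].
    println!("buggy: {:?}", encode_buggy(&key));
    // Same result in both build modes.
    println!("fixed: {:?}", encode_fixed(&key));
}
```

Running this with `cargo run` and then `cargo run --release` shows the difference, which is also why tests that only run in debug mode cannot catch it.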

The bug has been fixed, and the CI configuration for this repository has been updated to run tests in release mode. I have confirmed that running the tests in release mode would’ve caught this particular bug:

https://travis-ci.org/iqlusioninc/crates/jobs/459482729


#16

Quick status report on attempting to use tmkms on gaia-9002:

We had to squash a few bugs around things like key encodings, but we managed to have tmkms live for a little bit.

We encountered a bug which appears to be a blocker for using tmkms on a testnet for the time being:

https://github.com/tendermint/kms/issues/130

It looks like it shouldn’t be difficult to resolve though. Hopefully we can have another point release out soon, along with instructions for how to enroll a tmkms-backed validator into a testnet.


#17

I’ve released tmkms v0.2.2 which, to my knowledge, addresses all remaining known bugs.

We’ve been using it live on gaia-9002, and so far it has produced over 30,000 signatures.


#19

Tony, what should be the settings for both gaiad and tmkms if both of them are running on the same machine?


#20

There are a couple of options for localhost usage. When configuring tmkms.toml:

[[validator]]
addr = [...] 

You can use either TCP or a Unix domain socket to talk to gaiad running on the same host:

  • Pick a port (e.g. 12345) and use tcp://127.0.0.1:12345 or tcp://localhost:12345. This will use local TCP/IP loopback
  • Use a Unix domain socket: unix:///path/to/socket

When configuring gaiad’s ~/.gaiad/config/config.toml, make sure to set priv_validator_laddr to the same value.
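As a rough illustration of the two address forms, here is a small sketch that tells them apart. ValidatorAddr and parse_addr are hypothetical names made up for this example and are not part of tmkms or gaiad; the only thing taken from the thread is the tcp://host:port and unix:///path shapes.

```rust
/// The two local transports: TCP loopback or a Unix domain socket.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum ValidatorAddr {
    Tcp { host: String, port: u16 },
    Unix { path: String },
}

/// Parse `tcp://host:port` or `unix:///path/to/socket`.
/// Returns `None` for anything else.
fn parse_addr(s: &str) -> Option<ValidatorAddr> {
    if let Some(rest) = s.strip_prefix("tcp://") {
        // Split on the last ':' so the port is always the final component.
        let (host, port) = rest.rsplit_once(':')?;
        if host.is_empty() {
            return None;
        }
        Some(ValidatorAddr::Tcp {
            host: host.to_string(),
            port: port.parse().ok()?,
        })
    } else if let Some(path) = s.strip_prefix("unix://") {
        if path.is_empty() {
            return None;
        }
        Some(ValidatorAddr::Unix {
            path: path.to_string(),
        })
    } else {
        None
    }
}

fn main() {
    // The value in tmkms.toml and the value of priv_validator_laddr
    // should be the same string, so they parse to the same address.
    let kms_side = parse_addr("tcp://127.0.0.1:12345");
    let gaiad_side = parse_addr("tcp://127.0.0.1:12345");
    assert_eq!(kms_side, gaiad_side);
    println!("{:?}", kms_side);
    println!("{:?}", parse_addr("unix:///path/to/socket"));
}
```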


#21

Thank you! They connected to each other correctly now.


#22

When I configure the line

priv_validator_laddr = "tcp://localhost:26666"

in config.toml, gaiad starts up with this error:
I[7126-12-07|20:42:34.593] Starting ABCI with Tendermint module=main
E[7126-12-07|20:42:37.737] OnStart module=privval err="accept tcp 127.0.0.1:26666: i/o timeout"
ERROR: Error with private validator socket client: failed to start: accept tcp 127.0.0.1:26666: i/o timeout

I’m on an Ubuntu 18.04 host, running gaiad 0.27.1.

Any ideas?

EDIT: Problem solved, but brings up a question. The issue was starting things in the wrong order. Starting the KMS first solves the problem. Question – is it intended that gaiad exits if the KMS isn’t online?


#23

I tried stopping KMS after gaiad and KMS were connected. gaiad then complains that it cannot see the remote signing service. If I start KMS again, gaiad does not reconnect to it.


#24

If you run both processes under supervisors (e.g. systemd), the ordering doesn’t matter as KMS will continue trying to connect, and gaiad will keep restarting waiting for a KMS connection.

That said, the timeouts right now on both sides are pretty bad and not presently adjustable. I think KMS should try to reconnect more frequently, and gaiad should wait longer for a KMS connection (and ideally start asynchronously while it waits for KMS to connect).

I suspect it was this bug, which should be fixed on the develop branch:

https://github.com/tendermint/tendermint/pull/2876
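For anyone curious what "keep trying to connect" looks like in code, here is a minimal sketch of a retry loop with a fixed delay. It is not how tmkms or gaiad implement it: retry is a hypothetical helper, and the attempt count and delay are made-up values standing in for the timeouts discussed above.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` tries have been made,
/// sleeping `delay` between failed tries. `op` always runs at least once.
/// (Hypothetical helper, for illustration only.)
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    // Simulate a peer that only accepts the third connection attempt.
    let mut calls = 0u32;
    let result: Result<&str, &str> = retry(5, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err("connection refused")
        } else {
            Ok("connected")
        }
    });
    println!("{:?} after {} calls", result, calls);
}
```

A shorter delay on the KMS side and a larger attempt budget on the gaiad side would correspond to the two adjustments suggested above.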


#25

Hello, I have a question:

Tendermint’s block height was increasing over time as expected. Then I did three things:
1. Generated softsign.key and made the corresponding changes to tmkms.toml.
2. Used tests/support/secret_connection.key as the secret_key in tmkms.toml.
3. Changed Tendermint’s priv_validator_laddr, after which the two connected to each other successfully.

After that, I noticed that Tendermint’s height had stopped increasing, and I do not know why.

Is this setup correct? Is there anything else I need to do?