The community consensus is quite clear that using a remote signer is the correct way to configure one’s validator.
Why doesn’t Notional use a remote signer?
human error
I personally call validators after they get slashed. In the case of this most recent slash on the hub, one of them is a dog and so I haven’t called them.
The other is @serejandmyself and I haven’t called them because they have produced very detailed documentation on the incident.
Both were victims of human error – and not using a remote signer.
Using a remote signer goes outside the default flow of comet. In the default flow of comet, there is a signing key (priv_validator_key.json) in the filesystem, e.g.:
cd ~/.gaia/config
config % ls -a -l
total 239248
drwxr-xr-x 9 faddat staff 288 Jun 18 04:33 .
drwxr-xr-x 4 faddat staff 128 Jun 16 15:22 ..
-rw-r--r-- 1 faddat staff 1058374 Jun 16 15:30 addrbook.json
-rw-r--r-- 1 faddat staff 9446 Jun 16 15:22 app.toml
-rw------- 1 faddat staff 742 Jun 18 04:33 client.toml
-rw-r--r-- 1 faddat staff 18605 Jun 16 15:22 config.toml
-rw-r--r-- 1 faddat staff 121386449 Jun 16 15:22 genesis.json
-rw------- 1 faddat staff 148 Jun 16 15:22 node_key.json
-rw------- 1 faddat staff 345 Jun 16 15:22 priv_validator_key.json
In fact, this makes adding a remote signer more dangerous.
A number of double signing incidents have actually come from users who, in the process of adopting a remote signer, forgot to take the key out of the filesystem and ended up double signing. This is my terror. No one is perfect, so we’ve got to design systems that avoid problems like these. It’s happened much more than you’d think.
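To make that failure mode concrete, here is a minimal sketch of a pre-start check an operator could run. It is not part of comet. It assumes the standard `priv_validator_laddr` setting in config.toml is how the remote signer is enabled, and it uses a deliberately naive line parser instead of a real TOML library. The function names are hypothetical.

```python
import os
import tempfile

def remote_signer_address(config_toml_path):
    """Return the priv_validator_laddr value, or "" if it is unset.

    Naive parser: assumes one `key = "value"` per line with no trailing comment.
    """
    with open(config_toml_path) as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "priv_validator_laddr":
                return value.strip().strip('"')
    return ""

def leftover_key_problem(home):
    """Return a warning if a remote signer is configured but the key file is still on disk."""
    config_dir = os.path.join(home, "config")
    laddr = remote_signer_address(os.path.join(config_dir, "config.toml"))
    key_path = os.path.join(config_dir, "priv_validator_key.json")
    if laddr and os.path.exists(key_path):
        return f"remote signer configured at {laddr} but {key_path} still exists"
    return None

# Demo against a throwaway home directory (hypothetical address and contents).
home = tempfile.mkdtemp()
config_dir = os.path.join(home, "config")
os.makedirs(config_dir)
with open(os.path.join(config_dir, "config.toml"), "w") as f:
    f.write('priv_validator_laddr = "tcp://127.0.0.1:26659"\n')
key_path = os.path.join(config_dir, "priv_validator_key.json")
with open(key_path, "w") as f:
    f.write("{}")

print(leftover_key_problem(home))  # warning: the key is still in the filesystem
os.remove(key_path)
print(leftover_key_problem(home))  # None: nothing left to double sign with
```

A check like this only catches the key sitting next to the node that was just reconfigured. It cannot see copies of the key on other machines, which is part of why I would rather fix the default.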
Because of this situation, we do not have a remote signer: I have always made the decision not to adopt one. There are a couple of other things motivating this decision, for example the fact that we generally operate on-premises at our office in Hanoi.
It should not be hard to eliminate these double signing issues.
This is also the only reason that I can think of to not slash @pupmos and @serejandmyself. They were operating the software by its defaults.
To give a brief run through of what happened:
Both the dog and the robot were running the old version of Neutron and had signed the block of the halt. When they changed the binary to the new version of Neutron, they did not preserve the priv_validator_state.json file, which would have prevented signing again at that height with the new version of Neutron.
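For anyone unfamiliar with why that state file matters, here is a simplified sketch of the guard it provides. This is my own illustration, not comet's code: the real check also allows re-signing identical bytes at the same height, round and step, and the real file stores the signature and sign bytes too. The function names are hypothetical.

```python
import json
import os
import tempfile

def load_state(path):
    """Return the last signed (height, round, step), or (0, 0, 0) if the file is missing."""
    if not os.path.exists(path):
        return (0, 0, 0)
    with open(path) as f:
        raw = json.load(f)
    # The height is stored as a string in the JSON file; round and step are numbers.
    return (int(raw["height"]), int(raw["round"]), int(raw["step"]))

def may_sign(path, height, round_, step):
    """Allow signing only if strictly past the last signed (height, round, step)."""
    return (height, round_, step) > load_state(path)

def record_signature(path, height, round_, step):
    """Persist the position that was just signed."""
    with open(path, "w") as f:
        json.dump({"height": str(height), "round": round_, "step": step}, f)

# Demo: the old binary signs the halt block at a made-up height of 100.
workdir = tempfile.mkdtemp()
state_path = os.path.join(workdir, "priv_validator_state.json")
record_signature(state_path, 100, 0, 3)

# State file preserved across the upgrade: the new binary refuses to sign again.
preserved = may_sign(state_path, 100, 0, 3)

# State file lost during the upgrade: the guard has no memory and allows it.
os.remove(state_path)
wiped = may_sign(state_path, 100, 0, 3)

print(preserved, wiped)  # False True
```

The second case is the incident: with the state gone, nothing stops the same key from signing a different block at a height it has already signed.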
So indeed in both cases, this is an incident of human error.
However, I have always been a really big believer in sane defaults. Our current defaults are insane. If we want remote signers to be used, then naked signing should no longer be an option, and it most certainly should not be the default.
I’m going to suggest that comet have explicit modes of operation, first of all a testing mode, where keys are simply kept in RAM because they don’t need to stick around for a long time. In the case of a long-running test, like the testnet command, maybe we can have a flag or something like --long-running-test. And then it could sign from a key in the filesystem. However, it is my preference to actually remove that feature altogether.
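A rough sketch of what I mean by explicit modes follows. Everything in it is hypothetical: comet has no such modes today, and the mode names, the `key_source` function and the returned strings are only there to show the shape of the idea.

```python
from enum import Enum

class Mode(Enum):
    TESTING = "testing"                      # throwaway key, never written to disk
    LONG_RUNNING_TEST = "long-running-test"  # key file allowed, test networks only
    PRODUCTION = "production"                # remote signer required

def key_source(mode=Mode.PRODUCTION, remote_signer_addr=""):
    """Decide where the signing key comes from for a given mode.

    Production is the default, and it refuses to start without a remote signer.
    """
    if mode is Mode.TESTING:
        return "ephemeral key in RAM"
    if mode is Mode.LONG_RUNNING_TEST:
        return "key file on disk"
    if not remote_signer_addr:
        raise ValueError("production mode requires a remote signer address")
    return f"remote signer at {remote_signer_addr}"

print(key_source(Mode.TESTING))
print(key_source(Mode.PRODUCTION, "tcp://127.0.0.1:26659"))
```

The point of the design is that the unsafe path has to be asked for by name, and the path you get by doing nothing cannot sign from a naked key.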
I would like to thank @zaki_iqlusion for reviewing this concept with me, and invite @marbar, @AdiSeredinschi, @valardragon and other ecosystem technology leads to review this and provide commentary.
motivation
I asked myself: what would get me to adopt a remote signer?
And then it hit me: I had made all of these calls to validators who had gotten slashed, and so many of them had gotten slashed in the process of adopting a remote signer. The thing that would make me adopt a remote signer is making that impossible by default. That would mean that we take the signing key out of the filesystem.
I think that it’s really important to protect production environments, and I always want to have the most secure, performant setup possible. I think that by making the most secure mode of operation the default for production environments, we should be able to fully avoid this issue.
How I came up with this
These are my favorite quotes on security:
Bullet points
- Default behavior should be safe behavior
- currently, signing by default from a naked key in the filesystem means that default behavior isn’t safe
- Given numerous reports of validators getting slashed while adopting remote signers due to human error, I’ve personally been hesitant to adopt a remote signer.
- It seems to me that by making sure that default behavior is safe and sane, we can:
  - eliminate the opportunity to double sign while configuring a remote signer
  - reduce or eliminate equivocation in cosmos by changing default software behavior to require a remote signer to sign blocks