Syncing stopped; panic: Unknown pubkey type


#1

Earlier today my monitor alerted me that syncing had stopped on one of my sentries. Normally I SSH into the machine, restart gaiad, and it works again.

This time that didn’t work. Judging by the logs, it gets stuck at block 302207 and then hits a panic that stops gaiad.

The log looks like this:

Sep 13 15:50:01 doghouse2 systemd[1]: Started Cosmos Gaia Sentry 1.
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.011] Starting ABCI with Tendermint module=main
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] Starting multiAppConn module=proxy impl=multiAppConn
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] Starting localClient module=abci-client connection=query impl=localClient
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] Starting localClient module=abci-client connection=mempool impl=localClient
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] Starting localClient module=abci-client connection=consensus impl=localClient
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] ABCI Handshake module=consensus appHeight=302206 appHash=72F5882F762B55DFB119E015BA31A7F1
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] ABCI Replay Blocks module=consensus appHeight=302206 storeHeight=302207 stateHeight=302206
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.170] Replay last block using real app module=consensus
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.237] Absent validator 2E6AF0D5B1A85E173F5EBEDA93D7E7D97A88D06C at height 302207, 8216 signed, threshold 5000 module=x/slashi
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.253] Absent validator 398610C4CF11C84C89AC6975470972EE75DA17E4 at height 302207, 9581 signed, threshold 5000 module=x/slashi
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.305] Absent validator B0155252D73B7EEB74D2A8CC814397E66970A839 at height 302207, 4512 signed, threshold 5000 module=x/slashi
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.307] Absent validator B051AF0D7327EEAFAF6450DB698B7B92499273AA at height 302207, 9813 signed, threshold 5000 module=x/slashi
Sep 13 15:50:02 doghouse2 gaiad[10304]: I[09-13|13:50:02.329] Absent validator EC67CD8E0F5ECD9FAE10918EE950D803F9324F1D at height 302207, 6592 signed, threshold 5000 module=x/slashi
Sep 13 15:50:02 doghouse2 gaiad[10304]: panic: Unknown pubkey type
Sep 13 15:50:02 doghouse2 gaiad[10304]: goroutine 1 [running]:
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10a9b80, 0xc42175e5a0, 0xc4217534c0, 0x8, 0xc421758f40, 0x14, 0x20, 0xc422139400, 0x9,
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:34 +0x6ec
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc420a14000, 0x10a9b80, 0xc42175e5a0, 0xc4217534c0, 0x8, 0xc421758f40, 0x1
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:139 +0xea
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker-fm(0x10a9b80, 0xc42175e5a0, 0xc4217534c0, 0x8, 0xc421758f40, 0x14, 0x20, 0x
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:110 +0xba
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(0xc420906180, 0xc421758f40, 0x14, 0x20, 0xc422139400, 0x9, 0x49c7f, 0x39b1731b, 0x
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:386 +0x237
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync(0xc420082f60, 0xc421758f40, 0x
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:206 +0x9e
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(0xc421656650, 0xc421758f40, 0x1
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:69 +0x6b
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x10aa740, 0xc421667da0, 0x10af640, 0xc4216566
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:190 +0x528
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc4226d6308, 0xc422138f40, 0x9, 0x49c
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:76 +0x12f
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*Handshaker).replayBlock(0xc4209e9e00, 0xc422138f40, 0x9, 0x4
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/replay.go:414 +0x23a
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(0xc4209e9e00, 0xc422138f40, 0x9, 0x
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/replay.go:345 +0x7f2
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc4209e9e00, 0x10b42a0, 0xc420104c80,
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/replay.go:246 +0x426
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy.(*multiAppConn).OnStart(0xc420104c80, 0xc421ce0540, 0x15)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy/multi_app_conn.go:108 +0x527
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/libs/common.(*BaseService).Start(0xc420104c80, 0x0, 0x0)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/libs/common/service.go:130 +0x3bd
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/node.NewNode(0xc42083d440, 0x10ac100, 0xc420992140, 0x109eb60, 0xc421329
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/node/node.go:187 +0x693
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/server.startInProcess(0xc4201adee0, 0xc4201adf20, 0x1d, 0x0, 0x0)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/server/start.go:98 +0x320
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/server.StartCmd.func1(0xc4209838c0, 0xc42003d180, 0x0, 0x1, 0x0, 0x0)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/server/start.go:38 +0xaa
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra.(*Command).execute(0xc4209838c0, 0xc42003d160, 0x1, 0x1, 0xc4209838c0, 0xc42003d16
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra/command.go:698 +0x46d
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420982900, 0xdbf2c0, 0xc420a07e01, 0xc42098a0e0)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra/command.go:783 +0x2e4
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420982900, 0xc42098a0e0, 0xc420a07ed8)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/spf13/cobra/command.go:736 +0x2b
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/libs/cli.Executor.Execute(0xc420982900, 0x1029a28, 0x2, 0xc4207c82a0)
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/libs/cli/setup.go:89 +0x4e
Sep 13 15:50:02 doghouse2 gaiad[10304]: main.main()
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/cmd/gaiad/main.go:38 +0x214
Sep 13 15:50:02 doghouse2 systemd[1]: gaiad.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 13 15:50:02 doghouse2 systemd[1]: gaiad.service: Failed with result 'exit-code'.
Sep 13 15:50:05 doghouse2 systemd[1]: gaiad.service: Service hold-off time over, scheduling restart.
Sep 13 15:50:05 doghouse2 systemd[1]: gaiad.service: Scheduled restart job, restart counter is at 6208.
Sep 13 15:50:05 doghouse2 systemd[1]: Stopped Cosmos Gaia Sentry 1.
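For anyone digging through a journal like this one, the two things worth pulling out are the panic message and the first stack frame after it, since that frame names the file and line that panicked. Below is a minimal Python sketch of that extraction. The helper name find_panic and the prefix regex are my own assumptions based on the log lines in this post, not part of gaiad or any Tendermint tooling.

```python
import re

# Journald prefix as it appears in this post, e.g.
# "Sep 13 15:50:02 doghouse2 gaiad[10304]: " (assumed format).
PREFIX = re.compile(r"[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ \S+\[\d+\]: ")


def find_panic(log_text):
    """Return (message, function, location) for the first panic, or None."""
    # Splitting on the prefix works whether entries are one per line
    # or pasted together into a single long line.
    entries = [e.strip() for e in PREFIX.split(log_text) if e.strip()]
    for i, entry in enumerate(entries):
        if not entry.startswith("panic: "):
            continue
        message = entry[len("panic: "):]
        frames = entries[i + 1:]
        # Skip the "goroutine N [running]:" header if present.
        if frames and frames[0].startswith("goroutine "):
            frames = frames[1:]
        if len(frames) < 2:
            return message, None, None
        # Frame layout: function call first, then "file.go:LINE +0xOFFSET".
        function = frames[0].rsplit("(", 1)[0]
        location = frames[1].split(" +", 1)[0]
        return message, function, location
    return None


SAMPLE = (
    "Sep 13 15:50:02 doghouse2 gaiad[10304]: panic: Unknown pubkey type "
    "Sep 13 15:50:02 doghouse2 gaiad[10304]: goroutine 1 [running]: "
    "Sep 13 15:50:02 doghouse2 gaiad[10304]: "
    "github.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10a9b80, 0xc42175e5a0, "
    "Sep 13 15:50:02 doghouse2 gaiad[10304]: "
    "/home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:34 +0x6ec"
)

print(find_panic(SAMPLE))
```

On the sample, the location that comes back ends in x/slashing/tick.go:34, which is the frame to compare against the source of whatever version you believe you are running.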

Does anyone know how to fix this? I assume it would be best to go back 50 blocks or so and then start syncing again, but afaik that’s only possible with a backup, which I don’t have.

Atm I don’t see any option other than gaiad unsafe_reset_all, but that feels very wrong since the node then has to resync completely.

Does anyone have suggestions for a fix?

Thx, Crytter


#2

So, I did an unsafe_reset_all and it started syncing from 0. Yet it stopped again at block 302207 with a panic: Unknown pubkey type. The only fix I can imagine now is cloning the data folder of a working sentry and using it to replace the data folder that produces this error. It’s still not a good fix, but I’m not sure what else I can do.


#3

Are you sure you are running the right version? This would explain why your sentry node isn’t in consensus with the other nodes.


#4

One cause might be if dep ensure wasn’t run


#5

The stuck sentry is running gaiad 0.24.2-0bf061b7. What exactly do you mean by “One cause might be if dep ensure wasn’t run”? How can I run dep ensure?


#6

Run make get_vendor_deps in the SDK repo. That makes sure all of the SDK’s dependencies are installed at the correct versions.


#7

Yes, I did that, but with no result. It’s still stuck.


#8

I saw this bug on gaia-7005. It is caused by Evidence.

Here is block #302207.

Block{
Header{
ChainID: gaia-8001
Height: 302207
Time: 2018-09-13 07:44:05.967930651 +0000 UTC
NumTxs: 0
TotalTxs: 27429
LastBlockID: 964EDFF97CB4FF77F5BC35C001EC1BAD4FE96F03:1:CC3617D7459D
LastCommit: 728F50FC4153CA9F356818B93E080E7C09126E69
Data:
Validators: 47A9259082080A05ED133D2C9603FE99DCD5DB6B
App: 72F5882F762B55DFB119E015BA31A7F1B8C07A85
Consensus: D6B74BB35BDFFD8392340F2A379173548AE188FE
Results:
Evidence: 74E8D63C6C36F3409AD5CA3ECB24F7D4CCCA3608
}#E7F24667DCE11E9D78BA6558E8055DE32C123317
Data{

}#
EvidenceData{
Evidence:VoteA: Vote{27:570B6A755ED3 302207/00/1(Prevote) 104C214A0ED3 73FAAB3BD709 @ 2018-09-13T07:44:03.398900853Z}; VoteB: Vote{27:570B6A755ED3 302207/00/1(Prevote) 882090C3B8CF 20B96D112F7B @ 2018-09-13T07:44:09.828057795Z}
}#74E8D63C6C36F3409AD5CA3ECB24F7D4CCCA3608
Commit{
BlockID: 964EDFF97CB4FF77F5BC35C001EC1BAD4FE96F03:1:CC3617D7459D
Precommits: Vote{0:01F78669F951 302206/00/2(Precommit) 964EDFF97CB4 DB075A707F1C @ 2018-09-13T07:43:57.763348976Z}

Evidence data:

EvidenceData{
Evidence:VoteA: Vote{27:570B6A755ED3 302207/00/1(Prevote) 104C214A0ED3 73FAAB3BD709 @ 2018-09-13T07:44:03.398900853Z}; VoteB: Vote{27:570B6A755ED3 302207/00/1(Prevote) 882090C3B8CF 20B96D112F7B @ 2018-09-13T07:44:09.828057795Z}

I am sure that syncing stops if a block contains double-sign evidence, but I don’t know why.
The strange thing is that your error message is different from what I saw on gaia-7005.
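To make the "double-sign" part concrete: the two votes in the evidence come from the same validator at the same height, round, and vote type, but point at different block hashes. Here is a minimal Python sketch that parses the Vote{...} strings from the dump above and checks that condition. The field order (index:address height/round/type(name) block-hash signature @ time) is my reading of the printed format and should be treated as an assumption; the names parse_votes and is_double_sign are made up for this example.

```python
import re
from typing import NamedTuple

# Assumed layout of the printed vote:
# Vote{index:address height/round/type(name) blockhash signature @ time}
VOTE = re.compile(
    r"Vote\{(?P<index>\d+):(?P<address>[0-9A-F]+) "
    r"(?P<height>\d+)/(?P<round>\d+)/(?P<type>\d+)\((?P<name>\w+)\) "
    r"(?P<block>[0-9A-F]*) (?P<sig>[0-9A-F]*) @ (?P<time>[^}]+)\}"
)


class Vote(NamedTuple):
    address: str    # truncated validator address as printed
    height: int
    round: int
    type_name: str  # "Prevote" or "Precommit"
    block: str      # truncated block hash as printed


def parse_votes(text):
    """Extract every Vote{...} found in a block dump."""
    return [
        Vote(m["address"], int(m["height"]), int(m["round"]), m["name"], m["block"])
        for m in VOTE.finditer(text)
    ]


def is_double_sign(a, b):
    """Same validator, height, round and vote type, but different blocks."""
    same_slot = (a.address, a.height, a.round, a.type_name) == (
        b.address, b.height, b.round, b.type_name)
    return same_slot and a.block != b.block


EVIDENCE = (
    "Evidence:VoteA: Vote{27:570B6A755ED3 302207/00/1(Prevote) 104C214A0ED3 "
    "73FAAB3BD709 @ 2018-09-13T07:44:03.398900853Z}; VoteB: Vote{27:570B6A755ED3 "
    "302207/00/1(Prevote) 882090C3B8CF 20B96D112F7B @ 2018-09-13T07:44:09.828057795Z}"
)

vote_a, vote_b = parse_votes(EVIDENCE)
print(is_double_sign(vote_a, vote_b))
```

This only compares the truncated hashes shown in the dump, so it illustrates the condition rather than verifying the evidence; real verification also checks the signatures.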


#9

Here is what I saw:

I think you could probably back up the data from another server that is already past height 302207 and restore it to your new sentry.
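A rough Python sketch of the restore step, in case it helps. It assumes gaiad is stopped while the copy runs and that you have already transferred a copy of the healthy node's data folder onto the stuck machine. The function name replace_data_dir and every path you pass in are placeholders of mine, not anything gaiad provides.

```python
import shutil
import time
from pathlib import Path


def replace_data_dir(healthy_copy, target_data, keep_old=True):
    """Swap a stuck node's data folder for a copy taken from a healthy node.

    Assumes the node is stopped. Returns the path the old folder was moved
    to, or None if there was no old folder or it was deleted.
    """
    healthy_copy = Path(healthy_copy)
    target_data = Path(target_data)
    if not healthy_copy.is_dir():
        raise FileNotFoundError(f"no healthy copy at {healthy_copy}")

    moved_to = None
    if target_data.exists():
        if keep_old:
            # Keep the broken folder next to the new one so the swap can be undone.
            moved_to = target_data.with_name(
                f"{target_data.name}.broken-{int(time.time())}")
            target_data.rename(moved_to)
        else:
            shutil.rmtree(target_data)

    shutil.copytree(healthy_copy, target_data)
    return moved_to
```

Keeping the old folder by default is deliberate: if the copied data turns out to be bad as well, you can still move the original back.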


#10

Thx for the help. I created a backup as Melea described and successfully transferred it to the troubled sentry. Now everything works perfectly again.


#11

Fix underway.


#12

Hej, thanks for bringing this up and reporting the issue. A hotfix against Tendermint master is underway and will be released later today: https://github.com/tendermint/tendermint/pull/2424


#13

Hi guys,

This bug should have been fixed in v0.24.1 of the SDK.

The panic pasted above starts off with:

panic: Unknown pubkey type
Sep 13 15:50:02 doghouse2 gaiad[10304]: goroutine 1 [running]:
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10a9b80, 0xc42175e5a0, 0xc4217534c0, 0x8, 0xc421758f40, 0x14, 0x20, 0xc422139400, 0x9,
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:34 +0x6ec
Sep 13 15:50:02 doghouse2 gaiad[10304]: github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc420a14000, 0x10a9b80, 0xc42175e5a0, 0xc4217534c0, 0x8, 0xc421758f40, 0x1
Sep 13 15:50:02 doghouse2 gaiad[10304]: /home/nckrtl/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:139

Note the stack trace says the panic is called at github.com/cosmos/cosmos-sdk/x/slashing/tick.go:34.

So if we look there on v0.24.2, we don’t see a panic: https://github.com/cosmos/cosmos-sdk/blob/v0.24.2/x/slashing/tick.go#L34

But at v0.24.0, we do: https://github.com/cosmos/cosmos-sdk/blob/v0.24.0/x/slashing/tick.go#L34

Indeed, this was the bug that was fixed in v0.24.1.

Is it possible the node that panicked was actually running v0.24.0?
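As a small illustration of the version question, here is a Python sketch that splits a reported version string such as 0.24.2-0bf061b7 into its numeric part and commit suffix and compares it with v0.24.1, the release that carried the fix. The helper names and the string format are assumptions based on the version quoted in this thread.

```python
# Release that contained the fix, per this post.
FIXED_IN = (0, 24, 1)


def parse_version(reported):
    """'0.24.2-0bf061b7' or 'v0.24.2' -> ((0, 24, 2), '0bf061b7' or None)."""
    core, _, commit = reported.lstrip("v").partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    return numbers, commit or None


def includes_fix(reported):
    """True if the reported version is at or above the fixed release."""
    numbers, _ = parse_version(reported)
    return numbers >= FIXED_IN


print(includes_fix("0.24.2-0bf061b7"))
print(includes_fix("v0.24.0"))
```

A passing check only tells you what the binary reports about itself. The stack trace is the stronger signal here, because tick.go:34 is where the panic sits in v0.24.0 but not in v0.24.2.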


#14

I’m sure I applied the hotfix as soon as it became available. The gaiad version shows 0.24.2-0bf061b7. Perhaps I didn’t run the other steps properly, like make get_vendor_deps. It’s hard to check, as it’s now running just fine with the backup restored from another sentry. I