Cloud Platform peering issues

I’ve been having ongoing issues with nodes hosted on Google Cloud. They work, but they never collect more than a handful of peers, and peer connections drop frequently. The only thing I can think of that makes GCP different from simpler cloud providers is the external <-> internal address mapping. On the GCP side I have external_address set to the external IP and laddr set to the internal IP.
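For context, the relevant [p2p] lines in config.toml look roughly like this. This is a sketch of the mapping described above, using the addresses that appear in the logs below; your ports and IPs will differ:

```toml
# [p2p] section of config.toml on the GCP instance (illustrative values).
# laddr binds to the VM's internal (VPC) interface; external_address is
# the NAT'd public IP advertised to peers during the handshake.
laddr = "tcp://10.229.0.2:26656"               # internal address
external_address = "tcp://35.203.24.129:26656" # external address
```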

I isolated a typical peer session in the logs from both ends: one node on GCP and one on OVH, a cloud provider that assigns the public IP directly to the instance. This same snippet repeats in the logs constantly. I’m not certain this is a Tendermint issue, but it seems likely to me. Is anyone else seeing an issue like this?

instance-1 hosted on GCP (external <-> internal IP)

Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.869] Will dial address                            module=p2p addr=8a2f84e18c48ed7749d717a245850d92f3361086@144.217.247.181:26656
Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.870] Dialing peer                                 module=p2p address=8a2f84e18c48ed7749d717a245850d92f3361086@144.217.247.181:26656
Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.955] Successful handshake with peer               module=p2p peer=144.217.247.181:26656 peerNodeInfo="NodeInfo{id: 8a2f84e18c48ed7749d717a245850d92f3361086, moniker: netfix.sentry5, network: gaia-7001 [listen 144.217.247.181:26656], version: 0.22.2 ([amino_version=0.10.1 p2p_version=0.5.0 consensus_version=v1/0.2.2 rpc_version=0.7.0/3 tx_index=on rpc_addr=tcp://0.0.0.0:26657])}"
Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.956] Starting Peer                                module=p2p peer=144.217.247.181:26656 impl="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"
Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.956] Starting MConnection                         module=p2p peer=144.217.247.181:26656 impl=MConn{144.217.247.181:26656}
Jul 18 15:19:30 instance-1 sh[2422]: I[07-18|15:19:30.956] Added peer                                   module=p2p peer="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"
Jul 18 15:20:49 instance-1 sh[2422]: E[07-18|15:20:49.679] Stopping peer for error                      module=p2p peer="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}" err="error with peer 8a2f84e18c48ed7749d717a245850d92f3361086: peer did not send us anything"
Jul 18 15:20:49 instance-1 sh[2422]: I[07-18|15:20:49.679] Stopping Peer                                module=p2p peer=144.217.247.181:26656 impl="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"
Jul 18 15:20:49 instance-1 sh[2422]: I[07-18|15:20:49.679] Stopping MConnection                         module=p2p peer=144.217.247.181:26656 impl=MConn{144.217.247.181:26656}
Jul 18 15:20:49 instance-1 sh[2422]: E[07-18|15:20:49.679] MConnection flush failed                     module=p2p peer=144.217.247.181:26656 err="write tcp 10.229.0.2:50110->144.217.247.181:26656: use of closed network connection"
Jul 18 15:20:49 instance-1 sh[2422]: I[07-18|15:20:49.740] Stopping gossipDataRoutine for peer          module=consensus peer="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"
Jul 18 15:20:49 instance-1 sh[2422]: I[07-18|15:20:49.742] Stopping gossipVotesRoutine for peer         module=consensus peer="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"
Jul 18 15:20:51 instance-1 sh[2422]: I[07-18|15:20:51.142] Stopping queryMaj23Routine for peer          module=consensus peer="Peer{MConn{144.217.247.181:26656} 8a2f84e18c48ed7749d717a245850d92f3361086 out}"

server-1 hosted on OVH (real IP)

Jul 18 15:19:28 server-1 sh[3181]: I[07-18|15:19:28.783] Successful handshake with peer               module=p2p peer=35.203.24.129:50110 peerNodeInfo="NodeInfo{id: bce60f8981d46981f975609fa233c4922694bf84, moniker: gcp.priv1, network: gaia-7001 [listen 35.203.24.129:26656], version: 0.22.2 ([amino_version=0.10.1 p2p_version=0.5.0 consensus_version=v1/0.2.2 rpc_version=0.7.0/3 tx_index=on rpc_addr=tcp://0.0.0.0:26657])}"
Jul 18 15:19:28 server-1 sh[3181]: I[07-18|15:19:28.786] Starting Peer                                module=p2p peer=35.203.24.129:50110 impl="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"
Jul 18 15:19:28 server-1 sh[3181]: I[07-18|15:19:28.786] Starting MConnection                         module=p2p peer=35.203.24.129:50110 impl=MConn{35.203.24.129:50110}
Jul 18 15:19:28 server-1 sh[3181]: I[07-18|15:19:28.787] Added peer                                   module=p2p peer="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"
Jul 18 15:20:47 server-1 sh[3181]: E[07-18|15:20:47.537] Connection failed @ recvRoutine (reading byte) module=p2p peer=35.203.24.129:50110 conn=MConn{35.203.24.129:50110} err=EOF
Jul 18 15:20:47 server-1 sh[3181]: I[07-18|15:20:47.538] Stopping MConnection                         module=p2p peer=35.203.24.129:50110 impl=MConn{35.203.24.129:50110}
Jul 18 15:20:47 server-1 sh[3181]: E[07-18|15:20:47.538] Stopping peer for error                      module=p2p peer="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}" err=EOF
Jul 18 15:20:47 server-1 sh[3181]: I[07-18|15:20:47.538] Stopping Peer                                module=p2p peer=35.203.24.129:50110 impl="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"
Jul 18 15:20:47 server-1 sh[3181]: I[07-18|15:20:47.558] Stopping gossipDataRoutine for peer          module=consensus peer="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"
Jul 18 15:20:47 server-1 sh[3181]: I[07-18|15:20:47.588] Stopping gossipVotesRoutine for peer         module=consensus peer="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"
Jul 18 15:20:48 server-1 sh[3181]: I[07-18|15:20:48.144] Stopping queryMaj23Routine for peer          module=consensus peer="Peer{MConn{35.203.24.129:50110} bce60f8981d46981f975609fa233c4922694bf84 in}"

I also see the “use of closed network connection” errors on our sentries on GCP, occurring every few minutes.

Two things to note:

When connecting to a peer running with --p2p.seed_mode, the peer will disconnect immediately after sending some peer addresses, and we’ll see this as an EOF. We should do a better job of sending an error message so this is less mysterious.

When connecting to a peer that already has its maximum number of connections, the peer will also disconnect, and we’ll see an EOF.

This could explain some of what’s happening here.
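One way to check whether you are simply hitting the inbound/outbound peer caps is to watch peer counts from the node’s RPC (net_info is the standard Tendermint RPC endpoint; in practice you would fetch http://localhost:26657/net_info). A minimal sketch, using a made-up sample response:

```python
import json

# Made-up excerpt of a Tendermint /net_info response, for illustration only;
# a real check would fetch it from the node's RPC port (default 26657).
sample = json.loads("""
{
  "result": {
    "n_peers": "2",
    "peers": [
      {"is_outbound": true,  "remote_ip": "144.217.247.181"},
      {"is_outbound": false, "remote_ip": "35.203.24.129"}
    ]
  }
}
""")

def count_peers(net_info):
    """Return (total, outbound, inbound) peer counts from a net_info payload."""
    peers = net_info["result"]["peers"]
    outbound = sum(1 for p in peers if p["is_outbound"])
    return len(peers), outbound, len(peers) - outbound

print(count_peers(sample))
```

Comparing these counts against max_num_inbound_peers / max_num_outbound_peers over time shows whether disconnects coincide with a full peer table.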

I’m having the same issue. When I restart the validator, I end up with a lot of
MConnection flush failed … use of closed network connection
errors, which persist until I restart the sentry nodes.

Sentry 1 has a public IP
Sentry 2 and the validator are behind NAT

Sentry 1

grpc_max_open_connections = 900
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 5
persistent_peers = "cccccccccccccccccccccccccccccccccccccccc@1.22.32.42:26656,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb@1.21.31.41:26656"
addr_book_strict = false
max_num_inbound_peers = 10
max_num_outbound_peers = 5
pex = true
seed_mode = false
private_peer_ids = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
max_open_connections = 3

Sentry 2

grpc_max_open_connections = 900
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 5
persistent_peers = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@1.2.3.4:26656,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb@1.21.31.41:26656"
addr_book_strict = false
max_num_inbound_peers = 10
max_num_outbound_peers = 5
pex = true
seed_mode = false
private_peer_ids = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
max_open_connections = 3

Validator

grpc_max_open_connections = 900
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 5
persistent_peers = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@1.2.3.4:26656,cccccccccccccccccccccccccccccccccccccccc@1.22.32.42:26656"
addr_book_strict = false
max_num_inbound_peers = 3
max_num_outbound_peers = 0
pex = false
seed_mode = false
private_peer_ids = ""
max_open_connections = 3
