There is an ongoing issue that affects any node running in an environment where the instance’s local IP address does not match its public IP address. This is the case on Google Cloud and AWS, where instances always have an RFC1918 address that is mapped to a public IP address. Gaiad nodes running on GCP/AWS instances never get dialled, and are unable to maintain consistent outbound connections.
I think there are currently two open Tendermint issues that represent different approaches to resolving this, but neither of them made it into the release for the gaia-7000 testnet.
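As a rough illustration of the mismatch, here is a small Python sketch that checks whether a node would end up self-reporting an RFC1918 address that differs from its public one. The function names and the example addresses are made up for illustration; this is not how Tendermint or gaiad detects anything.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def advertises_unreachable_address(local_addr: str, public_addr: str) -> bool:
    """Hypothetical check: the node's own interface address is private and
    differs from the address the outside world must dial."""
    return is_rfc1918(local_addr) and local_addr != public_addr


# Example: a cloud instance with a private interface address and a mapped public IP.
print(advertises_unreachable_address("10.128.0.2", "203.0.113.7"))
```

A host with a directly routable address (the Digital Ocean or OVH case) would pass the same value for both arguments, and the check would return False.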
758 suggested letting a node configure the IP that it self-reports to its peers. 758 was superseded by 873, which develops that idea into a node remembering the IP a peer is actually connecting from, regardless of what IP the peer reports. If I read it correctly, 873 is suggesting that a node should maintain its address book using the real IP addresses of peer connections rather than the IP a peer reports.
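To make the distinction concrete, here is a minimal Python sketch of the two address book policies as I read them. The `InboundPeer` type, the `trust_observed` flag, the node ID, and the port are all hypothetical; this models the idea only and is not Tendermint’s address book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InboundPeer:
    node_id: str
    reported_ip: str  # the address the peer claims to listen on
    observed_ip: str  # the source address of the actual connection
    port: int


def address_book_entry(peer: InboundPeer, trust_observed: bool) -> str:
    """Build an id@ip:port entry.

    trust_observed=False models recording whatever the peer self-reports;
    trust_observed=True models recording the address the connection came from.
    """
    ip = peer.observed_ip if trust_observed else peer.reported_ip
    return f"{peer.node_id}@{ip}:{peer.port}"


# A cloud-hosted peer reports its private address but connects from its public one.
peer = InboundPeer("abcd1234", "10.128.0.2", "203.0.113.7", 26656)
print(address_book_entry(peer, trust_observed=False))
print(address_book_entry(peer, trust_observed=True))
```

Only the second entry is something another node on the public internet could dial.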
1720 takes the opposite approach, and suggests that if an id@ip:port is set in persistent_peers, the node should keep dialling that address, even if the peer reports back a different listen address. In other words, 1720 is saying that persistent_peers should override the address book.
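The override described in 1720 could be sketched like this in Python. The `dial_target` function and the two dictionaries are hypothetical stand-ins for the persistent_peers setting and the address book, not Tendermint’s real data structures.

```python
from typing import Dict, Optional


def dial_target(
    node_id: str,
    persistent_peers: Dict[str, str],
    address_book: Dict[str, str],
) -> Optional[str]:
    """Pick the ip:port to dial for node_id.

    An address configured in persistent_peers always wins; the address book
    (which may hold a self-reported private address) is only a fallback.
    """
    if node_id in persistent_peers:
        return persistent_peers[node_id]
    return address_book.get(node_id)


# The operator configured the public address; the peer self-reported a private one.
persistent_peers = {"abcd1234": "203.0.113.7:26656"}
address_book = {"abcd1234": "10.128.0.2:26656", "ef567890": "198.51.100.4:26656"}

print(dial_target("abcd1234", persistent_peers, address_book))
print(dial_target("ef567890", persistent_peers, address_book))
```

Note that this only helps nodes that already list the peer in persistent_peers; everyone else still sees whatever is in the address book.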
There’s an additional complication, the impact of which I’m a little unsure of. Instances running in GCP/AWS and other environments with similar setups will communicate with peers using two or more different IP addresses. Peers that communicate internally (or externally via VPN or VPC peering) will see the internal address. Peers that communicate externally over the public internet will see that same node peering from its external IP address.
I find it difficult to maintain healthy sentry nodes on GCP or AWS because of these peering issues. We are more successful using Digital Ocean, OVH, and a few other cloud providers who provision routable IP addresses. In Figment’s architecture, we would like to spread sentries across numerous platforms, and take advantage of the sophisticated services that only the large platforms offer. It seems to me that the approach suggested by 1720 is limited, in that it will allow us to establish and maintain persistent_peer relationships with these GCP/AWS nodes, but will not help those nodes establish public peering relations. If I understand correctly, the approach suggested by 873 would allow these nodes to get gossiped about, and dialled by other nodes via the PEX. My knowledge of the p2p layer’s internals is shallow, so my opinion is not strong, but I think a solution is needed.
I’m interested to know what the team’s thinking is on this, and also hoping that others will share their experience.