Let's talk about relay nodes!

The relay node is an interesting idea. I suppose it could mitigate network latency.

Let’s discuss how to implement the idea.

As I understand Zaki’s definition of a relay node, the important feature that distinguishes one from a normal sentry is that it peers only with manually selected peers:

  • Network access is controlled, either in firewalls by an IP whitelist or by other means.
  • One end of each peering relationship needs to list the other in persistent_peers, and pex = false must be set.
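
To make the second point concrete, here is a minimal sketch that renders the two settings as a fragment for the [p2p] section of Tendermint's config.toml. The helper name relay_p2p_fragment, the node IDs, and the addresses are placeholders I made up for illustration; only persistent_peers and pex come from the discussion above, and I am assuming the usual nodeID@host:port peer format.

```python
def relay_p2p_fragment(peers):
    """Render a [p2p] config fragment for a relay node (illustrative only).

    `peers` is a list of strings in the assumed "nodeID@host:port" form.
    """
    for peer in peers:
        node_id, at, addr = peer.partition("@")
        host, colon, port = addr.rpartition(":")
        # Reject anything that does not look like nodeID@host:port.
        if not (node_id and at and host and colon and port.isdigit()):
            raise ValueError(f"expected nodeID@host:port, got {peer!r}")

    lines = [
        "[p2p]",
        # Only manually selected peers are dialled.
        'persistent_peers = "' + ",".join(peers) + '"',
        # Peer exchange is off, so no addresses are gossiped or discovered.
        "pex = false",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder node IDs and private addresses, not real peers.
    print(relay_p2p_fragment([
        "a" * 40 + "@10.0.0.2:26656",
        "b" * 40 + "@10.0.0.3:26656",
    ]))
```

Only one side of each relationship needs the other in its list, so the two operators would agree out of band on who dials whom.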

I have considered four methods of controlling access, in increasing order of resilience:

  1. Firewall whitelisting. Tendermint’s port is closed by default, and opened only to a whitelist of peers’ static IP addresses. This has the disadvantage that these nodes are still communicating over the public internet, and are no less vulnerable to DDoS attack than any other sentry. There is security by obscurity, since the public IP addresses should not be gossiped, but once an IP address is discovered the node becomes vulnerable, and the address is problematic to change since out-of-band co-ordination with peers will be required.
  2. VPN connectivity. VPN connectivity can be established between relay nodes. This can be network-to-network IPsec, which is supported by major cloud platforms and most firewalls, host-to-host WireGuard, or any technology stack mutually supported by the two relay node operators. This still uses the public internet, but in this case gaiad does not need a public IP address, so discovering IP addresses to target with DDoS is harder.
  3. VPC peering. For peers that are hosted on the same cloud platform, and where the feature exists, VPC peering can be established. This is possible within GCP and within AWS; I am not familiar with other platforms. The advantage over 1 and 2 is that there is no exposure to the public internet. The disadvantage is that both relay nodes need to be hosted on the same cloud platform.
  4. Private link. Private connectivity can be established between validator operators. SDNs like MegaPort could be used at reasonable cost. I think this is likely not something anyone wants to get involved with in testnets, but it becomes a reasonable option on mainnet.

I don’t have a solid conclusion, but I think the ideal topology is a mix of all of these arrangements, with each relay peering arrangement made directly between operators, using whatever topology makes the most sense for each relationship.
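
As a rough illustration of picking a topology per relationship, the sketch below takes two operators and returns the most resilient of the four options that both can support. Everything in it is an assumption for the sake of the example: the Operator fields, the feasibility rules (for instance, that whitelisting needs static IPs on both sides), and the set of clouds with VPC peering, which only reflects the two platforms mentioned above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Link(IntEnum):
    # Numbered as in the list above: a higher value means more resilient.
    FIREWALL_WHITELIST = 1
    VPN = 2
    VPC_PEERING = 3
    PRIVATE_LINK = 4


@dataclass(frozen=True)
class Operator:
    # Hypothetical description of what one relay operator can offer.
    name: str
    cloud: Optional[str]     # e.g. "gcp" or "aws"; None if self-hosted
    has_static_ip: bool
    supports_vpn: bool
    has_private_link: bool


# Assumption: only the platforms named in the post are listed here.
VPC_PEERING_CLOUDS = {"gcp", "aws"}


def feasible_links(a: Operator, b: Operator) -> set:
    """Return every option that both operators could set up together."""
    links = set()
    if a.has_static_ip and b.has_static_ip:
        links.add(Link.FIREWALL_WHITELIST)
    if a.supports_vpn and b.supports_vpn:
        links.add(Link.VPN)
    # VPC peering requires both nodes on the same supporting platform.
    if a.cloud is not None and a.cloud == b.cloud and a.cloud in VPC_PEERING_CLOUDS:
        links.add(Link.VPC_PEERING)
    if a.has_private_link and b.has_private_link:
        links.add(Link.PRIVATE_LINK)
    return links


def best_link(a: Operator, b: Operator) -> Optional[Link]:
    """Pick the most resilient feasible option, or None if there is none."""
    links = feasible_links(a, b)
    return max(links) if links else None
```

In practice cost and operational effort would weigh in as well, so this is only a starting point for the conversation between two operators.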

I’d love to hear feedback, and what others are thinking about this.

Your post makes the clearest and most practical points about relay nodes. We don’t need an exact definition of a relay node, because the term has a slightly different meaning outside the Cosmos Gaia world (https://en.m.wikipedia.org/wiki/Relay_net). It is just a more secure and faster sentry node, and there can be several approaches to building one.

One question I have is in which cases there is a risk of a relay node’s IP being discovered. Could someone listen on a public sentry they own, inspect the packets, and then attack the origin of the packets sent by its peer?

My other question is which of relay node options 1 to 4 can be built without any public IP. I think that would be the most secure option.

The most interesting thing from now on will be the performance test results for each relay node option. Each validator can then choose the best fit for his or her “security/performance risk appetite”. I hope this discussion continues with many test results!

Using WireGuard links between relay node peers would be an extremely strong form of resilient peering.

Basically @mattharrop wrote down exactly what @iqlusion and I have been thinking.
