Sentry Node Architecture Overview

That’s an interesting idea. I’m also thinking the private sentry/relay nodes should not always connect to the same set of sentry nodes in their persistent peers. Since the relay nodes won’t gossip and rely on the public sentry nodes to connect to the network, if the small number of public sentry nodes disconnect, the validator node can’t stay synced or push votes.

It would be interesting if the relay nodes switched to different known-healthy sentry nodes from time to time. The list of sentry nodes should be managed by the validators themselves.
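A minimal sketch of that rotation in bash. It assumes a hypothetical validator-managed file with one node_id@host:port entry per line, and that each sentry answers RPC /status on port 26657 (an assumption; many setups do not expose it). The helper names is_healthy and pick_sentries are illustrative, not part of gaiad.

```bash
#!/usr/bin/env bash
# Sketch: rotate a relay node across a validator-managed list of sentries.
# The list format ("node_id@host:port" per line) and the RPC port used by
# is_healthy are assumptions for illustration.
set -euo pipefail

# is_healthy ENTRY: succeed if that sentry's /status reports catching_up == false.
is_healthy() {
    local host=${1#*@}   # drop "node_id@"
    host=${host%:*}      # drop ":p2p_port"
    curl -s --max-time 5 "http://${host}:26657/status" \
        | jq -e '.result.sync_info.catching_up == false' >/dev/null
}

# pick_sentries FILE N: print up to N distinct random entries, comma-separated,
# ready to use as a persistent_peers value.
pick_sentries() {
    local list_file=$1 count=$2
    # Skip blank and comment lines, shuffle, keep the first N, join with commas.
    grep -v -e '^[[:space:]]*$' -e '^#' "$list_file" \
        | shuf -n "$count" \
        | paste -sd, -
}

# Usage (hypothetical file names):
#   while read -r s; do is_healthy "$s" && echo "$s"; done < sentries.txt > healthy.txt
#   pick_sentries healthy.txt 3
```

Run periodically, shuf draws a fresh subset each time, so the relay does not stay tied to one fixed set of sentries; the result could be written into the relay's persistent_peers or dialed through /dial_peers.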

This is exactly what I experienced in 7001. Sync was slow. Even though all my connected sentries were healthy and fully up to date, the validator node was always out of sync, even when catching_up was false. It had to wait for every hop in the chain:

public network > sentry > relay > validator

The validator node had to wait for the relay to sync, and the relay had to wait for the sentry to sync, so the validator node kept missing votes. If we need the validator node to be highly available (HA), the performance and availability of the front-facing façade are also very important.

Currently, a single-core instance is enough for the sentries, since their main job is keeping in sync. Public sentries need more memory, because memory usage grows as they connect to more peers. Relay nodes use less memory than public sentries, as they only connect to a limited number of persistent peers. The validator node requires at least a 2-core instance with about as much memory as the private sentries. Memory depends largely on the number of connected peers, while the validator node needs more CPU cores to stay in sync while signing votes.

Where/how do we find the “ID” for a node on which gaiad isn’t running yet?

“gaiacli status” won’t work if gaiad isn’t running. I don’t want to run gaiad first, because then the validator would be visible. Ideally, there’s a way to find “ID” on a node where gaiad isn’t running.

I think @kwunyeung might have pointed me to this earlier in Riot…

You can run gaiad tendermint show_node_id. It reads the node_key.json to generate the node ID.
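If even the gaiad binary is unavailable on that machine, the ID can be cross-checked with standard tools. A sketch, assuming an Ed25519 node key (its 64-byte private key value is the 32-byte seed followed by the 32-byte public key), GNU coreutils, and the derivation used by recent Tendermint versions: the node ID is the hex-encoded first 20 bytes of SHA-256 of the public key. The function name is hypothetical; compare its result with gaiad tendermint show_node_id before relying on it.

```bash
#!/usr/bin/env bash
# Sketch: compute a node ID from node_key.json without starting gaiad.
# Assumes an Ed25519 node key: the base64 "value" decodes to 64 bytes,
# the 32-byte seed followed by the 32-byte public key.
set -euo pipefail

node_id_from_key() {
    local key_file=$1
    # Extract the base64 private key (sed is used here to avoid needing jq).
    sed -n 's/.*"value"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$key_file" \
        | base64 -d \
        | tail -c 32 \
        | sha256sum \
        | cut -c1-40   # node ID = hex of the first 20 bytes of SHA-256(pubkey)
}

# Usage (path assumes the default gaiad home of that era):
#   node_id_from_key ~/.gaiad/config/node_key.json
```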

For (yes, shameless plug) https://validator.network, we have developed a small script to enable sentry discovery.

The script serves two purposes:

  • Avoids the hazard of requiring sentries to “dial in” to the validator(s); instead, the validator discovers sentries and only establishes outbound connections.
  • Enables “local peer” discovery between sentries

Prerequisites:

  1. Requires unsafe RPC (unsafe = true in config.toml)
  2. As a consequence, the RPC should be proxied by nginx or similar to ensure only /status is exposed
  3. Sentry RPC must be behind a load balancer which will distribute traffic among instances (round robin)
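Prerequisite 1 comes down to one config.toml edit. A sketch that flips the [rpc] unsafe flag and then checks that the RPC listener is still the localhost default (tcp://127.0.0.1:26657), so only the local proxy can reach the unsafe routes. It assumes GNU sed (-i without a suffix), the default config.toml layout written by gaiad, and a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: enable the unsafe RPC routes (needed for /dial_peers) and check that
# the RPC listener is still bound to localhost, where only nginx can reach it.
set -euo pipefail

enable_unsafe_rpc() {
    local config=$1
    # In the default template, "unsafe" appears once, in the [rpc] section.
    sed -i 's/^unsafe = false/unsafe = true/' "$config"
    # Fail if [rpc] laddr is not the localhost default (the [p2p] laddr differs).
    if ! grep -q '^laddr = "tcp://127.0.0.1:26657"' "$config"; then
        echo "RPC laddr is not tcp://127.0.0.1:26657; review [rpc] before exposing it" >&2
        return 1
    fi
}

# Usage (default gaiad home assumed): enable_unsafe_rpc ~/.gaiad/config/config.toml
```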

So the basic idea is that anyone (be it a 3rd party, sentry or validator) requesting /status via the load balancer will receive the status of a random sentry, containing its node ID, IP and port. Do this enough times, periodically, and one will eventually learn about all sentry nodes.

A local node (sentry or validator) can then feed this information into the local gaiad instance using the /dial_peers RPC. Like so:

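```bash
#!/bin/bash -e
# Poll the load-balanced sentry /status endpoint once a minute and ask the
# local node (via the unsafe /dial_peers RPC) to dial whichever sentry answered.
while true
do
    # Skip this round instead of exiting if the status request fails.
    STATUS=$(curl -sf https://gaia.validator.network/status) || { sleep 60; continue; }
    # Build "node_id@listen_addr" from the returned node_info.
    PEER=$(echo "${STATUS}" | jq -r '.result.node_info.id + "@" + .result.node_info.listen_addr')

    echo "Dialing ${PEER}"
    curl -s -G "http://localhost:26657/dial_peers?persistent=false" \
        --data-urlencode "peers=[\"${PEER}\"]" || true

    sleep 60
done
```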

Example nginx configuration to expose a safe RPC subset (only /status) on port 80:

```nginx
server {
    listen 80 default_server;
    listen [::]:80 default_server;

    location / {
        deny all;
    }

    location /status {
        proxy_pass http://127.0.0.1:26657/status;
    }
}
```
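Once nginx is running, the restriction can be checked from outside. A sketch with hypothetical helper names; it relies on nginx answering 403 for locations covered by deny all, and on curl reporting 000 as the status code when no HTTP response arrives.

```bash
#!/usr/bin/env bash
# Sketch: check from outside that the proxy serves /status and denies other routes.
set -euo pipefail

# expect_status URL CODE: succeed if the HTTP status code of URL equals CODE.
expect_status() {
    local code
    # curl prints 000 as http_code when no HTTP response was received.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || true
    [ "$code" = "$2" ]
}

# check_proxy HOST: /status must return 200; "deny all" makes nginx answer 403.
check_proxy() {
    local host=$1
    expect_status "http://${host}/status" 200 &&
        expect_status "http://${host}/dial_peers" 403 &&
        expect_status "http://${host}/net_info" 403
}

# Usage (SENTRY_HOST is a placeholder): check_proxy "$SENTRY_HOST" && echo "proxy OK"
```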

HTH,
Martin

Hello everyone.

I posted this to the Cosmos Discord too, but in the interest of time and greater exposure, I am posting it here as well. I want to get this right and need expert feedback on sentry architecture. I promise to write a Medium post on this once I am done :).

Here is what I’ve designed as my sentry-validator architecture, based on numerous posts I have seen. I am a bit puzzled why nobody has suggested a VPN for both sentries and validators as I describe here (unless the VPN costs would be prohibitive).

I would greatly appreciate any input, as I am in the process of automating this. ONE big note: I am placing the validators in the cloud, not in a data center with dedicated hardware. Please give me your most critical feedback. Also, I don’t see any specific “sentry” P2P option, so I assume a node becomes a sentry through its settings and context.

  1. We need a VPN on which all the sentries and validators have an IP address. The VPN addresses are not reachable from the public Internet.
  2. Each sentry is assumed to be PAIRED with one or more validators.
  3. Validators: have ONLY a single interface, which is their interface to the VPN.
  4. Sentries: DOUBLY homed (two interfaces), with one interface on the VPN and the other on the public Internet. The Internet-facing interface is how clients (CLI, REST server, etc.) communicate with the chain.
  5. Validators: pex = false in config.toml
  6. Sentries: pex = true in config.toml
  7. Validators: addr_book_strict = false in config.toml
  8. Sentries: addr_book_strict = false in config.toml
  9. Validators: persistent_peers = list of nodeid@IP:port of all sentries
  10. Sentries: persistent_peers = nodeid@IP:port of paired validators + (optionally) nodeid@IP:port of some or all the other sentries
  11. Sentries: private_peer_ids = nodeid of paired validators (is this even necessary if the IP of the validators is private?)
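The per-role [p2p] settings in items 5 to 11 can be collected into a small generator. This is a sketch only: pex, addr_book_strict, persistent_peers and private_peer_ids are real config.toml options, but render_p2p is hypothetical, and because config.toml already has a [p2p] table, the output must be merged into it rather than appended (TOML does not allow the same table twice). IDs and addresses are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: print the [p2p] settings from the list above for one node.
# Values must be merged into the existing [p2p] table of config.toml.
set -euo pipefail

# render_p2p ROLE PERSISTENT_PEERS [PRIVATE_PEER_IDS]
#   ROLE: validator | sentry; lists are comma-separated.
render_p2p() {
    local role=$1 peers=$2 private_ids=${3:-}
    case "$role" in
        validator)
            echo "[p2p]"
            echo 'pex = false'                            # item 5
            echo 'addr_book_strict = false'               # item 7
            echo "persistent_peers = \"${peers}\""        # item 9: all sentries
            ;;
        sentry)
            echo "[p2p]"
            echo 'pex = true'                             # item 6
            echo 'addr_book_strict = false'               # item 8
            echo "persistent_peers = \"${peers}\""        # item 10
            echo "private_peer_ids = \"${private_ids}\""  # item 11: validator IDs
            ;;
        *)
            echo "unknown role: ${role}" >&2
            return 1
            ;;
    esac
}

# Usage (IDs and VPN addresses are placeholders):
#   render_p2p sentry "VALIDATOR_ID@10.8.0.2:26656" "VALIDATOR_ID"
```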

Overall, I am trying to simplify this while keeping security paramount and minimizing bandwidth. It occurs to me that the following (proposed) cardinal rules could apply:

a. The validators’ AND sentries’ persistent_peers are ALL validators and ALL sentries (literally the same value for both).
b. The sentries’ private_peer_ids are ALL validators.

Thanks in advance.

Connecting the sentries and validator node via VPN has been mentioned many times in different posts. It was also mentioned in the first post from @jack in this thread. Sentry, relay and validator are just what we call the nodes depending on how you set them up in the infrastructure, just like a proxy and a load balancer. It’s more about the functionality and how you connect the nodes. If you are looking to set up a VPN between nodes, you may consider WireGuard or Tinc.
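For WireGuard specifically, each node needs a small wg-quick configuration file. A sketch of a generator, assuming one /24 VPN subnet (10.8.0.0/24 here), WireGuard's default port 51820, and keys created beforehand with wg genkey and wg pubkey; the helper name and the PEER string format are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: render a wg-quick config (e.g. /etc/wireguard/wg0.conf) for one node
# of a private sentry/validator network. Keys, addresses and subnet are placeholders.
set -euo pipefail

# render_wg_conf OWN_VPN_IP OWN_PRIVATE_KEY PEER...
#   Each PEER is "public_key,vpn_ip,endpoint_host:port".
render_wg_conf() {
    local own_ip=$1 own_key=$2
    shift 2
    printf '[Interface]\nAddress = %s/24\nListenPort = 51820\nPrivateKey = %s\n' \
        "$own_ip" "$own_key"
    local peer pub ip endpoint
    for peer in "$@"; do
        IFS=, read -r pub ip endpoint <<< "$peer"
        # AllowedIPs /32: only this peer's VPN address is routed to it and accepted from it.
        printf '\n[Peer]\nPublicKey = %s\nAllowedIPs = %s/32\nEndpoint = %s\n' \
            "$pub" "$ip" "$endpoint"
    done
}

# Keys: wg genkey | tee privatekey | wg pubkey > publickey
# Tendermint persistent_peers would then use VPN addresses, e.g. node_id@10.8.0.2:26656.
```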

Currently, dVPN node hosting is available on the Sentinel Tendermint testnet:
“Node on Docker and share unused bandwidth”
Protocols supported:

What’s the benefit of the VPN? I don’t see how it makes the setup more robust or secure.

Then how do you set up your sentry environment without a VPN?

What would you need the VPN for? The communication between the nodes is already encrypted and authenticated.

Thanks to everyone who has contributed to this thread :)

Coming to this thread three years after it started – I’m preparing for www.Regen.Network mainnet launch, and various people have referenced this thread as a good source of info. So, I’m curious if any of this info has changed – maybe there are other threads that are more appropriate for some of these questions – I will look for other threads later.

Now that various cloud / datacenter providers offer DDoS protection (for example: OVH, Vultr): Are any validators using those DDoS solutions? Are they a decent option for protecting a validator?

When this thread started in 2018, the Sentry Node Architecture (SNA) was only a concept. How has it worked in practice? Now that various networks have been in production for a while, what attacks have actually happened? Where can we find more info about real attacks, what impact they had, and how validators responded? How has SNA affected network latency?

What other solutions have been created and put into practice? The only other one I know is Polychain Labs’ multi-party computation setup, but I think that’s more focused on high availability than on DDoS protection. I also see this page, which presents similar info to this thread.

Have relay nodes become a practical solution for any validators? If so, how is it going?

This part of the table seems to still be incorrect, since the validator should not be in the sentry node’s persistent peer list, because that list gets gossiped to the rest of the network.

Has --external-ip been implemented?

Private network (VPN or DirectConnect)
What’s the current situation with using VPNs for validator internal networks, connecting validator nodes, sentry nodes, and machines that perform maintenance? Are there validators using Tinc, WireGuard, Tailscale, ZeroTier? For me, DirectConnect is not a viable option at this point – maybe someday, or maybe never. I think a dedicated physical connection makes the validator vulnerable to physical attacks (cutting the cable, or putting a spy or injection device in the physical line), while protecting from network/software attacks. A VPN can be recreated anywhere as long as the keys and settings are preserved.

Is there still no viable cloud KMS? I’m not sure what to look for. A quick search for cloud KMS yields services from Amazon, Google, Alibaba and Tencent.

Has this kind of attack happened in production on any network yet?

@mdyring Are you still using these scripts and this architecture, or have you changed to different tools?

As I understand it, the VPN is for making an internal network, so that the validator’s public IP can be set to only allow VPN connections. But I guess the public IP still exists and can be attacked, and maybe a firewall rule only allowing VPN connections is similar (in DDoS terms) to a firewall rule that only allows authenticated connections from specific IPs? I imagine the VPN makes network setup and maintenance easier.
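The two firewall approaches being compared can be written down as iptables rules. A sketch that only prints the commands for review (applying them needs root), assuming Tendermint's default P2P port 26656, a WireGuard interface named wg0 on UDP 51820, and hypothetical helper names. In both variants the public IP still receives packets, so this controls who can connect rather than absorbing a volumetric DDoS.

```bash
#!/usr/bin/env bash
# Sketch: print (not apply) iptables rules for a validator's Tendermint P2P port.
set -euo pipefail

# firewall_rules P2P_PORT SENTRY_IP...: allow only the listed sentry IPs.
firewall_rules() {
    local port=$1 ip
    shift
    for ip in "$@"; do
        echo "iptables -A INPUT -p tcp -s ${ip} --dport ${port} -j ACCEPT"
    done
    # Everything else aimed at the P2P port is dropped.
    echo "iptables -A INPUT -p tcp --dport ${port} -j DROP"
}

# vpn_firewall_rules P2P_PORT: accept the P2P port only on wg0, so the public
# interface only needs to expose WireGuard's UDP port.
vpn_firewall_rules() {
    local port=$1
    echo "iptables -A INPUT -p udp --dport 51820 -j ACCEPT"
    echo "iptables -A INPUT -i wg0 -p tcp --dport ${port} -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport ${port} -j DROP"
}

# Usage (placeholder IPs): firewall_rules 26656 198.51.100.7 198.51.100.8
```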

OK, please let me know if there are better places or ways to ask these questions. Considering that this post has become a reference document, I think it’s worth updating, or at least keeping the thread going.

Just published some partial documentation, in Spanish, on configuring this sentry node architecture (focused on the config.toml and app.toml files):

We will keep improving this documentation over time.

Amazon’s and Google’s HSM offerings are confirmed to still not support the signature algorithm that Tendermint uses (Ed25519).

There is a fork of Tendermint KMS in development focused on signing in TEEs (Trusted Execution Environments); currently, Intel(R) SGX and AWS Nitro Enclaves are supported. This would provide an alternative to the YubiHSM2 hardware HSM. See details at:

  1. GitHub - crypto-com/tmkms-light: TEE-based Key Management System for Tendermint validators.
  2. https://github.com/tomtau/tmkms/blob/feature/nitro-enclave/README.nitro.md

ZeroTier fan here.

But the last time I used it properly was gaia-5001.

I’m currently considering what it would look like if a whole chain were to adopt I2P.

There would probably be increased latency, but you would definitely gain some protection, because real-world IP addresses wouldn’t be known.

I think there is a case to be made for chains that disappear, and I2P seems to fit the bill.

Currently, my thinking on “the hot setup” is:

  • Sentries in akash
  • Blocks signed with tmkms

One thing I like very much about tmkms is that it should allow single-board computers at the edge of the network to play a bigger role in validation.

I noted some other conversations about cloud-based key management systems with some concern. I think that keys really do not belong in the cloud, even if they’re living in some kind of an enclave like nitro.

SGX shouldn’t be trusted; there have been too many incidents demonstrating total compromises.

Why can’t I create a new post?

Hello! I have also been thinking that an entire network could use I2P. Have you made progress on this research, and where could I find out more? Thank you!

Specifically into I2P, not really. However, I have looked into several other transport protocols and have tested Nebula, which grew out of the team at Slack.

The only difference between I2P and Nebula is that I2P possibly provides privacy for the actual IP address of your machine.

I’m not working on this much at the moment, but if you have any questions about building Tendermint networks, I’m very happy to discuss them with you.

Twitter.com/gadikian

Hey, one other thing I wanted to mention: I think it would be really fun to launch an I2P Cosmos testnet. Would you like to try?

I think this could reinvigorate some key security practices that I’d love to see strengthened.

The Sentry Node Architecture provides a proactive approach to DDoS mitigation for Cosmos Hub validator nodes. By leveraging distributed sentry nodes and a gossip network, the solution aims to ensure the continued operation of the validator node even under attack. Validators are urged to customize and enhance the architecture according to their security needs, recognizing their individual responsibility for the robustness of their DDoS mitigation solutions.

Is the I2P testnet still a thing?
