Sentry Node Architecture Overview


#41

That’s an interesting idea. I’m also thinking the private sentry/relay nodes should not always connect to the same set of sentry nodes listed in their persistent peers. As the relay nodes don’t gossip and rely on the public sentry nodes to connect to the network, if that small number of public sentry nodes disconnects, the validator node can’t stay synced or push votes.

It would be interesting if the relay nodes switched to different known healthy sentry nodes from time to time. The list of sentry nodes should be managed by the validators themselves.
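A minimal sketch of that rotation, assuming the validator operator keeps a plain-text file of known sentries, one id@host:port per line. The function names pick_sentries and rotate_once and the file layout are hypothetical, health checking is left out, and the dial step assumes the unsafe /dial_peers RPC is enabled on the relay's local RPC port (26657).

```bash
#!/bin/sh
# pick_sentries FILE N: print N randomly chosen, non-empty lines (peers) from FILE.
pick_sentries() {
    awk 'BEGIN { srand() } NF { print rand() "\t" $0 }' "$1" | sort -n | cut -f2- | head -n "$2"
}

# rotate_once FILE N: dial a random subset of the listed sentries through the
# local RPC. Failures are ignored so one dead sentry does not stop the rotation.
rotate_once() {
    for peer in $(pick_sentries "$1" "$2"); do
        echo "Dialing ${peer}"
        curl -s -G "http://localhost:26657/dial_peers?persistent=false" \
            --data-urlencode "peers=[\"${peer}\"]" || true
    done
}
```

Running rotate_once from cron or a sleep loop would give the "from time to time" behaviour; the relay's existing connections are not dropped by this sketch.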

This is exactly what I experienced in 7001. The sync speed was slow. Even though all my connected sentries were healthy and synced up to date, the validator node was always out of sync, even though catching up was false. It had to wait on every hop of the chain:

public network > sentry > relay > validator

The validator node had to wait until the relay was synced, and the relay waited until the sentry was synced. That made the validator node constantly miss votes. If we need the validator node to be HA, the performance and availability of the front-facing façade are also very important.
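One way to see where the delay builds up is to compare block heights hop by hop. This is only a sketch: it assumes each node's RPC is reachable from where the script runs, that /status reports .result.sync_info.latest_block_height (the field name may differ between Tendermint versions), and the addresses in the commented example are made up.

```bash
#!/bin/sh
# height_of URL: latest block height reported by a node's /status endpoint.
# Assumes the RPC response contains .result.sync_info.latest_block_height.
height_of() {
    curl -s "$1/status" | jq -r '.result.sync_info.latest_block_height'
}

# hop_lag UPSTREAM_HEIGHT DOWNSTREAM_HEIGHT: blocks the downstream node is behind.
hop_lag() {
    echo $(( $1 - $2 ))
}

# Example with hypothetical RPC addresses for each hop:
# sentry=$(height_of http://10.0.0.10:26657)
# relay=$(height_of http://10.0.1.10:26657)
# validator=$(height_of http://10.0.2.10:26657)
# echo "relay lag: $(hop_lag "$sentry" "$relay")"
# echo "validator lag: $(hop_lag "$relay" "$validator")"
```

A lag that grows at each hop would point at the chain of waits described above rather than at any single unhealthy node.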

Currently, a single-core instance is enough for the sentries, as their job is mainly to stay in sync. The public sentries need more memory, because memory usage grows with the number of peers they connect to. Relay nodes use less memory than public sentries, as they only connect to a limited number of persistent peers. The validator node requires at least a 2-core instance with a similar amount of memory as the private sentries. In short, memory depends largely on the number of connected peers, while the validator node needs more CPU cores to stay in sync while signing votes.


#42

Where/how do we find the “ID” for a node on which gaiad isn’t running yet?

“gaiacli status” won’t work if gaiad isn’t running. I don’t want to run gaiad first, because then the validator would be visible. Ideally, there’s a way to find “ID” on a node where gaiad isn’t running.

I think @kwunyeung might have pointed me to this earlier in Riot…


#43

You can run gaiad tendermint show_node_id. It reads the node_key.json to generate the node ID.
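As a rough sketch, the ID can then be combined with the node's address into the id@host:port form used for peers. The helper name peer_address and the IP are hypothetical; 26656 is assumed here as the default P2P port.

```bash
#!/bin/sh
# peer_address NODE_ID HOST [PORT]: format a peer entry as id@host:port.
# PORT defaults to 26656, the usual Tendermint P2P port.
peer_address() {
    echo "$1@$2:${3:-26656}"
}

# On the node where gaiad is not running yet (10.0.2.10 is a made-up address):
# NODE_ID=$(gaiad tendermint show_node_id)
# peer_address "$NODE_ID" 10.0.2.10
```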


#44

For (yes, shameless plug) https://validator.network, we have developed a small script to enable sentry discovery.

The script serves two purposes:

  • Avoids the hazard of requiring sentries to “dial in” to the validator(s); instead, the validator discovers sentries and only establishes outbound connections.
  • Enables “local peer” discovery between sentries

Prerequisites:

  1. Requires unsafe RPC (unsafe = true in config.toml)
  2. As a consequence, the RPC should be proxied by nginx or similar to ensure only /status is exposed
  3. Sentry RPC must be behind a load balancer which will distribute traffic among instances (round robin)

So the basic idea is that anyone (be it a 3rd party, sentry or validator) requesting /status via the load balancer will receive the status of a random sentry, containing its node ID, IP and port. Do this enough times, periodically, and one will eventually learn about all sentry nodes.
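The "do this enough times" part can be sketched as a small collector that polls repeatedly and keeps the distinct peers it has seen. The function names fetch_peer and collect_peers are hypothetical, and passing the round index to the fetch command is only there so the collector can be exercised with a stub instead of the real endpoint.

```bash
#!/bin/sh
# fetch_peer: one /status request via the load balancer, printed as id@listen_addr.
# A failed request prints nothing and is skipped.
fetch_peer() {
    status=$(curl -s https://gaia.validator.network/status) || return 0
    echo "$(echo "$status" | jq -r '.result.node_info.id')@$(echo "$status" | jq -r '.result.node_info.listen_addr')"
}

# collect_peers ROUNDS [FETCH_CMD]: call FETCH_CMD ROUNDS times and print the
# distinct peers seen. FETCH_CMD receives the round index as its only argument.
collect_peers() {
    rounds=$1
    fetch=${2:-fetch_peer}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        "$fetch" "$i"
        i=$((i + 1))
    done | sort -u
}
```

With round-robin balancing, a number of rounds at least equal to the number of sentries should in principle be enough to see each one once; in practice a few extra rounds are cheap.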

A local node (sentry or validator) can then feed this information into the local gaiad instance using the /dial_peers RPC. Like so:

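#!/bin/bash -e
# Periodically ask the load-balanced sentry endpoint for a status
# and dial the sentry it describes.
while true
do
    # Each request is answered by whichever sentry the load balancer picks.
    STATUS=$(curl -s https://gaia.validator.network/status)
    # Build the peer address as <node-id>@<listen-addr>.
    PEER="$(echo "${STATUS}" | jq -r '.result.node_info.id')@$(echo "${STATUS}" | jq -r '.result.node_info.listen_addr')"

    echo "Dialing ${PEER}"
    # Non-persistent dial; "|| true" keeps the loop alive under -e if the local RPC is down.
    curl -s -G "http://localhost:26657/dial_peers?persistent=false" --data-urlencode "peers=[\"${PEER}\"]" || true

    sleep 60
done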

Example nginx configuration exposing a safe RPC subset on port 80:

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    location / {
        deny all;
    }

    location /status {
        proxy_pass http://127.0.0.1:26657/status;
    }
}
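A quick way to sanity-check such a proxy is to confirm that /status answers while the other RPC paths are refused. The sketch below assumes nginx answers denied locations with 403; the function names and the list of probed paths are illustrative, not exhaustive.

```bash
#!/bin/sh
# http_code URL: print only the HTTP status code of a GET request.
http_code() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# check_proxy BASE_URL [CODE_CMD]: succeed only if /status answers 200 and
# every other probed path is refused with 403. CODE_CMD defaults to http_code.
check_proxy() {
    base=$1
    code_cmd=${2:-http_code}
    [ "$("$code_cmd" "$base/status")" = "200" ] || return 1
    for path in /dial_peers /dial_seeds /net_info /unsafe_flush_mempool; do
        [ "$("$code_cmd" "$base$path")" = "403" ] || return 1
    done
}

# Example (hypothetical host): check_proxy http://sentry.example.com && echo "proxy looks safe"
```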

HTH,
Martin