Performance of Tendermint in Case of Large Number of Nodes

Hi all, could anyone share some information on the following: how many nodes can participate in a Tendermint PBFT network before the transaction rate starts to drop, and how much network traffic do such networks produce? I know that the results will depend on the application, but any insights would be useful. Thanks a lot.

Hi,
Thanks for reaching out.
300

Year 0: 100
Year 1: 113
Year 2: 127
Year 3: 144
Year 4: 163
Year 5: 184
Year 6: 208
Year 7: 235
Year 8: 265
Year 9: 300
Year 10: 300

On a global scale, this will improve with a new paradigm in edge and core routing that offers exceptional scalability, carrier-class reliability, and environmentally conscious design:

  • aggregation services routers and the evolution of optical fiber across internet backbones

  • low latency in vacuum if data travels via
    optical inter-satellite links and phased arrays

  • Moore's law, Amdahl's law, and the mainstream use of quantum computing sooner or later.

More details

There is plenty of room to spare for more transaction volume per day.

Hope it helps

Hi,

thanks a lot for your answer. I would be very thankful if you can share some insights on how you tested it and what would be the procedure to perform such tests? Also some insights into network traffic would be great.

Thanks again.

Sure


Kysenpool does a fantastic job with the outpost for network stats, alongside other validators monitoring the ecosystem.

Hi again,

I just want to make sure I understood the numbers correctly. So today Tendermint can handle 300 nodes without a drop in the transaction rate, right? Could you please clarify what the years 0-10 in the above answer are? Many thanks.

Hi,

I have done some testing on Tendermint with testnets (https://github.com/informalsystems/testnets) using AWS EC2. The block creation time came out far too long (~10 mins), and the mempool is almost always full (10K), although its size does change in some time slots. I attach my config and the graphs here. I wonder why the block creation time is that long and what is going on. By the way, I have a network of 100 validators deployed on AWS t3.medium instances, and I ran a load test at 1000 tx/s for 5 mins. Thanks a lot for your time.

Configs:

# This is a TOML config file.
# For more information, see https://github.com/toml-lang/toml

##### main base config options #####

# TCP or UNIX socket address of the ABCI application,
# or the name of an ABCI application compiled in with the Tendermint binary
proxy_app = "kvstore"

# A custom human readable name for this node
moniker = ""

# If this node is many blocks behind the tip of the chain, FastSync
# allows them to catchup quickly by downloading blocks in parallel
# and verifying their commits
fast_sync = true

# Database backend: goleveldb | cleveldb | boltdb
# * goleveldb (github.com/syndtr/goleveldb - most popular implementation)
#   - pure go
#   - stable
# * cleveldb (uses levigo wrapper)
#   - fast
#   - requires gcc
#   - use cleveldb build tag (go build -tags cleveldb)
# * boltdb (uses etcd's fork of bolt - github.com/etcd-io/bbolt)
#   - EXPERIMENTAL
#   - may be faster in some use-cases (random reads - indexer)
#   - use boltdb build tag (go build -tags boltdb)
db_backend = "goleveldb"

# Database directory
db_dir = "data"

# Output level for logging, including package level options
log_level = "main:info,state:info,*:error"

# Output format: 'plain' (colored text) or 'json'
log_format = "plain"

##### additional base config options #####

# Path to the JSON file containing the initial validator set and other meta data
genesis_file = "config/genesis.json"

# Path to the JSON file containing the private key to use as a validator in the consensus protocol
priv_validator_key_file = "config/priv_validator_key.json"

# Path to the JSON file containing the last sign state of a validator
priv_validator_state_file = "data/priv_validator_state.json"

# TCP or UNIX socket address for Tendermint to listen on for
# connections from an external PrivValidator process
priv_validator_laddr = ""

# Path to the JSON file containing the private key to use for node authentication in the p2p protocol
node_key_file = "config/node_key.json"

# Mechanism to connect to the ABCI application: socket | grpc
abci = "socket"

# TCP or UNIX socket address for the profiling server to listen on
prof_laddr = ""

# If true, query the ABCI app on connecting to a new peer
# so the app can decide if we should keep the connection or not
filter_peers = false

##### advanced configuration options #####

##### rpc server configuration options #####
[rpc]

# TCP or UNIX socket address for the RPC server to listen on
laddr = "tcp://0.0.0.0:26657"

# A list of origins a cross-domain request can be executed from
# Default value '[]' disables cors support
# Use '["*"]' to allow any origin
cors_allowed_origins = []

# A list of methods the client is allowed to use with cross-domain requests
cors_allowed_methods = ["HEAD", "GET", "POST", ]

# A list of non simple headers the client is allowed to use with cross-domain requests
cors_allowed_headers = ["Origin", "Accept", "Content-Type", "X-Requested-With", "X-Server-Time", ]

# TCP or UNIX socket address for the gRPC server to listen on
# NOTE: This server only supports /broadcast_tx_commit
grpc_laddr = ""

# Maximum number of simultaneous connections.
# Does not include RPC (HTTP&WebSocket) connections. See max_open_connections
# If you want to accept a larger number than the default, make sure
# you increase your OS limits.
# 0 - unlimited.
# Should be < {ulimit -Sn} - {MaxNumInboundPeers} - {MaxNumOutboundPeers} - {N of wal, db and other open files}
# 1024 - 40 - 10 - 50 = 924 = ~900
grpc_max_open_connections = 900

# Activate unsafe RPC commands like /dial_seeds and /unsafe_flush_mempool
unsafe = false

# Maximum number of simultaneous connections (including WebSocket).
# Does not include gRPC connections. See grpc_max_open_connections
# If you want to accept a larger number than the default, make sure
# you increase your OS limits.
# 0 - unlimited.
# Should be < {ulimit -Sn} - {MaxNumInboundPeers} - {MaxNumOutboundPeers} - {N of wal, db and other open files}
# 1024 - 40 - 10 - 50 = 924 = ~900
max_open_connections = 900

# Maximum number of unique clientIDs that can /subscribe
# If you're using /broadcast_tx_commit, set to the estimated maximum number
# of broadcast_tx_commit calls per block.
max_subscription_clients = 100

# Maximum number of unique queries a given client can /subscribe to
# If you're using GRPC (or Local RPC client) and /broadcast_tx_commit, set to
# the estimated # maximum number of broadcast_tx_commit calls per block.
max_subscriptions_per_client = 5

# How long to wait for a tx to be committed during /broadcast_tx_commit.
# WARNING: Using a value larger than 10s will result in increasing the
# global HTTP write timeout, which applies to all connections and endpoints.
# See https://github.com/tendermint/tendermint/issues/3435
timeout_broadcast_tx_commit = "10s"

# The name of a file containing certificate that is used to create the HTTPS server.
# If the certificate is signed by a certificate authority,
# the certFile should be the concatenation of the server's certificate, any intermediates,
# and the CA's certificate.
# NOTE: both tls_cert_file and tls_key_file must be present for Tendermint to create HTTPS server. Otherwise, HTTP server is run.
tls_cert_file = ""

# The name of a file containing matching private key that is used to create the HTTPS server.
# NOTE: both tls_cert_file and tls_key_file must be present for Tendermint to create HTTPS server. Otherwise, HTTP server is run.
tls_key_file = ""

##### peer to peer configuration options #####
[p2p]

# Address to listen for incoming connections
laddr = "tcp://0.0.0.0:26656"

# Address to advertise to peers for them to dial
# If empty, will use the same port as the laddr,
# and will introspect on the listener or use UPnP
# to figure out the address.
external_address = ""

# Comma separated list of seed nodes to connect to
seeds = ""

# Comma separated list of nodes to keep persistent connections to
persistent_peers = ""

# UPNP port forwarding
upnp = false

# Path to address book
addr_book_file = "config/addrbook.json"

# Set true for strict address routability rules
# Set false for private or local networks
addr_book_strict = false

# Maximum number of inbound peers
max_num_inbound_peers = 200

# Maximum number of outbound peers to connect to, excluding persistent peers
max_num_outbound_peers = 200

# Time to wait before flushing messages out on the connection
flush_throttle_timeout = "10ms"

# Maximum size of a message packet payload, in bytes
max_packet_msg_payload_size = 10240

# Rate at which packets can be sent, in bytes/second
send_rate = 20000000

# Rate at which packets can be received, in bytes/second
recv_rate = 20000000

# Set true to enable the peer-exchange reactor
pex = true

# Seed mode, in which node constantly crawls the network and looks for
# peers. If another node asks it for addresses, it responds and disconnects.
#
# Does not work if the peer-exchange reactor is disabled.
seed_mode = false

# Comma separated list of peer IDs to keep private (will not be gossiped to other peers)
private_peer_ids = ""

# Toggle to disable guard against peers connecting from the same ip.
allow_duplicate_ip = false

# Peer connection configuration.
handshake_timeout = "20s"
dial_timeout = "3s"

##### mempool configuration options #####
[mempool]

recheck = false
broadcast = true
wal_dir = ""

# Maximum number of transactions in the mempool
size = 10000

# Limit the total size of all txs in the mempool.
# This only accounts for raw transactions (e.g. given 1MB transactions and
# max_txs_bytes=5MB, mempool will only accept 5 transactions).
max_txs_bytes = 1073741824

# Size of the cache (used to filter transactions we saw earlier) in transactions
cache_size = 10000

##### consensus configuration options #####
[consensus]

wal_file = "data/cs.wal/wal"

timeout_propose = "500ms"
timeout_propose_delta = "100ms"
timeout_prevote = "400ms"
timeout_prevote_delta = "100ms"
timeout_precommit = "400ms"
timeout_precommit_delta = "100ms"
timeout_commit = "400ms"

# Make progress as soon as we have all the precommits (as if TimeoutCommit = 0)
skip_timeout_commit = true

# EmptyBlocks mode and possible interval between empty blocks
create_empty_blocks = false
create_empty_blocks_interval = "0s"

# Reactor sleep duration parameters
peer_gossip_sleep_duration = "10ms"
peer_query_maj23_sleep_duration = "2s"

##### transactions indexer configuration options #####
[tx_index]

# What indexer to use for transactions
#
# Options:
#   1) "null"
#   2) "kv" (default) - the simplest possible indexer, backed by key-value storage (defaults to levelDB; see DBBackend).
indexer = "kv"

# Comma-separated list of tags to index (by default the only tag is "tx.hash")
#
# You can also index transactions by height by adding "tx.height" tag here.
#
# It's recommended to index only a subset of tags due to possible memory
# bloat. This, of course, depends on the indexer's DB and the volume of
# transactions.
index_tags = ""

# When set to true, tells indexer to index all tags (predefined tags:
# "tx.hash", "tx.height" and all tags from DeliverTx responses).
#
# Note this may be not desirable (see the comment above). IndexTags has a
# precedence over IndexAllTags (i.e. when given both, IndexTags will be
# indexed).
index_all_tags = false

##### instrumentation configuration options #####
[instrumentation]

# When true, Prometheus metrics are served under /metrics on
# PrometheusListenAddr.
# Check out the documentation for the list of available metrics.
prometheus = true

# Address to listen for Prometheus collector(s) connections
prometheus_listen_addr = ":26660"

# Maximum number of simultaneous connections.
# If you want to accept a larger number than the default, make sure
# you increase your OS limits.
# 0 - unlimited.
max_open_connections = 5

# Instrumentation namespace
namespace = "tendermint"

I think that, with so much gossip overhead among 100 nodes, t3.medium instances might not be able to handle 1000 tx/s within several seconds.

In practice, validator nodes usually have far more resources than a t3.medium, and each usually has
only a few highly trusted internal peers to minimize gossip overhead.

So I expect those are the reasons. Judging from the 10 min (= 600 s) figure, I expect that only 1~2 nodes succeed in creating a block within several seconds, and I guess the rest of the attempts fail for lack of time.

Therefore, I suggest you limit

  • max_num_inbound_peers = 10
  • max_num_outbound_peers = 10

and try the test again.
With this config, you might need careful peering control to keep all 100 nodes connected indirectly. (One strategy is to create 5 fully connected clusters and then connect the 5 clusters to each other.)

I think I have tried with:

  • max_num_inbound_peers = 40
  • max_num_outbound_peers = 30

but the problem I had was that the 100 validators somehow did not get connected at all.

From my experience: I made a small script that writes each config.toml with predefined persistent_peers.

If your nodes are i = 1 to 100, then
persistent_peers_of_node_i = node_(i+k), node_(i+10*k) (where k = 1 to 9)
Of course, you subtract 100 if (i+k) or (i+10*k) exceeds 100.

Then all nodes will be well connected with fewer peers (18 each) :)
Theoretically, the maximum distance between any two nodes is 2 hops.
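
The peering rule above can be checked with a short script. The following is a minimal Python sketch, not the script used in the post: it numbers nodes 1 to 100, builds each node's peer list from the i+k and i+10*k rule, and uses a breadth-first search to measure the worst-case hop count. The function names are illustrative, and real persistent_peers entries would also need each node's ID and address, which are omitted here.

```python
from collections import deque

N = 100  # number of nodes, numbered 1..N as in the post


def wrap(i, n=N):
    """Map any integer onto the node range 1..n (subtract n on overflow)."""
    return (i - 1) % n + 1


def persistent_peers_of(i, n=N):
    """Peers of node i: i+k and i+10*k for k = 1..9, wrapped into 1..n."""
    peers = []
    for k in range(1, 10):
        peers.append(wrap(i + k, n))
        peers.append(wrap(i + 10 * k, n))
    return sorted(set(peers))


def max_hops(dial_map):
    """Largest shortest-path length over all ordered node pairs (BFS).

    Links are treated as one-directional (dialer -> dialed), which is the
    pessimistic case. Returns None if some node cannot be reached at all.
    """
    worst = 0
    for src in dial_map:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in dial_map[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        if len(dist) != len(dial_map):
            return None
        worst = max(worst, max(dist.values()))
    return worst


topology = {i: persistent_peers_of(i) for i in range(1, N + 1)}
print(len(topology[1]), max_hops(topology))  # prints: 18 2
```

Because max_hops accepts any dial map, a different peering pattern can be checked the same way before it is deployed.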

However, Tendermint p2p is not very efficient at peering at first,
due to exponential backoff and odd dialing behavior.
This ADR (https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-050-improved-trusted-peering.md) will give you more stable connections among trusted peers.

Thanks a lot. So you basically suggest using this tree: https://github.com/tendermint/tendermint/commit/701e9cac4d91474bfc04082e56a583bf77524f18 and defining 2 new parameters in the config.toml file: unconditional_peer_ids and persistent_peers_max_dial_period. Right? By the way, these unconditional_peer_ids can be specified from the config/addrbook.json file, but that file is empty. I use testnets: https://github.com/informalsystems/testnets
So basically I can add about 15-20 validator addresses to the unconditional_peer_ids field, or what would you suggest? Also, what value would you suggest for the second parameter, persistent_peers_max_dial_period? Many thanks in advance!

It is included from Tendermint v0.33 onward, so you can use v0.33 or any later version.

Those 2 new parameters do not go in genesis.json but in the config.toml of each node.
For unconditional_peer_ids, you should put the "node id", not the "validator key".
The node id can be seen by running "gaiad tendermint show-node-id" on each node.
It is available only "after" gaiad has been initialized.

For persistent_peers_max_dial_period, 60 seconds ("60s") is reasonable, I guess.
It means every disconnected persistent peer is redialed about once a minute, which will not be any burden.

I have tried this for getting the node id:

[ec2-user@ec2-3-81-200-67 gaia]$ gaiacli status -n http://ec2-3-84-78-212.compute-1.amazonaws.com:26657
{"node_info":{"protocol_version":{"p2p":"7","block":"10","app":"0"},"id":"31aaac3e4f7ecd2dfeee16089ba2dcd2c9824e8b","listen_addr":"tcp://0.0.0.0:26656","network":"testnet_abcd","version":"0.31.7","channels":"4020212223303800","moniker":"ec2-3-84-78-212.compute-1.amazonaws.com","other":{"tx_index":"off","rpc_address":"tcp://0.0.0.0:26657"}},"sync_info":{"latest_block_hash":"ABEEF5CBEB169334B754C20CF436BA38776C73FD6E628C64A96C0BF4D297DE9F","latest_app_hash":"","latest_block_height":"1","latest_block_time":"2020-04-17T09:59:44.415854Z","catching_up":false},"validator_info":{"address":"B221B42E63FD825E711B115EDD8B36E99BA939A1","pub_key":{"type":"tendermint/PubKeyEd25519","value":"94wpwymSS0Ht56I2o206Kyc7CdagEiQTkPqu8NT7cS4="},"voting_power":"1000"}}

The id should be:

"id":"31aaac3e4f7ecd2dfeee16089ba2dcd2c9824e8b"

Right? Many thanks again!
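
For collecting this value from many nodes, the field can be read programmatically rather than by eye. The following is a minimal Python sketch that assumes the status output has been saved as plain JSON with straight quotes. The sample string is a trimmed copy of the output above, and node_id_from_status is an illustrative helper name, not part of any Tendermint or Gaia tool.

```python
import json

# Trimmed copy of the status output shown above (straight quotes only).
status_json = """
{"node_info": {"id": "31aaac3e4f7ecd2dfeee16089ba2dcd2c9824e8b",
               "listen_addr": "tcp://0.0.0.0:26656",
               "version": "0.31.7"},
 "validator_info": {"address": "B221B42E63FD825E711B115EDD8B36E99BA939A1"}}
"""


def node_id_from_status(raw):
    """Return node_info.id from a status JSON document."""
    return json.loads(raw)["node_info"]["id"]


node_id = node_id_from_status(status_json)
print(node_id)
```

Note that node_info.id and validator_info.address are different fields; the sketch reads only the former, which is the one the previous reply calls the "node id".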

Hi again,

I have tried to test with tendermint v0.33.3-13eff7f7 as you suggested and my config file is the following:

proxy_app = "noop"
moniker = "ec2-3-81-200-67.compute-1.amazonaws.com"
fast_sync = false
db_backend = "memdb"
db_dir = "data"
log_level = "*:error"
log_format = "plain"
genesis_file = "config/genesis.json"
priv_validator_key_file = "config/priv_validator_key.json"
priv_validator_state_file = "data/priv_validator_state.json"
priv_validator_laddr = ""
node_key_file = "config/node_key.json"
abci = "socket"
prof_laddr = ""
filter_peers = false

[rpc]
laddr = "tcp://0.0.0.0:26657"
cors_allowed_origins = []
cors_allowed_methods = [ "HEAD", "GET", "POST",]
cors_allowed_headers = [ "Origin", "Accept", "Content-Type", "X-Requested-With", "X-Server-Time",]
grpc_laddr = ""
grpc_max_open_connections = 900
unsafe = false
max_open_connections = 900
max_subscription_clients = 100
max_subscriptions_per_client = 5
timeout_broadcast_tx_commit = "10s"
max_body_bytes = 1000000
max_header_bytes = 1048576
tls_cert_file = ""
tls_key_file = ""

[p2p]
unconditional_peer_ids = "b40cc7aa377ab45889b742c167f9125eae148481,415fb1af49aaa7aaebecfe4fe7e238705202f01d,1574e695bd87248c546ca4a5324a0dde3d8778f0,6211a2e98bd37a5b3c30b26a00545fa906284e5e,72c76e3301b385dd10c9fc134252d805bf84160e,741ef03088876cd1b2f027bd722f670e3a54be26,682cd3ee433d6a5768383f39e6462d8fffc1eb6f,62f58f1fc186f69b04d80a0964529d0ba95a1aaf,573e67d2eb1896fb2d26e4ec7771d704ffa2ca1b,2dced2728c615d6b9706907eed7d72aa4d4b0ed5"
persistent_peers_max_dial_period = "60s"
laddr = "tcp://0.0.0.0:26656"
external_address = ""
seeds = "1e91a1dcab23c6e9b4a678a0b5eb47f4f9eb7f55@ec2-52-90-202-100.compute-1.amazonaws.com:26656"
persistent_peers = ""
upnp = false
addr_book_file = "config/addrbook.json"
addr_book_strict = false
max_num_inbound_peers = 10
max_num_outbound_peers = 10
flush_throttle_timeout = "10ms"
max_packet_msg_payload_size = 204800
send_rate = 51200000
recv_rate = 51200000
pex = true
seed_mode = false
private_peer_ids = ""
allow_duplicate_ip = true
handshake_timeout = "20s"
dial_timeout = "3s"

[mempool]
recheck = false
broadcast = false
wal_dir = ""
size = 5000
max_txs_bytes = 1073741824
cache_size = 5000
max_tx_bytes = 1048576

[fastsync]
version = "v0"

[consensus]
wal_file = "data/cs.wal/wal"
timeout_propose = "2s"
timeout_propose_delta = "200ms"
timeout_prevote = "2s"
timeout_prevote_delta = "200ms"
timeout_precommit = "2s"
timeout_precommit_delta = "200ms"
timeout_commit = "5ms"
skip_timeout_commit = true
create_empty_blocks = false
create_empty_blocks_interval = "0s"
peer_gossip_sleep_duration = "10ms"
peer_query_maj23_sleep_duration = "2s"

[tx_index]
indexer = "null"
index_keys = ""
index_all_keys = false

[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"
max_open_connections = 5
namespace = "tendermint"

So for each node I have specified 10 connections in unconditional_peer_ids, in the form: node 1 -> 10,20,30,…,100; node 2 -> 11,21,31,…,1; node 3 -> 12,22,32,…,2; … node 100 -> 9,19,29,…,99. So each node should be able to reach any other node in at most 2 hops. I had to create some scripts that add unconditional_peer_ids to the node configs after the nodes are started and then restart each node. When I check the tendermint_p2p_peers metric in InfluxDB, it shows that most of the nodes have no connection or only 1 connection; only node0 has 10 or 1 (it changes). This is the graph of the tendermint_p2p_peers metric:


Do you have any idea why that is? How can I specify unconditional_peer_ids in my config.toml file in advance (before the nodes are started)? If I use gaiacli, the nodes must be running, but testnets creates and starts the nodes automatically from a predefined config.toml template. Many thanks in advance!

Hi,

I have managed to run v0.33 with my setup. However, if only the unconditional_peer_ids field is defined, without persistent_peers, the connections to the unconditional_peer_ids are not established for some reason. Also, I set max_num_inbound_peers and max_num_outbound_peers to 10, so the nodes made additional connections besides the 10 connections I specified in unconditional_peer_ids (i.e. in persistent_peers), so that the maximum number of connections per node was 30 = 10 (persistent_peers or unconditional peers) + 10 inbound connections + 10 outbound connections. I also tried setting max_num_inbound_peers and max_num_outbound_peers to 0, so that each node only connects to the persistent_peers, i.e. the unconditional peers, but that did not work, and the nodes ended up with 0 connections. Is there a way to tell a node to connect only to the persistent peers? Also, do you have any other suggestions for how I can further improve the performance of the network? Many thanks! By the way, the block time interval now looks like this (for 100 validators and 1000 tx/s over a period of 3 mins):

if only the unconditional_peer_ids field is defined, without persistent_peers, the connections to the unconditional_peer_ids are not established for some reason.

This is expected behavior: unconditional_peer_ids only stores node IDs, so your node has no IP address with which to dial those peers. unconditional_peer_ids is only meant to be used when counting peers against the maximum inbound/outbound limits, not for making new connections.

Also, I set max_num_inbound_peers and max_num_outbound_peers to 10, so the nodes made additional connections besides the 10 connections I specified in unconditional_peer_ids (i.e. in persistent_peers), so that the maximum number of connections per node was 30 = 10 (persistent_peers or unconditional peers) + 10 inbound connections + 10 outbound connections.

This is also the expected result. If you want to allow only the 10 specified peers, then you should set

  • max_num_inbound_peers = 0
  • max_num_outbound_peers = 0

This is because unconditional peers can connect regardless of the maximums.
I then expect you will get at most 10 peers, all of them from the 10 specified peers.

I also tried setting max_num_inbound_peers and max_num_outbound_peers to 0, so that each node only connects to the persistent_peers, i.e. the unconditional peers, but that did not work, and the nodes ended up with 0 connections.

One condition you should check is whether you put all 10 specified peers in "persistent_peers". Or, at the very least, those specified peers should dial your node. unconditional_peer_ids by itself cannot create a connection because it carries no IP address information; it is meant only for bypassing the maximum connection limits.

If the connections are not made with the persistent_peers setup, then I guess there might be a bug. Please let us know if you cannot get it down to only those 10 specified peers.

Try putting the 10 specified peers in persistent_peers. The nodes need them listed in persistent_peers so that they are allowed to dial those peers.

unconditional_peer_ids only means that the node will "allow" the peer to connect even if the maximum number of connections has been reached; it does not create a connection itself.
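
The division of labor between the two fields can be sketched as a small config generator. This is a minimal Python sketch under assumptions: p2p_settings and to_toml_lines are illustrative names, the second peer's host is made up, and whether limits of 0 really leave only the listed peers connected was still an open question in this thread. The ID@host:port form for persistent_peers follows the seeds entry in the config above.

```python
def p2p_settings(peers, port=26656):
    """Build [p2p] values for one node from (node_id, host) pairs.

    persistent_peers carries ID@host:port so the node can dial the peers;
    unconditional_peer_ids carries only the IDs so that those peers are
    exempt from the inbound/outbound limits.
    """
    return {
        "persistent_peers": ",".join(f"{nid}@{host}:{port}" for nid, host in peers),
        "unconditional_peer_ids": ",".join(nid for nid, _ in peers),
        "max_num_inbound_peers": 0,
        "max_num_outbound_peers": 0,
    }


def to_toml_lines(settings):
    """Render the settings as key = value lines for config.toml."""
    lines = []
    for key, value in settings.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{key} = {rendered}")
    return lines


# The first ID and host come from the status output earlier in the thread;
# the second host name is made up for illustration.
example_peers = [
    ("31aaac3e4f7ecd2dfeee16089ba2dcd2c9824e8b", "ec2-3-84-78-212.compute-1.amazonaws.com"),
    ("b40cc7aa377ab45889b742c167f9125eae148481", "node-b.example.com"),
]
settings = p2p_settings(example_peers)
print("\n".join(to_toml_lines(settings)))
```

Generating both fields from the same peer list keeps them in sync, so every peer that is exempt from the limits is also one the node knows how to dial.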