Send failed error

bharvest · August 24, 2018, 10:56am

I am seeing "send failed" errors in the gaiad log.
They occur with one or several peers at the same time.

For each peer, the error count can reach 30000 in an hour. When the errors are concentrated, several peers together can cause more than 200 errors in 1 second.

The peers that trigger send failed change over time. I have seen more than 20 peers causing this error, so I guess it is not a node-specific problem. It looks more like a structural problem in the gaiad software.

While the errors were being logged, neither the origin (the node where the send failed error occurred) nor the victim had any issue with hardware resources, including CPU, RAM, traffic, max peer number, etc.

I suspect two problems in the gaiad software or its configuration.

1. Too many attempts to send data to a specific peer even though sending keeps failing.
2. The receiver’s mempool is too small compared to its strong hardware.

Let’s discuss further on this topic and get over this together.

mdyring · August 25, 2018, 6:46am

We are seeing these as well. Coincidentally, I have just created an issue to get better Prometheus metrics. One of them (tendermint_p2p_peer_pending_transmit_bytes) would help pinpoint lagging peers easily, without the wall of text.

It might still make good sense to log this, but I think it might make better sense to give up quickly on the peer. Say, disconnect immediately and then increase a counter (perhaps persisted in addrbook) that tracks how many times this happened. The likelihood of connecting to this peer should then decrease as this counter increases (so new peer candidates should be ordered by counter, with the lowest counts tried first).
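
To make the counter idea concrete, here is a minimal Go sketch. It is not Tendermint's addrbook code: `ScoreBook`, `PeerScore`, `MarkSendFailed`, and `RankCandidates` are hypothetical names made up for illustration, and the counter lives in an in-memory map rather than being persisted. It only shows the ordering: every failed send bumps a per-peer counter, and candidates with fewer recorded failures are tried first.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerScore is a hypothetical record of how often sends to a peer failed.
type PeerScore struct {
	Addr         string
	SendFailures int
}

// ScoreBook is a hypothetical stand-in for a failure counter kept alongside
// the address book. Here it is only an in-memory map.
type ScoreBook struct {
	failures map[string]int
}

// NewScoreBook returns an empty ScoreBook.
func NewScoreBook() *ScoreBook {
	return &ScoreBook{failures: make(map[string]int)}
}

// MarkSendFailed records one more failed send for addr. In the proposal
// above, the caller would also disconnect from the peer at this point.
func (b *ScoreBook) MarkSendFailed(addr string) {
	b.failures[addr]++
}

// RankCandidates returns the candidates ordered so that peers with the
// fewest recorded failures come first. Ties keep their input order.
func (b *ScoreBook) RankCandidates(addrs []string) []PeerScore {
	ranked := make([]PeerScore, 0, len(addrs))
	for _, a := range addrs {
		ranked = append(ranked, PeerScore{Addr: a, SendFailures: b.failures[a]})
	}
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].SendFailures < ranked[j].SendFailures
	})
	return ranked
}

func main() {
	book := NewScoreBook()
	book.MarkSendFailed("peer-b")
	book.MarkSendFailed("peer-b")
	book.MarkSendFailed("peer-c")

	// peer-a has no failures, so it is ranked first.
	for _, p := range book.RankCandidates([]string{"peer-b", "peer-c", "peer-a"}) {
		fmt.Printf("%s failures=%d\n", p.Addr, p.SendFailures)
	}
}
```

A real implementation would probably also want the counter to decay over time, so a peer that was slow once is not penalized forever.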

jack · August 27, 2018, 4:54pm

I really like the idea of better prom metrics. Where is that issue?

mdyring · August 27, 2018, 5:46pm
