I am seeing "send failed" errors in my gaiad log.
They occur with one or several peers at the same time.
For a single peer, the error count can reach 30,000 in an hour. At peak, several peers together can produce more than 200 errors in one second.
The peers that trigger the send failed error change over time. I have seen more than 20 peers cause it, so I guess it is not a node-specific problem. It looks more like a structural problem in the gaiad software.
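For anyone who wants to compare numbers, here is a rough Python sketch of how the per-peer counts and the peak per-second rate could be pulled out of a log file. It assumes each relevant line contains the text "Send failed", a `peer=<id>` field, and a timestamp shaped like `YYYY-MM-DD` followed by one separator character and `HH:MM:SS`. Those patterns and the function name `summarize_send_failures` are my assumptions, not a confirmed gaiad log format, so adjust the regular expressions to match your own output.

```python
import re
import sys
from collections import Counter

# Assumption: the peer is logged as "peer=<id>" with no spaces inside the id.
PEER_RE = re.compile(r"peer=(\S+)")
# Assumption: timestamps look like 2021-03-01|12:34:56 or 2021-03-01T12:34:56.
SECOND_RE = re.compile(r"\d{4}-\d{2}-\d{2}\D\d{2}:\d{2}:\d{2}")


def summarize_send_failures(lines):
    """Return (failures per peer, highest failure count seen in one second)."""
    per_peer = Counter()
    per_second = Counter()
    for line in lines:
        if "Send failed" not in line:
            continue
        peer = PEER_RE.search(line)
        per_peer[peer.group(1) if peer else "<unknown>"] += 1
        stamp = SECOND_RE.search(line)
        if stamp:
            # Bucket by the timestamp truncated to whole seconds.
            per_second[stamp.group(0)] += 1
    peak = max(per_second.values(), default=0)
    return per_peer, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_send_failed.py <path to log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        counts, peak_per_second = summarize_send_failures(log_file)
    for peer_id, count in counts.most_common(20):
        print(count, peer_id)
    print("peak errors in one second:", peak_per_second)
```

Running it over one hour of logs gives a per-peer ranking and the worst one-second burst, which makes it easier to check whether other operators see the same pattern.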
While the errors were being logged, neither the origin (the node that logged the send failed error) nor the victim (the peer it was sending to) had any issue with hardware resources, including CPU, RAM, traffic, max peer count, etc.
I suspect two problems in the gaiad software or its configuration:
- too many attempts to send data to a specific peer even though sending to it keeps failing.
- the receiver's mempool is too small relative to its strong hardware.
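To make the first suspicion concrete, here is a minimal Python sketch of the kind of per-peer exponential backoff I have in mind. It is only an illustration of the idea, not how gaiad or its networking layer actually behaves; the class name `PeerBackoff` and its parameters `base_delay` and `max_delay` are hypothetical.

```python
class PeerBackoff:
    """Hypothetical per-peer exponential backoff for repeated send failures."""

    def __init__(self, base_delay=0.1, max_delay=30.0):
        self.base_delay = base_delay   # delay after the first failure, in seconds
        self.max_delay = max_delay     # upper bound on the delay, in seconds
        self._failures = {}            # peer -> consecutive failure count
        self._retry_at = {}            # peer -> earliest time of the next attempt

    def may_send(self, peer, now):
        """True if the peer is not currently in a backoff window."""
        return now >= self._retry_at.get(peer, 0.0)

    def record_failure(self, peer, now):
        """Register a failed send and return the delay before the next attempt."""
        count = self._failures.get(peer, 0) + 1
        self._failures[peer] = count
        # Double the delay on every consecutive failure, capped at max_delay.
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        self._retry_at[peer] = now + delay
        return delay

    def record_success(self, peer):
        """A successful send clears the peer's failure history."""
        self._failures.pop(peer, None)
        self._retry_at.pop(peer, None)
```

With something like this, a peer that keeps rejecting sends would be retried at growing intervals instead of hundreds of times per second, and would go back to normal as soon as one send succeeds.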
Let's discuss this further and work through it together.