Great discussion, and thanks @barnabemonnot for chiming in - great to see you here! Just wanted to share some notes on the Comet mempool and try to decouple a few things.
The mempool does have a very inefficient flooding (“push”) protocol to gossip txs, and neither the mempool nor the underlying peer connection degrades gracefully. However, the expectation is that spam prevention is handled at the app layer by tx fees. The mempool’s interface to this is the CheckTx method of ABCI, which allows the application to accept or reject txs into the mempool (i.e. by checking fees, signatures, etc.). Only if a tx is accepted into the mempool by the application does the mempool start gossiping it.
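To make that gating concrete, here’s a toy sketch of the pattern. The types, the `minFee` constant, and the `checkTx` function are illustrative stand-ins, not the real ABCI types from Comet’s `abcitypes` package - the point is just that the app, not Comet, decides what enters the mempool:

```go
package main

import "fmt"

// Hypothetical stand-ins for the ABCI CheckTx request/response types.
type Tx struct {
	Fee   uint64 // fee offered by the tx
	SigOK bool   // whether the signature verified
}

type CheckTxResult struct {
	Code uint32 // 0 = accept into mempool; nonzero = reject
	Log  string
}

const minFee = 100 // app-defined minimum fee (illustrative value)

// checkTx mirrors what an app does in ABCI CheckTx: cheap validation
// (signature, fees) that gates entry into the mempool. Only txs
// returning Code 0 ever get gossiped.
func checkTx(tx Tx) CheckTxResult {
	if !tx.SigOK {
		return CheckTxResult{Code: 1, Log: "invalid signature"}
	}
	if tx.Fee < minFee {
		return CheckTxResult{Code: 2, Log: "fee below minimum"}
	}
	return CheckTxResult{Code: 0, Log: "ok"}
}

func main() {
	fmt.Println(checkTx(Tx{Fee: 50, SigOK: true}).Code)  // rejected: low fee
	fmt.Println(checkTx(Tx{Fee: 150, SigOK: true}).Code) // accepted
}
```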
This is not to say there are no problems with the mempool design, but ultimately entry into the mempool is gated by the app’s fee system, and a fee system that can adequately modulate entrance into the mempool has been a gaping hole in the ecosystem forever. This is compounded further by the desire to accept low- or zero-fee IBC txs to improve UX and lower the burden on relayers.
Currently validators set subjective min fees in their local config, increasing them (and restarting nodes) as necessary to respond to spam. This is obviously not really sustainable. Apps could also automatically adjust this min fee locally, helping validators respond in a more automated way to increased load in their local mempool without encoding anything in protocol - this would potentially be easier and less controversial than an in-protocol fee system. Validators could set a local floor and increase it based on a PID or some other controller that monitors the local mempool size; it would probably need some tuning. Of course there’s also the possibility of an in-protocol fee adjustment mechanism like EIP-1559 or variations thereupon, but there are plenty of considerations around that, as others are discussing.
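A minimal sketch of what such a local controller could look like - just the proportional term, with illustrative names and constants, nothing from Comet’s actual config or APIs:

```go
package main

import "fmt"

// A proportional controller (the P in PID) that nudges a node's local
// min fee based on mempool occupancy. All names and constants here are
// illustrative and would need tuning in practice.
const (
	targetTxs = 5000    // desired steady-state mempool size
	floorFee  = 100     // operator-set floor; never go below this
	gain      = 0.00001 // how aggressively to react to deviation
)

// adjustMinFee returns a new min fee given the current one and the
// observed mempool size. Above target, the fee rises multiplicatively;
// below target, it decays back toward the operator's floor.
func adjustMinFee(current float64, mempoolSize int) float64 {
	err := float64(mempoolSize - targetTxs)
	next := current * (1 + gain*err)
	if next < floorFee {
		return floorFee
	}
	return next
}

func main() {
	fee := float64(floorFee)
	// Simulate sustained spam: mempool stuck at 20k txs for 10 ticks.
	for i := 0; i < 10; i++ {
		fee = adjustMinFee(fee, 20000)
	}
	fmt.Printf("%.0f\n", fee) // fee has ratcheted up above the floor
}
```

The multiplicative update means the fee climbs quickly under sustained load but falls back to the floor once the mempool drains, which is roughly the behavior you’d want from an automated replacement for the manual restart-and-raise workflow.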
There are a few different things being done to improve bandwidth use in the mempool. The most immediately useful is probably this patch I started work on last week (based on an idea from the Comet team’s Dan) that just reduces the number of peers you gossip mempool txs to. Many validators want large peer connectivity to reduce missed blocks, but they really only need to gossip txs to a subset of those peers. This should massively reduce amplification and can be rolled out by individual validators so it’s easy to start testing, perhaps as soon as next week.
There’s also been a bunch of work on a push-pull mempool, initially prototyped by Celestia, that we upstreamed and fixed to work with Comet. This moves the mempool more towards being request (“pull”) based, which can greatly limit amplification, ultimately at the cost of latency. It’s currently targeting Comet v1.0, whose alpha release should hopefully come out before the end of the year, so unfortunately it might not be able to help with current issues unless we backport it. It also seems to be most effective for large txs.
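A toy model of the push-pull idea, to show where the bandwidth saving and the extra latency come from. The types and method names are illustrative, not the actual upstreamed protocol: nodes push only a small hash announcement, and peers pull the full bytes once, on demand:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Toy model of a push-pull mempool. Only a 32-byte hash is flooded;
// the full tx bytes cross the wire once per peer, on request.
type node struct {
	store map[[32]byte][]byte // full txs by hash
}

func newNode() *node { return &node{store: make(map[[32]byte][]byte)} }

// announce is the push half: only the tx hash is gossiped.
func (n *node) announce(tx []byte) [32]byte {
	h := sha256.Sum256(tx)
	n.store[h] = tx
	return h
}

// onAnnounce is the pull half: request the full tx only if we don't
// already have it. The extra request/response round trip is the
// latency cost; never re-downloading a tx is the bandwidth win,
// which is why this helps most for large txs.
func (n *node) onAnnounce(h [32]byte, from *node) (pulled bool) {
	if _, ok := n.store[h]; ok {
		return false // already have it; no bytes requested
	}
	n.store[h] = from.store[h] // stands in for the pull round trip
	return true
}

func main() {
	a, b := newNode(), newNode()
	tx := make([]byte, 1<<20) // 1 MiB tx: the announcement is 32 bytes
	h := a.announce(tx)
	fmt.Println(b.onAnnounce(h, a)) // true: b pulls the tx once
	fmt.Println(b.onAnnounce(h, a)) // false: duplicate announce, no pull
}
```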
There were also some improvements earlier this year to the block part gossip in the consensus reactor, which isn’t related to the mempool but can significantly improve bandwidth usage overall under high load. Those improvements are also only targeting Comet v1.0, but in principle it shouldn’t be hard to backport them if there’s significant demand.
There might be more we can do to improve the mempool and relieve some of the pressure here, but ultimately there seems to be widespread agreement that the mempool should just not be Comet’s concern. The fixes we have should help stabilize things, but the ecosystem probably needs to move towards alternative solutions that move mempool functionality into the app or validator sidecar processes. Thane started an ADR on that here - we’d welcome any feedback there!