Our understanding of the Cosmos Hub mempool issues

Hey, yeah, I’m not 100% sold on the block size increase myself. That said, it’s a very standard size in the blockchain world, and I’m ideologically a “big blocker”. I also have a feeling (which is kind of the whole point of this thread) that block size and mempool size must be tuned together. Once our tests complete, we’ll be able to make a stronger recommendation on the block size change.

Was this a Banana Queen?

1 Like

Hey, great to have someone with a ton of real-world RPC experience in here!

In this analysis, I’m mostly looking at mempool gossip performance as a possible culprit. I think that improving mempool performance could be a win by itself. During a S.P.A.M. event, we’ve observed a lot of nodes suffering, not just the RPC nodes in use.

That being said, the multithreading suggestion is a good one. I’m actually not sure where that change needs to happen: Comet or the Cosmos SDK. Maybe both. This highlights another tricky part of these performance tuning questions: they can involve a lot of libraries maintained by different teams.

The stateful checks in the p2p hot loop via CheckTx are inherently a liveness vulnerability.

@kjessec and @dogemos are right to emphasize the risk from sequential stateful checks.

Once we start building under the ADR 110 model, these stateful checks can be scaled horizontally in a way they can’t be right now.

2 Likes

I would first like to thank everyone for contributing their understanding to this very important topic and offering some ideas on how things can be improved. Let me try to offer my perspective:

  • I agree with @zaki_iqlusion that the nice property of the existing system is that it is very robust and that messages get propagated through the network pretty fast. In fact, part of the problem is that the current design is probably too robust. We actually did some research back in 2020, as it was clear even then that this is probably the most challenging and least mature part of Tendermint/Comet. And we at Informal are clearly not the only ones aware of this: zaki_iqlusion, jack, dev, ismail, xla, chris goes, etc., have also been talking about this for a very long time. It turns out that, although there is a ton of research on gossip and on consensus-based systems, there is almost no research on the mix of the two, i.e., consensus systems (including tx gossiping, aka the mempool, plus vote and block gossiping, not just the consensus protocol) on top of gossip systems. The closest research in our view was the line of papers around the BAR (Byzantine, Altruistic, Rational) model, for example BAR Gossip (https://www.cs.cornell.edu/lorenzo/papers/bar-gossip.pdf). We have published our results in this paper (https://www.inf.usi.ch/faculty/pedone/Paper/2021/middleware2021b.pdf); note that it assumes only crash faults. Scientific work on the BFT version is still in progress, and we will hopefully have something to share soon. So how to design efficient and secure large-scale, Byzantine fault tolerant, gossip-based consensus systems is, in my view, still an open and very challenging research question. This might help explain my perspective to @jacobgadikian that this is a known design problem rather than an issue, in the sense that we can’t fix it in a short time frame the way you would an implementation or misconfiguration bug. Since this was raised in terms of a security incident, we (Informal) wanted to keep a low comms profile while working on it (the right, professional thing to do), and that might have been misinterpreted as not considering the issue important and not working hard on it. We do in fact deeply care about it and have been working on this for a very long time (not just in the last two months).
  • The other aspect here is that Comet’s existing architecture and implementation do not allow us to easily implement any novel protocol/design idea, especially one that involves changing the gossip/p2p layer. The existential challenges we faced in 2022 with Tendermint (before Informal took over stewardship of the Comet project) were actually related to this exact problem, and hopefully we have all learned that changes to the Tendermint/Comet gossip/p2p layer need to be made with a lot of understanding and care, and with a super thorough QA process. Strengthening the QA and testing processes is actually an area where the Comet team at Informal has spent a lot of effort since taking over stewardship of Comet, and you can read about it here: CometBFT Documentation - CometBFT Quality Assurance - v0.38 and CometBFT Documentation - Method - v0.38. Mempool and gossip inefficiencies have been identified as among the most important and critical problems of Comet, and the Comet team has been working on them the whole year. Bucky has mentioned in his post ([Proposal] Cosmos Hub adopt the Skip Block SDK - #35 by ebuchman) some results and some work which is still in progress. We unfortunately needed to spend quite significant time reverse engineering the current design and intentions from the current code so we could start making improvements in a safe way. We also needed to put some monitoring tooling in place so we can really measure the impact of the changes we are making. But this is all in place now, we are ready to unlock R&D in those areas, and we are super happy to collaborate with anyone interested in helping with this. @dogemos’s perspective on CheckTx is also a super valid observation and insight.
    Finally, there is a somewhat orthogonal effort to get rid of Comet’s mempool altogether (ADR 110: Remote mempool by thanethomson · Pull Request #1565 · cometbft/cometbft · GitHub), and I strongly support this initiative for two main reasons: 1) it separates concerns nicely (and strongly signals that tx propagation and block creation should happen outside Comet, and that we want to work with others on figuring out the right APIs and responsibilities), and 2) it reduces Comet’s footprint and responsibilities (we might avoid the need to solve the super hard research problem mentioned above, or at least we can focus on a simpler version of it).
7 Likes

Informal worked with Amulet to release this information to the whole world, and that is the opposite of a low comms profile.

That isn’t what you said.

You said there was no security issue. These are radically different things.

AFAIK, one of the stateful checks is on the sending account’s balance, right? It seems that this one would need to stay single-threaded.

Not just balance: pubkey, nonce… whatever relates to checking the sender’s authority.

One way to optimistically parallelize is to shard CheckTx runs by sender address. The biggest state-dependent parts of verifying the sender’s authority are the nonce check and fee deduction. All of these can be serialized if we maintain a per-sender lane for them (sketched below).

But other parts, like the “body” of a txn, are rather interdependent with other transactions, because the validity of a transaction largely depends on the global view of the canonical state…
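To make the lane idea concrete, here is a minimal Go sketch. All types and names here are hypothetical (this is not CometBFT or Cosmos SDK code): one goroutine per sender serializes that sender’s nonce check and fee deduction, while different senders are checked in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

type Tx struct {
	Sender string
	Nonce  uint64
	Fee    uint64
}

// senderState is the per-account state the stateful checks read and write.
type senderState struct {
	nextNonce uint64
	balance   uint64
}

// laneSharder serializes stateful checks per sender while letting
// different senders proceed in parallel.
type laneSharder struct {
	mu       sync.Mutex
	lanes    map[string]chan Tx
	accounts map[string]*senderState
	wg       sync.WaitGroup
}

func newLaneSharder(accounts map[string]*senderState) *laneSharder {
	return &laneSharder{
		lanes:    make(map[string]chan Tx),
		accounts: accounts,
	}
}

// Submit routes a tx to its sender's lane. Each lane is drained by a single
// goroutine, so the nonce check and fee deduction for one sender run in
// order, while different senders are checked concurrently. (Sketch only:
// unknown senders, backpressure, and lane GC are not handled.)
func (s *laneSharder) Submit(tx Tx) {
	s.mu.Lock()
	lane, ok := s.lanes[tx.Sender]
	if !ok {
		lane = make(chan Tx, 64)
		s.lanes[tx.Sender] = lane
		acct := s.accounts[tx.Sender]
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for t := range lane {
				switch {
				case t.Nonce != acct.nextNonce:
					fmt.Printf("reject %s nonce %d (want %d)\n", t.Sender, t.Nonce, acct.nextNonce)
				case t.Fee > acct.balance:
					fmt.Printf("reject %s: insufficient balance\n", t.Sender)
				default:
					acct.nextNonce++
					acct.balance -= t.Fee
					fmt.Printf("accept %s nonce %d\n", t.Sender, t.Nonce)
				}
			}
		}()
	}
	s.mu.Unlock()
	lane <- tx
}

// Close drains all lanes and waits for the workers to finish.
func (s *laneSharder) Close() {
	s.mu.Lock()
	for _, lane := range s.lanes {
		close(lane)
	}
	s.mu.Unlock()
	s.wg.Wait()
}

func main() {
	s := newLaneSharder(map[string]*senderState{
		"alice": {nextNonce: 0, balance: 100},
		"bob":   {nextNonce: 0, balance: 10},
	})
	s.Submit(Tx{Sender: "alice", Nonce: 0, Fee: 5})
	s.Submit(Tx{Sender: "bob", Nonce: 0, Fee: 50}) // rejected: balance too low
	s.Submit(Tx{Sender: "alice", Nonce: 1, Fee: 5})
	s.Close()
}
```

Note this only covers the sender-authority portion; the tx “body” checks that depend on global state would still need the serialized path described above.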


How about we don’t perform CheckTx at all in the proposer’s mempool? It may sound a little drastic, but I think it makes sense:

  • Computationally, CheckTx is at most as expensive as a normal DeliverTx run. This means a validator runs every single transaction at least twice. All transaction runs, including CheckTx, take the global mutex lock, and this is one of the biggest performance ceilings. All txns get run in the DeliverTx call regardless, so the notion of “checking the transaction before inclusion” only adds unnecessary computational load (see the sketch after this list). Should a transaction fail, it will fail in DeliverTx anyway.

  • IMO there is little to no value in running CheckTx in the validator’s mempool. Most txns will already have been “simulated” on local RPCs before hitting the proposer. The proposer re-running those transactions is double validation, and at that point the proposer’s CheckTx result might differ from the result at the local RPC.

  • (cont’d) I mean, there is no guarantee that any txn validation check will still pass, because the ordering of txns isn’t guaranteed due to network topology. Unless the goal is to only include txns that passed the check.

  • The threat introduced here (given we lift the CheckTx calls) is spam. But currently the mempool runs CheckTx on those spam txns anyway, so in a way it’s already exposed. Removing CheckTx is actually a benefit in this regard, because it would cut txn-related computational cost by at least half.

  • Any further protective measures should be the app’s concern, IMO; ultimately it comes down to how efficient the validation is, which CometBFT alone can’t reason about. With ADR 110 and the like, this part should become a lot more customizable.
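To put a number on the “runs every transaction at least twice” argument, here is a toy Go model (purely hypothetical names, not the ABCI interface) counting stateful executions with and without a mempool CheckTx pass:

```go
package main

import "fmt"

type tx struct{ id int }

// runStateful stands in for the expensive, mutex-guarded execution that
// both CheckTx and DeliverTx perform on a transaction.
func runStateful(t tx) {}

// process counts how many stateful executions a batch of txs costs.
func process(txs []tx, checkAtMempool bool) (executions int) {
	for _, t := range txs {
		if checkAtMempool {
			runStateful(t) // CheckTx pass at the mempool
			executions++
		}
		runStateful(t) // DeliverTx pass (failures surface here anyway)
		executions++
	}
	return executions
}

func main() {
	txs := make([]tx, 1000)
	fmt.Println("with CheckTx:   ", process(txs, true))  // 2000 executions
	fmt.Println("without CheckTx:", process(txs, false)) // 1000 executions
}
```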

There’s a little UX corner case too, and it’s kind of frequent under congestion. Say a user sends a transaction via Keplr. The tx simulation in Keplr was OK, so the node behind Keplr propagated the txn to the global mempool. Now the proposer performs CheckTx again, and it fails there because of state interdependencies with other txns that the Keplr node didn’t have at the time of simulation. The transaction fails and is rejected (or kept in the mempool for a later recheck?). The user never gets any signal back, because the txn will never be included in any block.

3 Likes

I’m somewhat new to this subject, but one important role of CheckTx, as I understand it, is to flush the mempool of a spammer’s txs once they run out of gas. This is especially powerful with the addition of a fee market, since they will run out of gas much more quickly.

Eliminating CheckTx entirely would mean this flushing never happens. It seems like that would result in blocks full of failed txs.
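Here’s a rough Go sketch of that flushing behavior, assuming a simplified mempool and balance map (illustrative names only, not CometBFT’s actual recheck implementation, which re-runs CheckTx on remaining txs after each committed block):

```go
package main

import "fmt"

type pendingTx struct {
	Sender string
	Fee    uint64
}

// recheckFlush drops txs whose sender can no longer cover the fee, the way
// a post-block recheck would once the spammer's balance is actually drained.
func recheckFlush(mempool []pendingTx, balance map[string]uint64) []pendingTx {
	kept := mempool[:0]
	for _, tx := range mempool {
		if balance[tx.Sender] >= tx.Fee {
			kept = append(kept, tx)
		}
	}
	return kept
}

func main() {
	mempool := []pendingTx{{"spammer", 5}, {"spammer", 5}, {"alice", 1}}
	// Suppose the last block drained the spammer's balance down to 3.
	balance := map[string]uint64{"spammer": 3, "alice": 10}
	fmt.Println(recheckFlush(mempool, balance)) // only alice's tx survives
}
```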

Multithreading CheckTx would delay the flushing by a block (the spammer would have to get their txs in and have gas deducted for real before the check started failing). In this “last spam block” you would have a bunch of failed txs.

Maybe it would be possible to multithread the majority of the code in CheckTx but special-case a shared data structure between threads just for gas. Seems tricky, though. We’re getting into Cosmos SDK territory here; maybe @marbar can chime in.
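For illustration, a minimal sketch of that special-cased shared structure (all names hypothetical, nothing here is SDK code): only the fee/gas deduction takes a lock, while the rest of each check could run on its own thread.

```go
package main

import (
	"fmt"
	"sync"
)

// gasLedger is the one shared, synchronized structure the threads contend
// on; everything else in the (hypothetical) check runs in parallel.
type gasLedger struct {
	mu      sync.Mutex
	balance map[string]int64
}

// tryDeduct atomically checks and deducts the fee for a sender.
func (g *gasLedger) tryDeduct(sender string, fee int64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.balance[sender] < fee {
		return false
	}
	g.balance[sender] -= fee
	return true
}

func main() {
	ledger := &gasLedger{balance: map[string]int64{"spammer": 12}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Stateless signature/format checks would run here in parallel...
			ok := ledger.tryDeduct("spammer", 5) // ...only gas is serialized.
			fmt.Printf("tx %d fee deduction ok=%v\n", i, ok)
		}(i)
	}
	wg.Wait() // only two of the four txs can pay
}
```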

However, I am excited to see how just reducing the mempool size works out. It should reduce not only gossip bandwidth but also CheckTx load (I think).

All people who feel underheard, undervalued, misunderstood, dismissed, have their fight or flight mode activated. Add a dangerous element & it’s 10x worse.

How many tropes have been portrayed in films & TV & literature about somebody sounding an alarm and being repeatedly put off or waved away?

That’s why it’s such bad practice to do that -
Everybody loses.

More apologies all around. Give benefit of the doubt to get benefit of the doubt.

If you think you’re owed an apology, lead by example. Forgive by example too. Nobody is a villain in their own mind.

No more combat. No more historical justifying.

No more exclusive high table bs -and-
no more mud on the work table.

Make good processes in lieu of bad dramas.

Somebody said the Hub is paying way too much for security. What it’s been getting is a social engineer’s Swiss cheese dream target.

– Thank you for your patience during this unscheduled maintenance. –

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.