CAP theorem and FLP impossibility... What can be done to prevent the Stopocalypse?

This is just a loose and fairly fresh idea that has been accumulating in my mind for some years now. In the last 12 months I have done a fair bit of study on distributed systems, and while reading this forum I came across a thread about emergency responses in the event of a liveness failure, the main vulnerability of Tendermint BFT consensus.

I will refer to biological systems for a clue: specifically, the way the brain and cardiovascular system coordinate with each other at the lowest level.

Our body has different levels of autonomy, automation, and isolation between parts of the nervous system. These act as firewalls, ensuring that when something goes wrong, there are countermeasures.

It seems to me that there is not much difference between this structure and how a fully resilient network system would retain the three vital properties of the CAP theorem: Consistency, Availability, and Partition tolerance. The CAP theorem says no single system can guarantee all three at once, so the question is how a combination of systems might.

The design of the Tendermint consensus gives you C and P, but A is sacrificed: if more than one third of the voting power goes offline, the whole system stops making blocks at all, and that is the one big danger of this model. Nakamoto consensus is weak on C, which is why it takes so long to reach finality, and even then its finality is only probabilistic, never absolute: there is always a tiny but nonzero chance that someone builds an alternative chain with greater total accumulated proof-of-work, so that the entire chain gets forked and causes the mother of all reorgs.
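
To make "probabilistic finality" concrete, here is a minimal sketch of the catch-up probability from the Bitcoin whitepaper (section 11): the chance that an attacker controlling a fraction q of the hashpower ever overtakes the honest chain from z blocks behind. The function name and example values are mine, not from any source.

```python
import math

def attacker_success(q: float, z: int) -> float:
    """Probability an attacker with hashpower fraction q ever catches up
    from z blocks behind (Satoshi's formula, Bitcoin whitepaper, section 11)."""
    if q >= 0.5:
        return 1.0  # a majority attacker eventually wins with certainty
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while the honest chain adds z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

# With 10% of the hashpower, 6 confirmations already push the attacker's
# odds well below 0.1%, but the probability never reaches exactly zero.
for z in (0, 1, 6, 12):
    print(z, attacker_success(0.10, z))
```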

OK, I think that is enough background for those who are not as familiar with what distributed networks can and cannot do. So, here is my idea…

We already have in existence system designs that have strength in two properties and weakness in the third. I covered two of them above, so we have CP for Tendermint and AP for BTC (etc.), and what about CA?

CA is what you have in a system like Kafka, which does not try to certify the data at all; it only makes sure that it is impossible to split the cluster (a double spend is, in effect, a network split) and impossible to disconnect any node, at least not for long, unless it is stuck with only one physical path to the internet and that path is cut.
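
To make the Kafka comparison concrete, here is a minimal producer sketch, assuming the kafka-python client; the broker addresses and topic name are illustrative. Setting acks="all" means a write does not count until every in-sync replica has it, which is the "never lose data, never split" posture described above.

```python
from kafka import KafkaProducer  # kafka-python client (assumed)

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],  # illustrative addresses
    acks="all",  # every in-sync replica must acknowledge each write
    retries=5,   # retry transient failures instead of dropping the message
)

# The topic name is hypothetical; imagine it carrying replicated hub data.
future = producer.send("hub-backup-log", b"block header bytes")
metadata = future.get(timeout=10)  # raises if the cluster could not replicate it
print(metadata.topic, metadata.partition, metadata.offset)
```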

Biological systems solved this problem long ago, so I think they are a model for how it can be solved here too. But to be more conservative: since I saw a topic about the possibility of the very young Cosmos Hub mainnet going down, and about ‘emergency measures’, I just want to put forward an idea for how to deal with it.

In some respects, the design approach of Cosmos already has a pathway towards the solution, and the way I see it, it's really just a matter of time. On one side we have Nakamoto consensus, which mostly focuses on the ‘double spend’ (network partition) problem. There isn't really any big breakthrough needed here; suffice it to say that it should be possible to have a PoW chain running in parallel, slowly replicating the Hub chain as a backup system, which nodes can fail over to in such an event.
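
As a rough sketch of that failover, here is a liveness watchdog assuming Tendermint's standard /status RPC endpoint (which reports latest_block_height); the endpoint URLs, polling interval, and stall threshold are all hypothetical values of mine.

```python
import time
import requests

HUB_RPC = "http://localhost:26657"           # primary Tendermint RPC (assumed)
BACKUP_RPC = "http://backup.example:26657"   # PoW replica endpoint (hypothetical)
STALL_SECONDS = 60                           # how long with no new block counts as "halted"

def latest_height(rpc: str) -> int:
    r = requests.get(f"{rpc}/status", timeout=5)
    return int(r.json()["result"]["sync_info"]["latest_block_height"])

def watch() -> None:
    active = HUB_RPC
    last_height, last_change = latest_height(active), time.time()
    while True:
        time.sleep(5)
        try:
            h = latest_height(active)
        except requests.RequestException:
            h = last_height  # treat an unreachable node like a stall
        if h > last_height:
            last_height, last_change = h, time.time()
        elif time.time() - last_change > STALL_SECONDS and active == HUB_RPC:
            # Liveness failure: fall back to the slow PoW replica.
            active = BACKUP_RPC
            last_height, last_change = latest_height(active), time.time()
```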

This would cover a lot of scenarios, and it would almost completely close up the availability weakness in Tendermint's design.

But I think there is an ultimate solution. That one above is probably already in someone else's mind as a countermeasure anyway (and if not, it is now). The third pillar can be made stronger by building a CA-focused, mainly communication-oriented network system, like Kafka. This kind of connection closes off the possibility of chains losing contact with the Hub. With a PoW chain running in slow motion, under low user demand and with a primarily archive/backup purpose, a Kafka-style unbreakable communication network could form a secondary backup channel that keeps the (hopefully many thousands of) subchains moving, even if one or more of the hubs between them has shut down.
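
As a sketch of what such a backup channel could look like (the topic scheme, broker addresses, and zone names below are all hypothetical, and this is not any existing IBC mechanism): while a hub is down, zones publish their outbound packets to a replicated topic, and the destination zones drain it.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # kafka-python client (assumed)

BROKERS = ["broker1:9092", "broker2:9092"]  # illustrative addresses

producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    acks="all",
    value_serializer=lambda m: json.dumps(m).encode(),
)

def publish_packet(src_zone: str, dst_zone: str, packet: dict) -> None:
    # One topic per destination zone is an assumption, not a real protocol.
    producer.send(f"backup-channel-{dst_zone}", {"src": src_zone, "packet": packet})
    producer.flush()

# A destination zone drains its queue once it notices the hub is down.
consumer = KafkaConsumer(
    "backup-channel-zone-a",
    bootstrap_servers=BROKERS,
    value_deserializer=lambda m: json.loads(m.decode()),
)
for msg in consumer:
    print("packet queued while the hub was down:", msg.value)
```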

One last thing that will also help, aside from using different architectures to provide backup service: it will actually be a huge advantage if, so long as the protocol specification is robust, the implementations vary widely. This will ensure that in the chains of chains, one attack can't take them all down.