Historic Data Integrity on Storage and Validation


#1

Projects such as BigChainDB use external storage to save the blockchain data. Similarly, I am planning to use external storage to save large-scale data and use the Cosmos SDK to provide the blockchain capability on top of it (distributed nodes validate all the entries as they are added to the storage).

One question I have is: how does Tendermint ensure the correctness of old blocks (those already on the storage)?

Recently there have been incidents of ransomware corrupting data on storage (it actually encrypted the data and demanded money for the decryption key). Essentially there is nothing (in Tendermint) that prevents anyone from accessing the database or hard disk directly and selectively overwriting parts of the record data.

I understand that block hashes have to match and that changing the data will result in hash mismatches. The question I have is: when old data on the storage is changed, how would Tendermint (or my app built on top of it) know that some hashes on the disk no longer match for some records?

Is the complete chain validated at all times (whenever a new block is being added)?

For example, on one node, someone could open year-old records (in the database), change the name / owner, and save them. The hashes would no longer match, but no one would really notice, since it is old data. Then, after a day or two, they repeat the same on another node, and so on, until the majority of the nodes are in an inconsistent state.

Given that you cannot get the data back from a hash, and that the system does not believe the minority nodes (if any, which may still have the original data floating around somewhere), the veracity of the data becomes questionable.

Now, this may not have created new facts (since the hashes do not match), but it has certainly created a scenario where the recorded facts have become questionable. If someone disputes the records, there is no way to verify / validate them.

How does Tendermint prevent these kinds of situations?

PS: If ransomware can encrypt the whole data, it can certainly choose to overwrite portions of it (I personally know IT companies that had 30+ machines affected and locked up until they paid). The worst part is that at midnight their automated daily backup systems kicked in, literally replacing their clean backups with encrypted copies, before anyone could understand what was going on in the morning.


#2

Essentially there is nothing (in Tendermint) that prevents anyone from accessing the database or hard disk directly and selectively overwriting parts of the record data.

Correct

Is the complete chain validated at all times (whenever a new block is being added)?

No, it is not.

How does Tendermint prevent these kinds of situations?

Currently it doesn’t. If someone tries to gossip such a block, it will fail validation. For instance, if your node is compromised in this way and I’m downloading blocks from you, I will mark you as a bad peer for sending me this bad block.
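As a rough illustration of why a tampered block cannot be gossiped successfully, here is a minimal Python sketch. It is not Tendermint's actual code or block format: `Block`, `block_hash`, `accept_from_peer`, and the `bad_peers` set are hypothetical names, and a block is reduced to a height, the previous block's hash, and a data payload. The downloading node recomputes the hash and rejects the block if it does not match the hash it already trusts for that height.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    # Simplified, hypothetical block layout (not Tendermint's real one).
    height: int
    prev_hash: str
    data: bytes


def block_hash(block: Block) -> str:
    """Hash all fields of the simplified block, separated by '|'."""
    h = hashlib.sha256()
    h.update(str(block.height).encode())
    h.update(b"|")
    h.update(block.prev_hash.encode())
    h.update(b"|")
    h.update(block.data)
    return h.hexdigest()


def accept_from_peer(block: Block, expected_hash: str, peer_id: str, bad_peers: set) -> bool:
    """Accept the block only if it matches the hash we already trust."""
    if block_hash(block) != expected_hash:
        bad_peers.add(peer_id)  # remember the peer that sent a bad block
        return False
    return True


original = Block(height=10, prev_hash="ab" * 32, data=b"owner=alice")
trusted = block_hash(original)
tampered = Block(height=10, prev_hash="ab" * 32, data=b"owner=mallory")

bad_peers = set()
accept_from_peer(original, trusted, "peer-1", bad_peers)  # accepted
accept_from_peer(tampered, trusted, "peer-2", bad_peers)  # rejected, peer-2 is flagged
```

The check only runs when a block is actually transferred, which is why a silent edit on one node's disk can go unnoticed until someone syncs from that node.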

This isn’t something we can really deal with short of running an integrity check on the entire blockchain all the time.
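For a sense of what such an integrity check involves, here is a minimal sketch, assuming each stored record keeps its payload, the previous record's hash, and its own hash. The record layout and the names `record_hash`, `build_chain`, and `first_corrupt_height` are hypothetical and only stand in for a real block store.

```python
import hashlib

GENESIS_PREV = "0" * 64  # assumed placeholder for "no previous record"


def record_hash(prev_hash: str, data: bytes) -> str:
    """Hash a record's payload together with the previous record's hash."""
    return hashlib.sha256(prev_hash.encode() + b"|" + data).hexdigest()


def build_chain(payloads):
    """Build a toy hash-linked chain from a list of byte payloads."""
    chain, prev = [], GENESIS_PREV
    for data in payloads:
        h = record_hash(prev, data)
        chain.append({"prev_hash": prev, "data": data, "hash": h})
        prev = h
    return chain


def first_corrupt_height(chain):
    """Walk the whole chain; return the index of the first bad record, or None."""
    prev = GENESIS_PREV
    for height, rec in enumerate(chain):
        link_ok = rec["prev_hash"] == prev
        hash_ok = record_hash(rec["prev_hash"], rec["data"]) == rec["hash"]
        if not (link_ok and hash_ok):
            return height
        prev = rec["hash"]
    return None


chain = build_chain([b"owner=alice", b"owner=bob", b"owner=carol"])
print(first_corrupt_height(chain))   # None: nothing has been modified
chain[1]["data"] = b"owner=mallory"  # simulate a direct edit on disk
print(first_corrupt_height(chain))   # 1: the edited record no longer matches its hash
```

Every pass rehashes the entire history, so the cost grows with the length of the chain. A purely local scan also cannot catch an attacker who rewrites every later hash as well; that case needs a comparison against hashes held somewhere else.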

We could add support for something like that in the future, but it’s not really a priority now because you can always remove your data and resync from the network.

For this to be a problem, the attack would have to compromise more than 2/3 of the validators. If they manage to do so, well, to some extent they deserve to win :wink:
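To put a number on the "more than 2/3" threshold, here is a small arithmetic sketch. It assumes every validator has equal voting power, which is a simplification, and the function names are hypothetical.

```python
def has_supermajority(power: int, total_power: int) -> bool:
    """True if `power` is strictly more than 2/3 of `total_power`."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return 3 * power > 2 * total_power


def min_nodes_to_compromise(n_validators: int) -> int:
    """Smallest validator count strictly greater than 2/3 of n (equal power assumed)."""
    return (2 * n_validators) // 3 + 1


print(min_nodes_to_compromise(4))    # 3
print(min_nodes_to_compromise(100))  # 67
print(has_supermajority(66, 100))    # False: exactly 66 of 100 is not enough
```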

That said, it would be interesting to add such integrity checking in the future. I think other databases do this under the guise of “anti-entropy”. It’s something we should look into one day, but I don’t think it’s really been a problem for any existing blockchain networks yet either, and they would have the same issue.
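In replicated databases, anti-entropy generally means that replicas periodically compare digests of their data and repair any differences. A minimal sketch of the comparison step could look like the following. It assumes each node can report the block hash it has stored for a given height, and it takes the most common hash as the reference; both the function name `divergent_nodes` and that reference rule are assumptions of this sketch, not something Tendermint provides.

```python
from collections import Counter


def divergent_nodes(hashes_by_node: dict):
    """Given {node_id: stored_hash} for one height, return the most common
    hash and the sorted list of nodes whose stored hash differs from it."""
    counts = Counter(hashes_by_node.values())
    reference_hash, _ = counts.most_common(1)[0]
    differing = sorted(n for n, h in hashes_by_node.items() if h != reference_hash)
    return reference_hash, differing


reported = {"node-a": "aa11", "node-b": "aa11", "node-c": "ff00", "node-d": "aa11"}
reference, suspects = divergent_nodes(reported)
print(reference, suspects)  # aa11 ['node-c']
```

A node flagged this way could then drop its data and resync from the network. Running the comparison regularly would catch a slow, node-by-node modification before most nodes have diverged.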


#3

Thank you @ebuchman

It is good to know the capabilities and limitations of the system, so that we can use the SDK more effectively. Your explanation was helpful in clarifying things.

My understanding currently is that only new / latest proposed blocks/records are gossiped and that old records do not usually come up for validation again. Is that correct? I am trying to understand the scenarios in which an old block might be gossiped again. If the compromised block in question is, say, a one-year-old record, and there are many blocks (records) after it, would there be a situation where this one-year-old record is validated again?

I totally agree.

But I am coming from a slightly different perspective. In a distributed system (such as these blockchains), the attack need not be centralized or coordinated anymore. For a distributed system, there will be distributed compromises / attacks. It is not required that these attackers are working together or for a single goal - it could be that each one is trying to alter the truth (the blockchain data) to their own advantage.

Imagine that, after an election is over, every party tries to alter the data to turn the votes in its own favor. This may not elect them as the winner (since the hashes do not match), but it certainly makes the voting results questionable.

Human greed is a powerful element. You might have heard the phrase “history is constantly being rewritten” from George Orwell’s 1984 (“Who controls the past controls the future: who controls the present controls the past.”).

Blockchain is taking on a huge responsibility of recording the truth. And for human beings, most of the time, truth is not that important, especially as time progresses. They will do everything in their power to change history in their own favor if they get a chance.