CHIPs discussion phase: Partial Set Security

Partial Set Security is a reimagining of Opt-in Security (thanks to @effortcapital for the idea), which would allow only a subset of Hub validators to run a physical node for each consumer chain, while still allowing each consumer chain to be secured by the full stake of the Hub. This would work by allowing validators who were not running a physical node to delegate to a validator running a physical node. The implementation of this feature should be straightforward (while still being a lot of work), but before it is started, several questions need to be answered.
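
To make the mechanics concrete, here is a minimal sketch in Go (all type and function names here are made up for illustration; this is not the ICS implementation):

```go
package main

import "fmt"

// A sketch of the core record, with hypothetical names: each provider
// validator either runs its own node for a consumer chain or delegates
// its power to a validator that does.
type ConsumerAssignment struct {
	ConsumerChainID string
	Validator       string // provider validator's operator address
	RunsNode        bool   // true if this validator runs the node itself
	DelegatedTo     string // node-running validator, used when RunsNode is false
	Power           int64  // provider voting power backing the consumer
}

// EffectivePower sums the power each node-running validator wields on the
// consumer chain: its own stake plus everything delegated to it. The full
// provider stake still secures the chain; it is just exercised by fewer nodes.
func EffectivePower(assignments []ConsumerAssignment) map[string]int64 {
	power := make(map[string]int64)
	for _, a := range assignments {
		if a.RunsNode {
			power[a.Validator] += a.Power
		} else {
			power[a.DelegatedTo] += a.Power
		}
	}
	return power
}

func main() {
	set := []ConsumerAssignment{
		{"consumer-1", "valA", true, "", 500},
		{"consumer-1", "valB", false, "valA", 300}, // valB delegates to valA
		{"consumer-1", "valC", true, "", 200},
	}
	fmt.Println(EffectivePower(set)) // map[valA:800 valC:200]
}
```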

We’re posting this to get all of the questions we have identified into one place where they can be discussed, as well as to invite the community to raise any questions we may have missed.

High-level questions

Is token toxicity a good analytical framework?

The design of Partial Set Security revolves around the idea that if the entire stake of a provider chain does not secure a consumer chain, the consumer may not be secure against attacks that validators cannot be slashed for. These include incorrect execution (in the absence of fraud or validity proofs) and liveness attacks. In the single-chain case they are stopped by token toxicity: the idea that validators do not carry out these attacks because the chain’s staking token would crash in price if they did. Token toxicity should also hold for shared security techniques where the entire stake of the provider secures the consumer, such as Replicated Security. I’ve written about token toxicity more here.

Partial Set Security is intended to preserve token toxicity by allowing the entire stake of the provider to secure each consumer, even if not every validator runs a physical node. But token toxicity is not completely proven. It is a good explanation for the continuing operation and security of most blockchains, but it is very hard to quantify. For example: does token toxicity hold if 99% of the provider’s stake is staked on a consumer? It very likely does. Does it hold if 0.1% of the provider’s stake is staked on the consumer? It very likely does not. But where is the line?

We’d love to get some more thoughts about token toxicity, and whether it is a good framework for shared security questions.

Are there issues with the idea of validator delegation?

Are there any issues with validator delegation that are not present with normal delegation? It’s not even really clear that normal delegation is a good idea, but it seems to be working pretty well so far. Could validator delegation have unforeseen interactions?

Is running physical nodes the main cost?

Partial Set Security addresses the main criticism of Replicated Security: the high cost of making every validator on the Hub run every consumer chain. But how much of the cost of validation is node operation, and how much is the risk of slashing? This has not been quantified. Will validators balk at being forced to delegate to another validator? Can proportional slashing (a mechanism where accidental double signs that only affect a single validator incur far lower slashes) mitigate these concerns?
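
For reference, proportional slashing is often sketched as a quadratic rule. Here is one illustrative form (the constant k and the function names are assumptions for this example, not a spec):

```go
package main

import "fmt"

// proportionalSlashFraction sketches quadratic proportional slashing: the
// slash fraction grows with the square of the share of voting power that
// faulted together, so an isolated accidental double sign is cheap while a
// large correlated fault approaches a full slash. k is an assumed constant.
func proportionalSlashFraction(faultyPowerShare, k float64) float64 {
	f := k * faultyPowerShare * faultyPowerShare
	if f > 1 {
		f = 1
	}
	return f
}

func main() {
	// A lone 1%-power validator double-signs: slashed 0.09% of stake.
	fmt.Printf("%.4f\n", proportionalSlashFraction(0.01, 9)) // 0.0009
	// A correlated fault by 33% of power: slashed ~98% of stake.
	fmt.Printf("%.4f\n", proportionalSlashFraction(0.33, 9)) // 0.9801
}
```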

Implementation questions

How to handle slashing?

If a validator delegates instead of running a physical node, its delegators must be slashed if the validator it has delegated to commits an offense and is slashed. But should the delegating validator itself be tombstoned? How is downtime handled?
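
As a starting point for that discussion, here is a minimal sketch (hypothetical names, not the ICS implementation) of how a consumer-chain slash could propagate to delegating validators:

```go
package main

import "fmt"

// A sketch with hypothetical names: when a node-running validator is
// slashed on a consumer chain, the same slash fraction is applied to the
// stake of every validator that delegated its consumer power to it.
// Whether the delegating validator is also tombstoned is the open question.
type ConsumerDelegation struct {
	Delegator string // validator that delegated instead of running a node
	Operator  string // validator running the node
	Stake     int64  // power delegated for this consumer chain
}

// SlashOperator returns how much stake each delegating validator loses
// when `offender` commits a slashable offense on the consumer chain.
func SlashOperator(delegations []ConsumerDelegation, offender string, fraction float64) map[string]int64 {
	burned := make(map[string]int64)
	for _, d := range delegations {
		if d.Operator == offender {
			burned[d.Delegator] = int64(float64(d.Stake) * fraction)
		}
	}
	return burned
}

func main() {
	ds := []ConsumerDelegation{
		{"valB", "valA", 300},
		{"valC", "valA", 200},
		{"valD", "valE", 100}, // unaffected: delegated to a different operator
	}
	fmt.Println(SlashOperator(ds, "valA", 0.05)) // map[valB:15 valC:10]
}
```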

How to handle commission?

It is obvious that a validator which delegates its power should get a lower commission than a validator which runs a physical node. But there are several possible approaches.

No commission for delegating validators

The simplest approach is for validators which delegate to other validators to receive no commission for that consumer chain, with the validator running the physical node receiving it all. The delegating validator’s reward is simply not being penalized for downtime.

Commission split

However, the delegating validator does incur some risks. Being tombstoned for offenses is an obvious one. Even if they are not tombstoned but their delegators are slashed, there is a reputational risk. For this reason, it might be appropriate to allow the commission to be split between the node-running validator and the delegating validator. But who sets the overall commission, and the split? And is this really necessary? After all, a validator who is concerned about these risks can just run a physical node.

Variable commission

Maybe it is necessary to allow validators to set a different commission per consumer chain. After all, some chains may be more expensive to run than others.
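
To make the alternatives concrete, here is a sketch of how the three schemes could divide a consumer chain’s commission (all names and parameters are assumptions for illustration, not part of ICS):

```go
package main

import "fmt"

// CommissionScheme sketches the three options above.
type CommissionScheme int

const (
	NoCommission CommissionScheme = iota // node runner keeps everything
	FixedSplit                           // fixed operator/delegator split
	Variable                             // per-consumer-chain split set by the operator
)

// SplitCommission divides the consumer-chain commission earned on a
// delegating validator's stake between the node-running operator and the
// delegating validator. operatorShare is ignored for NoCommission; under
// Variable it would simply differ per consumer chain instead of being global.
func SplitCommission(scheme CommissionScheme, commission, operatorShare float64) (operator, delegating float64) {
	switch scheme {
	case NoCommission:
		return commission, 0
	case FixedSplit, Variable:
		return commission * operatorShare, commission * (1 - operatorShare)
	}
	return 0, 0
}

func main() {
	op, del := SplitCommission(FixedSplit, 100, 0.75)
	fmt.Println(op, del) // 75 25
}
```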

How to enforce the minimum physical node requirement?

This is probably the meatiest question about this design: how to ensure that a minimum amount of validation power (probably ⅓ is good) runs physical nodes? In the Partial Set Security paper, we lay out 4 alternatives. The first two, immediately stopping consumer chains that fall below this threshold, and forcing the last validators running a consumer chain to keep running it to avoid falling below the threshold, can probably be dismissed out of hand. This leaves two viable options:

Top ⅓ are required to run all consumer chains

In this option, the top ⅓ of validators would be required to run all consumer chains. This would ensure that all consumer chains had the minimum number of physical nodes needed for safety. Validators outside of the top ⅓ could still run physical nodes, stopping and starting them as consumer chains became more and less profitable. This is great because it is extremely simple, but it also has downsides. It could act as a centralization vector: validators might prefer to delegate to the top ⅓, knowing that the top ⅓ will always run physical nodes, to avoid having to change their delegations. Conversely, it could also act as a Sybil incentive, since large validators might break up their stake to stay out of the top ⅓ and benefit from the optionality of being able to stop and start consumer chains. It might also lead to strange performance characteristics, as the number of physical nodes in a consumer chain’s set might fluctuate with the chain’s token price and profitability.
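
For concreteness, here is a sketch of one way the required set could be computed, assuming “top ⅓” means the smallest prefix of validators by power whose combined share reaches ⅓ of the total (the exact definition is itself a design question):

```go
package main

import (
	"fmt"
	"sort"
)

// MustRunAllConsumers sketches one reading of the rule: take validators in
// descending power order until their cumulative share reaches 1/3 of total
// power; those validators must run nodes for every consumer chain.
func MustRunAllConsumers(powers map[string]int64) map[string]bool {
	type val struct {
		addr  string
		power int64
	}
	var (
		vals  []val
		total int64
	)
	for a, p := range powers {
		vals = append(vals, val{a, p})
		total += p
	}
	sort.Slice(vals, func(i, j int) bool { return vals[i].power > vals[j].power })

	required := make(map[string]bool)
	var cum int64
	for _, v := range vals {
		if cum*3 >= total { // cumulative share already at or past 1/3
			break
		}
		required[v.addr] = true
		cum += v.power
	}
	return required
}

func main() {
	powers := map[string]int64{"valA": 40, "valB": 30, "valC": 20, "valD": 10}
	fmt.Println(MustRunAllConsumers(powers)) // map[valA:true] (40 of 100 ≥ 1/3)
}
```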

Validators must make a periodic commitment to run a physical node

In this variant, validators would need to commit to running a physical node for a consumer chain for a certain length of time, maybe 6 months. This would be more complicated, but might have better incentives for validators. We would need to figure out how the timing and mechanism of this would work. I can think of at least 3 different ways:

  • Every 6 months, validators decide which consumer chains they will run physical nodes for over the next 6 months. All consumer chains share the same cycle.
  • For a given consumer chain, every 6 months all validators decide whether they want to run physical nodes for the next 6 months. Consumer chains do not share the same cycle.
  • Validators can start running on a consumer chain at any time, but once they start, they must continue for 6 months (see the sketch after this list).
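
Here is a minimal sketch of the third variant (hypothetical names and an assumed period; the real parameter would presumably be set by governance):

```go
package main

import (
	"fmt"
	"time"
)

// A sketch of the third variant, with hypothetical names: a validator may
// start running a consumer chain at any time, but the commitment is
// recorded and cannot be dropped until the commitment period has elapsed.
type Commitment struct {
	Validator       string
	ConsumerChainID string
	StartedAt       time.Time
}

// commitmentPeriod approximates the 6 months floated above.
const commitmentPeriod = 6 * 30 * 24 * time.Hour

// CanStop reports whether the validator is free to stop running the chain.
func CanStop(c Commitment, now time.Time) bool {
	return now.Sub(c.StartedAt) >= commitmentPeriod
}

func main() {
	c := Commitment{
		Validator:       "valA",
		ConsumerChainID: "consumer-1",
		StartedAt:       time.Now().Add(-4 * 30 * 24 * time.Hour), // ~4 months ago
	}
	fmt.Println(CanStop(c, time.Now())) // false: still ~2 months to go
}
```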

So, I just wanted to see how you define “physical node”. At Notional we define that as an on-site machine. Commitments from validators to do that would frankly improve Hub security a great deal.

Does this mean that we’re going to try to prohibit VaaS on the Cosmos Hub? Personally I think that is a good idea, though there are of course challenges with that, like the fact that it isn’t currently possible to prove the existence of self-operated hardware or software.

Sorry, maybe something isn’t clear in the explanation here.

In Partial Set Security, if a validator does not want to run a node for a given consumer chain, they can delegate their stake to another validator who runs the node. We refer to this as a “physical node”.

I bet you explained it totally fine and I missed that.

Thanks for clearing that up!

Also, I think this is elegant.

Note: I scrolled up and actually it was pretty confusing. I suggest that “physical” be taken out of the above description, since we can just call it a node and it won’t be ambiguous anymore.

Yea, I agree. We had at one point been calling them “physical nodes” and “virtual nodes”, but that was confusing and unnecessary, and this is just a vestige of that. Another problem with this post is that it is about a linked paper, so if you just read the post it might not be clear.

I will drop a comment in tomorrow with a glossary and brief description and maybe reformat the paper and this post to make it clearer.


Thanks for sharing this idea, but it seems there are many similarities with the soft opt-out idea?
The top validators are OK running consumer chains and have the resources for that; soft opt-out gives the bottom 5% by voting power the option to opt out, and this idea gives that option to the top validators as well.

In soft opt-out, the validators at the bottom 5% who choose this option still receive the rewards from consumer chains (although these are still negligible anyway) and remain part of the validator set; just liveness would be affected slightly, which is why the bottom 5% of voting power was chosen and not a larger value like 10%. This new idea removes the rewards from the validators who ‘soft opt out’.

Then it is mentioned that ‘The delegating validator’s reward is simply not being penalized for downtime.’ This is also similar to soft opt-out for the bottom 5%: there is no jailing for downtime for not running consumer chains.

The issues that we are trying to prevent are liveness failures, with one third of voting power down, and safety failures, with two thirds of voting power malicious.
For these two issues, I think we should pay more attention to the uptime and governance participation of validators.
Uptime: great uptime is an indication that upgrades are done fast, efficiently and on time, as well as of high-quality infrastructure. To minimize the risk of liveness failures, the long-term uptime of validators should also be taken into account, i.e., which is more secure: a consumer chain run with 50% of voting power from the top validators by uptime, or with 50% of voting power from the worst validators by uptime?

Governance: which validator is more likely to be malicious, a new or recent validator who doesn’t vote or participate here in the forum, or a genesis validator since 2019 who voted on most proposals and is very active here in the forum? Which consumer chain would be more secure: one run mostly by validators with high governance participation who have been active for years in the forum, or one run mostly by new and unknown validators?

There are 180 validators in the Cosmos Hub active set, and after several years there is a lot of data about long-term performance/uptime as well as governance participation and more. This data should be used and analysed to determine different risk levels for each validator, for liveness failures or for being part of a two-thirds malicious attack. These risk levels might also affect reward distributions, for example, or other variables. I don’t think assuming all 180 validators are the same and have the same risk profile is the best approach.
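
Something like the following (a sketch with made-up weights, just to illustrate the idea) could turn that data into a per-validator risk level:

```go
package main

import "fmt"

// A sketch with made-up weights: combine long-term uptime and non-spam
// governance participation into a single per-validator risk score (lower
// is safer). The equal weighting is an arbitrary starting point, not a
// calibrated model.
type ValidatorRecord struct {
	Moniker          string
	Uptime           float64 // long-term signed-block ratio, 0..1
	GovParticipation float64 // share of non-spam proposals voted on, 0..1
}

func RiskScore(v ValidatorRecord) float64 {
	return 0.5*(1-v.Uptime) + 0.5*(1-v.GovParticipation)
}

func main() {
	genesis := ValidatorRecord{"genesis-val", 0.99, 0.90}
	newcomer := ValidatorRecord{"new-val", 0.95, 0.10}
	fmt.Printf("%s: %.3f\n", genesis.Moniker, RiskScore(genesis))   // genesis-val: 0.055
	fmt.Printf("%s: %.3f\n", newcomer.Moniker, RiskScore(newcomer)) // new-val: 0.475
}
```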


I think we might sunset soft opt-out if we added Partial Set Security, since it provides another way for small validators to opt out.

The difference with this is that validators are required to either run a node, or delegate to another validator who runs a node, instead of just letting the smallest validators have no involvement with ICS at all.

I don’t think that these are factors that we would want to take into account in-protocol. They are easy to game. However, they are definitely factors that validators who aren’t running a physical node should take into account when choosing which other validator to delegate to. One of the interesting things about this is that I think that validators are actually better equipped to judge other validators than delegators are. This could have a minor decentralizing effect if validators prefer to delegate to small validators to run physical consumer chain nodes instead of the top 7.

Let’s analyse: how could the governance participation rate in non-spam proposals be gamed? Some validators have over 90% overall governance participation; how could some new malicious validators game this? Firstly, they would need to submit many spam proposals, and if the validators who previously had 90% governance participation vote in all the spam proposals, they will still have a much higher governance participation rate than the malicious validators.
How could uptime be gamed by some new malicious validators? If only long-term average uptime over many months or years is taken into account, the malicious validators’ uptime wouldn’t count until after a reasonable amount of time. Also, the malicious validators wouldn’t try to game it by having bad uptime, but instead by having great uptime and then, once selected, attacking with downtime. But having really great uptime is not so easy even if they try; there are only a handful of validators with fewer than 100 blocks missed in the last 3 months.
But EffortCapital’s other idea, the vote power tax, is even better than this one: it provides revenue for small validators to run many consumer chains, removing the need for this partial set idea or the soft opt-out, and it would also greatly improve decentralization over time, increasing the value of Replicated Security.

