CHIPs discussion phase: Partial Set Security (updated)

Previous draft discussion post:

Partial Set Security is a reimagining of Opt-in Security (thanks to @effortcapital for the idea), which would allow only a subset of Hub validators to run a physical node for each consumer chain, while still allowing each consumer chain to be secured by the full stake of the Hub. This would work by allowing validators who were not running a physical node to delegate to a validator running a physical node. The implementation of this feature should be straightforward (while still being a lot of work), but before it is started, several questions need to be answered.

We’re posting this to get all of the questions we have identified into one place where they can be discussed, as well as soliciting the community to ask any questions that we may have missed.

High level questions

Is token toxicity a good analytical framework?

The design of Partial Set Security revolves around the idea that if the entire stake of a provider chain does not secure a consumer chain, it may not be secure against attacks that validators cannot be slashed for. Such attacks include incorrect execution (in the absence of fraud or validity proofs) and liveness attacks. They are stopped in the single chain case by token toxicity: the idea that validators do not carry out these attacks because the chain’s staking token would crash in price if they did. Token toxicity should also hold for shared security techniques where the entire stake of the provider secures the consumer, such as Replicated Security. I’ve written about token toxicity more here.

Partial Set Security is intended to preserve token toxicity by allowing the entire stake of the provider to secure each consumer, even if every single validator does not run a physical node. But token toxicity is not completely proven. It is a good explanation for the continuing operation and security of most blockchains, but it is very hard to quantify. For example: does token toxicity hold if 99% of the provider’s stake is staked on a consumer? It very likely does. Does token toxicity hold if 0.1% of the provider’s stake is staked on the consumer? It very likely does not. But where is the line?

We’d love to get some more thoughts about token toxicity, and whether it is a good framework for shared security questions.

Are there issues with the idea of validator delegation?

Are there any issues with validator delegation that are not present with normal delegation? It’s not even really clear that normal delegation is a good idea, but it seems to be working pretty well so far. Could validator delegation have unforeseen interactions?

Is running physical nodes the main cost?

Partial Set Security addresses the main criticism of Replicated Security: the high cost of making every validator on the Hub run every consumer chain. But how much of the cost of validation is node operation and how much is the risk of slashing? This has not been quantified. Will validators balk at being forced to delegate to another validator? Can proportional slashing (a protocol where accidental double signs which only affect a single validator incur far lower slashes) mitigate these concerns?

Implementation questions

How to handle slashing?

If a validator delegates instead of running a physical node, its delegators must be slashed in the case that the validator it has delegated to commits an offense and is slashed. But should the delegating validator itself be tombstoned? How is downtime handled?

How to handle commission?

It is obvious that a validator which delegates its power should get a lower commission than a validator which runs a physical node. But there are several possible approaches.

No commission for delegating validators

The simplest approach is to make it so that validators which delegate to other validators receive no commission for that consumer chain, with the validator running the physical node receiving it all. The delegating validator’s reward is simply not being penalized for downtime.

Commission split

However, the delegating validator does incur some risks. If they are tombstoned for offenses, this is an obvious risk. Even if they are not tombstoned but their delegators are slashed, it is a reputational risk. For this reason, it might be appropriate to allow for a split of the commission between the physical node running and the delegating validators. But who sets the overall commission, and the split? And is this really necessary? After all, a validator who is concerned about these risks can just run a physical node.

Variable commission

Maybe it is necessary to allow validators to set a different commission per consumer chain. After all, some chains may be more expensive to run than others.

How to enforce the minimum physical node requirement?

This is probably the meatiest question about this design: how to ensure that a minimum amount of validation power (probably ⅓ is good) runs physical nodes? In the Partial Set Security paper, we lay out 4 alternatives. The first two, stopping consumer chains that go below this threshold immediately, and forcing the last validators running a consumer chain to continue running it to avoid going below the threshold can probably be dismissed out of hand. This leaves two viable options:

Top ⅓ are required to run all consumer chains

In this option, the top ⅓ of validators would be required to run all consumer chains. This would ensure that all consumer chains had the minimum amount of physical nodes needed for safety. Validators outside of the top ⅓ could still run physical nodes, stopping and starting them as consumer chains became more and less profitable. This is great because it is extremely simple, but it also has downsides. It could act as a centralization vector, as validators might prefer to delegate to the top ⅓ to avoid having to change their delegations because they would know that the top ⅓ would always run physical nodes. Conversely, it could also act as a Sybil incentive, since large validators might break up their stake to stay out of the top ⅓ and benefit from the optionality of being able to stop and start consumer chains. It might also lead to strange performance characteristics, as the number of physical nodes in a consumer chain’s set might fluctuate with the chain’s token price and profitability.

Validators must make a periodic commitment to run a physical node

In this variant, validators would need to commit to running a physical node for a consumer chain for a certain length of time, maybe 6 months. This would be more complicated, but might have better incentives for validators. We would need to figure out how the timing and mechanism of this would work. I can think of at least 3 different ways:

  • Every 6 months, validators decide which consumer chains they will run physical nodes of for the next 6 months. All consumer chains share the same cycle.
  • For a given consumer chain, every 6 months all validators decide whether they want to run physical nodes for the next 6 months. Consumer chains do not share the same cycle.
  • Validators can start running on a consumer chain any time, but once they start, they must continue for 6 months.

After lots of discussion with validators, consumer chains, and community members (most of it occurring outside of this thread), we’ve made some big updates and simplifications to how partial set security will work. The biggest change is that we have gotten rid of validator delegation. Validators were just not willing to do it. New discussion draft is below:

Partial Set Security

The next major update of Interchain Security will introduce Partial Set Security. Partial Set Security increases the flexibility available to consumer chains, allowing them to dial in the right mix of economics, validator set agility, and security. Additionally, it will let validators choose whether or not they want to validate on any given consumer chain in most cases.

These changes will allow consumer chains to launch much more quickly and reduce the workload on Hub validators while allowing consumer chains to choose how much security they need.

How Partial Set Security improves on Replicated Security

Currently, consumer chains are created with a governance proposal, and then get the entire security of the Hub’s validator set. This is a simple system, with guaranteed high security, but it lacks flexibility. Running more chains puts more work on validators, without necessarily increasing their rewards by much. This is not a problem for large validators who earn millions in commission, but it is a strain on smaller validators.

It also puts a lot of pressure on consumer chains to generate enough in rewards to pay the validators for their work. It does not allow consumer chains to select the level of security that they need, or to scale security as they grow their TVL.

Finally, the need to get a governance proposal to pass to create a consumer chain limits the rate of growth of ICS, due to the friction involved.

Feature A: Opt-in consumer chains - permissionless or permission-lite

Partial Set Security could enable consumer chains to be launched permissionlessly, without a governance proposal. A consumer chain can be added with a simple transaction, as long as the chain ID is not currently in use (details on anti-squatting measures to follow).

Once a consumer chain has been created in this way, validators can opt in to validate it if they want to. We expect many consumer chains will be launched this way, and will be able to grow their validator sets organically.

Validators (and their delegators) are only entitled to rewards from a consumer chain if they are opted in. Some validators may only opt in to consumer chains with attractive rewards, and some may still opt into every consumer chain so that their delegators never have to worry about missing out.
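As a rough illustration of the rule that only opted-in validators earn rewards from a consumer chain, here is a minimal sketch. The function name `split_rewards` and the pro-rata split by voting power are assumptions made for the example; the post does not specify how rewards are divided among opted-in validators.

```python
from fractions import Fraction


def split_rewards(reward, powers, opted_in):
    """Split a consumer chain's reward among opted-in validators only.

    Assumption: rewards are shared pro rata by Hub voting power.
    Validators that have not opted in receive nothing.
    """
    eligible = {v: p for v, p in powers.items() if v in opted_in}
    total = sum(eligible.values())
    if total == 0:
        return {}
    # Fractions keep the split exact instead of rounding.
    return {v: Fraction(reward * p, total) for v, p in eligible.items()}


powers = {"val-a": 60, "val-b": 30, "val-c": 10}
shares = split_rewards(90, powers, opted_in={"val-a", "val-b"})
print(shares["val-a"], shares["val-b"])  # 60 30
print("val-c" in shares)                 # False
```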

Permission-lite

It may be advantageous to use the existing governance interface to launch opt-in consumer chains. The idea here is that only validators who voted YES would be opted in to run a consumer chain. Validators could vote ABSTAIN to signal that they don’t want to run a consumer chain themselves, but don’t want to stop other validators from running it. Governance votes for opt-in consumer chains would use a much lower quorum threshold, so it would not be hard for such a vote to pass. The idea here would not be to regulate which chains could join (although the community might show up to vote NO on outright scams), but just to use the existing governance interface on the Cosmos Hub that validators are familiar with.

More writing on this option here.
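The permission-lite idea can be sketched in a few lines: the validators who vote YES form the consumer chain's starting validator set, and every other vote leaves the validator out. The function name and the vote strings are hypothetical, and the sketch deliberately leaves out the quorum check, since no threshold value has been proposed here.

```python
def starting_validator_set(votes):
    """Return the validators opted in by a permission-lite proposal.

    `votes` maps a validator name to its vote option. Only YES voters are
    opted in; ABSTAIN and NO voters (and non-voters) are left out.
    """
    return sorted(v for v, option in votes.items() if option == "YES")


votes = {"val-1": "YES", "val-2": "YES", "val-3": "ABSTAIN", "val-4": "NO"}
print(starting_validator_set(votes))  # ['val-1', 'val-2']
```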

Feature B: Top-n

While most consumer chains will likely launch with opt-in, some high profile consumers may want a guaranteed level of security. This is what top-n provides.

The top-n for a consumer chain specifies what percentage of the Hub’s security that consumer chain would like to guarantee. When the consumer chain starts, the top n percent of the Hub’s validator set will be obligated to run the consumer chain.

Even though validators outside of that top-n percent are not obligated to run the consumer chain, they can still choose to run it if they want. Many probably still will so that they don’t miss out on rewards, and as a selling point to consumers.

Let’s look at a few scenarios:

  • A top-n of 100% is equivalent to Replicated Security.
  • With a top-n of 65%, a consumer chain gets more than half of the Hub’s economic security, but with only 23 validators.
  • A top-n below 33% is not possible. This is important for incentive reasons, so that the top validators can never be forced to run a consumer chain they don’t want (with 33% they could veto the proposal).
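To make the top-n scenarios concrete, here is a minimal sketch of how the obligated set could be derived. It assumes that "top n percent" means the smallest group of validators, taken in order of descending voting power, whose combined power reaches n percent of the total. That reading, the function name, and the toy power values are assumptions for illustration, not the production algorithm.

```python
from fractions import Fraction


def top_n_validators(powers, top_n_percent):
    """Return the validators obligated to run a top-n consumer chain.

    Assumption: validators are taken in order of descending power until
    their combined power reaches top_n_percent of the total.
    """
    if not 33 <= top_n_percent <= 100:
        raise ValueError("top-n must be between 33 and 100")
    total = sum(powers.values())
    threshold = Fraction(total * top_n_percent, 100)
    # Sort by power (largest first); break ties by name for determinism.
    ranked = sorted(powers.items(), key=lambda kv: (-kv[1], kv[0]))
    selected, running = [], 0
    for name, power in ranked:
        if running >= threshold:
            break
        selected.append(name)
        running += power
    return selected


powers = {"a": 40, "b": 30, "c": 20, "d": 10}
print(top_n_validators(powers, 65))   # ['a', 'b']
print(top_n_validators(powers, 100))  # ['a', 'b', 'c', 'd']
```

With a top-n of 100 every validator is included, which matches the Replicated Security case in the list above.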

Downtime

On an opt-in consumer chain, when a validator is jailed, they are simply opted out of that consumer chain automatically, and cannot opt back in for a time period. There is no jailing on the Hub.

If a consumer is using top-n, then the validators in their top-n set can still be jailed on the Hub for downtime. This mechanism provides top-n’s guaranteed level of security.
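The two downtime rules can be sketched as a single branch. Everything named here (`ValidatorState`, `handle_downtime`, `cooldown_blocks`) is hypothetical, and the length of the opt-in cooldown is left as a caller-supplied parameter because the post does not fix it. The sketch also assumes that a validator outside the top-n set who has voluntarily joined a top-n chain is treated like an opt-in validator.

```python
from dataclasses import dataclass, field


@dataclass
class ValidatorState:
    """Hypothetical per-validator bookkeeping on the Hub."""
    opted_in: set = field(default_factory=set)                 # consumer chain IDs
    opt_in_blocked_until: dict = field(default_factory=dict)   # chain ID -> height
    jailed_on_hub: bool = False


def handle_downtime(state, chain_id, in_top_n_set, height, cooldown_blocks):
    """Apply the downtime rule for one validator on one consumer chain."""
    if in_top_n_set:
        # Validators in the top-n set are obligated to validate,
        # so downtime jails them on the Hub.
        state.jailed_on_hub = True
        return "jailed_on_hub"
    # Opt-in validators are only removed from the consumer chain and
    # cannot opt back in until the cooldown has passed.
    state.opted_in.discard(chain_id)
    state.opt_in_blocked_until[chain_id] = height + cooldown_blocks
    return "opted_out"


def can_opt_in(state, chain_id, height):
    """True once any cooldown for this chain has expired."""
    return height >= state.opt_in_blocked_until.get(chain_id, 0)


state = ValidatorState(opted_in={"chain-a", "chain-b"})
print(handle_downtime(state, "chain-a", False, height=100, cooldown_blocks=50))  # opted_out
print(can_opt_in(state, "chain-a", height=120))  # False
```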

Validator power cap

Each consumer chain can also set a cap on the power that any individual validator can have on their chain. This has several purposes.

  • It can prevent a large validator entering a small consumer chain from immediately controlling the chain.
  • It can prevent downtime from a few large validators on a consumer halting the chain.

It’s best not to use this too heavily to avoid distorting the PoS system, but a reasonable cap can have a beneficial effect on the chain’s Nakamoto coefficient.
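A small sketch can show the effect of a cap on the Nakamoto coefficient. It makes two assumptions that are not in the post: the cap is expressed as an absolute power value and simply truncates larger validators (a real design might redistribute the excess), and the Nakamoto coefficient is taken to be the smallest number of validators that together hold more than ⅓ of the power, which is enough to halt the chain.

```python
def apply_power_cap(powers, max_power):
    """Truncate each validator's power on the consumer chain to max_power."""
    return {name: min(power, max_power) for name, power in powers.items()}


def nakamoto_coefficient(powers):
    """Smallest number of validators holding more than 1/3 of total power."""
    total = sum(powers.values())
    running = 0
    for count, power in enumerate(sorted(powers.values(), reverse=True), start=1):
        running += power
        if 3 * running > total:
            return count
    return 0


powers = {"a": 50, "b": 20, "c": 15, "d": 10, "e": 5}
print(nakamoto_coefficient(powers))                       # 1
print(nakamoto_coefficient(apply_power_cap(powers, 20)))  # 2
```

In this toy set, a single validator can halt the uncapped chain, while the capped chain needs two validators to go down.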

We’ll release some analysis on different top-n and cap scenarios soon.

Fraud votes

Shared security systems where a partial stake of the provider chain secures consumer chains can be vulnerable to a security challenge known as the subset problem. To put it simply, this is a situation where a malicious subset of the provider chain’s validator set controls a consumer chain and attacks it in a way that they cannot be slashed for. This is not an issue for systems where the whole stake of the provider secures consumer chains, such as Replicated Security. The subset problem is a potential issue for Partial Set Security.

Fraud votes are a way to mitigate the subset problem. A fraud vote is a way for Cosmos Hub governance to slash validators that misbehave on consumer chains. It is simply a governance proposal that will slash one or more validators if it passes. It is to be used strictly for slashing validators who commit incorrect execution on a consumer chain, for example taking all of the money out of someone’s wallet.

Fraud proofs and zk validity proofs solve this problem without a vote, so once these technologies are working in Cosmos, we will be able to remove fraud votes.

This is a powerful tool, but it will have some limits that prevent a proposal from even being created if certain conditions are not met:

  • Validators cannot be slashed for an offense on a consumer chain they are not validating.
  • Fraud votes do not apply to consumer chains with more than 1/3 of the Hub’s stake.
  • Fraud votes do not apply to consumer chains using top-n, since these cannot have less than 1/3 of the Hub’s stake.
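The three limits above can be read as a precondition check that runs before a fraud vote proposal is accepted. The following is a minimal sketch of that check; the function name, argument names, and the decision to reject an empty list of accused validators are assumptions made for the example.

```python
def can_create_fraud_vote(accused, chain_validators, hub_total_power, uses_top_n):
    """Check whether a fraud vote proposal may be created.

    accused:          validators the proposal wants to slash
    chain_validators: validator -> power, for the consumer chain in question
    hub_total_power:  total voting power on the Hub
    uses_top_n:       whether the consumer chain was launched with top-n
    """
    if uses_top_n:
        # Top-n chains always have at least 1/3 of the Hub's stake.
        return False
    if not accused or any(v not in chain_validators for v in accused):
        # Validators can only be slashed for chains they are validating.
        return False
    chain_power = sum(chain_validators.values())
    if 3 * chain_power > hub_total_power:
        # Chains with more than 1/3 of the Hub's stake are exempt.
        return False
    return True


chain = {"val-a": 40, "val-b": 30}
print(can_create_fraud_vote(["val-a"], chain, 300, uses_top_n=False))  # True
print(can_create_fraud_vote(["val-c"], chain, 300, uses_top_n=False))  # False
```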

So, I just wanted to see how you define “physical node”. At Notional we are defining that as an on-site machine. Commitments from validators to do that would frankly improve Hub security a great deal.

Does this mean that we’re going to try to prohibit VaaS on the Cosmos Hub? Personally I think that is a good idea, though there are of course challenges with that, like the fact that it isn’t currently possible to prove the existence of self-operated hardware or software.

Sorry, maybe something isn’t clear in the explanation here.

In Partial Set Security, if a validator does not want to run a node for a given consumer chain, they can delegate their stake to another validator who runs the node. We refer to this as a “physical node”.


I bet you explained it totally fine and I missed that.

Thanks for clearing that up!

Also, I think this is elegant.

Note: I scrolled up and actually it was pretty confusing. Suggest that physical be taken out of the above description, since we can just call it a node and it won’t be ambiguous anymore.


Yea I agree. We had at one point been calling them “physical nodes” and “virtual nodes” but that was confusing and unnecessary and this is just a vestige of that. Another problem with this post is that it is a post about a linked paper so if you just read the post it might not be clear.

I will drop a comment in tomorrow with a glossary and brief description and maybe reformat the paper and this post to make it clearer.


Thanks for sharing this idea but it seems there are many similarities with the soft opt-out idea?
The top validators are ok running consumer chains and they have the resources for that, soft opt-out gives an option to the bottom 5% by voting power to soft opt-out, this idea gives the option to the top validators as well.

In soft opt-out, the validators in the bottom 5% who choose this option still receive the rewards from consumer chains (although negligible anyway) and are part of the validator set; only liveness would be affected slightly, which is why the bottom 5% of voting power was chosen and not a larger value like 10%. This new idea removes the rewards from the validators who ‘soft opt-out’.

Then it is mentioned ‘The delegating validator’s reward is simply not being penalized for downtime.’ this is also similar to Soft opt-out for the bottom 5%, there is no jailing for downtime for not running consumer chains.

The issues that we are trying to prevent are liveness failures with one third of voting power down, and two thirds of voting power malicious.
For these two issues I think that we should pay more attention to uptime and governance participation of validators.
Uptime: great uptime is an indication that upgrades are done fast, efficiently and on time, as well as of high quality infrastructure. To minimize the risk of liveness failures, the long-term uptime of validators should also be taken into account, i.e. what is more secure: a consumer chain being run with 50% voting power of the top validators by uptime, or 50% voting power of the worst validators by uptime?

Governance: which validator is more likely to be malicious, a new or recent validator who doesn’t vote or participate here in the forum, or a genesis validator since 2019 who voted on most proposals and is very active here in the forum? Which consumer chain would be more secure, one run mostly by high governance participation validators who have been active for years in the forum, or one run mostly by new and unknown validators?

There are 180 validators in the Cosmos Hub active set, after several years there is a lot of data about long term performance/uptime as well as governance participation and more. This data should be used and analysed to determine different risk levels of each validator for liveness failures or for being part of a two thirds malicious attack. These risk levels might also affect the rewards distributions for example or other variables, I don’t think assuming all 180 validators are the same and have the same risk profile is the best approach.


I think we might sunset soft opt out if we added partial set security, since it provides another way for small validators to opt out.

The difference with this is that validators are required to either run a node, or delegate to another validator who runs a node, instead of just letting the smallest validators have no involvement with ICS at all.

I don’t think that these are factors that we would want to take into account in-protocol. They are easy to game. However, they are definitely factors that validators who aren’t running a physical node should take into account when choosing which other validator to delegate to. One of the interesting things about this is that I think that validators are actually better equipped to judge other validators than delegators are. This could have a minor decentralizing effect if validators prefer to delegate to small validators to run physical consumer chain nodes instead of the top 7.

Let’s analyse, how could governance participation rate in non-spam proposals be gamed? Some validators have over 90% overall governance participation, how could some new malicious validators game this? Firstly, they would need to put many spam proposals, and if the validators who previously had 90% governance participation vote in all the spam proposals, they will still have much higher governance participation rate than the malicious validators.
How could uptime be gamed by some new malicious validators? If only long-term average uptime performance over many months or years is taken into account, the malicious validators’ uptime wouldn’t be considered until after a reasonable amount of time. Also, the malicious validators wouldn’t try to game it by having bad uptime, but instead by having great uptime and then when selected attack with downtime. But having really great uptime is not so easy even if they try, there are only a handful of validators with fewer than 100 blocks missed in the last 3 months.
But the other idea of EffortCapital about the vote power tax is even better than this idea, because it provides revenue to the small validators to run many consumer chains, so no need for this partial set idea or the soft opt out, and also it would greatly improve decentralization over time, thus increasing the value of replicated security.



Hey, we’ve changed a lot of how PSS will work, and updated this post. All comments above this one are on the old post.


So, if I get this correctly:

  • A new ICS chain arrives, let’s call her BettyX
  • BettyX puts a prop onchain
  • Validators 1-10 vote yes, validator 11 votes abstain, and validator 12 votes no
  • Does this mean that validators 11 and 12 will be allowed to do a soft opt-out, and validators 1-10 will not?

Any validator can opt out at any time. The validators who voted yes go into the consumer chain’s starting validator set.


Ty! One last clarification: can validators opt out after already running an ICS chain? E.g., validator Bob runs 3 ICS chains. Bob realizes that it’s too much for them and decides to opt out midway from ICS chains 1 and 2.

What do you think about PSS vs. a rollup-centric model with decentralized sequencers?
I think we are making it too complex with all this jargon like top-n, opt-in, permissionless, permission-lite. Sometimes less is more. If a consumer chain has to think about so many paradigms, I think they will get confused, and that will add unnecessary friction to adoption.
I also don’t feel like top-n validators should be forced to run a consumer chain. Many of these top validators don’t vote because of regulatory concerns, etc., so why should they be forced to run a consumer chain?
Also, if a top-n validator does not want to run such consumer chains, they will likely split the validator. Hence this mechanism actively promotes sybilling in some sense?

I also feel like this top-down approach of PSS is not right, where delegators are locked in with their validators on all PSS chains, if I understand correctly.
Mesh Security got it right by putting delegators in control of choosing their validators on other mesh chains. I think a similar mechanism should be applied to PSS, where some number n of validators opt in as decentralized sequencers and delegators can choose to stake some derivative of ATOM to them to provide economic security.
In such a model, no one is forced to do something they don’t want, and there can be all sorts of consumer chains, ranging from single-sequencer rollups (Ethereum style) to the entire Hub validator set acting as sequencers for a consumer chain (current Replicated Security style).
This also lets smaller validators keep their delegators on the base chain without being forced to run unnecessary consumer chains, and hence maintains decentralization.
I am just thinking out loud on this, and it’s not necessarily a solid point.

Wouldn’t this start a FOMO among validators to be part of the new chains and vote yes?


Maybe it gives consumer chains the flexibility to select what level of security they need to start, but does it stop other validators from joining? Say a consumer chain starts with a top-n of 65%. The top 23 validators then need to join the set to start the chain, but is that all the consumer chain gets, or can validators lower in the ranks also join?

Regarding the latest update on the thread, we wanted to express our support for this CHIP implementation. Very elegant and flexible design whilst keeping features as simple as possible. Thanks for this quality work. We see no particular improvement recommendations here.


@Ghazni_Stakecito

  • On the subject of jargon, you’re seeing all of it because we’ve opened up the design process here on the forum and are weighing different options against each other. When this is in production, we will have settled on a much smaller set of options and simplified the messaging to make it understandable.
  • On the subject of top-n, top validators (and all other validators as well) are currently forced to run Replicated Security consumer chains. Top-n gives us the ability to transition these to Partial Set Security without changing their security properties too much.

This is definitely a key difference between PSS and Mesh. I personally think that there is somewhat of a benefit in the simplicity of PSS. Delegators choose a validator and let them handle decisions of risk/reward of validation, as they specialize in it. Consumer chains get validators signing up and get their full stake immediately. Validators get to leverage their knowledge of the space, and finally have something other than marketing and uptime vanity metrics to differentiate themselves.

Of course all this can also be implemented in terms of mesh security, with various frameworks around it, but that stuff is not currently done yet.
