Vulnerability Coordination Retrospective: Cosmos Mainnet Security Advisory Magenta, 09-30-2019

On Friday, September 20th, a high-severity security vulnerability affecting all versions of Tendermint was reported through the Tendermint Bug Bounty program. The issue was reported at 7:27 UTC, and triage began immediately among core development teams operating in North America and Europe.

The Bug

The reported vulnerability demonstrated a flaw in Tendermint logic that, when exploited, could cause a remote panic. A malicious attacker who sent a nil public key during a handshake could crash the application. Because Tendermint did not validate remote public keys in the handshake process, this vulnerability would have allowed an attacker to carry out a Denial of Service attack against public sentry nodes on Tendermint-powered networks.
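To illustrate this class of bug, here is a minimal Go sketch. It is not the Tendermint code: the `PubKey` interface, the `authSigMessage` struct, and every function name below are hypothetical stand-ins. It only shows that calling a method on a nil interface value panics, and that validating the value first turns the crash into an ordinary error.

```go
package main

import (
	"errors"
	"fmt"
)

// PubKey is a hypothetical stand-in for a public key interface.
type PubKey interface {
	VerifyBytes(msg, sig []byte) bool
}

// authSigMessage is a hypothetical handshake message received from a peer.
type authSigMessage struct {
	Key PubKey
	Sig []byte
}

// verifyUnchecked uses the remote key without validating it.
// If Key is a nil interface value, the method call panics.
func verifyUnchecked(m authSigMessage, challenge []byte) bool {
	return m.Key.VerifyBytes(challenge, m.Sig)
}

// verifyChecked rejects a missing key before using it.
func verifyChecked(m authSigMessage, challenge []byte) (bool, error) {
	if m.Key == nil {
		return false, errors.New("peer sent a nil public key")
	}
	return m.Key.VerifyBytes(challenge, m.Sig), nil
}

// panics reports whether calling f results in a panic.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	msg := authSigMessage{} // Key is nil, as if the peer had omitted it
	_, err := verifyChecked(msg, []byte("challenge"))
	fmt.Println("checked error:", err)
	fmt.Println("unchecked panics:", panics(func() {
		verifyUnchecked(msg, []byte("challenge"))
	}))
}
```

In a long-running node, an unrecovered panic like the one in `verifyUnchecked` ends the whole process, which is what makes a single malformed handshake usable for Denial of Service.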

Per best practices for running a secure node, most validators who operate the Cosmos Hub actively mitigate the risk of sentry node attacks by not publicly exposing all of their sentry nodes on the network. For an attack against the network to succeed, the attacker would have to conduct significant reconnaissance on the network and its operators to collect a list of nodes that accept connections from the open internet, and then use a modified p2p implementation to halt the gaiad process on a number of those nodes. Based on our current understanding of the topology of the Cosmos Network and the liveness properties of Tendermint (which require two thirds of the network to remain connected), it is extremely unlikely that this attack on public sentry nodes could have been leveraged against operators with enough stake to disrupt the network.
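As a rough worked example of the liveness argument, the sketch below assumes the commonly stated Tendermint rule that blocks can be committed only while strictly more than two thirds of total voting power is online. The function name and the voting power figures are hypothetical and are not measurements of the Cosmos Hub.

```go
package main

import "fmt"

// canCommit reports whether the online voting power is strictly more than
// two thirds of the total. Integer arithmetic avoids rounding errors.
func canCommit(onlinePower, totalPower int64) bool {
	return 3*onlinePower > 2*totalPower
}

func main() {
	const total = 100 // hypothetical total voting power

	// Hypothetical amounts of voting power knocked offline by an attacker.
	for _, offline := range []int64{20, 33, 34} {
		fmt.Printf("offline=%d canCommit=%v\n", offline, canCommit(total-offline, total))
	}
}
```

Under that assumption, an attacker would need to take at least one third of the voting power offline at the same time, which is why crashing a handful of public sentry nodes is unlikely to halt the chain.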

Investigation and Response

Within hours of the bug being initially reported, the triage team verified the issue and created tooling to monitor for exploitation of this vulnerability in the wild. This was done in parallel with the development of a patch by the core development teams, who were prepared to release the patch quickly if it became necessary. Though the team was in a position to respond rapidly if needed, no incidents of exploitation surfaced, and the triage team opted for a more deliberate process to distribute patches downstream.

Remediation

Over the past few months, the core development team has been aware of issues (like a long-standing handshake malleability vulnerability) with the secret connection in Tendermint, and as a result they have been working to refactor and significantly improve this code. After validating the reported issue, members of the core development team began discussing technical solutions, including a simple validation of the data sent by the peer and a more complex approach of wrapping all p2p interactions in goroutines to prevent panics from terminating the Tendermint process. After a brief period of discussion, the team chose the simple data validation approach as the fastest way to resolve the issue.

After writing, reviewing, and testing the code comprising the patch, the Tendermint team made a patch available in versions 0.31.9 and 0.32.5, and the CosmosSDK team made a patch available in version 0.34.8.

Vulnerability Coordination

Initially, the triage team had stated a preference for merging the bug fix in time to roll the patch into the Cosmos Hub 3 upgrade scheduled for Tuesday, September 24, but it was decided that it was best to avoid interfering with the network upgrade and to plan for a separate security patch. This allowed the update to include the Go security patch that was released on September 26.

In parallel with the technical remediation work for the issue, the triage team kicked off our internal process for vulnerability notification. In the case of high and critical severity bugs, we are committed to providing pre-notification of security vulnerabilities 24 hours before publishing a public advisory to organizations, projects, and entities dependent on code we have developed.

As part of our round of pre-notification, we proactively reached out to several exchanges with functional security@ inboxes, and we posted a message to the Cosmos forum letting the community know that a patch for a high-severity vulnerability would become available at 6am UTC on Wednesday, October 2. This time was specifically chosen to accommodate our community outside of North America.

Within a business day of pre-notifying impacted parties that we would be releasing a security update for Tendermint and the CosmosSDK, the Tendermint and CosmosSDK teams cut releases that fully remediate the issue once service providers patch their software. The reporter was awarded a bounty for the bug, which was reported through HackerOne.

Retrospective

For security to be successful, it must be a responsibility shared by all stakeholders across the entire Cosmos ecosystem. More than anything else, we are extremely thankful for the ongoing contributions to security being made by our community, and we are thrilled that the work we have done to encourage reporting of vulnerabilities through security@tendermint.com and our HackerOne bug bounty program has been successful to date.

After discussing the coordination and timeline of this issue, the core triage team has identified several opportunities for action among stakeholders that would improve coordination in the future.

* As part of an internal retrospective, the core developers of the CosmosSDK and Tendermint will be improving the documentation of the software release process. By better documenting the process and including specific instructions that would expedite identifying and catching similar bugs, we hope to reduce errors and find flaws in code more quickly in the future. Additionally, we will begin testing upgrades with forks of mainnet code as recommended by bharvest.

* Given the heightened importance of security in blockchain technology, all blockchain companies need to keep a dedicated, monitored line of communication open for vulnerability coordination. Without this, it is nearly impossible to ensure that impacted stakeholders receive timely communications and can take quick action to remediate significant security issues; it is neither reasonable nor scalable to ask security teams to open support tickets through customer service and then hope that advisories make it to the security or engineering teams in time for quick action. As an industry, we desperately need to set a strong standard for coordination and disclosure of security bugs: we need more exchanges, development teams, and organizations who depend on our code and who may be impacted by security incidents to set up and monitor a security@ email.

* As part of our retrospective for Cosmos Mainnet Advisory Blue, our first emergency hard fork of mainnet, we began exploring options that would enable us to quickly, discreetly coordinate with Cosmos Hub validators in the case of a security emergency. Several validators reached out to inquire about the status of the emergency mailing list, which was originally intended to be used only in cases where an emergency hard fork was required. Because there was significant interest in receiving more communication about security issues, we will be taking significant steps over the next few weeks to improve security advisory distribution to our community, including:
	* Adding a security parameter to the CosmosSDK that allows validators to provide a security email address as a way to opt-in to emergency coordination measures.
	* Distributing advisories via an RSS feed, in addition to posting advisories to the Cosmos Network forum.
	* Continuing to develop a security mailing list that will be used to distribute advisories to all stakeholders whether emergency forking is required or not.

Special thanks to fudongbai for reporting this issue, and to both Zaki Manian and Jack Zampolin for their contributions to this post.


Update for Cosmos Mainnet Security Advisory Magenta

On Wednesday, October 9, a high-severity security vulnerability in Tendermint affecting the security patch released in Tendermint versions 0.31.9 and 0.32.5 and Cosmos SDK version 0.34.8 was reported through the Tendermint Bug Bounty program. This issue was reported at 3:31 UTC, and triage began immediately among core development teams operating in North America, Europe, and Japan.

The Bug

In the patch for Cosmos Mainnet Security Advisory Magenta, the core development team initially chose the simpler of two options outlined in the initial advisory to remediate the code flaw that would have enabled an attacker to launch a Denial of Service attack against public sentry nodes on the Cosmos Network.

In secret_connection.go at line 126 in v0.31.9 of Tendermint, a nil check was used where a type assertion was required per best practices in Go, and the handler was not wrapped in a panic recovery routine to keep a panic from crashing the node.
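The difference between a nil check and a type assertion can be sketched as follows. This is an illustrative Go example, not the code from secret_connection.go, and it does not claim to reproduce the exact input used in the report: `PubKey`, `expectedKey`, `otherKey`, `checkNil`, and `checkType` are all hypothetical names. It shows one well-known way a value can pass a nil check and still be unsafe to use, namely an interface that holds a typed nil pointer, while a type assertion accepts only the one concrete type the code expects.

```go
package main

import (
	"errors"
	"fmt"
)

// PubKey is a hypothetical stand-in for a public key interface.
type PubKey interface {
	Bytes() []byte
}

// expectedKey is the one concrete key type the handshake expects (hypothetical).
type expectedKey [32]byte

func (k expectedKey) Bytes() []byte { return k[:] }

// otherKey is a different type that also satisfies PubKey (hypothetical).
type otherKey struct{ raw []byte }

// Bytes dereferences k, so it panics when k is a nil pointer.
func (k *otherKey) Bytes() []byte { return k.raw }

// checkNil only rejects a nil interface value.
func checkNil(k PubKey) error {
	if k == nil {
		return errors.New("nil public key")
	}
	return nil
}

// checkType accepts only the expected concrete type. The comma-ok form
// never panics, even when k is a nil interface value.
func checkType(k PubKey) error {
	if _, ok := k.(expectedKey); !ok {
		return fmt.Errorf("expected expectedKey, got %T", k)
	}
	return nil
}

func main() {
	var missing *otherKey  // typed nil pointer
	var k PubKey = missing // non-nil interface: it still carries a type

	fmt.Println("nil check error:", checkNil(k))
	fmt.Println("type assertion error:", checkType(k))
	fmt.Println("expected key error:", checkType(expectedKey{}))
}
```

Here `checkNil` lets `k` through even though calling `k.Bytes()` would panic, while `checkType` rejects both the typed nil pointer and a plain nil interface.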

Investigation and Response

Within a couple of hours of cutting the initial security release, core developers discussed the first patch and identified the risk that it might not sufficiently resolve the issue. Within an hour and a half of receiving and verifying the vulnerability report, the triage team resumed its earlier monitoring of the network for exploitation. At the time of this post, we are unaware of any incidents of exploitation.

Remediation

After validating that the initial patch for Magenta was flawed, members of the core development team discussed technical solutions to better resolve the issue. The team chose to rewrite the patch following best practices in Go, including the use of type assertions, and opted for an additional mitigation: wrapping all p2p handler interactions in goroutines to prevent panics from terminating the Tendermint process.
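A minimal sketch of that kind of mitigation is below. It assumes the wrapper pairs each goroutine with a deferred `recover`, since in Go a panic can only be recovered inside the goroutine where it occurs, and an unrecovered panic in any goroutine ends the whole process. The `runHandler` name and its signature are hypothetical and are not taken from the Tendermint codebase.

```go
package main

import "fmt"

// runHandler runs handler in its own goroutine and waits for it to finish.
// A panic inside handler is recovered and returned as an error instead of
// terminating the process.
func runHandler(handler func()) error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			// recover must be called in the goroutine that panicked.
			if r := recover(); r != nil {
				done <- fmt.Errorf("handler panicked: %v", r)
				return
			}
			done <- nil
		}()
		handler()
	}()
	return <-done
}

func main() {
	// A handler that panics by writing to a nil map.
	err := runHandler(func() {
		var peers map[string]int
		peers["attacker"] = 1
	})
	fmt.Println("bad handler:", err)

	// A well-behaved handler returns no error.
	fmt.Println("good handler:", runHandler(func() {}))
}
```

With a wrapper like this, a misbehaving peer can at most cost the node one connection, and the caller can log the error and drop that peer.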

After writing, reviewing, and testing the code comprising the patch, the Tendermint team made a patch available in versions 0.31.10 and 0.32.6, and the CosmosSDK team made a patch available in version 0.34.9.

Vulnerability Coordination

To quickly remediate this vulnerability, the triage team chose to fast-track the release of this patch as a hot fix in lieu of waiting until Monday, October 14th to cut a new software release. In this case, we have chosen not to observe our standard 24 hour pre-notification window for a security fix, and we began communications with impacted parties after the patch was made available to the public.

The issue will be fully remediated once service providers patch their software, and we recommend updating to the latest, most secure versions of the CosmosSDK and Tendermint.

Retrospective

After discussing the coordination and timeline of this issue, the core triage team has identified several opportunities for action among stakeholders that would prevent the recurrence of a similar issue in the future.

As part of an internal retrospective, the core developers of the CosmosSDK and Tendermint will be improving their software review and testing processes to prevent errors in code like this one from making it into production software.

  • The core developers who were involved in writing the initial patch will review best practices in Go, with a focus on type assertions.
  • The testing suite for CosmosSDK and Tendermint will be tuned to improve detection of code flaws, and to prevent them from being introduced into production environments.

Special thanks to fudongbai for reporting this issue.
