On Friday, September 20th, a high-severity security vulnerability affecting all versions of Tendermint was reported through the Tendermint Bug Bounty program. The issue was reported at 7:27 UTC, and triage began immediately among core development teams operating in North America and Europe.
The Bug
The reported vulnerability demonstrated a flaw in Tendermint logic that, when exploited, could cause a remote panic. An attacker who sent a nil public key during a handshake could crash the application. Because Tendermint did not validate remote public keys during the handshake, an attacker could have used this vulnerability to carry out a Denial of Service attack against public sentry nodes on Tendermint-powered networks.
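To make the failure mode concrete, here is a minimal sketch of the kind of check that was missing. It is not Tendermint's actual code: the `PubKey` interface, the `ed25519Key` type, the 32-byte size check, and `validateRemotePubKey` are all hypothetical names chosen for illustration. The point is only that a key received from a remote peer is checked for nil and for a plausible length before any method is called on it.

```go
package main

import (
	"errors"
	"fmt"
)

// PubKey is a hypothetical stand-in for a peer's public key type.
// It is NOT Tendermint's real interface.
type PubKey interface {
	Bytes() []byte
}

// ed25519Key is an illustrative key implementation backed by raw bytes.
type ed25519Key []byte

func (k ed25519Key) Bytes() []byte { return []byte(k) }

// ed25519KeySize is the length of an Ed25519 public key in bytes.
const ed25519KeySize = 32

var errInvalidRemoteKey = errors.New("handshake: remote peer sent a missing or malformed public key")

// validateRemotePubKey rejects a nil or wrongly sized key before the
// handshake code dereferences it.
func validateRemotePubKey(k PubKey) error {
	if k == nil {
		return errInvalidRemoteKey
	}
	if len(k.Bytes()) != ed25519KeySize {
		return errInvalidRemoteKey
	}
	return nil
}

func main() {
	var missing PubKey // models a peer that sends no key at all
	fmt.Println("nil key:  ", validateRemotePubKey(missing))

	valid := ed25519Key(make([]byte, ed25519KeySize))
	fmt.Println("valid key:", validateRemotePubKey(valid))
}
```

With a check like this, a malformed handshake ends in an error for that one peer rather than a panic in the node process.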
Per best practices for running a secure node, most validators who operate the Cosmos Hub actively mitigate the risk of sentry node attacks by not publicly exposing all of their sentry nodes on the network. For an attack against the network to succeed, the attacker would have to conduct significant reconnaissance on the network and its operators to collect a list of nodes that accept connections from the open internet, and then use a modified p2p implementation to cause the gaiad process to halt on a number of those nodes. Based on our current understanding of the topology of the Cosmos Network and the liveness properties of Tendermint (which require more than two thirds of the network's voting power to be online), it is extremely unlikely that this attack on public sentry nodes could have been leveraged against operators with enough stake to disrupt the network.
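The liveness argument can be expressed as simple arithmetic. The sketch below assumes the standard Tendermint threshold, where blocks can be committed only while strictly more than two thirds of total voting power is participating. The function name `hasLiveness` and the stake figures are illustrative assumptions, not real Cosmos Hub data.

```go
package main

import "fmt"

// hasLiveness reports whether the voting power still online strictly
// exceeds two thirds of the total. Integer arithmetic avoids rounding.
func hasLiveness(total, offline int64) bool {
	online := total - offline
	return 3*online > 2*total
}

func main() {
	const total = 100 // hypothetical total voting power

	// An attacker who crashes nodes backing 20% of stake does not halt the chain.
	fmt.Println("20 offline:", hasLiveness(total, 20))

	// Halting requires taking at least one third of voting power offline.
	fmt.Println("34 offline:", hasLiveness(total, 34))
}
```

In other words, an attacker would need to crash nodes representing at least one third of voting power at the same time, which is why hidden sentry nodes make the attack impractical.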
Investigation and Response
Within hours of the initial report, the triage team verified the issue and created tooling to monitor for exploitation of this vulnerability in the wild. This was done in parallel with the development of a patch by the core development teams, who were prepared to release the patch quickly if it became necessary. Although the team was ready to respond rapidly if needed, no incidents of exploitation surfaced, so the triage team opted for a more deliberate process to distribute patches downstream.
Remediation
Over the past few months, the core development team has been aware of issues with the secret connection in Tendermint (such as a long-standing handshake malleability vulnerability), and as a result they have been working to refactor and significantly improve this code. After validating the reported issue, members of the core development team began discussing technical solutions. These included a simple validation of the data received from the peer, and a more complex approach of wrapping all p2p interactions in goroutines to prevent panics from terminating the Tendermint process. After a brief period of discussion, the team chose the simple data validation approach as the fastest way to resolve the issue.
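For comparison, the more complex alternative that the team considered can be sketched as follows. This is an illustration of the general Go pattern of running work in a goroutine with a deferred `recover`, not the design the team discussed in detail. The helper name `safeGo` and the anonymous key interface are assumptions made for this example.

```go
package main

import "fmt"

// safeGo runs fn in its own goroutine and reports the outcome on the
// returned channel: nil on success, or an error if fn panicked.
func safeGo(fn func()) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered from panic: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	// Simulate handshake code that calls a method on a nil public key.
	err := <-safeGo(func() {
		var key interface{ Bytes() []byte } // nil interface, as sent by a malicious peer
		_ = key.Bytes()                     // panics with a nil pointer dereference
	})
	fmt.Println("process still running; peer error:", err)
}
```

This approach contains the damage from any panic in peer-handling code, but it touches every p2p code path and can mask bugs, which is consistent with the team preferring direct validation as the quicker fix.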
After writing, reviewing, and testing the code comprising the patch, the Tendermint team made a patch available in versions 0.31.9 and 0.32.5, and the CosmosSDK team made a patch available in version 0.34.8.
Vulnerability Coordination
Initially, the triage team preferred to merge the bug fix in time to roll the patch into the Cosmos Hub 3 upgrade scheduled for Tuesday, September 24, but decided it was best to avoid interfering with the network upgrade and to plan a separate security patch instead. This allowed the update to include the Go security patch that was released on September 26.
In parallel with the technical remediation work for the issue, the triage team kicked off our internal process for vulnerability notification. In the case of high and critical severity bugs, we are committed to providing pre-notification of security vulnerabilities 24 hours before publishing a public advisory to organizations, projects, and entities dependent on code we have developed.
As part of our round of pre-notification, we proactively reached out to several exchanges with functional security@ inboxes, and we posted a message to the Cosmos forum letting the community know that a patch for a high-severity vulnerability would become available at 6am UTC on Wednesday, October 2. This time was specifically chosen to accommodate our community outside of North America.
Within a business day of pre-notifying impacted parties that we would be releasing a security update for Tendermint and the CosmosSDK, the Tendermint and CosmosSDK teams cut releases that fully remediate the issue once service providers patch their software. The reporter was awarded a bounty for the bug, which was reported through HackerOne.
Retrospective
For security to be successful, it must be a shared responsibility among all stakeholders across the entire Cosmos ecosystem. More than anything else, we are extremely thankful for the ongoing contributions to security being made by our community, and we are thrilled that the work we have done to encourage reporting of vulnerabilities through security@tendermint.com and our HackerOne bug bounty program has been successful to date.
After discussing the coordination and timeline of this issue, the core triage team has identified several opportunities for action among stakeholders that would improve coordination in the future.
* As part of an internal retrospective, the core developers of the CosmosSDK and Tendermint will be improving the documentation of the software release process. By better documenting the process and including specific instructions that would expedite identifying and catching similar bugs, we hope to reduce errors and find flaws in code more quickly in the future. Additionally, we will begin testing upgrades with forks of mainnet code as recommended by bharvest.
* Given the heightened importance of security in blockchain technology, all blockchain companies need to keep a dedicated, monitored line of communication open for vulnerability coordination. Without this, it is nearly impossible to ensure that impacted stakeholders receive timely communications and can take quick action to remediate significant security issues. It is neither reasonable nor scalable to ask security teams to open support tickets through customer service and then hope that advisories reach the security or engineering teams in time for quick action. As an industry, we desperately need to set a strong standard for coordination and disclosure of security bugs: more exchanges, development teams, and organizations who depend on our code and who may be impacted by security incidents need to set up and monitor a security@ email.
* As part of our retrospective for Cosmos Mainnet Advisory Blue, our first emergency hard fork of mainnet, we began exploring options that would enable us to quickly, discreetly coordinate with Cosmos Hub validators in the case of a security emergency. Several validators reached out to inquire about the status of the emergency mailing list, which was originally intended to be used only in cases where an emergency hard fork was required. Because there was significant interest in receiving more communication about security issues, we will be taking significant steps over the next few weeks to improve security advisory distribution to our community, including:
* Adding a security parameter to the CosmosSDK that allows validators to provide a security email address as a way to opt in to emergency coordination measures.
* Distributing advisories via an RSS feed, in addition to posting advisories to the Cosmos Network forum.
* Continuing to develop a security mailing list that will be used to distribute advisories to all stakeholders whether emergency forking is required or not.
Special thanks to fudongbai for reporting this issue, and to both Zaki Manian and Jack Zampolin for their contributions to this post.