In the spring of 2020, it really mattered to me what the definition of “software supply chain security” ought to be. I was working at In-Q-Tel, a strategic investor for the US intelligence community, and co-authoring a research paper that attempted to measure the frequency of software supply chain attacks. We picked a definition that emphasized instances of malicious software being widely distributed through an existing distribution channel. Almost as soon as the paper was published though, the definition broke down, repeatedly.
Now, writing in the summer of 2024, several major software supply chain incidents later, I’ve come to accept that the only workable definition, which I’ll discuss below, is broad—so broad that a careful observer might even accuse the definition of encompassing all of software security.
In short, this piece is a reflection on my wrestling with the definition of software supply chain security: pinning down a definition, confronting mounting evidence of the definition’s flaws, and accepting a broader definition.
The original definition
While drafting a research paper later published as “Counting Broken Links,” my co-authors and I methodically combed through old news reports, whitepapers, GitHub issues, and other sources to find all known software supply chain attacks. One day, the lead author, Dan Geer, who is something like the Gandalf of quantitative computer security research, challenged us to provide a definition of a “software supply chain attack,” the very thing we intended to count. We settled on this definition: software supply chain attacks “intentionally insert malicious functionality into build, source, or publishing infrastructure or into software components with the goal of propagating that malicious functionality through existing distribution methods.” In short, it was all about the distribution of malicious code through existing channels.
The article proposed two major types of software supply chain attacks and nine minor types. Based on the historical attack data we collected, we argued that the two major types were attacks on “build, source, and publishing infrastructure” and attacks on “software registries.” For instance, back-doored compilers, popularized in the O.G. article on software supply chain security by Ken Thompson, fall in the first category; the legions of malicious open source software packages that have been discovered over the past 10 years would fall in the second. The table below provides the categories and data in the original report.
Count of Reports:Attacks by major and minor categories
Major type | Minor type | Count (reports:attacks) |
---|---|---|
Build, source, and publishing infrastructure | Build system compromise | 11:13 |
Build, source, and publishing infrastructure | Firmware implant | 7:32 |
Build, source, and publishing infrastructure | Source code system compromise | 9:39 |
Build, source, and publishing infrastructure | Publishing: Certificate attack | 6:18 |
Build, source, and publishing infrastructure | Publishing: Delivery system compromise | 29:35 |
Software registry | Account takeover | 11:14 |
Software registry | Dependency compromise | 12:333 |
Software registry | Malicious package | 51:1,373 |
Software registry | Typosquatting | 15:1,247 |

Table from Geer, Tozer and Meyers, “Counting Broken Links,” USENIX ;login:, December 2020. Note: Each count cell provides both the count of “reports” and the count of “attacks,” separated by a colon. A “report” is a public disclosure of one or more software supply chain attacks, e.g., a blog post from a security researcher. An “attack” is a single incident that reaches a target, e.g., the download of a compromised application from a download server.
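To make the skew in these counts easier to see, here is a small Python sketch that tallies the table. The counts are copied from the table; the "attacks per report" ratio and the share calculation are illustrative metrics of my own choosing, not figures from the paper, and the function names are hypothetical.

```python
# Counts copied from the table as (reports, attacks) per minor type.
COUNTS = {
    "Build system compromise": (11, 13),
    "Firmware implant": (7, 32),
    "Source code system compromise": (9, 39),
    "Publishing: Certificate attack": (6, 18),
    "Publishing: Delivery system compromise": (29, 35),
    "Account takeover": (11, 14),
    "Dependency compromise": (12, 333),
    "Malicious package": (51, 1373),
    "Typosquatting": (15, 1247),
}


def total_attacks(counts):
    """Sum the attack counts across all minor types."""
    return sum(attacks for _, attacks in counts.values())


def attacks_per_report(counts):
    """Illustrative ratio: how many attacks each public report covers on average."""
    return {name: attacks / reports for name, (reports, attacks) in counts.items()}


def share_of_attacks(counts, names):
    """Fraction of all counted attacks that fall under the given minor types."""
    selected = sum(counts[name][1] for name in names)
    return selected / total_attacks(counts)


if __name__ == "__main__":
    ratios = attacks_per_report(COUNTS)
    for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print(f"{name}: {ratio:.1f} attacks per report")
    share = share_of_attacks(COUNTS, ["Malicious package", "Typosquatting"])
    print(f"Malicious packages and typosquatting: {share:.0%} of counted attacks")
```

Under these assumptions, a single report about a registry-based campaign tends to cover many individual attacks, which is one way to read why the registry categories dominate the attack counts.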
The “Counting Broken Links” article with this table was published in late 2020, the same week that the SolarWinds compromise, the mother of all software supply chain attacks, became public. During this compromise, Russian intelligence operatives corrupted the build process of SolarWinds, a major network software company, and implanted malicious code that traveled via SolarWinds’ own software updates to its customers. Our definition was consistent with this attack. We presented at the NSA’s Science of Security conference, and our GitHub repository with the underlying data started gaining traction. But then everything started to fall apart.
The definition breaks down
A mere three months later, a new type of attack materialized that didn’t fit within the existing typology. The term “dependency confusion” was coined when security researcher Alex Birsan self-published a Medium article subtitled “How I Hacked into Apple, Microsoft, and Dozens of Other Companies.” What was clever about this attack is how it took advantage of the non-intuitive behavior of package managers, allowing an attacker to trick developers into downloading malicious code from an external package registry rather than, as planned, an internal package registry. While similar to typosquatting, which was already a minor category in our typology, this attack didn’t actually involve a typo. Our original definition of software supply chain security had already been stretched. We added another minor category and moved on.
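The mechanics of dependency confusion can be sketched with a toy resolver. This is a minimal illustration, not the behavior of any particular package manager: it assumes a resolver that consults both an internal and a public registry and simply picks the highest version number it finds. The registry contents, the package names, and the functions `naive_resolve` and `scoped_resolve` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical registry contents: registry name -> package name -> versions.
# An attacker has published "acme-billing" 99.0.0 to the public registry,
# reusing the name of a package that was only ever meant to be internal.
REGISTRIES: Dict[str, Dict[str, List[str]]] = {
    "internal": {"acme-billing": ["1.2.0", "1.3.0"]},
    "public": {"acme-billing": ["99.0.0"], "open-lib": ["2.4.1"]},
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.3.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(
    name: str, registries: Dict[str, Dict[str, List[str]]]
) -> Optional[Tuple[str, str]]:
    """Assumed behavior: search every registry and take the highest version."""
    candidates = [
        (parse_version(version), registry, version)
        for registry, index in registries.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        return None
    _, registry, version = max(candidates, key=lambda item: item[0])
    return registry, version


def scoped_resolve(
    name: str,
    registries: Dict[str, Dict[str, List[str]]],
    internal_names: Set[str],
) -> Optional[Tuple[str, str]]:
    """One possible mitigation: internal names resolve only from the internal registry."""
    if name in internal_names:
        registries = {"internal": registries["internal"]}
    return naive_resolve(name, registries)


if __name__ == "__main__":
    print(naive_resolve("acme-billing", REGISTRIES))
    print(scoped_resolve("acme-billing", REGISTRIES, {"acme-billing"}))
```

In this sketch no name is misspelled anywhere. The naive resolver picks the attacker's public package only because its version number is higher, which is why the attack did not fit under typosquatting.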
Then, in December 2021, Log4Shell happened and the “internet was on fire.” Now our typology suffered a mortal wound. The earlier typology focused exclusively on the insertion of malicious code, but the Log4Shell vulnerability didn’t involve malicious code. Nevertheless, Log4Shell clearly represented a widespread vulnerability in the software supply chain. It was an easily exploited and severe vulnerability, introduced by a flaw in Log4j, a widely used open source Java logging library. The episode revealed a crucial flaw in our existing definition of software supply chain security: unintentional security flaws in widely used open source software had no place in it. That original typology, for the purposes of my career, was dead only 18 months after its invention.
Accepting a broader definition
Upon reflection, the “supply chain” aspect of software supply chain security suggests the crucial ingredient of an improved definition. Software producers, like manufacturers, have a supply chain: they require inputs and then perform a manufacturing process to build a finished product. In other words, a software producer uses components, developed by third parties and by itself, along with technologies to write, build, and distribute software. Any vulnerability in or compromise of this chain, whether it arises from malicious code or from the exploitation of an unintentional vulnerability, falls within software supply chain security. I should mention that a similar, rival data set maintained by the Atlantic Council uses this broader definition. (Full disclosure: I’m now a non-resident fellow at the Atlantic Council. If you can’t beat ’em, join ’em.)
I admit to still having one general reservation about this definition: It can feel like software supply chain security subsumes all of software security, especially the sub-discipline often called application security. When a developer writes a buffer overflow in the open source software library your application depends upon, is that application security? Yep! Is that also software supply chain security? Yep! Perhaps the subsuming of software security by software supply chain security is inevitable in an era in which software development depends so heavily on open source software. Research I co-authored suggests that the typical claim, that 80 to 90 percent of the code in modern software applications is open source, is in fact a conservative estimate. Our measurements indicated that some smaller software applications are more than 99 percent open source software.
In short, I’ve come to accept that new attack types will continue to occur, forcing the creation of new “minor” categories, and that the broader definition too will likely need to evolve. In other words, writing in the summer of 2024, it now matters to me a lot less what the definition of “software supply chain security” ought to be.
John Speed Meyers is the head of Chainguard Labs at Chainguard. He is also a non-resident senior fellow at the Atlantic Council.