The recent log4j vulnerability (often called "log4shell") has made headlines and given a wider array of people insight into something that was previously invisible to them: namely open source software. Much of the media simply states that there is a problem, without describing what the problem is. Much of the material in the technical press is expressed in ways that are inaccessible to people outside the software development industry.
Like most industries, software developers have their own jargon. Code means instructions given to computers, not ciphers. Libraries mean building blocks of code, not buildings full of books. The following introduction will endeavor to avoid jargon. It inevitably will fall short.
What is Open Source?
All non-trivial software these days is composed of building blocks, stacked one on top of another. There are many ways to categorize such building blocks, but for the purposes of this discussion we will divide these building blocks into two categories: business logic and administrative details. Business logic is often core to the competitive advantage of a product or a service that a company offers, and therefore generally is not shared with others. We call this closed source or proprietary software. As with everything, there are exceptions: some businesses thrive by providing support or services for open source products. In any case, this category is not the one that log4j occupies.
The other category is administrative, infrastructure software that's required but not really interesting from a business logic point of view. Such building blocks do something that needs to be done, and generally do it in a way that does not provide material competitive advantage. Some companies find it useful to make the source to these building blocks available to others free of charge as open source. To the extent that others have a similar interest, it may make sense to share the workload and benefits.
The Apache Software Foundation provides a vendor neutral place for such collaborations. In general, such collaborations involve a number of senior engineers who are employed by companies who intend to directly benefit from this collaboration. Individuals, members of academia, and members of the public sector who demonstrate merit are also welcome to participate as "invited experts" in our projects.
Log4J is an example of such a collaboration, one that has been vibrant and ongoing since its first public release in 2001.
It is not only those who participate in the development of these building blocks that can make use of them. A combination of high quality and low (in fact, no) acquisition cost makes it attractive for companies to make use of these building blocks. In the case of log4j, the use is very widespread indeed.
What is the Log4J vulnerability?
This vulnerability involved three building blocks: Log4J, LDAP, and JNDI, only the first being developed at the ASF. A combination of intentionally designed features (as opposed to mistakes or bugs) could be used by a determined hacker to introduce a fourth, untrusted, building block that they supply into the mix. That untrusted building block could then do whatever mischief it desires.
LDAP and JNDI are older than Log4j by about a half dozen years. The potential vulnerability that enables the introduction of building blocks from untrusted sources resides in the JNDI building block, and is contained in a portion of that building block that isn’t as commonly used as it once was.
But JNDI by itself was, and is, only a potential vulnerability. The complementary code that made it an exploitable vulnerability was added to log4j in 2013. The fact that JNDI could be combined with another building block to produce a vulnerability wasn’t widely known at that time.
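For readers who are comfortable with a little code, the combination described above can be illustrated with a toy model. The sketch below is not Log4j's code, and it does not use JNDI or LDAP; every name in it (ToyLogger, toy_directory_lookup, the "directory" scheme, the address) is hypothetical. It only shows the general pattern: a logger that helpfully expands placeholders in any message will also expand placeholders that arrive inside untrusted input.

```python
import re

# Toy model only: this is NOT Log4j's code, and all names here are hypothetical.
# Matches placeholders of the form ${scheme:key}.
LOOKUP_PATTERN = re.compile(r"\$\{(\w+):([^}]*)\}")


class ToyLogger:
    """A logger that expands ${scheme:key} placeholders found in any message."""

    def __init__(self):
        self.lookups = {}  # scheme name -> function that resolves a key
        self.lines = []    # what ends up in the "log file"

    def register_lookup(self, scheme, resolver):
        self.lookups[scheme] = resolver

    def log(self, message):
        def expand(match):
            scheme, key = match.group(1), match.group(2)
            resolver = self.lookups.get(scheme)
            # Placeholders with an unknown scheme are left untouched.
            return resolver(key) if resolver else match.group(0)

        self.lines.append(LOOKUP_PATTERN.sub(expand, message))


contacted = []  # every address the toy lookup was asked to reach


def toy_directory_lookup(address):
    # Stand-in for a lookup that fetches something from an address
    # chosen by whoever wrote the placeholder. Nothing is really fetched.
    contacted.append(address)
    return "<content fetched from " + address + ">"


logger = ToyLogger()
logger.register_lookup("directory", toy_directory_lookup)

# The application only means to record a user-supplied name,
# but the name itself contains a placeholder.
user_supplied = "${directory:attacker.example/payload}"
logger.log("Login attempt by " + user_supplied)

print(logger.lines[0])
print(contacted)
```

In this toy, the person who typed the "name" decided which address the lookup contacted. Each feature works as designed on its own; the problem only appears when they are combined with input from an untrusted source.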
The potential vulnerability in JNDI was later identified in 2016 in a presentation at a security-related “Black Hat” conference. Buried in that presentation was a recommendation not to do exactly what log4j was doing. For whatever reason, that recommendation didn’t make it back to those who were maintaining log4j.
Even at this point, versions of log4j containing this vulnerability were widely deployed. While we will never know for certain, it seems unlikely that this vulnerability was widely exploited at that time. Had there been wide exploits, it seems likely that people would have investigated, and those investigations would have resulted in identifying the root problem.
Identifying the problem and deploying the fix
The vulnerability was discovered much later by a visual inspection of the code by a third party. This third party reported the problem to the log4j team in late November 2021. It took that team approximately two weeks to make a proper fix. Along the way, incomplete fixes were made available. While the fix was being developed, it became clear that the vulnerability was known by others, making it all the more important that the fix be made available quickly.
Once the fix was available, notifications including machine-readable metadata were sent out within half an hour.
Unfortunately, this is only the start of the supply chain. Businesses that made use of this building block then had to identify which (if any) of their products were affected, replace the Log4j building block with a new version, test the result, and make their fixed product available to their customers. Those customers in turn would need to install new versions. Deploying even a small change on this massive scale presents significant logistical challenges.
Given the visibility of this vulnerability, uptake has been faster than we have seen with prior issues; but even so, we expect that there will still be installations with vulnerabilities many months from now, or even years in some cases.
What have we learned / what can we do differently?
Unfortunately, there are no silver bullets. There are a number of partial measures, each of which individually is unlikely to make a material difference, but in combination may improve things.
- Automated code scans. These can be useful and can find real problems. They are unlikely, however, to find this specific type of problem - we are probably a decade or more away from having an AI powerful enough to find architectural problems such as this one. Such scans also require a skilled practitioner to identify which of the issues identified by the scans are real problems, as opposed to false positives.
- Manual inspection. Such reviews are labor intensive (and therefore costly) and the skills necessary to do one properly are in short supply. The best way to proceed is to prioritize highly critical building blocks first. Past efforts to identify highly critical building blocks didn’t identify Log4j as such, but such efforts are rapidly improving. Even so, any list of critical projects is going to miss something for someone. Everyone needs to pay attention to their own supply chain.
- Software Bills of Materials (SBOMs). We need to recognize that nothing will ever catch 100% of the issues. As stated previously, making the fix quickly is only the start of the supply chain. Being able to rapidly identify which products are affected by a given vulnerability, which SBOMs can enable, will reduce the time it takes for fixes to be deployed.
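To make the SBOM point concrete, here is a minimal sketch of the kind of lookup an SBOM enables. It assumes each product's SBOM has already been reduced to a simple list of component names and versions; real SBOMs use standard formats such as SPDX or CycloneDX and carry far more detail. The product names, the inventory, and the version threshold passed in below are illustrative assumptions, not data from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: str


def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1). Only plain dotted numbers are handled."""
    return tuple(int(part) for part in text.split("."))


def affected_products(sboms, component_name, first_fixed_version):
    """Return sorted names of products that ship the component below the fixed version."""
    fixed = parse_version(first_fixed_version)
    hits = []
    for product, components in sboms.items():
        for component in components:
            if (component.name == component_name
                    and parse_version(component.version) < fixed):
                hits.append(product)
                break  # one match is enough to flag the product
    return sorted(hits)


# Hypothetical inventory: product name -> components it ships with.
sboms = {
    "billing-service": [
        Component("log4j-core", "2.14.1"),
        Component("commons-lang3", "3.12.0"),
    ],
    "report-generator": [Component("log4j-core", "2.17.1")],
    "static-website": [Component("commons-io", "2.11.0")],
}

# The threshold is supplied by whoever runs the check; the value here is illustrative.
print(affected_products(sboms, "log4j-core", "2.17.1"))
```

With an inventory like this, the question "which of our products need a new release?" becomes a query that takes seconds, instead of a manual audit of every product.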
We continue to brainstorm for other ideas, and are working with other groups to bring these ideas to reality.
What is The Apache Software Foundation?
The Apache Software Foundation (ASF) is a US 501(c)(3) non-profit charity that acts as a steward for several hundred open source projects. The ASF is one of the largest open source organizations in the world, and is home to prolific projects such as Apache Hadoop, Apache Tomcat, Apache Cassandra, and scores of others. The ASF provides a known and vetted set of governance, intellectual property, and release processes as well as providing a vendor-neutral place for contributors to collaborate. However, within those processes there is a high degree of autonomy for each project. Project decisions are driven by the people doing the work. The ASF Board of Directors is responsible for project oversight, ensuring the health and vitality of each project while generally not interfering with the project's technical direction.
In brief numbers the ASF has:
- 237 million lines of code
- 650,000+ contributors
- 8,370+ committers (individuals who have earned the status of being able to directly change source code).
- 350+ projects
- Funding: about $2 million per year
All of our software is licensed under the commercially-friendly Apache license (v2) and made freely available.
How the ASF addresses project security
- Each ASF project is governed by a Project Management Committee (PMC) consisting of individuals who have earned merit by working on the project.
- PMC members collectively manage their own community decisions regarding technical development and releases.
- Projects adhere to various foundation-wide policies, including for trademarks, making releases, and handling reported security vulnerabilities.
- ASF policies require PMC reviews with minimum levels of voting required for new releases.
- Changes to project source code are tracked and visible to all interested parties.
- The ASF has a security team and a VP of security who acts with board authority.
- The ASF security team sets a common policy, maintains security contacts for PMCs, and provides support for projects responding to security issues.
- We publish metadata about all vulnerabilities across all our projects in a consistent and machine-readable way.
- When faced with a critical vulnerability, we aim to mitigate the threat before disclosure.
- Initial reports are processed in private under rules for responsible disclosure.
- Once an issue becomes public, the focused attention often leads to additional issues being detected.
- We aim to be fast to release and fast to fix, even if that requires multiple releases.
Links / Further reading
- CVE-2021-44228 - mitigations
- Log4Shell (Wikipedia)
- List of Software (un)affected
- Log4J Security Vulnerabilities
- BlackHat USA 2016
- Position Paper (provided to White House, EOP/NSC)
- White House Summit Statement