Repository Security

Proposal: these ideas have been collected into a single proposal; see Repository Security.

Nat Pryce

The default install of Maven is a security risk. A default install of
Maven automatically installs and runs code from the central repository
over the Internet using a protocol that has a number of vulnerabilities:

a) it downloads JARs using unsecured HTTP
b) it doesn't authenticate that the downloaded JARs are correct
c) it runs the code with the full permissions of the user without using
ANY of Java's sandbox mechanisms

The only "authentication" it does is to check the MD5 hash of the JAR,
but it loads the canonical hash value from the SAME location as the JAR,
which opens the protocol up to man-in-the-middle attacks. Alternatively,
if ibiblio itself is cracked (as has happened with Debian and Savannah,
for example), an attacker can easily replace a JAR with a hostile
version, and Maven cannot detect the attack.
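The weakness can be sketched in a few lines. This is only an illustration, not Maven's actual code: the repository is modelled as an in-memory dictionary, and the function name `fetch_with_colocated_checksum` is hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of some bytes."""
    return hashlib.md5(data).hexdigest()


def fetch_with_colocated_checksum(repo: dict, name: str) -> bytes:
    """Fetch an artifact and check it against the .md5 stored beside it."""
    jar = repo[name]
    expected = repo[name + ".md5"]
    if md5_hex(jar) != expected:
        raise ValueError("checksum mismatch for " + name)
    return jar


# An honest repository: the artifact and its checksum sit side by side.
repo = {"junit-3.8.1.jar": b"genuine bytes"}
repo["junit-3.8.1.jar.md5"] = md5_hex(repo["junit-3.8.1.jar"])

# An attacker who controls the repository (or the connection) swaps both.
repo["junit-3.8.1.jar"] = b"hostile bytes"
repo["junit-3.8.1.jar.md5"] = md5_hex(b"hostile bytes")

# The check still passes, so the hostile artifact is accepted.
accepted = fetch_with_colocated_checksum(repo, "junit-3.8.1.jar")
```

Because the expected hash comes from the same place as the artifact, the check detects accidental corruption but not deliberate replacement.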

If you use HTTP with no other security, there is NO way to tell whether
you are actually connected to the server you intended, or whether the
connection is being intercepted by a man-in-the-middle attack. HTTPS would
at least guarantee that you are connected to the right repository and that
data is not being modified in transit. It wouldn't protect against a cracked
repository, but public-key cryptography can provide that protection.

Currently, checking the MD5 checksum validates that the code has been
downloaded correctly but doesn't validate that the JAR is what it says
it is (i.e. it doesn't authenticate the JAR). If a cracker compromised
ibiblio they could replace a JAR and its MD5 with attack code to
piggyback that code into organisations on the back of the normal use of
Maven. The danger of a central repository of code is that it's a
tempting place to hide trojans or other attack code. As proyal said:
"you could do some *very* interesting stuff if you snook in a hijacked
xerces or commons-* jar into ibiblio.."

So, even with a secure HTTPS connection to the repository, an attacker
has a single, central weak point to attack. With the proposed new
scheme, where Maven downloads the MD5 hash from two locations, there are
two weak points.

A solution would be for the repository to store signatures for each
JAR. Whoever uploads the JAR to the repository will create a signature
with their private key. Their public key will be available from many
locations. Trust in public keys can be established either by a web of
trust, as with PGP or GPG, or by having the key signed by a trusted third
party whose key is well known and can be trusted.

The keys of trusted uploaders should be explicitly passed to Maven by
the user after they have acquired them through some non-Maven mechanism
and verified them. They should not be automatically downloaded by Maven.
This process should be made as easy as possible by writing a utility.
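As a rough sketch of what such a utility might keep track of, the store below records only keys the user has imported by hand. The class name `TrustedKeys`, its methods, and the use of a SHA-256 fingerprint are all assumptions made for illustration. Signature verification itself is left out, since it would need a real cryptography library.

```python
import hashlib


class TrustedKeys:
    """Public keys the user imported explicitly after verifying them out of band."""

    def __init__(self):
        # Maps key fingerprint -> uploader name. Hypothetical layout.
        self._fingerprints = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        """Fingerprint a key as the SHA-256 of its bytes (an assumption)."""
        return hashlib.sha256(public_key).hexdigest()

    def import_key(self, uploader: str, public_key: bytes) -> str:
        """Record a key the user has verified; return its fingerprint."""
        fp = self.fingerprint(public_key)
        self._fingerprints[fp] = uploader
        return fp

    def uploader_for(self, public_key: bytes):
        """Return the uploader a key belongs to, or None if it was never imported."""
        return self._fingerprints.get(self.fingerprint(public_key))


keys = TrustedKeys()
keys.import_key("nat", b"placeholder bytes standing in for a real public key")

# A key that merely arrives alongside a download is never trusted implicitly.
unknown = keys.uploader_for(b"key served by the repository itself")
```

The point of the design is that the only way into the store is `import_key`, called by the user; nothing fetched from the repository can add a key.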

Maven can use the public keys it knows about to verify that the JARs it
downloads from the repository are actually what they purport to be.

I would seriously suggest getting a security expert to check the
protocol Maven uses. I'm no security expert so this scheme might have
some weak points.

Judging from talking with existing Maven users, security is something
that many people don't think about. They assume that Maven will do the
right thing because it is presented as an example of best practices for
Java projects. Considering some of the organisations that use Maven
(banks, for example), I think security should be a top priority.

Julian C Dunn

A number of key security issues have been identified in the way Maven
resolves and retrieves dependencies. These issues have been identified in
documents written by Nat Pryce and John Casey:

http://docs.codehaus.org/display/MAVEN/Repository+-+Security
http://docs.codehaus.org/display/MAVEN/Repository+-+Security+by+nat+pryce

Casey proposes to tighten up the repository upload procedure, which is a
good first step. However, signing all artifacts (and in particular, the
ongoing workload of needing to distribute derivative certificates)
may prove to be too onerous a procedure.

Pryce proposes a reasonable solution from a security perspective: storing
JAR signatures in the repository as well. Presumably users would then
obtain the public keys of the signers through some non-Maven means.
However, whether this would actually happen is a matter of discussion. My
suspicion is that most people (aside from the truly paranoid) don't even
do this for the code they download from the main Apache repository, e.g.
httpd.

A solution that I propose is to adopt the model taken by both the FreeBSD
and NetBSD projects for their software packaging. In the "ports" (FreeBSD)
or "pkgsrc" (NetBSD) systems, the checksums and sizes of the source
tarballs are stored in a separate location, i.e. in the ports/pkgsrc CVS
repository itself. Access to CVS is tightly controlled, just as it would
be for the actual code itself. The checksums and sizes are transferred
to the users' machines by a secure mechanism, i.e. CVS over SSH or cvsup.

Adapting the model to the Maven project would entail the following:

1. Store checksums and sizes of JARs/distributions at Apache.org. Apply the
same control to the checksum repo as would be kept for actual CVS commits
to the code.
2. When Maven requires an artifact, it securely (e.g. over HTTPS)
retrieves the size/checksum information from Apache.org.
3. It then retrieves the artifact from ibiblio. If ibiblio has been
compromised, the checksum/size will not match and Maven would refuse
to continue.
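The check in step 3 might look roughly like the sketch below. The names `DistInfo` and `verify_artifact` are hypothetical, and SHA-256 is chosen here as an assumption; the proposal only says "checksum". How the size/checksum record is fetched securely is out of scope for the sketch.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DistInfo:
    """Size and checksum obtained from the trusted, access-controlled source."""
    size: int
    sha256: str


def verify_artifact(data: bytes, info: DistInfo) -> None:
    """Raise ValueError unless the downloaded bytes match the trusted record."""
    if len(data) != info.size:
        raise ValueError("size mismatch: got %d, expected %d" % (len(data), info.size))
    digest = hashlib.sha256(data).hexdigest()
    if digest != info.sha256:
        raise ValueError("checksum mismatch")


# The trusted record is created when the genuine artifact is published.
genuine = b"genuine artifact bytes"
trusted = DistInfo(size=len(genuine), sha256=hashlib.sha256(genuine).hexdigest())

# A genuine download from the mirror passes silently.
verify_artifact(genuine, trusted)

# A replaced artifact on a compromised mirror is rejected.
try:
    verify_artifact(b"hostile artifact bytes", trusted)
    rejected = False
except ValueError:
    rejected = True
```

Unlike a checksum stored beside the artifact, the attacker would have to compromise both the mirror and the separately controlled checksum store to pass this check.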

References:
-----------

1 Using FreeBSD Ports:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ports-using.html

2 NetBSD Pkgsrc:
http://www.netbsd.org/Documentation/pkgsrc/

3 NetBSD Pkgsrc info about 'distinfo' where checksum/size info is stored:
http://www.netbsd.org/Documentation/pkgsrc/components.html#components.distinfo

John Casey

This is a simple Copy/Paste from an email I sent out to the maven2 users.

Recently, an issue has come up which I'm sure most or all of you are aware of. After some brief discussion on the list, and at the request of other maven developers, I'm sending this email as part justification, part summary, and part proposed solution.

Basically, at issue is the security (or lack thereof) of the maven repository, especially on ibiblio. What assurances can we offer maven users that the libraries they download are in fact published by each project's team and free of malicious code? My assessment is that the security of the repo could be compromised in several ways.

First, there is very little security involved in the MAVEN-UPLOAD process via Jira. Essentially, if we can find the library referenced in the ticket, we will upload it to ibiblio (if I'm wrong here, please correct me...I want this to be an accurate depiction). If a user requests it, we could easily be duped into putting up a corrupted (read: malicious) version of a library.

Second, since the only nod to security we give is on the front side (upload to ibiblio), we can't guarantee the continued integrity of the libs in the repo. For instance, if I gain illicit access to ibiblio's web server and drop in a new library (say, junit-3.8.1.jar), maven users have no way to know that the library has been compromised and is bogus. First build they run, I'm on their machine and executing God knows what (maven doesn't run in a very restrictive sandbox, remember).

When it was first presented to me, I didn't put too much stock in this argument. I thought that nearly every packaging system was based on blind trust, and why should maven concern itself with something so ubiquitous? If I download a .rpm from somehost.com and install it, I'm trusting that site to maintain benign and uncorrupted rpms. Even when MD5 checksums are generated, they are placed side-by-side with the original file (something we're doing now in the repo, to verify download integrity).

In light of steadily escalating focus on security, this will not always be an acceptable way to package and distribute software. The market is becoming more and more sensitized to the security ramifications of software, especially networked software. Also, retrofitting a release of maven with security enhancements is no trivial task. This is important to realize, because we're not just talking about securing the client side, but also the repo itself, which could automatically affect unpatched clients.

I guess the best thing we came up with on the list was to basically mimic the Debian process for establishing trust for all publishers granted access to ibiblio.

(see http://www.debian.org/devel/join/newmaint)

This will probably require something like setting up a root certificate for maven, then using that to issue derivative certs to approved publishers. Then, in order to upload, the cert chain with which the library was signed will have to verify up to our root...we could even restrict the authorization of that cert to a particular path on the repo (e.g., I could only upload to the /commonjava path on www.ibiblio.org/maven). This could take care of security on the upload side of things.
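The path restriction could be checked along these lines. This is only a sketch: the function name `may_upload` is hypothetical, and it assumes the publisher's allowed path has already been read from a cert that verified up to the root.

```python
import posixpath


def may_upload(allowed_prefix: str, target_path: str) -> bool:
    """Return True if target_path lies inside the publisher's allowed directory."""
    # Normalise both paths so ".." segments cannot escape the allowed directory.
    prefix = posixpath.normpath("/" + allowed_prefix.lstrip("/"))
    target = posixpath.normpath("/" + target_path.lstrip("/"))
    # Compare on a directory boundary so "/commonjava-evil" does not match.
    return target == prefix or target.startswith(prefix.rstrip("/") + "/")


inside = may_upload("/commonjava", "/commonjava/jars/foo-1.0.jar")
escaped = may_upload("/commonjava", "/commonjava/../junit/jars/junit-3.8.1.jar")
lookalike = may_upload("/commonjava", "/commonjava-evil/jars/foo-1.0.jar")
```

Here `inside` is allowed, while `escaped` and `lookalike` are refused, which is the behaviour needed to stop one publisher overwriting another project's artifacts.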

As for the client (download) side, we'd still want to support the multiple/custom repo option (maven.repo.remote), so we'd probably want to squawk loudly if the repo didn't have a key in the local keystore (or maybe if the lib itself didn't have a cert derived from one in the local keystore...I dunno). The user can still choose to use a proprietary/custom remote repo, but we'd make sure he understood that we won't guarantee the integrity of its contents.

One nice side effect of signing/keying everything is that trusted publishers will have an automatic access mechanism for publishing to ibiblio: wagon scp, using their key. They could even use this key to enable SCM snapshot tagging from the CI environment at build time (CVS extssh). CI integration in general would probably become much cleaner...

I know I'm mangling years of carefully crafted security jargon in this email, so just s///g whatever I'm saying wrong with the right term/concept. More than anything, I'm hoping to (a) raise awareness and (b) offer a rough framework for solving or even just discussing this issue. I'm trying to poke you all and get you talking about this. (smile)

The approach I've discussed above may sound draconian, but I think we should be trying to build m2 around some concept of verifiable trust. I think the current industry climate demands it, and the current unprotected repo model will not allow mass adoption of maven without it...

Thanks for reading this novel, and I look forward to your comments.