Repository Scanning in Archiva

A repository is scanned periodically to determine what has changed in it.

  1. On the first scan, the entire repository is scanned.
  2. On subsequent scans, only content that is new or changed since the last scan is picked up.
  3. The scan is required to pick up content that arrives in the repository through unmonitored means. Content that arrives through the following monitored means is processed automatically:
    • Content that arrives via a WebDAV PUT is automatically processed.
    • Content that arrives via a Proxy Request is automatically processed.
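
Steps 1 and 2 amount to one full walk followed by incremental walks. The minimal sketch below assumes file modification times are a reliable change signal; this page does not say how Archiva actually detects changes. The hypothetical `IncrementalScan.changedSince` selects the files modified after the previous scan, and passing `Instant.EPOCH` reproduces the full first scan.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical incremental scanner; not Archiva's actual API.
final class IncrementalScan {

    // Returns regular files under repoRoot modified strictly after lastScan.
    // Instant.EPOCH means "first scan": every file is returned.
    static List<Path> changedSince(Path repoRoot, Instant lastScan) throws IOException {
        FileTime threshold = FileTime.from(lastScan);
        try (Stream<Path> files = Files.walk(repoRoot)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> lastModified(p).compareTo(threshold) > 0)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static FileTime lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path repo = Files.createTempDirectory("repo");
        Path pom = Files.createDirectories(repo.resolve("org/example/demo/1.0"))
                .resolve("demo-1.0.pom");
        Files.writeString(pom, "<project/>");
        // First scan: the new POM is picked up.
        System.out.println(changedSince(repo, Instant.EPOCH));
        // Backdate the file; a scan recorded as later than that finds nothing new.
        Files.setLastModifiedTime(pom, FileTime.from(Instant.parse("2020-01-01T00:00:00Z")));
        System.out.println(changedSince(repo, Instant.parse("2021-01-01T00:00:00Z")));
    }
}
```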

The Scan Lifecycle

  1. All content falls into three categories: CONSUMED, IGNORED, and UNKNOWN.
    • CONSUMED content is managed by Archiva.
    • IGNORED content consists of generated or transient content.
    • UNKNOWN content is whatever falls through the cracks of the two categories above. Typically, this means the content does not conform to the repository structure or is otherwise unrecognized.

The lifecycle of a scan is as follows.

  1. Perform a SCAN with an inclusion filter of "**/*" and an exclusion filter listing the patterns predetermined to be IGNORED.
  2. For each file found, attempt to resolve it to an Artifact object.
    1. If a valid Artifact object is created, flag the file as CONSUMED and store the Artifact in the database.
    2. If the file cannot be converted to an Artifact object, flag it as UNKNOWN and create a report entry in the ARTIFACT_HEALTH database table.
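
The lifecycle above can be sketched as a classification function. This is a minimal sketch under several assumptions. The inclusion filter is taken to be `**/*` (match everything). Patterns use Ant-style semantics, where `**/` matches zero or more directories; note that Java's built-in `glob:` matcher behaves differently. The Artifact resolver is deliberately simplified and hypothetical: it only accepts paths laid out as `group/dirs/artifactId/version/artifactId-version[-classifier].ext`, so the other CONSUMED types in the table below are not modelled. The exclusion list comes from the IGNORED table. `ScanLifecycle`, `Category`, `Artifact`, and `resolveArtifact` are illustrative names, and plain lists stand in for the database and the ARTIFACT_HEALTH table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch of the scan lifecycle; not Archiva's actual classes.
final class ScanLifecycle {

    enum Category { CONSUMED, IGNORED, UNKNOWN }

    // Minimal stand-in for an Artifact object.
    record Artifact(String groupId, String artifactId, String version, String fileName) {}

    // Translates an Ant-style pattern into a regex over '/'-separated relative paths.
    static Pattern antToRegex(String ant) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < ant.length(); i++) {
            char c = ant.charAt(i);
            if (ant.startsWith("**/", i)) {
                re.append("(?:.*/)?"); // zero or more whole directories
                i += 2;
            } else if (ant.startsWith("**", i)) {
                re.append(".*");
                i += 1;
            } else if (c == '*') {
                re.append("[^/]*");    // stays within one path segment
            } else if (c == '?') {
                re.append("[^/]");
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    // Exclusion filter: the patterns listed as IGNORED on this page.
    static final List<Pattern> EXCLUDES = List.of(
            "**/.htaccess", "**/KEYS", "**/*.rb", "**/*.sh", "**/.svn/**/*", "**/.DAV/**/*")
            .stream().map(ScanLifecycle::antToRegex).toList();

    // Hypothetical resolver: <group dirs>/<artifactId>/<version>/<artifactId>-<version>[-classifier].<ext>
    static Optional<Artifact> resolveArtifact(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            return Optional.empty(); // needs at least one group directory
        }
        String fileName = parts[parts.length - 1];
        String version = parts[parts.length - 2];
        String artifactId = parts[parts.length - 3];
        String prefix = artifactId + "-" + version;
        if (!fileName.startsWith(prefix)
                || !fileName.substring(prefix.length()).matches("(?:-[^.]+)?\\.[A-Za-z0-9.]+")) {
            return Optional.empty();
        }
        String groupId = String.join(".", Arrays.copyOfRange(parts, 0, parts.length - 3));
        return Optional.of(new Artifact(groupId, artifactId, version, fileName));
    }

    static Category classify(String path) {
        if (EXCLUDES.stream().anyMatch(p -> p.matcher(path).matches())) {
            return Category.IGNORED; // step 1: exclusion filter
        }
        // Step 2: resolvable files are CONSUMED, everything else is UNKNOWN.
        return resolveArtifact(path).isPresent() ? Category.CONSUMED : Category.UNKNOWN;
    }

    // Lists stand in for the database and the ARTIFACT_HEALTH table.
    static void scan(List<String> paths, List<Artifact> database, List<String> artifactHealth) {
        for (String path : paths) {
            switch (classify(path)) {
                case CONSUMED -> database.add(resolveArtifact(path).orElseThrow());
                case UNKNOWN -> artifactHealth.add(path);
                case IGNORED -> { } // neither stored nor reported
            }
        }
    }

    public static void main(String[] args) {
        List<Artifact> database = new ArrayList<>();
        List<String> health = new ArrayList<>();
        scan(List.of("org/example/demo/1.0/demo-1.0.jar",
                     "org/example/.svn/entries",
                     "org/example/demo/1.0/demo-1.0.jar~"), database, health);
        System.out.println(database); // the demo-1.0.jar Artifact
        System.out.println(health);   // [org/example/demo/1.0/demo-1.0.jar~]
    }
}
```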

CONSUMED Files

Include Pattern

Type

Consumed By

**/*.pom

MavenProject

Convert to Project Model.
Save Model to Database.
Auto Convert embedded <repositories>
Auto Convert embedded <pluginRepositories>
Lucene XML contents.
Lucene Effective POM contents.

**/*.jar

Artifact (jar)

Convert to Artifact Model.
Generate Missing Hashcodes.
Compute JDK Revision.
Determine Sealed.
Save Model to Database.
Lucene Archive TOC.
Lucene Classnames.
Lucene Public Methods.

**/*.ear

Artifact (ear)

(same as jar)

**/*.war

Artifact (war)

(same as jar)

**/*.car

Artifact (car)

(same as jar)

**/*.sar

Artifact (sar)

(same as jar)

**/*.mar

Artifact (mar)

(same as jar)

**/*.rar

Artifact (rar)

(same as jar)

**/*.dtd

Artifact (dtd)

Convert to Artifact Model.
Generate Missing Hashcodes.
Save Model to Database.
Lucene DTD contents.

**/*.tld

Artifact (tld)

Convert to Artifact Model.
Generate Missing Hashcodes.
Save Model to Database.
Lucene TLD contents.

**/*.tar.gz

Artifact (distribution)

Convert to Artifact Model.
Generate Missing Hashcodes.
Save Model to Database.
Lucene Archive TOC.

**/*.tar.bz2

Artifact (distribution)

(same as *.tar.gz)

**/*.zip

Artifact (distribution)

(same as *.tar.gz)

**/*.sha1

Hashcode

Compare Saved Hashcode with Actual Hashcode and report the result.

**/*.md5

Hashcode

Compare Saved Hashcode with Actual Hashcode and report the result.

**/*.asc

Signature

Report on signature validation.

**/maven-metadata.xml

Repository Metadata

Convert to Repository Model.
Cross Validate listed versions against versions available in the repository.
Save Model to Database.
Lucene XML contents.

**/*-site.xml

Site Metadata

Lucene file contents.

**/*.xml

Xml Content

Lucene file contents.

**/*.html

Html Content

Lucene file contents.

**/*.block

Auto-Xml/Text Content

Lucene file contents.

**/*.config

Auto-Xml/Text Content

Lucene file contents.

**/*.xsd

Xml Content

Lucene file contents.

**/*.txt

Text Content

Lucene file contents.

**/*.TXT

Text Content

Lucene file contents.

**/*.bar

Binary Content

- no direct consumption -

**/*.nbm

Binary Content

- no direct consumption -
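
Several include patterns overlap: `**/maven-metadata.xml` and `**/*-site.xml` are also matched by `**/*.xml`. The order of evaluation therefore matters. The minimal sketch below assumes first match wins, in table order; this page does not state the precedence rule. It returns the Type column for a path. Every pattern in the table has the form `**/<name>` or `**/*<suffix>`, so plain string tests on the file name are enough here. Matching is case-sensitive, as the separate `*.txt` and `*.TXT` rows suggest. `ConsumedTypes` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup of the CONSUMED table's Type column; not Archiva's actual code.
final class ConsumedTypes {

    // Include pattern -> Type, in table order. LinkedHashMap preserves insertion order,
    // so earlier (more specific) rows win over later, broader ones such as **/*.xml.
    private static final Map<String, String> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("**/*.pom", "MavenProject");
        // Types follow the table's "Artifact (<extension>)" convention.
        for (String ext : new String[] {"jar", "ear", "war", "car", "sar", "mar", "rar", "dtd", "tld"}) {
            PATTERNS.put("**/*." + ext, "Artifact (" + ext + ")");
        }
        for (String ext : new String[] {"tar.gz", "tar.bz2", "zip"}) {
            PATTERNS.put("**/*." + ext, "Artifact (distribution)");
        }
        PATTERNS.put("**/*.sha1", "Hashcode");
        PATTERNS.put("**/*.md5", "Hashcode");
        PATTERNS.put("**/*.asc", "Signature");
        PATTERNS.put("**/maven-metadata.xml", "Repository Metadata");
        PATTERNS.put("**/*-site.xml", "Site Metadata");
        PATTERNS.put("**/*.xml", "Xml Content");
        PATTERNS.put("**/*.html", "Html Content");
        PATTERNS.put("**/*.block", "Auto-Xml/Text Content");
        PATTERNS.put("**/*.config", "Auto-Xml/Text Content");
        PATTERNS.put("**/*.xsd", "Xml Content");
        PATTERNS.put("**/*.txt", "Text Content");
        PATTERNS.put("**/*.TXT", "Text Content");
        PATTERNS.put("**/*.bar", "Binary Content");
        PATTERNS.put("**/*.nbm", "Binary Content");
    }

    // Returns the Type of the first matching row, or empty if the path is not consumed.
    static Optional<String> typeOf(String relativePath) {
        String fileName = relativePath.substring(relativePath.lastIndexOf('/') + 1);
        for (Map.Entry<String, String> row : PATTERNS.entrySet()) {
            String namePattern = row.getKey().substring("**/".length());
            boolean matches = namePattern.startsWith("*")
                    ? fileName.endsWith(namePattern.substring(1)) // "*<suffix>", case-sensitive
                    : fileName.equals(namePattern);               // exact file name
            if (matches) {
                return Optional.of(row.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String path : new String[] {
                "org/example/demo/maven-metadata.xml",
                "org/example/demo/1.0/demo-1.0-site.xml",
                "org/example/demo/1.0/demo-1.0.tar.gz",
                "org/example/demo/1.0/demo-1.0.jar~"}) {
            System.out.println(path + " -> " + typeOf(path).orElse("(not consumed)"));
        }
    }
}
```

The ordered map keeps the precedence rule in the data rather than in the code, so adding a row means adding one `put` in the right place.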

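Both the Hashcode rows and the "Generate Missing Hashcodes" step come down to one operation: compute a file's digest and compare it with the hex string stored in the sidecar `.sha1` or `.md5` file. The minimal sketch below uses the JDK's `MessageDigest` and `HexFormat` (Java 17+). It assumes the sidecar holds the hex digest as its first whitespace-separated token, since some tools append the file name after it. `HashcodeCheck` and its result strings are hypothetical, not Archiva's report format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical hashcode checker; the result strings are illustrative only.
final class HashcodeCheck {

    // Streams the file through the digest and returns lowercase hex.
    static String digest(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm); // e.g. "SHA-1" or "MD5"
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return HexFormat.of().formatHex(md.digest());
    }

    // Compares artifact with its sidecar (e.g. demo-1.0.jar.sha1), generating the sidecar if missing.
    static String verifyOrGenerate(Path artifact, String extension, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        Path sidecar = artifact.resolveSibling(artifact.getFileName() + "." + extension);
        String actual = digest(artifact, algorithm);
        if (Files.notExists(sidecar)) {
            Files.writeString(sidecar, actual); // "Generate Missing Hashcodes"
            return "GENERATED";
        }
        // Assumption: the hex digest is the first whitespace-separated token of the sidecar.
        String content = Files.readString(sidecar).trim();
        String saved = content.isEmpty() ? "" : content.split("\\s+")[0];
        return saved.equalsIgnoreCase(actual)
                ? "OK"
                : "MISMATCH saved=" + saved + " actual=" + actual;
    }

    public static void main(String[] args) throws Exception {
        Path jar = Files.createTempDirectory("repo").resolve("demo-1.0.jar");
        Files.writeString(jar, "hello");
        System.out.println(verifyOrGenerate(jar, "sha1", "SHA-1")); // GENERATED
        System.out.println(verifyOrGenerate(jar, "sha1", "SHA-1")); // OK
        Files.writeString(jar, "tampered");
        System.out.println(verifyOrGenerate(jar, "sha1", "SHA-1")); // MISMATCH ...
    }
}
```
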
IGNORED Content

Content in this category is never indexed and never reported as bad or unknown. It exists on disk solely for the benefit of clients using Archiva.

Pattern

Reason

**/.htaccess

Web server specific content control mechanism.

**/KEYS

GPG public keys file. Not used directly by Archiva.

**/*.rb

Ruby script file.

**/*.sh

Shell script file.

**/.svn/**/*

Subversion Control Directory.

**/.DAV/**/*

DAV Server Control Directory.
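
`.svn` and `.DAV` are whole control directories, so a scanner can skip them without descending into them. It then only needs to match the remaining IGNORED patterns against file names. The minimal sketch below uses `Files.walkFileTree` and is one possible way to apply this exclusion list, not Archiva's implementation. It assumes the patterns apply at any depth, as the leading `**/` suggests. `IgnoredFilter` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical exclusion filter for the IGNORED patterns; not Archiva's actual code.
final class IgnoredFilter {

    // Directory names whose whole subtree is IGNORED (**/.svn/**/*, **/.DAV/**/*).
    private static final Set<String> IGNORED_DIRS = Set.of(".svn", ".DAV");
    // Exact file names that are IGNORED (**/.htaccess, **/KEYS).
    private static final Set<String> IGNORED_NAMES = Set.of(".htaccess", "KEYS");
    // File-name suffixes that are IGNORED (**/*.rb, **/*.sh).
    private static final List<String> IGNORED_SUFFIXES = List.of(".rb", ".sh");

    static boolean isIgnoredFile(String fileName) {
        return IGNORED_NAMES.contains(fileName)
                || IGNORED_SUFFIXES.stream().anyMatch(fileName::endsWith);
    }

    // Walks the repository and returns relative paths that survive the exclusion filter.
    static List<Path> scannable(Path repoRoot) throws IOException {
        List<Path> kept = new ArrayList<>();
        Files.walkFileTree(repoRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Prune control directories without reading their contents.
                Path name = dir.getFileName();
                return name != null && IGNORED_DIRS.contains(name.toString())
                        ? FileVisitResult.SKIP_SUBTREE
                        : FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (!isIgnoredFile(file.getFileName().toString())) {
                    kept.add(repoRoot.relativize(file));
                }
                return FileVisitResult.CONTINUE;
            }
        });
        kept.sort(null); // natural Path ordering
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path repo = Files.createTempDirectory("repo");
        Path dir = Files.createDirectories(repo.resolve("org/example/demo/1.0"));
        Files.writeString(dir.resolve("demo-1.0.jar"), "jar");
        Files.writeString(dir.resolve("deploy.sh"), "#!/bin/sh");
        Files.writeString(repo.resolve("KEYS"), "keys");
        Path svn = Files.createDirectories(repo.resolve("org/.svn"));
        Files.writeString(svn.resolve("entries"), "svn");
        System.out.println(scannable(repo)); // only org/example/demo/1.0/demo-1.0.jar
    }
}
```

Returning `SKIP_SUBTREE` avoids visiting every file inside a control directory, which is cheaper than testing `**/.svn/**/*` against each of them.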

UNKNOWN / BAD Content

Content that does not fit into the above categories is automatically placed in this category.
However, some UNKNOWN / BAD content is well understood and can have a 'Quick Fix' associated with it.

Pattern

Type

Quick Fix Option

**/*.bak

Backup File

Remove from repository

**/*~

Backup File

Remove from repository

**/*-

Backup File

Remove from repository

**/*.distribution-tgz

Distribution Artifact from M1

Rename to *.tar.gz

**/*.distribution-zip

Distribution Artifact from M1

Rename to *.zip

**/*.plugin

Plugin from M1

Rename to *.jar
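
The Quick Fix column maps each recognized bad pattern to either a deletion or a rename. The minimal sketch below plans those fixes without executing them. It assumes a rename means replacing the M1 suffix in place with the new extension. `QuickFix`, `Action`, and `Fix` are hypothetical names. A path that matches no row yields no fix and is left as UNKNOWN for manual review.

```java
import java.util.Optional;

// Hypothetical quick-fix planner; it only proposes fixes and never touches the disk.
final class QuickFix {

    enum Action { REMOVE, RENAME }

    // A proposed fix; target is null for REMOVE.
    record Fix(Action action, String path, String target) {}

    // M1 suffix -> replacement extension, from the table above.
    private static final String[][] RENAMES = {
        {".distribution-tgz", ".tar.gz"},
        {".distribution-zip", ".zip"},
        {".plugin", ".jar"},
    };

    static Optional<Fix> plan(String path) {
        // Backup files: **/*.bak, **/*~, **/*-
        if (path.endsWith(".bak") || path.endsWith("~") || path.endsWith("-")) {
            return Optional.of(new Fix(Action.REMOVE, path, null));
        }
        for (String[] rename : RENAMES) {
            if (path.endsWith(rename[0])) {
                String target = path.substring(0, path.length() - rename[0].length()) + rename[1];
                return Optional.of(new Fix(Action.RENAME, path, target));
            }
        }
        return Optional.empty(); // UNKNOWN with no quick fix
    }

    public static void main(String[] args) {
        System.out.println(plan("org/example/demo/1.0/demo-1.0.distribution-tgz"));
        System.out.println(plan("org/example/demo/1.0/demo-1.0.jar~"));
        System.out.println(plan("org/example/readme.odd"));
    }
}
```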
