Repository Scanning in Archiva
Archiva scans each repository periodically to determine what has changed in it.
- On the first scan, the entire repository is scanned.
- On subsequent scans, only content that is new or changed since the last scan is picked up.
- The scan is required to pick up content that arrives into the repository via non-monitored means.
- Content that arrives via a WebDAV PUT is automatically processed.
- Content that arrives via a Proxy Request is automatically processed.
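The difference between a first scan and a subsequent scan can be sketched as follows. This is only an illustration, not Archiva's implementation: it assumes that "changed since the last scan" can be approximated by comparing file modification times with the previous scan time, and the class and method names (`IncrementalScanSketch`, `changedSince`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: not part of Archiva's API.
class IncrementalScanSketch {

    /**
     * Lists regular files under root whose modification time is after lastScan.
     * Passing Instant.EPOCH makes this behave like a first, full scan.
     */
    static List<Path> changedSince(Path root, Instant lastScan) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                .filter(p -> modifiedAt(p).isAfter(lastScan))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static Instant modifiedAt(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            // Assumption: a file with an unreadable timestamp is treated as changed
            // so that it is not silently skipped.
            return Instant.MAX;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // First scan: everything under the root is picked up.
        changedSince(root, Instant.EPOCH).forEach(System.out::println);
    }
}
```

A caller would record the time at which each scan started and pass it as `lastScan` on the next run.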
The Scan Lifecycle.
- All content falls into 3 categories: CONSUMED, IGNORED, and UNKNOWN.
- CONSUMED content is content that is managed by Archiva.
- IGNORED content consists of generated content or transient content.
- UNKNOWN content is what falls through the cracks of the above 2 categories. Typically, this means the content doesn't conform to the repository structure, or is generally unknown.
The lifecycle of a scan is as follows.
- Perform a SCAN with an inclusion filter of "**/*" and an exclusion filter containing those elements predetermined to be IGNORED.
- On identification of a file, attempt to resolve it to an Artifact object.
- If a valid Artifact object is created, flag the file as CONSUMED and store the Artifact in the database.
- If the file cannot be converted to an Artifact object, flag it as UNKNOWN and create a report entry in the ARTIFACT_HEALTH database table.
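The lifecycle steps above can be sketched as a small classifier. This is a simplified illustration under several assumptions: Java's `glob:` path matcher stands in for the repository pattern syntax, a match against a short list of consumable patterns stands in for "a valid Artifact object is created", and nothing is written to a database. The names `ScanLifecycleSketch` and `classify` are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: not part of Archiva's API.
class ScanLifecycleSketch {

    enum Category { CONSUMED, IGNORED, UNKNOWN }

    // Abbreviated pattern lists; the full lists are in the tables of this document.
    private static final List<PathMatcher> IGNORED_PATTERNS =
        globs("**/.htaccess", "**/KEYS", "**/*.rb", "**/*.sh");
    private static final List<PathMatcher> CONSUMED_PATTERNS =
        globs("**/*.pom", "**/*.jar", "**/*.sha1", "**/*.md5", "**/maven-metadata.xml");

    private static List<PathMatcher> globs(String... patterns) {
        return Arrays.stream(patterns)
            .map(p -> FileSystems.getDefault().getPathMatcher("glob:" + p))
            .collect(Collectors.toList());
    }

    private static boolean matchesAny(List<PathMatcher> matchers, Path path) {
        return matchers.stream().anyMatch(m -> m.matches(path));
    }

    /** Classifies a repository-relative path such as "org/example/app/1.0/app-1.0.pom". */
    static Category classify(String repoRelativePath) {
        Path path = Paths.get(repoRelativePath);
        if (matchesAny(IGNORED_PATTERNS, path)) {
            return Category.IGNORED;   // excluded from the scan up front
        }
        if (matchesAny(CONSUMED_PATTERNS, path)) {
            return Category.CONSUMED;  // stand-in for successful Artifact resolution
        }
        return Category.UNKNOWN;       // would be reported, e.g. in ARTIFACT_HEALTH
    }

    public static void main(String[] args) {
        System.out.println(classify("org/example/app/1.0/app-1.0.pom"));
        System.out.println(classify("org/example/KEYS"));
        System.out.println(classify("org/example/app/1.0/app-1.0.jar.bak"));
    }
}
```

Note that Java's `**/` requires at least one directory component before the file name, so these patterns are meant for repository-relative paths with a group directory, not for bare file names.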
CONSUMED Files
Include Pattern | Type | Consumed By |
---|---|---|
**/*.pom | MavenProject | Convert to Project Model. |
**/*.jar | Artifact (jar) | Convert to Artifact Model. |
**/*.ear | Artifact (ear) | (same as jar) |
**/*.war | Artifact (war) | (same as jar) |
**/*.car | Artifact (car) | (same as jar) |
**/*.sar | Artifact (sar) | (same as jar) |
**/*.mar | Artifact (mar) | (same as jar) |
**/*.rar | Artifact (rar) | (same as jar) |
**/*.dtd | Artifact (dtd) | Convert to Artifact Model. |
**/*.tld | Artifact (dtd) | Convert to Artifact Model. |
**/*.tar.gz | Artifact (distribution) | Convert to Artifact Model. |
**/*.tar.bz2 | Artifact (distribution) | (same as *.tar.gz) |
**/*.zip | Artifact (distribution) | (same as *.tar.gz) |
**/*.sha1 | Hashcode | Report on Saved Hashcode to Actual Hashcode. |
**/*.md5 | Hashcode | Report on Saved Hashcode to Actual Hashcode. |
**/*.asc | Signature | Report on signature validation. |
**/maven-metadata.xml | Repository Metadata | Convert to Repository Model |
**/*-site.xml | Site Metadata | Lucene file contents. |
**/*.xml | Xml Content | Lucene file contents. |
**/*.html | Html Content | Lucene file contents. |
**/*.block | Auto-Xml/Text Content | Lucene file contents. |
**/*.config | Auto-Xml/Text Content | Lucene file contents. |
**/*.xsd | Xml Content | Lucene file contents. |
**/*.txt | Text Content | Lucene file contents. |
**/*.TXT | Text Content | Lucene file contents. |
**/*.bar | Binary Content | - no direct consumption - |
**/*.nbm | Binary Content | - no direct consumption - |
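For the `*.sha1` and `*.md5` entries, comparing the saved hashcode with the actual one might look like the sketch below. It assumes the checksum file holds a hex digest, optionally followed by whitespace and a file name; the class and method names (`ChecksumCheckSketch`, `hex`, `matchesSaved`) are hypothetical, and only the standard `java.security.MessageDigest` API is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: not part of Archiva's API.
class ChecksumCheckSketch {

    /** Returns the lowercase hex digest of data for "SHA-1" or "MD5". */
    static String hex(String algorithm, byte[] data) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Compares the actual digest of content with the saved one.
     * Assumption: the checksum file text starts with the hex digest,
     * optionally followed by whitespace and a file name.
     */
    static boolean matchesSaved(String algorithm, byte[] content, String checksumFileText)
            throws NoSuchAlgorithmException {
        String saved = checksumFileText.trim().split("\\s+")[0];
        return hex(algorithm, content).equalsIgnoreCase(saved);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String saved = "a9993e364706816aba3e25717850c26c9cd0d89d  abc.txt";
        System.out.println(matchesSaved("SHA-1", content, saved) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

A mismatch would be the kind of result that ends up in a report rather than causing the file to be rejected outright.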
IGNORED Content
Content in this category is never indexed, nor reported as bad or unknown. It exists on disk solely for the benefit of the client using Archiva.
Pattern | Reason |
---|---|
**/.htaccess | Web server specific content control mechanism. |
**/KEYS | GPG Signatures File. Not used by Archiva directly. |
**/*.rb | Ruby script file. |
**/*.sh | Shell script file. |
**/.svn/**/* | Subversion Control Directory. |
**/.DAV/**/* | DAV Server Control Directory. |
UNKNOWN / BAD Content
Content that does not fit into the above categories is automatically placed into this category.
However, some UNKNOWN / BAD content is well understood and can have a 'Quick Fix' associated with it.
Pattern | Type | Quick Fix Option |
---|---|---|
**/*.bak | Backup File | Remove from repository |
**/*~ | Backup File | Remove from repository |
**/*- | Backup File | Remove from repository |
**/*.distribution-tgz | Distribution Artifact from M1 | Rename to *.tar.gz |
**/*.distribution-zip | Distribution Artifact from M1 | Rename to *.zip |
**/*.plugin | Plugin from M1 | Rename to *.jar |
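The quick fixes listed above could be derived from a file name along these lines. This is a minimal sketch that only produces a textual suggestion and touches no files; `QuickFixSketch` and `suggest` are hypothetical names, and the `.distribution-zip` suffix spelling is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not part of Archiva's API.
class QuickFixSketch {

    // Legacy (M1) suffix mapped to its replacement suffix.
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put(".distribution-tgz", ".tar.gz");
        RENAMES.put(".distribution-zip", ".zip");
        RENAMES.put(".plugin", ".jar");
    }

    /** Returns a suggested quick fix for fileName, or empty if none is known. */
    static Optional<String> suggest(String fileName) {
        // Backup files: *.bak, *~ and *- are candidates for removal.
        if (fileName.endsWith(".bak") || fileName.endsWith("~") || fileName.endsWith("-")) {
            return Optional.of("remove " + fileName);
        }
        // Legacy artifacts: swap the M1 suffix for the current one.
        for (Map.Entry<String, String> rename : RENAMES.entrySet()) {
            String oldSuffix = rename.getKey();
            if (fileName.endsWith(oldSuffix)) {
                String stem = fileName.substring(0, fileName.length() - oldSuffix.length());
                return Optional.of("rename " + fileName + " to " + stem + rename.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(suggest("app-1.0.distribution-tgz").orElse("no quick fix"));
        System.out.println(suggest("app-1.0.jar").orElse("no quick fix"));
    }
}
```

Returning a suggestion instead of acting on it keeps the decision with the repository administrator, which matches the "option" wording in the table.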