Repository Scanning and Indexing
Assumption:
- default-archiva.xml is used for the configuration.
Classes
Below are some of the important classes of Repository Scanning:
Class | Implements | What does it do? |
---|---|---|
DefaultRepositoryScanner | RepositoryScanner | Uses plexus-utils' DirectoryWalker to scan the repository. |
RepositoryScannerInstance | DirectoryWalkListener | Listener that sets the trigger to start the consumers. |
RepositoryContentStatistics | generated by modello | Contains the statistics (duration, number of files discovered, etc.) of the repository scan. |
TriggerBeginScanClosure | Closure (commons-collections) | Signals to the consumer(s) that the repository scan is about to begin. |
DefaultBidirectionalRepositoryLayout | BidirectionalRepositoryLayout | Default bidirectional layout used by m2 repositories. |
ArchivaArtifact | | Archiva artifact object. |
ArchivaArtifactModel | generated by modello | Contains the detailed attributes of an Archiva artifact, such as groupId, artifactId, version and checksums. |
FileContentRecord | LuceneRepositoryContentRecord | Contains the contents of the artifact to be indexed. |
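
To make the idea of a bidirectional layout concrete, here is a minimal sketch of a two-way mapping between artifact coordinates and an m2 repository path. It is not Archiva's DefaultBidirectionalRepositoryLayout: the class `M2LayoutSketch`, its `Coordinates` holder and both method names are hypothetical, and the sketch ignores classifiers and timestamped snapshot versions. The commons-dbcp path is only an illustrative input.

```java
import java.util.Arrays;

/** Hypothetical, simplified two-way mapping for the default m2 layout (not Archiva's code). */
final class M2LayoutSketch {

    /** Minimal coordinate holder; the real ArchivaArtifactModel carries more, e.g. checksums. */
    static final class Coordinates {
        final String groupId;
        final String artifactId;
        final String version;
        final String type;

        Coordinates(String groupId, String artifactId, String version, String type) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.type = type;
        }
    }

    /** Coordinates to path: dots in the groupId become directory separators. */
    static String toPath(Coordinates c) {
        return c.groupId.replace('.', '/') + "/" + c.artifactId + "/" + c.version + "/"
                + c.artifactId + "-" + c.version + "." + c.type;
    }

    /** Path to coordinates: the last three segments are fixed, the rest is the groupId. */
    static Coordinates toCoordinates(String path) {
        String[] parts = path.split("/");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not an m2 layout path: " + path);
        }
        String fileName = parts[parts.length - 1];
        String version = parts[parts.length - 2];
        String artifactId = parts[parts.length - 3];
        String groupId = String.join(".", Arrays.copyOfRange(parts, 0, parts.length - 3));

        // Assumption: no classifier, so the file name is exactly artifactId-version.type.
        String prefix = artifactId + "-" + version + ".";
        if (!fileName.startsWith(prefix) || fileName.length() == prefix.length()) {
            throw new IllegalArgumentException("File name does not match its directory: " + path);
        }
        return new Coordinates(groupId, artifactId, version, fileName.substring(prefix.length()));
    }

    public static void main(String[] args) {
        Coordinates c = toCoordinates("commons-dbcp/commons-dbcp/1.0/commons-dbcp-1.0.jar");
        System.out.println(c.groupId + ":" + c.artifactId + ":" + c.version + ":" + c.type);
        System.out.println(toPath(c));
    }
}
```

Converting a path to coordinates and back should return the original path, which is the property that makes such a layout "bidirectional".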
Repository Content Consumers (KnownRepositoryContentConsumer)
These consumers are configured in archiva.xml, under <repositoryScanning>.
Class | Role Hint | What does it do? |
---|---|---|
ValidateChecksumConsumer | validate-checksum | Validates checksum files. |
LegacyConverterArtifactConsumer | artifact-legacy-to-default-converter | Converts legacy artifacts to m2 artifacts. |
ArtifactMissingChecksumConsumer | create-missing-checksums | Creates the checksum if it is missing. |
AutoRemoveConsumer | auto-remove | Removes files from the repository being scanned if the file type matches any of the file types configured for removal. |
AutoRenameConsumer | auto-rename | |
ArtifactUpdateDatabaseConsumer | update-db-artifact | Saves the artifact (in the form of an ArchivaArtifact) to the database. |
IndexContentConsumer | index-content | Processes the artifact's content into a FileContentRecord that is used for indexing. |
RepositoryPurgeConsumer | repository-purge | Removes old snapshots from the repository, either by the number of days old or by the retention count. (See Repository Purge section below) |
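
The role hints in the table are the identifiers by which consumers are selected. As a rough illustration only, the sketch below resolves a list of configured role hints against a registry of consumers. The `ConsumerRegistrySketch` class, its pared-down `FileConsumer` interface and the `recording` helper are all hypothetical and much simpler than Archiva's KnownRepositoryContentConsumer. Failing fast on an unknown role hint is a design choice of this sketch, not documented Archiva behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry that resolves configured role hints to consumer instances. */
final class ConsumerRegistrySketch {

    /** Pared-down consumer contract: an id (the role hint) and a per-file callback. */
    interface FileConsumer {
        String id();

        void processFile(String relativePath);
    }

    private final Map<String, FileConsumer> byRoleHint = new LinkedHashMap<>();

    void register(FileConsumer consumer) {
        byRoleHint.put(consumer.id(), consumer);
    }

    /** Returns the consumers named in the configuration, in configuration order. */
    List<FileConsumer> select(List<String> configuredIds) {
        List<FileConsumer> selected = new ArrayList<>();
        for (String id : configuredIds) {
            FileConsumer consumer = byRoleHint.get(id);
            if (consumer == null) {
                // Assumption of this sketch: an unknown role hint is a configuration error.
                throw new IllegalArgumentException("No consumer registered for role hint: " + id);
            }
            selected.add(consumer);
        }
        return selected;
    }

    /** Test helper: a consumer that only records "id:path" for each file it is given. */
    static FileConsumer recording(final String id, final List<String> log) {
        return new FileConsumer() {
            @Override
            public String id() {
                return id;
            }

            @Override
            public void processFile(String relativePath) {
                log.add(id + ":" + relativePath);
            }
        };
    }
}
```

Only consumers whose role hint appears in the configured list take part in a scan. Registered consumers that are not listed are ignored.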
The Process
- The user clicks 'Scan Repository Now' on the Repositories page.
- Repository scanning is triggered.
- Start scanning:
  - DefaultRepositoryScanner gathers the consumers (KnownContentConsumers and InvalidContentConsumers) from the config file. RepositoryScannerInstance is added as a DirectoryWalkListener to the plexus-utils DirectoryWalker, and the start-of-scan event is fired.
  - Every discovered file is checked against the configured include and exclude patterns. If it matches neither, it is excluded. If it is included, it is processed by the consumers. Each consumer performs a different action in its processFile(...) method.
  - Saving the artifact to the database is performed in the ArtifactUpdateDatabaseConsumer. An ArchivaArtifact, which has an ArchivaArtifactModel attribute, is constructed. The attributes of the ArchivaArtifactModel are gathered from the artifact itself; e.g. groupId, artifactId and version are derived from the artifact's file path.
  - Indexing the artifact happens in the IndexContentConsumer, which builds an index record containing the details of the artifact plus its contents. Please note that in default-archiva.xml, the bundled files are not included in the indexable-content fileType pattern. [1]
  - Once the repository scanning is finished, the scan statistics (number of files discovered, the consumers used, duration of the scan, the repository scanned, etc.) are displayed in the console.
- User performs a search:
  - The user types the query string and hits the Search button.
  - Archiva then searches its indices for the query string and returns the search results.
  - The user can click on an artifact to browse it. What the user actually browses is the pom. At the back end, Archiva checks whether the project model is already in the database. If it is not, Archiva constructs the ArchivaProjectModel object and saves it to the database. [1] Once the model is in the database, the pom info or artifact is displayed.
[1] This can lead to values that differ from those found when the actual pom file is read. The pom file may be invalid (e.g. it might declare a different version, as in the case of commons-dbcp-1.0 in MRM-376), and this was not detected when it was added to the database (MRM-409).
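
The include/exclude check and the hand-off to consumers described in the scanning steps can be sketched as follows. This is not DefaultRepositoryScanner. Instead of the plexus-utils DirectoryWalker it iterates over a plain list of relative paths and uses the JDK's glob PathMatcher, and it assumes that an exclude match overrides an include match. The `ScanFilterSketch` class, its `Stats` holder and the example patterns are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical scan loop: filter each discovered path, then pass it to every consumer. */
final class ScanFilterSketch {

    /** Two counters loosely modelled on the scan statistics. */
    static final class Stats {
        int discovered;
        int processed;
    }

    private final List<PathMatcher> includes;
    private final List<PathMatcher> excludes;

    ScanFilterSketch(List<String> includeGlobs, List<String> excludeGlobs) {
        this.includes = compile(includeGlobs);
        this.excludes = compile(excludeGlobs);
    }

    private static List<PathMatcher> compile(List<String> globs) {
        List<PathMatcher> matchers = new ArrayList<>();
        for (String glob : globs) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + glob));
        }
        return matchers;
    }

    private static boolean matchesAny(List<PathMatcher> matchers, Path path) {
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(path)) {
                return true;
            }
        }
        return false;
    }

    /** A path matching no include pattern is rejected; assumption: excludes win over includes. */
    boolean accepts(String relativePath) {
        Path path = Paths.get(relativePath);
        return matchesAny(includes, path) && !matchesAny(excludes, path);
    }

    Stats scan(List<String> relativePaths, List<Consumer<String>> consumers) {
        Stats stats = new Stats();
        for (String path : relativePaths) {
            stats.discovered++;
            if (!accepts(path)) {
                continue;
            }
            for (Consumer<String> consumer : consumers) {
                consumer.accept(path); // stands in for a consumer's processFile(...)
            }
            stats.processed++;
        }
        return stats;
    }

    public static void main(String[] args) {
        ScanFilterSketch scanner = new ScanFilterSketch(
                Arrays.asList("**/*.jar", "**/*.pom"), Arrays.asList("**/*-sources.jar"));
        List<String> seen = new ArrayList<>();
        List<Consumer<String>> consumers = new ArrayList<>();
        consumers.add(seen::add);
        Stats stats = scanner.scan(Arrays.asList(
                "org/example/demo/1.0/demo-1.0.jar",
                "org/example/demo/1.0/demo-1.0-sources.jar"), consumers);
        System.out.println(stats.discovered + " discovered, " + stats.processed + " processed: " + seen);
    }
}
```

Keeping the filter separate from the consumers means each consumer only has to implement its own per-file action.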
Finding an Artifact
- The user browses for the artifact they want to locate in the repositories.
- Archiva calculates the checksum of the selected artifact.
- The database is searched for the matching checksum using the ArtifactsByChecksumConstraint (it finds all artifacts where the calculated checksum matches either the SHA1 or the MD5 checksum of an artifact in the database).
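
The lookup above can be illustrated with a small in-memory sketch. It is not the ArtifactsByChecksumConstraint, which queries the database. Here a list stands in for the database, and the `ChecksumLookupSketch` class, its `StoredArtifact` holder and the method names are hypothetical. The sketch also assumes that both a SHA-1 and an MD5 digest of the selected file are computed and that a match on either one is enough.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory version of "find artifacts whose SHA1 or MD5 matches". */
final class ChecksumLookupSketch {

    /** Stand-in for a stored artifact row holding both checksums as hex strings. */
    static final class StoredArtifact {
        final String path;
        final String sha1;
        final String md5;

        StoredArtifact(String path, String sha1, String md5) {
            this.path = path;
            this.sha1 = sha1;
            this.md5 = md5;
        }
    }

    /** Lower-case hex digest of the content using a JDK MessageDigest algorithm name. */
    static String hexDigest(String algorithm, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest algorithm not available: " + algorithm, e);
        }
    }

    /** Returns every stored artifact whose SHA-1 or MD5 equals that of the given content. */
    static List<StoredArtifact> find(List<StoredArtifact> stored, byte[] content) {
        String sha1 = hexDigest("SHA-1", content);
        String md5 = hexDigest("MD5", content);
        List<StoredArtifact> matches = new ArrayList<>();
        for (StoredArtifact artifact : stored) {
            if (sha1.equalsIgnoreCase(artifact.sha1) || md5.equalsIgnoreCase(artifact.md5)) {
                matches.add(artifact);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("SHA-1", content));
        System.out.println(hexDigest("MD5", content));
    }
}
```

Matching on either digest means an artifact can still be found when only one of its two checksums was recorded.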