Repository Scanning and Indexing


Registry Listeners

A RegistryListener (plexus-registry) is an interface that receives a notification for every change in the Registry. A handful of classes in Archiva implement it and perform some processing every time the configuration changes.
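The listener pattern described here can be sketched in a few lines. This is only an illustration: ConfigChangeListener, SimpleRegistry and RecordingListener are hypothetical stand-ins written for this page, not the plexus-registry API or any Archiva class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a registry listener; NOT the plexus-registry interface.
interface ConfigChangeListener {
    void afterConfigurationChange(String propertyName, Object propertyValue);
}

// Hypothetical minimal registry that notifies every listener on every change.
class SimpleRegistry {
    private final Map<String, Object> values = new HashMap<>();
    private final List<ConfigChangeListener> listeners = new ArrayList<>();

    void addChangeListener(ConfigChangeListener listener) {
        listeners.add(listener);
    }

    void setValue(String propertyName, Object propertyValue) {
        values.put(propertyName, propertyValue);
        // Each registered component gets a chance to react to the change.
        for (ConfigChangeListener listener : listeners) {
            listener.afterConfigurationChange(propertyName, propertyValue);
        }
    }

    Object getValue(String propertyName) {
        return values.get(propertyName);
    }
}

// Example listener: records which properties changed so it could refresh its state.
class RecordingListener implements ConfigChangeListener {
    final List<String> changedProperties = new ArrayList<>();

    @Override
    public void afterConfigurationChange(String propertyName, Object propertyValue) {
        changedProperties.add(propertyName);
    }
}
```

A component registered this way is called back on each setValue, which is the general idea behind classes reacting to configuration changes.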

| Class | What does it do? |
|---|---|
| DuplicateArtifactsConsumer | Checks for duplicate artifacts using the SHA1 checksum. |
| LocationArtifactsConsumer | Validates that the location of the artifact in the repository is correct based on the groupId, artifactId and version specified in the pom. |
| ArtifactMissingChecksumConsumer | Creates missing checksums for the artifact. |
| ArtifactUpdateDatabaseConsumer | |
| AutoRemoveConsumer | |
| IndexArtifactConsumer | Indexes the artifact checksums for the 'Find Artifact' functionality. It stores the data as hashcodes in the index (HashcodesRecord). |
| IndexContentConsumer | |
| ProjectModelToDatabaseConsumer | Updates the database with project model info. |
| ActiveManagedRepositories | Provides a real-time listing of the active managed repositories within Archiva. |
| ConfigurationSynchronization | Synchronizes the repositories in the configuration file with the database. |
| DefaultArchivaConfiguration | Configuration holder that retrieves the configuration from the registry. |
| DefaultCrossRepositorySearch | Searches across repositories in Lucene indices. It determines which repositories are managed and indexed. |
| DefaultRepositoryProxyConnectors | Handlers for potential repository proxy connectors. |
| DefaultArchivaTaskScheduler | Default scheduling component for Archiva. |
| BidirectionalRepositoryLayoutFactory | Creates a BidirectionalRepositoryLayout. |
| RepositoryProjectModelResolverFactory | Creates ProjectModelResolver objects. |
| RepositoryServlet | |




Database Scanning

The database is scanned, and specific consumers process the artifacts found in it.

Database Consumers

There are 2 types of database consumers:

  1. Unprocessed consumers - consumers for artifacts that are already in the index but have not been processed yet, meaning the details about the artifact have not yet been extracted and stored in the database
  2. Cleanup consumers - consumers for cleaning up the database

These consumers are configured in archiva.xml, under <databaseScanning>. Below are the different types of Database Consumers:

| Class | Role Hint | Type | What does it do? |
|---|---|---|---|
| ProjectModelToDatabaseConsumer | updated-db-project | unprocessed consumer | Gets the details of the artifact from the pom and saves it into the database (as a project model) |
| DatabaseCleanupArtifactConsumer | not-present-remove-db-artifact | cleanup consumer | Cleans the database of artifacts that are no longer in the repository |
| DatabaseCleanupProjectConsumer | not-present-remove-db-project | cleanup consumer | Cleans the database of project models of artifacts that are no longer in the repository |
| DatabaseCleanupLuceneConsumer | not-present-remove-indexed | cleanup consumer | Cleans up the index of artifacts that are no longer in the repository |
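The difference between the two consumer types can be illustrated with a small in-memory model. This is a sketch under stated assumptions: DatabaseScanSketch and its methods are hypothetical names, the "database" is just a map, and none of this is Archiva's actual consumer code.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a database scan; not Archiva's actual classes.
class DatabaseScanSketch {
    // artifact path -> stored details (stands in for the database)
    final Map<String, String> database = new HashMap<>();

    // "Unprocessed" step: store details for indexed artifacts with no database entry yet.
    int processUnprocessed(Set<String> indexedArtifacts) {
        int processed = 0;
        for (String artifact : indexedArtifacts) {
            if (!database.containsKey(artifact)) {
                database.put(artifact, "details of " + artifact);
                processed++;
            }
        }
        return processed;
    }

    // "Cleanup" step: drop entries whose artifact is no longer in the repository.
    int cleanup(Set<String> artifactsInRepository) {
        int removed = 0;
        Iterator<String> it = database.keySet().iterator();
        while (it.hasNext()) {
            if (!artifactsInRepository.contains(it.next())) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Running processUnprocessed twice over the same set does nothing the second time, while cleanup only ever shrinks the stored data.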




Repository Purge

Removes old snapshots from the managed repository based on one of two criteria: number of days old or retention count. There is also an option to enable or disable the cleanup of released snapshots from the repository.

Classes

Below are the classes for Repository Purge:

| Class | Implements | What does it do? |
|---|---|---|
| RepositoryPurgeConsumer | KnownContentConsumer | Consumer for removing old snapshots from the managed repository. |
| DaysOldRepositoryPurge | RepositoryPurge | Removes old snapshots by the number of days old. |
| RetentionCountRepositoryPurge | RepositoryPurge | Removes old snapshots while retaining a specific number of them. |
| CleanupReleasedSnapshotsRepositoryPurge | RepositoryPurge | Removes old snapshots that have already been released. |
| ArtifactFilenameFilter | FilenameFilter (java.io) | Filters the filenames from the directory listing by checking whether each one matches a specific filename. |
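For readers unfamiliar with java.io.FilenameFilter, a minimal sketch of such a filter follows. PrefixFilenameFilterSketch is a hypothetical class, and it assumes that "matches" means "starts with the given name" so that associated files such as poms are picked up; the real ArtifactFilenameFilter may match differently.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical filter; assumes "matches" means "starts with the given name".
class PrefixFilenameFilterSketch implements FilenameFilter {
    private final String filenamePrefix;

    PrefixFilenameFilterSketch(String filenamePrefix) {
        this.filenamePrefix = filenamePrefix;
    }

    // Called once per directory entry by File.list(...) or File.listFiles(...).
    @Override
    public boolean accept(File dir, String name) {
        return name.startsWith(filenamePrefix);
    }
}
```

An instance can be passed to File.listFiles(FilenameFilter) to get only the files belonging to one artifact version.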


Configuration (for Archiva Users)

  1. To enable repository purge, add "repository-purge" to the <knownContentConsumers> section of archiva.xml. The RepositoryPurgeConsumer will be executed when repository scanning is started.
  2. The user can choose whether to purge snapshots older than a specific number of days OR to purge snapshots while retaining a specific number of them. This is configured by setting the "Repository Purge By Days Older Than" or "Repository Purge By Retention Count" fields in the Add/Edit Repository page. By default, these have the values "100" and "2" respectively. If "Repository Purge By Days Older Than" is NOT EQUAL TO 0 (zero), then it is the criterion used for the repository purge. Otherwise, if it is EQUAL TO 0 (zero), the "Repository Purge By Retention Count" criterion is used instead.
  3. To enable/disable the cleanup of released snapshots in the repository, the user can opt to check or uncheck the "Delete Released Snapshots" option in the Add/Edit Repository page.
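The rule for choosing between the two criteria can be written down as a small sketch. PurgeSettingsSketch and PurgeCriterion are hypothetical names for illustration; the defaults of 100 and 2 come from the text above, while the default of false for deleteReleasedSnapshots is an assumption, since the page does not state it.

```java
// Hypothetical enum naming the two purge criteria.
enum PurgeCriterion { DAYS_OLD, RETENTION_COUNT }

// Hypothetical holder for the three purge settings of a repository.
class PurgeSettingsSketch {
    final int daysOlder;
    final int retentionCount;
    final boolean deleteReleasedSnapshots;

    PurgeSettingsSketch(int daysOlder, int retentionCount, boolean deleteReleasedSnapshots) {
        this.daysOlder = daysOlder;
        this.retentionCount = retentionCount;
        this.deleteReleasedSnapshots = deleteReleasedSnapshots;
    }

    // Defaults from the text: 100 days and a retention count of 2.
    // deleteReleasedSnapshots = false is an assumption, not stated in the source.
    static PurgeSettingsSketch defaults() {
        return new PurgeSettingsSketch(100, 2, false);
    }

    // A non-zero daysOlder selects the days-old criterion; zero falls back to retention count.
    PurgeCriterion selectedCriterion() {
        return daysOlder != 0 ? PurgeCriterion.DAYS_OLD : PurgeCriterion.RETENTION_COUNT;
    }
}
```

With the default values, the days-old criterion is selected; setting daysOlder to 0 switches to the retention count.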

The Process

  1. RepositoryPurgeConsumer is executed during repository scanning. Only those "artifact" file types are consumed (<fileType> with "artifact" id in archiva.xml).
  2. The consumer checks whether the deleteReleasedSnapshots field (in RepositoryConfiguration) is enabled. If so, it executes CleanupReleasedSnapshotsRepositoryPurge.
    • CleanupReleasedSnapshotsRepositoryPurge removes all released snapshots from the repository. For example: 1.2, 1.3-SNAPSHOT and 1.3 exist for artifactX in the repo. 1.3-SNAPSHOT will be removed since 1.3 already exists (therefore it has already been released). All metadata files are updated based on the remaining versions of the artifact in the repository.
  3. The consumer also checks the value of the daysOlder field in the configuration of the repository being scanned. If it is not set to 0 (zero), the consumer executes DaysOldRepositoryPurge. Otherwise, it executes RetentionCountRepositoryPurge.
    • DaysOldRepositoryPurge checks when the discovered SNAPSHOT artifact was last modified; if it is older than X (daysOlder value) days, the artifact is removed from the repository.
    • RetentionCountRepositoryPurge, on the other hand, checks whether the number of "unique versioned" snapshot artifacts in the directory where the discovered artifact resides is GREATER THAN the retentionCount value. If it is, the oldest snapshot artifacts (including associated poms, source jars, javadoc jars, etc.) are removed until the total number of unique versioned artifacts is EQUAL TO the retention count. For example, the discovered artifact is ../artifactX/2.0-SNAPSHOT/artifactX-2.0-SNAPSHOT.jar. RetentionCountRepositoryPurge gets a list of the files in the ../artifactX/2.0-SNAPSHOT directory. Let's say ../artifactX/2.0-SNAPSHOT has the following contents: artifactX-2.0-1111111-1.jar, artifactX-2.0-1111111-1.pom, artifactX-2.0-1111100-2.jar, artifactX-2.0-1111100-2.pom, artifactX-2.0-SNAPSHOT.jar and artifactX-2.0-SNAPSHOT.pom. If the retention count is 2, then artifactX-2.0-1111111-1.jar and artifactX-2.0-1111111-1.pom are removed from the repo, and the 2 newest artifacts (and their associated files, in this case the poms) are retained.
  4. For all these RepositoryPurge implementations, all artifacts removed from the repository are also removed from the database. [1]

[1] There is an open issue related to this; please see MRM-455. There is also an open issue regarding the index update after a repository purge: MRM-454.
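The three checks in the process above can be approximated as follows. This is a sketch, not Archiva's implementation: PurgeChecksSketch and its methods are hypothetical, "older than X days" is assumed to mean strictly more than X days, and the retention check assumes the caller supplies the unique snapshot versions already sorted oldest first, since the page does not spell out how age is determined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers mirroring the three purge checks; not Archiva's code.
class PurgeChecksSketch {

    // Released-snapshot check: "1.3-SNAPSHOT" counts as released if "1.3" is present.
    static boolean isReleasedSnapshot(String version, List<String> versionsInRepo) {
        String suffix = "-SNAPSHOT";
        if (!version.endsWith(suffix)) {
            return false; // not a snapshot at all
        }
        String release = version.substring(0, version.length() - suffix.length());
        return versionsInRepo.contains(release);
    }

    // Days-old check: true if the last modification is more than daysOlder days before now.
    // Assumption: "older" is a strict comparison.
    static boolean isOlderThan(long lastModifiedMillis, long nowMillis, int daysOlder) {
        return nowMillis - lastModifiedMillis > TimeUnit.DAYS.toMillis(daysOlder);
    }

    // Retention-count check: given unique snapshot versions sorted oldest first
    // (an assumption about the input), return the versions to remove so that
    // exactly retentionCount remain. Returns an empty list if nothing exceeds the count.
    static List<String> versionsToRemove(List<String> oldestFirst, int retentionCount) {
        int excess = oldestFirst.size() - retentionCount;
        if (excess <= 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(oldestFirst.subList(0, excess));
    }
}
```

Applied to the example in step 3, three unique versions with a retention count of 2 yield a single version to remove, along with its associated files.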




Repository Browse

Browse artifacts in the repository...


Repository Search

Search for artifacts from the managed repositories...


Reporting

Repository problems report..


Repository Configuration (Managed and Remote Repository)




Proxy Connectors

Please see Archiva Proxy Connector page.


Network Proxies


