Automated metadata discovery analyzes a data source to create metadata that is linked to the asset description in Apache Atlas.
The automated metadata discovery project is in its very early stages. The proposal is to bring the Open Discovery Framework (ODF) and selected plug-in components from IBM's InfoSphere suite into Apache Atlas as the basis for an automated metadata discovery capability. The approach is to move the existing code to a public Git repository and then selectively contribute it through the JIRA process.
Automated metadata discovery has a number of requirements:
- Each data source, and the organization that creates it, has its own standards and approaches that provide context for the metadata discovery process. Discovery therefore needs to support plugging in components that encode this local knowledge.
- The work of one discovery component needs to feed into more sophisticated discovery components. This implies that Apache Atlas needs to control pipelines of discovery components.
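
The two requirements above can be illustrated with a minimal Python sketch. It is not the ODF or Apache Atlas API: every name in it (`DiscoveryPipeline`, `column_profiler`, `make_term_matcher`, `discover_example`) is hypothetical and chosen only for illustration. It assumes a discovery component can be modeled as a function that receives a data sample plus the annotations produced so far and returns new annotations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the ODF or Apache Atlas API.
Annotations = Dict[str, object]
# A component takes the data sample plus the annotations produced so far
# and returns the new annotations it discovered.
DiscoveryComponent = Callable[[List[dict], Annotations], Annotations]


def column_profiler(rows, annotations):
    """Generic component: record the column names seen in the sample."""
    columns = sorted({name for row in rows for name in row})
    return {"columns": columns}


def make_term_matcher(local_terms):
    """Plug-in factory: encodes one organization's naming conventions."""
    def term_matcher(rows, annotations):
        # Builds on the profiler's output instead of re-reading the data.
        matched = {
            column: local_terms[column]
            for column in annotations.get("columns", [])
            if column in local_terms
        }
        return {"business_terms": matched}
    return term_matcher


@dataclass
class DiscoveryPipeline:
    """Runs registered components in order, passing results downstream."""
    components: List[DiscoveryComponent] = field(default_factory=list)

    def register(self, component):
        self.components.append(component)
        return self

    def run(self, rows):
        annotations = {}
        for component in self.components:
            annotations.update(component(rows, annotations))
        return annotations


def discover_example():
    rows = [{"cust_id": 1, "dob": "1980-01-01"}, {"cust_id": 2, "zip": "12345"}]
    pipeline = DiscoveryPipeline()
    pipeline.register(column_profiler)
    pipeline.register(
        make_term_matcher({"cust_id": "Customer Identifier", "dob": "Date of Birth"})
    )
    return pipeline.run(rows)
```

In this sketch the term matcher stands in for a locally supplied plug-in, and the pipeline stands in for the control that Apache Atlas would need over the order in which components run. A real design would also have to cover concerns the sketch ignores, such as failure handling and writing the results back to the asset description.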
This work is being managed under ATLAS-1748 - Automated Metadata Discovery (status: Open).