Atlas provides governance capabilities in Hadoop that use both prescriptive and forensic models, enriched by business taxonomical metadata. At its core, Atlas is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements.
The core capabilities defined by the project include the following:
- Data Classification – to create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources
- Centralized Auditing – to provide a framework for capturing and reporting on access to and modifications of data within Hadoop
- Search and Lineage – to allow pre-defined and ad-hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed
- Security and Policy Engine – to protect data and rationalize data access according to compliance policy.
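As a rough illustration of the classification, search, and lineage ideas in the list above, the sketch below models data sets as records that carry classification tags and the names of the data sets they were built from. Every name here (`Dataset`, `find_by_tag`, `upstream_lineage`, and the sample tables) is hypothetical and chosen for the example; none of it is the Atlas API or its storage model.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical metadata record: a name, classification tags, and direct inputs."""
    name: str
    tags: set = field(default_factory=set)
    inputs: list = field(default_factory=list)  # names of the datasets this one was built from


def find_by_tag(catalog, tag):
    """Search: return the sorted names of all datasets classified with `tag`."""
    return sorted(name for name, ds in catalog.items() if tag in ds.tags)


def upstream_lineage(catalog, name):
    """Lineage: return every dataset that `name` was directly or indirectly
    built from, nearest ancestors first (breadth-first walk over inputs)."""
    seen = []
    queue = list(catalog[name].inputs)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(catalog[current].inputs)
    return seen


# Made-up sample catalog, for illustration only.
catalog = {
    "raw_claims": Dataset("raw_claims", tags={"PII"}),
    "raw_members": Dataset("raw_members", tags={"PII", "PHI"}),
    "claims_joined": Dataset("claims_joined", inputs=["raw_claims", "raw_members"]),
    "claims_report": Dataset("claims_report", tags={"FINANCE"}, inputs=["claims_joined"]),
}
```

With this toy catalog, `find_by_tag(catalog, "PII")` lists the two raw tables, and `upstream_lineage(catalog, "claims_report")` walks back from the report through the joined table to both raw sources.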
The Atlas community plans to deliver these capabilities with the following components:
- Flexible Knowledge Store,
- Advanced Policy Rules Engine,
- Agile Auditing,
- Support for specific data lifecycle management workflows built on the Apache Falcon framework, and
- Integration and extension of Apache Ranger to add real-time, attribute-based access control to Ranger’s already strong role-based access control capabilities.
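To make the distinction in the last item concrete, here is a minimal sketch of how an attribute-based check might be layered on top of a role-based one. It is an assumption-laden toy: the function names, the dictionary shapes, and the rule that a user must hold a clearance for every tag on a resource are all invented for illustration and do not reflect Ranger's or Atlas's actual policy model.

```python
def role_allows(user_roles, required_role):
    """Role-based check: the user must hold the role the resource requires."""
    return required_role in user_roles


def attributes_allow(resource_tags, user_clearances):
    """Attribute-based check (hypothetical rule): every classification tag on
    the resource must be covered by one of the user's clearances."""
    return set(resource_tags) <= set(user_clearances)


def is_access_allowed(user, resource):
    """Grant access only when both the role check and the attribute check pass."""
    return (
        role_allows(user["roles"], resource["required_role"])
        and attributes_allow(resource["tags"], user["clearances"])
    )


# Made-up users and resource, for illustration only.
analyst = {"roles": {"analyst"}, "clearances": {"PII"}}
auditor = {"roles": {"analyst", "auditor"}, "clearances": {"PII", "PHI"}}
members_table = {"required_role": "analyst", "tags": {"PII", "PHI"}}
```

In this sketch both users satisfy the role requirement, but only `auditor` also holds clearances for every tag on `members_table`, so `is_access_allowed(analyst, members_table)` is `False` while `is_access_allowed(auditor, members_table)` is `True`.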
Atlas targets a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop while ensuring integration with the whole data ecosystem. Apache Atlas is organized around two guiding principles:
- Metadata Truth in Hadoop: Atlas should provide true visibility in Hadoop. By using both prescriptive and forensic models, Atlas provides technical and operational audit as well as lineage enriched by business taxonomical metadata. Atlas also makes metadata exchange easy: any metadata consumer can share a common metadata store, which enables interoperability across many metadata producers.
- Developed in the Open: Engineers from Aetna, JPMorgan Chase, Merck, SAS, Schlumberger, and Target are working together to help ensure Atlas is built to solve real data governance problems across a wide range of industries that use Hadoop. This approach is an example of open source community innovation that helps accelerate product maturity and time-to-value for the data-first enterprise.
Stay Tuned for More to Come
Video from Brussels Summit