Enrichments add context to a streaming message. For example, if a message contains an external IP address, an enrichment could tag that message with geo data. Similarly, if a message contains a domain name, an enrichment could tag it with the domain's WHOIS entry. There are four primary benefits to adding context to a message via enrichments:
- Correlation: if you know which user and asset a message is intended for and where it is coming from, it is easier to correlate it with other related messages
- ML: having full context in the stream allows messages to be scored against ML models in real time, rather than gathering the context in batch and then applying the model in batch
- Accuracy: the underlying enrichment information is constantly changing (users sign on and off, machines change IPs, etc.), so you want to enrich as close to capture time as possible
- Investigation: having full context for a given piece of metadata or alert means fewer consoles to fumble through and gets us closer to a 'single pane of glass' interface
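The GeoIP example from the opening paragraph can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Metron's implementation: the message is modeled as a plain `Map`, the in-memory `GEO_TABLE` and its placeholder values are hypothetical stand-ins for the real GeoIP store, and the `<field>.geo.<attribute>` output naming is an invented convention. "External" is approximated with the JDK's `InetAddress` checks for loopback, wildcard, link-local, and private (site-local) addresses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of GeoIP tagging; GEO_TABLE and the "<field>.geo.<attr>" names are hypothetical.
class GeoTagExample {

    // Hypothetical stand-in for the GeoIP enrichment store; values are placeholders, not real geo data.
    static final Map<String, Map<String, String>> GEO_TABLE = Map.of(
        "203.0.113.7", Map.of("city", "ExampleCity", "country", "EX", "lat", "0.0", "lon", "0.0"));

    // Approximates "external": not loopback, wildcard, link-local, or private (site-local) address.
    // Assumes the field holds an IP literal; a hostname here would trigger a DNS lookup.
    static boolean isExternal(String ip) {
        try {
            InetAddress addr = InetAddress.getByName(ip);
            return !(addr.isLoopbackAddress() || addr.isAnyLocalAddress()
                    || addr.isLinkLocalAddress() || addr.isSiteLocalAddress());
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Returns a copy of the message with geo attributes added for each external IP found in GEO_TABLE.
    static Map<String, Object> enrich(Map<String, Object> message) {
        Map<String, Object> enriched = new HashMap<>(message);
        for (String field : new String[] {"src_ip", "dst_ip"}) {
            Object value = message.get(field);
            if (!(value instanceof String) || !isExternal((String) value)) {
                continue;
            }
            Map<String, String> geo = GEO_TABLE.get(value);
            if (geo != null) {
                geo.forEach((attr, v) -> enriched.put(field + ".geo." + attr, v));
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, Object> msg = new HashMap<>();
        msg.put("src_ip", "10.0.0.5");    // private address: skipped
        msg.put("dst_ip", "203.0.113.7"); // treated as external: tagged
        Map<String, Object> enriched = enrich(msg);
        System.out.println(enriched.get("dst_ip.geo.country"));         // EX
        System.out.println(enriched.containsKey("src_ip.geo.country")); // false
    }
}
```

Returning a copy rather than mutating the input keeps the sketch side-effect free; a real streaming component may well choose to tag the message in place.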
Metron currently provides an extensible framework for plugging in enrichments. Each enrichment has two components: an enrichment data source and an enrichment bolt.
Before an enrichment capability can be enabled within Metron, the enrichment store (which for Metron is primarily HBase) has to be loaded with enrichment data. Enrichment data can either be bulk loaded from HDFS or streamed into the enrichment store via a pluggable loading framework. The enrichment loader transforms the enrichment data into a JSON format that Metron understands. The loading framework can also age data out of the enrichment stores based on time.

Once the stores are loaded, an enrichment bolt that can interact with the enrichment store can be incorporated into the enrichment topology. Each enrichment bolt can enrich a specific field or tag within a Metron message. When a bolt recognizes that it is able to enrich a field, it reaches into the enrichment store, pulls out the enrichment, and tags the message with it. The enrichment is then stored in the bolt's in-memory cache. Metron uses Storm's underlying routing capabilities to make sure that messages with the same enrichment values are sent to the bolts that already have those values cached in memory. This is what gives Metron its scale and speed advantage over other big data streaming systems that lack this capability.
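The caching and routing behavior described above can be sketched as follows. This is a hypothetical illustration, not Metron's bolt code: `EnrichmentTask` stands in for one enrichment bolt task, a `Function` stands in for the HBase lookup, and `route` hashes the field value to a task index. That mirrors the guarantee a Storm fields grouping provides (equal values reach the same task) without claiming Storm's exact hashing scheme. The bounded cache uses `LinkedHashMap` in access-order mode with `removeEldestEntry` to get LRU eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of value-based routing plus a per-task in-memory cache; not Metron's bolt code.
class CachedEnrichmentSketch {

    // Bounded LRU cache: access-order LinkedHashMap that evicts its eldest entry when over capacity.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least- to most-recently used
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    // Stands in for one enrichment bolt task: cache first, enrichment store only on a miss.
    static final class EnrichmentTask {
        final LruCache<String, String> cache;
        final Function<String, String> store; // hypothetical stand-in for an HBase lookup
        int storeLookups = 0;

        EnrichmentTask(int capacity, Function<String, String> store) {
            this.cache = new LruCache<>(capacity);
            this.store = store;
        }

        String enrich(String value) {
            String cached = cache.get(value);
            if (cached != null) {
                return cached;
            }
            storeLookups++;
            String fetched = store.apply(value);
            if (fetched != null) {
                cache.put(value, fetched);
            }
            return fetched;
        }
    }

    // Equal values always map to the same task index, the property a Storm fields grouping provides.
    static int route(String value, int numTasks) {
        return Math.floorMod(value.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        EnrichmentTask[] tasks = new EnrichmentTask[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = new EnrichmentTask(1000, ip -> "host-for-" + ip);
        }
        String[] stream = {"10.0.0.5", "10.0.0.9", "10.0.0.5", "10.0.0.5", "10.0.0.9"};
        for (String ip : stream) {
            tasks[route(ip, numTasks)].enrich(ip);
        }
        int lookups = 0;
        for (EnrichmentTask task : tasks) {
            lookups += task.storeLookups;
        }
        System.out.println(lookups); // 2: each distinct IP reaches the store once
    }
}
```

Because equal values always land on the same task, each distinct value costs one store lookup for as long as it stays cached. With random (shuffle-style) routing, every task could miss on the same value and query the store separately.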
The following enrichments are currently supported in Metron:
||Enrichment||Description||Enrichment Store||Enrichment Source||Metron Message Field Name(s)||Loader Type||Refresh Rate||Metron Enrichment Architecture||
|GeoIP|Tags GeoIP data (lat-lon coordinates plus city/state/country) onto any external IP address. This can be applied to both alerts and metadata telemetry so that they can be mapped to a geo location.|MySQL| |src_ip, dst_ip|Bulk from HDFS|Once every 3 months|Geo Enrichment|
|Asset|Given an IP, determine the hostname of the asset, then use that hostname to retrieve everything else known about the asset from LDAP, AD, or enterprise inventory stores.|HBase|LDAP, AD, DNS logs, enterprise inventory stores|src_ip, dst_ip|Not yet provided (roadmap item)|Once every hour|Asset Enrichment|
|User|Given a session or an alert for a certain IP-application pair, identify the user to whom that session or alert belongs.|HBase|LDAP, AD, proxy logs|src_ip + application|Not yet provided (roadmap item)|Once every 5 minutes|User Enrichment|
|More to come...|
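The key fields and refresh rates in the table above can be captured in a small catalog like the one below. This is a hypothetical encoding for illustration only: the `Enrichment` enum, the `|` separator used to build composite keys such as `src_ip + application`, and the 90-day approximation of "once every 3 months" are assumptions, not Metron configuration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical encoding of the enrichment table; names and key format are illustrative assumptions.
class EnrichmentCatalog {

    enum Enrichment {
        // Each inner list is one lookup key; GeoIP and Asset enrich src_ip and dst_ip independently.
        GEOIP(List.of(List.of("src_ip"), List.of("dst_ip")), Duration.ofDays(90)), // "once every 3 months", approximated
        ASSET(List.of(List.of("src_ip"), List.of("dst_ip")), Duration.ofHours(1)),
        // User enrichment needs a composite key built from src_ip + application.
        USER(List.of(List.of("src_ip", "application")), Duration.ofMinutes(5));

        final List<List<String>> keyFieldSets;
        final Duration refreshRate;

        Enrichment(List<List<String>> keyFieldSets, Duration refreshRate) {
            this.keyFieldSets = keyFieldSets;
            this.refreshRate = refreshRate;
        }

        // One lookup key per field set whose fields are all present; composite parts are joined with '|'.
        List<String> lookupKeys(Map<String, String> message) {
            List<String> keys = new ArrayList<>();
            for (List<String> fields : keyFieldSets) {
                if (!message.keySet().containsAll(fields)) {
                    continue;
                }
                StringJoiner key = new StringJoiner("|");
                for (String field : fields) {
                    key.add(message.get(field));
                }
                keys.add(key.toString());
            }
            return keys;
        }

        // True once the store's data is older than the table's refresh rate.
        boolean isStale(Instant lastLoaded, Instant now) {
            return Duration.between(lastLoaded, now).compareTo(refreshRate) > 0;
        }
    }

    public static void main(String[] args) {
        Map<String, String> msg = Map.of("src_ip", "10.0.0.5", "dst_ip", "203.0.113.7", "application", "ssh");
        System.out.println(Enrichment.GEOIP.lookupKeys(msg)); // [10.0.0.5, 203.0.113.7]
        System.out.println(Enrichment.USER.lookupKeys(msg));  // [10.0.0.5|ssh]
        Instant loaded = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(Enrichment.USER.isStale(loaded, loaded.plus(Duration.ofMinutes(6)))); // true
    }
}
```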