Metron Tutorial - Fundamentals Part 6: Streaming Enrichment

As you saw in part 2, HBase makes it easy to enrich data. In that tutorial, you learned how to load data from a flat CSV file into HBase. Some data, however, is not static but arrives as a constant stream. User enrichment sources are often like this: capturing login events and associating them with source IPs is a good way to tie data flowing through Metron to a user, which is a valuable piece of information.

For the purpose of demonstration, let's assume that we are ingesting CSV data that maps usernames to IPs. We then want to use this mapping from within the enrichment topology. Because we are defining a streaming source, we will need to create a parser topology to handle the streaming data.

To do that, create the file ${METRON_HOME}/config/zookeeper/parsers/user.json with the following contents:

{
 "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
 ,"writerClassName" : "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter"
 ,"sensorTopic":"user"
 ,"parserConfig":
 {
    "shew.table" : "enrichment"
   ,"shew.cf" : "t"
   ,"shew.keyColumns" : "ip"
   ,"shew.enrichmentType" : "user"
   ,"columns" : {
      "user" : 0
     ,"ip" : 1
                }
 }
}

As you can see, we are using a stock CSVParser implemented in Metron and a writer to write out to HBase in the key/value format suitable for use in the enrichment topology.

We configure both the parser and the writer in the parserConfig section. For the writer, we set the table and the column family, and we specify which columns make up the key; in our case we want to look up based on the ip. We also specify the enrichment type to use in the enrichment topology (see part 2 for more about the enrichment type). For the CSVParser, we define the structure of the CSV being ingested: the first column is "user" and the second column is "ip".
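To make the mapping concrete, here is a minimal shell sketch of how one CSV record lines up with the configuration. It is only an illustration and not Metron code: the variable names are made up for this example, and the value the writer actually stores may contain more fields than the one shown.

```shell
# Illustration only: mimic "columns" (user=0, ip=1) and "shew.keyColumns" (ip)
# for a single CSV record. None of this is Metron's implementation.
line='mmiklavcic,192.168.138.158'

user=${line%%,*}   # column 0: everything before the first comma
ip=${line#*,}      # column 1: everything after the first comma

echo "enrichment type : user"        # from shew.enrichmentType
echo "lookup key      : $ip"         # from shew.keyColumns
echo "value includes  : user=$user"  # non-key column kept as enrichment data
```

Reading the output top to bottom gives the shape of a lookup: given the enrichment type and an IP, the enrichment topology can retrieve the associated username.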

This fully defines our input structure and how that data can be used in enrichment. We can now associate IP addresses with usernames.

We can start this on our cluster by creating the Kafka topic, pushing the config to ZooKeeper, and then starting a parser topology:

${HDP_HOME}/kafka-broker/bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER --replication-factor 1 --partitions 1 --topic user

${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper

${METRON_HOME}/bin/start_parser_topology.sh -s user -z $ZOOKEEPER

Now, for the purpose of demonstration, we can create a simple CSV file called user.csv that associates a set of users with IPs:

mmiklavcic,192.168.138.158
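One way to create that file from the command line is sketched below. The awk check is an optional sanity test added for this example; it assumes that every record should have exactly two comma-separated fields, matching the "columns" definition in the parser config.

```shell
# Write the sample user-to-IP mapping to user.csv in the current directory.
printf '%s\n' 'mmiklavcic,192.168.138.158' > user.csv

# Optional sanity check: each line must have exactly two comma-separated fields.
if awk -F',' 'BEGIN { bad = 0 } NF != 2 { bad = 1 } END { exit bad }' user.csv; then
  echo "user.csv looks well-formed"
else
  echo "user.csv has a line without exactly two fields" >&2
fi
```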

We can then push this data to the Kafka topic via

cat user.csv | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic user

After a few moments, you should see a new enrichment type automatically added to the enrichment_list table:

[root@node1: ~]
# echo "scan 'enrichment_list'" | hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.5.1175-1, r897822d4dd5956ca186974c10382e9094683fa29, Thu Jun 20 17:08:24 UTC 2019

scan 'enrichment_list'
ROW                                         COLUMN+CELL
user                                       column=t:v, timestamp=1566598361319, value={}
whois                                      column=t:v, timestamp=1566586822992, value={}
2 row(s) in 0.4410 seconds

At this point we have data flowing into the HBase table, but we still need to configure the enrichment topology to use it on the data flowing past. We can do this by modifying one of the sensors to associate the ip_src_addr with the user enrichment. For this demo, let's modify bro by editing ${METRON_HOME}/config/zookeeper/enrichments/bro.json like so:

{
 "enrichment" : {
   "fieldMap": {
     "geo": ["ip_dst_addr", "ip_src_addr"],
     "host": ["host"],
     "stellar" : {
       "config" : {
         "user" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
       }
     }
   }
 },
 "threatIntel": {
   "fieldMap": {
     "hbaseThreatIntel": ["ip_src_addr", "ip_dst_addr"]
   },
   "fieldToTypeMap": {
     "ip_src_addr" : ["malicious_ip"],
     "ip_dst_addr" : ["malicious_ip"]
   }
 }
}
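Because a single missing brace makes the whole config unparseable, it can be worth checking that the edited file is valid JSON before pushing it. The sketch below uses a hypothetical helper, validate_json, which is not part of Metron; it assumes a Python interpreter is available on the node and relies on the standard json.tool module.

```shell
# Hypothetical helper (not part of Metron): returns 0 if the file parses as JSON.
validate_json() {
  # Prefer python3, fall back to python; json.tool ships with both.
  py=$(command -v python3 || command -v python) || return 2
  "$py" -m json.tool "$1" > /dev/null 2>&1
}

config="${METRON_HOME:-}/config/zookeeper/enrichments/bro.json"
if validate_json "$config"; then
  echo "OK: $config is valid JSON"
else
  echo "Problem: $config is missing or is not valid JSON" >&2
fi
```

This only checks syntax. It does not verify that the field names or the Stellar expression are meaningful to the enrichment topology.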

Now we can push this config to ZooKeeper; the enrichment topology will pick it up after some time:

${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
