
Now that we have created a new telemetry source, we can see how to add new enrichments to it. In this exercise we will add a whois enrichment to the Squid telemetry we set up in the previous entry. Whois data is expensive, so we will not be providing it. Instead, I wrote a basic whois scraper (outside the scope of this exercise) that produces whois data in the following CSV format:

 

google.com, "Google Inc.", "US", "Dns Admin",874306800000
work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
news.com, "CBS Interactive Inc.", "US", "Domain Admin",833353200000
nba.com, "NBA Media Ventures, LLC", "US", "C/O Domain Administrator",786027600000
espn.com, "ESPN, Inc.", "US", "ESPN, Inc.",781268400000
pravda.com, "Internet Invest, Ltd. dba Imena.ua", "UA", "Whois privacy protection service",806583600000
hortonworks.com, "Hortonworks, Inc.", "US", "Domain Administrator",1303427404000
microsoft.com, "Microsoft Corporation", "US", "Domain Administrator",673156800000
yahoo.com, "Yahoo! Inc.", "US", "Domain Administrator",790416000000
rackspace.com, "Rackspace US, Inc.", "US", "Domain Admin",903092400000
1and1.co.uk, "1 & 1 Internet Ltd","UK", "Domain Admin",943315200000
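The last column holds each domain's registration time as epoch milliseconds. To sanity-check the file, a small sketch like the one below prints each domain next to that timestamp as a UTC date. It is not part of the Metron tooling: the function name show_registration_dates is hypothetical, and it assumes GNU date (for -d @SECONDS).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "domain<TAB>UTC date" for each non-blank row of a
# whois CSV. Assumes the domain is the first field and the epoch-millisecond
# timestamp is the last field. Neither contains commas, so splitting on ","
# is safe for those two fields even though quoted owner fields may contain commas.
# Requires GNU date for "-d @SECONDS".
show_registration_dates() {
  local csv="$1"
  # Strip a trailing carriage return in case the file has CRLF line endings.
  awk -F',' '/[^[:space:]]/ { sub(/\r$/, "", $NF); print $1 "\t" $NF }' "$csv" |
    while IFS=$'\t' read -r domain millis; do
      # Convert milliseconds to whole seconds before handing them to date.
      printf '%s\t%s\n' "$domain" "$(date -u -d "@$((millis / 1000))" +%Y-%m-%d)"
    done
}

# Example usage:
#   show_registration_dates whois_ref.csv
```

Run against this data, the google.com row should come out as 1997-09-15.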

Please cut and paste this data into a file called "whois_ref.csv" on your virtual machine.
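An empty last line in whois_ref.csv makes the loader fail with a null pointer exception, so it can be worth checking the file right after creating it. The following is a minimal sketch with hypothetical helper names. It treats whitespace-only lines as empty, which is an assumption about what the loader rejects.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the "no empty last line" requirement on whois_ref.csv.

# Succeed (exit 0) if the file's last line is empty or whitespace-only.
ends_with_blank_line() {
  [ -s "$1" ] || return 1   # an empty file has no last line to complain about
  tail -n 1 "$1" | grep -q '^[[:space:]]*$'
}

# Write a copy of $1 to $2 with every trailing blank line removed.
# Blank lines are buffered and only re-emitted when a non-blank line follows,
# so blank lines in the middle of the file survive (as empty lines).
strip_trailing_blank_lines() {
  awk '/[^[:space:]]/ { for (i = 0; i < pending; i++) print ""; pending = 0; print; next }
       { pending++ }' "$1" > "$2"
}

# Example usage (whois_ref.tmp is just a scratch file name):
#   if ends_with_blank_line whois_ref.csv; then
#     strip_trailing_blank_lines whois_ref.csv whois_ref.tmp && mv whois_ref.tmp whois_ref.csv
#   fi
```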

The schema of this enrichment is domain|owner|registeredCountry|registrar|registeredTimestamp. Make sure the last line of the CSV file is not an empty line, as that will result in a null pointer exception. The first thing we need to do is set up the enrichment source. To do this, we first need to set up the extractor config as follows:

...

iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
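To confirm that the iconv step actually removed the invisible characters, you can count the bytes outside the ASCII range before and after. This sketch uses hypothetical function names and relies on tr in the C locale. strip_non_ascii is offered only as an alternative to the iconv command above, which remains the approach this walkthrough uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count bytes outside the 7-bit ASCII range (octal 000-177).
# LC_ALL=C makes tr treat the input as raw bytes rather than multibyte characters.
count_non_ascii_bytes() {
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical alternative to the iconv step: keep only ASCII bytes.
# Like "iconv -c", this silently drops anything it cannot represent.
strip_non_ascii() {
  LC_ALL=C tr -cd '\000-\177' < "$1" > "$2"
}

# Example usage: after the iconv step this should print 0.
#   count_non_ascii_bytes extractor_config.json
```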

 

We also need a second config that loads the enrichment mapping into ZooKeeper. Be sure to replace the $ZOOKEEPER placeholder with your ZooKeeper quorum list:

{
  "zkQuorum" : "$ZOOKEEPER"
 ,"sensorToFieldList" : {
    "squid" : {
      "type" : "ENRICHMENT"
     ,"fieldToEnrichmentTypes" : {
        "domain_without_subdomains" : [ "whois" ]
      }
    }
  }
}
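Instead of editing the $ZOOKEEPER placeholder by hand, you can substitute it with sed once the file is on the VM. In this sketch the function name fill_zk_placeholder is hypothetical, and node1:2181 in the usage comment is only an example quorum; use your own host:port list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the literal $ZOOKEEPER placeholder in a config
# file with a real quorum list and write the result to a new file.
# "|" is the sed delimiter because quorum strings contain ":" and ",";
# a quorum containing "|", "&" or "\" would need escaping first.
fill_zk_placeholder() {
  local in="$1" out="$2" quorum="$3"
  sed 's|\$ZOOKEEPER|'"$quorum"'|g' "$in" > "$out"
}

# Example usage (zk_filled.json is just a scratch file name):
#   fill_zk_placeholder enrichment_config_temp.json zk_filled.json node1:2181 \
#     && mv zk_filled.json enrichment_config_temp.json
```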

Please copy and paste this into a file called "enrichment_config_temp.json" on the virtual machine. Because copying and pasting from this blog will include some invisible non-ASCII characters, please run the following to strip them out:

...

This config maps the whois enrichment to the domain_without_subdomains field of the squid telemetry. Then execute the following command:

${METRON_HOME}/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e extractor_config.json
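The loader needs $METRON_HOME and the three input files named in the command. A hypothetical pre-flight check such as the following can report a missing or empty file before anything is written to HBase. It only checks that the files exist and are non-empty, not that their contents are valid.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the flatfile_loader.sh step: confirms that
# METRON_HOME is set and that each listed file exists and is non-empty.
# Reports every problem on stderr and returns non-zero if there was any.
preflight_loader_inputs() {
  local status=0 f
  if [ -z "${METRON_HOME:-}" ]; then
    echo "METRON_HOME is not set" >&2
    status=1
  fi
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage:
#   preflight_loader_inputs enrichment_config.json whois_ref.csv extractor_config.json \
#     && ${METRON_HOME}/bin/flatfile_loader.sh -n enrichment_config.json -i whois_ref.csv \
#          -t enrichment -c t -e extractor_config.json
```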

Once the loader has run, your enrichment data will be loaded into HBase and a ZooKeeper mapping will be established. The data is written to an HBase table called enrichment. To verify that it was properly loaded into HBase, run the following command:

echo "scan 'enrichment'" | hbase shell

Note that you should also see a separate HBase table, enrichment_list, automatically populated with a single new enrichment type named "whois".

[root@node1(127.0.0.1 192.168.66.121): ~]
# echo "scan 'enrichment_list'" | hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.2.2.6.5.1175-1, r897822d4dd5956ca186974c10382e9094683fa29, Thu Jun 20 17:08:24 UTC 2019

scan 'enrichment_list'
ROW                                         COLUMN+CELL
 whois                                      column=t:v, timestamp=1566586822992, value={}
1 row(s) in 0.4950 seconds
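If you want to check these scans from a script rather than by eye, the sketch below parses hbase shell output read from standard input. The helper names are hypothetical, and the parsing assumes the output format in the transcript above (row lines followed by an "N row(s) in X seconds" summary), which may differ between HBase versions. Comparing the enrichment row count with the number of CSV lines is only a rough check that assumes one HBase row per domain.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that read "hbase shell" scan output on stdin.

# Print N from the "N row(s) in X seconds" summary line of a scan.
hbase_scan_row_count() {
  sed -n 's/^\([0-9][0-9]*\) row(s) in .*/\1/p'
}

# Succeed if some row line starts with row key $1, e.g.
#  whois    column=t:v, timestamp=1566586822992, value={}
# $1 is used as an extended regex, so escape any regex metacharacters in it.
scan_has_row() {
  grep -qE "^[[:space:]]*$1[[:space:]]+column="
}

# Example usage against a live cluster (not run here):
#   echo "scan 'enrichment_list'" | hbase shell | scan_has_row whois && echo "whois registered"
#   echo "scan 'enrichment'" | hbase shell | hbase_scan_row_count   # rows loaded
#   grep -c '' whois_ref.csv                                        # lines in the CSV
```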

You should see the enrichment table bulk loaded with data from the CSV file. Now check that the ZooKeeper enrichment config was properly populated:

${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid

This prints the matching config to standard out. Because we provided a sensor name argument, you should see only the one named "squid". Now the domain_without_subdomains field should be enriched with the whois data. If you have installed the Elasticsearch head plugin, then you can go to

If you want to start with a fresh index for squid, you can delete the existing index by doing the following:

curl -XDELETE "http://node1:9200/squid*"

Re-ingest the data (see the previous blog post for more detail):

cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid

The new messages should be automatically enriched. Using the ES Head browser plugin, a new message should look as follows:


[Image: an enriched squid message as displayed in ES Head]

Notice the enrichments here (whois.owner, whois.domain_created_timestamp, whois.registrar, whois.home_country).

...