Applies to release: Flume 1.2.0 as of 2012-08-12
Using Event Serializers
The HDFS and file_roll sinks support event serializers. EventSerializer is an interface that allows arbitrary serialization of an event. While it is possible to implement this interface directly, many people will prefer to use one of the Avro serialization implementations built into Flume.
The recommended approach is to serialize your data as Avro, a file format with many advantages over platform- and language-specific serialization formats. For Avro-serialized events you have two options in Flume: use avro_event, which is built in, or write a custom subclass of AbstractAvroEventSerializer. The first option uses the built-in Flume event schema, while the second option allows you to specify your own Avro schema.
Config file syntax
Use the following configuration file syntax to specify using an event serializer in your HDFS sink:
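A minimal sketch of the syntax, assuming an agent named agent and a sink named hdfsSink (both placeholder names). The serializer key is set on the sink itself rather than under the hdfs. prefix, and the fileType setting shown is an assumption about a typical setup:

```properties
# "agent" and "hdfsSink" are placeholder names chosen for illustration.
agent.sinks.hdfsSink.type = hdfs
# Assumption: serializers are used with stream output, so DataStream is
# selected here instead of the default SequenceFile.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# Either a built-in alias (such as avro_event) or the fully qualified
# class name of an EventSerializer.Builder implementation.
agent.sinks.hdfsSink.serializer = <alias-or-Builder-class-name>
# Serializer-specific options are passed under the "serializer." prefix.
agent.sinks.hdfsSink.serializer.<option> = <value>
```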
Example config using avro_event with HDFS sink
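The fragment below is an illustrative sketch, not a complete agent definition. The agent name, sink name, and HDFS path are hypothetical, and the compressionCodec option is assumed to be supported by the avro_event serializer in your release:

```properties
# Hypothetical agent ("agent") and sink ("hdfsSink") names.
agent.sinks.hdfsSink.type = hdfs
# Hypothetical output location; replace with your own path.
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
# Write a plain stream so that the serializer controls the file contents.
agent.sinks.hdfsSink.hdfs.fileType = DataStream
# Use the built-in serializer, which writes events with the Flume event schema.
agent.sinks.hdfsSink.serializer = avro_event
# Assumed optional setting: compress Avro data blocks with snappy.
agent.sinks.hdfsSink.serializer.compressionCodec = snappy
```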
Examples for using AbstractAvroEventSerializer to write a custom schema
An example is provided as a unit test in Git at flume-ng-core/src/test/java/org/apache/flume/serialization/SyslogAvroEventSerializer.java
Additional unit tests and examples are available in Git under flume-ng-core/src/test/java/org/apache/flume/serialization/
When you write a custom serializer, you must specify the Builder of your serializer class as the serializer in the configuration file. For example:
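A sketch of such a setting, where com.example.flume.MyAvroEventSerializer is a hypothetical class name standing in for your own subclass of AbstractAvroEventSerializer, and agent and hdfsSink are placeholder names:

```properties
# Hypothetical custom serializer; the "$" separates the outer class
# from its nested Builder class in the binary class name.
agent.sinks.hdfsSink.serializer = com.example.flume.MyAvroEventSerializer$Builder
```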
This assumes that your Builder is an inner static class. NOTE that your Builder MUST have a public, no-arg constructor.