SerDe is short for Serializer/Deserializer. A SerDe allows Hive to read data from a table and to write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

For JSON files, Amazon has provided a JSON SerDe available at:
s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar

Hive uses the SerDe interface for IO. The interface handles both serialization and deserialization, and also interpreting the results of deserialization as individual fields for processing.
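In outline, the contract has two halves. The sketch below is illustrative, not Hive's exact source: the names DeserializerSketch and SerializerSketch are hypothetical stand-ins for Hive's Deserializer and Serializer interfaces, and the exact method set (e.g. statistics hooks) varies across Hive versions.

    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.io.Writable;

    // Deserialization half: raw Writable records in, row objects out,
    // plus an ObjectInspector that knows how to take those rows apart.
    interface DeserializerSketch {
      void initialize(Configuration conf, Properties tbl) throws Exception;
      Object deserialize(Writable blob) throws Exception;     // one raw record -> one row object
      ObjectInspector getObjectInspector() throws Exception;  // field-level view of the row
    }

    // Serialization half: row objects (described by some ObjectInspector) in,
    // the Writable type the OutputFormat expects out.
    interface SerializerSketch {
      Class<? extends Writable> getSerializedClass();
      Writable serialize(Object obj, ObjectInspector objInspector) throws Exception;
    }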

Input processing

For UDFs, the new GenericUDF abstract class provides the ObjectInspectors associated with the UDF arguments in its initialize() method, so the engine first initializes the UDF by calling that method. The UDF can then use these ObjectInspectors to interpret complex arguments (for simple arguments, the object handed to the UDF is already the right primitive object, such as a LongWritable or IntWritable).
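As a concrete illustration, here is a sketch of a GenericUDF (the class name GenericUDFStrLen and the function it computes are hypothetical, not part of Hive): initialize() receives and stores the argument ObjectInspectors and declares the return type, and evaluate() then uses the stored inspector to interpret each incoming object.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.IntWritable;

    // Hypothetical UDF that returns the length of its single string argument.
    public class GenericUDFStrLen extends GenericUDF {
      private StringObjectInspector inputOI;
      private final IntWritable result = new IntWritable();

      @Override
      public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof StringObjectInspector)) {
          throw new UDFArgumentException("str_len() takes exactly one string argument");
        }
        // Keep the inspector so evaluate() can interpret the incoming objects.
        inputOI = (StringObjectInspector) arguments[0];
        // Declare the return type to the engine: a writable int.
        return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
      }

      @Override
      public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object arg = arguments[0].get();
        if (arg == null) {
          return null;
        }
        // The inspector extracts a Java String from whatever concrete object
        // (Text, a lazy string, ...) the table's serde actually produced.
        result.set(inputOI.getPrimitiveJavaObject(arg).length());
        return result;
      }

      @Override
      public String getDisplayString(String[] children) {
        return "str_len(" + children[0] + ")";
      }
    }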

Output processing

Output is analogous to input. The engine passes the deserialized Object representing a record, along with the corresponding ObjectInspector, to SerDe.serialize(). In this context, serialization means converting the record object to an object of the type expected by the OutputFormat that will be used to perform the write. To perform this conversion, the serialize() method can use the passed ObjectInspector to get at the individual fields in the record, in order to convert the record to the appropriate type.
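Here is a sketch of what such a serialize() implementation typically looks like, assuming a simple comma-delimited text format (the class name CsvSerializeSketch is hypothetical, and a real serde would use each field's own ObjectInspector rather than toString() to render values):

    import java.util.List;
    import org.apache.hadoop.hive.serde2.SerDeException;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructField;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class CsvSerializeSketch {
      // Walks the row's fields via the StructObjectInspector the engine passed
      // in, and emits the Text that a text-based OutputFormat expects.
      public Writable serialize(Object obj, ObjectInspector objInspector) throws SerDeException {
        if (objInspector.getCategory() != ObjectInspector.Category.STRUCT) {
          throw new SerDeException("Expected a struct (row) ObjectInspector");
        }
        StructObjectInspector soi = (StructObjectInspector) objInspector;
        List<? extends StructField> fields = soi.getAllStructFieldRefs();

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
          if (i > 0) {
            sb.append(',');
          }
          // getStructFieldData() hides how the row object is physically
          // represented (a List, an array, a lazy object, ...).
          Object fieldData = soi.getStructFieldData(obj, fields.get(i));
          sb.append(fieldData == null ? "" : fieldData.toString());
        }
        return new Text(sb.toString());
      }
    }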

Additional notes

In short, Hive will automatically convert objects where needed, so an Integer will be converted to an IntWritable (and vice versa). This allows people without Hadoop knowledge to use Java primitive classes (Integer, etc.), while Hadoop users/experts can use IntWritable, which is more efficient.
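The two flavors are visible in the primitive ObjectInspector factory, as in this small sketch (the class name IntFlavors is hypothetical):

    import org.apache.hadoop.hive.serde2.objectinspector.primitive.IntObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.IntWritable;

    public class IntFlavors {
      public static void main(String[] args) {
        // One logical type (int), two physical representations:
        IntObjectInspector javaOI = PrimitiveObjectInspectorFactory.javaIntObjectInspector;
        IntObjectInspector writableOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;

        // Each inspector reads the int value out of its own object kind.
        System.out.println(javaOI.get(Integer.valueOf(42)));      // 42, from a java.lang.Integer
        System.out.println(writableOI.get(new IntWritable(42)));  // 42, from a Hadoop IntWritable
      }
    }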

Between map and reduce, Hive uses the serialize() methods of LazyBinarySerDe and BinarySortableSerDe. A SerDe can serialize an object that was created by another SerDe, as long as it is given the corresponding ObjectInspector.
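The sketch below illustrates that cross-serde property under a few assumptions (the class name CrossSerDeSketch is hypothetical, and the "columns"/"columns.types" entries are the standard serde table properties): a row that is just a plain Java List, described by a standard struct ObjectInspector, is handed to LazyBinarySerDe.serialize() even though LazyBinarySerDe never produced it.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.io.Writable;

    public class CrossSerDeSketch {
      public static void main(String[] args) throws Exception {
        Properties tbl = new Properties();
        tbl.setProperty("columns", "id,name");
        tbl.setProperty("columns.types", "int,string");

        LazyBinarySerDe serde = new LazyBinarySerDe();
        serde.initialize(new Configuration(), tbl);

        // The row was not produced by LazyBinarySerDe; the ObjectInspector is
        // what lets the serde take it apart anyway.
        ObjectInspector rowOI = ObjectInspectorFactory.getStandardStructObjectInspector(
            Arrays.asList("id", "name"),
            Arrays.asList(
                (ObjectInspector) PrimitiveObjectInspectorFactory.javaIntObjectInspector,
                PrimitiveObjectInspectorFactory.javaStringObjectInspector));
        List<Object> row = Arrays.asList((Object) 7, "alice");

        Writable binary = serde.serialize(row, rowOI);          // LazyBinary-encoded row
        System.out.println(binary.getClass().getSimpleName());  // BytesWritable
      }
    }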