Tajo provides an example HTTP tablespace, which allows Tajo to read and process JSON data directly from the web. This section briefly shows how the example tablespace is implemented. The full code can be found in the tajo-tablespace-example module.
ExampleHttpFileFragment is very simple and is essentially the same as FileFragment.
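As a rough illustration of what a file-like fragment carries, here is a minimal, self-contained sketch of a value class that identifies one byte range of a resource addressed by a URI. The class name `HttpFragmentSketch` and its fields are hypothetical; this is not Tajo's `FileFragment` API, only an assumption about the kind of information such a fragment holds.

```java
import java.net.URI;
import java.util.Objects;

/**
 * Hypothetical, simplified fragment: a byte range [startOffset, endOffset)
 * of a resource addressed by a URI. Not Tajo's actual FileFragment API.
 */
public final class HttpFragmentSketch {
    private final String tableName;
    private final URI uri;
    private final long startOffset; // inclusive
    private final long endOffset;   // exclusive

    public HttpFragmentSketch(String tableName, URI uri, long startOffset, long endOffset) {
        if (startOffset < 0 || endOffset < startOffset) {
            throw new IllegalArgumentException("invalid range: " + startOffset + ".." + endOffset);
        }
        this.tableName = Objects.requireNonNull(tableName);
        this.uri = Objects.requireNonNull(uri);
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    public String getTableName() { return tableName; }
    public URI getUri() { return uri; }
    public long getStartOffset() { return startOffset; }
    public long getEndOffset() { return endOffset; }

    /** Number of bytes this fragment covers. */
    public long getLength() { return endOffset - startOffset; }

    @Override
    public String toString() {
        return tableName + "@" + uri + "[" + startOffset + "," + endOffset + ")";
    }

    public static void main(String[] args) {
        HttpFragmentSketch f = new HttpFragmentSketch(
            "example_table", URI.create("http://example.com/data.json"), 0L, 1024L);
        System.out.println(f + " length=" + f.getLength());
    }
}
```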
ExampleHttpFileFragmentSerde simply serializes ExampleHttpFileFragment into, and deserializes it from, a protocol buffer message. The following code shows the protocol buffer message definition; the Serde class itself is omitted.
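The essential contract of a fragment serde is a lossless round trip: whatever is serialized must deserialize back to an equal fragment. The sketch below shows that round trip, but it deliberately swaps protocol buffers for plain `java.io` data streams so it has no external dependencies. The class `FragmentSerdeSketch`, its nested `Fragment` type, and the field layout are all hypothetical and do not reflect the real protobuf message.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

/**
 * Hypothetical serde sketch. java.io Data streams stand in for the protocol
 * buffer message used by the real ExampleHttpFileFragmentSerde.
 */
public final class FragmentSerdeSketch {

    /** Minimal fragment value: a URI string plus a byte range [start, end). */
    static final class Fragment {
        final String uri;
        final long start;
        final long end;

        Fragment(String uri, long start, long end) {
            this.uri = Objects.requireNonNull(uri);
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Fragment)) return false;
            Fragment other = (Fragment) o;
            return start == other.start && end == other.end && uri.equals(other.uri);
        }

        @Override
        public int hashCode() { return Objects.hash(uri, start, end); }
    }

    /** Writes the fields in a fixed order: uri, start, end. */
    static byte[] serialize(Fragment f) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(f.uri);
            out.writeLong(f.start);
            out.writeLong(f.end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static Fragment deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String uri = in.readUTF();
            long start = in.readLong();
            long end = in.readLong();
            return new Fragment(uri, start, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Fragment original = new Fragment("http://example.com/data.json", 0L, 1024L);
        Fragment copy = deserialize(serialize(original));
        System.out.println(original.equals(copy)); // prints true
    }
}
```

With protobuf, the field order would be governed by field numbers in the message definition rather than by write order, which is one reason a real serde prefers it.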
One of the most important methods of ExampleHttpFileTablespace is getSplits(). In general, this method returns multiple fragments so that a large dataset can be processed in a distributed manner. In this example, however, it returns a single fragment for simplicity.
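To make the difference concrete, the following sketch contrasts the single-fragment strategy described above with a simple fixed-size splitting strategy that a tablespace over large data might use. It works on plain byte ranges; the names `SplitSketch`, `Range`, `singleSplit`, and `fixedSizeSplits` are hypothetical and are not part of Tajo's `getSplits()` signature.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of split strategies over a resource of known length. */
public final class SplitSketch {

    /** A byte range [start, end) of the source. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** Single-fragment strategy: one range covering the whole resource. */
    static List<Range> singleSplit(long totalLength) {
        return Collections.singletonList(new Range(0, totalLength));
    }

    /** Fixed-size strategy: consecutive chunks that can be scanned in parallel. */
    static List<Range> fixedSizeSplits(long totalLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalLength; start += splitSize) {
            ranges.add(new Range(start, Math.min(start + splitSize, totalLength)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(singleSplit(250));          // prints [[0,250)]
        System.out.println(fixedSizeSplits(250, 100)); // prints [[0,100), [100,200), [200,250)]
    }
}
```

Splitting by byte offsets means a record can straddle two ranges, so a multi-fragment design also needs a scanner that handles records crossing split boundaries; returning a single fragment sidesteps that problem entirely.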
ExampleHttpJsonScanner is omitted here because its implementation is almost the same as that of DelimitedTextFileScanner. No Appender is provided for the example HTTP tablespace.
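Because the scanner is described as a near copy of a delimited text scanner, the sketch below assumes the input is newline-delimited JSON (one record per line) and shows the boundary rule commonly used by Hadoop-style line readers: a record belongs to the range in which it starts, so a reader skips a partial first line and finishes the last line even if it runs past the range end. Whether Tajo's scanner uses exactly this rule is an assumption. The class `LineRecordSketch` and method `recordsInRange` are hypothetical; the sketch uses character offsets on a `String` instead of byte offsets on a stream, and returns raw lines without parsing JSON.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of newline-delimited record reading within a range. */
public final class LineRecordSketch {

    /**
     * Returns the non-empty lines of text whose first character falls in
     * [start, end). Character offsets are used here for simplicity.
     */
    static List<String> recordsInRange(String text, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // If the range starts mid-line, that line belongs to the previous range: skip it.
        if (pos > 0 && text.charAt(pos - 1) != '\n') {
            int nl = text.indexOf('\n', pos);
            pos = (nl < 0) ? text.length() : nl + 1;
        }
        // Read every line that starts before 'end', even if it ends after 'end'.
        while (pos < end && pos < text.length()) {
            int nl = text.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? text.length() : nl;
            // trim() also drops a trailing '\r' from CRLF line endings.
            String line = text.substring(pos, lineEnd).trim();
            if (!line.isEmpty()) {
                records.add(line);
            }
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n";
        System.out.println(recordsInRange(text, 0, 10));            // prints [{"id":1}, {"id":2}]
        System.out.println(recordsInRange(text, 10, text.length())); // prints [{"id":3}]
    }
}
```

Taken together, the two ranges in the usage example return every record exactly once, which is the property that makes multi-fragment scans over line-delimited data safe. With the single fragment used by the example tablespace, the range is the whole resource and the rule reduces to reading every line.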
The example HTTP tablespace is disabled for the HTTP scheme by default because it is not suitable for real applications; the HTTP scheme is instead reserved for user-defined tablespaces. To use this example tablespace, add the following lines to storage-site.json.