
Target release
Epic
Document status: DRAFT
Document owner

Joe Witt

Designer
Developers
QA

Goals

  • Provide a good user experience and feature set for dataflows involving Avro-formatted data, including the ability to easily view, edit, split, combine, and route such data.

Background and strategic fit

Usage of Avro in and around Big Data projects is increasingly common.  We should build a content viewer for Avro data that lets a user inspect the content of a given Avro message based on its schema.  We should also provide a mechanism to manipulate the content of Avro messages, both to insert or update values and to perform schema evolution or transformation.  Avro data tends to arrive in bundles, so splitting those bundles is useful for individual message handling and routing.  The reverse is also true: it is useful to merge Avro messages that share a compatible schema.  Finally, being able to run queries against Avro data to make routing decisions is valuable, and the JSON-based schema design makes this quite doable.
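As a minimal illustration of why the JSON-based schema design makes viewing and routing tractable, the sketch below parses a hypothetical Avro schema with an ordinary JSON library and makes a routing decision on a decoded record (shown here as a plain dict; field names and the routing rule are invented for the example):

```python
import json

# Avro schemas are plain JSON documents, so any JSON library can
# parse and inspect them -- no special tooling is required just to
# learn the field names and types of a record.
schema = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}
""")

# Read field names and types directly off the parsed schema; this is
# the basis for a content viewer or a schema-aware editor.
field_types = {f["name"]: f["type"] for f in schema["fields"]}
print(field_types)  # {'name': 'string', 'favorite_number': ['int', 'null']}

# A decoded record can then drive a routing decision, analogous to a
# RouteOnAttribute-style choice between downstream relationships.
record = {"name": "alice", "favorite_number": 7}
route = "numbered" if record["favorite_number"] is not None else "unnumbered"
print(route)  # numbered
```

The same pattern generalizes: once the schema is parsed, any field named in it can be extracted from a record and used for routing or attribute extraction.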

Assumptions

Requirements

1. Convert to Avro: Convert common data formats, such as CSV and JSON, to Avro. Importance: N/A.
  • Existing functionality in kite-bundle.
2. Convert from Avro: Convert from Avro to common data formats, such as CSV, XML, and JSON. Importance: Medium.
3. Convert Between Avro Schemas: Convert Avro records from an original schema to a destination schema, allowing for user-defined field mappings. Importance: N/A.
  • Existing functionality in kite-bundle.
4. Merge Avro Files: Merge Avro records with compatible schemas into a single file so that appropriately sized files can be delivered to downstream systems such as HDFS. Support semantics similar to the existing MergeContent processor, such as merging based on size, time, number of entries, etc. Importance: High.
5. Split Avro Files: Split an Avro file with multiple records into individual files so that each record can be processed independently by downstream processors. An example of downstream processing would be routing based on the value of a field in a given record. Importance: High.
6. Extract Schema Fingerprint: Extract the schema fingerprint of a given Avro file so that downstream processors can make decisions based on the schema, such as when merging records of compatible schemas (i.e., as the correlation attribute). Importance: Medium.
7. Evaluate Avro Paths: Evaluate a set of Avro paths against an incoming file and extract the results to FlowFile attributes, or to the content of the FlowFile, similar to EvaluateJson. This would allow downstream processors, such as RouteOnAttribute, to easily make decisions based on values in an Avro record. Importance: High.
8. Update Avro Records: Modify Avro records by inserting, updating, or removing fields. Importance: Medium.
9. Avro Content Viewer: Provide the ability to view an Avro record based on its schema when clicking to view the content from a provenance event. Importance: Medium.
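To make the schema-fingerprint requirement concrete, here is a simplified sketch of deriving a fingerprint usable as a merge correlation attribute. Note this is a stand-in: the Avro specification defines "Parsing Canonical Form" (precise rules for name qualification, attribute ordering, and stripping non-structural attributes) and standard fingerprint algorithms such as CRC-64-AVRO; a real implementation should follow that spec rather than the key-sorted JSON normalization used below.

```python
import hashlib
import json

def simplified_fingerprint(schema_json: str) -> str:
    """Hash a normalized rendering of an Avro schema.

    Simplified sketch only: a JSON round-trip with sorted keys and no
    whitespace gives a deterministic rendering of the same logical
    schema, which we then hash with SHA-256. Avro's real Parsing
    Canonical Form applies stricter normalization rules.
    """
    normalized = json.dumps(
        json.loads(schema_json), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same logical schema written two different ways:
a = '{"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}'
b = '{"name":"User","fields":[{"type":"long","name":"id"}],"type":"record"}'

# Equivalent schemas yield the same fingerprint, so the fingerprint can
# serve as the correlation attribute when merging like-schema records.
print(simplified_fingerprint(a) == simplified_fingerprint(b))  # True
```

In a flow, such a fingerprint would be written to a FlowFile attribute by the extraction processor, and the merge processor would then group incoming records on that attribute.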

User interaction and design

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question | Outcome

Not Doing
