uniVocity-parsers

Available as of Camel 2.15.0

This Data Format uses uniVocity-parsers for reading and writing 3 kinds of tabular data text files:

  • CSV (Comma Separated Values), where the values are separated by a symbol (usually a comma)
  • fixed-width, where the values have known sizes
  • TSV (Tab Separated Values), where the fields are separated by a tab character

Thus there are 3 data formats based on uniVocity-parsers.

If you use Maven, add the following to your pom.xml, replacing x.x.x with the latest release (see the download page for the latest versions).

<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-univocity-parsers</artifactId>
    <version>x.x.x</version>
</dependency>

Options

Most configuration options of the uniVocity-parsers are available in the data formats. If you want more information about a particular option, please refer to their documentation page.

The 3 data formats share common options and also have dedicated ones. This section presents them all.

Common options, shared by all the data formats

Parameter name | Type | Description
nullValue | String

The string representation of a null value.

This option is null by default. When null, it uses the default uniVocity-parser value which is null.

skipEmptyLines | Boolean

Whether or not empty lines must be ignored.

This option is null by default. When null, it uses the default uniVocity-parser value which is true.

ignoreTrailingWhitespaces | Boolean

Whether or not trailing white spaces must be ignored.

This option is null by default. When null, it uses the default uniVocity-parser value which is true.

ignoreLeadingWhitespaces | Boolean

Whether or not leading white spaces must be ignored.

This option is null by default. When null, it uses the default uniVocity-parser value which is true.

headersDisabled | boolean

Whether or not the headers are disabled. When enabled, this option explicitly sets the headers to null, which indicates that there is no header.

This option is false by default.

headers | String[]

The headers to use.

This option is null by default. When null, it uses the default uniVocity-parser value which is null.

In the XML DSL, this option is configured using child <univocity-header> tags:

<univocity-csv>
    <univocity-header>first</univocity-header>
    <univocity-header>second</univocity-header>
</univocity-csv>

See other marshalling and unmarshalling examples for more information.

headerExtractionEnabled | Boolean

Whether or not the headers must be read from the first line of the input.

This option is null by default. When null, it uses the default uniVocity-parser value which is false.

numberOfRecordsToRead | Integer

The maximum number of records to read.

This option is null by default. When null, it uses the default uniVocity-parser value which is -1.

emptyValue | String

The string representation of an empty value.

This option is null by default. When null, it uses the default uniVocity-parser value which is "".

lineSeparator | String

The line separator of the files.

This option is null by default. When null, it uses the default uniVocity-parser value which is the platform line separator.

normalizedLineSeparator | Character

The normalized line separator of the files.

This option is null by default. When null, it uses the default uniVocity-parser value which is '\n'.

comment | Character

The comment symbol.

This option is null by default. When null, it uses the default uniVocity-parser value which is '#'.

lazyLoad | boolean

Whether the unmarshalling should produce an iterator that reads the lines on the fly or whether all the lines must be read at once.

This option is false by default.

asMap | boolean

Whether the unmarshalling should produce maps for the line values instead of lists. This requires headers (either defined or extracted).

This option is false by default.
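
Most of the options above follow the same rule: a null setting means that the uniVocity-parsers default stays in place. The following is a minimal sketch of that fallback rule in plain Java. The names OptionFallback and effective are hypothetical and are not part of Camel or uniVocity-parsers.

```java
// Illustration only: models the "null means use the parser default" rule.
// OptionFallback and effective(...) are hypothetical names, not Camel API.
final class OptionFallback {

    /** Returns the configured value, or the parser default when nothing was configured. */
    static <T> T effective(T configured, T parserDefault) {
        return configured != null ? configured : parserDefault;
    }

    public static void main(String[] args) {
        Boolean skipEmptyLines = null; // option left unset
        Character comment = ';';       // option set explicitly

        // The defaults (true and '#') are the ones listed in the table above.
        System.out.println(effective(skipEmptyLines, true)); // prints "true"
        System.out.println(effective(comment, '#'));         // prints ";"
    }
}
```

In practice this means an option only needs to be set when the desired behaviour differs from the listed default.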

CSV format options

Parameter name | Type | Description
quoteAllFields | Boolean

Whether or not all values must be quoted when writing them.

This option is null by default. When null, it uses the default uniVocity-parser value which is false.

quote | Character

The quote symbol.

This option is null by default. When null, it uses the default uniVocity-parser value which is '"'.

quoteEscape | Character

The quote escape symbol.

This option is null by default. When null, it uses the default uniVocity-parser value which is '"'.

delimiter | Character

The delimiter of values.

This option is null by default. When null, it uses the default uniVocity-parser value which is ','.
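
To make the roles of delimiter, quote, quoteEscape and quoteAllFields concrete, here is a simplified sketch of conventional CSV quoting in plain Java. It is an assumption about how these four settings typically interact, not a copy of the uniVocity-parsers writer, whose exact rules may differ. The class CsvQuotingSketch is hypothetical.

```java
// Illustration only: a simplified model of CSV quoting, not uniVocity's writer.
final class CsvQuotingSketch {

    /** Formats one value using the given delimiter, quote and quote escape characters. */
    static String format(String value, char delimiter, char quote, char quoteEscape,
                         boolean quoteAllFields) {
        boolean needsQuotes = quoteAllFields
                || value.indexOf(delimiter) >= 0
                || value.indexOf(quote) >= 0
                || value.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return value;
        }
        StringBuilder out = new StringBuilder().append(quote);
        for (char c : value.toCharArray()) {
            if (c == quote) {
                out.append(quoteEscape); // escape an embedded quote
            }
            out.append(c);
        }
        return out.append(quote).toString();
    }

    public static void main(String[] args) {
        // Defaults from the table above: delimiter ',', quote '"', quoteEscape '"'.
        System.out.println(format("plain", ',', '"', '"', false));      // plain
        System.out.println(format("a,b", ',', '"', '"', false));        // "a,b"
        System.out.println(format("say \"hi\"", ',', '"', '"', false)); // "say ""hi"""
    }
}
```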

Fixed-width format options

Parameter name | Type | Description
fieldLengths | int[]

This option is required and defines the length of each value.

In the XML DSL, this option is configured using child <univocity-header> tags:

<univocity-fixed>
    <univocity-header length="3"/>
    <univocity-header length="8"/>
</univocity-fixed>

See other marshalling and unmarshalling examples for more information.

skipTrailingCharsUntilNewline | Boolean

Whether or not the trailing characters until new line must be ignored.

This option is null by default. When null, it uses the default uniVocity-parser value which is false.

recordEndsOnNewLine | Boolean

Whether or not the record ends on new line.

This option is null by default. When null, it uses the default uniVocity-parser value which is false.

padding | Character

The padding character.

This option is null by default. When null, it uses the default uniVocity-parser value which is ' ' (space).

TSV format options

Parameter name | Type | Description
escapeChar | Character

The escape character.

This option is null by default. When null, it uses the default uniVocity-parser value which is '\'.
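
The escape character matters when a value itself contains a tab or a line break. The sketch below assumes the usual TSV convention, in which a tab, a line feed, a carriage return and the escape character are written as two-character sequences. It is a simplified model in plain Java, not the uniVocity-parsers writer, and the class TsvEscapeSketch is hypothetical.

```java
// Illustration only: a simplified model of TSV escaping, not uniVocity's writer.
final class TsvEscapeSketch {

    /** Replaces tab, line feed, carriage return and the escape character with escape sequences. */
    static String escape(String value, char escapeChar) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\t') {
                out.append(escapeChar).append('t');
            } else if (c == '\n') {
                out.append(escapeChar).append('n');
            } else if (c == '\r') {
                out.append(escapeChar).append('r');
            } else if (c == escapeChar) {
                out.append(escapeChar).append(escapeChar);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes each value and joins the results with a tab. */
    static String formatLine(String[] values, char escapeChar) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            line.append(escape(values[i], escapeChar));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The embedded tab in the first value is written as a backslash followed by 't'.
        System.out.println(formatLine(new String[] {"a\tb", "c"}, '\\'));
    }
}
```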

Marshalling usages

The marshalling accepts either:

  • A list of maps (List<Map<String, ?>>), one for each line
  • A single map (Map<String, ?>), for a single line

Any other body will throw an exception.
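
As a sketch, the two accepted body shapes can be built with plain JDK collections as follows. LinkedHashMap is used here because HashMap gives no guarantee about iteration order. Whether the column order follows the map's iteration order when no headers are configured is an assumption, and setting the headers option explicitly avoids relying on it. The class MarshalBodies is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: builds the two body shapes described above with plain JDK types.
final class MarshalBodies {

    /** A single line: one map from column name to value, kept in insertion order. */
    static Map<String, Object> singleLine(String a, String b, String c) {
        Map<String, Object> line = new LinkedHashMap<>();
        line.put("A", a);
        line.put("B", b);
        line.put("C", c);
        return line;
    }

    /** Several lines: a list with one map per line. */
    static List<Map<String, Object>> severalLines() {
        List<Map<String, Object>> lines = new ArrayList<>();
        lines.add(singleLine("one", "two", "three"));
        lines.add(singleLine("four", "five", "six"));
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(singleLine("one", "two", "three")); // {A=one, B=two, C=three}
        System.out.println(severalLines().size());             // 2
    }
}
```

Either object can then be sent as the message body to a route such as direct:input, for example with a ProducerTemplate.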

Usage example: marshalling a Map into CSV format

Considering the following body
Map<String, Object> body = new HashMap<>();
body.put("A", "one");
body.put("B", "two");
body.put("C", "three");
and this Java route definition
from("direct:input")
    .marshal(new UniVocityCsvDataFormat())
    .to("mock:result");
or this XML route definition
<route>
    <from uri="direct:input"/>
    <marshal>
        <univocity-csv/>
    </marshal>
    <to uri="mock:result"/>
</route>
then it will produce
one,two,three

Usage example: marshalling a Map into fixed-width format

Considering the following body
Map<String, Object> body = new HashMap<>();
body.put("A", "one");
body.put("B", "two");
body.put("C", "three");
and this Java route definition
from("direct:input")
    .marshal(new UniVocityFixedWidthDataFormat()
        .setFieldLengths(new int[] {5, 5, 5})
        .setPadding('_')
    )
    .to("mock:result");
or this XML route definition
<route>
    <from uri="direct:input"/>
    <marshal>
        <univocity-fixed padding="_">
            <univocity-header length="5"/>
            <univocity-header length="5"/>
            <univocity-header length="5"/>
        </univocity-fixed>
    </marshal>
    <to uri="mock:result"/>
</route>
then it will produce
one__two__three
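
The output above follows from the field lengths and the padding character: each value is extended with the padding character until it reaches its field length. A minimal sketch of that arithmetic in plain Java (the class FixedWidthSketch is hypothetical and is not the uniVocity-parsers writer):

```java
// Illustration only: shows how lengths {5, 5, 5} and padding '_' lead to the output above.
final class FixedWidthSketch {

    /** Right-pads each value with the padding character up to its field length. */
    static String formatLine(String[] values, int[] fieldLengths, char padding) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            line.append(values[i]);
            // Values longer than their field are left untouched in this sketch.
            for (int n = values[i].length(); n < fieldLengths[i]; n++) {
                line.append(padding);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String line = formatLine(new String[] {"one", "two", "three"}, new int[] {5, 5, 5}, '_');
        System.out.println(line); // prints "one__two__three"
    }
}
```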

Usage example: marshalling a Map into TSV format

Considering the following body
Map<String, Object> body = new HashMap<>();
body.put("A", "one");
body.put("B", "two");
body.put("C", "three");
and this Java route definition
from("direct:input")
    .marshal(new UniVocityTsvDataFormat())
    .to("mock:result");
or this XML route definition
<route>
    <from uri="direct:input"/>
    <marshal>
        <univocity-tsv/>
    </marshal>
    <to uri="mock:result"/>
</route>
then it will produce
one two three

(with tabs separating the values)

Unmarshalling usages

The unmarshalling uses an InputStream in order to read the data.

Each row produces either:

  • a list with all the values in it (asMap option set to false);
  • a map with all the values indexed by the headers (asMap option set to true).

All the rows can either:

  • be collected at once into a list (lazyLoad option set to false);
  • be read on the fly using an iterator (lazyLoad option set to true).
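
Code that consumes the unmarshalled body therefore has to handle two shapes. The sketch below assumes, based on the description above, that the body is a java.util.List when lazyLoad is false and a java.util.Iterator when it is true. The class UnmarshalResultSketch is hypothetical and uses only JDK types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: handles both result shapes with plain JDK types.
final class UnmarshalResultSketch {

    /** Collects the rows whether the body is a List (eager) or an Iterator (lazy). */
    static List<Object> collectRows(Object body) {
        List<Object> rows = new ArrayList<>();
        if (body instanceof Iterator) {
            Iterator<?> it = (Iterator<?>) body;
            while (it.hasNext()) {
                rows.add(it.next()); // each row is fetched only when requested
            }
        } else if (body instanceof List) {
            rows.addAll((List<?>) body);
        } else {
            throw new IllegalArgumentException("Unexpected body type: " + body);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> eager = Arrays.asList(
                Arrays.asList("one", "two", "three"),
                Arrays.asList("four", "five", "six"));
        System.out.println(collectRows(eager).size());            // 2
        System.out.println(collectRows(eager.iterator()).size()); // 2
    }
}
```

Collecting an iterator into a list gives up the memory benefit of lazy loading, so with large inputs each row would normally be processed as it is read.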

Usage example: unmarshalling a CSV format into maps with automatic headers

Considering the following body
A,B,C
one,two,three
four,five,six
and this Java route definition
from("direct:input")
    .unmarshal(new UniVocityCsvDataFormat()
        .setAsMap(true)
        .setHeaderExtractionEnabled(true)
    )
    .to("mock:result");
or this XML route definition
<route>
    <from uri="direct:input"/>
    <unmarshal>
        <univocity-csv headerExtractionEnabled="true" asMap="true"/>
    </unmarshal>
    <to uri="mock:result"/>
</route>
then it will produce
[
    {A: 'one', B: 'two', C: 'three'},
    {A: 'four', B: 'five', C: 'six'}
]
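
Conceptually, header extraction combined with asMap takes the first line as the keys and pairs every following line with them. A minimal sketch of that pairing in plain Java follows. It splits on commas naively, without any quoting support, so it is not a substitute for the uniVocity-parsers parser. The class HeaderMappingSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: naive comma splitting, no quoting support; not uniVocity's parser.
final class HeaderMappingSketch {

    /** Uses the first line as headers and turns every following line into a header-to-value map. */
    static List<Map<String, String>> toMaps(List<String> lines) {
        String[] headers = lines.get(0).split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.length && i < values.length; i++) {
                row.put(headers[i], values[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("A,B,C");
        lines.add("one,two,three");
        lines.add("four,five,six");
        System.out.println(toMaps(lines));
        // prints [{A=one, B=two, C=three}, {A=four, B=five, C=six}]
    }
}
```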

Usage example: unmarshalling a fixed-width format into lists

Considering the following body
one  two  three
four five six
and this Java route definition
from("direct:input")
    .unmarshal(new UniVocityFixedWidthDataFormat()
        .setFieldLengths(new int[] {5,5,5})
    )
    .to("mock:result");
or this XML route definition
<route>
    <from uri="direct:input"/>
    <unmarshal>
        <univocity-fixed>
            <univocity-header length="5"/>
            <univocity-header length="5"/>
            <univocity-header length="5"/>
        </univocity-fixed>
    </unmarshal>
    <to uri="mock:result"/>
</route>
then it will produce
[
    ['one', 'two', 'three'],
    ['four', 'five', 'six']
]
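
The result above can be understood as cutting each line at the configured field lengths and then dropping the padding. A minimal sketch of that slicing in plain Java follows. It is a simplified model, not the uniVocity-parsers parser, and the class FixedWidthSplitSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: slices by field length and strips trailing padding; not uniVocity's parser.
final class FixedWidthSplitSketch {

    /** Cuts a line into fields of the given lengths and removes trailing padding characters. */
    static List<String> split(String line, int[] fieldLengths, char padding) {
        List<String> values = new ArrayList<>();
        int start = 0;
        for (int length : fieldLengths) {
            int from = Math.min(start, line.length());
            int to = Math.min(start + length, line.length()); // the last field may be short
            String field = line.substring(from, to);
            int end = field.length();
            while (end > 0 && field.charAt(end - 1) == padding) {
                end--;
            }
            values.add(field.substring(0, end));
            start += length;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] lengths = {5, 5, 5};
        System.out.println(split("one  two  three", lengths, ' ')); // [one, two, three]
        System.out.println(split("four five six", lengths, ' '));   // [four, five, six]
    }
}
```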