uniVocity-parsers
Available as of Camel 2.15.0
This Data Format uses uniVocity-parsers for reading and writing 3 kinds of tabular data text files:
- CSV (Comma Separated Values), where the values are separated by a symbol (usually a comma)
- fixed-width, where the values have known sizes
- TSV (Tab Separated Values), where the fields are separated by a tab character
Thus there are 3 data formats based on uniVocity-parsers.
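To make the three layouts concrete, here is a minimal plain-Java sketch that renders the same record in each of them. It uses only the JDK and does not call uniVocity-parsers; the class name TabularLayouts, its method names, and the '_' padding character are hypothetical and chosen for illustration. Quoting, escaping, and truncation of over-long values are deliberately left out.

```java
import java.util.List;

final class TabularLayouts {
    // CSV: values joined by a delimiter (a comma here); no quoting is attempted.
    static String csv(List<String> values) {
        return String.join(",", values);
    }

    // TSV: values joined by a tab character.
    static String tsv(List<String> values) {
        return String.join("\t", values);
    }

    // Fixed-width: each value is padded on the right up to a known length.
    // Values longer than the length are appended unchanged in this sketch.
    static String fixedWidth(List<String> values, int length, char padding) {
        StringBuilder line = new StringBuilder();
        for (String value : values) {
            line.append(value);
            for (int i = value.length(); i < length; i++) {
                line.append(padding);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<String> record = List.of("one", "two", "three");
        System.out.println(csv(record));                // one,two,three
        System.out.println(tsv(record));                // values separated by tabs
        System.out.println(fixedWidth(record, 5, '_')); // one__two__three
    }
}
```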
If you use Maven you can just add the following to your pom.xml, substituting the version number for the latest and greatest release (see the download page for the latest versions).
<dependency> <groupId>org.apache.camel</groupId> <artifactId>camel-univocity-parsers</artifactId> <version>x.x.x</version> </dependency>
Options
Most configuration options of the uniVocity-parsers are available in the data formats. If you want more information about a particular option, please refer to their documentation page.
The 3 data formats share common options and also have dedicated ones; this section presents them all.
Common options, shared by all the data formats
Parameter name | Type | Description |
---|---|---|
nullValue | String | The string representation of a null value. This option is |
skipEmptyLines | Boolean | Whether or not the empty lines must be ignored. This option is null by default. When null, it uses the default uniVocity-parsers value, which is true. |
ignoreTrailingWhitespaces | Boolean | Whether or not the trailing white spaces must be ignored. This option is |
ignoreLeadingWhitespaces | Boolean | Whether or not the leading white spaces must be ignored. This option is |
headersDisabled | boolean | Whether or not the headers are disabled. When defined, this option explicitly sets the headers as This option is |
headers | String[] | The headers to use. This option is In the XML DSL, this option is configured using child elements: <univocity-csv> <univocity-header>first</univocity-header> <univocity-header>second</univocity-header> </univocity-csv> See other marshalling and unmarshalling examples for more information. |
headerExtractionEnabled | Boolean | Whether or not the headers must be read from the first line of the text document. This option is |
numberOfRecordsToRead | Integer | The maximum number of records to read. This option is |
emptyValue | String | The String representation of an empty value. This option is |
lineSeparator | String | The line separator of the files. This option is |
normalizedLineSeparator | Character | The normalized line separator of the files. This option is |
comment | Character | The comment symbol. This option is |
lazyLoad | boolean | Whether the unmarshalling should produce an iterator that reads the lines on the fly or whether all the lines must be read at once. This option is |
asMap | boolean | Whether the unmarshalling should produce maps for the line values instead of lists. It requires headers (either defined or collected). This option is |
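
The skipEmptyLines row describes an option that is null by default and falls back to the parser's own default when left unset. The following is a minimal sketch of that fallback rule in plain Java; the class OptionDefaults, the constant, and the method name are hypothetical, and this is not the code Camel or uniVocity-parsers actually runs.

```java
final class OptionDefaults {
    // Parser-side default for skipEmptyLines, as stated in the table (true).
    static final boolean PARSER_DEFAULT_SKIP_EMPTY_LINES = true;

    // A null option means "not configured", so the parser default applies;
    // any non-null value overrides it.
    static boolean effectiveSkipEmptyLines(Boolean configured) {
        return configured != null ? configured : PARSER_DEFAULT_SKIP_EMPTY_LINES;
    }

    public static void main(String[] args) {
        System.out.println(effectiveSkipEmptyLines(null));          // true
        System.out.println(effectiveSkipEmptyLines(Boolean.FALSE)); // false
    }
}
```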
CSV format options
Parameter name | Type | Description |
---|---|---|
quoteAllFields | Boolean | Whether or not all values must be quoted when writing them. This option is |
quote | Character | The quote symbol. This option is |
quoteEscape | Character | The quote escape symbol. This option is |
delimiter | Character | The delimiter of values. This option is |
Fixed-width format options
Parameter name | Type | Description |
---|---|---|
fieldLengths | int[] | This option is required and defines the length of each value. In the XML DSL, this option is configured using child elements: <univocity-fixed> <univocity-header length="3"/> <univocity-header length="8"/> </univocity-fixed> See other marshalling and unmarshalling examples for more information. |
skipTrailingCharsUntilNewline | Boolean | Whether or not the trailing characters until new line must be ignored. This option is |
recordEndsOnNewLine | Boolean | Whether or not the record ends on new line. This option is |
padding | Character | The padding character. This option is |
TSV format options
Parameter name | Type | Description |
---|---|---|
escapeChar | Character | The escape character. This option is |
Marshalling usages
The marshalling accepts either:
- A list of maps (List<Map<String, ?>>), one for each line
- A single map (Map<String, ?>), for a single line
Any other body will throw an exception.
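
The accepted body shapes can be sketched in plain Java as follows. This is only an illustration of the rule stated above, not Camel's implementation: the class MarshalBodies and the method toRows are hypothetical, and the choice of IllegalArgumentException is an assumption, since the text does not say which exception is raised. A LinkedHashMap is used so that the column order is the insertion order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MarshalBodies {
    // Normalises a body to a list of rows: a single Map is one row,
    // a List is taken as a list of rows. Anything else is rejected.
    @SuppressWarnings("unchecked")
    static List<Map<String, ?>> toRows(Object body) {
        if (body instanceof Map<?, ?>) {
            Map<String, ?> row = (Map<String, ?>) body;
            return List.<Map<String, ?>>of(row);
        }
        if (body instanceof List<?>) {
            return (List<Map<String, ?>>) body;
        }
        // Assumption: the exception type is illustrative only.
        throw new IllegalArgumentException("Unsupported body: " + body);
    }

    public static void main(String[] args) {
        Map<String, Object> line = new LinkedHashMap<>();
        line.put("A", "one");
        line.put("B", "two");
        line.put("C", "three");

        System.out.println(toRows(line).size());                // 1
        System.out.println(toRows(List.of(line, line)).size()); // 2
    }
}
```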
Usage example: marshalling a Map into CSV format
Considering the following body | Map<String, Object> body = new HashMap<>(); body.put("A", "one"); body.put("B", "two"); body.put("C", "three"); |
and this Java route definition | from("direct:input") .marshal(new UniVocityCsvDataFormat()) .to("mock:result"); |
or this XML route definition | <route> <from uri="direct:input"/> <marshal> <univocity-csv/> </marshal> <to uri="mock:result"/> </route> |
then it will produce | one,two,three |
Usage example: marshalling a Map into fixed-width format
Considering the following body | Map<String, Object> body = new HashMap<>(); body.put("A", "one"); body.put("B", "two"); body.put("C", "three"); |
and this Java route definition | from("direct:input") .marshal(new UniVocityFixedWidthDataFormat() .setFieldLengths(new int[] {5, 5, 5}) .setPadding('_') ) .to("mock:result"); |
or this XML route definition | <route> <from uri="direct:input"/> <marshal> <univocity-fixed padding="_"> <univocity-header length="5"/> <univocity-header length="5"/> <univocity-header length="5"/> </univocity-fixed> </marshal> <to uri="mock:result"/> </route> |
then it will produce | one__two__three |
Usage example: marshalling a Map into TSV format
Considering the following body | Map<String, Object> body = new HashMap<>(); body.put("A", "one"); body.put("B", "two"); body.put("C", "three"); |
and this Java route definition | from("direct:input") .marshal(new UniVocityTsvDataFormat()) .to("mock:result"); |
or this XML route definition | <route> <from uri="direct:input"/> <marshal> <univocity-tsv/> </marshal> <to uri="mock:result"/> </route> |
then it will produce | one two three (with tabs separating the values) |
Unmarshalling usages
The unmarshalling uses an InputStream in order to read the data.
Each row produces either:
- a list with all the values in it (asMap option set to false);
- a map with all the values indexed by the headers (asMap option set to true).
All the rows can either:
- be collected at once into a list (lazyLoad option set to false);
- be read on the fly using an iterator (lazyLoad option set to true).
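
The four result shapes described above (list or map per row, eager list or lazy iterator over rows) can be sketched in plain Java. This is a simplified illustration that splits on commas with the JDK only; it does not use uniVocity-parsers and ignores quoting. The class UnmarshalShapes and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UnmarshalShapes {
    // Shape for asMap=false: each row is a list of values.
    static List<String> asList(String line) {
        return Arrays.asList(line.split(","));
    }

    // Shape for asMap=true: each row is a map indexed by the headers.
    static Map<String, String> asMap(List<String> headers, String line) {
        List<String> values = asList(line);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < headers.size() && i < values.size(); i++) {
            row.put(headers.get(i), values.get(i));
        }
        return row;
    }

    // Shape for lazyLoad=true: rows are produced one at a time as the iterator advances.
    static Iterator<List<String>> lazyRows(BufferedReader reader) {
        Iterator<String> lines = reader.lines().iterator();
        return new Iterator<List<String>>() {
            @Override public boolean hasNext() { return lines.hasNext(); }
            @Override public List<String> next() { return asList(lines.next()); }
        };
    }

    // Shape for lazyLoad=false: every row is collected into a list up front.
    static List<List<String>> allRows(BufferedReader reader) {
        List<List<String>> rows = new ArrayList<>();
        lazyRows(reader).forEachRemaining(rows::add);
        return rows;
    }

    public static void main(String[] args) {
        String data = "one,two,three\nfour,five,six";
        System.out.println(allRows(new BufferedReader(new StringReader(data))));
        System.out.println(asMap(asList("A,B,C"), "one,two,three"));
    }
}
```

The lazy variant keeps only the current line in memory, which is the usual reason to prefer an iterator for large inputs.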
Usage example: unmarshalling a CSV format into maps with automatic headers
Considering the following body | A,B,C one,two,three four,five,six |
and this Java route definition | from("direct:input") .unmarshal(new UniVocityCsvDataFormat() .setAsMap(true) .setHeaderExtractionEnabled(true) ) .to("mock:result"); |
or this XML route definition | <route> <from uri="direct:input"/> <unmarshal> <univocity-csv headerExtractionEnabled="true" asMap="true"/> </unmarshal> <to uri="mock:result"/> </route> |
then it will produce | [ {A: 'one', B: 'two', C: 'three'}, {A: 'four', B: 'five', C: 'six'} ] |
Usage example: unmarshalling a fixed-width format into lists
Considering the following body | one two three four five six |
and this Java route definition | from("direct:input") .unmarshal(new UniVocityFixedWidthDataFormat() .setFieldLengths(new int[] {5,5,5}) ) .to("mock:result"); |
or this XML route definition | <route> <from uri="direct:input"/> <unmarshal> <univocity-fixed> <univocity-header length="5"/> <univocity-header length="5"/> <univocity-header length="5"/> </univocity-fixed> </unmarshal> <to uri="mock:result"/> </route> |
then it will produce | [ ['one', 'two', 'three'], ['four', 'five', 'six'] ] |
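
How field lengths drive fixed-width reading can be sketched in plain Java. The example assumes each field occupies exactly five characters and is padded on the right with spaces; trimming stands in for the parser's whitespace and padding handling, which is an assumption. The class FixedWidthSlicer and the method slice are hypothetical and do not come from Camel or uniVocity-parsers.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWidthSlicer {
    // Cuts a line into consecutive fields of the given lengths and trims
    // surrounding whitespace. Fields past the end of a short line become "".
    static List<String> slice(String line, int[] fieldLengths) {
        List<String> values = new ArrayList<>();
        int start = 0;
        for (int length : fieldLengths) {
            int end = Math.min(start + length, line.length());
            String field = start < end ? line.substring(start, end) : "";
            values.add(field.trim());
            start += length;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] lengths = {5, 5, 5};
        System.out.println(slice("one  two  three", lengths)); // [one, two, three]
        System.out.println(slice("four five six  ", lengths)); // [four, five, six]
    }
}
```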