CsvDataContext
The Apache MetaModel CSV module is one of the most advanced MetaModel implementations, considering how simple the CSV file format is. Its main features are:
- Full implementation of DataContext and UpdateableDataContext.
- Streaming query support without memory leaks, tested on billion-record data sets.
- Support for parallelized row parsing when multiline values are turned OFF. In that case the Row objects served for queries have not yet been parsed, so the parsing work can be done in parallel by whoever consumes the rows.
- Support for sample-based COUNT queries when the query's COUNT select item has the "allow function approximation" flag set. This means that applications can get a quick approximation of the number of rows, even in a really big file.
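To make the idea of a sample-based COUNT more concrete, here is a minimal sketch of how a line count can be extrapolated from a sample taken at the start of the data. It is not MetaModel's actual implementation; the class name `ApproximateRowCount` and the method `estimateLines` are hypothetical and use only the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ApproximateRowCount {

    /**
     * Estimates the total number of lines from a sample taken at the start
     * of the data, by extrapolating the average line length of the sample.
     * Hypothetical helper for illustration, not part of MetaModel.
     */
    public static long estimateLines(byte[] sample, long totalBytes) {
        long newlines = 0;
        for (byte b : sample) {
            if (b == '\n') {
                newlines++;
            }
        }
        if (newlines == 0) {
            // No line break in the sample: all we know is "at most one line".
            return totalBytes > 0 ? 1 : 0;
        }
        double averageBytesPerLine = (double) sample.length / newlines;
        return Math.round(totalBytes / averageBytesPerLine);
    }

    public static void main(String[] args) {
        // 1000 lines of 8 bytes each, of which only the first 800 bytes are sampled.
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            csv.append("abc,def\n");
        }
        byte[] all = csv.toString().getBytes(StandardCharsets.UTF_8);
        byte[] sample = Arrays.copyOf(all, 800);
        System.out.println(estimateLines(sample, all.length)); // prints 1000
    }
}
```

The estimate is only as good as the sample: if line lengths vary a lot across the file, the result can be noticeably off, which is why such a count should only be used when approximation is explicitly allowed.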
Creating from plain old Java code - CsvDataContext
This is really simple:
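```java
import org.apache.metamodel.DataContext;
import org.apache.metamodel.csv.CsvConfiguration;
import org.apache.metamodel.csv.CsvDataContext;
import org.apache.metamodel.util.FileResource;
import org.apache.metamodel.util.Resource;

Resource csvResource = new FileResource("/path/to/my/file.csv");
CsvConfiguration configuration = new CsvConfiguration(
    // arguments here to fit the resource
);
DataContext dataContext = new CsvDataContext(csvResource, configuration);
```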
Creating from properties - CsvDataContextFactory
If you wish to construct your CSV DataContext from properties, this is also possible. For instance:
...
Property | Example value | Required | Description |
---|---|---|---|
type | csv | Yes | Must be set to 'csv' or else another type of DataContext will be constructed. |
resource | /data/stuff.csv | Yes | Must reference the resource path to read/write CSV data from/to. |
quote-char | " | | The enclosing quote character to use for values in the CSV file. |
separator-char | , | | The separator character to use for separating values in the CSV file. |
escape-char | \ | | The escape character to use for escaping CSV parsing of special characters. |
encoding | UTF-8 | | The character set encoding of the data. |
column-name-line-number | 1 | | The line number which holds column names / headers. |
fail-on-inconsistent-row-length | true | | Whether or not to fail (throw exception) on inconsistent row lengths, or to suppress these parsing issues. |
multiline-values | false | | Whether or not the data contains values spanning multiple lines (if this never happens, a faster parsing approach can be applied). |
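The multiline-values setting matters because, when no value spans multiple lines, every physical line is exactly one row and can be parsed independently of its neighbours. The following is a minimal sketch of that idea, not MetaModel's actual parser: the class `ParallelRowParsing` and its methods are hypothetical, and the naive split deliberately ignores quote-char and escape-char handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelRowParsing {

    /**
     * Splits one physical line into values. This is only valid when no value
     * spans multiple lines. Naive on purpose: quoting and escaping are ignored.
     */
    public static String[] parseLine(String line) {
        return line.split(",", -1); // -1 keeps trailing empty values
    }

    /**
     * Parses every line independently of the others, so the work can be
     * spread across threads. The result keeps the order of the input list.
     */
    public static List<String[]> parseAll(List<String> lines) {
        return lines.parallelStream()
                .map(ParallelRowParsing::parseLine)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,alice,DK", "2,bob,", "3,carol,US");
        for (String[] row : parseAll(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With multiline values turned on, this approach breaks down: a line break inside a quoted value means a row boundary can only be found by scanning the preceding text, so lines can no longer be handed out to parallel workers as-is.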
Updating CSV data
Modifying CSV data is done just like with any other MetaModel module: by implementing an update script that is then submitted to the UpdateableDataContext's executeUpdate(...) method. This approach guarantees isolation and coherence in all update operations. Here is a simple example:
...