...

  1. as a consumer, if the path is a file, the component just reads that file; if the path is a directory, it scans all the files under the path that match the configured pattern. All the files under that directory must be of the same type.
  2. as a producer, if at least one split strategy is defined, the path is treated as a directory, and under that directory the producer creates a separate file per split, named using the configured UuidGenerator.
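
The consumer rule above can be sketched against the local filesystem with java.nio. This is only an illustration, not the component's implementation: the class name PathScanSketch and the method filesToConsume are hypothetical, and treating the pattern as a glob is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: local-filesystem stand-in for the consumer's path handling.
final class PathScanSketch {

    // A file is consumed as-is; a directory is scanned for regular files
    // whose names match the pattern (assumed here to be a glob).
    static List<Path> filesToConsume(Path path, String pattern) {
        if (Files.isRegularFile(path)) {
            return Collections.singletonList(path);
        }
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(path, pattern)) {
            for (Path entry : stream) {
                if (Files.isRegularFile(entry)) {
                    matches.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(matches); // directory order is unspecified, so sort for stable output
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hdfs-scan-sketch");
        Files.write(dir.resolve("a.txt"), "one".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.log"), "two".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("c.txt"), "three".getBytes(StandardCharsets.UTF_8));
        for (Path p : filesToConsume(dir, "*.txt")) {
            System.out.println(p.getFileName());
        }
    }
}
```
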
Note

When consuming from HDFS in normal mode, a file is split into chunks, producing one message per chunk. You can configure the chunk size using the chunkSize option. If you want to read from HDFS and write to a regular file using the file component, you can use fileExist=Append to append the chunks together.
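
To make the chunking behaviour concrete, here is a minimal sketch in plain Java. It does not use the component itself: ChunkSketch, split and appendAll are hypothetical names, and an in-memory byte array stands in for the HDFS file. Each element of the returned list corresponds to one message, and appending the chunks in order restores the original content.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: splitting content into fixed-size chunks and appending them back.
final class ChunkSketch {

    // Splits content into chunks of at most chunkSize bytes; the last chunk may be shorter.
    static List<byte[]> split(byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(content.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(content, offset, end));
        }
        return chunks;
    }

    // Appends the chunks in order, as a file written in append mode would.
    static byte[] appendAll(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = new byte[10000];
        Arrays.fill(content, (byte) 'x');
        List<byte[]> chunks = split(content, 4096); // 4096 is the default chunkSize
        System.out.println(chunks.size());                                // prints 3
        System.out.println(Arrays.equals(content, appendAll(chunks)));    // prints true
    }
}
```
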

 

Options


| Name | Default Value | Description |
|---|---|---|
| overwrite | true | Whether the file can be overwritten. |
| append | false | Append to an existing file. Note that not all HDFS file systems support the append option. |
| bufferSize | 4096 | The buffer size used by HDFS. |
| replication | 3 | The HDFS replication factor. |
| blockSize | 67108864 | The size of the HDFS blocks, in bytes (the default is 64 MB). |
| fileType | NORMAL_FILE | The file type. It can also be SEQUENCE_FILE, MAP_FILE, ARRAY_FILE, or BLOOMMAP_FILE; see Hadoop. |
| fileSystemType | HDFS | It can be LOCAL for the local filesystem. |
| keyType | NULL | The type for the key in case of sequence or map files. See below. |
| valueType | TEXT | The type for the value in case of sequence or map files. See below. |
| splitStrategy | | A string describing the strategy for splitting the file based on different criteria. See below. |
| openedSuffix | opened | When a file is opened for reading or writing, it is renamed with this suffix so that it is not read during the writing phase. |
| readSuffix | read | Once the file has been read, it is renamed with this suffix so that it is not read again. |
| initialDelay | 0 | For the consumer, how long to wait (in milliseconds) before starting to scan the directory. |
| delay | 0 | The interval (in milliseconds) between directory scans. |
| pattern | * | The pattern used for scanning the directory. |
| chunkSize | 4096 | When reading a normal file, the file is split into chunks, producing one message per chunk. |
| connectOnStartup | true | Camel 2.9.3/2.10.1: Whether to connect to the HDFS file system when the producer/consumer starts. If false, the connection is created on demand. Note that HDFS may take up to 15 minutes to establish a connection, as it has a hardcoded 45 x 20 sec redelivery. Setting this option to false allows your application to start up without blocking for up to 15 minutes. |
| owner | | Camel 2.13/2.12.4: The file owner must match this owner for the consumer to pick up the file. Otherwise the file is skipped. |
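
As an illustration of how these options are typically combined, the sketch below assembles an endpoint URI string from a map of option names and values. It assumes the usual hdfs://hostname[:port][/path][?options] layout; HdfsUriSketch and endpointUri are hypothetical helpers, and the host, port and path are placeholders. Values are not URL-encoded, so this only suits simple values such as the ones in the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: building an hdfs endpoint URI from option name/value pairs.
final class HdfsUriSketch {

    // Joins the options as a query string in the map's iteration order.
    // No URL-encoding is applied (assumption: values are plain tokens).
    static String endpointUri(String host, int port, String path, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("hdfs://")
                .append(host).append(':').append(port).append(path);
        if (!options.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> option : options.entrySet()) {
                query.add(option.getKey() + "=" + option.getValue());
            }
            uri.append(query);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>(); // keeps insertion order
        options.put("fileType", "SEQUENCE_FILE");
        options.put("replication", "2");
        options.put("connectOnStartup", "false");
        // localhost, 8020 and /tmp/camel-out are placeholder values
        System.out.println(endpointUri("localhost", 8020, "/tmp/camel-out", options));
    }
}
```

The resulting string is what would be passed to from(...) or to(...) in a route; the route itself is omitted here because it needs Camel on the classpath.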

...