Camel MongoDB GridFS component
Available as of Camel 2.17
Maven users will need to add the following dependency to their pom.xml
for this component:
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-mongodb-gridfs</artifactId>
    <version>x.y.z</version>
    <!-- use the same version as your Camel core version -->
</dependency>
URI format
mongodb-gridfs:connectionBean?database=databaseName&bucket=bucketName[&moreOptions...]
URI format (Camel < 2.19)
gridfs:connectionBean?database=databaseName&bucket=bucketName[&moreOptions...]
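For example, a producer endpoint that fetches files from a bucket named "attachments" could be written as follows (the connection bean name mongoBean, the database name tickets, and the bucket name are illustrative):

mongodb-gridfs:mongoBean?database=tickets&bucket=attachments&operation=findOne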
Endpoint options
GridFS endpoints support the following options, depending on whether they act as a producer or as a consumer (some consumer options also depend on the query strategy in use).
| Name | Default Value | Description | Producer | Consumer |
|---|---|---|---|---|
| database | none | Required. The name of the database to which this endpoint is bound. All operations are executed against this database. | ✓ | ✓ |
| bucket | fs | The name of the GridFS bucket within the database. The default is the GridFS.DEFAULT_BUCKET value ("fs"). | ✓ | ✓ |
| operation | create | The id of the operation this endpoint will execute. One of: count, listAll, findOne, create, remove. | ✓ | |
| query | none | Combined with the query strategy parameters to create the query used to search for new files. | | ✓ |
| queryStrategy | TimeStamp | The strategy used to find new files. One of: TimeStamp, PersistentTimestamp, FileAttribute, TimestampAndFileAttribute, PersistentTimestampAndFileAttribute. | | ✓ |
| persistentTSCollection | camel-timestamps | When using persistent timestamps, the collection in which the timestamp is stored. | | ✓ |
| persistentTSObject | camel-timestamp | When using persistent timestamps, the object ID of the timestamp object. Each consumer can have its own timestamp ID stored in a common collection. | | ✓ |
| fileAttributeName | camel-processed | When using FileAttribute, the name of the attribute that is used. The attribute is set to "processing" when a file is about to be processed and to "done" when processing is finished. | | ✓ |
| delay | 500 (ms) | The delay between polls of GridFS for new files. | | ✓ |
| initialDelay | 1000 (ms) | The initial delay before the first poll. | | ✓ |
Configuration of database in Spring XML
The following Spring XML creates a bean defining the connection to a MongoDB instance.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="mongoBean" class="com.mongodb.Mongo">
        <constructor-arg name="host" value="${mongodb.host}" />
        <constructor-arg name="port" value="${mongodb.port}" />
    </bean>
</beans>
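If you are not using Spring, the same connection bean can be bound to the Camel registry in Java. The following is a minimal sketch, assuming a JNDI-backed registry and example host and port values:

import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.impl.JndiRegistry;
import org.apache.camel.util.jndi.JndiContext;
import com.mongodb.Mongo;

// Bind the connection under the name referenced by the endpoint URI ("mongoBean")
JndiRegistry registry = new JndiRegistry(new JndiContext());
registry.bind("mongoBean", new Mongo("localhost", 27017)); // host and port are example values
DefaultCamelContext context = new DefaultCamelContext(registry);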
Sample route
The following route, defined in Spring XML, executes the findOne operation using the connection bean defined above.
<route>
    <from uri="direct:start" />
    <!-- using bean 'mongoBean' defined above -->
    <to uri="mongodb-gridfs:mongoBean?database=${mongodb.database}&amp;operation=findOne" />
    <to uri="direct:result" />
</route>
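The equivalent route in the Java DSL is sketched below (a concrete database name, tickets, is substituted for the property placeholder):

from("direct:start")
    // using bean 'mongoBean' registered in the Camel registry
    .to("mongodb-gridfs:mongoBean?database=tickets&operation=findOne")
    .to("direct:result");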
GridFS operations - producer endpoint
count
Returns the total number of files in the bucket as an Integer in the OUT message body.
// from("direct:count").to("mongodb-gridfs?database=tickets&operation=count"); Integer result = template.requestBodyAndHeader("direct:count", "irrelevantBody"); assertTrue("Result is not of type Long", result instanceof Integer);
You can supply a filename header to count only the files matching that filename.
Map<String, Object> headers = new HashMap<String, Object>(); headers.put(Exchange.FILE_NAME, "filename.txt"); Integer count = template.requestBodyAndHeaders("direct:count", query, headers);
listAll
Returns a Reader that lists all the filenames and their IDs in a tab-separated stream.
// from("direct:listAll").to("mongodb-gridfs?database=tickets&operation=listAll"); Reader result = template.requestBodyAndHeader("direct:listAll", "irrelevantBody"); filename1.txt 1252314321 filename2.txt 2897651254
findOne
Finds a file in the GridFS system and sets the body to an InputStream of the content. It also provides the metadata as headers. It uses Exchange.FILE_NAME from the incoming headers to determine which file to find.
// from("direct:findOne").to("mongodb-gridfs?database=tickets&operation=findOne"); Map<String, Object> headers = new HashMap<String, Object>(); headers.put(Exchange.FILE_NAME, "filename.txt"); InputStream result = template.requestBodyAndHeaders("direct:findOne", "irrelevantBody", headers);
create
Creates a new file in the GridFS database. It uses the Exchange.FILE_NAME header from the incoming message for the name and the body contents (as an InputStream) as the content.
// from("direct:create").to("mongodb-gridfs?database=tickets&operation=create"); Map<String, Object> headers = new HashMap<String, Object>(); headers.put(Exchange.FILE_NAME, "filename.txt"); InputStream stream = ... the data for the file ... template.requestBodyAndHeaders("direct:create", stream, headers);
remove
Removes a file from the GridFS database.
// from("direct:remove").to("mongodb-gridfs?database=tickets&operation=remove"); Map<String, Object> headers = new HashMap<String, Object>(); headers.put(Exchange.FILE_NAME, "filename.txt"); template.requestBodyAndHeaders("direct:remove", "", headers);
GridFS Consumer
The GridFS component polls GridFS periodically for new files to process. The two parameters that control this behavior are delay and initialDelay. The delay parameter specifies how long the background thread sleeps between polling attempts; the default is 500 ms. The initialDelay parameter specifies how long the consumer waits after starting before the first poll. This is useful if the backend service needs a bit longer to become available.
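For example, a consumer that waits five seconds before the first poll and then polls every second could be configured as follows (the bean name mongoBean and database name tickets are illustrative):

from("mongodb-gridfs:mongoBean?database=tickets&initialDelay=5000&delay=1000")
    .to("direct:processFile");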
The Consumer has several strategies for determining which files within the grid have not been processed yet:
- TimeStamp - (default) when the consumer starts up, it uses the current time as its starting point. Any files already in the grid are ignored; only files added after the consumer starts are processed. After each poll, the consumer updates its timestamp to the timestamp of the newest file processed.
- PersistentTimestamp - when the consumer starts up, it queries the collection specified by the persistentTSCollection parameter for the object given by the persistentTSObject parameter to use as the starting timestamp. If the object doesn't exist, it uses the current time and creates the object. After each file processed, the timestamp in the collection is updated.
- FileAttribute - instead of timestamps, the consumer queries GridFS for files that do not have the attribute given by the fileAttributeName parameter. When the consumer starts processing a file, the attribute is added to the file in GridFS.
- TimestampAndFileAttribute - finds files that are newer than the TimeStamp and are missing the attribute. Adds the attribute to the file when processing.
- PersistentTimestampAndFileAttribute - finds files that are newer than the TimeStamp and are missing the attribute. Adds the attribute to the file when processing and updates the persistent timestamp.
from("mongodb-gridfs?database=tickets&queryStrategy=FileAttribute").process(.....); from("mongodb-gridfs?database=myData&queryStrategy=PersistentTimestamp&persistentTSCollection=CamelTimestamps&persistentTSObject=myDataTS").process(...)
See also
- MongoDB website
- NoSQL Wikipedia article
- MongoDB Java driver API docs - current version
- Unit tests for more examples of usage