What is Flume NG?

Flume NG is a branch of Flume that aims to be significantly simpler, smaller, and easier to deploy. In doing so, we do not commit to maintaining backward compatibility. At this time, NG is experimental and not supported in production environments. We're currently soliciting feedback from those who are interested in testing this branch for correctness, ease of use, and potential integration with other systems.

Danger! Warnung! Advertencia! Avertissement! Varning!

Flume NG is still experimental and absolutely not meant for production usage. No extensive testing has yet been performed and many features are not yet implemented. Some features and functionality (including APIs) may change or disappear prior to release.

What's Changed?

Flume NG is a huge departure from Flume OG (original generation, or "original gangsta," if you prefer) in its implementation, although many of the original concepts are the same. If you're already familiar with Flume, here's what you need to know.

Since NG is still in flux, don't let anything here scare you. Features that are in OG may not yet be in NG, so treat the presence or absence of anything here as a snapshot of the current state and nothing more. If you want something here to change, let us know.

  • You still have sources and sinks and they still do the same thing. They are now connected by channels.
  • Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast but non-durable event delivery and a JDBC-based channel for durable event delivery. We have plans for a file-based durable channel.
  • There are no more logical or physical nodes. We call all physical nodes agents, and agents can run zero or more sources and sinks.
  • There's no master and no ZooKeeper dependency anymore. At this time, Flume runs with a simple file-based configuration system.
  • Just about everything is a plugin, some end user facing, some for tool and system developers. (Specifically, sources, sinks, channels, configuration providers, lifecycle management policies, input and output formats, compression, source and sink channel adapters, and the kitchen sink.)
  • Tons of things are not yet implemented. Please file JIRAs and / or vote for features you deem important.

Getting Flume NG

Given the rapid pace of development on the NG branch, the best way to get it is to check out the source from subversion or git and build it yourself.

Building From Source

To build Flume NG from source, you'll need either subversion or git, the Sun JDK 1.6, Apache Maven 3.x, and an Internet connection.
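Before checking anything out, it can help to confirm the tools are actually on your PATH. The following is a minimal sketch, not part of Flume: the function names missing_tools and check_build_prereqs are made up for illustration, and the check only looks for the executables. It does not verify that you have JDK 1.6 or Maven 3.x specifically.

```shell
# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical helper: Flume NG needs java and mvn, plus either svn or git.
check_build_prereqs() {
  missing=$(missing_tools java mvn)
  # Only complain about version control if neither client is present.
  if [ -n "$(missing_tools svn)" ] && [ -n "$(missing_tools git)" ]; then
    missing="$missing svn-or-git"
  fi
  if [ -n "$missing" ]; then
    echo "missing:" $missing
    return 1
  fi
  echo "all build prerequisites found"
}
```

Paste the functions into your shell and run check_build_prereqs; a non-zero exit status means at least one tool could not be found.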

1. Check out the source

For those that prefer subversion:

$ svn checkout https://svn.apache.org/repos/asf/incubator/flume/branches/flume-728/

If you're more of a git person:

$ git clone git://git.apache.org/flume.git
$ cd flume
$ git checkout flume-728

Note: The git repo is a read-only mirror of the subversion repo.

2. Compile the project

# Build the code and run the tests
$ mvn package
# ...or build the code without running the tests
$ mvn package -DskipTests

Currently this produces an uber-jar containing all of Flume NG and all of its dependencies. This, of course, is not ideal, but is useful during the testing phases. You should copy the jar file to the top level directory.

$ cp flume-ng-dist/target/flume-ng-dist-0.9.5-SNAPSHOT-jar-with-dependencies.jar flume-ng.jar

3. Configure and Run Flume NG

After you've configured Flume NG (see below), you can run it with the bin/flume-ng executable. This script has a number of arguments and modes. If you're just interested in trying Flume and you're following all directions, feel free to skip to the configuration section and follow on to the examples.

# Flume NG executable help
$ ./bin/flume-ng
Error: Unknown or unspecified command ''
usage: ./bin/flume-ng [help | node] [--no-env]

commands:
  help                display this help text
  node                run a Flume node
  avro-client         run an avro Flume client

global options

  --conf,-c <conf>    use configs in <conf> directory
  --no-env,-E         do not source the flume-env.sh file
  --classpath,-C <cp> override the classpath completely

node options

  --data,-d <dir>     store internal flume data in <dir>

avro-client options

  --host,-H <host>  hostname to which events will be sent
  --port,-p <port>  port of the avro source

Note that the conf directory is always included in the classpath.

The flume-ng executable lets you run a Flume NG node or an Avro client, which is useful for testing and experiments. No matter what, you'll need to specify a command (e.g. node or avro-client), a classpath (--classpath <path to flume jar>), and a conf directory (--conf <conf dir>). All other options are command-specific.

flume-ng node options

When given the node command, a Flume NG node will be started with a given configuration file (required).

Option          Description
-f <filename>   Indicate which configuration file you want to run with (required)
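Putting the required pieces together (the node command, --classpath, --conf, and -f), a complete invocation might look like the sketch below. Everything other than those flags is an assumption made for illustration: the function name start_node, the variables NG_JAR, NG_CONF, and DRY_RUN, and the file name conf/flume.properties are not defined by flume-ng itself. Setting DRY_RUN=1 prints the command instead of running it, which is handy for checking what would be executed.

```shell
# Start a Flume NG node with the given configuration file,
# or just print the command line when DRY_RUN=1.
# NG_JAR, NG_CONF and DRY_RUN are hypothetical names, not read by flume-ng.
start_node() {
  config_file=$1
  jar=${NG_JAR:-flume-ng.jar}
  conf_dir=${NG_CONF:-./conf}
  # Rebuild the positional parameters as the full command line.
  set -- ./bin/flume-ng node --classpath "$jar" --conf "$conf_dir" -f "$config_file"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 start_node conf/flume.properties prints the command, and start_node conf/flume.properties runs it from the top level directory of the checkout.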

flume-ng avro-client options

Run an Avro client that sends either a file or data from stdin to a specified host and port where a Flume NG Avro Source is listening.

Option          Description
-H <hostname>   Specifies the hostname of the Flume node (may be localhost)
-p <port>       Specifies the port on which the Avro source is listening
-F <filename>   Send each line of <filename> to Flume (optional)

The Avro client treats each line (terminated by \n, \r, or \r\n) as an event. Think of the avro-client command as cat for Flume. For instance, the following creates one event per Linux user and sends it to Flume's avro source on localhost:41414.

$ ./bin/flume-ng avro-client --classpath flume-ng.jar --conf ./conf -H localhost -p 41414 -F /etc/passwd
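When testing, it is useful to know how many events a given file should produce so you can compare against what arrives at the sink. The sketch below counts lines using the three terminators listed above. It is an approximation, not the client's actual code: the function name count_events is made up, and it assumes that empty lines and a final line without a terminator each count as one event.

```shell
# Estimate how many events the avro-client would create from a file.
# Assumption: empty lines and a final unterminated line each count as one event.
count_events() {
  cr=$(printf '\r')
  # Strip the \r of each \r\n pair, turn any remaining bare \r into \n,
  # then count the resulting lines.
  sed "s/${cr}\$//" "$1" | tr '\r' '\n' | awk 'END { print NR }'
}
```

Running count_events /etc/passwd before sending the file gives a number to check against the events logged on the receiving node.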

Configuration

Flume uses a Java properties file-based configuration format. When running a node, you must tell Flume which file to use with the -f <file> option (see above). The file can live anywhere, but the conf directory has historically been, and will remain, the conventional place for config files.

Let's start with a basic example.

# Define a memory channel called ch1 on host1
host1.channels.ch1.type = memory

# Define an Avro source called avro-source1 on host1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
host1.sources.avro-source1.type = avro
host1.sources.avro-source1.bind = 0.0.0.0
host1.sources.avro-source1.port = 41414
host1.sources.avro-source1.channels = ch1

# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
host1.sinks.log-sink1.type = logger
host1.sinks.log-sink1.channel = ch1

# Finally, now that we've defined all of our components, tell
# host1 which ones we want to activate.
host1.sources = avro-source1
host1.sinks = log-sink1
host1.channels = ch1

This example creates a memory channel (i.e. an unreliable or "best effort" transport), an Avro RPC source, and a logger sink and connects them together. Any events received by the Avro source are routed to the channel ch1 and delivered to the logger sink. It's important to note that defining components is only the first half of configuring Flume; they must also be activated by listing them in the <agent>.sources, <agent>.sinks, and <agent>.channels properties. Multiple sources, sinks, and channels may be listed, separated by a space.
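Because definition and activation are separate steps, it is easy to define a component and forget to activate it. The following sketch lists components that have a type line but do not appear in the matching activation list. It is not a Flume tool: the function name unactivated_components is made up, and it only understands simple "key = value" lines and # comments, not the full Java properties syntax.

```shell
# List components that are defined (have a .type line) but not activated.
# Usage: unactivated_components <agent> <properties-file>
unactivated_components() {
  awk -v agent="$1" '
    # Skip comments and blank lines.
    /^[ \t]*(#|$)/ { next }
    {
      # Split "key = value" on the first "=" and trim both sides.
      eq = index($0, "=")
      if (eq == 0) next
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      n = split(key, part, ".")
      if (part[1] != agent) next
      if (n == 2) {
        # Activation line, e.g. host1.sources = avro-source1
        m = split(val, name, /[ \t]+/)
        for (i = 1; i <= m; i++) active[part[2] "." name[i]] = 1
      } else if (n == 4 && part[4] == "type") {
        # Definition line, e.g. host1.sources.avro-source1.type = avro
        defined[part[2] "." part[3]] = 1
      }
    }
    END {
      for (c in defined) if (!(c in active)) print c
    }
  ' "$2" | sort
}
```

Run against the example above saved to a file, unactivated_components host1 <file> should print nothing. If you added a second sink definition without listing it in host1.sinks, it would be reported as sinks.<name>.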

For full details, please (please, please, please) see the javadoc for the org.apache.flume.conf.properties.PropertiesFileConfigurationProvider class.

Providing Feedback

For help building, configuring, and running Flume (NG or otherwise), the best place is always the user mailing list. Send an email to flume-user-subscribe@incubator.apache.org to subscribe and flume-user@incubator.apache.org to post once you've subscribed. The archives are available at http://mail-archives.apache.org/mod_mbox/incubator-flume-user/ as well.

If you believe you've found a bug or wish to file a feature request or improvement, don't be shy. Go to https://issues.apache.org/jira/browse/FLUME and file a JIRA against the relevant version of Flume. For NG, please set "Affects Version" to the appropriate milestone / release. Leave blank any field you're not sure about; we'll bug you for details if we need them. Note that you must create an Apache JIRA account and log in before you can file issues.
