
...

Overview

This is the top level section for all Flume NG documentation. Flume NG is a refactoring of Flume and was originally tracked in FLUME-728. From the JIRA's description:

...

  • Event
    An event is a singular unit of data that can be transported by Flume NG. Events are akin to messages in JMS and similar messaging systems and are generally small (on the order of a few bytes to a few kilobytes). Events are also commonly single records in a larger dataset. An event is made up of headers and a body; the former is a key / value map and the latter an arbitrary byte array (see the sketch below).

    Footnote: In the future, an event body may be a Java ByteBuffer.
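
    To make the header / body split concrete, here is a minimal sketch of an event abstraction in Java. It is loosely modeled on the Flume NG Event interface; treat the exact names and signatures as illustrative rather than authoritative.

Code Block

import java.util.Map;

/**
 * Sketch of a Flume NG event: a string key / value header map plus an
 * opaque byte[] body carrying the record itself. Modeled on
 * org.apache.flume.Event; the real interface may differ.
 */
public interface Event {

  /** Headers: arbitrary key / value metadata (timestamps, host names, ...). */
  Map<String, String> getHeaders();

  void setHeaders(Map<String, String> headers);

  /** Body: the raw payload bytes. */
  byte[] getBody();

  void setBody(byte[] body);
}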

  • Source
    A source of data from which Flume NG receives data. Sources can be pollable or event driven. Pollable sources, as the name suggests, are repeatedly polled by Flume NG source runners, whereas event driven sources are expected to be driven by some other force. An example of a pollable source is the sequence generator, which simply generates events whose body is a monotonically increasing integer. Event driven sources include the Avro source, which accepts Avro RPC calls and converts the RPC payload into a Flume event, and the netcat source, which mimics the nc command line tool running in server mode. Sources are a user accessible API extension point (see the sketch below).
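
    The distinction between the two source flavors can be sketched at the API level roughly as follows. The interface and method names are illustrative, loosely modeled on PollableSource / EventDrivenSource in the Flume NG tree, and not a copy of the actual code.

Code Block

/**
 * A pollable source is invoked in a loop by a Flume NG source runner; each
 * call to process() tries to produce events and reports whether it did so,
 * letting the runner back off when there is nothing to read.
 */
interface PollableSource {

  enum Status { READY, BACKOFF }

  Status process() throws Exception;
}

/**
 * An event driven source is driven by some external force (an RPC server,
 * a listening socket, ...); the framework only starts and stops it, and the
 * implementation pushes events into Flume itself.
 */
interface EventDrivenSource {

  void start();

  void stop();
}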

...

  • Agent
    Flume NG generalizes the notion of an agent. An agent is any physical JVM running Flume NG. Flume OG users should discard previous notions of an agent and mentally connect this term to Flume OG's "physical node." NG no longer uses the physical / logical node terminology from Flume OG. A single NG agent can run any number of sources, sinks, and channels between them.

    Footnote: Subject to available CPU, memory, blah blah blah.

  • Configuration Provider
    Flume NG has a pluggable configuration system called the configuration provider. By default, Flume NG ships with a Java property file based configuration system that is both simple and easy to generate programmatically. Flume OG has a centralized configuration system with a master and ZooKeeper for coordination; we recognize this is very appealing to some users, whereas others see it as overhead they simply don't want. We opted to make this a pluggable extension point and to ship a basic implementation that lets many users get started quickly and easily. There's almost certainly enough desire for an implementation similar to Flume OG's, but it isn't yet implemented. Users may also implement arbitrary plugins to integrate with any type of configuration system (JSON files, a shared RDBMS, a central system with ZooKeeper, and so forth). We see this as something more interesting to system integrators. A sample property file configuration is sketched below.
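
    As a rough illustration of the property file provider, a single-agent configuration might look something like the following. The key names are modeled on the Flume 1.x property format and may not match early NG builds exactly; the agent, source, channel, and sink names are made up.

Code Block

# One agent ("agent1") wiring one source to one sink through one channel.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# A netcat-style source listening on a port and feeding channel ch1.
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = 0.0.0.0
agent1.sources.src1.port = 41414
agent1.sources.src1.channels = ch1

# An in-memory channel buffering events between source and sink.
agent1.channels.ch1.type = memory

# A logger sink draining the channel.
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1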

...

Data Delivery Semantics

TODO.


Warning: Everything below this line is outdated. In the process of updating things now... -esammer

Notes

...

These are esammer's raw notes while hacking on NG. There's no guarantee they match the code exactly, as they're taken in vim during development and then dropped here for reference. Ideally, they will be refined over time and integrated into a developer handbook for Flume.

...

Critical Features

Having spoken to a large number of both potential and current Flume users, we've found the following features to be the most important (beyond "transfer this data").

...

In the existing Flume config language, there's an incongruence in how logical nodes and sources / sinks are presented. This is largely the side effect of a Way-Back Decision(tm), so I won't go into details. Here's the issue.

User defines:

Code Block

agent     : someSource | agentE2ESink('hostname2', 12345);
collector : collectorSource(12345) | someSink;

...

The problem: the user has to know that collector must run on hostname2 or the config fails. Autochains were meant to solve this, but the notion of flow is so under-documented that it's not clear how to separate sets of compatible agents and collectors. Let's use a naming convention people get.

...

The same config in "groups" (in pseudo-new-config-slash-old-config):

Code Block

host A {
  someSource --> collector-client { group: "a" };

  collector-server { port: 12345; group: "a"; } --> someSink;
}

host B {
  collector-server { port: 12345; group: "a"; } --> someSink;
}

...

Groups probably also support a notion of mode, where a mode is one of round-robin, fan-out, or least-loaded. This gives us both fan-out and load balancing across active-active collectors; a hypothetical example follows.
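
For example, in the same pseudo-config style as above, a mode (round-robin, fan-out, or least-loaded) might be attached to a group something like this (purely hypothetical syntax):

Code Block

host A {
  someSource --> collector-client { group: "a"; mode: "least-loaded"; };
}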

...

Possible filter-ish thing?

Code Block

header-filter: "value-of(x) matches '^foo-.*$'"

...

Diagrams

TODO.