Skip to end of metadata
Go to start of metadata

Syslog Performance Test 2012-04-30

Who ran the test: Mike Percy <mpercy at cloudera dot com>

Test Setup

Overview
The Flume NG agent was run on its own physical machine in a single JVM. A separate client machine generated load against the Flume box in syslog format. Flume stored data onto a 9-node HDFS cluster configured on its own separate hardware. No virtual machines were used in this test.

Hardware specs
CPU: Intel Xeon L5630 2 x quad-core with Hyper-Threading @ 2133MHz (8 physical cores)
Memory: 48GB
OS: SLES 11sp1 (SuSE Linux 64-bit)

Flume configuration
Java version: 1.6.0u26 (Server Hotspot VM)
Java heap size: 2GB
Num. agents: 1
Num. parallel flows: varies (see results)
Source: SyslogTcpSource
Channel: MemoryChannel
Sink: HDFSEventSink with avro_event serialization and snappy serializer compression

Fragment of flume.conf config file

Hadoop configuration
The HDFS sink was connected to a 9-node Hadoop cluster running CDH3u3 with MIT Kerberos v5 security enabled.

Visualization of test setup

Data description
Syslog entries containing sequentially increasing integers plus padding
Event size: 300 bytes

Results

Throughput summary

Num flows

Min aggregate events/sec

Max aggregate events/sec

Min avg. single-flow events/sec

Max avg. single-flow events/sec

6

41982.34

54538.92

6997.06

9089.82

7

45639.21

51646.33

6519.89

7378.05

8

64748.63

66095.53

8093.58

8261.94

9

57358.73

65506.95

6373.19

7278.55

10

58557.15

66324.04

5855.72

6632.40

11

59519.33

62419.89

5410.85

5674.54

12

60105.21

69164.94

5008.77

5763.74

13

69450.87

70590.71

5342.37

5430.05

14

62674.97

64030.08

4476.78

4573.58

15

64499.65

72783.06

4303.64

4852.20

16

65064.07

72714.94

4066.50

4544.68

Conclusions

  1. Flume appears to be capable of achieving approx. 70,000 events/sec on a single machine at the time of the test with no data loss
  2. The optimal number of parallel flows is nearly achieved by creating one flow per CPU core. Additional flows may be added with marginal benefit, likely up to 2x the number of physical cores available on the system, if hyper-threading is available.

N.B. These various performance tests were somewhat brief and not completely automated, so these min/max numbers were only across 3 runs per flow combination, and the min/max was a snapshot in time after a minute or two of allowing the traffic to "settle" (and Hotspot to kick in). Because of the low number of samples, some significant variance is to be expected. Nevertheless, some general trends are apparent in the data.

Labels
  • No labels
  1. gv

    Mike,

    Have you done any bench marks on the FileChannel? If you have can you please publish it?

    Thanks much
    Gv