| Num of Nodes | METRIC_RECORD (MB) | METRIC_RECORD_MINUTE (MB) | METRIC_RECORD_HOURLY (MB) | METRIC_RECORD_DAILY (MB) | METRIC_AGGREGATE (MB) | METRIC_AGGREGATE_MINUTE (MB) | METRIC_AGGREGATE_HOURLY (MB) | METRIC_AGGREGATE_DAILY (MB) | TOTAL (GB) |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 5120 | 2700 | 245 | 10 | 1500 | 305 | 28 | 1 | 10 |
| 100 | 10240 | 5400 | 490 | 20 | 1500 | 305 | 28 | 1 | 18 |
| 300 | 30720 | 16200 | 1470 | 60 | 1500 | 305 | 28 | 1 | 49 |
| 500 | 51200 | 27000 | 2450 | 100 | 1500 | 305 | 28 | 1 | 81 |
| 800 | 81920 | 43200 | 3920 | 160 | 1500 | 305 | 28 | 1 | 128 |
NOTE
- The above guidance is derived from observed AMS disk utilization in actual clusters.
- The actual numbers below were obtained by observing a cluster running the basic services (HDFS, YARN, HBase) along with Storm, Kafka, and Flume.
- Kafka and Flume generate metrics only while a job is running, so additional disk space is recommended if those services are used heavily. Sample Storm and Kafka jobs were run while deriving these numbers to ensure those services contributed to the totals.
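The guidance table follows a simple pattern: the METRIC_RECORD* tables grow roughly linearly with the number of nodes, while the cluster-wide METRIC_AGGREGATE* tables stay roughly constant. A minimal sketch of that extrapolation, assuming per-node rates derived from the 50-node row (the function name and structure are illustrative, not part of AMS):

```python
# Hedged sketch: estimate AMS disk needs by linear extrapolation from the
# guidance table above. Per-node rates are taken from the 50-node row;
# the METRIC_AGGREGATE* tables are cluster-wide and treated as constant.

PER_NODE_MB = {
    "METRIC_RECORD": 5120 / 50,          # ~102.4 MB per node
    "METRIC_RECORD_MINUTE": 2700 / 50,   # ~54 MB per node
    "METRIC_RECORD_HOURLY": 245 / 50,    # ~4.9 MB per node
    "METRIC_RECORD_DAILY": 10 / 50,      # ~0.2 MB per node
}

# METRIC_AGGREGATE + _MINUTE + _HOURLY + _DAILY, constant across cluster sizes
AGGREGATE_MB = 1500 + 305 + 28 + 1


def estimate_ams_disk_gb(num_nodes: int) -> float:
    """Rough AMS disk estimate in GB for a cluster of num_nodes."""
    record_mb = sum(rate * num_nodes for rate in PER_NODE_MB.values())
    return round((record_mb + AGGREGATE_MB) / 1024, 1)


print(estimate_ams_disk_gb(100))  # ~17.6 GB, close to the 18 GB row above
```

This reproduces the table within rounding (the published TOTAL column appears to be rounded up), and gives a first-order estimate for cluster sizes between the listed rows.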
Actual disk utilization data
| Num of Nodes | METRIC_RECORD (MB) | METRIC_RECORD_MINUTE (MB) | METRIC_RECORD_HOURLY (MB) | METRIC_RECORD_DAILY (MB) | METRIC_AGGREGATE (MB) | METRIC_AGGREGATE_MINUTE (MB) | METRIC_AGGREGATE_HOURLY (MB) | METRIC_AGGREGATE_DAILY (MB) | TOTAL (GB) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 120 | 175 | 17 | 1 | 545 | 136 | 16 | 1 | 1 |
| 3 | 294 | 51 | 3.4 | 1 | 104 | 26 | 1.8 | 1 | 0.5 |
| 10 | 1024 | 540 | 49 | 2 | 1433.6 | 305 | 28 | 1 | 3.3 |