Overview

Tajo provides four kinds of metrics for monitoring its components: gauges, meters, counters, and histograms. Tajo maintains these metrics internally and provides them to external monitoring applications through several channels, such as Ganglia, file, log4j, and JMX.

Metric Name and Metric Hierarchy

Each metric name has three parts: a group name, a context name, and an item name. The group and context names are categories. The group name identifies the system component or the topmost logical category. The hierarchy is as follows:

MASTER           (= TajoMaster metrics)
  |- CLUSTER     (Aggregated metrics about cluster stats and cluster resources)
  |- QUERY       (Aggregated metrics about submitted queries and scheduler)
 
NODE (= Node metrics)
  |- TASKS       (Metrics about TaskManager and task executions in each node)
  |- QUERYMASTER (Metrics about QueryMaster and its manager in each node)
 
${COMPONENT}-JVM (= Each component's JVM metrics in each node)
  |- MEMORY      (JVM Heap, Direct memory, ...)
  |- FILE        (Open files)
  |- GC          (Garbage collection)
  |- THREAD      (Thread states and counts)
  |- LOG         (logging events)
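
The three-part naming scheme can be illustrated with a small parser. This is a minimal sketch, not Tajo code: it assumes that a full metric name joins the group, context, and item names with dots, which this page only shows for the group and context (for example MASTER.CLUSTER). The `MetricName` type and `parse_metric_name` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class MetricName(NamedTuple):
    group: str    # e.g. "MASTER" or "NODE-JVM"
    context: str  # e.g. "CLUSTER" or "MEMORY"
    item: str     # e.g. "UPTIME" or "heap.used"


def parse_metric_name(full_name: str) -> MetricName:
    """Split GROUP.CONTEXT.ITEM on the first two dots only.

    Item names may themselves contain dots (for example "heap.used"),
    so everything after the second dot is kept as the item name.
    """
    parts = full_name.split(".", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected GROUP.CONTEXT.ITEM, got {full_name!r}")
    return MetricName(*parts)


print(parse_metric_name("MASTER.CLUSTER.UPTIME"))
print(parse_metric_name("NODE-JVM.MEMORY.heap.used"))
```

Splitting on only the first two dots keeps dotted item names intact, and the hyphen in JVM group names needs no special handling.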

 

Metric List

MASTER.CLUSTER

Item Name     Data Type  Unit                         Description
UPTIME        long       Milliseconds                 Time elapsed since the Tajo cluster started
TOTAL_NODES   int        Number of nodes              Total number of cluster nodes
ACTIVE_NODES  int        Number of nodes              Number of active cluster nodes
LOST_NODES    int        Number of nodes              Number of lost cluster nodes
TOTAL_MEMORY  int        Megabytes                    Total resource memory of cluster nodes
FREE_MEMORY   int        Megabytes                    Available resource memory of cluster nodes
TOTAL_VCPU    int        Number of virtual CPU cores  Total virtual CPU cores of cluster nodes
FREE_VCPU     int        Number of virtual CPU cores  Available virtual CPU cores of cluster nodes
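
The total and free items above can be combined into utilization figures. The following is a minimal sketch under stated assumptions: the metric values have already been collected into a dictionary keyed by item name, and the numbers are made up for illustration. The `utilization` helper is hypothetical and not part of Tajo.

```python
def utilization(total: int, free: int) -> float:
    """Return the used fraction in [0.0, 1.0]; 0.0 when total is zero."""
    if total <= 0:
        return 0.0
    used = total - free
    return max(0.0, min(1.0, used / total))


# Hypothetical snapshot of MASTER.CLUSTER items; values are illustrative only.
snapshot = {
    "TOTAL_MEMORY": 16384,  # megabytes
    "FREE_MEMORY": 4096,    # megabytes
    "TOTAL_VCPU": 32,       # virtual CPU cores
    "FREE_VCPU": 24,        # virtual CPU cores
}

memory_used = utilization(snapshot["TOTAL_MEMORY"], snapshot["FREE_MEMORY"])
vcpu_used = utilization(snapshot["TOTAL_VCPU"], snapshot["FREE_VCPU"])
print(f"memory used: {memory_used:.0%}, vCPU used: {vcpu_used:.0%}")
# prints "memory used: 75%, vCPU used: 25%"
```

The guard on `total` avoids a division by zero when no nodes have registered yet.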

MASTER.QUERY

Item Name          Data Type  Unit               Description
SUBMITTED          int        Number of queries  Number of queries submitted
COMPLETED          int        Number of queries  Number of queries completed
RUNNING            int        Number of queries  Number of queries currently running
ERROR              int        Number of queries  Number of queries canceled due to errors
FAILED             int        Number of queries  Number of queries that failed after running
KILLED             int        Number of queries  Number of queries killed by users
MAX_IO_THROUGHPUT  int        Megabytes          Maximum aggregated I/O throughput per query in the cluster
AVG_IO_THROUGHPUT  int        Megabytes          Average aggregated I/O throughput per query in the cluster

NODE.QUERYMASTER

Item Name   Data Type  Unit                             Description
RUNNING_QM  int        Number of running query masters  Number of query masters running in the node

NODE.TASKS

Item Name      Data Type  Unit                     Description
RUNNING_TASKS  int        Number of running tasks  Number of tasks running in the node

<COMPONENT>-JVM

All Tajo components, such as the Master (TajoMaster) and the Node (TajoWorker), have a number of JVM metrics. These metrics belong to a group named <component name>-JVM. For example, TajoMaster has the MASTER-JVM group, and TajoWorker has the NODE-JVM group. The contexts and items are the same for all JVM metric groups.

Context Name  Item Name                      Data Type
GC            PS-MarkSweep.time              int
GC            PS-MarkSweep.count             int
GC            PS-Scavenge.time               int
GC            PS-Scavenge.count              int
MEMORY        pools.Code-Cache.usage
MEMORY        pools.PS-Survivor-Space.usage
MEMORY        pools.PS-Eden-Space.usage
MEMORY        pools.PS-Perm-Gen.usage
MEMORY        pools.PS-Old-Gen.usage
MEMORY        heap.init
MEMORY        heap.usage
MEMORY        heap.used
MEMORY        heap.committed
MEMORY        heap.max
MEMORY        non-heap.init
MEMORY        non-heap.usage
MEMORY        non-heap.used
MEMORY        non-heap.committed
MEMORY        non-heap.max
MEMORY        total.init
MEMORY        total.used
MEMORY        total.committed
MEMORY        total.max
LOG           Info
LOG           Fatal
LOG           Error
LOG           Warning
THREAD        terminated.count
THREAD        timed_waiting.count
THREAD        count
THREAD        blocked.count
THREAD        deadlock.count
THREAD        new.count
THREAD        deadlocks
THREAD        runnable.count
THREAD        daemon.count
THREAD        waiting.count

Configuration

Place tajo-metrics.properties in <tajo install dir>/conf. An example configuration follows:

reporter.ganglia=org.apache.tajo.util.metrics.reporter.GangliaReporter
reporter.file=org.apache.tajo.util.metrics.reporter.MetricsFileScheduledReporter

MASTER.reporters=ganglia,file
MASTER.ganglia.server=localhost
MASTER.ganglia.port=8649
MASTER.ganglia.period=10
MASTER.file.filename=/Users/hyunsik/master-metrics.log
MASTER.file.period=10

MASTER-JVM.reporters=ganglia,file
MASTER-JVM.ganglia.server=localhost
MASTER-JVM.ganglia.port=8650
MASTER-JVM.ganglia.period=60
MASTER-JVM.file.filename=/Users/hyunsik/master-jvm-metrics.log
MASTER-JVM.file.period=60

NODE.reporters=ganglia,file
NODE.ganglia.server=localhost
NODE.ganglia.port=8653
NODE.ganglia.period=10
NODE.file.filename=/Users/hyunsik/node-metrics.log
NODE.file.period=5

NODE-JVM.reporters=ganglia,file
NODE-JVM.ganglia.server=localhost
NODE-JVM.ganglia.port=8654
NODE-JVM.ganglia.period=60
NODE-JVM.file.filename=/Users/hyunsik/node-jvm-metrics.log
NODE-JVM.file.period=60
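
To check which reporters a configuration enables for each metric group, the file can be read with a few lines of Python. This is a minimal sketch that only mirrors the key layout shown in the example above (`<GROUP>.reporters`, `<GROUP>.<reporter>.<key>`); it is not how Tajo itself loads the file, and the function names `parse_metrics_properties` and `reporters_for` are hypothetical.

```python
from collections import defaultdict


def parse_metrics_properties(text: str) -> dict:
    """Group `GROUP.rest=value` lines by the part before the first dot.

    Top-level `reporter.<name>=<class>` lines end up under the
    pseudo-group "reporter" by the same rule.
    """
    groups = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        group, _, rest = key.strip().partition(".")
        groups[group][rest] = value.strip()
    return dict(groups)


def reporters_for(config: dict, group: str) -> list:
    """Return the reporter names listed in `<group>.reporters`."""
    names = config.get(group, {}).get("reporters", "")
    return [name.strip() for name in names.split(",") if name.strip()]


# A shortened sample built from the example configuration.
sample = """\
reporter.file=org.apache.tajo.util.metrics.reporter.MetricsFileScheduledReporter
MASTER.reporters=ganglia,file
MASTER.ganglia.port=8649
MASTER-JVM.reporters=file
"""

config = parse_metrics_properties(sample)
print(reporters_for(config, "MASTER"))   # prints ['ganglia', 'file']
print(config["MASTER"]["ganglia.port"])  # prints 8649
```

Splitting each key at the first dot keeps hyphenated group names such as MASTER-JVM intact.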