Logging Guidelines for Cassandra 2.2+

CASSANDRA-10241 introduced debug.log alongside the traditional system.log in Cassandra 2.2+. The system.log file continues to record the most relevant cluster status information (INFO, WARN and ERROR levels). The new debug.log records DEBUG messages in addition to those levels, covering information that helps during troubleshooting, such as intermediate protocol steps and more detailed operational information. The examples below are general guidelines to help developers decide which logging level a message belongs to.
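As a rough mental model, the split behaves like two log files with different minimum levels. The Java sketch below is a simplified model, not Cassandra's logback configuration. It assumes that the Cassandra loggers are set to DEBUG, so TRACE is dropped unless explicitly enabled, that system.log only accepts INFO and above, and that debug.log accepts everything that is logged. The `Level` enum, `LogRouting` class and `destinations` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the system.log / debug.log split; not Cassandra code.
// Constants are ordered from least to most severe, so compareTo reflects severity.
enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

final class LogRouting {
    // Assumption: loggers default to DEBUG, so TRACE messages are dropped.
    static final Level DEFAULT_LOGGER_LEVEL = Level.DEBUG;
    // Assumption: system.log only accepts INFO and above.
    static final Level SYSTEM_LOG_THRESHOLD = Level.INFO;

    /** Returns the files a message at {@code level} would be written to. */
    static List<String> destinations(Level level, Level loggerLevel) {
        List<String> files = new ArrayList<>();
        if (level.compareTo(loggerLevel) < 0) {
            return files; // below the logger level: not written anywhere
        }
        if (level.compareTo(SYSTEM_LOG_THRESHOLD) >= 0) {
            files.add("system.log");
        }
        files.add("debug.log"); // debug.log receives every message that is logged
        return files;
    }

    static List<String> destinations(Level level) {
        return destinations(level, DEFAULT_LOGGER_LEVEL);
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " -> " + destinations(level));
        }
    }
}
```

Under these assumptions, `main` shows TRACE reaching neither file, DEBUG reaching only debug.log, and INFO, WARN and ERROR reaching both files.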

INFO: General cluster status and an overview of operations. At this level, a beginner user or operator should be able to understand most messages. Examples:

  • Node startup and shutdown information
  • Overview of user- or system-triggered operations
    • Repair start and finish state
    • Cleanup start and finish state
    • Bootstrap start and finish state
    • Index rebuild start and finish state

DEBUG: Low-frequency state changes or message passing. Logs from non-critical code paths covering operation details, performance measurements or general troubleshooting information. At this level, an advanced operator or system developer should have enough information to detect and investigate erroneous conditions or performance bottlenecks, extract reproduction steps, or inspect advanced operational information. Examples:

  • SSTable flushing
  • Compactions in progress
  • Gossip or schema state changes
  • Intermediate steps of operations
    • Repair steps
    • Stream session message exchanges
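For a single operation, the INFO/DEBUG split might look like the following sketch: start and finish states at INFO, intermediate steps at DEBUG. Cassandra logs through SLF4J; to keep the example dependency-free, `RecordingLog` is a hypothetical stand-in whose `info` and `debug` methods are named after SLF4J's. `RepairSketch` and its messages are illustrative only, not Cassandra's actual repair code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an SLF4J-style logger that records lines in memory.
final class RecordingLog {
    final List<String> lines = new ArrayList<>();

    void info(String msg) { lines.add("INFO " + msg); }

    void debug(String msg) { lines.add("DEBUG " + msg); }
}

// Hypothetical operation: start/finish state at INFO, intermediate steps at DEBUG.
final class RepairSketch {
    static void run(RecordingLog log, String keyspace, List<String> tables) {
        log.info("Starting repair of keyspace " + keyspace);              // INFO: start state
        for (String table : tables) {
            log.debug("Validating " + keyspace + "." + table);             // DEBUG: intermediate step
            log.debug("Streaming differences for " + keyspace + "." + table);
        }
        log.info("Finished repair of keyspace " + keyspace);              // INFO: finish state
    }
}
```

With the file split described above, an operator reading system.log would see only the two INFO lines, while debug.log would also show the per-table steps. In real SLF4J code, `{}` placeholders such as `logger.info("Starting repair of keyspace {}", keyspace)` are the idiomatic way to pass arguments.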

WARN: Use of suboptimal parameters or deprecated options, detection of degraded performance, capability limitations or missing dependencies, and general optimization tips. At this level, an operator should be able to detect an imminent error condition, the use of suboptimal parameters or non-critical configuration errors. Examples:

  • Use of chunk_length_in_kb property instead of chunk_length
  • Warnings about GC pauses above the configured threshold
  • OpenJDK not recommended notice
  • Small SSTable size warning (testing done for CASSANDRA-5727 indicates that performance improves with SSTable sizes up to 160MB)
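A configuration check that reports suboptimal settings is a typical source of WARN messages. The sketch below is hypothetical: `ConfigWarnings`, its parameters and the option names used in the test are illustrative. It returns message strings that a caller would pass to a WARN-level logger call rather than logging directly. The 160MB cut-off is taken from the CASSANDRA-5727 note above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical startup/config check whose results would be logged at WARN.
final class ConfigWarnings {
    // Sizes below this are flagged, per the CASSANDRA-5727 testing note.
    static final long SSTABLE_SIZE_WARN_BELOW_MB = 160;

    /**
     * Returns WARN-level messages; an empty list means nothing to report.
     * Each map entry pairs a deprecated option name with its replacement.
     */
    static List<String> check(long sstableSizeMb, Map<String, String> deprecatedOptionsInUse) {
        List<String> warnings = new ArrayList<>();
        if (sstableSizeMb < SSTABLE_SIZE_WARN_BELOW_MB) {
            warnings.add("SSTable size of " + sstableSizeMb + "MB is below "
                    + SSTABLE_SIZE_WARN_BELOW_MB + "MB; performance may be suboptimal");
        }
        for (Map.Entry<String, String> e : deprecatedOptionsInUse.entrySet()) {
            warnings.add("Option " + e.getKey() + " is deprecated; use " + e.getValue() + " instead");
        }
        return warnings;
    }
}
```

Neither condition stops the node, which is what keeps these messages at WARN rather than ERROR.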

ERROR: An unexpected error condition has occurred. Non-critical, transient or recovered errors may be reported at DEBUG level instead, so that they don't pollute system.log. Examples:

  • Critical errors in general (corrupted disk, read errors, etc.)
  • Leak detection
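Whether a failure belongs at ERROR or DEBUG often depends on whether it was recovered. In the hypothetical retry helper below, each failed attempt that will be retried is recorded at DEBUG, and only a failure that survives every retry is recorded at ERROR. `RetrySketch` and its in-memory `log` list are illustrative stand-ins, not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical retry helper illustrating the ERROR vs DEBUG choice for failures.
final class RetrySketch {
    final List<String> log = new ArrayList<>();

    /**
     * Runs {@code attempt} up to {@code maxAttempts} times (expected to be >= 1)
     * and returns true as soon as one attempt succeeds.
     */
    boolean runWithRetries(String what, BooleanSupplier attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                // Transient failure that will be retried: DEBUG keeps it out of system.log.
                log.add("DEBUG " + what + " failed on attempt " + i + ", retrying");
            }
        }
        // Retries exhausted: the error was not recovered, so report it at ERROR.
        log.add("ERROR " + what + " failed after " + maxAttempts + " attempts");
        return false;
    }
}
```

If the second attempt succeeds, only a single DEBUG line is produced and system.log stays clean.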

TRACE: High-frequency state changes or message passing, critical-path logs, and testing or development information. This level is disabled by default, so anything that does not fit the previous levels, as well as highly verbose output, must be kept at TRACE level.
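Because TRACE messages often sit on hot paths, building them should cost almost nothing while TRACE is disabled. SLF4J's `{}` placeholders already defer string formatting, but argument expressions are still evaluated eagerly, so an expensive argument is usually guarded with `isTraceEnabled()`. The sketch below shows that guard. `TraceLog` is a hypothetical stand-in for an SLF4J logger, and `HotPathSketch.describe` stands in for any costly message formatting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an SLF4J-style logger; TRACE is off unless enabled.
final class TraceLog {
    private final boolean traceEnabled;
    final List<String> lines = new ArrayList<>();

    TraceLog(boolean traceEnabled) { this.traceEnabled = traceEnabled; }

    boolean isTraceEnabled() { return traceEnabled; }

    void trace(String msg) {
        if (traceEnabled) {
            lines.add("TRACE " + msg);
        }
    }
}

final class HotPathSketch {
    static int describeCalls = 0; // counts how often the expensive message is built

    // Hypothetical expensive formatting, e.g. hex-dumping a whole payload.
    static String describe(byte[] payload) {
        describeCalls++;
        StringBuilder sb = new StringBuilder();
        for (byte b : payload) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Called once per message: build the TRACE text only when it will be used.
    static void handle(TraceLog log, byte[] payload) {
        if (log.isTraceEnabled()) {
            log.trace("Received payload " + describe(payload));
        }
        // ... actual message handling would go here ...
    }
}
```

With TRACE disabled, as it is by default, `describe` is never called, so the hot path pays only for a boolean check.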