...

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

  1. Support for tracing. Traces, metrics, and logs are often called the three pillars of observability. Doris currently does not collect trace data, which makes it difficult to locate slow queries and troubleshoot system bottlenecks. With OpenTelemetry, trace data can be collected to monitor how each request executes, greatly improving the observability of the system.
  2. Doris currently does not follow a uniform open standard for telemetry data collection, which makes it hard to export telemetry data to third-party systems for analysis. OpenTelemetry defines a set of open-source standard semantic conventions, provides vendor-neutral instrumentation libraries for multiple programming languages, and makes it easy to export telemetry data to different backends (including Zipkin, Jaeger, Prometheus, etc.).
  3. Associate traces, metrics, and logs. The telemetry data that Doris currently collects is not correlated, so it is impossible to jump quickly from one kind of telemetry data to another. By introducing OpenTelemetry, traces, metrics, and logs can be correlated. For example, we can inject trace_id and span_id into metrics through exemplars to correlate traces with metrics, and inject trace_id and span_id into logs to correlate traces with logs, so that all telemetry data related to a problem can be located quickly.
  4. Export profiles. The current query profile is output as plain text, which does not present the time spent in each query stage intuitively, and the profile is not persisted. Attaching the profile to the trace lets the trace carry the profile information and makes slow queries easier to analyze.
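The log correlation described in item 3 can be illustrated with a minimal sketch. It assumes Python's standard `logging` module and generates IDs by hand instead of using the OpenTelemetry SDK. The names `start_span`, `TraceContextFilter`, and `build_logger` are hypothetical and are not part of Doris or OpenTelemetry. The only detail taken from the standards is the ID size used by W3C Trace Context and OpenTelemetry: a 16-byte trace_id and an 8-byte span_id, each hex-encoded. In Doris itself, the IDs would come from the active OpenTelemetry span.

```python
import contextvars
import logging
import secrets

# Hypothetical stand-in for the active span context. A real setup would read
# these IDs from the OpenTelemetry SDK instead of generating them here.
_current_span = contextvars.ContextVar("current_span", default=None)


def start_span():
    """Create and activate a (trace_id, span_id) pair using W3C Trace Context
    sizes: 16 random bytes for trace_id and 8 for span_id, hex-encoded."""
    ids = (secrets.token_hex(16), secrets.token_hex(8))
    _current_span.set(ids)
    return ids


class TraceContextFilter(logging.Filter):
    """Attach the current span's trace_id/span_id to every log record,
    or '-' placeholders when no span is active."""

    def filter(self, record):
        ids = _current_span.get()
        record.trace_id, record.span_id = ids if ids else ("-", "-")
        return True  # never drop records, only enrich them


def build_logger(stream, name="doris.demo"):
    """Hypothetical helper: a logger whose lines start with trace_id/span_id."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(
        logging.Formatter("trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With this setup, every line written while a span is active carries its IDs. A log backend can then join those lines with the trace that has the same trace_id. Putting the enrichment in a filter keeps call sites unchanged: code still calls `logger.info(...)` as before.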

Related Research

1. Telemetry

...