Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

FLIP-384 adds the TraceReporter interface. However, with FLIP-384 alone, Log4jTraceReporter would be the only available implementation of that interface, which is not very helpful.

In this FLIP I’m proposing to contribute both a MetricReporter and a TraceReporter implementation based on OpenTelemetry.

Public Interfaces / Proposed Changes

This FLIP proposes to add two factories, OpenTelemetryMetricReporterFactory and OpenTelemetryTraceReporterFactory, both available in the newly added plugin flink-metrics/flink-metrics-otel (structured like pre-existing MetricReporters such as flink-metrics-jmx).

The only custom configuration will be:

  • exporter.endpoint - URL of the OpenTelemetry endpoint

  • exporter.timeout - timeout when reporting to the endpoint (Default value

Both the metric and trace reporters will also support scope.variables.additional.

Example configuration:

metrics.reporters: otel
metrics.reporter.otel.factory.class: org.apache.flink.common.metrics.OpenTelemetryMetricReporterFactory
metrics.reporter.otel.exporter.endpoint: http://127.0.0.1:1337
metrics.reporter.otel.scope.variables.additional: region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1

traces.reporters: otel
traces.reporter.otel.factory.class: org.apache.flink.common.metrics.OpenTelemetryTraceReporterFactory
traces.reporter.otel.exporter.endpoint: http://127.0.0.1:1337
traces.reporter.otel.scope.variables.additional: region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1
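
The scope.variables.additional values in the example configuration are comma-separated lists of key:value pairs. As a rough illustration of how such a string maps to individual variables, here is a minimal Java sketch. The class name AdditionalVariablesSketch and the parse method are hypothetical and not part of this proposal or of Flink; the actual reporters would presumably rely on Flink's own configuration parsing rather than anything like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Flink or this FLIP.
class AdditionalVariablesSketch {

    // Parses "k1:v1,k2:v2" into an insertion-ordered map.
    // Each entry is split on its first ':' only, so values may contain ':'.
    static Map<String, String> parse(String raw) {
        Map<String, String> result = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String entry : raw.split(",")) {
            int sep = entry.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("Expected key:value but got: " + entry);
            }
            result.put(entry.substring(0, sep).trim(), entry.substring(sep + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Same value as in the example configuration above.
        Map<String, String> vars =
                parse("region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1");
        vars.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Under this reading, the example configuration would attach three additional variables (region, environment, flink_runtime) to every reported metric and trace.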

Compatibility, Deprecation, and Migration Plan

There will be no impact on existing users, and no migration is needed.

Test Plan

On top of automated tests, this feature is already used and tested inside Confluent. Before it is committed, the final accepted version will be tested inside Confluent again.

Rejected Alternatives

None