Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Motivation
FLIP-384 adds the TraceReporter interface. However, with FLIP-384 alone, Log4jTraceReporter would be the only available implementation of that interface, which is not very helpful.
In this FLIP I’m proposing to contribute both MetricReporter and TraceReporter implementations using OpenTelemetry.
Public Interfaces / Proposed Changes
This FLIP proposes adding the OpenTelemetryMetricReporterFactory and OpenTelemetryTraceReporterFactory factories. Both would be available in a newly added plugin, flink-metrics/flink-metrics-otel (structured like pre-existing MetricReporters such as flink-metrics-jmx).
The only custom configuration will be:
exporter.endpoint - URL of the OpenTelemetry endpoint
exporter.timeout - timeout when reporting to the endpoint (Default value
Both metrics and traces reporters will also support scope.variables.additional
Example configuration:
metrics.reporters: otel
metrics.reporter.otel.factory.class: org.apache.flink.common.metrics.OpenTelemetryMetricReporterFactory
metrics.reporter.otel.exporter.endpoint: http://127.0.0.1:1337
metrics.reporter.otel.scope.variables.additional: region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1
traces.reporters: otel
traces.reporter.otel.factory.class: org.apache.flink.common.metrics.OpenTelemetryTraceReporterFactory
traces.reporter.otel.exporter.endpoint: http://127.0.0.1:1337
traces.reporter.otel.scope.variables.additional: region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1
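To illustrate how the example configuration reads, here is a minimal Python sketch that groups the flat keys by reporter name and splits the scope.variables.additional value into key/value pairs. It is not Flink's parsing code (Flink handles this in Java through its own configuration classes). The function names reporter_options and parse_additional_variables are hypothetical, and the comma-separated key:value format is assumed from the example above.

```python
def reporter_options(config: dict, prefix: str, name: str) -> dict:
    """Collect the options of one named reporter, with the common prefix removed.

    Hypothetical helper: `prefix` is e.g. "metrics.reporter" or "traces.reporter".
    """
    full_prefix = f"{prefix}.{name}."
    return {
        key[len(full_prefix):]: value
        for key, value in config.items()
        if key.startswith(full_prefix)
    }


def parse_additional_variables(raw: str) -> dict:
    """Parse 'key:value,key:value' into a dict (format assumed from the example)."""
    variables = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled commas
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = entry.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected 'key:value', got {entry!r}")
        variables[key.strip()] = value.strip()
    return variables


# Subset of the example configuration, written as a flat key/value mapping.
example_config = {
    "metrics.reporters": "otel",
    "metrics.reporter.otel.exporter.endpoint": "http://127.0.0.1:1337",
    "metrics.reporter.otel.scope.variables.additional":
        "region:eu-west-1,environment:local-pnowojski-test,flink_runtime:1.17.1",
    "traces.reporters": "otel",
    "traces.reporter.otel.exporter.endpoint": "http://127.0.0.1:1337",
}

metric_options = reporter_options(example_config, "metrics.reporter", "otel")
extra_variables = parse_additional_variables(metric_options["scope.variables.additional"])
```

In this sketch, the metrics and traces reporters are configured independently under their own prefixes, even though both are named otel and point at the same endpoint.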
Compatibility, Deprecation, and Migration Plan
There will be no impact on existing users, and no migration is needed.
Test Plan
On top of automated tests, this feature is already used and tested inside Confluent. Before committing, the final accepted version will be tested inside Confluent again.
Rejected Alternatives
None