Status
Motivation
- Ability to measure the time taken for request/response exchanges with the client, and their frequency over various time intervals, at the service level.
- Ability to measure the time and frequency of requests/responses to the backend service component.
- The number of open connections to the backend service component.
- API to add/extend the metrics capabilities.
- Ability to report the metrics to reporting engines like Graphite and Ambari Metrics Service.
Design
public interface MetricsService extends Service {
  <T> T getInstrumented(T instanceClass);
  <T> T getInstrumented(Class<T> clazz);
}
Plugging in a new Instrumented Class
public interface InstrumentationProvider<T> {
  T getInstrumented(MetricsContext metricsContext);
  T getInstrumented(T instanceClass, MetricsContext metricsContext);
}
and
public interface InstrumentationProviderDescriptor {
  Map<Class<?>, InstrumentationProvider> providesInstrumentation();
}
The service loader mechanism looks up InstrumentationProviderDescriptor implementations, so each implementation must ship a provider-configuration file in the resource directory META-INF/services.
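The following is a minimal sketch of what a pluggable provider and its descriptor might look like. The two interfaces are re-declared inside a wrapper class only so that the example compiles on its own. MetricsContext is an empty stand-in, and Greeter, PlainGreeter, CountingGreeter, GreeterProvider, GreeterDescriptor and collect are hypothetical names invented for illustration. They are not part of the gateway.

```java
import java.util.HashMap;
import java.util.Map;

final class ProviderSketch {

  // Stand-in for the real MetricsContext, whose contents are not shown here.
  interface MetricsContext {}

  interface InstrumentationProvider<T> {
    T getInstrumented(MetricsContext metricsContext);
    T getInstrumented(T instanceClass, MetricsContext metricsContext);
  }

  interface InstrumentationProviderDescriptor {
    Map<Class<?>, InstrumentationProvider> providesInstrumentation();
  }

  // Hypothetical type that we want an instrumented version of.
  interface Greeter {
    String greet(String name);
  }

  static final class PlainGreeter implements Greeter {
    public String greet(String name) {
      return "hello " + name;
    }
  }

  // Hypothetical instrumented variant: counts calls, then delegates.
  static final class CountingGreeter implements Greeter {
    private final Greeter delegate;
    int calls;

    CountingGreeter(Greeter delegate) {
      this.delegate = delegate;
    }

    public String greet(String name) {
      calls++;
      return delegate.greet(name);
    }
  }

  static final class GreeterProvider implements InstrumentationProvider<Greeter> {
    public Greeter getInstrumented(MetricsContext metricsContext) {
      return new CountingGreeter(new PlainGreeter());
    }

    public Greeter getInstrumented(Greeter instanceClass, MetricsContext metricsContext) {
      return new CountingGreeter(instanceClass);
    }
  }

  // The descriptor maps each instrumentable type to its provider.
  static final class GreeterDescriptor implements InstrumentationProviderDescriptor {
    public Map<Class<?>, InstrumentationProvider> providesInstrumentation() {
      Map<Class<?>, InstrumentationProvider> providers = new HashMap<>();
      providers.put(Greeter.class, new GreeterProvider());
      return providers;
    }
  }

  // Merges the provider maps of all descriptors. In a real deployment the
  // argument could be ServiceLoader.load(InstrumentationProviderDescriptor.class),
  // since java.util.ServiceLoader is itself an Iterable.
  static Map<Class<?>, InstrumentationProvider> collect(
      Iterable<? extends InstrumentationProviderDescriptor> descriptors) {
    Map<Class<?>, InstrumentationProvider> all = new HashMap<>();
    for (InstrumentationProviderDescriptor descriptor : descriptors) {
      all.putAll(descriptor.providesInstrumentation());
    }
    return all;
  }
}
```

For java.util.ServiceLoader to discover such a descriptor, the provider-configuration file in META-INF/services is named after the fully qualified name of the InstrumentationProviderDescriptor interface and lists the fully qualified name of the implementing class on its own line.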
Reporting
public interface MetricsReporter {
  String getName();
  void init(GatewayConfig config) throws MetricsReporterException;
  void start(MetricsContext metricsContext) throws MetricsReporterException;
  void stop() throws MetricsReporterException;
  boolean isEnabled();
}
Plugging in a new Reporter
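As a rough illustration of the MetricsReporter contract, the sketch below implements a reporter that only records its lifecycle calls. GatewayConfig, MetricsContext and MetricsReporterException are empty stand-ins declared so the example compiles on its own. LifecycleLoggingReporter and runLifecycle are hypothetical names, and the call order shown (init, then start if enabled, then stop) is an assumption about how the gateway drives a reporter.

```java
import java.util.ArrayList;
import java.util.List;

final class ReporterSketch {

  // Stand-ins for gateway types that are not shown in this document.
  interface GatewayConfig {}

  interface MetricsContext {}

  static class MetricsReporterException extends Exception {
    MetricsReporterException(String message) {
      super(message);
    }
  }

  interface MetricsReporter {
    String getName();
    void init(GatewayConfig config) throws MetricsReporterException;
    void start(MetricsContext metricsContext) throws MetricsReporterException;
    void stop() throws MetricsReporterException;
    boolean isEnabled();
  }

  // Hypothetical reporter that records which lifecycle methods were called.
  static final class LifecycleLoggingReporter implements MetricsReporter {
    final List<String> events = new ArrayList<>();
    private GatewayConfig config;
    private boolean running;

    public String getName() {
      return "lifecycle-logging-reporter";
    }

    public void init(GatewayConfig config) throws MetricsReporterException {
      if (config == null) {
        throw new MetricsReporterException("config is required");
      }
      this.config = config;
      events.add("init");
    }

    public void start(MetricsContext metricsContext) throws MetricsReporterException {
      if (config == null) {
        throw new MetricsReporterException("init must be called before start");
      }
      running = true;
      events.add("start");
    }

    public void stop() throws MetricsReporterException {
      if (!running) {
        throw new MetricsReporterException("reporter is not running");
      }
      running = false;
      events.add("stop");
    }

    // A real reporter would consult the GatewayConfig passed to init here.
    public boolean isEnabled() {
      return true;
    }
  }

  // Drives the assumed lifecycle and returns the recorded events.
  static List<String> runLifecycle(LifecycleLoggingReporter reporter) {
    try {
      reporter.init(new GatewayConfig() {});
      if (reporter.isEnabled()) {
        reporter.start(new MetricsContext() {});
        reporter.stop();
      }
    } catch (MetricsReporterException e) {
      throw new IllegalStateException(e);
    }
    return reporter.events;
  }
}
```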
Initial metrics and naming
The initial instrumentation implementation leverages the Dropwizard Metrics API, together with some built-in functionality, to gather metrics both in the gateway's Dispatch code and for incoming requests from the client. The Dispatch code uses Apache's httpclient library, so when metrics are enabled the regular httpclient can be switched out for an instrumented one. This gathers timing, count and connection information for all dispatch requests.
Similarly, the GatewayFilter class has an instrumented version that collects data on incoming requests from clients.
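The idea of switching a regular component for an instrumented one can be sketched with a plain decorator. This is not the instrumented httpclient or the instrumented GatewayFilter itself. Dispatcher, PlainDispatcher, TimedDispatcher and create are hypothetical names, and the counters are simple AtomicLong fields rather than metrics-library timers.

```java
import java.util.concurrent.atomic.AtomicLong;

final class InstrumentedSwapSketch {

  // Hypothetical stand-in for a component that issues backend requests.
  interface Dispatcher {
    String dispatch(String uri);
  }

  static final class PlainDispatcher implements Dispatcher {
    public String dispatch(String uri) {
      return "200 " + uri;
    }
  }

  // Decorator that records a call count and the total elapsed time.
  static final class TimedDispatcher implements Dispatcher {
    private final Dispatcher delegate;
    final AtomicLong count = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    TimedDispatcher(Dispatcher delegate) {
      this.delegate = delegate;
    }

    public String dispatch(String uri) {
      long start = System.nanoTime();
      try {
        return delegate.dispatch(uri);
      } finally {
        // Recorded even when the delegate throws.
        count.incrementAndGet();
        totalNanos.addAndGet(System.nanoTime() - start);
      }
    }
  }

  // Returns the instrumented variant only when metrics are enabled.
  static Dispatcher create(boolean metricsEnabled) {
    Dispatcher plain = new PlainDispatcher();
    return metricsEnabled ? new TimedDispatcher(plain) : plain;
  }
}
```

Because callers depend only on the interface, the swap is invisible to them, and the uninstrumented path carries no measurement overhead when metrics are disabled.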
The names of these metrics are somewhat elaborate so that they provide enough namespacing. Metrics for client requests coming into the gateway all start with 'client', and metrics for dispatch requests to a backing service all start with 'service'. The prefix is followed by the request URI and then the request method (GET, PUT, POST, etc.).
Below is a screenshot of some of the attributes and names provided, as seen in a JMX viewer.
In the example above, a curl request like this:
curl -iku guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=GETHOMEDIRECTORY'
results in two MBeans:
- client./gateway/sandbox/webhdfs/v1/.GET-requests
- service./gateway/sandbox/webhdfs/v1/.GET-requests
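The naming scheme can be sketched as a small helper. The format (prefix, request URI, method, then a "-requests" suffix, with the query string dropped) is inferred from the two MBean names above. MetricNameSketch and requestMetricName are hypothetical names, not gateway code.

```java
final class MetricNameSketch {

  // Builds "<prefix>.<request URI>.<METHOD>-requests".
  // Any query string is removed, matching the MBean names in the example.
  static String requestMetricName(String prefix, String requestUri, String httpMethod) {
    int queryStart = requestUri.indexOf('?');
    String path = queryStart < 0 ? requestUri : requestUri.substring(0, queryStart);
    return prefix + "." + path + "." + httpMethod + "-requests";
  }
}
```

For instance, calling requestMetricName("client", "/gateway/sandbox/webhdfs/v1/?op=GETHOMEDIRECTORY", "GET") yields client./gateway/sandbox/webhdfs/v1/.GET-requests.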
The attributes provided are:
Mean
StdDev
DurationUnit
50thPercentile
75thPercentile
98thPercentile
99thPercentile
999thPercentile
Min
Max
FifteenMinuteRate
FiveMinuteRate
OneMinuteRate
MeanRate
RateUnit
Count
Graphite+Grafana
Config
gateway.metrics.enabled
Turns metrics collection on or off. The default is 'true'.
gateway.jmx.metrics.reporting.enabled
Turns the JMX reporter on or off. The default is 'true'.
gateway.graphite.metrics.reporting.enabled
Turns the Graphite reporter on or off. The default is 'false'.
gateway.graphite.metrics.reporting.host
gateway.graphite.metrics.reporting.port
gateway.graphite.metrics.reporting.frequency
The three properties above set the host, port and reporting frequency (in seconds) for the Graphite reporter.
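To make the defaults concrete, the sketch below reads these properties from a java.util.Properties object. How the gateway actually loads its configuration is not covered here, so the Properties model and the names MetricsConfigSketch, flag, required and describeGraphite are assumptions made for illustration. No defaults are given above for the Graphite host, port or frequency, so the sketch treats them as required.

```java
import java.util.Properties;

final class MetricsConfigSketch {

  // Reads a boolean property, falling back to the documented default.
  static boolean flag(Properties props, String key, boolean defaultValue) {
    return Boolean.parseBoolean(props.getProperty(key, Boolean.toString(defaultValue)));
  }

  static boolean metricsEnabled(Properties props) {
    return flag(props, "gateway.metrics.enabled", true);
  }

  static boolean jmxReportingEnabled(Properties props) {
    return flag(props, "gateway.jmx.metrics.reporting.enabled", true);
  }

  static boolean graphiteReportingEnabled(Properties props) {
    return flag(props, "gateway.graphite.metrics.reporting.enabled", false);
  }

  // Summarizes the Graphite target; fails fast if a value is missing.
  static String describeGraphite(Properties props) {
    String host = required(props, "gateway.graphite.metrics.reporting.host");
    int port = Integer.parseInt(required(props, "gateway.graphite.metrics.reporting.port"));
    long frequencySeconds =
        Long.parseLong(required(props, "gateway.graphite.metrics.reporting.frequency"));
    return host + ":" + port + " every " + frequencySeconds + "s";
  }

  private static String required(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("missing property: " + key);
    }
    return value.trim();
  }
}
```

With no properties set, metrics collection and the JMX reporter are on and the Graphite reporter is off, matching the defaults listed above.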