
 

Status


Current state: Proposed

 

 

Motivation

The JIRA and the related discussions capture the various requirements for gathering metrics during the gateway’s processing pipeline and request/response flow. The initial attempt to satisfy some of these requirements is to provide a very simple abstraction that hides the details of the dropwizard metrics library and exposes some of the basic and most requested metrics. As a side note, not all of the requirements in KNOX-643 are going to be covered by this KIP and the associated work. The following requirements are going to be addressed:
  1. Ability to get the time taken for requests/responses coming from the client, and their frequency at various time intervals, at the service level.
  2. Ability to get the time and frequency of requests/responses to the backend service component.
  3. The number of open connections to the backend service component.
  4. API to add/extend the metrics capabilities.
  5. Ability to report the metrics to reporting engines like Graphite and Ambari Metrics Service.


Design

The dropwizard metrics library was selected after some comparative analysis of similar libraries that allow for instrumenting code so that metrics can be gathered at runtime. This document leaves out that analysis and asserts only the result: the dropwizard metrics library had essentially the easiest-to-use API that provided the functionality we were looking for, had the appropriate licensing, and was the most frequently used in other Apache projects.
The desire, however, is as always to provide a layer of abstraction that leaves the possibility open for future changes or the adoption of other libraries. The pattern used in the API design is to provide instrumented versions of a class or an interface, without yet exposing the detailed measuring instruments such as gauges, meters, etc.
The MetricsService API therefore looks like this:

 

public interface MetricsService extends Service {

  <T> T getInstrumented(T instanceClass);

  <T> T getInstrumented(Class<T> clazz);

}

 

The MetricsService is implemented as a Gateway Service, the details of which can be found in the dev guide. It is therefore accessible to all Topology deployments, so metrics can be collected per topology and also aggregated at the gateway level.
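
As a rough illustration of the caller's side of this API, the sketch below uses a stand-in copy of the interface (without "extends Service", so that it compiles on its own) and a toy implementation. ToyMetricsService, CountingRunnable, and the choice to return null for unsupported types are assumptions made for this example only; they are not part of Knox and say nothing about how the real service is implemented.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in copy of the interface above, minus "extends Service".
interface MetricsService {
  <T> T getInstrumented(T instanceClass);
  <T> T getInstrumented(Class<T> clazz);
}

// Hypothetical instrumented wrapper: counts invocations of run().
class CountingRunnable implements Runnable {
  final AtomicLong calls = new AtomicLong();
  private final Runnable delegate;

  CountingRunnable(Runnable delegate) {
    this.delegate = delegate;
  }

  @Override
  public void run() {
    calls.incrementAndGet();
    delegate.run();
  }
}

// Toy implementation that only knows how to instrument Runnable.
class ToyMetricsService implements MetricsService {
  @Override
  @SuppressWarnings("unchecked")
  public <T> T getInstrumented(T instanceClass) {
    if (instanceClass instanceof Runnable) {
      // Only safe when the caller's T is the Runnable interface itself.
      return (T) new CountingRunnable((Runnable) instanceClass);
    }
    return null; // assumption: null means "no instrumented version available"
  }

  @Override
  public <T> T getInstrumented(Class<T> clazz) {
    if (Runnable.class.equals(clazz)) {
      // Build a fresh instrumented instance around a no-op delegate.
      return clazz.cast(new CountingRunnable(() -> { }));
    }
    return null;
  }
}
```

A caller would obtain the wrapper with something like "Runnable counted = metricsService.getInstrumented(original);" and then use counted wherever original was used, without touching gauges or meters directly.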

Plugging in a new Instrumented Class


The standard ServiceLoader mechanism is used to find and load classes that can provide instrumented classes. The two interfaces involved are:

 
public interface InstrumentationProvider<T> {

  T getInstrumented(MetricsContext metricsContext);

  T getInstrumented(T instanceClass, MetricsContext metricsContext);

}

 

and 

 

public interface InstrumentationProviderDescriptor {

  Map<Class<?>, InstrumentationProvider> providesInstrumentation();
}

 

 

The service loader mechanism looks up InstrumentationProviderDescriptor classes, so each implementation needs a provider-configuration file in the resource directory META-INF/services.
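
The following is a minimal sketch of what a descriptor and the lookup could look like. The two interfaces are copied from above; MetricsContext is declared as an empty stand-in because its shape is not given in this document, and CountingRunnableProvider, ExampleDescriptor, ProviderLookup and the com.example package are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicInteger;

// Empty stand-in; the real MetricsContext is not described in this document.
interface MetricsContext { }

interface InstrumentationProvider<T> {
  T getInstrumented(MetricsContext metricsContext);
  T getInstrumented(T instanceClass, MetricsContext metricsContext);
}

interface InstrumentationProviderDescriptor {
  Map<Class<?>, InstrumentationProvider> providesInstrumentation();
}

// Hypothetical provider: wraps a Runnable and counts its invocations.
class CountingRunnableProvider implements InstrumentationProvider<Runnable> {
  final AtomicInteger calls = new AtomicInteger();

  @Override
  public Runnable getInstrumented(MetricsContext metricsContext) {
    return getInstrumented(() -> { }, metricsContext);
  }

  @Override
  public Runnable getInstrumented(Runnable delegate, MetricsContext metricsContext) {
    return () -> {
      calls.incrementAndGet();
      delegate.run();
    };
  }
}

// Hypothetical descriptor. To be loadable by ServiceLoader, a real one must be
// a public class with a public no-argument constructor, and be listed in
//   META-INF/services/<fully qualified name of InstrumentationProviderDescriptor>
// with one line holding its own class name, e.g. com.example.ExampleDescriptor.
class ExampleDescriptor implements InstrumentationProviderDescriptor {
  @Override
  public Map<Class<?>, InstrumentationProvider> providesInstrumentation() {
    Map<Class<?>, InstrumentationProvider> providers = new HashMap<>();
    providers.put(Runnable.class, new CountingRunnableProvider());
    return providers;
  }
}

// Collects the providers from every descriptor that ServiceLoader can find.
class ProviderLookup {
  static Map<Class<?>, InstrumentationProvider> loadAll() {
    Map<Class<?>, InstrumentationProvider> all = new HashMap<>();
    for (InstrumentationProviderDescriptor descriptor
        : ServiceLoader.load(InstrumentationProviderDescriptor.class)) {
      all.putAll(descriptor.providesInstrumentation());
    }
    return all;
  }
}
```

With no provider-configuration file on the classpath, ProviderLookup.loadAll() simply returns an empty map; once the file is present, each listed descriptor contributes its providers keyed by the class they instrument.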



Reporting


 
public interface MetricsReporter {

  String getName();

  void init(GatewayConfig config) throws MetricsReporterException;

  void start(MetricsContext metricsContext) throws MetricsReporterException;

  void stop() throws MetricsReporterException;

  boolean isEnabled();
}


Plugging in a new Reporter


The ServiceLoader pattern is used again here, so all you need to do to add a new reporter is implement the MetricsReporter interface and provide a provider-configuration file in the resource directory META-INF/services.

Since the GatewayConfig is passed to the reporter instance on init, any configuration that needs to be provided to the reporter needs to be wired up through the GatewayConfig interface. This essentially means that configuration can be provided in the gateway-site.xml file.
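
To make the reporter lifecycle (init, then start, then stop) a little more concrete, here is a minimal sketch of an implementation. GatewayConfig, MetricsContext and MetricsReporterException are declared as bare stand-ins so the example compiles on its own, and LoggingMetricsReporter with its isRunning() helper is a hypothetical class invented for this example. A real reporter would read its settings from GatewayConfig rather than enabling itself unconditionally.

```java
// Bare stand-ins; the real Knox types carry more than is shown here.
interface GatewayConfig { }

interface MetricsContext { }

class MetricsReporterException extends Exception {
  MetricsReporterException(String message) {
    super(message);
  }
}

// Copied from the interface above.
interface MetricsReporter {
  String getName();
  void init(GatewayConfig config) throws MetricsReporterException;
  void start(MetricsContext metricsContext) throws MetricsReporterException;
  void stop() throws MetricsReporterException;
  boolean isEnabled();
}

// Hypothetical reporter that only tracks its own lifecycle state.
class LoggingMetricsReporter implements MetricsReporter {
  private boolean enabled;
  private boolean running;

  @Override
  public String getName() {
    return "logging-reporter";
  }

  @Override
  public void init(GatewayConfig config) throws MetricsReporterException {
    if (config == null) {
      throw new MetricsReporterException("GatewayConfig is required");
    }
    // Assumption: always enabled. A real reporter would look up an
    // "enabled" setting exposed through GatewayConfig.
    enabled = true;
  }

  @Override
  public void start(MetricsContext metricsContext) throws MetricsReporterException {
    if (!enabled) {
      throw new MetricsReporterException(getName() + " is not enabled");
    }
    running = true; // a real reporter would begin publishing metrics here
  }

  @Override
  public void stop() {
    running = false;
  }

  @Override
  public boolean isEnabled() {
    return enabled;
  }

  boolean isRunning() {
    return running;
  }
}
```

As with instrumentation providers, a real reporter class would also need to be public, have a public no-argument constructor, and be listed in a provider-configuration file named after the MetricsReporter interface.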

Graphite+Grafana


Config


The initial configuration is mainly for turning on or off the metrics collection and then enabling reporters with their required config. The two initial reporters implemented are JMX and Graphite.


gateway.metrics.enabled 

Turns on or off the metrics, default is 'true'
 

gateway.jmx.metrics.reporting.enabled

Turns on or off the jmx reporter, default is 'true'


gateway.graphite.metrics.reporting.enabled

Turns on or off the graphite reporter, default is 'false'

gateway.graphite.metrics.reporting.host
gateway.graphite.metrics.reporting.port
gateway.graphite.metrics.reporting.frequency

The above are the host, port and frequency of reporting (in seconds) parameters for the graphite reporter.
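
For illustration, the properties above could be set in gateway-site.xml roughly as follows. The host, port and frequency values are placeholders chosen for this example (2003 is the port Graphite commonly uses for its plaintext protocol); adjust them to the actual Graphite installation.

```xml
<configuration>
  <property>
    <name>gateway.metrics.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>gateway.jmx.metrics.reporting.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>gateway.graphite.metrics.reporting.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- placeholder host -->
    <name>gateway.graphite.metrics.reporting.host</name>
    <value>localhost</value>
  </property>
  <property>
    <!-- placeholder port -->
    <name>gateway.graphite.metrics.reporting.port</name>
    <value>2003</value>
  </property>
  <property>
    <!-- placeholder reporting interval, in seconds -->
    <name>gateway.graphite.metrics.reporting.frequency</name>
    <value>10</value>
  </property>
</configuration>
```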

 




Future work

From the list of requirements, one of the glaring holes is getting more metrics out of Knox’s Shiro/LDAP provider. I believe this requirement comes more from a debugging mindset when faced with issues in the field, but it may have broader appeal, so it needs some validation.
The other main task, possibly a near-future item, is to provide additional reporters. Of specific interest would be a reporter that sends data to the Ambari Metrics Service. This would provide a convenient way to view the metrics in a Hadoop deployment that has Ambari available.