I encountered several issues when implementing the Stream Oscilloscope (StreamScope) runtime facility. Those issues are captured here so we can discuss them and reach consensus on a course of action.
Stream Oscilloscope feature / use summary:
Just to set some context...
- A StreamScope (Stream Oscilloscope) captures tuples. StreamScopes can be enabled/disabled/paused/resumed, they have count and time based captured tuple retention policy, they have count, time, or Predicate based tuple selection controls. By default each streamScope is disabled. A Sample is created for each captured tuple containing the tuple (not copied) and capture timestamps. The control mbean (below) makes the captured samples available as JSON.
- The DevelopmentProvider adds StreamScope peekers to streams (akin to Metrics being added).
- StreamScope's are registered with a new StreamScopeRegistry runtime service. The registration occurs as part of a StreamScope oplet's initialize().
- StreamScopeRegistryMXBean and StreamScopeMXBean control mbeans are defined and registered in the (development provider JMX) ControlService. The registry mbean lazy creates / registers StreamScope mbeans. -Tests for all of the above were added.
Notes for the Console:
- The Console will lookup a stream's StreamScopeMXBean by its streamId via the StreamScopeRegistryMXBean. A servlet/StreamScopeUtil helper is supplied.
- A StreamScope is registered by its output stream's streamId, not the streamId of the "origin stream" it was created to monitor. That's one of the issues alluded to below. The util class forms the streamId.
- StreamScope oplets are added to the graph (all the smarts are in the StreamScope Consumer<T>function). StreamScope is a subclass of Peek. The console needs to adjust its presentation of these oplets to diminish their presence. I suggest not presenting them at all and instead just including a "show tuples" link on a stream's hover popup. (I'd also omit presenting metric oplets and instead add their info to the stream's hover).
Functions can't really use runtime services / ControlService
I started by developing a StreamScope Consumer<T> for use as a seeker oplet function. This function has all of the smarts of the tuple capture code.
The function needs to register at runtime with a “streamId” which, as a result of provider-wide services, needs to include the jobId. It also needs the opletId. These values aren’t known until later in the submission processing (later than when the graph is augmented with StreamScope seekers).
We don’t have a way for functions to access
runtime services (e.g., to get the StreamScopeRegistry), nor job/oplet context.
<dan> Topology.getRuntimeServiceSupplier() provides access to runtime services.
Hence I was forced to create a StreamScope subtype of Peek. It’s initialize() has access to the necessary info.
Is it appropriate to require any such use/needs to subclass an oplet to gain access to this sort of info? Note, ControlService is doc’d as being expected to be usable by applications.
I’ve run across this need in other situations as well. e.g., a function that wanted to create a thread pool / Executor but couldn’t get access to the ThreadFactoryService.
It feels like we want analogous Function level support. e.g., an Initializable interface (with initialize(FunctionContext)) that functions that need this info to gain access to it. Agreed?
ControlService Unregistration woes
Related: JobMXBeans aren't getting unregistered with the net effect of being a memory leak. Right now, StreamScopeRegistry has a similar affliction. TStream.poll() PeriodMXBean also isn't getting unregistered. <dlaboss: looks like PR-153 is addressing the Job bean issue>
Poll also has a bug in that it's not registering the control with a provider-wide control alias. Hence there can be collisions among multiple topologies/jobs in a provider. I don't think it's reasonable to expect TStream.poll().alias(alias) clients to come up with provider-wide alias values.
- TStream.alias() already docs they need to provide unique-to-topology values - is that reasonable / OK?
- poll() needs to be fixed to register the control with a jobId-scoped alias value right?
At a high level I wasn't initially aware that Topology.getRuntimeSourceSupplier() returned a supplier for provider-wide services - i.e. that the ControlService service wasn't topology/job-specific.
- improve that method's javadoc to make this more apparent?
- should there be a TopologyProvider.getRuntimeServicesSupplier()?
Provider-wide services are good for some things (e.g., AppService bean, maybe Job beans) but not particularly scalable / helpful for per-job controls. Maybe just something to keep in mind.
When I started to write some code to unregister the StreamScopeRegistryMBean, I had to store away the controlId returned by ControlService.registerControl() because there’s no ControlService.unregister(type,alias). <dlaboss: looks like PR-153 is adding getControlId()>
Similarly my StreamScopeRegistryBean has to store away the controlId for StreamScopeMXBeans that it registers so that it can unregister them (ultimately via a ServiceContainer.addCleaner() hook. Of course it would have to store something regardless as the cleaner hook only provides <jobId, opletId> and a oplet could have multiple StreamScope instances (one per oport) to unregister. Still, the omission of unregister(type,alias) means that even more info must be stored.
- add ControlService.unregister(type,alias)? if one can lookup by <type,alias> seems like they should be able to unregister with that info. <dlaboss: looks like PR-153 is adding getControlId()>
While there’s a ServiceContainer.addCleaner() hook, it’s not at all apparent to ControlService clients that that exists.
- need to enhance CS doc?
- maybe CS should hook into it and perform its own job and/or oplet unregistrations, in conjunction with with registerControl(jobId, ...), registerControl(jobId, opletId, ...)? See the related "streamId, ControlService registration" topic below.
- some other hook for controls that aren't job specific (e.g., AppService, StreamScopeRegistry)?
- another tid bit... since StreamScopeRegistry bean isn't getting unregistered, an no where to hook that, in conjunction with the JMXControlService, which has an effectively "static" / singleton instance of the platform JMX service", initially my tests were failing even though new provider instances were created, with new JMXControlService instances, because trying to register the new provider's new StreamScopeRegistry bean ultimately encountered the previous instance's registration in the JMX service. I worked around this by creating a singleton StreamScopeRegistry / bean, mirroring the singleton-ness of the underlying platform JMX service.
IotProvider can't support Console
The Console is hardcoded to use JMX APIs. The IotProvider uses JsonControlService, which doesn’t publish to JMX. Note, Android doesn’t support JMX. Probably other “edge device” environments don’t either.
There are a couple of ways to look at this. One way…
- the Console should not directly use JMX, it should use ControlService. It would lookup the ControlService service by being provided the RuntimeServiceSupplier for the provider that it was created for.
- Metrics registration need to occur via ControlService and not directly use JMX.
- DevelopmentProvider should also just use “JsonControlService”
- Is there any real benefit to using JMX / JMXControlService today? If so, maybe there’s value in two different DevelopmentProvider configs? If not, put JMXControlService on the shelf until there’s a need. e.g., maybe in the future, a Console that could manage a remote application (via JMX controls) will be valuable.
- I think ControlService would also have to support some query capability for the Console to be able to query for JobMXBeans present. Or a more general JobRegistry mbean might be needed. Ditto for Metrics… or maybe not if Metrics registered themselves with the streamId of the “origin stream” they were created to instrument.
JSON based control service request handling
IMO json encoded control service request handling is a separate concern from control service mbean registration & lookup. Hence seems like:
- JsonControlService.controlRequest() should be split out to a provider and ControlService impl independent class.
- JsonControlService should be renamed BasicControlService - there’s really nothing JSON specific about it once controlRequest() is removed.
JsonControlService.controlRequest() is too restrictive wrt legal MBean constraints
The StreamScopeRegistryMXBean has a lookup() method that returns a StreamScopeMXBean. JsonControlService doesn’t support that.
org.apache.edgent.execution.services.Controls.isControlServiceMBean(), used by JsonControlService but thankfully not JMXControlService, restricts mbean method parameter and return types to primitive types.
It, like JMX, should support Arrays,List/Set Collections, Maps, and MXBeans. Agreed?
Instrumentation controls can’t register with the id for it’s “origin stream”
I think we want to make instrumentation control services (StreamScope, Metrics?) more transparent to the user in the Console. e.g., the user should be able to hover on an “application stream” connecting two application oplets, even when there are one or more instrumentation oplets between them, and the Console generated hover should provide something like a “Show tuples” link and metrics info for the application stream.
It’s may not be required that that instrumentation oplet controls register themselves with the “streamId” of the “origin stream” they were created for but it strikes me that it might be easier for the Console if the controls were able to register that way.
At instrumentation oplet graph insertion time (via Graph.peekAll()), the instrumentation oplets don’t really have the info they need.
It’s easy enough to add a peekAllFn(BiFunction<Vertex,Integer/*oport*/, Peek>) so the fn creating the instrumentation oplet has access to the “origin stream”. One can get the oplet from the Vertex (getInstance() - rename to getOplet?). However, Oplet doesn’t know its opletId. It’s present in its OpletContext, given to it later in Oplet.initialize(OpletContext).
I’m uneasy with the idea of an instrumentation oplet having to walk back through its source stream to find the “origin stream”. Rather I think it would be better if the instrumentation oplet could, for example, retain a reference to the “origin stream” oplet and then in its start() (initialize() is too soon?), be able to do something like originOplet.getContext() to be able to get the opletId. I can imagine that might imply too many constraints on a provider impl but you get the idea.
What’s the best way to be able to make instrumentation oplet controls register themselves with the origin stream’s id? Or… ???
streamId, ContrtolService registration, mbean object viewing / hierarchy
When I was looking at registrations via JConsole, noticed that StreamScope from multi-jobs are all in a single folder. JMXControlService can’t form a more hierarchical ObjectName jobId=J, opletId=O, oport=n for streams, etc given the limited registerControl(type,alias,bean). Well… not unless there’s a some additional wk structure to aliases that it can parse (e.g., j[jobId].c[controlId] j[jobId].op[opletId].c[controlId] j[jobId].op[opletId].o[oportIndex].c[controlId])
Related, my StreamScopeRegistry has its own mkStreamId(jobId, opletId, oportIndex) to form keys to register a StreamScope with the StreamScopeRegistry. (this was definitely a necessity pre-StreamScope oplet where it might be able to leverage OpletContext.uniquify()). Sort of feels like we might benefit from some higher level utility for generating various common ids?