Introduction

Stellar has 126 verbs today, and that number is only likely to grow.  Furthermore, we expect Stellar to be extended by users, and probably to grow one or more registries or repositories.  All this suggests that we should start viewing Stellar itself as a component, and make sure it is maintainable and has clean interfaces to the rest of the system.  That will be easier if we extract it into its own module, both in the code tree and in Maven.

This is a combined proposal and discussion about how to extract Stellar from its current deep embedding in Metron.  Comments are welcome and encouraged.

Work items are in sub-tasks of https://issues.apache.org/jira/browse/METRON-876 .

 

Goals

The goals of the change are all aspects of Better Modularization:

1.  Easier maintenance and enhancement as the language continues to grow.

2.  A clearer view of the language as an asset.

3.  A clearer distinction between the language infrastructure and the semantics of individual operators.

4.  Easier reasoning about extensibility issues (registry, repository, deployment) if the whole language is a set of plug-ins rather than “allowing” plug-ins.

5.  Easier extraction and definition of clear extensibility APIs.

Parts of the Implementation

The Stellar implementation consists of the following parts:

  1. Core implementation:

    1. Operator resolver and executor

    2. Common operators

    3. The Stellar REPL, or “Shell”

  2. User Defined Function (UDF) support:

    1. Expression pre-parse/compile, store, and execute

    2. Stateful executor for expressions with context

  3. Additional stand-alone operators from other functionality groups, such as Enrichment

  4. Time Window Selector DSL (independent of Profiler functionality)

  5. Add-on custom operator loading from HDFS

  6. New deployment support: Maven modularization, RPM generation, and MPack support.

What’s not part of Stellar?

The uses of Stellar, to invoke operators or UDFs, are clearly clients of the material described above, and should not be extracted.  This includes many additional Stellar operators defined in Metron that are not stand-alone, and must be considered part of other functionality groups.  The fact that they define a Stellar operator is more in the nature of providing an API than of being “part of” Stellar.  These should instead be packaged as sets of extensions to Stellar.  The Stellar “component” should not have any dependencies on the rest of Metron, but multiple parts of Metron will have dependencies on Stellar.

The following sections discuss the extraction of each of the above-mentioned parts.

 

Core implementation

 

The majority of the core Stellar implementation and unit tests is in these code paths:

  • metron-platform/metron-common/src/main/java/org/apache/metron/common/stellar/

  • metron-platform/metron-common/src/test/java/org/apache/metron/common/stellar/

  • metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/ (includes core set of Stellar operators)

  • metron-platform/metron-common/src/test/java/org/apache/metron/common/dsl/functions/


I’ve moved these paths to, respectively:

  • metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/common/

  • metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/common/

  • metron-stellar/stellar-common/src/main/java/org/apache/metron/stellar/dsl/

  • metron-stellar/stellar-common/src/test/java/org/apache/metron/stellar/dsl/functions/


In terms of Java package names, I’ve moved

  • org.apache.metron.common.stellar  → org.apache.metron.stellar.common

  • org.apache.metron.common.dsl  → org.apache.metron.stellar.dsl


This code includes:

  • The Stellar infrastructure:

    • Stellar resolver and executor, and the extensions loader

    • The Antlr specs for the lexer and parser

    • Stellar REPL

    • The APIs that allow Stellar operators to be called from Java

    • The APIs that allow Stellar extensions by creation of custom operators

  • Some of the Stellar documentation

  • Core self-contained Stellar operators and functional primitives, and their implementations, that aren’t specific to Metron, such as type convertors

The last bullet (Stellar operators) needs to be reviewed to identify and exclude any operators that should not be in the Stellar module; see the discussion below.

The next question is, what else should also be moved into the Stellar module?  What are the “cut lines” for the separation, that will give a clean and easy-to-use API?
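For client code, the package moves listed earlier in this section amount to a mechanical rename of import prefixes. The following is a minimal sketch of that rewrite, assuming only the two package moves given above. The class StellarPackageRenamer and its rewrite() method are hypothetical helpers for illustration and are not part of Metron.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Metron): rewrites old Stellar package prefixes to the new ones. */
final class StellarPackageRenamer {

    // Old prefix -> new prefix, taken from the two package moves listed in this section.
    private static final Map<String, String> MOVES = new LinkedHashMap<>();
    static {
        MOVES.put("org.apache.metron.common.stellar", "org.apache.metron.stellar.common");
        MOVES.put("org.apache.metron.common.dsl", "org.apache.metron.stellar.dsl");
    }

    /** Returns the line with the first moved package prefix replaced; other lines come back unchanged. */
    static String rewrite(String line) {
        for (Map.Entry<String, String> move : MOVES.entrySet()) {
            String oldPrefix = move.getKey();
            int start = line.indexOf(oldPrefix);
            if (start < 0) {
                continue;
            }
            int end = start + oldPrefix.length();
            // Only rewrite when the match ends at a package boundary ('.', ';', end of line),
            // so that an unrelated package such as "...common.dslx" is left alone.
            boolean atBoundary = end == line.length()
                    || !Character.isJavaIdentifierPart(line.charAt(end));
            if (atBoundary) {
                return line.substring(0, start) + move.getValue() + line.substring(end);
            }
        }
        return line;
    }
}
```

Applied to each line of a source file, this turns an import of org.apache.metron.common.dsl.functions.resolver.ClasspathFunctionResolver into an import from org.apache.metron.stellar.dsl. Other moves, such as the Profiler executor rename, are not covered by this sketch.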

 

User Defined Function (UDF) Support

 

Another core piece of functionality is embodied in the APIs that allow Stellar expressions to be created as UDFs, optionally pre-parsed (“compiled”), assigned to Java reference variables, called from Java, and called from Stellar as lambda references.

  • Some of this is also in common and has been moved to metron.stellar.common.

  • Some of this is in org.apache.metron.profiler.stellar.StellarExecutor and its siblings.  I renamed it to StellarStatefulExecutor, and moved it to metron.stellar.common.
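To make the “pre-parse once, execute many times against retained state” idea concrete, here is a toy sketch. It is not the Metron API: ToyExpression, ToyStatefulExecutor, and their methods are hypothetical stand-ins, and the “language” is limited to sums of variables and integer literals so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy stand-in for a pre-parsed expression: parse once, evaluate many times. Not the Metron API. */
final class ToyExpression {
    private final Function<Map<String, Long>, Long> evaluator;

    private ToyExpression(Function<Map<String, Long>, Long> evaluator) {
        this.evaluator = evaluator;
    }

    /** "Compiles" a sum such as "x + y + 1"; a real implementation would use the Antlr-generated parser. */
    static ToyExpression compile(String source) {
        String[] terms = source.trim().split("\\s*\\+\\s*");
        return new ToyExpression(vars -> {
            long sum = 0;
            for (String term : terms) {
                // Integer literals evaluate to themselves; anything else is a variable lookup
                // that defaults to 0 when the variable is not set.
                sum += term.matches("-?\\d+") ? Long.parseLong(term) : vars.getOrDefault(term, 0L);
            }
            return sum;
        });
    }

    long evaluate(Map<String, Long> vars) {
        return evaluator.apply(vars);
    }
}

/** Toy stateful executor: keeps variables between calls and caches compiled expressions. */
final class ToyStatefulExecutor {
    private final Map<String, Long> state = new HashMap<>();
    private final Map<String, ToyExpression> cache = new HashMap<>();

    /** Evaluates the expression against the current state and stores the result under the variable. */
    void assign(String variable, String expression) {
        state.put(variable, execute(expression));
    }

    /** Evaluates the expression against the current state, compiling it only on first use. */
    long execute(String expression) {
        return cache.computeIfAbsent(expression, ToyExpression::compile).evaluate(state);
    }
}
```

The point of the split is that the compiled form depends only on the expression text, while the executor owns the state. That is the kind of boundary that would let the executor live in the Stellar module with clients such as the Profiler supplying only expressions and variables.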

 

Additional stand-alone operators from other functionality groups

 

There are a lot of individual Stellar operator definitions and their implementations scattered around the code, in paths:

  • metron-analytics/metron-profiler-client/src/main/java/org/apache/metron/profiler/client/stellar/ (3 files)

  • metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/ (4 files)

  • metron-platform/metron-common/src/main/java/org/apache/metron/common/field/ (7 files)

  • metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/stellar/ (2 files)

  • metron-platform/metron-management/src/main/java/org/apache/metron/management/ (9 files)

Many of these have to do only with Stellar, and do not depend on any other Metron-specific code.  This includes:

  • Any pieces of Stellar infrastructure I may have missed.

    • In particular, any bits that relate to storing and using Stellar expressions as UDFs in Metron structures like Enrichers and Profilers.  Is that API clean?

  • Other Stellar operators with implementations that are self-contained, and not dependent on Metron APIs.  

    • There are many such operators in Enrichment / Threat Intel, and there may be more in other functionality groups as well.

    • FileSystemFunctions and ShellFunctions (under metron.management)

    • Some material in metron.common.utils?

      • StellarProcessorUtils (Test) already moved to metron.stellar.common.utils.  How about SerDeUtilsTest? Others?

  • Some Stellar operators are dependent on Metron only as regards expected data structures as input and output.  We may be able to augment Stellar with a “schema” model to make these operators also independent of Metron.

All the above items should, in my opinion, be moved into the Stellar module.

 

Time Window Selector DSL (independent of Profiler functionality)

 

While the Time Window Selector DSL is sort of stand-alone within Stellar, it is a powerful tool for addressing the kinds of queries against TSDB that Metron wants to do.  We should include it in Stellar and extract it.

Most of the functionality is currently in metron.profiler.client.window (path: metron-analytics/metron-profiler-client).  It is currently tightly integrated with the profiler, which has an idiosyncratic way of expressing its sampling timestamps (as HBase rowkeys).

I think there is already an intermediate form generated during processing of the DSL, as an array of time intervals, which can then be efficiently turned into the list of HBase rowkeys needed by the Profiler, but is not directly dependent on Profiler usages.  I propose this intermediate form as the API to the Time Window Selector DSL in an extracted Stellar.  The package would be metron.stellar.timewindow.
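As a rough illustration of that proposed boundary, the sketch below models the intermediate form as a list of time intervals and keeps the conversion to period-based keys in a separate client-side adapter. All names here (TimeInterval, PeriodKeys, periodsFor) are hypothetical, and the period-index arithmetic is only an assumed stand-in for the Profiler’s actual rowkey generation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical intermediate form: a half-open interval [startMillis, endMillis) in epoch milliseconds. */
final class TimeInterval {
    final long startMillis;
    final long endMillis;

    TimeInterval(long startMillis, long endMillis) {
        if (startMillis < 0 || endMillis < startMillis) {
            throw new IllegalArgumentException("need 0 <= startMillis <= endMillis");
        }
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }
}

/** Hypothetical client-side adapter: the only place that knows about fixed-length periods. */
final class PeriodKeys {

    /** Returns the index of every period that overlaps at least one interval, without duplicates. */
    static List<Long> periodsFor(List<TimeInterval> intervals, long periodMillis) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        List<Long> periods = new ArrayList<>();
        for (TimeInterval interval : intervals) {
            if (interval.endMillis == interval.startMillis) {
                continue; // an empty interval covers no period
            }
            long first = interval.startMillis / periodMillis;
            // The interval is half-open, so the last covered instant is endMillis - 1.
            long last = (interval.endMillis - 1) / periodMillis;
            for (long p = first; p <= last; p++) {
                if (!periods.contains(p)) {
                    periods.add(p);
                }
            }
        }
        return periods;
    }
}
```

With this shape, the window selector would return only TimeInterval values and have no dependency on the Profiler, while the Profiler would apply its own key scheme in an adapter like PeriodKeys.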

 

Add-on custom operator loading from HDFS

 

This functionality is implemented mostly in:

  • metron-platform/metron-common/src/main/java/org/apache/metron/common/dsl/functions/resolver/ClasspathFunctionResolver.java

so it is already extracted as part of the core implementation (item 1 above).  It will be important that it continues to work with the setupStellarStatically() method in

  • metron-platform/metron-common/src/main/java/org/apache/metron/common/configuration/ConfigurationsUtils.java

 

New deployment support: Maven modularization, RPM generation, and MPack support.

TBD

What’s not part of Stellar: Metron-dependent Stellar usages

This section contains thoughts on the issue rather than a plan for any specific actions.


The following items should not be extracted as part of Stellar:

  • Things which are really Metron-specific in their semantics and implementation, but have a Stellar function as, essentially, an API, or as “glue”, or as an extension mechanism.  Most of the “management” functions (Config, Enrichment, Threat Triage, and Grok) are in this category.

    • To me, this is the most critical “cut line”.

    • If the underlying functionality is entirely Metron-specific, and the addition of Stellar functionality is just a convenience, or is purely a client of the UDF-calling APIs, then it should stay embedded in Metron.

      • We should consider packaging the convenience Stellar functions into extension packages as examples for end-user extension.

    • If the underlying functionality strongly depends on Stellar for its usefulness (other than established use of UDF API already considered), then an API is needed, with careful consideration of what pieces are Stellar vs non-Stellar.


Candidates for consideration include:

  • Material under metron.common.field?

    • Where do the Field transformer/validators fall, 1 or 2?

  • metron.common.aggregator?

    • Are these functions ever used outside of Stellar?


There are some neither-fish-nor-fowl things, like the Kafka interface functions.  Where do they fit best?


The Profiler is a client of the Stellar UDF capability, but it has also spawned a large number of Stats-related operators (STATS, MAD, HLLP).  These functions are extremely useful, but it is not clear whether they are part of Stellar or part of a TSDB functionality that may be separately extracted.  We might even look at a component like Apache Druid, which has many similar TSDB capabilities.


The BLOOM operators are similar; very powerful but not clearly part of Stellar.


Which of the operators (Stellar functions) are key for streaming apps?

