Analytics & Processing Examples

These examples are meant to draw out higher-level goals for Distill's functionality. This section can be removed once the goals have been solidified.

 

Here is a model data pipeline for SENSSOFT: RAW DATA > QUERY > FILTER/Q&A > TRANSFORMATION > PRIMITIVE FEATURE EXTRACTION > TRAINED MODELING > DERIVED FEATURE EXTRACTION

There are a few different classes of libraries that Distill might include in support of this pipeline; they have different consequences for workflows within larger analytic pipelines.

  1. QUERY: 
  2. FILTERING: Elimination of data from the query return when that data can't be eliminated by the query alone, because the pattern to be filtered is fully nested within some query index.
    1. EX: Filter out specific save events from osquery object access data that do not coincide with click/keyboard activity recorded by KM Logger.
    2. EX: Random resampling of the km-logger event time series: draw one random sample per 1-minute interval.
  3. FILTERING: 
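
The two FILTERING examples above can be sketched in plain Python. This is only an illustration under stated assumptions, not Distill's API: the function names, the "ts" and "type" keys, and the 2-second coincidence window are all hypothetical, and timestamps are assumed to be seconds stored as floats.

```python
import bisect
import random
from collections import defaultdict


def filter_coincident(events, activity_times, window=2.0):
    """Keep only events with some activity within `window` seconds.

    `events` stands in for save events (e.g. from osquery) and
    `activity_times` for click/keyboard timestamps; both are assumptions.
    """
    activity_times = sorted(activity_times)
    kept = []
    for event in events:
        t = event["ts"]
        # Index of the first activity timestamp >= t - window.
        i = bisect.bisect_left(activity_times, t - window)
        if i < len(activity_times) and activity_times[i] <= t + window:
            kept.append(event)
    return kept


def sample_per_interval(events, interval=60.0, seed=None):
    """Pick one random event from each non-empty `interval`-second bucket."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for event in events:
        buckets[int(event["ts"] // interval)].append(event)
    return [rng.choice(buckets[key]) for key in sorted(buckets)]


# Hypothetical data: two save events and three click timestamps.
saves = [{"ts": 10.0, "type": "save"}, {"ts": 95.0, "type": "save"}]
clicks = [9.2, 40.0, 41.5]
coincident = filter_coincident(saves, clicks)
```

Here only the save at 10.0 survives, because no click falls within 2 seconds of the save at 95.0. Bucketing by integer division keeps the resampling step independent of how densely events arrive within each minute.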

 

  • Build intervals from matching sequences of raw events
  • Filter out unwanted events
    • Noisy/irrelevant events
      • May be conditional on neighboring events
    • "Dangling" events (e.g. a stop event with no corresponding start)
  • Collapse duplicate events into a single event (when is this preferable to creating an interval?)
  • Create "sandwiches" (a set of events bookended by, e.g., a related start and stop event)
  • Replace some logs/data with other logs/data
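
Several of the operations listed above (collapsing duplicates, building intervals or "sandwiches", and separating out dangling events) could look roughly like the following sketch. The function names, the "start"/"stop" type labels, and the "ts"/"type" keys are hypothetical, and the sketch assumes intervals do not nest.

```python
def collapse_duplicates(events, gap=1.0):
    """Keep only the first event of each burst of same-type events.

    Two events of the same type belong to one burst when they are at
    most `gap` seconds apart (an assumed definition of "duplicate").
    """
    collapsed = []
    last_seen = {}  # event type -> timestamp of its most recent occurrence
    for event in sorted(events, key=lambda e: e["ts"]):
        previous = last_seen.get(event["type"])
        if previous is None or event["ts"] - previous > gap:
            collapsed.append(event)
        last_seen[event["type"]] = event["ts"]
    return collapsed


def build_sandwiches(events, start="start", stop="stop"):
    """Pair each start with the next stop; return (sandwiches, dangling).

    Events between a start and its stop are stored with the interval.
    Events outside any interval are ignored in this sketch.
    """
    sandwiches, dangling = [], []
    open_start, inner = None, []
    for event in sorted(events, key=lambda e: e["ts"]):
        if event["type"] == start:
            if open_start is not None:
                dangling.append(open_start)  # earlier start never closed
            open_start, inner = event, []
        elif event["type"] == stop:
            if open_start is None:
                dangling.append(event)  # stop with no corresponding start
            else:
                sandwiches.append(
                    {"start": open_start["ts"], "end": event["ts"], "events": inner}
                )
                open_start, inner = None, []
        elif open_start is not None:
            inner.append(event)
    if open_start is not None:
        dangling.append(open_start)  # start still open at end of log
    return sandwiches, dangling


# Hypothetical log: one dangling stop, one complete interval, one dangling start.
events = [
    {"ts": 0.0, "type": "stop"},
    {"ts": 1.0, "type": "start"},
    {"ts": 2.0, "type": "click"},
    {"ts": 2.2, "type": "click"},
    {"ts": 5.0, "type": "stop"},
    {"ts": 7.0, "type": "start"},
]
deduped = collapse_duplicates(events, gap=0.5)
sandwiches, dangling = build_sandwiches(deduped)
```

Returning the dangling events separately, rather than silently dropping them, leaves the decision to filter or inspect them with the caller.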
