...
Analytics & Processing Examples
These examples are intended to draw out higher-level goals for Distill's functionality. This section can be removed once those goals have been solidified.
Here is a model data pipeline for SENSSOFT: RAW DATA > QUERY > FILTER/Q&A > TRANSFORMATION > PRIMITIVE FEATURE EXTRACTION > TRAINED MODELING > DERIVED FEATURE EXTRACTION
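The pipeline above can be read as a chain of stages, each of which takes a list of events and returns a new list. The following is a minimal sketch under that assumption, not Distill's actual design: events are plain dicts with hypothetical `type` and `timestamp` keys, and `run_pipeline`, `query_clicks`, and `sort_by_time` are illustrative names standing in for the QUERY and TRANSFORMATION stages.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical event record, e.g. {"type": "click", "timestamp": 12.5}.
Event = Dict[str, Any]
Stage = Callable[[List[Event]], List[Event]]


def run_pipeline(raw: Iterable[Event], stages: List[Stage]) -> List[Event]:
    """Feed the raw events through each stage in order, left to right."""
    return reduce(lambda events, stage: stage(events), stages, list(raw))


def query_clicks(events: List[Event]) -> List[Event]:
    # Hypothetical QUERY stage: keep only click events.
    return [e for e in events if e["type"] == "click"]


def sort_by_time(events: List[Event]) -> List[Event]:
    # Hypothetical TRANSFORMATION stage: order events chronologically.
    return sorted(events, key=lambda e: e["timestamp"])


raw = [
    {"type": "click", "timestamp": 5.0},
    {"type": "keydown", "timestamp": 1.0},
    {"type": "click", "timestamp": 2.0},
]
result = run_pipeline(raw, [query_clicks, sort_by_time])
```

Because every stage shares one signature, stages can be reordered, added, or dropped without touching the others.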
There are a few different classes of libraries that Distill might include in support of this pipeline; each class has different consequences for workflows within larger analytic pipelines.
- QUERY:
  - FILTERING: Eliminating data from a query's results when that data can't be eliminated by the query alone, because the pattern to be filtered is fully nested within some query index.
    - EX: Filter out specific save events from osquery object access data that do not coincide with click/keyboard activity recorded by KM Logger.
    - EX: Random resampling of the km-logger event time series: randomly sample one event from every 1-minute interval.
- FILTERING:
  - Build intervals from matching sequences of raw events
  - Filter out unwanted events
    - Noisy/irrelevant events
    - May be conditional on neighboring events
    - "Dangling" events (e.g., a stop event with no corresponding start)
  - Collapse duplicate events into a single event (when is this preferable to creating an interval?)
  - Create "sandwiches" (a set of events bookended by, e.g., a related start and stop event)
  - Replace some logs/data with other logs/data
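The first QUERY example (dropping osquery save events that do not coincide with KM Logger click/keyboard activity) could be sketched as a time-window join. This assumes both sources have already been reduced to dicts with a hypothetical `timestamp` key in seconds, and that "coincide" means at least one input event within a hypothetical `window` of 2 seconds on either side of the save; `filter_uncorroborated_saves` is an illustrative name.

```python
from bisect import bisect_left
from typing import Any, Dict, List

Event = Dict[str, Any]


def filter_uncorroborated_saves(
    saves: List[Event], inputs: List[Event], window: float = 2.0
) -> List[Event]:
    """Keep only save events with a click/keyboard event within `window` seconds.

    `window` is a hypothetical tolerance; both lists use a hypothetical
    "timestamp" key measured in seconds.
    """
    input_times = sorted(e["timestamp"] for e in inputs)
    kept = []
    for save in saves:
        t = save["timestamp"]
        # Index of the first input event at or after t - window.
        i = bisect_left(input_times, t - window)
        # Corroborated if that input event also falls at or before t + window.
        if i < len(input_times) and input_times[i] <= t + window:
            kept.append(save)
    return kept


saves = [{"path": "a.txt", "timestamp": 10.0}, {"path": "b.txt", "timestamp": 50.0}]
inputs = [{"type": "click", "timestamp": 9.0}, {"type": "keydown", "timestamp": 30.0}]
kept = filter_uncorroborated_saves(saves, inputs)
```

Sorting the input timestamps once and using binary search keeps each lookup at O(log n), rather than scanning every input event for every save.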
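The km-logger resampling example can be sketched as bucketing events by minute and drawing one event at random from each non-empty bucket. The `timestamp` key (seconds), the `interval` parameter, and the function name `sample_per_interval` are assumptions for illustration; passing a `seed` makes the draw reproducible.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional

Event = Dict[str, Any]


def sample_per_interval(
    events: List[Event], interval: float = 60.0, seed: Optional[int] = None
) -> List[Event]:
    """Randomly pick one event from each non-empty `interval`-second bucket."""
    rng = random.Random(seed)
    buckets: Dict[int, List[Event]] = defaultdict(list)
    for e in events:
        # Bucket index: which interval (e.g., which minute) the event falls in.
        buckets[int(e["timestamp"] // interval)].append(e)
    # One random event per bucket, in chronological bucket order.
    return [rng.choice(buckets[k]) for k in sorted(buckets)]


events = [{"type": "keydown", "timestamp": t} for t in (0.0, 10.0, 59.0, 61.0, 125.0)]
sample = sample_per_interval(events, seed=0)
```

Minutes with no events simply produce no sample, so the output length equals the number of active minutes rather than the wall-clock span.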
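Building intervals and detecting "dangling" events can be handled in a single pass. This sketch assumes hypothetical `"start"`/`"stop"` values in a `type` field and a `key` field that links a stop to its start; a stop closes the most recent open start with the same key, and anything left unmatched is returned as dangling so that a later step can decide whether to drop it. `build_intervals` is an illustrative name.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]
Interval = Tuple[str, float, float]  # (key, start_time, stop_time)


def build_intervals(events: List[Event]) -> Tuple[List[Interval], List[Event]]:
    """Pair hypothetical "start"/"stop" events that share a "key".

    Returns (intervals, dangling), where dangling holds stops with no open
    start, followed by starts that were never closed.
    """
    open_starts: Dict[str, List[Event]] = {}
    intervals: List[Interval] = []
    dangling: List[Event] = []
    for e in sorted(events, key=lambda ev: ev["timestamp"]):
        if e["type"] == "start":
            open_starts.setdefault(e["key"], []).append(e)
        elif e["type"] == "stop":
            stack = open_starts.get(e["key"])
            if stack:
                # Close the most recent unmatched start with the same key.
                start = stack.pop()
                intervals.append((e["key"], start["timestamp"], e["timestamp"]))
            else:
                dangling.append(e)  # stop event with no corresponding start
    # Any start still open at the end never received a stop.
    for stack in open_starts.values():
        dangling.extend(stack)
    return intervals, dangling


events = [
    {"type": "start", "key": "upload", "timestamp": 1.0},
    {"type": "stop", "key": "upload", "timestamp": 4.0},
    {"type": "stop", "key": "edit", "timestamp": 5.0},
    {"type": "start", "key": "edit", "timestamp": 7.0},
]
intervals, dangling = build_intervals(events)
```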
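Collapsing duplicates could look like the following sketch, which merges runs of consecutive same-type events that arrive within a hypothetical `max_gap` of each other. The `count` and `last_timestamp` fields it adds, and the name `collapse_duplicates`, are assumptions. Keeping those fields means the collapsed event still records how long the run lasted, which is one way to frame the open question of when collapsing is preferable to creating an interval.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def collapse_duplicates(events: List[Event], max_gap: float = 1.0) -> List[Event]:
    """Merge runs of consecutive same-type events separated by <= max_gap seconds.

    Each merged event keeps the first event's fields and adds hypothetical
    "count" and "last_timestamp" fields recording what was collapsed.
    """
    collapsed: List[Event] = []
    for e in sorted(events, key=lambda ev: ev["timestamp"]):
        prev = collapsed[-1] if collapsed else None
        if (
            prev is not None
            and prev["type"] == e["type"]
            and e["timestamp"] - prev["last_timestamp"] <= max_gap
        ):
            # Same event repeated in quick succession: fold it into the previous one.
            prev["count"] += 1
            prev["last_timestamp"] = e["timestamp"]
        else:
            # Copy the event so the caller's input dicts are not mutated.
            collapsed.append({**e, "count": 1, "last_timestamp": e["timestamp"]})
    return collapsed


events = [
    {"type": "click", "timestamp": 0.0},
    {"type": "click", "timestamp": 0.5},
    {"type": "click", "timestamp": 1.2},
    {"type": "keydown", "timestamp": 1.5},
    {"type": "click", "timestamp": 5.0},
]
collapsed = collapse_duplicates(events)
```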
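A "sandwich" can be sketched as every event from a start event up to and including the next stop event. The `start_type`/`stop_type` parameters, the `type`/`timestamp` keys, and the name `make_sandwiches` are hypothetical. In this version, a second start before any stop restarts the sandwich, and a sandwich with no closing stop is discarded.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def make_sandwiches(
    events: List[Event], start_type: str, stop_type: str
) -> List[List[Event]]:
    """Group events into sandwiches: a start event, the events after it,
    and the first following stop event. Unclosed sandwiches are dropped."""
    sandwiches: List[List[Event]] = []
    current: List[Event] = []
    inside = False
    for e in sorted(events, key=lambda ev: ev["timestamp"]):
        if e["type"] == start_type:
            # A new start (re)opens a sandwich; an unclosed earlier one is discarded.
            current, inside = [e], True
        elif inside:
            current.append(e)
            if e["type"] == stop_type:
                sandwiches.append(current)
                current, inside = [], False
    return sandwiches


events = [
    {"type": "click", "timestamp": 0.0},
    {"type": "focus", "timestamp": 1.0},
    {"type": "keydown", "timestamp": 2.0},
    {"type": "keydown", "timestamp": 3.0},
    {"type": "blur", "timestamp": 4.0},
    {"type": "click", "timestamp": 5.0},
    {"type": "focus", "timestamp": 6.0},
    {"type": "keydown", "timestamp": 7.0},
]
sandwiches = make_sandwiches(events, start_type="focus", stop_type="blur")
```

Events outside any start/stop pair (the clicks at 0.0 and 5.0) are left out, as is the trailing `focus` that never receives a `blur`.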
...