Distill 0.2.0

Target release	0.2.0
Version
Document status	DRAFT
Document owner	Joshua C. Poore
Designer	Michelle Beard, Laura Mariano
Developers	Todd Nelling, Josie Yip
QA	Michelle Beard

Goals

Allow for scalability in the analytics framework for SENSSOFT
- Distill 0.2.0 will allow us to grow the incumbent analytical/modeling capability of Distill, including:
  - Pre-packaged preprocessing methods for filtering, sequencing, and packaging time-series event data, with metadata, as portable Python dictionaries
  - Pre-packaged graph and time-series modeling methods
  - Limited packaging of statistical and data processing Python packages (e.g., NumPy, SciPy, pandas)
Allow for customizable, user-generated Python content within Distill
- Distill 0.2.0 will allow users to generate their own libraries for Distill
Allow processed user log data to be ported to different environments (e.g., visualization tools or other analytic environments such as Anaconda)
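
As an illustration of the "portable Python dictionaries" goal above, here is a minimal sketch of packaging time-series events with metadata. The function name `package_events` and the event fields (`timestamp`, `type`, `target`) are hypothetical placeholders chosen for this example; they are not the UserALE schema or an existing Distill API.

```python
import json


def package_events(events, source="UserALE"):
    """Sort events by timestamp and wrap them, with metadata, in a plain dict.

    Hypothetical helper: the field names are assumptions for illustration.
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])
    return {
        "meta": {
            "source": source,
            "count": len(ordered),
            "start": ordered[0]["timestamp"] if ordered else None,
            "end": ordered[-1]["timestamp"] if ordered else None,
        },
        "events": ordered,
    }


raw = [
    {"timestamp": 3, "type": "click", "target": "button#save"},
    {"timestamp": 1, "type": "focus", "target": "input#name"},
]
package = package_events(raw)

# Plain dicts and lists serialize to JSON without custom encoders,
# which is what makes the package easy to move between environments.
portable = json.dumps(package)
```

Because the package contains only built-in types, it could be handed to a visualization front end as JSON or loaded into another analytic environment without Distill installed.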

Background and strategic fit

Assumptions

Distill 0.2.0 will operate as a RESTful API
Distill 0.2.0 will require

Requirements

#	Title	Importance
1	Must be able to use custom analytics with Distill	MUST HAVE
2	Must be able to call Distill from server side (for automation) and IDE	MUST HAVE
3	Must be able to accommodate different data streams (besides UserALE), either by design or through instructions for how to build custom schemas	MUST HAVE
4	Libraries must be supported through pip (limited or no support for other distros in 0.2.0)
5	Support wheels and eggs for builds on Windows x64 (no 32-bit support)
6	Requires Python 3.6	MUST HAVE
7	OAuth token passing for data endpoint access
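
For requirement 7, one possible shape of token passing is sketched below. It assumes the token is sent as an OAuth 2.0 bearer token in the `Authorization` header, which this document does not specify. The URL and the function name `build_data_request` are hypothetical; no Distill endpoint is implied.

```python
import urllib.request


def build_data_request(url, token):
    """Build a GET request that carries an OAuth bearer token.

    Assumption: the data endpoint accepts "Authorization: Bearer <token>".
    """
    return urllib.request.Request(
        url,
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical endpoint and token, for illustration only.
request = build_data_request("https://distill.example.org/data", "example-token")

# urllib.request.urlopen(request) would send the request; it is omitted here
# because the endpoint above does not exist.
```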

Questions

Below is a list of questions to be addressed as a result of this requirements document:

Question	Outcome
1. How do we accommodate different data schemas that allow for multiple data streams?
2. Does Distill require a specific backend (Elastic), or can it go to Solr/Lucene?	Underlying data store needs to support key-value pairs
3. How do we support Windows users?	Investigate whether we are using packages that don't build on Windows. Integrate testing across platforms.
4. How do we provide the "average" data scientist with enough packages and modules for Distill to be minimally viable out of the box?
5. Roadmap for supporting packages and Anaconda distribution
6. Migrate to Django from Flask?

Analytics & Processing Examples

These examples are meant to draw out higher-level goals for Distill's functionality. This section can be removed once the goals have been solidified.

Build intervals from matching sequences of raw events
Filter out unwanted events
- Noisy/irrelevant events
  - May be conditional on neighboring events
- "dangling" events (e.g. a stop event with no corresponding start)
Collapse duplicate events into a single event (when is this preferable to creating an interval?)
Create "sandwiches" (a set of events bookended by, e.g., a related start and stop event)
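
The operations listed above could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not a proposed Distill API: the function names and the event fields (`timestamp`, `type`, `target`) are hypothetical. Each "sandwich" records its start and end, so it doubles as an interval.

```python
def drop_noise(events, noisy_types):
    """Filter out events whose type is considered noisy or irrelevant."""
    return [e for e in events if e["type"] not in noisy_types]


def collapse_duplicates(events):
    """Collapse each run of consecutive events with the same type and target
    into the first event of the run."""
    collapsed = []
    for event in events:
        prev = collapsed[-1] if collapsed else None
        if prev and (prev["type"], prev["target"]) == (event["type"], event["target"]):
            continue
        collapsed.append(event)
    return collapsed


def build_sandwiches(events, start_type, stop_type):
    """Pair each start event with the next stop event.

    Returns (sandwiches, dangling). A sandwich holds the interval bounds and
    the events enclosed by the pair. Dangling events are stops with no open
    start and starts that never receive a stop.
    """
    sandwiches, dangling = [], []
    open_start, inner = None, []
    for event in events:
        if event["type"] == start_type:
            if open_start is not None:
                dangling.append(open_start)  # previous start was never closed
            open_start, inner = event, []
        elif event["type"] == stop_type:
            if open_start is None:
                dangling.append(event)  # stop with no corresponding start
            else:
                sandwiches.append({
                    "start": open_start["timestamp"],
                    "end": event["timestamp"],
                    "duration": event["timestamp"] - open_start["timestamp"],
                    "events": inner,
                })
                open_start, inner = None, []
        elif open_start is not None:
            inner.append(event)
    if open_start is not None:
        dangling.append(open_start)
    return sandwiches, dangling


events = [
    {"timestamp": 0, "type": "stop", "target": "video"},      # dangling stop
    {"timestamp": 1, "type": "start", "target": "video"},
    {"timestamp": 2, "type": "mousemove", "target": "page"},  # noise
    {"timestamp": 3, "type": "click", "target": "seek"},
    {"timestamp": 4, "type": "click", "target": "seek"},      # duplicate
    {"timestamp": 6, "type": "stop", "target": "video"},
]
cleaned = collapse_duplicates(drop_noise(events, {"mousemove"}))
sandwiches, dangling = build_sandwiches(cleaned, "start", "stop")
```

In this sketch, noise is removed before duplicates are collapsed, so two identical events separated only by noise are treated as duplicates. Whether that ordering is desirable, and when collapsing is preferable to creating an interval, remain open design questions.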
