Goals
- Allow for scalability in the analytics framework for SENSSOFT
  - Distill 0.2.0 will allow us to grow the incumbent analytical/modeling capability of Distill, including:
    - Pre-packaged preprocessing methods for filtering, sequencing, and packaging time-series event data, with metadata, as portable Python dictionaries
    - Pre-packaged graph and time-series modeling methods
    - Limited packaging of statistical and data processing Python packages (e.g., NumPy, SciPy, Pandas)
- Allow for customizable, user-generated Python content within Distill
  - Distill 0.2.0 will allow users to generate their own libraries for Distill
- Allow for portability of processed user log data to different environments (e.g., visualization tools or other analytic environments such as Anaconda)
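
As an illustration of the "portable Python dictionaries" goal, the following is a minimal sketch of filtering, sequencing, and packaging events with metadata. It is not Distill's API: the function name `package_events`, the event fields (`type`, `target`, `timestamp`), and the layout of the returned dictionary are all hypothetical and chosen only for this example.

```python
import json


def package_events(events, keep_types, source):
    """Filter, sequence, and package raw events into one portable dict.

    Hypothetical helper: the field names and the output layout are
    assumptions made for illustration, not part of Distill.
    """
    # Filter: keep only the event types of interest.
    kept = [e for e in events if e["type"] in keep_types]
    # Sequence: order by timestamp without mutating the caller's list.
    kept = sorted(kept, key=lambda e: e["timestamp"])
    return {
        "meta": {
            "source": source,
            "count": len(kept),
            "start": kept[0]["timestamp"] if kept else None,
            "end": kept[-1]["timestamp"] if kept else None,
        },
        "events": kept,
    }


raw = [
    {"type": "click", "timestamp": 300, "target": "button#save"},
    {"type": "mousemove", "timestamp": 100, "target": "div#canvas"},
    {"type": "keydown", "timestamp": 200, "target": "input#name"},
]

package = package_events(raw, keep_types={"click", "keydown"}, source="example-app")

# Portability check: the package survives a JSON round trip unchanged,
# so it can be handed to a visualization tool or another analytic environment.
restored = json.loads(json.dumps(package))
```

Because the package contains only dictionaries, lists, strings, and numbers, it serializes to JSON without custom encoders, which is one plausible way to meet the portability goal.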
Background and strategic fit
Assumptions
- Distill 0.2.0 will operate as a RESTful API
- Distill 0.2.0 will require
Requirements
# | Title | User Story | Importance | Notes |
---|---|---|---|---|
1 | | Must be able to use custom analytics with Distill | MUST HAVE | |
2 | | Must be able to call Distill from the server side (for automation) and from an IDE | MUST HAVE | |
3 | | Must be able to accommodate different data streams (besides UserALE), either by design or through instructions for building custom schemas | MUST HAVE | |
4 | | Libraries must be supported through pip (limited or no support for other distributions in 0.2.0) | | |
5 | | Support wheels and eggs for builds on Windows x64 (no 32-bit support) | | |
6 | | Requires Python 3.6 | MUST HAVE | |
7 | | OAuth token passing for data endpoint access | | |
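
Requirements 2 and 7 together imply that a server-side script should be able to call Distill's REST interface while passing an OAuth token. The sketch below shows one common way to do that, an `Authorization: Bearer` header, using only the standard library. The base URL, the path, the token value, and the helper name `build_distill_request` are placeholders invented for this example; this document does not specify Distill's actual routes or authentication scheme.

```python
from urllib.request import Request


def build_distill_request(base_url, path, token):
    """Build (but do not send) a GET request carrying an OAuth bearer token.

    Hypothetical helper: the URL layout and the use of a bearer token are
    assumptions for illustration only.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder values; a real deployment would supply its own.
request = build_distill_request(
    "http://localhost:8090", "/hypothetical/endpoint", "example-token"
)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can run without a live server.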
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome |
---|---|
| |
2. Does Distill require a specific backend (Elastic), or can it use Solr/Lucene? | The underlying data store needs to support key-value pairs |
3. How do we support Windows users? | |
4. How do we give the "average" data scientist enough packages and modules to be minimally viable out of the box? | |
5. Roadmap for supporting packages and the Anaconda distribution | |
6. Migrate to Django from Flask? | |
Analytics & Processing Examples
These examples are here for drawing out higher-level goals for Distill's functionality. This section can be removed once the goals have been solidified.
- Build intervals from matching sequences of raw events
- Filter out unwanted events
- Noisy/irrelevant events
- May be conditional on neighboring events
- "dangling" events (e.g. a stop event with no corresponding start)
- Collapse duplicate events into a single event (when is this preferable to creating an interval?)
- Create "sandwiches" (a set of events bookended by, e.g., a related start and stop event)
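
To make the examples above concrete, here is a minimal sketch of three of them: building intervals from start/stop pairs (while setting aside dangling events), collapsing consecutive duplicates, and assembling "sandwiches". The function names, the event fields (`type`, `target`, `timestamp`), and the pairing rule (a stop closes the most recent open start on the same target) are all assumptions made for illustration, not decisions this document has taken.

```python
def build_intervals(events, start_type="start", stop_type="stop"):
    """Pair each start event with the next stop event on the same target.

    Returns (intervals, dangling). Dangling events are stops with no open
    start, and starts that were never closed.
    """
    open_starts = {}  # target -> start event still waiting for its stop
    intervals, dangling = [], []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = event["target"]
        if event["type"] == start_type:
            if key in open_starts:
                dangling.append(open_starts[key])  # earlier start never closed
            open_starts[key] = event
        elif event["type"] == stop_type:
            start = open_starts.pop(key, None)
            if start is None:
                dangling.append(event)  # stop with no corresponding start
            else:
                intervals.append(
                    {"target": key, "start": start["timestamp"], "end": event["timestamp"]}
                )
    dangling.extend(open_starts.values())  # starts left open at the end
    return intervals, dangling


def collapse_duplicates(events):
    """Collapse consecutive events with the same type and target into one."""
    collapsed = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        last = collapsed[-1] if collapsed else None
        if last is not None and (last["type"], last["target"]) == (event["type"], event["target"]):
            last["count"] += 1
            last["last_timestamp"] = event["timestamp"]
        else:
            # Copy the event so the caller's data is not modified.
            collapsed.append(dict(event, count=1, last_timestamp=event["timestamp"]))
    return collapsed


def build_sandwiches(events, intervals):
    """For each interval, collect every event between its start and stop, inclusive."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    return [
        [e for e in ordered if iv["start"] <= e["timestamp"] <= iv["end"]]
        for iv in intervals
    ]


events = [
    {"type": "stop", "target": "video", "timestamp": 50},  # dangling stop
    {"type": "start", "target": "video", "timestamp": 100},
    {"type": "scroll", "target": "page", "timestamp": 120},
    {"type": "scroll", "target": "page", "timestamp": 130},
    {"type": "stop", "target": "video", "timestamp": 200},
]

intervals, dangling = build_intervals(events)
collapsed = collapse_duplicates(events)
sandwiches = build_sandwiches(events, intervals)
```

With this sample data, the stop at timestamp 50 is reported as dangling, the start/stop pair at 100 and 200 becomes one interval, and the two scroll events collapse into a single event with a count of 2. Which of these transformations should ship as pre-packaged methods remains an open design question.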