
Target release: 0.2.0
Version:
Document status: DRAFT
Document owner:
Designer:
Developers: Todd Nelling, Josie Yip
QA: Michelle Beard

Goals

  • Allow for scalability in the analytics framework for SENSSOFT
    • Distill 0.2.0 will allow us to grow the existing analytical/modeling capability of Distill, including:
      • Pre-packaged preprocessing methods for filtering, sequencing, and packaging time-series event data, with metadata, as portable Python dictionaries
      • Pre-packaged graph and time-series modeling methods
      • Limited packaging of statistical and data-processing Python packages (e.g., NumPy, SciPy, pandas)
  • Allow for customizable, user-generated Python content within Distill
    • Distill 0.2.0 will allow users to create their own libraries for Distill
  • Allow processed user log data to be ported to different environments (e.g., visualization tools, or other analytic environments such as Anaconda)
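A minimal sketch of what the pre-packaged preprocessing in the first goal might look like: filter events, sequence them per session by timestamp, and package each session as a plain dictionary with metadata. It also derives a first-order transition count between consecutive events, one simple form of the graph modeling the goals mention. All function names and the event fields (`sessionID`, `clientTime`, `type`, `target`) are hypothetical, loosely modeled on UserALE.js logs, and not Distill's actual API.

```python
from collections import defaultdict

# Hypothetical sample events; field names are assumptions loosely modeled on UserALE.js logs.
RAW_EVENTS = [
    {"sessionID": "s1", "clientTime": 3, "type": "click", "target": "#save"},
    {"sessionID": "s1", "clientTime": 1, "type": "focus", "target": "#name"},
    {"sessionID": "s1", "clientTime": 2, "type": "mouseover", "target": "#name"},
    {"sessionID": "s2", "clientTime": 5, "type": "click", "target": "#open"},
]


def filter_events(events, drop_types):
    """Return only the events whose 'type' is not in drop_types."""
    return [e for e in events if e.get("type") not in drop_types]


def sequence_events(events):
    """Group events by session and sort each group by client timestamp."""
    sessions = defaultdict(list)
    for event in events:
        sessions[event["sessionID"]].append(event)
    return {sid: sorted(evts, key=lambda e: e["clientTime"]) for sid, evts in sessions.items()}


def package_sequences(sequences):
    """Package each session as a plain dict of metadata plus its ordered events."""
    return {
        sid: {
            "meta": {
                "count": len(evts),
                "start": evts[0]["clientTime"],
                "end": evts[-1]["clientTime"],
            },
            "events": evts,
        }
        for sid, evts in sequences.items()
    }


def transition_graph(sequence):
    """Count directed transitions between consecutive (type, target) states."""
    edges = defaultdict(int)
    states = [(e["type"], e["target"]) for e in sequence]
    for src, dst in zip(states, states[1:]):
        edges[(src, dst)] += 1
    return dict(edges)


filtered = filter_events(RAW_EVENTS, drop_types={"mouseover"})
packaged = package_sequences(sequence_events(filtered))
graph = transition_graph(packaged["s1"]["events"])
```

Because `packaged` contains only strings, numbers, lists, and dictionaries, it can be serialized with `json.dumps` and moved to another environment; the graph's tuple keys would need converting to strings first.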

Background and strategic fit

Assumptions

  • Distill 0.2.0 will operate as a RESTful API
  • Distill 0.2.0 will require 
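The RESTful API assumption, together with the requirement that Distill be callable from the server side, can be met by any WSGI framework (Flask, which the questions below mention, is one). A minimal sketch using only the standard library's `wsgiref`, assuming a hypothetical `POST /sessions` route that accepts a JSON list of events and returns an event count per `sessionID`; the route, payload shape, and `call` helper are illustrative, not Distill's API.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def distill_app(environ, start_response):
    """Hypothetical WSGI endpoint: POST /sessions with a JSON list of events,
    respond with the number of events per sessionID."""
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/sessions":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length)
    try:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        events = json.loads(body.decode("utf-8"))
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "invalid JSON"}']
    counts = {}
    for event in events:
        sid = event.get("sessionID", "unknown")
        counts[sid] = counts.get(sid, 0) + 1
    payload = json.dumps(counts).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(payload)))])
    return [payload]


def call(app, method, path, payload=None):
    """Invoke a WSGI app in-process (no network socket) and decode its JSON reply."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    result = b"".join(app(environ, start_response))
    return captured["status"], json.loads(result.decode("utf-8"))
```

For local experiments the same app can be served over HTTP with `wsgiref.simple_server.make_server("", 8000, distill_app).serve_forever()`; writing against plain WSGI keeps the Flask-versus-Django decision open.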

 

Requirements

# | Title | User Story | Importance | Notes
1 | Must be able to use custom analytics with Distill | | MUST HAVE |
2 | Must be able to call Distill from the server side (for automation) and from an IDE | | MUST HAVE |
3 | Must be able to accommodate data streams other than UserALE, either by design or through instructions for building custom schemas | | MUST HAVE |
4 | Libraries must be supported through pip (limited or no support for other distributions in 0.2.0) | | |
5 | Support wheels and eggs for builds on Windows x64 (no 32-bit support) | | |
6 | Requires Python 3.6 | | MUST HAVE |
7 | OAuth token passing for data endpoint access | | |
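Requirement 3 could be approached with a registry of adapter functions, each mapping one source's raw records onto a common event shape that downstream analytics consume. A minimal sketch; the common field names, the `custom_csv` source, and every function name here are hypothetical, and the UserALE field names are assumptions.

```python
# Hypothetical common event shape that downstream analytics could consume.
COMMON_FIELDS = ("session", "time", "action", "target")

SCHEMA_ADAPTERS = {}


def register_schema(name):
    """Decorator that registers a function converting one raw record to the common shape."""
    def decorator(func):
        SCHEMA_ADAPTERS[name] = func
        return func
    return decorator


@register_schema("userale")
def from_userale(record):
    # Field names assumed, loosely modeled on UserALE.js logs.
    return {"session": record["sessionID"], "time": record["clientTime"],
            "action": record["type"], "target": record.get("target")}


@register_schema("custom_csv")
def from_custom_csv(record):
    # A hypothetical non-UserALE stream with its own column names.
    return {"session": record["user"], "time": float(record["ts"]),
            "action": record["event"], "target": record.get("element")}


def normalize(source, records):
    """Convert records from a registered source; fail loudly on unknown sources or missing fields."""
    try:
        adapter = SCHEMA_ADAPTERS[source]
    except KeyError:
        raise ValueError("no schema adapter registered for {!r}".format(source)) from None
    events = [adapter(r) for r in records]
    for event in events:
        missing = [f for f in COMMON_FIELDS if f not in event]
        if missing:
            raise ValueError("adapter {!r} omitted fields {}".format(source, missing))
    return events
```

The same decorator-registry pattern could let users register their own analytics (requirement 1) from a separate pip-installable package, without modifying Distill itself.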

Questions

Below is a list of questions to be addressed as a result of this requirements document:

  1. How do we accommodate different data schemas that allow for multiple data streams?
  2. Does Distill require a specific backend (Elasticsearch), or can it use Solr/Lucene?
     Outcome: The underlying data store needs to support key-value pairs.
  3. How do we support Windows users?
     Outcome:
       • Investigate whether we are using packages that don't build on Windows
       • Integrate testing across platforms
  4. How do we give the "average" data scientist enough packages and modules out of the box to be minimally viable?
  5. What is the roadmap for supporting packages and the Anaconda distribution?
  6. Should we migrate from Flask to Django?
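The outcome for the backend question is that the underlying data store needs to support key-value pairs. One way to keep Distill backend-agnostic is to write analytics against a small key-value interface and supply one adapter per store. A minimal sketch; `KeyValueStore`, `InMemoryStore`, `session_event_count`, and the `session/index` key layout are hypothetical, and Elasticsearch or Solr adapters would still have to be written against their own client libraries.

```python
import abc
import copy


class KeyValueStore(abc.ABC):
    """Hypothetical minimal storage interface for Distill analytics.

    Backends such as Elasticsearch or Solr would each need an adapter implementing it.
    """

    @abc.abstractmethod
    def put(self, key, document):
        """Store a JSON-like document under a string key."""

    @abc.abstractmethod
    def get(self, key):
        """Return the document for key, or None if absent."""

    @abc.abstractmethod
    def keys(self, prefix=""):
        """Return the stored keys that start with prefix, sorted."""


class InMemoryStore(KeyValueStore):
    """Dictionary-backed reference implementation, useful for tests."""

    def __init__(self):
        self._data = {}

    def put(self, key, document):
        # Deep-copy so later mutations by the caller do not leak into the store.
        self._data[key] = copy.deepcopy(document)

    def get(self, key):
        doc = self._data.get(key)
        return copy.deepcopy(doc) if doc is not None else None

    def keys(self, prefix=""):
        return sorted(k for k in self._data if k.startswith(prefix))


def session_event_count(store, session_id):
    """An analytic written only against the interface, not a specific backend."""
    return sum(len(store.get(k)["events"]) for k in store.keys(session_id + "/"))
```

Because `KeyValueStore` is abstract, instantiating it directly raises `TypeError`, so an adapter that forgets a method fails at construction rather than mid-analysis.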

Analytics & Processing Examples

These examples are meant to draw out higher-level goals for Distill's functionality. This section can be removed once the goals have been finalized.

  • Build intervals from matching sequences of raw events
  • Filter out unwanted events
    • Noisy or irrelevant events
      • May be conditional on neighboring events
    • "Dangling" events (e.g., a stop event with no corresponding start)
  • Collapse duplicate events into a single event (when is this preferable to creating an interval?)
  • Create "sandwiches" (a set of events bookended by, e.g., a related start and stop event)
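The interval, dangling-event, duplicate-collapsing, and sandwich examples above can be sketched in plain Python. This is a minimal illustration, not Distill's implementation: which event types form start/stop pairs (`PAIRS`), the event fields (`time`, `type`, `target`), and all function names are assumptions.

```python
# Hypothetical start/stop pairs; which event types pair up is an assumption.
PAIRS = {"mousedown": "mouseup", "focus": "blur"}
STOPS = {stop: start for start, stop in PAIRS.items()}


def collapse_duplicates(events):
    """Merge consecutive events with the same (type, target) into one, keeping a count."""
    collapsed = []
    for event in events:
        key = (event["type"], event.get("target"))
        if collapsed and (collapsed[-1]["type"], collapsed[-1].get("target")) == key:
            collapsed[-1]["count"] += 1
            collapsed[-1]["last_time"] = event["time"]
        else:
            collapsed.append(dict(event, count=1, last_time=event["time"]))
    return collapsed


def build_intervals(events):
    """Pair each stop with the most recent open start of the matching type on the same target.

    Returns (intervals, dangling): stops with no open start, and starts never closed,
    are reported as dangling rather than silently dropped.
    """
    open_starts = {}  # (start_type, target) -> stack of open start events
    intervals, dangling = [], []
    for event in sorted(events, key=lambda e: e["time"]):
        etype, target = event["type"], event.get("target")
        if etype in PAIRS:
            open_starts.setdefault((etype, target), []).append(event)
        elif etype in STOPS:
            stack = open_starts.get((STOPS[etype], target))
            if stack:
                start = stack.pop()
                intervals.append({"type": start["type"], "target": target,
                                  "start": start["time"], "end": event["time"],
                                  "duration": event["time"] - start["time"]})
            else:
                dangling.append(event)
    for stack in open_starts.values():
        dangling.extend(stack)
    return intervals, dangling


def sandwiches(events, intervals):
    """Attach the events that fall strictly inside each interval as its 'filling'."""
    return [
        dict(interval, filling=[e for e in events
                                if interval["start"] < e["time"] < interval["end"]])
        for interval in intervals
    ]


EVENTS = [
    {"time": 0, "type": "mouseup", "target": "#a"},     # stop with no start: dangling
    {"time": 1, "type": "focus", "target": "#name"},
    {"time": 2, "type": "keydown", "target": "#name"},
    {"time": 3, "type": "keydown", "target": "#name"},  # duplicate of the previous event
    {"time": 4, "type": "blur", "target": "#name"},
    {"time": 5, "type": "mousedown", "target": "#b"},   # start never closed: dangling
]
intervals, dangling = build_intervals(EVENTS)
```

Collapsing keeps a `count` and `last_time`, so a run of duplicates retains its span without being promoted to a full interval; whether that or an interval is preferable remains the open question noted above.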

Not Doing
