Description

DataLab is an essential toolset for analytics. It is a self-service Web Console used to create and manage exploratory environments. It allows teams to spin up analytical environments with best-of-breed open-source tools in a single click. Once established, an environment can be managed by the analytical team itself through a simple, easy-to-use Web Interface.

Features

  1. Self-management Web Interface (Control Panel) allowing Data Science teams to:
  2. Ability to authenticate into DataLab (through KeyCloak and AWS (customizable))
  3. Easily create a gateway node (also stop and recreate it) to establish a tunnel to the secured analytical environment
  4. Ability to manage (add/stop/terminate) an analytical environment on AWS/GCP/MS Azure based on the following templates:
  • Deep Learning (Jupyter + MXNet, Caffe2, TensorFlow, CNTK, Theano, PyTorch and Keras)
  • Jupyter (with pyspark2, pyspark3, scala, sparkR interpreters)
  • Zeppelin (with pyspark2, pyspark3, scala, sparkR interpreters)
  • RStudio (with sparkR)
  • RStudio with TensorFlow (implemented on AWS)
  • Jupyter with TensorFlow
  • JupyterLab
  • Superset (implemented on GCP)

  5. Role-based access – certain groups of users can see only the analytical templates applicable to them
  6. Ability to manage (add/terminate) computational resources by connecting analytical tools with Data Engine (Standalone Apache Spark cluster) and Data Engine Service (EMR for AWS / Dataproc for GCP) – a connection sketch follows this list
  • Data Engine is supported for all templates except JupyterLab and Superset
  • Data Engine Service is supported for all templates except RStudio with TensorFlow, Jupyter with TensorFlow and Deep Learning
  • Ability to create a Data Engine Service based on spot instances/preemptible nodes
  7. Bidirectional sync of environment status between DataLab and AWS/MS Azure/GCP
  8. Billing functionality – ability to see the costs of the analytical environment
  9. Audit functionality – ability to view the history of changes made by any user
  10. Multiple Cloud Endpoints – ability to connect to any of the Cloud endpoints: AWS, GCP, Azure
  11. Scheduler – the scheduler component automatically schedules Start and Stop triggers for a Notebook; a scheduling sketch follows this list
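
As an aside to feature 6: the snippet below is only a rough sketch of what a notebook-to-Data-Engine connection amounts to. DataLab wires the notebook kernels to the provisioned Standalone Apache Spark cluster automatically; the master URL used here (spark://datalab-de-master:7077) is a hypothetical placeholder, not a real DataLab endpoint.

    # Illustration only: DataLab normally configures this connection for the user.
    # The master URL below is a hypothetical placeholder for a provisioned Data Engine.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://datalab-de-master:7077")   # hypothetical Data Engine master
        .appName("datalab-notebook-session")
        .getOrCreate()
    )

    # A trivial job to confirm the session is actually talking to the cluster.
    print(spark.range(1000).count())

    spark.stop()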
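
As an aside to feature 11: the Scheduler is configured from the Web Interface; the sketch below only illustrates the idea of Start and Stop triggers. The functions start_notebook() and stop_notebook() are hypothetical placeholders standing in for the real provisioning calls.

    # Conceptual sketch of Start/Stop triggers; not DataLab's actual scheduler code.
    from datetime import datetime, time

    def start_notebook(name: str) -> None:
        # Hypothetical placeholder for the real "start notebook" call.
        print(f"starting notebook {name}")

    def stop_notebook(name: str) -> None:
        # Hypothetical placeholder for the real "stop notebook" call.
        print(f"stopping notebook {name}")

    def apply_schedule(name: str, start_at: time, stop_at: time, now: datetime) -> None:
        # Fire the Start trigger inside the scheduled window and the Stop trigger outside it.
        if start_at <= now.time() < stop_at:
            start_notebook(name)
        else:
            stop_notebook(name)

    # Example: keep the notebook running between 09:00 and 19:00.
    apply_schedule("jupyter-notebook-1", time(9, 0), time(19, 0), datetime.now())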



Mailing lists:

  • dev@datalab.incubator.apache.org
  • commits@datalab.incubator.apache.org
  • private@datalab.incubator.apache.org

