Description
DataLab is an essential toolset for analytics. It is a self-service Web Console used to create and manage exploratory environments. It allows teams to spin up analytical environments with best-of-breed open-source tools in a single click. Once established, an environment can be managed by the analytical team itself through a simple, easy-to-use Web Interface.
Features
1. Self-management Web Interface (Control Panel) that lets Data Science teams manage their environments themselves
2. Ability to authenticate into DataLab (through KeyCloak and AWS (customizable))
3. Easily create a gateway node (and stop or recreate it) to establish a tunnel to a secured analytical environment
4. Ability to manage (add/stop/terminate) analytical environments on AWS/GCP/MS Azure based on the following templates:
- Deep Learning (Jupyter + MXNet, Caffe2, TensorFlow, CNTK, Theano, PyTorch and Keras)
- Jupyter (with pyspark2, pyspark3, scala, sparkR interpreters)
- Zeppelin (with pyspark2, pyspark3, scala, sparkR interpreters)
- RStudio (with sparkR)
- RStudio with TensorFlow (implemented on AWS)
- Jupyter with TensorFlow
- JupyterLab
- Superset (implemented on GCP)
5. Role-based access - certain groups of users see only the analytical templates applicable to them
6. Ability to manage (add/terminate) computational resources by connecting analytical tools with Data Engine (Standalone Apache Spark cluster) and Data Engine Service (EMR for AWS/Dataproc for GCP)
- Support Data Engine for all templates except JupyterLab and Superset
- Support Data Engine Service for all templates except RStudio with TensorFlow/Jupyter with TensorFlow/Deep Learning
- Ability to create a Data Engine Service based on spot instances/preemptible nodes
7. Bidirectional sync of environment status between DataLab and AWS, MS Azure, and GCP
8. Billing functionality - ability to see the costs of an analytical environment
9. Audit functionality - ability to view the history of changes made by any user
10. Multiple Cloud Endpoints - ability to connect to any of the cloud endpoints: AWS, GCP, Azure
11. Scheduler - the scheduler component automatically schedules Start and Stop triggers for a Notebook
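The Data Engine and Data Engine Service support rules in item 6 can be read as a small lookup. The Python sketch below encodes them literally as stated in the list. The template names come from item 4, while the data layout and the `supported_compute` function are illustrative assumptions and not part of DataLab's API.

```python
from typing import List

# Template names as listed in item 4 of the feature list.
TEMPLATES = [
    "Deep Learning",
    "Jupyter",
    "Zeppelin",
    "RStudio",
    "RStudio with TensorFlow",
    "Jupyter with TensorFlow",
    "JupyterLab",
    "Superset",
]

# Exceptions as stated in item 6 (hypothetical constant names).
NO_DATA_ENGINE = {"JupyterLab", "Superset"}
NO_DATA_ENGINE_SERVICE = {
    "RStudio with TensorFlow",
    "Jupyter with TensorFlow",
    "Deep Learning",
}


def supported_compute(template: str) -> List[str]:
    """Return the computational resource types the feature list allows for a template."""
    if template not in TEMPLATES:
        raise ValueError(f"Unknown template: {template}")
    kinds = []
    if template not in NO_DATA_ENGINE:
        kinds.append("Data Engine")
    if template not in NO_DATA_ENGINE_SERVICE:
        kinds.append("Data Engine Service")
    return kinds


if __name__ == "__main__":
    for name in TEMPLATES:
        print(f"{name}: {', '.join(supported_compute(name))}")
```

The sketch reflects only the exceptions named in the list. It does not model cloud-specific availability, such as Data Engine Service being backed by EMR on AWS and Dataproc on GCP.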
Apache pages:
Useful materials:
Status:
Mailing lists:
- dev@datalab.incubator.apache.org
- commits@datalab.incubator.apache.org
- private@datalab.incubator.apache.org