Abstract

Celeborn is an intermediate data service for big data compute engines (e.g., ETL, OLAP, and streaming engines) to boost performance, stability, and flexibility. Intermediate data typically include shuffle and spilled data.

Proposal

Celeborn provides a high-performance intermediate data service for big data compute engines. Instead of writing to local disks, compute engines can push shuffle and spilled data to Celeborn workers, freeing compute nodes from the dependency on large local disks. In addition, Celeborn re-arranges the layout of the intermediate data to be more IO-friendly, thus improving performance and stability.

Background

Intermediate data typically include shuffle and spilled data, which are normally a major cause of inefficiency, instability, and inflexibility in the lifecycle of a distributed job.

Shuffle is one of the most resource-consuming operators in big-data processing. The existing pull-style implementations have several drawbacks:

They rely on local disks, which prevents the adoption of a disaggregated architecture and limits elasticity.
They issue massive numbers of random disk accesses, which easily saturate IOPS capacity and cause inefficiency and instability.
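
To give a sense of scale for the second drawback, here is a minimal back-of-the-envelope sketch. It assumes the common pull-style layout in which every reducer fetches one block from every mapper's output; the function names and the job sizes are hypothetical and chosen only for illustration.

```python
def pull_shuffle_block_reads(num_mappers: int, num_reducers: int) -> int:
    """Number of block fetches when each reducer pulls one block per mapper."""
    return num_mappers * num_reducers


def average_block_bytes(total_shuffle_bytes: int,
                        num_mappers: int,
                        num_reducers: int) -> float:
    """Average size of one fetched block, assuming evenly distributed data."""
    return total_shuffle_bytes / pull_shuffle_block_reads(num_mappers, num_reducers)


# Hypothetical job: 1 TiB of shuffle data, 10,000 mappers, 5,000 reducers.
blocks = pull_shuffle_block_reads(10_000, 5_000)
avg_bytes = average_block_bytes(1024 ** 4, 10_000, 5_000)
print(f"{blocks:,} block reads, about {avg_bytes / 1000:.0f} KB each")
```

Under these assumptions the job performs 50 million small reads of roughly 22 KB each, which is the kind of access pattern that exhausts disk IOPS long before it exhausts bandwidth.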

Spilled data is another source of intermediate data. Taking Spark as an example, it spills in-memory data to disk whenever a memory limit is reached (e.g., during sort or aggregation). Spilled data must therefore also be taken into account to support disaggregation and free compute nodes from large local disks.

To solve the above problems, Celeborn takes over both kinds of intermediate data by providing push-style APIs, re-arranging data layout to be more IO friendly, and leveraging DFS and object store to enable massive storage space.

Rationale

Disaggregated architecture has several benefits, such as provisioning compute and storage nodes separately, better elasticity, and better cost-effectiveness. However, in many cases intermediate data depends on large local disks, which prevents a shift to such an architecture. Thus, taking over all kinds of intermediate data, i.e. shuffle and spilled data, is one of the key motivations of Celeborn.

Another key motivation of Celeborn is to solve the inefficiency and instability issues caused by the existing pull-style shuffle, as described in the Background section.

Celeborn solves the problems through the following core designs:

Push-based shuffle plus partition data aggregation to turn random IO access into sequential access.
FileSystem-like API to support writing spilled data.
Hierarchical storage from memory to DFS/object store to enable fast cache and massive storage space.
Engine-agnostic APIs for easy integration with various engines.
Extended fault tolerance and optional data replication to increase reliability.
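
The first design point can be illustrated with a toy, in-memory sketch. It is not Celeborn's actual API: `ToyWorker`, `partition_of`, and `run_mapper` are hypothetical names, and a Python list stands in for the per-partition file a real worker would append to. The idea shown is that mappers push each record to the worker that owns its partition, so all data for one partition ends up contiguous and a reducer can read it in a single sequential pass.

```python
import zlib
from collections import defaultdict


class ToyWorker:
    """Hypothetical stand-in for a shuffle worker; aggregates data per partition."""

    def __init__(self):
        # One append-only buffer per partition, regardless of which mapper pushed.
        self._partitions = defaultdict(list)

    def push(self, partition_id, record):
        self._partitions[partition_id].append(record)

    def read_partition(self, partition_id):
        # A reducer reads its whole partition in one sequential scan.
        return list(self._partitions[partition_id])


def partition_of(key, num_partitions):
    # crc32 keeps the key-to-partition mapping stable across processes.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run_mapper(worker, records, num_partitions):
    for key, value in records:
        worker.push(partition_of(key, num_partitions), (key, value))


worker = ToyWorker()
run_mapper(worker, [("a", 1), ("b", 2)], num_partitions=4)  # mapper 1
run_mapper(worker, [("a", 3), ("c", 4)], num_partitions=4)  # mapper 2

# The reducer for key "a" reads one aggregated partition, not one block per mapper.
a_partition = worker.read_partition(partition_of("a", 4))
a_values = [value for key, value in a_partition if key == "a"]
```

In this sketch the number of reads a reducer issues no longer depends on the number of mappers, which is the property the push-based design relies on.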

Celeborn is currently used in production at Alibaba and many other companies, serving petabytes of data per day, which demonstrates its capability to solve real-world problems.

Initial Goals

Although many of the core features of Celeborn have been developed and verified in the production environment (mainly Spark jobs), there is still a lot to evolve, such as the following requirements from users:

Extension: support more engines of various categories, e.g., streaming engines (Apache Flink), OLAP engines, and engines written in Python and native languages.
Completeness: implement the support for spilled data and possibly other kinds of intermediate data.
Enhancement: rolling upgrade, multi-tenant, layered storage, columnar shuffle, enhanced performance and fault tolerance, etc.

Current Status

Meritocracy

Celeborn was started at Alibaba in 2020 with the project name "RemoteShuffleService" and open-sourced in 2021. Since then, Celeborn has gained strong interest from dozens of companies and individuals. Roadmaps, issues, and design docs are accessible to everyone and discussed openly among developers. Celeborn already has active contributors from different organizations.

We encourage community members to participate, and those who invest time and talent in the project will have privileges, authority, and influence over decisions. We will try our best to build an environment that supports meritocracy. We believe the community and project will grow better if we run in the Apache Way.

Community

Celeborn has been building a community around users and contributors since it was open-sourced. We have organized several online meetups with data-infra teams from various well-known companies, and we have a dozen contributors from outside Alibaba, some of whom are among the most active contributors. We frequently have online discussions with our contributors. By bringing Celeborn to Apache, we believe we can grow the base of contributors by inviting and encouraging more contributors in The Apache Way.

Users

Celeborn is adopted as one of the components of the E-MapReduce (EMR) product in Alibaba Cloud, and a dozen customers are using it. Beyond that, we have even more users unrelated to EMR who got in touch with us through the community, including Shopee, NetEase, Bilibili, BOSS, and Synnex, to name a few. Most of our users have made contributions to the project.

Core Developers

Keyong Zhou. He is the founder of this project, from Alibaba. (GitHub ID: waitinfuture)
Ethan Feng. He loves open source and coding, from Alibaba. (GitHub ID: FMX)
Kerwin Zhang. He is a developer of the project, from Alibaba. (GitHub ID: kerwin-zk)
AngersZhuuuu. He is an Apache Spark/YARN/Hadoop contributor and an individual open-source enthusiast, from Shopee. (GitHub ID: AngersZhuuuu)
Cheng Pan. He is an Apache Kyuubi (Incubating) PPMC member and an Apache Iceberg/Spark contributor, from NetEase. (GitHub ID: pan3793)

Alignment

Celeborn aims to store and serve intermediate data for big-data compute engines, many of which are Apache projects, such as Apache Spark, Apache Flink, and Apache Hive. Celeborn currently mainly supports Spark, and we plan to support Flink and others in the near future. Celeborn is already under Apache License 2.0, and many of the core developers have experience with Apache projects.

Known Risks

Project Name

Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy stories. "Celeb" is Sindarin for "silver", while "orn" is Sindarin for "tree", and we think it's cool to introduce this fantasy tree to the Apache world. Based on our search results, the term Celeborn is not used as a trademark under any class, so it is legal to use it as our project name.

Orphaned products

Celeborn is widely adopted in Alibaba and is a built-in component of Alibaba Cloud's E-MapReduce product. Developers in Alibaba will keep improving the project to ensure it fulfills current and emerging requirements. Many organizations outside Alibaba are also using Celeborn to enhance their production jobs, and several of them have already contributed to the project. We are actively operating the community and will continue to increase its vitality to attract more contributors.

Inexperience with Open Source

Many of the Celeborn contributors have experience working on open source projects, and by working with our mentors and the Apache community we believe we will be able to conduct ourselves in accordance with Apache Incubator guidelines.

Homogenous Developers

The current contributors come from various organizations, including Alibaba, Shopee, NetEase, Xiaomi, Bilibili, BOSS, Xiaohongshu, etc. We are committed to recruiting additional committers based on their contributions to the project.

Reliance on Salaried Developers

Most of the developers are paid by their employers to contribute to this project. Given some volunteer developers and the committers' sense of ownership of the code, the project could continue even if no salaried developers contributed to the project.

Relationships with Other Apache Products

Celeborn was originally built to enhance Apache Spark and uses Apache Ratis for high availability, and Apache Hadoop for layered storage. Celeborn also uses Apache Commons and Apache Logging. In the future, as Celeborn supports more engines, it will cooperate with Apache Flink, Apache Hive, etc.

An Excessive Fascination with the Apache Brand

We believe the Apache Way, not the brand, will help Celeborn grow and persist. We hope to make sure that a very inclusive, diverse, and meritocratic community is built outside the umbrella of a single company.

Documentation

Documentation can be found on GitHub Pages.

Initial Source

The initial source code for Celeborn is hosted at https://github.com/alibaba/RemoteShuffleService.

The repository has not been renamed yet; we will rename it from RemoteShuffleService to Celeborn after the incubation proposal is approved.

Source and Intellectual Property Submission Plan

As soon as Celeborn is approved to join Apache Incubator, our initial committers will submit iCLA(s), SGA, and CCLA(s). The codebase is already licensed under Apache License 2.0.

We will also deprecate the initial source repository and redirect it to the new incubator project repository after approval.

External Dependencies

Apache License 2.0

com.fasterxml.jackson.core:jackson-annotations
com.fasterxml.jackson.core:jackson-core
com.fasterxml.jackson.core:jackson-databind
com.fasterxml.jackson.jaxrs:jackson-jaxrs-base
com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider
com.fasterxml.jackson.module:jackson-module-jaxb-annotations
com.fasterxml.woodstox:woodstox-core
com.github.stephenc.jcip:jcip-annotations
com.google.code.findbugs:jsr305
com.google.code.gson:gson
com.google.guava:guava
com.nimbusds:nimbus-jose-jwt
com.squareup.okhttp:okhttp
com.squareup.okio:okio
commons-beanutils:commons-beanutils
commons-cli:commons-cli
commons-codec:commons-codec
commons-collections:commons-collections
commons-io:commons-io
commons-net:commons-net
io.dropwizard.metrics:metrics-core
io.dropwizard.metrics:metrics-graphite
io.dropwizard.metrics:metrics-jvm
io.netty:netty-all
io.netty:netty-buffer
io.netty:netty-codec
io.netty:netty-codec-dns
io.netty:netty-codec-haproxy
io.netty:netty-codec-http
io.netty:netty-codec-http2
io.netty:netty-codec-memcache
io.netty:netty-codec-mqtt
io.netty:netty-codec-redis
io.netty:netty-codec-smtp
io.netty:netty-codec-socks
io.netty:netty-codec-stomp
io.netty:netty-codec-xml
io.netty:netty-common
io.netty:netty-handler
io.netty:netty-handler-proxy
io.netty:netty-resolver
io.netty:netty-resolver-dns
io.netty:netty-resolver-dns-classes-macos
io.netty:netty-transport
io.netty:netty-transport-classes-epoll
io.netty:netty-transport-classes-kqueue
io.netty:netty-transport-native-unix-common
io.netty:netty-transport-rxtx
io.netty:netty-transport-sctp
io.netty:netty-transport-udt
net.minidev:accessors-smart
net.minidev:json-smart
org.apache.avro:avro
org.apache.commons:commons-compress
org.apache.commons:commons-configuration2
org.apache.commons:commons-crypto
org.apache.commons:commons-lang3
org.apache.commons:commons-math3
org.apache.commons:commons-text
org.apache.curator:curator-client
org.apache.curator:curator-framework
org.apache.curator:curator-recipes
org.apache.hadoop:hadoop-annotations
org.apache.hadoop:hadoop-auth
org.apache.hadoop:hadoop-client
org.apache.hadoop:hadoop-common
org.apache.hadoop:hadoop-hdfs-client
org.apache.hadoop:hadoop-mapreduce-client-common
org.apache.hadoop:hadoop-mapreduce-client-core
org.apache.hadoop:hadoop-mapreduce-client-jobclient
org.apache.hadoop:hadoop-yarn-api
org.apache.hadoop:hadoop-yarn-client
org.apache.hadoop:hadoop-yarn-common
org.apache.hadoop.thirdparty:hadoop-shaded-guava
org.apache.hadoop.thirdparty:hadoop-shaded-protobuf_3_7
org.apache.htrace:htrace-core4
org.apache.httpcomponents:httpclient
org.apache.httpcomponents:httpcore
org.apache.kerby:kerb-admin
org.apache.kerby:kerb-client
org.apache.kerby:kerb-common
org.apache.kerby:kerb-core
org.apache.kerby:kerb-crypto
org.apache.kerby:kerb-identity
org.apache.kerby:kerb-server
org.apache.kerby:kerb-simplekdc
org.apache.kerby:kerb-util
org.apache.kerby:kerby-asn1
org.apache.kerby:kerby-config
org.apache.kerby:kerby-pkix
org.apache.kerby:kerby-util
org.apache.kerby:kerby-xdr
org.apache.kerby:token-provider
org.apache.ratis:ratis-client
org.apache.ratis:ratis-common
org.apache.ratis:ratis-proto
org.apache.ratis:ratis-thirdparty-misc
org.codehaus.jackson:jackson-core-asl
org.eclipse.jetty:jetty-client
org.eclipse.jetty:jetty-http
org.eclipse.jetty:jetty-io
org.eclipse.jetty:jetty-security
org.eclipse.jetty:jetty-servlet
org.eclipse.jetty:jetty-util
org.eclipse.jetty:jetty-util-ajax
org.eclipse.jetty.websocket:websocket-api
org.eclipse.jetty.websocket:websocket-client
org.eclipse.jetty.websocket:websocket-common
org.scala-lang:scala-library
org.scala-lang:scala-reflect
org.xerial.snappy:snappy-java
org.lz4:lz4-java
org.apache.ratis:ratis-grpc
org.apache.ratis:ratis-metrics
org.apache.ratis:ratis-netty
org.apache.ratis:ratis-server
org.apache.ratis:ratis-server-api
org.apache.logging.log4j:log4j-1.2-api
org.apache.logging.log4j:log4j-api
org.apache.logging.log4j:log4j-core
org.apache.logging.log4j:log4j-slf4j-impl
org.roaringbitmap:RoaringBitmap
org.roaringbitmap:shims

BSD

org.codehaus.woodstox:stax2-api
org.jline:jline

BSD 3-clause

com.google.protobuf:protobuf-java
org.fusesource.leveldbjni:leveldbjni-all
org.codehaus.janino:commons-compiler

BSD 2-clause

dnsjava:dnsjava

CDDL 1.1

javax.xml.bind:jaxb-api

CDDL + GPLv2

javax.servlet:javax.servlet-api

Eclipse Distribution License 1.0

jakarta.activation:jakarta.activation-api
jakarta.xml.bind:jakarta.xml.bind-api

MIT License

org.slf4j:slf4j-api
org.slf4j:jcl-over-slf4j
org.slf4j:jul-to-slf4j

The Go License

com.google.re2j:re2j

Cryptography

Celeborn does not currently include any cryptography-related code.

Required Resources

Mailing lists

Git Repositories:

github.com/apache/incubator-celeborn

Issue Tracking

The community would like to continue using GitHub Issues, with the tracker moving to the repository under github.com/apache/.

Other Resources

The community has already chosen GitHub Actions as its continuous integration tool.

Initial Committers

Keyong Zhou (waitinfuture@gmail.com)
Ethan Feng (ethan.aquarius.fmx@gmail.com)
Kerwin Zhang (kerwin.libra.zk@gmail.com)
WU Wei (woo.wei@gmail.com)
AngersZhuuuu (angers.zhu@gmail.com)
Cheng Pan (chengpan@apache.org)
