Abstract
Linkis builds a computation middleware layer to decouple upper-layer applications from the underlying data engines. It provides standardized interfaces (REST, JDBC, WebSocket, etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.), while enabling cross-engine context sharing and unified job and engine governance and orchestration.
Linkis codebase: https://github.com/WeBankFinTech/Linkis
Proposal
Linkis is designed to solve computation governance problems in complex distributed environments (typically in a big data platform), where you have to deal with different types, versions, or clusters of underlying data engines and hundreds of diversified engine clients at the upper application layer as well.
Linkis acts as a proxy between the upper application layer and the underlying engine layer. By abstracting and implementing the three common phases of a job/request (submission, preparation and execution), Linkis provides connectivity, governance and orchestration capabilities for different kinds of engines, such as OLAP, OLTP (in development) and Streaming, and handles all these "computation governance" affairs in a standardized, reusable way.
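The three-phase abstraction can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the names Job and ComputationMiddleware, the state strings, and the use of a plain callable as a stand-in for an engine are hypothetical and are not taken from the Linkis codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical job record; the field names are illustrative only."""
    code: str
    engine_type: str
    states: List[str] = field(default_factory=list)
    result: Optional[str] = None


class ComputationMiddleware:
    """Runs every job through the same three phases, whatever the engine."""

    def __init__(self, executors: Dict[str, Callable[[str], str]]):
        # Maps an engine type to a callable standing in for a real engine.
        self.executors = executors

    def submit(self, job: Job) -> None:
        # Phase 1: accept the request and reject unknown engine types early.
        if job.engine_type not in self.executors:
            raise ValueError(f"unsupported engine type: {job.engine_type}")
        job.states.append("submitted")

    def prepare(self, job: Job) -> Callable[[str], str]:
        # Phase 2: resolve what the job needs before it can run.
        executor = self.executors[job.engine_type]
        job.states.append("prepared")
        return executor

    def execute(self, job: Job, executor: Callable[[str], str]) -> None:
        # Phase 3: hand the code to the engine and keep the result.
        job.result = executor(job.code)
        job.states.append("executed")

    def run(self, job: Job) -> Job:
        # The caller sees one entry point; the phases stay engine-agnostic.
        self.submit(job)
        executor = self.prepare(job)
        self.execute(job, executor)
        return job
```

Because the phases are shared, supporting another engine in this sketch only means registering one more callable; the submit, prepare and execute logic is reused unchanged.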
We are actively operating the Linkis community and look forward to continuously increasing community activity.
We propose to contribute the Linkis codebase to the Apache Software Foundation. We believe that bringing Linkis into the Apache Software Foundation and following the community-led development of the "Apache Way" could continuously improve project quality and community vitality.
Background
In today's complex and distributed environments, the communication, coordination and governance of application services are served by mature solutions, from SOA to microservices, and by many practices, from ESB to Service Mesh, that decouple different services.
However, things are different when an application service needs to communicate with the underlying engines. Engines are isolated from each other, and the tightly coupled client-server pattern is everywhere. Every upper application has to connect to and access the various underlying engines directly, and solve the "computation governance" problems on its own, including maintaining different client environments, submitting jobs, monitoring job status, fetching output, handling large numbers of concurrent client instances, watching for bad jobs, adapting to engine version changes, etc.
What is missing is a common layer of "computation middleware" between the numerous upper-layer applications and the countless underlying engines to handle all these "computation governance" affairs in a standardized, reusable way. That is why we started the Linkis project.
Firstly, Linkis reduces the complexity of connectivity. Instead of maintaining a variety of engine client environments, users now only need to install the Linkis client, or just an HTTP client when using the REST interface. Routing a query to the desired cluster can be done by simply providing a tag.
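As a rough illustration of tag-based routing, the sketch below picks a cluster from a set of tags. The cluster names, the tags, and the route_by_tag helper are all hypothetical; they are not the Linkis label scheme or API, only a small model of the idea.

```python
from typing import Dict, Set


def route_by_tag(clusters: Dict[str, Set[str]], tag: str) -> str:
    """Return the alphabetically first cluster that carries the given tag."""
    matches = sorted(name for name, tags in clusters.items() if tag in tags)
    if not matches:
        raise LookupError(f"no cluster is tagged {tag!r}")
    return matches[0]


# Hypothetical cluster names and tags, for illustration only.
CLUSTERS = {
    "spark-prod": {"prod", "batch"},
    "spark-test": {"test"},
    "presto-adhoc": {"adhoc", "prod"},
}
```

In this model the client states only the tag; which cluster actually serves the request is decided in one place, so clusters can be added or retagged without changing any client.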
Secondly, Linkis provides governance capabilities such as multi-tenancy, concurrency control, resource management, query validation, privilege enhancement and auditing.
Meanwhile, Linkis enables orchestration strategies such as routing, load balancing, active-active and hybrid computation across engines (some still under development).
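To make the multi-tenancy and concurrency-control capabilities listed above more concrete, here is a minimal sketch of a per-tenant admission check. The TenantLimiter class and its method names are assumptions introduced for this example and do not describe how Linkis implements these features.

```python
import threading
from collections import defaultdict


class TenantLimiter:
    """Caps the number of jobs each tenant may run at the same time."""

    def __init__(self, max_running_per_tenant: int):
        self.max_running = max_running_per_tenant
        self._running = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, tenant: str) -> bool:
        # Admit the job only if the tenant is below its cap.
        with self._lock:
            if self._running[tenant] >= self.max_running:
                return False
            self._running[tenant] += 1
            return True

    def release(self, tenant: str) -> None:
        # Call when a job finishes so the slot can be reused.
        with self._lock:
            if self._running[tenant] > 0:
                self._running[tenant] -= 1
```

Keeping the counters in the middleware rather than in each application means one tenant cannot exhaust the slots of another, regardless of which client submitted the jobs.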
Rationale
Linkis is built on a distributed microservice architecture with great scalability and extensibility. Enhancements for high concurrency and fault tolerance make it more stable and reliable. It has already supported many production environments with large numbers of daily jobs over a long period.
Linkis's microservices are divided into 3 groups: Computation Governance Services, Public Enhancement Services, and Microservice Governance Services.
The Computation Governance Services (CGS) group is responsible for the core process of job/request submission, preparation and execution, lifecycle management, resource management, validation and orchestration.
The Public Enhancement Services (PES) group provides basic public functions, including job context sharing, material management and data source management, to serve other Linkis services and upper application systems.
The Microservice Governance Services (MGS) group includes customized Spring Cloud Gateway, Eureka and OpenFeign, to provide basic functions like routing, service registration and discovery, and an RPC framework.
By providing multi-tenancy, high concurrency, job dispatching/management policies, unified resource control and orchestration, Linkis makes the submission, preparation and execution of computation jobs more flexible, reliable and controllable, and successfully returns the results. It could greatly reduce the overall development, operation and maintenance costs, and the architecture complexity.
With Linkis as the computation middleware, new upper-layer applications can be quickly developed by reusing the Linkis computation governance functions, as is done in the open source big data platform suite “WeDataSphere” (https://github.com/WeBankFinTech/WeDataSphere).
Linkis currently mainly supports OLAP and Streaming engines, and we are planning to support OLTP engines better. Containerization is also one of the important development directions of Linkis.
Initial Goal
- Migrate the existing codebase, website, and documentation to Apache-hosted infrastructure.
- Work with the infrastructure team to implement and approve our code review, build, and testing workflows in the context of the ASF.
- Incremental development and release under Apache guidelines.
- Grow and diversify the Linkis community in the Apache Way.
Current Status
Meritocracy
The Linkis project was started at WeBank and has been an open-source project on GitHub since July 2019. Linkis has been quickly adopted by many organizations. More than 500 organizations have tested Linkis, based on our sandbox application records, and dozens of them have introduced Linkis into production, based on users’ spontaneous feedback. They are distributed across various industries including banking, telecommunications, insurance, manufacturing, education, internet, etc.
Linkis already has contributors and users from different companies. We’ve set up the Committer team and we’re constantly seeking potential new committers. New contributors are always highly welcomed and guided by existing committers. Users can get timely support from community IM groups and GitHub.
Community
Linkis now has 15 committers from 6 companies: WeBank, China Telecom, Kanzhun Ltd., iQIYI Inc., HONOR Mobile Phone, and Samoyed Digital. We have a developer IM group with more than 100 people from different organizations, and 9 user IM groups with more than 4,500 people.
Core Developers
The core developers of Linkis work in the big data teams of different companies, mainly at WeBank since the project was initiated there.
- Shuai Di (WeBank)
- Qiang Yin (WeBank)
- Heping Wang (WeBank)
- Yongkun Yang (WeBank)
- Zhiyue Yang (WeBank)
- You Liu (WeBank)
- Xiaogang Wang (China Telecom)
- Hui Zhu (Kanzhun)
- Zhen Wang (iQiyi)
- Rong Zhang (Honor)
Releases
Linkis has released multiple versions as listed here: https://github.com/WeBankFinTech/Linkis/releases
We will follow the ASF guidelines more closely, and adopt the ASF source release process upon joining the incubator.
Code Reviews
Linkis’s code reviews are currently public on GitHub: https://github.com/WeBankFinTech/Linkis/pulls
Alignment
As Linkis was built to address connectivity and other computation governance issues with various underlying engines, it depends on multiple ASF projects such as Spark, Flink, Hive and Hadoop. Linkis’s Engine Connector Manager service starts different Engine Connectors to connect to different underlying engines, providing computation governance abilities that benefit the usage and maintenance of these engines. Linkis will continue to expand the types of ASF-project engines it supports, such as HBase, Kylin, and more.
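The relationship between a manager service and the connectors it starts can be sketched as follows. This is a simplified model under stated assumptions: the class names mirror the terms used above, but the methods (register, get_or_start, execute) and the start-on-first-use policy are hypothetical and are not the actual Linkis interfaces.

```python
from typing import Callable, Dict


class EngineConnector:
    """Hypothetical connector wrapping one underlying engine."""

    def __init__(self, engine_type: str):
        self.engine_type = engine_type

    def execute(self, code: str) -> str:
        # A real connector would forward the code to its engine.
        return f"[{self.engine_type}] {code}"


class EngineConnectorManager:
    """Starts a connector on first use and reuses it afterwards."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], EngineConnector]] = {}
        self._started: Dict[str, EngineConnector] = {}

    def register(self, engine_type: str,
                 factory: Callable[[], EngineConnector]) -> None:
        # Adding support for a new engine only requires a new factory.
        self._factories[engine_type] = factory

    def get_or_start(self, engine_type: str) -> EngineConnector:
        if engine_type not in self._started:
            if engine_type not in self._factories:
                raise KeyError(f"no connector registered for {engine_type}")
            self._started[engine_type] = self._factories[engine_type]()
        return self._started[engine_type]
```

In this model, extending support to another engine is a matter of registering one more factory, which is the kind of extension point the paragraph above alludes to.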
Known Risks
Orphaned Products
The risk of Linkis becoming an orphaned product is very low, because it is already a core infrastructure component in the production environments of dozens of companies' big data platforms, including large companies like WeBank, China Telecom, Ping An Insurance Company, Hikvision, etc. Hundreds of thousands of computation jobs are performed through Linkis in these companies every day. Developers from these companies are increasingly joining the Linkis community as contributors.
Linkis has had 12 major releases so far and has received 355 PRs from contributors, which indicates the activity and vitality of the Linkis community. Linkis is also the core component of the open source big data platform suite “WeDataSphere”, and even more users and developers are already active in this larger community. We look forward to further expanding and diversifying the community by joining Apache. We are also further improving our adherence to the community-led development pattern, and the standardization and transparency of community governance.
Inexperience with Open Source
Linkis’s core developers have been running Linkis as a community-oriented open source project for some time, and some of them already have experience working with other open source communities. The current Linkis user group of more than 4,500 people is also evidence of our commitment to and passion for operating the open source community.
Meanwhile, we’ve begun to refine our community governance efforts under the guidance of Apache mentors, and we’ll learn more about how to operate the open source community effectively and properly by following the Apache Way in our incubator journey.
Homogenous Developers
Most of the current core developers work at WeBank, where the Linkis project started. Developers from China Telecom, Kanzhun, iQiyi and Honor Mobile Phone have also been elected to the committer group and have already led the release of several versions of Linkis. Samoyed Digital has the most recently nominated committer, owing to solid contributions to the Linkis data source management module.
Though the Linkis community may not be diverse enough yet, we are constantly looking for new contributors and potential committers to enhance the diversity of the community and the vitality of the project.
An Excessive Fascination with the Apache Brand
We acknowledge that the Apache brand would add a lot of value and reputation to Linkis, and would benefit cooperation and promotion at a global scale. However, our primary purpose in submitting Linkis to Apache is to build a more diverse and viable community and to gain stability for long-term development. We will also strictly follow the ASF's rules and policies under the guidance of the Incubator PMC.
Documentation
Documentation about Linkis can be found at https://github.com/WeBankFinTech/Linkis-Doc. The following links provide more information:
- Codebase at GitHub: https://github.com/WeBankFinTech/Linkis
- Issue Tracking: https://github.com/WeBankFinTech/Linkis/issues
- Releases: https://github.com/WeBankFinTech/Linkis/releases
Initial Source
https://github.com/WeBankFinTech/Linkis
External Dependencies
Back-end:
Dependencies | License | Comment |
caffeine | Apache 2.0 | |
cglib | Apache 2.0 | |
commons-beanutils | Apache 2.0 | |
commons-codec | Apache 2.0 | |
commons-collections | Apache 2.0 | |
commons-dbcp | Apache 2.0 | |
commons-exec | Apache 2.0 | |
commons-io | Apache 2.0 | |
commons-lang3 | Apache 2.0 | |
commons-math3 | Apache 2.0 | |
commons-net | Apache 2.0 | |
commons-text | Apache 2.0 | |
dozer-core | Apache 2.0 | |
druid | Apache 2.0 | |
fastjson | Apache 2.0 | |
gson | Apache 2.0 | |
guava | Apache 2.0 | |
hadoop-auth | Apache 2.0 | |
hadoop-client | Apache 2.0 | |
hadoop-common | Apache 2.0 | |
hadoop-hdfs | Apache 2.0 | |
hadoop-yarn-client | Apache 2.0 | |
hive-common | Apache 2.0 | |
hive-exec | Apache 2.0 | |
hive-jdbc | Apache 2.0 | |
httpclient | Apache 2.0 | |
httpmime | Apache 2.0 | |
jackson-annotations | Apache 2.0 | |
jackson-databind | Apache 2.0 | |
jackson-module-scala | Apache 2.0 | |
javacsv | LGPL | |
jaxrs-ri | CDDL, GPL 1.1 | will remove |
jersey-container-servlet | CDDL, GPL 1.1 | will remove |
jersey-container-servlet-core | CDDL, GPL 1.1 | will remove |
jersey-entity-filtering | CDDL, GPL 1.1 | will remove |
jersey-json | CDDL, GPL 1.1 | will remove |
jersey-media-json-jackson | CDDL, GPL 1.1 | will remove |
jersey-media-multipart | CDDL, GPL 1.1 | will remove |
jersey-server | CDDL, GPL 1.1 | will remove |
jersey-servlet | CDDL, GPL 1.1 | will remove |
jersey-spring3 | CDDL, GPL 1.1 | will remove |
jetty-server | Apache 2.0, EPL 1.0 | |
jetty-webapp | Apache 2.0, EPL 1.0 | |
json4s-jackson | Apache 2.0 | |
jsp-api | CDDL, GPL 2.0 | will remove |
junit | EPL 1.0 | |
libthrift | Apache 2.0 | |
log4j-1.2-api | Apache 2.0 | |
log4j-api | Apache 2.0 | |
log4j-core | Apache 2.0 | |
log4j-slf4j-impl | Apache 2.0 | |
mockito-all | MIT | |
mybatis-plus-boot-starter | Apache 2.0 | |
mysql-connector-java | GPL 2.0 | will remove |
netty-all | Apache 2.0 | |
pagehelper | MIT | |
poi-ooxml | Apache 2.0 | |
protostuff-api | Apache 2.0 | |
protostuff-core | Apache 2.0 | |
protostuff-runtime | Apache 2.0 | |
py4j | BSD 2-clause | |
reactor-netty | Apache 2.0 | |
reflections | BSD 2-clause | |
scalacheck | BSD 3-clause | |
scalacheck-shapeless | Apache 2.0 | |
scala-compiler | Apache 2.0 | |
scala-library | Apache 2.0 | |
scalamock-scalatest-support | MIT | |
scalap | Apache 2.0 | |
scala-reflect | Apache 2.0 | |
scalatest | Apache 2.0 | |
slf4j-api | MIT | |
spark-core | Apache 2.0 | |
spark-hive | Apache 2.0 | |
spark-repl | Apache 2.0 | |
spark-sql | Apache 2.0 | |
spark-testing-base | Apache 2.0 | |
spoiwo | MIT | |
spring-boot | Apache 2.0 | |
spring-boot-actuator-autoconfigure | Apache 2.0 | |
spring-boot-starter | Apache 2.0 | |
spring-boot-starter-actuator | Apache 2.0 | |
spring-boot-starter-aop | Apache 2.0 | |
spring-boot-starter-cache | Apache 2.0 | |
spring-boot-starter-jetty | Apache 2.0 | |
spring-boot-starter-log4j2 | Apache 2.0 | |
spring-boot-starter-quartz | Apache 2.0 | |
spring-boot-starter-reactor-netty | Apache 2.0 | |
spring-boot-starter-web | Apache 2.0 | |
spring-cloud-commons | Apache 2.0 | |
spring-cloud-config-client | Apache 2.0 | |
spring-cloud-context | Apache 2.0 | |
spring-cloud-gateway-core | Apache 2.0 | |
spring-cloud-starter | Apache 2.0 | |
spring-cloud-starter-config | Apache 2.0 | |
spring-cloud-starter-gateway | Apache 2.0 | |
spring-cloud-starter-netflix-eureka-client | Apache 2.0 | |
spring-cloud-starter-netflix-eureka-server | Apache 2.0 | |
spring-cloud-starter-openfeign | Apache 2.0 | |
spring-core | Apache 2.0 | |
spring-jdbc | Apache 2.0 | |
spring-security-crypto | Apache 2.0 | |
spring-test | Apache 2.0 | |
spring-tx | Apache 2.0 | |
spring-web | Apache 2.0 | |
websocket-client | Apache 2.0, EPL 1.0 | |
websocket-server | Apache 2.0, EPL 1.0 | |
xlsx-streamer | Apache 2.0 | |
xstream | BSD 3-clause | |
Front-end:
Dependencies | License | Comment |
axios | MIT | |
highlight.js | BSD 3-clause | |
iview | MIT | |
lodash | MIT | |
moment | MIT | |
monaco-editor | MIT | |
sql-formatter | MIT | |
svgo | MIT | |
vue | MIT | |
vue-i18n | MIT | |
vue-router | MIT | |
vuedraggable | MIT | |
vuescroll | MIT | |
Required Resources
Mailing List
Currently Linkis has no mailing list. The usual mailing lists are expected to be set up when entering incubation:
- private@linkis.incubator.apache.org for PPMC discussions;
- dev@linkis.incubator.apache.org for development discussions;
- notification@linkis.incubator.apache.org for user notifications, and notifications from GitHub.
Git Repositories
Upon entering incubation, we request to move the existing repository from https://github.com/WeBankFinTech/Linkis to Apache infrastructure, such as https://github.com/apache/Incubator-Linkis.
Issue Tracking
The Linkis community would like to continue using GitHub Issues if possible.
Other Resources
Apache Jenkins
Source and Intellectual Property Submission Plan
Most of the current code is Apache 2.0 licensed and the copyright is assigned to WeBank. If the project enters the incubator, WeBank will transfer the source code and trademark ownership to the ASF via a Software Grant Agreement.
Initial Committers
- Shuai Di (shuaidi1024@gmail.com)
- Qiang Yin (enjoyyin91@gmail.com)
- Heping Wang (wpeace1212@gmail.com)
- Yongkun Yang (wimkunkun@gmail.com)
- Zhiyue Yang (zjyzy19920513@gmail.com)
- You Liu (liuyou181020@gmail.com)
- Deyi Hua (david_hua1996@hotmail.com)
- Le Bai (blgg931026@gmail.com)
- Xiaogang Wang (Adamyuanyuan@gmail.com)
- Hui Zhu (huashuizhuhui@gmail.com)
- Zhen Wang (wangzhen077@gmail.com)
- Rong Zhang (brian.rongzhang@gmail.com)
- Xiaohua Yi (yixiaohuamax@gmail.com)
- Ke Zhou (bleachzk@gmail.com)
- Jian Xie (Jackyxxie@gmail.com)
Affiliations
Shuai Di, Qiang Yin, Heping Wang, Yongkun Yang, Zhiyue Yang, You Liu, Deyi Hua, Le Bai, Ke Zhou and Jian Xie of the initial committers are employees of WeBank.
Xiaogang Wang of the initial committers is an employee of China Telecom.
Hui Zhu of the initial committers is an employee of Kanzhun.
Zhen Wang of the initial committers is an employee of iQiyi.
Rong Zhang of the initial committers is an employee of HONOR Mobile Phone.
Xiaohua Yi of the initial committers is an employee of Samoyed Digital.
Sponsors
Champion
Junping Du (ASF Member, IPMC Member), junping_du@apache.org
Nominated Mentors
Duo Zhang (ASF Member, IPMC Member), zhangduo@apache.org
Jerry Shao (ASF Member, IPMC Member), jshao@apache.org
Junping Du (ASF Member, IPMC Member), junping_du@apache.org
Lidong Dai (IPMC Member), lidongdai@apache.org
Shao Feng Shi (ASF Member, IPMC Member), shaofengshi@apache.org
Sponsoring Entity
We request the Apache Incubator to sponsor this project.