Abstract
DevLake is a development data platform, providing the data infrastructure for developer teams to analyze and improve their engineering productivity.
Proposal
DevLake helps developer teams integrate and analyze software development data throughout the software development life cycle (SDLC).
Specifically, DevLake provides:
- High-quality connectors that integrate data from various DevOps tools;
- A comprehensive and unified data model and metrics that enable developers to easily query data and answer questions regarding their development process;
- A plugin system that allows customizable data connectors and enrichers;
- A unified ETL management module that orchestrates data collection and analysis.
Background
The initial development of DevLake was done at Merico, which specializes in deep code analysis and engineering intelligence tools for developer teams. DevLake's codebase was recently open-sourced on GitHub. This proposal is for DevLake to join the Apache Incubator.
Rationale
A developer team can be seen as a complex "distributed system" with developers being nodes, communicating and coordinating their efforts to achieve the team goals. When debugging a distributed system, developers often gather and analyze logs from various sources to understand the system and identify the root cause. DevLake aims to be the "debugging tool" for developer teams by bringing all the DevOps data that is scattered in multiple sources into one practical, personalized, and extensible view.
Current Status
Meritocracy
We will build an active, vibrant community. The product roadmap, issues, and technical docs will all be accessible to everyone, well organized, and openly discussed. By keeping information transparent, we hope to encourage community members to participate.
Contributors who invest time, energy, and talent in the project will have privileges, authority and influence over decisions.
Community
DevLake has 1.3k stars and over 100 forks on GitHub. The first batch of users includes development teams from leading tech companies such as PingCAP and Tencent. We have started building our Discord server and organizing meetups and conferences to extend and diversify our community.
Core Developers
DevLake is currently being developed by a team of five engineers at Merico. The total number of code contributors is currently 22. The full contributor list can be found here.
Alignment
DevLake is a component of the big data ecosystem where many projects come from Apache. The DevLake system involves engineering data collection, ETL, storage, and visualization, while the Apache ecosystem contains complete stacks, such as Apache Spark, Apache Hive, Apache Superset, etc.
We look forward to working with the Apache community to empower development teams with insights from data.
Known Risks
Orphaned Products
First, the core developers will continue to work on DevLake. Additionally, DevLake is already used by PingCAP, Tencent, and all other Merico customers. A large number of user requirements are inspiring DevLake to grow. Therefore, the risk of DevLake being deprecated is minimal.
Inexperience with Open Source
Initial DevLake committers have varying levels of experience using and contributing to open source projects. However, by working with our mentors and the Apache community, we believe we will be able to conduct ourselves in accordance with Apache Incubator guidelines.
Homogenous Developers
The initial committers are from China, the US, and Canada, and are culturally diverse despite being in the same organization. We expect that, once approved for incubation, the project will attract contributors from more organizations.
Reliance on Salaried Developers
Initial DevLake committers are salaried developers at Merico. They and other salaried developers will continue with the development of DevLake, and we will make all efforts to attract diverse contributors and volunteers.
Relationships with Other Apache Products
DevLake has a generic data storage layer design that can support Apache Spark, Apache Hive, etc.
An Excessive Fascination with the Apache Brand
We believe the Apache way, not the brand, will help DevLake grow and persist. We hope to make sure that a very inclusive, diverse and meritocratic community is built outside the umbrella of a single company.
Documentation
This proposal exists online as [http://wiki.apache.org/incubator/DevLakeProposal]. Basic build instructions and user documentation are included in the existing GitHub repository.
Initial Source
The DevLake codebase is currently hosted on GitHub: https://github.com/merico-dev/lake.
Current project website: https://devlake.io/
Source and Intellectual Property Submission Plan
The project is licensed under the Apache License, Version 2.0. Merico will provide an SGA, and all committers will sign an ICLA after DevLake is accepted into the Incubator.
External Dependencies
No. | Dependency | License |
1 | github.com/bndr/gojenkins v1.1.0 | Apache-2.0 |
2 | github.com/cayleygraph/quad v1.2.4 | Apache-2.0/BSD-2-Clause |
4 | github.com/faabiosr/cachego v0.15.0 | MIT |
5 | github.com/fastwego/feishu v1.0.0-beta.4 | Apache-2.0 |
6 | github.com/gin-contrib/cors v1.3.1 | MIT |
7 | github.com/gin-gonic/gin v1.7.4 | MIT |
8 | | MIT |
9 | | MIT |
10 | github.com/libgit2/git2go/v33 v33.0.6 | MIT |
11 | | BSD-3-Clause, BSD-2-Clause |
12 | | MIT |
13 | | MIT |
14 | github.com/robfig/cron/v3 v3.0.0 | MIT |
15 | github.com/sirupsen/logrus v1.8.1 | MIT |
16 | github.com/spf13/cobra v1.2.1 | MIT, Apache-2.0 |
17 | github.com/spf13/pflag v1.0.6-0.20200504143853-81378bbcd8a1 | BSD-3-Clause |
18 | github.com/spf13/viper v1.8.1 | MIT |
19 | github.com/stretchr/testify v1.7.0 | MIT |
20 | gorm.io/datatypes v1.0.1 | MIT |
21 | gorm.io/driver/mysql v1.1.2 | MIT |
22 | gorm.io/gorm v1.21.13 | MIT |
23 | | Apache-2.0 |
24 | github.com/davecgh/go-spew v1.1.1 | ISC |
25 | github.com/fsnotify/fsnotify v1.5.1 | BSD-3-Clause |
26 | github.com/gin-contrib/sse v0.1.0 | MIT |
27 | github.com/go-playground/locales v0.14.0 | MIT |
28 | | MIT |
29 | | MPL-2.0 |
30 | github.com/gobuffalo/envy v1.7.1 | MIT |
31 | github.com/gobuffalo/logger v1.0.1 | MIT |
32 | github.com/gobuffalo/packd v0.3.0 | MIT |
33 | | MIT, BSD-3-Clause, Apache-2.0, BSD-2-Clause |
34 | github.com/golang/protobuf v1.5.2 | BSD-3-Clause |
35 | github.com/hashicorp/errwrap v1.0.0 | MPL-2.0 |
36 | | MPL-2.0 |
37 | github.com/hashicorp/hcl v1.0.0 | MPL-2.0 |
38 | | Apache-2.0 |
39 | github.com/jinzhu/inflection v1.0.0 | MIT |
40 | github.com/jinzhu/now v1.1.2 | MIT |
41 | github.com/joho/godotenv v1.3.0 | MIT |
42 | github.com/json-iterator/go v1.1.11 | MIT |
43 | github.com/leodido/go-urn v1.2.1 | MIT |
44 | github.com/mattn/go-isatty v0.0.13 | MIT |
45 | github.com/moby/sys/symlink v0.1.0 | Apache-2.0, BSD-3-Clause |
46 | github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd | Apache-2.0 |
47 | | Apache-2.0 |
48 | github.com/pelletier/go-toml v1.9.3 | Apache-2.0 OR MIT, MIT, Apache-2.0 |
49 | github.com/pkg/errors v0.9.1 | BSD-2-Clause |
50 | | BSD-3-Clause |
51 | | BSD-3-Clause |
52 | github.com/spf13/afero v1.6.0 | Apache-2.0 |
53 | github.com/spf13/cast v1.4.1 | MIT |
54 | | MIT |
55 | github.com/stretchr/objx v0.2.0 | ISC, MIT, BSD-3-Clause |
56 | github.com/subosito/gotenv v1.2.0 | MIT |
57 | github.com/ugorji/go/codec v1.2.6 | MIT |
58 | go.uber.org/atomic v1.7.0 | MIT |
59 | golang.org/x/crypto v0.0.0-20210921155107-089bfa567519 | BSD-3-Clause, BSD-3-Clause OR MIT |
60 | golang.org/x/net v0.0.0-20211013171255-e13a2654a71e | BSD-3-Clause, BSD-2-Clause |
61 | golang.org/x/sync v0.0.0-20210220032951-036812b2e83c | BSD-3-Clause |
62 | golang.org/x/sys v0.0.0-20211013075003-97ac67df715c | BSD-3-Clause |
63 | golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1 | BSD-3-Clause |
64 | golang.org/x/text v0.3.7 | BSD-3-Clause, BSD-3-Clause OR X11, X11, ISC |
65 | golang.org/x/tools v0.1.5 | BSD-3-Clause, MIT |
66 | google.golang.org/protobuf v1.27.1 | BSD-3-Clause |
67 | gopkg.in/ini.v1 v1.62.0 | Apache-2.0 |
68 | gopkg.in/yaml.v2 v2.4.0 | Apache-2.0, MIT |
69 | gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b | Apache-2.0 OR MIT, Apache-2.0, MIT |
Cryptography
The proposal does not include cryptographic code.
Required Resources
Mailing lists
Git Repositories
Git is the preferred source control system. We assume the repository will be https://github.com/apache/incubator-devlake, based on the naming scheme.
Issue Tracking
DevLake currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
Initial Committers
- Klesh Wong, https://github.com/klesh
- Julien Chinapen, https://github.com/e2corporation
- Liang Zhang, https://github.com/mindlesscloud
- Yingchu Chen, https://github.com/warren830
- Jonathan O'Donnell, https://github.com/e2corporation
- Hezheng Yin, https://github.com/hezyin
- Maxim Wheatley, https://github.com/MaximDub
We nominate the initial committers mainly according to their code contributions measured by this paper. Below are the original top five code contributors in the past six months.
We added Hezheng to the list because he is responsible for everything technical, and Maxim because he is responsible for user interviews and has made a lot of non-technical contributions.
Affiliations
Merico and other companies
Sponsors
Champion
- Willem Ning Jiang, ningjiang AT apache.org
Nominated Mentors
- Felix Cheung, felixcheung AT apache.org
- Liang Zhang, zhangliang AT apache.org
- Lidong Dai, lidongdai AT apache.org
- Sijie Guo, sijie AT apache.org
- Jean-Baptiste Onofré, jbonofre AT apache.org
- Willem Ning Jiang, ningjiang AT apache.org
Sponsoring Entity
Apache Incubator