Apache Kylin : Analytical Data Warehouse for Big Data



Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.

As members of the Apache Kylin maintainer team, we currently rely on integration tests (IT) to ensure the correctness of the query engine and build engine, Kylin's core functions, when merging patches contributed by the community and releasing new versions. This ensures the completeness of new features and reduces the risk of regression bugs. However, for advanced features that the IT cannot cover, the maintainer team has to manually deploy an environment, construct test data, and verify the results, which is time-consuming, laborious, and error-prone.

We hope to sort out and define test scenarios through a system-level automated test framework, and to automate and standardize the test process. First, Kylin is packaged using containers, so no specific development environment (Maven/NPM) has to be installed. Then Docker-Compose is used to start a Hadoop cluster and a Kylin instance. Finally, our testing framework interacts with the Kylin instance through API/JDBC/CLI to test the functional completeness of Kylin (see Q8 for the concrete steps).

Q2. What problem is this proposal NOT designed to solve?

This system testing framework cannot and is not intended to replace the coverage of Kylin's IT. It should be used as a supplement to the IT, covering the scope that the IT cannot reach. It interacts with the Kylin instance through REST API/JDBC/CLI, etc., and verifies features by checking whether the response/return result meets expectations.
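For example, a check against Kylin's query REST API might look like the sketch below. The /kylin/api/query endpoint and the default ADMIN/KYLIN credentials are standard Kylin; the host, SQL, project name, and expected result are illustrative assumptions:

```python
import requests

# Kylin's SQL query endpoint; ADMIN/KYLIN are Kylin's default credentials.
KYLIN_URL = "http://localhost:7070/kylin/api/query"

payload = {
    "sql": "select count(*) from KYLIN_SALES",  # illustrative query on the sample data
    "project": "learn_kylin",                   # illustrative project name
}
resp = requests.post(KYLIN_URL, json=payload, auth=("ADMIN", "KYLIN"), timeout=60)

# The feature is considered verified only if the response meets expectations.
assert resp.status_code == 200, f"unexpected status: {resp.status_code}"
body = resp.json()
assert not body.get("isException"), body.get("exceptionMessage")
assert body["results"], "query returned no rows"
```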

Q3. How is it done today, and what are the limits of current practice?

When verifying and merging patches contributed by the community, Kylin's maintainer team needs to analyze the scope of influence of the patch and then make a test plan. Most of the time, running the IT alone is not enough, and manual testing is also needed. When releasing a new version, we need to write a test plan, called the main story, to cover the core usage scenarios of Kylin. At the same time, to ensure compatibility across Hadoop versions, we need to manually execute these test cases on each Hadoop distribution to ensure that there are no regression bugs in the main features of the Kylin RC package. However, testing is difficult to extend to more complex functions, such as Cube Planner or read/write-separated deployment.

Secondly, the current IT depends on an HDP 2.4 deployment environment. For developers who cannot obtain this environment, it becomes particularly difficult to run Kylin's IT. Therefore, Kylin contributors can only run unit tests before submitting a PR, which is not enough.

Q4. What is new in your approach and why do you think it will be successful?

Automation

The whole system testing process is automated, including packaging, deployment, and running test cases, which makes it easy to cooperate with Jenkins to achieve regular code updates and automatic testing.

Reduce dependencies

Hadoop clusters do not need to be packaged and deployed in advance; the framework relies on only a few dependencies, such as Docker and Python.

Cover advanced features

Kylin instance-level testing can better verify complex scenarios, such as connectivity with BI tools, Kafka data sources, and RDBMS data sources.

HTML report

Test results can be checked clearly in the HTML reports that the framework outputs.

Q5. Who cares? If you are successful, what difference will it make?

Kylin User

We believe that Kylin users can also learn Kylin's deployment modes through this framework, and can use it for learning and verification (PoC) in their own scenarios.

Kylin Developer

Kylin developers can easily verify whether their patches meet the requirements when submitting them, reducing the risk of introducing regression bugs.

Kylin Maintainer

Manual testing can be reduced when releasing a new version, which improves efficiency (DRY) while ensuring the quality and release speed of the new version.

Q6. What are the risks?

Because the added code is relatively isolated from Kylin's existing code, the overall risk is low.

Q7. How long will it take?

At present, we have packaged Kylin in containers, completed the deployment of the Hadoop/Kylin cluster through docker-compose (Hadoop 2.8.5 + HBase 1.1 has been verified so far), and completed an early version of the test framework based on Gauge. Finally, we have provided a simple test case sample, sketched below.
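As an illustration of the Gauge-based framework, a step implementation in Python using the getgauge plugin could look like the following; the step text, URL, and assertions are illustrative and not the actual sample shipped with the framework:

```python
import requests
from getgauge.python import step

# Binds the spec step "Query <sql> on project <project> returns rows"
# to a REST call against the Kylin instance started by docker-compose.
@step("Query <sql> on project <project> returns rows")
def query_returns_rows(sql, project):
    resp = requests.post(
        "http://localhost:7070/kylin/api/query",  # illustrative host/port
        json={"sql": sql, "project": project},
        auth=("ADMIN", "KYLIN"),  # Kylin's default credentials
        timeout=60,
    )
    assert resp.status_code == 200, f"query API returned {resp.status_code}"
    assert resp.json()["results"], "expected at least one row"
```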

Future work is expected to include:

  • Support more Hadoop versions
  • Add more data source components (Kafka/PostgreSQL/SQL Server), BI components (Superset), and other components (Memcached/Nginx/ES) to verify data source and BI compatibility.
  • Sort out, design and implement test cases

The remaining work is expected to take several months.

Q8. How does it work?

1. On the machine that runs the test cases, we need to install Docker, Docker-Compose, Python 3, and Gauge in advance. We do not need to install and deploy Hadoop beforehand. A quick pre-flight check is sketched below.
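A minimal pre-flight check for these prerequisites, assuming all four tools should be on the PATH (the tool names come from this proposal; the script itself is illustrative):

```python
import shutil

# Tools this proposal requires on the test machine; Hadoop is intentionally absent.
for tool in ("docker", "docker-compose", "python3", "gauge"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND - install it before running the tests'}")
```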

2. Running docker/build_cluster_images.sh builds the Docker images of the Hadoop components.
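For instance, the script can be invoked and the result inspected as follows (the script path comes from this proposal; driving it from Python and listing the images are illustrative, and the image names/tags depend on the script):

```python
import subprocess

# Build the Docker images for the Hadoop components (script from this proposal).
subprocess.run(["bash", "docker/build_cluster_images.sh"], check=True)

# Inspect what was built; the repository names/tags are assigned by the script.
print(subprocess.run(["docker", "images"], capture_output=True, text=True).stdout)
```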

3. Running build/CI/run-ci.sh packages Kylin, deploys the Hadoop cluster, deploys the Kylin instance, and runs the test cases in turn. Under Hadoop 2.8.5, this starts the Docker containers of the Hadoop cluster and the Kylin instance.
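A sketch of running the script and checking the deployment afterwards (the script path comes from this proposal; port 7070 is Kylin's default, while the container listing and health check are illustrative assumptions):

```python
import subprocess
import requests

# Package Kylin, deploy the cluster and Kylin, and run the test cases in turn.
subprocess.run(["bash", "build/CI/run-ci.sh"], check=True)

# List the Hadoop/Kylin containers started by docker-compose; names depend on
# the compose files.
print(subprocess.run(
    ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
    capture_output=True, text=True,
).stdout)

# Kylin listens on port 7070 by default; a 200 on the web UI means it is up.
resp = requests.get("http://localhost:7070/kylin", timeout=10)
print("Kylin instance reachable:", resp.status_code == 200)
```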

4. After the test cases have been executed, an HTML report of the results is generated.
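By default, Gauge writes this report to reports/html-report/index.html under the project directory; a small illustrative helper to open it:

```python
import pathlib
import webbrowser

# Gauge's default HTML report location, relative to the Gauge project root.
report = pathlib.Path("reports/html-report/index.html").resolve()
webbrowser.open(report.as_uri())
```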
