Apache Kylin : Analytical Data Warehouse for Big Data
Welcome to Kylin Wiki.
This page lists all new features and breaking changes. Some features are marked IN PROGRESS, which means the Kylin team plans to implement them in order of priority. If you have suggestions or opinions about the current priorities, please let us know.
Release Plan

| Release version | Expected Date | Comment | Release Detail |
|---|---|---|---|
| 4.0.0-alpha | 2020-09 | Release core features, including the new build engine and query engine. | |
| 4.0.0-beta | 2020-12 ~ 2021-01 | Implement other important features. | |
| 4.0.0-gamma | 2021-04 | Bug fix & Promotion | |
| 4.0.0 | 2021-07 | GA (Ready for production) | |
| 4.1.0 | Far future | ... | |

Features

| Feature | Description | Comment | Status | Component | Priority | Arrival (Expected) |
|---|---|---|---|---|---|---|
| Kafka Source (NRT) | Ingest streaming data in batches. | In design phase; the implementation approach is not yet decided. | IN PROGRESS | SOURCE | | 4.1.0 |
| Kafka Source (Real-time OLAP) | Ingest streaming data in a stream/micro-batch way. | In design phase; the implementation approach is not yet decided. | IN PROGRESS | SOURCE | P1 | 4.1.0 |
| JDBC Source (Original version) | Ingest data via JDBC contract. | In design phase; the implementation approach is not yet decided. | IN PROGRESS | SOURCE | P2 | 4.1.0 |
| JDBC Source (Datasource SDK) | Ingest data via JDBC contract. | In design phase; the implementation approach is not yet decided. | IN PROGRESS | SOURCE | P2 | 4.1.0 |
| MapReduce Build Engine | Build pre-calculated cuboid data with Hadoop MapReduce. | This feature may be useless. | DELETED | BUILD ENGINE | | |
| Spark Build Engine | Build pre-calculated cuboid data with Apache Spark. | New implementation provided. | READY | BUILD ENGINE | P0 | 4.0.0-ALPHA |
| Flink Build Engine | Build pre-calculated cuboid data with Apache Flink. | Supported in Kylin 3.1. | DELETED | BUILD ENGINE | | |
| HBase Storage | Use HBase to store pre-calculated cuboid data. | Discussion in mailing list. | DELETED | STORAGE ENGINE | | |
| Parquet Storage | Use Parquet to store pre-calculated cuboid data. | Discussion in mailing list. | READY | STORAGE ENGINE | P0 | 4.0.0-ALPHA |
| Distributed Query Engine / Sparder | Use Calcite and Catalyst (a.k.a. Spark SQL) to parse, analyze, and execute a SQL query. | New implementation provided. | READY | QUERY ENGINE | P0 | 4.0.0-ALPHA |
| Measure - Bitmap | Precise count distinct. | N/A | READY | MEASURE | P0 | 4.0.0-ALPHA |
| Measure - HLL | Approximate count distinct at low cost. | N/A | READY | MEASURE | P0 | 4.0.0-ALPHA |
| Measure - TopN | TopN measure. | N/A | IN PROGRESS | MEASURE | P0 | 4.0.0-BETA |
| Measure - Percentile | Percentile. | N/A | READY | MEASURE | P0 | 4.0.0-ALPHA |
| Query Cache | Cache query results in the query server's memory or an external cache service. | N/A | READY | QUERY ENGINE | | BEFORE 4.0 |
| HBase Metastore | Use HBase as metastore. | I guess it will be removed in the GA version. (xxyu) | DEPRECATED | METASTORE | P2 | BEFORE 4.0 |
| RDBMS Metastore | Use RDBMS as metastore. | Should be the first choice of metastore. | READY | METASTORE | P0 | BEFORE 4.0 |
| Cardinality Computation | Calculate cardinality of fact table and dimension table. | Planning | IN PROGRESS | TOOL | P1 | 4.0.0-BETA |
| Storage Cleanup | Remove useless data from storage or metastore. | New implementation | READY | TOOL | P0 | 4.0.0-ALPHA |
| CSV Source | Build cube from a user-side CSV file. | New implementation | READY | SOURCE | P1 | 4.0.0-ALPHA |
| SQL Standard | To be updated | In testing. | IN PROGRESS | QUERY ENGINE | P0 | 4.0.0-BETA |
| Global Dictionary (Hive) | Use Hive and MR to build the global dictionary. | New global dictionary will replace this feature. | DELETED | BUILD ENGINE | | |
| Global Dictionary (AppendTrieDictionary) | Trie dictionary. | New global dictionary will replace this feature. | DELETED | BUILD ENGINE | | |
| Global Dictionary (Spark Bucket Dictionary) | Use Apache Spark to build the global dictionary. | New implementation | READY | BUILD ENGINE | P0 | 4.0.0-ALPHA |
| Cube Planner | To be updated | In design phase; the implementation approach is not yet decided. | IN PROGRESS | ADVANCED | P0 | 4.0.0-BETA |
| System Cube and Dashboard | To be updated | Not well tested, planning. | IN PROGRESS | ADVANCED | P0 | 4.0.0-BETA |
| Read/Write Separation | The query engine and build engine use different Hadoop clusters. | New implementation provided. | READY | ADVANCED | P0 | 4.0.0-ALPHA |
| Pushdown Engine | To be updated | New pushdown engine will only support SparkSQL. | READY | QUERY ENGINE | P0 | 4.0.0-ALPHA |
| Shrunken Dictionary | To be updated | This feature may be useless. | DELETED | ADVANCED | | |
| UHC Dictionary | To be updated | This feature may be useless. | DELETED | ADVANCED | | |
| Deploy on AWS EMR | Support deploying Kylin on EMR 5.x and EMR 6.x. Support Glue. | Planning | IN PROGRESS | ENV | P0 | 4.0.0-BETA |
| All-in-one container | Provides a quick-start container for learning purposes. | How to learn Kylin in Docker | READY | ENV | P0 | 4.0.0-ALPHA |
| Hadoop3 support | Going to support Hadoop3 + Hive2 in 2020-Q4. Not sure when to support Hive 3. | Planning | IN PROGRESS | ENV | P0 | 4.0.0-BETA |
| Hive3 support | | N/A | IN PROGRESS | ENV | P2 | FUTURE |
| Spark3 support | Support using Spark3 for build and query. | N/A | IN PROGRESS | ENV | P2 | 4.1.0 |
| Hybrid Model / Flexible cuboid build | Add or remove dimensions without purging the whole cube's data. | Planning | IN PROGRESS | BUILD ENGINE | P1 | 4.1.0 |
| Multi-level partition segment | Similar to Hive's multi-level partition design. | N/A | IN PROGRESS | BUILD ENGINE | P1 | 4.1.0 |
| Use Catalyst to replace Calcite | Make query analysis quicker and lighter. | N/A | IN PROGRESS | QUERY ENGINE | P1 | 4.1.0 |

Link
- Deprecated~Development Plan Kylin 4.0