...

This release contains more than 80 new features and improvements and more than 100 bug fixes. The details follow:

New Features

New load data solution

The old CarbonData loading solution depends on the Kettle engine, but Kettle is not designed for big data workloads, and the code in this flow is hard to maintain. Version 1.0 therefore adds a new data loading solution with no Kettle dependency, which is more modular and performs better.

Support Spark 2.1 integration in CarbonData

Spark 2.1 added many features and improved performance. CarbonData benefits from these after the upgrade.

Data update/delete SQL support

Users can now delete and update data in a carbon table using standard SQL syntax. This feature is currently supported in the Spark 1.5/1.6 integration; support in the Spark 2.1 integration will follow soon.

Support adaptive data compression for int/bigint/decimal to increase compression ratio

This feature adapts the data to the smallest data type that fits the values. It also supports delta compression to reduce the store size.
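The idea can be illustrated with a minimal Python sketch. It is not CarbonData's implementation: the function names, the candidate widths, and the rule "use delta encoding when it needs a narrower type" are assumptions made for illustration only.

```python
from typing import List, Tuple

# Candidate storage widths in bytes with their signed value ranges.
_WIDTHS = [
    (1, -2**7, 2**7 - 1),
    (2, -2**15, 2**15 - 1),
    (4, -2**31, 2**31 - 1),
    (8, -2**63, 2**63 - 1),
]


def smallest_width(lo: int, hi: int) -> int:
    """Return the smallest byte width whose signed range holds [lo, hi]."""
    for width, wmin, wmax in _WIDTHS:
        if wmin <= lo and hi <= wmax:
            return width
    raise ValueError("value range does not fit in 8 bytes")


def adaptive_encode(values: List[int]) -> Tuple[str, int, int, List[int]]:
    """Encode a non-empty list of integers.

    Returns (mode, base, width_in_bytes, stored_values). Delta mode stores
    each value as its offset from the minimum (hypothetical policy).
    """
    lo, hi = min(values), max(values)
    plain_width = smallest_width(lo, hi)
    delta_width = smallest_width(0, hi - lo)
    if delta_width < plain_width:
        return ("delta", lo, delta_width, [v - lo for v in values])
    return ("plain", 0, plain_width, list(values))


def adaptive_decode(base: int, stored: List[int]) -> List[int]:
    """Rebuild the original values; base is 0 in plain mode."""
    return [v + base for v in stored]
```

In this sketch, values clustered around one million need 4 bytes each when stored as they are, but only 1 byte each once the minimum is subtracted.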

Support to define Date/Timestamp format for different columns

Users can now provide a Date/Timestamp format for each column when loading data. The create table DDL provides an option to define the format for each Timestamp column, and defaults are provided so that users can create tables with Timestamp columns without always having to define the Date/Timestamp format.
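The behaviour of per-column formats with a fallback default can be sketched in Python as follows. This is only an illustration: the names `DEFAULT_FORMAT` and `parse_row` are hypothetical, the default pattern is an assumption, and the patterns use Python's `strptime` codes rather than the pattern syntax CarbonData accepts.

```python
from datetime import datetime
from typing import Dict, Optional

# Assumed fallback pattern for columns that declare no format of their own.
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_row(row: Dict[str, str],
              timestamp_columns: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Parse the timestamp columns of one input row.

    timestamp_columns maps a column name to its format, or to None when the
    column should use DEFAULT_FORMAT. Other columns are passed through.
    """
    parsed: Dict[str, object] = dict(row)
    for column, fmt in timestamp_columns.items():
        parsed[column] = datetime.strptime(row[column], fmt or DEFAULT_FORMAT)
    return parsed


row = {"id": "1", "created": "2017/01/28", "updated": "2017-01-28 10:15:00"}
# "created" declares its own format; "updated" relies on the default.
result = parse_row(row, {"created": "%Y/%m/%d", "updated": None})
```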

Implement LRU cache for B-Tree

The B-tree in CarbonData keeps the block and blocklet information of carbon tables in memory. As the number of tables or the volume of data grows, the process can run out of memory. The LRU cache for the B-tree now keeps only recently or frequently used block/blocklet information in memory and evicts unused or less-used block/blocklet information.
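A minimal sketch of the eviction idea in Python is shown below. It is a plain least-recently-used cache bounded by entry count; the class name `BlockIndexCache` and the `loader` callback are hypothetical, and CarbonData's actual cache may size and evict entries differently.

```python
from collections import OrderedDict
from typing import Callable, List


class BlockIndexCache:
    """Keep at most `capacity` block index entries, evicting the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader          # reads index info for a block on a miss
        self._entries: "OrderedDict[str, object]" = OrderedDict()

    def get(self, block_id: str) -> object:
        if block_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(block_id)
            return self._entries[block_id]
        # Cache miss: load the entry, then evict the oldest one if over capacity.
        value = self._loader(block_id)
        self._entries[block_id] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return value

    def cached_ids(self) -> List[str]:
        """Return cached block ids, least recently used first."""
        return list(self._entries)
```

With a capacity of two, reading blocks a, b, a, c in that order evicts b, because a was touched more recently.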

Performance Improvement

CarbonData V2 format to improve first time query performance

The V2 format is better organized and maintains less metadata (it reads metadata on demand), so first-time queries are faster. It also has a lower IO cost than V1. Several test cases show that first-time query response time is reduced by around 50%.

Vectorized reader support

It reads the data in batches, column by column. This reduces GC time and improves performance during data scans.
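The access pattern can be sketched in Python as follows. This is only an illustration of batch-wise, column-wise reading over in-memory lists; the function names and the default batch size are assumptions, and the real reader works on CarbonData files inside Spark.

```python
from typing import Dict, Iterator, List


def read_batches(columns: Dict[str, List[int]],
                 batch_size: int) -> Iterator[Dict[str, List[int]]]:
    """Yield batches that hold one slice per column instead of one object per row."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    num_rows = len(next(iter(columns.values()))) if columns else 0
    for start in range(0, num_rows, batch_size):
        yield {name: data[start:start + batch_size]
               for name, data in columns.items()}


def sum_column(columns: Dict[str, List[int]], name: str,
               batch_size: int = 4096) -> int:
    """Aggregate a single column, touching only that column's batches."""
    total = 0
    for batch in read_batches({name: columns[name]}, batch_size):
        total += sum(batch[name])
    return total
```

Because each batch is a handful of column slices rather than thousands of row objects, far fewer short-lived objects are created during a scan, which is where the GC saving comes from.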

Fast join using bucket table

This feature enables bucketed table support in CarbonData. It can improve join query performance by avoiding shuffling when both tables are bucketed on the same column with the same number of buckets. It is supported in the Spark 2.1 integration.
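Why matching buckets remove the shuffle can be seen in a small Python sketch. It is a conceptual model, not Spark or CarbonData code: the function names are hypothetical and Python's built-in `hash` stands in for the engine's bucketing hash.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def bucketize(rows: List[Row], key: str, num_buckets: int) -> List[List[Row]]:
    """Assign each row to a bucket by hashing its join key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets


def bucketed_join(left: List[List[Row]], right: List[List[Row]],
                  key: str) -> List[Row]:
    """Inner-join two tables bucket by bucket.

    Equal keys always hash to the same bucket number, so bucket i of the left
    table only needs to be compared with bucket i of the right table.
    """
    if len(left) != len(right):
        raise ValueError("both tables must use the same number of buckets")
    joined: List[Row] = []
    for left_bucket, right_bucket in zip(left, right):
        index = defaultdict(list)
        for r in right_bucket:
            index[r[key]].append(r)
        for l in left_bucket:
            for r in index.get(l[key], []):
                joined.append({**r, **l})
    return joined
```

If the bucket counts or bucketing columns differ, matching keys can land in different buckets, and the engine has to redistribute rows before joining.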

Leveraging off-heap memory to reduce GC

Leveraging off-heap memory improves both loading and reading performance. In data loading, it improves data sorting performance; in reading, it reduces GC overhead because the data is stored off-heap.

Support single-pass loading

Data loading currently happens in two jobs (generate the dictionary first, then do the actual data loading). This feature enables a single job to finish the data loading, generating the dictionary on the fly. It can improve performance in scenarios where a load adds few incremental updates to the dictionary, which is usually the case after the initial data load.
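The difference between the two flows can be sketched in Python for a single dictionary-encoded column. This is a simplified model under stated assumptions: the function names are hypothetical, codes are assigned in order of first appearance starting at 1, and the distributed coordination a real load needs is left out.

```python
from typing import Dict, List, Optional, Tuple


def two_pass_load(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Job 1 builds the dictionary; job 2 reads the input again to encode it."""
    dictionary: Dict[str, int] = {}
    for value in values:                      # first scan of the input
        dictionary.setdefault(value, len(dictionary) + 1)
    encoded = [dictionary[value] for value in values]  # second scan
    return encoded, dictionary


def single_pass_load(values: List[str],
                     dictionary: Optional[Dict[str, int]] = None
                     ) -> Tuple[List[int], Dict[str, int]]:
    """Encode in one scan, adding unseen values to the dictionary on the fly."""
    dictionary = {} if dictionary is None else dictionary
    encoded: List[int] = []
    for value in values:
        code = dictionary.get(value)
        if code is None:
            code = len(dictionary) + 1
            dictionary[value] = code
        encoded.append(code)
    return encoded, dictionary
```

Passing the dictionary from an earlier load into `single_pass_load` models an incremental load: most values are already known, so only the few new ones extend the dictionary.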

Support pre-generated dictionary for data loading

Users can load data with a pre-generated dictionary. This feature also supports dictionaries customized by users, to improve data load efficiency.

...

Please find the detailed JIRA list at: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12338020

Sub-task

Bug

  • [CARBONDATA-333] - Unable to perform compaction
  • [CARBONDATA-341] - CarbonTableIdentifier being passed to the query flow has wrong tableid
  • [CARBONDATA-362] - Optimize the parameters' name in CarbonDataRDDFactory.scala
  • [CARBONDATA-374] - Short data type is not working.
  • [CARBONDATA-375] - Dictionary cache not getting cleared after task completion in dictionary decoder
  • [CARBONDATA-381] - Unnecessary catalog metadata refresh and array index of bound exception in drop table
  • [CARBONDATA-390] - Float Data Type is Not Working
  • [CARBONDATA-404] - Data loading from DataFrame to carbon table is FAILED
  • [CARBONDATA-405] - Data load fail if dataframe is created with LONG datatype column . 
  • [CARBONDATA-412] - in windows, when load into table whose name has "_", the old segment will be deleted.
  • [CARBONDATA-418] - Data Loading performance issue
  • [CARBONDATA-421] - Timestamp data type filter issue with format other than "-"
  • [CARBONDATA-442] - Query result mismatching with Hive
  • [CARBONDATA-448] - Solve compilation error for spark2 integration
  • [CARBONDATA-451] - Can not run query on windows now
  • [CARBONDATA-456] - Select count(*) from table is slower.
  • [CARBONDATA-459] - Block distribution is wrong in case of dynamic allocation=true
  • [CARBONDATA-471] - Optimize no kettle flow and fix issues in cluster
  • [CARBONDATA-474] - Implement unit test cases for core.datastorage package
  • [CARBONDATA-476] - storeLocation start with file:/// cause table not found exceptioin
  • [CARBONDATA-481] - [SPARK2]fix late decoder and support whole stage code gen
  • [CARBONDATA-486] - Reading dataframe concurrently will lead to wrong data
  • [CARBONDATA-487] - spark2 integration is not compiling
  • [CARBONDATA-492] - When profile spark-2.0 is avtive , CarbonExample have error in intellij idea
  • [CARBONDATA-493] - Insert into select from a empty table cause exception
  • [CARBONDATA-497] - [Spark2] fix datatype issue of CarbonLateDecoderRule
  • [CARBONDATA-518] - CarbonExample of spark moudle can not run as kettlehome and storepath shoug get form carbonproperties now
  • [CARBONDATA-522] - New data loading flowcauses testcase failures like big decimal etc
  • [CARBONDATA-532] - When set use_kettle=false , the testcase [TestEmptyRows] run failed
  • [CARBONDATA-536] - Initialize GlobalDictionaryUtil.updateTableMetadataFunc for Spark 2.x
  • [CARBONDATA-537] - Bug fix for DICTIONARY_EXCLUDE option in spark2 integration
  • [CARBONDATA-539] - Return empty row in map reduce application
  • [CARBONDATA-544] - Delete core/.TestFileFactory.carbondata.crc,core/Testdb.carbon
  • [CARBONDATA-552] - Unthrown FilterUnsupportedException in catch block
  • [CARBONDATA-557] - Option use_kettle is not work when use spark-1.5
  • [CARBONDATA-558] - Load performance bad when use_kettle=false
  • [CARBONDATA-560] - In QueryExecutionException, can not use executorService.shutdownNow() to shut down immediately.
  • [CARBONDATA-562] - Carbon Context initialization is failed with spark 1.6.3 
  • [CARBONDATA-563] - Select Queries are not working with spark 1.6.2.
  • [CARBONDATA-573] - To fix query statistic issue
  • [CARBONDATA-574] - Add thrift server support to Spark 2.0 carbon integration
  • [CARBONDATA-577] - Carbon session is not working in spark shell.
  • [CARBONDATA-581] - Node locality cannot be obtained in group by queries
  • [CARBONDATA-582] - Able to create table When Number Of Buckets is Given in negative
  • [CARBONDATA-585] - Dictionary file is locked for Updation
  • [CARBONDATA-589] - carbon spark shell is not working with spark 2.0
  • [CARBONDATA-593] - Select command seems to be not working on carbon-spark-shell . It throws a runtime error on select query after show method is invoked
  • [CARBONDATA-595] - Drop Table for carbon throws NPE with HDFS lock type.
  • [CARBONDATA-600] - Should reuse unit test case for integration module
  • [CARBONDATA-608] - Compliation Error with spark 1.6 profile
  • [CARBONDATA-609] - CarbonDataFileVersionIssue
  • [CARBONDATA-611] - mvn clean -Pbuild-with-format package does not work
  • [CARBONDATA-614] - Fix dictionary locked issue
  • [CARBONDATA-617] - Insert query not working with UNION
  • [CARBONDATA-618] - Add new profile to build all modules for release purpose
  • [CARBONDATA-619] - Compaction API for Spark 2.1 : Issue in compaction type
  • [CARBONDATA-620] - Compaction is failing in case of multiple blocklet
  • [CARBONDATA-621] - Compaction is failing in case of multiple blocklet
  • [CARBONDATA-622] - Should use the same fileheader reader for dict generation and data loading
  • [CARBONDATA-627] - Fix Union unit test case for spark2
  • [CARBONDATA-628] - Issue when measure selection with out table order gives wrong result with vectorized reader enabled
  • [CARBONDATA-629] - Issue with database name case sensitivity
  • [CARBONDATA-630] - Unable to use string function on string/char data type column
  • [CARBONDATA-632] - Fix wrong comments of load data in CarbonDataRDDFactory.scala
  • [CARBONDATA-633] - Query Crash issue in case of offheap
  • [CARBONDATA-634] - Load Query options invalid values are not consistent behaviour. 
  • [CARBONDATA-635] - ClassCastException in Spark 2.1 Cluster mode in insert query when name of column is changed and When the orders of columns are changed in the tables
  • [CARBONDATA-636] - Testcases are failing in spark 1.6 and 2.1 with no kettle flow. 
  • [CARBONDATA-639] - "Delete data" feature doesn't work
  • [CARBONDATA-641] - DICTIONARY_EXCLUDE is not working with 'DATE' column
  • [CARBONDATA-643] - When we are passing ALL_DICTIONARY_PATH' in load query ,it is throwing null pointer exception.
  • [CARBONDATA-644] - Select query fails randomly on spark shell
  • [CARBONDATA-648] - Code Clean Up
  • [CARBONDATA-650] - Columns switching error in performing the string functions
  • [CARBONDATA-654] - Add data update and deletion example
  • [CARBONDATA-667] - after setting carbon property carbon.kettle.home in carbon.properties , while loading data, it is not referring to the carbon.properties file in carbonlib directory
  • [CARBONDATA-668] - Dataloads fail when no. of column in load query is greater than the no. of column in create table
  • [CARBONDATA-669] - InsertIntoCarbonTableTestCase.insert into carbon table from carbon table union query random test failure
  • [CARBONDATA-671] - Date data is coming as null when date data is before 1970
  • [CARBONDATA-673] - Reverting big decimal compression as it has below issue
  • [CARBONDATA-674] - Store compatibility 0.2 to 1.0

Improvement

New Feature

Task

  • [CARBONDATA-444] - Improved integration test-case for AllDataTypesTestCase1
  • [CARBONDATA-445] - Improved integration test-case for AllDataTypesTestCase3

Test

Wish

  • [CARBONDATA-85] - please support insert into carbon table from other format table

...