The Apache CarbonData community is pleased to announce the release of version 1.6.1 in The Apache Software Foundation (ASF).
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL queries, fast filter lookups on detail records, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments. In one of the largest scenarios, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with a response time of less than 3 seconds!
We encourage you to use the release, available at https://archive.apache.org/dist/carbondata/1.6.1/, and to send feedback through the CarbonData user mailing lists!
This release note provides information on the new features, improvements, and bug fixes of this release.
What’s New in CarbonData Version 1.6.1?
The intention of CarbonData 1.6.1 was to move closer to unified analytics and to improve stability. In this version of CarbonData, around 40 JIRA tickets covering improvements and bug fixes have been resolved. A summary follows.
Index Server performance improvements for Full Scan and TPCH Queries
Carbon currently prunes and caches all block/blocklet datamap index information in the driver. If the cache grows very large (70-80% of the driver memory), excessive GC in the driver can slow down queries, and the driver may even run out of memory. Moving the indexes out to a separate JDBCServer reduced the overhead on the primary JDBCServer, but introduced a delay in fetching the bulk list of pruned blocks from the Index Server. This release improves that path, and performance is now the same as running without the Index Server.
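To make the memory-pressure condition above concrete, here is a minimal Python sketch of the check it describes. It is not part of CarbonData: the function name `cache_pressure` and the default threshold of 0.7 (the lower end of the 70-80% range mentioned above) are assumptions chosen for illustration only.

```python
GIB = 1024 ** 3


def cache_pressure(cache_bytes: int, driver_memory_bytes: int,
                   threshold: float = 0.7) -> bool:
    """Return True when the index cache occupies at least `threshold`
    of the driver memory (hypothetical helper, not a CarbonData API)."""
    if driver_memory_bytes <= 0:
        raise ValueError("driver_memory_bytes must be positive")
    return cache_bytes / driver_memory_bytes >= threshold


if __name__ == "__main__":
    # A 15 GiB index cache on a 20 GiB driver is 75% of driver memory.
    print(cache_pressure(15 * GIB, 20 * GIB))
    # A 5 GiB index cache on the same driver is 25% of driver memory.
    print(cache_pressure(5 * GIB, 20 * GIB))
```

A driver that falls into the first case is the kind of deployment where moving the index cache out to the Index Server is expected to help.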
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12345993
- [CARBONDATA-3454] - Optimize the performance of select count(*) for index server
- [CARBONDATA-3462] - Add usage and deployment document for index server
- [CARBONDATA-3452] - select query failure when substring on dictionary column with join
- [CARBONDATA-3474] - Fix validate mvQuery having filter expression and correct error message
- [CARBONDATA-3476] - Read time and scan time stats shown wrong in executor log for filter query
- [CARBONDATA-3477] - Throw out exception when use sql: 'update table select\n...'
- [CARBONDATA-3478] - Fix ArrayIndexOutOfBoundsException issue on compaction after alter rename operation
- [CARBONDATA-3480] - Remove Modified MDT and make relation refresh only when schema file is modified.
- [CARBONDATA-3481] - Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
- [CARBONDATA-3482] - Null pointer exception when concurrent select queries are executed from different beeline terminals.
- [CARBONDATA-3483] - Can not run horizontal compaction when execute update sql
- [CARBONDATA-3485] - data loading is failed from S3 to hdfs table having ~2K carbonfiles
- [CARBONDATA-3486] - Serialization/ deserialization issue with Datatype
- [CARBONDATA-3487] - wrong Input metrics (size/record) displayed in spark UI during insert into
- [CARBONDATA-3490] - Concurrent data load failure with carbondata FileNotFound exception
- [CARBONDATA-3493] - Carbon query fails when enable.query.statistics is true in specific scenario.
- [CARBONDATA-3494] - Nullpointer exception in case of drop table
- [CARBONDATA-3495] - Insert into Complex data type of Binary fails with Carbon & SparkFileFormat
- [CARBONDATA-3499] - Fix insert failure with customFileProvider
- [CARBONDATA-3502] - Select query fails with UDF having Match expression inside IN expression
- [CARBONDATA-3505] - Fixed drop database cascade issue when 2 database point to same location.
- [CARBONDATA-3506] - Alter table add, drop, rename and datatype change fails with hive compatible property
- [CARBONDATA-3507] - Create Table As Select Fails in Spark-2.3
- [CARBONDATA-3508] - Select query fails when the cg datamap is dropped concurrently while running the select query on filter column on which datamap is created
- [CARBONDATA-3513] - can not run major compaction when using hive partition table
- [CARBONDATA-3520] - CTAS should fail if select query contains duplicate columns
- [CARBONDATA-3526] - Cache issue and select query failure with multiple updates
- [CARBONDATA-3527] - Throw 'String length cannot exceed 32000 characters' exception when load data with 'GLOBAL_SORT' from csv which include big complex type data
- [CARBONDATA-3488] - Check the file size after move local file to carbon path
- [CARBONDATA-3489] - Optimizing the performance of sorting
- [CARBONDATA-3491] - Return updated/deleted rows count when execute update/delete sql
- [CARBONDATA-3501] - Support to execute update sql on table with long_string field (Not update long_string field)
- [CARBONDATA-3511] - Query time improvement by reducing the number of NameNode calls while having carbonindex files in the store
- [CARBONDATA-3515] - Limit local dictionary size to 10% of allowed blocklet size
- [CARBONDATA-3523] - Should store file size into index file
- [CARBONDATA-3524] - support compaction by GLOBAL_SORT
- [CARBONDATA-3528] - refactor java checkstyle rules
- [CARBONDATA-3540] - Delete all external segments when dropping table
- [CARBONDATA-3544] - CLI should support a option to show statistics for all columns