Apache CarbonData 1.4.0 Release

The Apache CarbonData community is pleased to announce the release of version 1.4.0 under The Apache Software Foundation (ASF).

CarbonData is a high-performance big data store solution that supports fast filter lookups and ad-hoc OLAP analysis. Because business-driven analysis is so varied and data analytics demands flexibility, the big data domain is burdened with data duplication and rising data management costs. CarbonData provides a new converged data store that removes the need for duplicated data and supports a variety of application scenarios. CarbonData has been deployed in more than 20 enterprise production environments; the largest single cluster (100+ nodes) manages tens of trillions of records. I/O scanning and computing performance is improved by features such as multi-level indexing, dictionary encoding, pre-aggregation, dynamic partitioning, and quasi-real-time data query, thereby achieving response times measured in seconds for analytics queries over tens of trillions of records.

We encourage everyone to download the release from https://dist.apache.org/repos/dist/release/carbondata/1.4.0/ and to send feedback through the CarbonData user mailing lists!

This release note provides information on the new features, improvements, and bug fixes of this release.

What’s New in Version 1.4.0?

This version of CarbonData adds the following new features to improve performance, compatibility, and usability.

Supports SDK

Provides the Carbon SDK for writing and reading CarbonData files through a Java API, with support for Avro schema and JSON data.

Supports External Table with Location

You can now create an external table by specifying the location of CarbonData files.

Supports Streaming with Pre-Aggregate Table

You can now create a pre-aggregate table on a streaming table. This improves the performance of OLAP-style queries on streaming tables.

Supports Partition with Pre-Aggregate

When you drop the partition column in the main table, the same column is now also dropped in the aggregate table, keeping both in sync.

Enhanced Data Load

Data load performance has been improved in this release.

Supports Lucene Index for Text Search (Alpha feature)

This feature allows you to perform text search on data stored in CarbonData.

Supports S3 Read on CarbonData Files

Supports Search Mode (Alpha feature)

Search mode improves the performance of concurrent queries.

Supports Bloom Filter Index (Alpha feature)

This feature speeds up blocklet pruning.
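To illustrate the general idea behind Bloom-filter-based pruning, here is a minimal, self-contained Java sketch. It is not CarbonData's implementation and does not use any CarbonData API: the class `BloomPruningSketch`, its nested `BloomFilter`, the `prune` helper, the filter size, and the hash scheme are all hypothetical names and choices made for this example. The sketch assumes one small Bloom filter is kept per blocklet for an indexed column; an equality filter then scans only the blocklets whose Bloom filter reports that the value might be present.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative only: NOT the CarbonData implementation or API.
public class BloomPruningSketch {

    /** Minimal Bloom filter over strings; size and hash scheme are assumptions. */
    public static final class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashCount;

        public BloomFilter(int size, int hashCount) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashCount = hashCount;
        }

        /** Records a value by setting hashCount bit positions. */
        public void add(String value) {
            for (int i = 0; i < hashCount; i++) {
                bits.set(index(value, i));
            }
        }

        /** Returns false only if the value was definitely never added. */
        public boolean mightContain(String value) {
            for (int i = 0; i < hashCount; i++) {
                if (!bits.get(index(value, i))) {
                    return false;
                }
            }
            return true;
        }

        // Double hashing: position_i = (h1 + i * h2) mod size.
        private int index(String value, int i) {
            int h1 = 0x811C9DC5; // FNV-1a offset basis
            for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
                h1 ^= (b & 0xff);
                h1 *= 16777619; // FNV prime
            }
            int h2 = value.hashCode() | 1; // force an odd step
            return Math.floorMod(h1 + i * h2, size);
        }
    }

    /** Returns the indexes of blocklets that may contain the value. */
    public static List<Integer> prune(List<BloomFilter> blockletFilters, String value) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < blockletFilters.size(); i++) {
            if (blockletFilters.get(i).mightContain(value)) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical column values for three blocklets.
        List<List<String>> blocklets = Arrays.asList(
                Arrays.asList("beijing", "shanghai"),
                Arrays.asList("shenzhen", "hangzhou"),
                Arrays.asList("chengdu", "wuhan"));

        List<BloomFilter> filters = new ArrayList<>();
        for (List<String> values : blocklets) {
            BloomFilter filter = new BloomFilter(1024, 3);
            values.forEach(filter::add);
            filters.add(filter);
        }

        // Only the candidate blocklets would need to be scanned.
        System.out.println("Blocklets to scan: " + prune(filters, "shenzhen"));
    }
}
```

A Bloom filter never produces false negatives, so a blocklet that really holds the value is always kept; it can produce false positives, so a pruned result may still include a few blocklets that turn out not to contain the value.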

Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12342754

Bug

[CARBONDATA-1506] - SDV tests error in CI
[CARBONDATA-1763] - Carbon1.3.0-Pre-AggregateTable - Recreating a failed pre-aggregate table fails due to table exists
[CARBONDATA-2098] - Add documentation for pre-aggregate tables
[CARBONDATA-2119] - CarbonDataWriterException thrown when loading using global_sort
[CARBONDATA-2131] - Alter table adding long datatype is failing but Create table with long type is successful, in Spark 2.1
[CARBONDATA-2133] - Exception displays after performing select query on newly added Boolean data type
[CARBONDATA-2134] - Prevent implicit column filter list from getting serialized while submitting task to executor
[CARBONDATA-2142] - Fixed aggregate data map creation issue in case of hive metastore
[CARBONDATA-2143] - Fixed query memory leak issue for task failure during initialization of record reader
[CARBONDATA-2147] - Exception displays while loading data with streaming
[CARBONDATA-2149] - Displayed complex type data is error when use DataFrame to write complex type data.
[CARBONDATA-2150] - Unwanted updatetable status files are being generated for the delete operation where no records are deleted
[CARBONDATA-2151] - Filter query on Timestamp/Date column of streaming table throwing exception
[CARBONDATA-2161] - Compacted Segment of Streaming Table should update "mergeTo" column
[CARBONDATA-2182] - add one more param called ExtraParmas in SessionParams for session Level operations
[CARBONDATA-2183] - fix compaction when segment is delete during compaction and remove unnecessary parameters in functions
[CARBONDATA-2185] - add InputMetrics for Streaming Reader
[CARBONDATA-2199] - Exception occurs when change the datatype of measure having sort_column
[CARBONDATA-2200] - Like operation on streaming table throwing Exception
[CARBONDATA-2207] - TestCase Fails using Hive Metastore
[CARBONDATA-2208] - Pre aggregate datamap creation is failing when count(*) present in query
[CARBONDATA-2209] - Rename table with partitions not working issue and batch_sort and no_sort with partition table issue
[CARBONDATA-2211] - Alter Table Streaming DDL should blocking DDL like other DDL ( All DDL are blocking DDL)
[CARBONDATA-2212] - Event should be fired from Stream before and after updating the status
[CARBONDATA-2217] - nullpointer issue drop partition where column does not exists and clean files issue after second level of compaction
[CARBONDATA-2219] - Add validation for external partition location to use same schema
[CARBONDATA-2261] - Support Set segment command for Streaming Table

New Feature

[CARBONDATA-2055] - Support integrating Streaming table with Spark Streaming

Improvement

[CARBONDATA-2103] - Avoid 2 time lookup in ShowTables command
[CARBONDATA-2137] - Delete query is taking more time while processing the carbondata.
[CARBONDATA-2144] - There are some improper place in pre-aggregate documentation
[CARBONDATA-2148] - Use Row parser to replace current default parser:CSVStreamParserImp
[CARBONDATA-2168] - Support global sort on partition tables
[CARBONDATA-2184] - Improve memory reuse for heap memory in `HeapMemoryAllocator`
[CARBONDATA-2187] - Restructure the partition folders as per the standard hive folders
[CARBONDATA-2196] - during stream sometime carbontable is null in executor side
[CARBONDATA-2201] - firing the LoadTablePreExecutionEvent before streaming causes NPE
[CARBONDATA-2204] - Access tablestatus file too many times during query
[CARBONDATA-2223] - Adding Listener Support for Partition

Task

[CARBONDATA-2135] - Documentation for Table Comment and Column Comment
[CARBONDATA-2138] - Documentation for HEADER option
[CARBONDATA-2214] - Remove config 'spark.sql.hive.thriftServer.singleSession' from installation-guide.md
[CARBONDATA-2215] - Add the description of Carbon Stream Parser into streaming-guide.md
