The Apache CarbonData community is pleased to announce the release of Version 1.2.0 in The Apache Software Foundation (ASF). CarbonData is a new Big Data native file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.

We encourage everyone to download the release from https://archive.apache.org/dist/carbondata/1.2.0/ and to share feedback through the CarbonData user mailing lists!

This release note provides information on the new features, improvements, and bug fixes of this release. 

What’s New in Version 1.2.0?

This version of CarbonData adds the following new features to improve performance, compatibility, and usability.

Support Presto Integration

The CarbonData Presto connector allows faster fetching of results for interactive queries. It enables quicker exploration of data to determine the types of records in tables, and it is faster for queries that comprise joins between a large fact table and many smaller dimension tables.

Support Hive Integration

The Hive connector for CarbonData is the best choice when you want to use batch-style data processing, large data aggregations, and large fact-to-fact joins.

Optimized Measure Filter for Improved Performance

Support Sort Columns

You can now specify that only the required columns (those used in query filters) are sorted while loading data. This improves loading speed marginally.

Support Four Types of Sort Scope

The sort scope is now defined only when creating the table and cannot be changed during loading. Four sort scopes are supported: Local Sort, Batch Sort, Global Sort, and No Sort. These help improve the performance of operations such as data loading and point queries.

Support Partition

Partitioning helps with better data organization, management, and storage. Partitioning a table also helps avoid a full table scan in some scenarios, thereby improving query performance. Three types of partition table are supported: Hash Partition, Range Partition, and List Partition.
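
To illustrate how the three partition types differ and why a partitioned table can avoid a full scan, here is a minimal conceptual sketch in Python. It is not CarbonData code: the function names, the boundary semantics (each range boundary treated as an exclusive upper bound), and the default partition id of -1 are all assumptions made for illustration.

```python
import bisect
import zlib


def hash_partition(value, num_partitions):
    """Route a value to one of num_partitions buckets using a stable hash."""
    # crc32 is used instead of hash() so the result is the same on every run.
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_partitions


def range_partition(value, boundaries):
    """Route a value using sorted boundaries (assumed exclusive upper bounds).

    Values at or above the last boundary land in one extra, final partition.
    """
    return bisect.bisect_right(boundaries, value)


def list_partition(value, groups):
    """Route a value by explicit membership in a group of allowed values.

    Values that match no group go to a hypothetical default partition, -1.
    """
    for partition_id, members in enumerate(groups):
        if value in members:
            return partition_id
    return -1


def prune_range_partitions(boundaries, low, high):
    """Return the range-partition ids that can hold values in [low, high]."""
    first = bisect.bisect_right(boundaries, low)
    last = bisect.bisect_right(boundaries, high)
    return list(range(first, last + 1))


# Example: four range partitions split at the years 2015, 2016, and 2017.
year_boundaries = [2015, 2016, 2017]
partitions_to_scan = prune_range_partitions(year_boundaries, 2015, 2016)
```

In this sketch a filter on the years 2015 to 2016 needs to read only two of the four range partitions, which is the kind of scan avoidance the paragraph above describes.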

Optimized Data Update & Delete for Spark 2.1

Data update and delete operations are optimized for Spark 2.1 for improved query performance.

Support DataMap

This release adds a DataMap framework that can be used for indexes and statistics to accelerate query performance. It enables developers to add custom indexes for driver-side pruning.
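
As a rough illustration of what driver-side pruning with an index can look like, the following Python sketch keeps min/max statistics per data block and discards blocks that cannot match a range filter. This is a conceptual sketch only and does not use the DataMap API: `BlockStats`, `prune_blocks`, and the block ids are hypothetical names, and a min/max index is just one assumed example of the kind of index such a framework could host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockStats:
    """Hypothetical per-block statistics held on the driver."""
    block_id: str
    min_value: int
    max_value: int


def prune_blocks(stats, low, high):
    """Keep only blocks whose [min, max] range overlaps the filter [low, high]."""
    return [
        s.block_id
        for s in stats
        if s.max_value >= low and s.min_value <= high
    ]


# Three blocks with non-overlapping value ranges for the filtered column.
stats = [
    BlockStats("block-0", 1, 100),
    BlockStats("block-1", 101, 200),
    BlockStats("block-2", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 220" can skip block-0 entirely.
candidates = prune_blocks(stats, 150, 220)
```

Because the decision is made from statistics alone, blocks that are pruned never need to be read by the executors.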

For the detailed JIRA list, see: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12340260



