...

CarbonData is a high-performance data solution that supports various data analytics scenarios, including BI analysis, ad hoc SQL queries, fast filter lookups on detail records, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments. In one of the largest deployments, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with a response time of less than 3 seconds.

...

The goal of CarbonData 1.5.3 was to move closer to unified analytics. This release adds DDL operations on the LRU cache so that users can manage the cache according to their requirements. We have also upgraded the Presto integration to support the latest Presto version. More importantly, we have further improved CarbonData performance.

In this version of CarbonData, more than 20 JIRA tickets related to new features, improvements, and bugs have been resolved. The following is a summary.

...

Previously, although the user could set the cache size, the functionality was limited because the user did not have a clear picture of how much cache should be set for their requirements.

From this version, we support DDL on the CarbonData LRU cache, which allows you to perform the following operations:

  • Show the current cache used per table.
  • Show the current cache used for a specific table.
  • Clear the cache for a specific table.
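
To make the three operations concrete, here is a minimal conceptual sketch in Python of an LRU cache that tracks usage per table. It is not CarbonData code and does not reproduce the DDL syntax; the class name TableAwareLruCache and the methods show_cache and drop_cache are hypothetical names chosen for illustration only.

```python
from collections import OrderedDict


class TableAwareLruCache:
    """Toy LRU cache that tracks how many bytes each table occupies.

    Hypothetical model for illustration; not CarbonData's implementation.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Maps (table, entry_key) -> size in bytes; oldest entries come first.
        self._entries = OrderedDict()

    def put(self, table, entry_key, size_bytes):
        key = (table, entry_key)
        if key in self._entries:
            # Replace an existing entry and refresh its position.
            self.used_bytes -= self._entries.pop(key)
        self._entries[key] = size_bytes
        self.used_bytes += size_bytes
        # Evict least recently used entries until the cache fits again.
        while self.used_bytes > self.capacity_bytes and self._entries:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size

    def show_cache(self, table=None):
        """Return bytes used per table, or for one table if a name is given."""
        usage = {}
        for (name, _), size in self._entries.items():
            if table is None or name == table:
                usage[name] = usage.get(name, 0) + size
        return usage

    def drop_cache(self, table):
        """Remove every entry of one table; return how many were removed."""
        stale = [key for key in self._entries if key[0] == table]
        for key in stale:
            self.used_bytes -= self._entries.pop(key)
        return len(stale)


cache = TableAwareLruCache(capacity_bytes=100)
cache.put("sales", "segment_0", 40)
cache.put("sales", "segment_1", 30)
cache.put("users", "segment_0", 20)
print(cache.show_cache())         # usage for every table
print(cache.show_cache("sales"))  # usage for one table
cache.drop_cache("sales")         # clear one table's entries
```

Seeing per-table usage before choosing a cache size is the point of the feature: the numbers returned by the show operation tell you which tables dominate the cache and whether clearing one of them frees enough space.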

Supports SDK Read from Different Schema

This version allows the user to read two or more CarbonData files that have different schemas from the same location.

...

Improved Single/Concurrent Query Performance

When the number of segments is large, query performance degrades due to a higher memory footprint, multi-thread pruning, retrieval from the unsafe DataMap, and so on.

...

  • Reduced the memory footprint during queries.
  • Added multi-thread pruning for non-filter queries.
  • Updated driver cache unsafe storage format for faster retrieval of data.
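
As a rough illustration of multi-thread pruning for a non-filter query, the Python sketch below collects the blocklets of each segment in parallel. The segment layout (a dict with "id" and "blocklets") and the function names prune_segment and prune_all are assumptions made for this example, not CarbonData structures or APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def prune_segment(segment):
    # With no filter, every blocklet of the segment is selected.
    return [(segment["id"], blocklet) for blocklet in segment["blocklets"]]


def prune_all(segments, max_workers=4):
    """Prune all segments using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input segments.
        per_segment = pool.map(prune_segment, segments)
        return [entry for result in per_segment for entry in result]


segments = [
    {"id": 0, "blocklets": ["b0", "b1"]},
    {"id": 1, "blocklets": ["b0"]},
]
print(prune_all(segments))
```

The sketch shows only the structure of the technique: independent per-segment work is fanned out to threads and the results are merged in segment order. In CPython, threads do not speed up CPU-bound work, so this example should not be read as a performance benchmark.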

...

Previously, for count(*), pruning was the same as for a select * query, which is very time-consuming because of the different processes involved.

In this version, we have optimized count(*) query performance by reading the blocklet row count directly from DataMapRow, which reduces query time.
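
The idea can be sketched in Python as follows: if each index row already records the row count of its blocklet, count(*) reduces to summing those counts, and no data is scanned. BlockletIndexRow is a hypothetical stand-in for DataMapRow with assumed field names; the real class layout is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockletIndexRow:
    """Hypothetical stand-in for an index row describing one blocklet."""
    blocklet_id: int
    row_count: int


def count_star(index_rows: List[BlockletIndexRow]) -> int:
    # Sum the row counts recorded in the index; no data pages are read.
    return sum(row.row_count for row in index_rows)


index_rows = [
    BlockletIndexRow(blocklet_id=0, row_count=32000),
    BlockletIndexRow(blocklet_id=1, row_count=32000),
    BlockletIndexRow(blocklet_id=2, row_count=1500),
]
print(count_star(index_rows))
```

The cost of this approach depends on the number of blocklets rather than the number of rows, which is why it avoids the work that a select * style prune would do.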

...