The Apache CarbonData community is pleased to announce the release of version 1.1.0 at The Apache Software Foundation (ASF). CarbonData is a new big data native file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which in turn can speed up queries by an order of magnitude over petabytes of data.
We encourage everyone to download the release from https://dist.apache.org/repos/dist/release/carbondata/1.1.0/ and to share feedback through the CarbonData user mailing lists!
Introducing V3 Data Format
CarbonData now introduces a new and improved data format, V3 (version 3).
- Improves query performance by roughly 20% to 50%.
- Improves sequential IO by using larger blocklets, which allows more data to be read into memory at once.
- Introduces pages of 32000 rows each for every column inside a blocklet. Min/max values are maintained for each page to speed up filter queries.
- Improved compression/decompression for row pages.
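The page-level min/max idea from the list above can be illustrated with a small sketch. This is not CarbonData's reader code. The `Page`, `build_pages`, `pages_to_scan`, and `filter_equals` names are hypothetical, and the only value taken from the announcement is the page size of 32000 rows. The point is that a filter can skip any page whose min/max range cannot contain the target value.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 32000  # rows per page, as stated in the list above


@dataclass
class Page:
    values: List[int]
    min_value: int
    max_value: int


def build_pages(column: List[int], page_size: int = PAGE_SIZE) -> List[Page]:
    """Split one column into fixed-size pages and record min/max per page."""
    pages = []
    for start in range(0, len(column), page_size):
        chunk = column[start:start + page_size]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


def pages_to_scan(pages: List[Page], target: int) -> int:
    """Count the pages whose [min, max] range could contain the target."""
    return sum(1 for p in pages if p.min_value <= target <= p.max_value)


def filter_equals(pages: List[Page], target: int) -> List[int]:
    """Return matching values, reading only pages that pass the min/max check."""
    matches = []
    for page in pages:
        if page.min_value <= target <= page.max_value:
            matches.extend(v for v in page.values if v == target)
    return matches
```

With a sorted column of 100000 values, an equality filter in this sketch touches one page out of four. The benefit shrinks when values are spread evenly across pages.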
Alter Table Support
CarbonData now supports ALTER TABLE for table modification.
- Renaming an existing table.
- Adding a new column to an existing table.
- Dropping a column from an existing table.
- Upcasting a data type from INT to BIGINT, or a decimal precision from lower to higher.
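The upcasting rule in the last item can be made concrete with a small validity check. This is an illustrative sketch, not CarbonData's validation logic. `ColumnType` and `is_allowed_upcast` are hypothetical names, and the handling of decimal scale (the new type must keep at least as many integer and fractional digits) is an assumption of this sketch that the announcement does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnType:
    name: str           # e.g. "INT", "BIGINT", "DECIMAL"
    precision: int = 0  # only meaningful for DECIMAL
    scale: int = 0      # only meaningful for DECIMAL


def is_allowed_upcast(old: ColumnType, new: ColumnType) -> bool:
    """Return True if old -> new is one of the upcasts listed above."""
    if old.name == "INT" and new.name == "BIGINT":
        return True
    if old.name == "DECIMAL" and new.name == "DECIMAL":
        # Assumption: precision must grow, and neither the integer digits
        # nor the fractional digits may shrink, so no stored value is lost.
        old_int_digits = old.precision - old.scale
        new_int_digits = new.precision - new.scale
        return (
            new.precision > old.precision
            and new.scale >= old.scale
            and new_int_digits >= old_int_digits
        )
    return False
```

Under these assumptions, DECIMAL(10,2) to DECIMAL(12,3) is accepted, while DECIMAL(10,2) to DECIMAL(10,5) and BIGINT to INT are rejected.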
Batch Sort Support for Data Loading
CarbonData now supports batch sort to improve data loading performance.
Benefits: Batch sort makes the sort step non-blocking. Each batch is sorted entirely in memory and then converted to a CarbonData file.
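A minimal sketch of the batch sort idea described above, assuming rows arrive as a stream and each sorted batch would be handed to a file writer. The `batch_sort` function and the `Row` type are hypothetical and only show the control flow. Rows are sorted within a batch, not across batches, which is what lets the loader emit output without waiting for the whole input.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def batch_sort(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield sorted batches; each batch would become one output file."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            # Sort only this batch in memory and hand it off immediately,
            # so the sort step never waits for the complete input.
            yield sorted(batch)
            batch = []
    if batch:
        yield sorted(batch)
```

The trade-off in this sketch is that data is ordered only inside each batch, so a reader cannot assume a global sort order across files.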
Improved Single Pass
Benefits: Single-pass loading is improved by upgrading to the latest Netty framework and by launching a dictionary client for each loading thread.
Range Filter Support
Now CarbonData supports range filters.
Benefits: Range filter support combines multiple between-style conditions into a single filter to improve filter performance.
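The combining step described above can be sketched as intersecting the bounds of several conditions on the same column. This is an illustration only. `RangeFilter` and `combine` are hypothetical names, and inclusive bounds are a simplifying assumption. The rows are then checked against one merged range instead of one condition at a time.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RangeFilter:
    lower: Optional[int] = None  # inclusive lower bound, None means unbounded
    upper: Optional[int] = None  # inclusive upper bound, None means unbounded

    def matches(self, value: int) -> bool:
        above = self.lower is None or value >= self.lower
        below = self.upper is None or value <= self.upper
        return above and below


def combine(filters: Iterable[RangeFilter]) -> RangeFilter:
    """AND together several range conditions on one column into a single range."""
    lower: Optional[int] = None
    upper: Optional[int] = None
    for f in filters:
        if f.lower is not None:
            lower = f.lower if lower is None else max(lower, f.lower)
        if f.upper is not None:
            upper = f.upper if upper is None else min(upper, f.upper)
    return RangeFilter(lower, upper)
```

For example, `combine([RangeFilter(lower=10), RangeFilter(upper=20)])` yields a single filter equivalent to `value BETWEEN 10 AND 20`.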
Improvements on Large Cluster
Now there are improvements on large clusters.
- Dictionary metadata is no longer loaded in parallel in the executor. It is now loaded only once and shared by all tasks inside the executor that use it.
- Minimized file operations to avoid multiple namenode calls during a query.
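One common way to get the load-once behavior described in the first item is a shared cache guarded by a lock. The sketch below assumes that interpretation and does not reflect CarbonData's actual executor code. `DictionaryCache` and its `loader` callback are hypothetical, and Python threads stand in for tasks running inside one executor.

```python
import threading
from typing import Callable, Dict


class DictionaryCache:
    """Shared cache so concurrent tasks load each dictionary only once."""

    def __init__(self, loader: Callable[[str], Dict[str, int]]):
        self._loader = loader
        self._lock = threading.Lock()
        self._cache: Dict[str, Dict[str, int]] = {}

    def get(self, column: str) -> Dict[str, int]:
        # The lock serializes the check-and-load, so the first task loads
        # the dictionary and every later task reuses the cached copy.
        with self._lock:
            if column not in self._cache:
                self._cache[column] = self._loader(column)
            return self._cache[column]
```

A single lock keeps the sketch short. A per-column lock would let dictionaries for different columns load concurrently.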