The Apache CarbonData community is pleased to announce the release of version 1.3.1 in The Apache Software Foundation (ASF). CarbonData is a new big data native file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.
We encourage everyone to download the release from https://archive.apache.org/dist/carbondata/1.3.1/ and to send feedback through the CarbonData user mailing lists!
This release note provides information on the new features, improvements, and bug fixes of this release.
What’s New in Version 1.3.1?
In this version of CarbonData, the following new features have been added to improve performance, compatibility, and usability.
Restructured Carbon Partition
The Carbon partition has been restructured to use the Hive standard folder structure.
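For readers unfamiliar with the Hive layout: Hive stores each partition in nested directories named `column=value` under the table root. The sketch below only illustrates that naming convention. It is not CarbonData code: the function name `hive_partition_path`, the table path, and the partition columns are hypothetical, and it ignores Hive's escaping of special characters in partition values.

```python
from pathlib import PurePosixPath


def hive_partition_path(table_root, partition_spec):
    """Build a Hive-style partition directory: <root>/<col>=<value>/...

    partition_spec is an ordered list of (column, value) pairs; the order
    follows the partition columns declared on the table.
    """
    path = PurePosixPath(table_root)
    for column, value in partition_spec:
        # Each partition column adds one directory level named "col=value".
        path = path / f"{column}={value}"
    return str(path)


# Hypothetical table partitioned by country and year.
print(hive_partition_path("/warehouse/sales", [("country", "US"), ("year", 2018)]))
```

With this convention, a query that filters on a partition column only needs to read the matching directories.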
Support Global Sort on Partition Tables
Global Sort is now supported on partition tables, which improves query performance and resource management.
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12342754
Bug
- [CARBONDATA-1506] - SDV tests error in CI
- [CARBONDATA-1763] - Carbon1.3.0-Pre-AggregateTable - Recreating a failed pre-aggregate table fails due to table exists
- [CARBONDATA-2098] - Add documentation for pre-aggregate tables
- [CARBONDATA-2119] - CarbonDataWriterException thrown when loading using global_sort
- [CARBONDATA-2131] - Alter table adding long datatype is failing but Create table with long type is successful, in Spark 2.1
- [CARBONDATA-2133] - Exception displays after performing select query on newly added Boolean data type
- [CARBONDATA-2134] - Prevent implicit column filter list from getting serialized while submitting task to executor
- [CARBONDATA-2142] - Fixed aggregate data map creation issue in case of hive metastore
- [CARBONDATA-2143] - Fixed query memory leak issue for task failure during initialization of record reader
- [CARBONDATA-2147] - Exception displays while loading data with streaming
- [CARBONDATA-2149] - Displayed complex type data is error when use DataFrame to write complex type data.
- [CARBONDATA-2150] - Unwanted updatetable status files are being generated for the delete operation where no records are deleted
- [CARBONDATA-2151] - Filter query on Timestamp/Date column of streaming table throwing exception
- [CARBONDATA-2161] - Compacted Segment of Streaming Table should update "mergeTo" column
- [CARBONDATA-2182] - add one more param called ExtraParmas in SessionParams for session Level operations
- [CARBONDATA-2183] - fix compaction when segment is delete during compaction and remove unnecessary parameters in functions
- [CARBONDATA-2185] - add InputMetrics for Streaming Reader
- [CARBONDATA-2199] - Exception occurs when change the datatype of measure having sort_column
- [CARBONDATA-2200] - Like operation on streaming table throwing Exception
- [CARBONDATA-2207] - TestCase Fails using Hive Metastore
- [CARBONDATA-2208] - Pre aggregate datamap creation is failing when count(*) present in query
- [CARBONDATA-2209] - Rename table with partitions not working issue and batch_sort and no_sort with partition table issue
- [CARBONDATA-2211] - Alter Table Streaming DDL should blocking DDL like other DDL ( All DDL are blocking DDL)
- [CARBONDATA-2212] - Event should be fired from Stream before and after updating the status
- [CARBONDATA-2217] - nullpointer issue drop partition where column does not exists and clean files issue after second level of compaction
- [CARBONDATA-2219] - Add validation for external partition location to use same schema
New Feature
- [CARBONDATA-2055] - Support integrating Streaming table with Spark Streaming
Improvement
- [CARBONDATA-2103] - Avoid 2 time lookup in ShowTables command
- [CARBONDATA-2137] - Delete query is taking more time while processing the carbondata.
- [CARBONDATA-2144] - There are some improper place in pre-aggregate documentation
- [CARBONDATA-2148] - Use Row parser to replace current default parser:CSVStreamParserImp
- [CARBONDATA-2168] - Support global sort on partition tables
- [CARBONDATA-2184] - Improve memory reuse for heap memory in `HeapMemoryAllocator`
- [CARBONDATA-2187] - Restructure the partition folders as per the standard hive folders
- [CARBONDATA-2196] - during stream sometime carbontable is null in executor side
- [CARBONDATA-2201] - firing the LoadTablePreExecutionEvent before streaming causes NPE
- [CARBONDATA-2204] - Access tablestatus file too many times during query
Task
- [CARBONDATA-2135] - Documentation for Table Comment and Column Comment
- [CARBONDATA-2138] - Documentation for HEADER option
- [CARBONDATA-2214] - Remove config 'spark.sql.hive.thriftServer.singleSession' from installation-guide.md
- [CARBONDATA-2215] - Add the description of Carbon Stream Parser into streaming-guide.md