Apache Spark is a powerful distributed cluster computing engine built for fast data processing. It does, however, have drawbacks in a few areas, and combining Apache Spark with Apache CarbonData can overcome them. Some of the drawbacks of Apache Spark are listed below:

  1. No support for ACID transactions
  2. No data quality enforcement
  3. Small files problem
  4. Inefficient data skipping

Read the complete blog here.
