Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/953fzz19pttr3nv4syt3br88lymgjfz0

JIRA or Github Issue: https://github.com/apache/incubator-doris/issues/9847

Released: <Doris Version>


Motivation

During the load process, the same operations, such as sorting and aggregation, are performed on every replica. These operations are resource-intensive, so concurrent data loads consume a large amount of CPU and memory.


Related Research

Advantage:

  • Reduces CPU and memory usage during load.
  • Improves the concurrency that load can sustain.

Disadvantage:

  • The load result depends heavily on the master replica, so fault tolerance is reduced.

Detailed Design

During the load process, one replica of each tablet is chosen as the master and the other replicas act as slaves. The write process (writing data into the MemTable and then flushing it) runs only on the master replica, and the resulting segment files are synchronized to the slave replicas before the transaction finishes.
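The flow described above can be illustrated with a minimal sketch. It is not Doris code: the `Replica` class, `write`, `pull_segment`, and `single_replica_load` are hypothetical names introduced only for illustration, a sorted tuple stands in for a segment file, and the sketch assumes every slave must hold the segment before the transaction finishes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for one replica of a tablet."""
    backend_id: int
    segments: list = field(default_factory=list)
    sort_count: int = 0  # how many times this replica ran the expensive write path

    def write(self, rows):
        """Expensive path: buffer rows (MemTable), sort, flush to a segment."""
        self.sort_count += 1
        segment = tuple(sorted(rows))
        self.segments.append(segment)
        return segment

    def pull_segment(self, segment):
        """Cheap path: copy an already-built segment from the master."""
        self.segments.append(segment)


def single_replica_load(master, slaves, rows):
    """Write on the master only, then sync the segment to every slave."""
    segment = master.write(rows)
    for slave in slaves:
        slave.pull_segment(segment)
    # Assumption: the transaction finishes only once all replicas hold the segment.
    return all(segment in r.segments for r in [master, *slaves])


master = Replica(backend_id=1)
slaves = [Replica(backend_id=2), Replica(backend_id=3)]
committed = single_replica_load(master, slaves, [3, 1, 2])
```

In this sketch the sort runs once, on the master, while the slaves only copy the finished segment. That is the source of the CPU and memory saving the proposal aims for.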

1. Basic Architecture

The basic architecture is as follows:

[Figure: basic architecture]


2. Replica Synchronization Mechanism

Scheduling

  • Generate the execution plan (including master replica selection)
  • Distribute data to the master replica
  • Segment download and rowset metadata compatibility
  • Bring the rowset transferred from the master replica into the txn_manager of the slave replica
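As an illustration of the first scheduling step, the sketch below picks one master replica per tablet while generating a plan. The proposal does not specify a selection policy, so the least-loaded rule used here is an assumption, and `choose_masters` and the plan layout are hypothetical names rather than Doris APIs.

```python
from collections import Counter


def choose_masters(tablet_replicas):
    """Pick one master backend per tablet; remaining replicas become slaves.

    tablet_replicas maps tablet_id -> list of backend ids hosting a replica.
    The least-loaded policy here is an assumption, not taken from the proposal.
    """
    load = Counter()  # number of master replicas assigned to each backend so far
    plan = {}
    for tablet_id in sorted(tablet_replicas):
        backends = tablet_replicas[tablet_id]
        # Fewest masters so far wins; backend id breaks ties deterministically.
        master = min(backends, key=lambda b: (load[b], b))
        load[master] += 1
        plan[tablet_id] = {
            "master": master,
            "slaves": [b for b in backends if b != master],
        }
    return plan


plan = choose_masters({
    101: [1, 2, 3],
    102: [1, 2, 3],
    103: [1, 2, 3],
})
```

Spreading master replicas across backends keeps the expensive write path from concentrating on a single node. Data would then be routed to each tablet's `master` entry, and the `slaves` entries identify which replicas need to download segments afterwards.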

