Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/953fzz19pttr3nv4syt3br88lymgjfz0

JIRA or Github Issue: https://github.com/apache/incubator-doris/issues/9847

Released: <Doris Version>

Motivation

During the load process, the same resource-intensive operations, such as sorting and aggregation, are performed on every replica. As a result, concurrent data loads consume a large amount of CPU and memory.

Related Research

Advantages:

Reduces CPU and memory usage during load.

Improves load concurrency.

Disadvantage:

The load result depends heavily on the master replica, so fault tolerance is reduced.

Detailed Design

During the load process, one replica of each tablet is chosen as the master and the other replicas act as slaves. The write process (writing data into the MemTable and flushing it) is performed only on the master replica, and the resulting segment files are synchronized to the slave replicas before the transaction finishes.
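The toy model below illustrates the idea: sorting and aggregation run once on the master, and the slaves only receive copies of the finished segment. It is a sketch under stated assumptions, not the Doris implementation. The master-selection policy (rotating the master role over alive replicas by tablet id so masters spread across backends), the SUM aggregation, and every name (`Replica`, `Tablet`, `choose_master`, `write_on_master`, `sync_to_slaves`) are hypothetical. A "segment" here is simply a sorted list of key/value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    replica_id: int
    backend_id: int
    alive: bool = True
    segments: list = field(default_factory=list)  # segment "files" held by this replica


@dataclass
class Tablet:
    tablet_id: int
    replicas: list


def choose_master(tablet: Tablet) -> Replica:
    """Hypothetical policy: rotate the master role over alive replicas by tablet id."""
    alive = sorted((r for r in tablet.replicas if r.alive), key=lambda r: r.replica_id)
    if not alive:
        raise RuntimeError(f"tablet {tablet.tablet_id} has no alive replica")
    return alive[tablet.tablet_id % len(alive)]


def write_on_master(master: Replica, rows):
    """Aggregate (SUM by key) and sort once in a MemTable, then 'flush' one segment."""
    memtable = {}
    for key, value in rows:
        memtable[key] = memtable.get(key, 0) + value
    segment = sorted(memtable.items())
    master.segments.append(segment)
    return segment


def sync_to_slaves(tablet: Tablet, master: Replica, segment):
    """Slaves receive a copy of the finished segment instead of redoing the work."""
    for r in tablet.replicas:
        if r is not master and r.alive:
            r.segments.append(list(segment))


tablet = Tablet(7, [Replica(1, 10001), Replica(2, 10002), Replica(3, 10003)])
master = choose_master(tablet)
seg = write_on_master(master, [("b", 2), ("a", 1), ("b", 3)])
sync_to_slaves(tablet, master, seg)
print(master.replica_id, seg)  # prints: 2 [('a', 1), ('b', 5)]
```

With three replicas, the sort and aggregation cost is paid once instead of three times. The trade-off is the one listed under disadvantages: if the master fails mid-load, no other replica holds a partial result to fall back on.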

1. Basic Architecture

The basic architecture is as follows:

2. Replica Synchronization Mechanism

Scheduling

Generating the execution plan (including master replica selection)
Distributing data to the master replica
Segment download and rowset metadata compatibility
Bringing the rowset transferred from the master replica into the txn_manager of the slave replica
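The slave-side steps above can be sketched as follows. The slave downloads the segments named in the master's rowset metadata, checks them, rewrites backend-local metadata, and registers the rowset in its txn_manager under the same transaction. This is a hedged illustration, not Doris code. The CRC32 per-segment checksum, the `RowsetMeta` fields, the idea that "metadata compatibility" means assigning a backend-local rowset id, and all names (`TxnManager.add_txn_rowset`, `SlaveReplica.pull_rowset`, `all_slaves_ready`) are assumptions made for the example.

```python
import zlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RowsetMeta:
    rowset_id: str
    tablet_id: int
    txn_id: int
    segment_checksums: tuple  # hypothetical: one CRC32 per segment file


class TxnManager:
    """Toy stand-in for a slave's txn_manager: (txn_id, tablet_id) -> rowset."""

    def __init__(self):
        self.pending = {}

    def add_txn_rowset(self, meta, segments):
        key = (meta.txn_id, meta.tablet_id)
        if key in self.pending:
            raise ValueError(f"rowset already registered for {key}")
        self.pending[key] = (meta, segments)


class SlaveReplica:
    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.txn_manager = TxnManager()
        self._next_local_id = 0

    def pull_rowset(self, master_meta, fetch_segment):
        # 1. Download every segment listed in the master's rowset meta.
        segments = [fetch_segment(i) for i in range(len(master_meta.segment_checksums))]
        # 2. Verify each file against the checksum reported by the master.
        for i, (data, expected) in enumerate(zip(segments, master_meta.segment_checksums)):
            if zlib.crc32(data) != expected:
                raise OSError(f"segment {i} corrupted in transfer")
        # 3. Rewrite backend-local fields (here: rowset id) so the meta is valid locally.
        self._next_local_id += 1
        local_meta = replace(master_meta, rowset_id=f"{self.backend_id}-{self._next_local_id}")
        # 4. Register the rowset under the same txn so the later publish step finds it.
        self.txn_manager.add_txn_rowset(local_meta, segments)
        return local_meta


def all_slaves_ready(slaves, txn_id, tablet_id):
    """The load succeeds only once every slave holds the rowset for this txn."""
    return all((txn_id, tablet_id) in s.txn_manager.pending for s in slaves)


master_files = [b"segment-0 bytes", b"segment-1 bytes"]
meta = RowsetMeta("10001-42", tablet_id=7, txn_id=1001,
                  segment_checksums=tuple(zlib.crc32(f) for f in master_files))
slaves = [SlaveReplica(10002), SlaveReplica(10003)]
for s in slaves:
    print(s.pull_rowset(meta, lambda i: master_files[i]).rowset_id)
print(all_slaves_ready(slaves, 1001, 7))
```

Verifying the segments before registering them keeps a corrupted transfer out of the txn_manager, so the transaction fails instead of publishing an inconsistent replica.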


DSIP-015: Support single replica load for load