Status
Current state: Under Discussion
Discussion thread: https://lists.apache.org/thread/953fzz19pttr3nv4syt3br88lymgjfz0
JIRA or Github Issue: https://github.com/apache/incubator-doris/issues/9847
Released: <Doris Version>
Motivation
During the load process, the same operations, such as sorting and aggregation, are performed on every replica, and these operations are resource-intensive. Concurrent data loads therefore consume a large amount of CPU and memory.
Related Research
Advantage:
- Reduce the usage of CPU and memory for load.
- Improve the concurrent capability of load.
...
- The load result depends strongly on the master replica, so fault tolerance is reduced.
Detailed Design
During the load process, one replica of each tablet is chosen as the master and the other replicas act as slaves. The write process (writing data into the MemTable and then flushing it) is performed only on the master replica, and the resulting segment files are synchronized to the slave replicas before the transaction finishes.
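The flow described above can be illustrated with a minimal sketch. This is not Doris code: the names `Replica`, `choose_master`, `write_on_master`, `sync_to_slaves` and `load_tablet` are hypothetical, the rotate-by-tablet-id selection policy is an assumption (the proposal does not specify how the master is picked), and a sorted tuple stands in for a flushed segment file.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    backend_id: int
    # Simulated segment files held by this replica.
    segments: list = field(default_factory=list)


def choose_master(tablet_id, replicas):
    """Pick one replica as master; the rest become slaves.

    Assumed policy: rotate by tablet id so masters spread across backends.
    """
    ordered = sorted(replicas, key=lambda r: r.backend_id)
    idx = tablet_id % len(ordered)
    return ordered[idx], ordered[:idx] + ordered[idx + 1:]


def write_on_master(master, rows):
    """Sort the rows (standing in for MemTable work) and flush one segment."""
    segment = tuple(sorted(rows))
    master.segments.append(segment)
    return segment


def sync_to_slaves(segment, slaves):
    """Copy the finished segment to each slave instead of re-sorting there."""
    for slave in slaves:
        slave.segments.append(segment)


def load_tablet(tablet_id, replicas, rows):
    master, slaves = choose_master(tablet_id, replicas)
    segment = write_on_master(master, rows)
    sync_to_slaves(segment, slaves)
    # The transaction may finish only once every replica holds the segment.
    return all(r.segments and r.segments[-1] == segment for r in replicas)


replicas = [Replica(10001), Replica(10002), Replica(10003)]
finished = load_tablet(7, replicas, [3, 1, 2])
```

In this sketch the sort runs once, on the master, and the slaves only receive the finished segment, which is where the CPU and memory savings would come from. It also makes the trade-off visible: if `write_on_master` fails, no replica has the data.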
1. Basic Architecture
The basic architecture is as follows:
2. Replica Synchronization Mechanism
Scheduling
- Generate the execution plan (including master replica selection)
- Distribute data to the master replica
- Segment download and rowset metadata compatibility
- Bring the rowset transferred from the master replica into the txn_manager of the slave replica
...