
Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/ht0m6d4rzsjwy0hd9sdmf4q6w74m93yf

JIRA or Github Issue: https://github.com/apache/incubator-doris/issues/8025

Released: 0.15.0

Google Doc

Motivation

Simplify the use of binlog load and shorten the data flow from MySQL to Doris.

Related Research

Debezium is a mature library, used by Flink CDC, that can acquire and parse the MySQL binlog. It acquires the binlog by posing as a MySQL slave and translates it into a readable format.

ClickHouse provides a similar feature, which is useful for new users who need HTAP.

The advantage of this feature is that it is user friendly: a new user who has never used any big data service can start with OLAP simply by deploying a Doris service.

The main disadvantage is that the feature consumes FE resources, which are limited.

Detailed Design

1. Scheme

A user can create a Sync Job of type 'debezium', and FE will start to acquire the binlog from the specified MySQL instance. Received binlog events are put into a blocking queue, and a consumer periodically sends the data to BE.
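The queue-based hand-off described above can be sketched as follows. This is a minimal illustration in Python, not the FE implementation: the names `produce`, `consume`, `run_pipeline`, and `BATCH_SIZE` are hypothetical, the list of strings stands in for events received from Debezium, and `send_batch` stands in for the call that ships data to BE. Batching by size is also an assumption; the actual consumer may flush on a timer instead.

```python
import queue
import threading

BATCH_SIZE = 3      # hypothetical batch size, chosen only for the example
_STOP = object()    # sentinel that marks the end of the event stream


def produce(events, q):
    """Stand-in for the binlog reader: put() blocks while the queue is full."""
    for event in events:
        q.put(event)
    q.put(_STOP)


def consume(q, send_batch):
    """Take events off the queue and pass them on in batches."""
    batch = []
    while True:
        event = q.get()          # blocks until an event is available
        if event is _STOP:
            break
        batch.append(event)
        if len(batch) >= BATCH_SIZE:
            send_batch(batch)
            batch = []
    if batch:                    # flush the final partial batch
        send_batch(batch)


def run_pipeline(events, send_batch, capacity=2):
    """Run the producer in a thread and the consumer in the caller's thread."""
    q = queue.Queue(maxsize=capacity)   # bounded queue gives back-pressure
    producer = threading.Thread(target=produce, args=(events, q))
    producer.start()
    consume(q, send_batch)
    producer.join()


# Usage: collect the batches that would be sent to BE.
sent = []
run_pipeline([f"event-{i}" for i in range(7)], sent.append)
```

Because the queue is bounded, a slow consumer makes the producer block instead of letting events accumulate in FE memory, which matters given the FE resource concern noted earlier.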

2. Fault Tolerant

The binlog consumption position is stored in FE metadata and updated each time a batch of data is committed.

If FE is restarted, the sync job resumes consuming the binlog from the position of the last committed batch.
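The checkpoint-and-resume behavior can be illustrated with a small sketch. Everything here is an assumption made for the example: `run_sync_job` is a hypothetical function, a Python dict stands in for FE metadata, a list stands in for the binlog, and `crash_after` simulates an FE restart. The position is an index into the list rather than a real binlog file name and offset.

```python
def run_sync_job(binlog, metadata, sink, batch_size, crash_after=None):
    """Consume `binlog` from the saved position, committing batch by batch.

    `metadata` plays the role of FE metadata; `sink` collects committed rows.
    If `crash_after` is set, the job raises after that many committed batches.
    """
    position = metadata.get("position", 0)   # resume from the last checkpoint
    batches = 0
    while position < len(binlog):
        if crash_after is not None and batches == crash_after:
            raise RuntimeError("simulated FE restart")
        batch = binlog[position:position + batch_size]
        sink.extend(batch)                   # commit the batch
        position += len(batch)
        metadata["position"] = position      # checkpoint only after the commit
        batches += 1


binlog = [f"event-{i}" for i in range(7)]
metadata = {}
sink = []

# First run stops after one committed batch.
try:
    run_sync_job(binlog, metadata, sink, batch_size=3, crash_after=1)
except RuntimeError:
    pass

# Second run resumes from the stored position, not from the beginning.
run_sync_job(binlog, metadata, sink, batch_size=3)
```

In this sketch the position is written only after the batch is committed, so a restart never skips data. A real implementation would also need the commit and the position update to be atomic, or the load to be idempotent, to avoid replaying a batch whose commit succeeded but whose checkpoint was lost.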

3. Usage

The prerequisite for this feature is MySQL version 5.7 or above, so that binlog is supported.

A simple example for creating a Sync Job is:

Scheduling

I prefer to develop the feature in two phases.

Phase I: I'll complete the functional code so that the feature can be used.

Phase II: I'll start optimizing the performance of this feature.
