...

```sql
LOAD DATA
    [LOCAL]
    INFILE 'file_name'
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
    ]
    [LINES
        [TERMINATED BY 'string']
    ]
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var
        [, col_name_or_user_var] ...)]
    [SET (col_name={expr | DEFAULT}
        [, col_name={expr | DEFAULT}] ...)]
    [PROPERTIES (key1 = value1 [, key2 = value2] ...)]
```
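
A minimal usage sketch of the syntax above. The database `example_db`, table `example_tbl`, columns `k1`, `k2`, `v1` and both file paths are hypothetical. With LOCAL, the file is read from the machine running the MySQL client; without it, the path is resolved on the server side (presumably the FE node that received the statement). Note that the standard mysql command-line client usually rejects LOAD DATA LOCAL unless it was started with `--local-infile=1`.

```sql
-- Hypothetical table and paths, for illustration only.

-- Load a file that lives on the client machine (LOCAL).
LOAD DATA LOCAL
    INFILE '/home/user/data/example.csv'
    INTO TABLE example_db.example_tbl
    COLUMNS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    (k1, k2, v1);

-- Load a file that lives on the server (no LOCAL).
LOAD DATA
    INFILE '/data/import/example.csv'
    INTO TABLE example_db.example_tbl
    COLUMNS TERMINATED BY ','
    (k1, k2, v1);
```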

This syntax is very similar to BrokerLoad, but BrokerLoad requires a label definition and MySQL Load does not.
Also, BrokerLoad is an asynchronous statement, whereas MySQL Load is synchronous: the statement returns only after the load has finished.
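
For comparison, a sketch modeled on the Doris BrokerLoad syntax: the statement must name a label and only submits a job, whose state is then polled by that label with SHOW LOAD. A MySQL Load statement needs neither. The database, table, HDFS address, broker name and credentials are placeholders.

```sql
-- BrokerLoad: asynchronous, identified by a label (placeholders throughout).
LOAD LABEL example_db.example_label
(
    DATA INFILE("hdfs://hdfs_host:8020/input/example.csv")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
)
WITH BROKER "broker_name"
(
    "username" = "hdfs_user",
    "password" = "hdfs_password"
);

-- The statement above only submits the job; its state is checked later by label.
SHOW LOAD FROM example_db WHERE LABEL = "example_label";

-- MySQL Load: no label; the statement returns only once the load has finished.
LOAD DATA LOCAL
    INFILE '/home/user/data/example.csv'
    INTO TABLE example_db.example_tbl
    COLUMNS TERMINATED BY ',';
```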

...


To reduce disk usage on the FE node, the FE does not store the file. Instead, it uses a producer-consumer blocking network stream: it receives bytes from the client and forwards them to the BE node as they arrive.

Authentication

Doris currently has two authentication mechanisms: MySQL clients authenticate through the MySQL protocol, and HTTP requests authenticate with HTTP "Basic auth".

So if the FE wants to send a stream load request to a BE, it still needs a user name and password to authenticate.

This is unacceptable because the user has already authenticated through the MySQL client. We call this the "Double Auth" issue.

To avoid this issue, I propose using the cluster token as a whitelist credential for stream load authentication on the BE.

The cluster token is unique, and only an admin user who can log in to the FE machine can obtain it, so this approach is safe in this case.

IGNORE Lines

Doris currently supports skipping 1 or 2 lines only when reading CSV files in the specific formats csv_with_names and csv_with_names_and_types.

This is not enough for the MySQL Load syntax, which should support ignoring an arbitrary number of lines in any CSV file.
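
A sketch of the intended usage, assuming a hypothetical plain CSV file whose first 3 lines (for example, comments plus a header row) should not be loaded. Unlike csv_with_names and csv_with_names_and_types, the number of skipped lines is arbitrary and the file stays in ordinary CSV format. Per the syntax above, LINES and ROWS are interchangeable. Table, columns and path are hypothetical.

```sql
-- Skip the first 3 lines of a plain CSV file (hypothetical table and path).
LOAD DATA LOCAL
    INFILE '/home/user/data/report_with_header.csv'
    INTO TABLE example_db.example_tbl
    COLUMNS TERMINATED BY ','
    IGNORE 3 LINES
    (k1, k2, v1);

-- Equivalent form using ROWS instead of LINES.
LOAD DATA LOCAL
    INFILE '/home/user/data/report_with_header.csv'
    INTO TABLE example_db.example_tbl
    COLUMNS TERMINATED BY ','
    IGNORE 3 ROWS
    (k1, k2, v1);
```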


Scheduling

This feature is expected to take about 2 months to finish:

  1. Implement the LOAD DATA syntax and support loading server-local files into Doris (3 weeks); then support loading client-local files into Doris (2 weeks).
  2. Avoid double authentication for stream load.
  3. Support IGNORE number {LINES | ROWS} in stream load (2 weeks); then support [REPLACE | IGNORE] in stream load (2 weeks).

Reference

  1. MySQL Syntax
  2. MySQL Network Protocol
  3. TiDB Network Protocol Implementation