...

We want to achieve the following 3 goals:

  1. Guarantee once data is accepted, it is eventually imported successfully

...



Now, let's review the previous 3 goals again:

  1. Guarantee once data is accepted, it is eventually imported successfully.
    • Since WAL is stored as a single replica in the current design, there is a risk of data loss.

    • Compatibility with Schema Change:

      1. For each insert, we first begin a transaction. This blocks schema changes and ensures the import succeeds.

      2. If the schema has changed when Load Be is restarted, the data in the WAL may fail to load (using the SchemaHash to detect such mismatches is worth considering).

  2. Data import is not duplicated (YES)
  3. Commit order is the same as user write order
    • The commit order cannot be guaranteed if Load Be1 goes down and cannot be restarted: a new Load Be2 may commit first, and when Load Be1 later recovers its WAL, the older data ends up with a newer version. (Consider a seq_id or some other solution.)

    • The order between conditional deletion and auto_batch_load is not guaranteed. (Perhaps conditional deletion statements can also be handled in auto_batch_load mode.)

    • Currently, only insert statements that contain all columns are supported; supporting inserts that specify only a subset of columns is worth considering.
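The seq_id idea mentioned above for goal 3 can be sketched as follows. This is a minimal, hypothetical illustration (the names `apply_commit` and the `(key, seq_id, value)` row shape are assumptions, not the actual design): each accepted row is stamped with a monotonically increasing seq_id, and on conflict the row with the larger seq_id wins, so a WAL replayed late by a restarted Load Be1 cannot overwrite newer data already committed by Load Be2.

```python
# Hypothetical sketch: resolving out-of-order WAL recovery with a seq_id.
# Rows are (key, seq_id, value); the largest seq_id per key wins.

def apply_commit(table, rows):
    """Apply committed rows, keeping only the newest seq_id per key."""
    for key, seq_id, value in rows:
        current = table.get(key)
        if current is None or seq_id > current[0]:
            table[key] = (seq_id, value)

table = {}
# Load Be2 commits first with newer data (higher seq_id).
apply_commit(table, [("k1", 2, "new")])
# Load Be1 later replays its WAL holding an older write (lower seq_id).
apply_commit(table, [("k1", 1, "old")])
assert table["k1"] == (2, "new")  # the stale replay does not win
```

The point of the sketch is only that version assignment follows the seq_id taken at accept time rather than the commit time, which restores the user's write order even when recovery commits happen out of order.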

...

FAQ

  1. In what scenarios should it be used?
    • Currently, we use it in log-writing scenarios.

  2. How to use it?
    • Enable auto_batch_load for the table.
    • The client uses "insert into ... values (), (), ()" statements to write data.
  3. How does it work with stream load?
    • The commit order cannot be guaranteed.
  4. Why not use "begin... insert... commit"?
    • From the user's view, "begin... insert... commit" ensures the atomicity of multiple rows. Many use cases do not require this capability and can use insert directly.

    • "begin... insert... commit" requires the user to control the commits, and each commit is associated with a transaction and a RowSet, so commits cannot be too frequent; auto_batch_load can commit data from multiple clients in one batch.

    • Data may be lost if users don't commit with "begin... insert... commit"; auto_batch_load pre-writes a WAL to ensure that the data is eventually loaded.

  5. What is the performance impact of so many RPCs?
    • Many KV systems are used this way, and the RPC overhead has little impact.

    • In our tests, Fe can parse and redirect 20,000+ SQL statements per second. At the same time, Fe is scalable.
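The contrast drawn in FAQ 4 can be illustrated with a small sketch. This is a hypothetical simplification (the `BatchLoader` class and its batch_size threshold are assumptions for illustration): inserts from many clients accumulate in a shared batch, and one background flush commits the whole batch in a single transaction, so individual clients never issue "begin... insert... commit" themselves.

```python
# Hypothetical sketch of the group-commit idea behind auto_batch_load:
# many small inserts share one transaction/RowSet per batch.

class BatchLoader:
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []    # rows accepted (WAL-persisted) but not committed
        self.committed = []  # each entry models one committed transaction

    def insert(self, row):
        # In the real design the row is written to the WAL first, which is
        # what guarantees it is eventually loaded even if the process dies.
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One transaction/RowSet for the whole batch, not per insert.
            self.committed.append(list(self.pending))
            self.pending.clear()

loader = BatchLoader(batch_size=3)
for row in ["a", "b", "c", "d"]:
    loader.insert(row)
loader.flush()
assert loader.committed == [["a", "b", "c"], ["d"]]
```

Compared with user-controlled "begin... insert... commit", the commit frequency here is bounded by the batch size (or, in practice, a timer) rather than by each client's behavior.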

...