
Status

Current state: [One of "Under Discussion", "Accepted", "Rejected"]

...

There is an existing implementation for migrating data to S3: https://github.com/apache/incubator-doris/pull/9197.

The implementation migrates the whole data of a tablet to S3, and once a tablet is migrated, it can no longer be written to. The specific approach of this implementation is:
1. Use schema change to generate cold-data migration jobs to S3. The job is at the partition level.
2. Complete the data migration on the BE side using logic similar to schema change.

Advantage:
1. The implementation is simple and the progress is controllable.

Because it reuses the schema change logic, the entire process is ultimately controlled by the FE, atomicity is guaranteed at the partition level, and no intermediate state is produced.

Shortcoming:
1. Loading into cold data is not supported.
2. Schema change is not supported.

However, both features are strong user requirements. We need to implement hot-cold tiered storage without affecting the full functionality of Doris.

Detailed Design

The proposal aims to store cold rowsets in S3 without losing any features, such as updating and schema change. The whole work can be divided into four parts.

  1. Policy

...

  1. Policy: How to define and describe the hot and cold properties of data, such as cooling time, storage location, etc.
  2. Decision: How to execute a policy, such as who initiates data migration tasks, how to synchronize multiple replicas, etc.
  3. Execution: Interaction logic with S3, such as reading and writing in the IO stack.
  4. Result: The state of data after migration is completed, such as organization of storage paths, garbage cleaning, clone and deletion, compaction, etc.

1. Policy

Currently, Doris supports local data tiered storage, and its general approach is as follows:

  1. Local storage is divided into HDD and SSD, corresponding to cold storage and hot storage respectively.
  2. The cooling time can be set at the partition level. After expiration, FE migrates data from SSD to HDD through a storage migration task.

To ensure compatibility with the current logic and keep the code structure clear, we introduce an additional set of policies when implementing S3 storage.

Along with the previous strategy, the new tiered storage strategy is as follows:

  • Local
    • HDD
    • SSD
  • Remote(S3)

The Local tier is the current hierarchical storage implementation over HDD and SSD; the Remote tier refers to S3.

For the Local level, we keep the original strategy unchanged, that is, the partition-level hierarchical storage setting is still supported.

For the Remote level, we only support policy settings at the table level; however, the policy is still applied at partition granularity. This keeps the strategy simple.

StoragePolicy


There are cooldown and remote_cooldown in the current implementation. Cooldown is used to migrate a partition from SSD to HDD, while remote_cooldown is used to migrate a partition from local storage to S3.

Cooldown

storage_cooldown_time

Users can specify storage_cooldown_time for a table by either create table statement or alter table statement.

Code Block
CREATE TABLE example_db.table_name
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048) REPLACE,
    v2 SMALLINT SUM DEFAULT "10"
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2015-06-04 00:00:00"
);

ALTER TABLE example_db.table_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");

Users can also specify storage_cooldown_time for a partition via modify partition statement.

Code Block
ALTER TABLE example_db.table_name MODIFY PARTITION partition_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");

storage_cooldown_seconds

storage_cooldown_time works on absolute time, but sometimes users want to set a TTL-style time range on a partition or table instead. Hence storage_cooldown_seconds and dynamic_partition.hot_partition_num were introduced. storage_cooldown_seconds is a config in fe.conf; it is strange that it is not implemented as an attribute of a table or a partition.

Users can specify storage_cooldown_seconds in fe.conf. When the config is set, a partition on SSD cools down to HDD storage_cooldown_seconds seconds after the partition is created.
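As a sketch of this setting (the key name follows this document; the value shown is an arbitrary example, not a recommended default):

Code Block
# fe.conf
# a partition created on SSD cools down to HDD 259200 seconds (3 days) later
storage_cooldown_seconds = 259200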

dynamic_partition.hot_partition_num

https://doris.apache.org/advanced/partition/dynamic-partition.html#noun-interpretation
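As an illustrative sketch (the table name, columns, and property values here are hypothetical), dynamic_partition.hot_partition_num keeps only the newest partitions on SSD while older ones live on HDD:

Code Block
CREATE TABLE example_db.dynamic_table
(
    k1 DATE,
    v1 INT SUM
)
AGGREGATE KEY(k1)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32",
    -- the 2 newest partitions are stored on SSD; older ones on HDD
    "dynamic_partition.hot_partition_num" = "2"
);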

RemoteCooldown

remote_storage_cooldown_time is used to cool data down to S3.

StoragePolicy

This proposal introduces the storage policy. Users can create a storage policy and apply it to a table. When cooldown_datetime is specified, cooldown_ttl is ignored.

Code Block
CREATE RESOURCE "storage_policy_name"
PROPERTIES(
     "type"="storage_policy",
     "cooldown_datetime" = "2022-06-01", // time when data is transfter to medium
     "cooldown_ttl" = 1h,"1h", // data is transfter to medium after 1 hour
     "s3_*"resource" = "my_s3" // point to a s3 resource
);

CREATE TABLE example_db.table_hash
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048) REPLACE,
    v2 SMALLINT SUM DEFAULT "10"
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "sotrage_policy" = "storage_policy_name"
);


Users can modify cooldown_datetime, cooldown_ttl, s3_ak, and s3_sk of a storage policy; other attributes are not allowed to be modified. For simplicity, BE refreshes the storage policy periodically to pick up new AK/SK.

A storage policy can be applied to multiple tables, so a user can modify a single policy and have the change take effect on all of them.
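For example, a single change to a policy's TTL would take effect on every table referencing that policy (the ALTER RESOURCE form shown here is an assumption, mirroring the CREATE RESOURCE statement above):

Code Block
-- every table with "storage_policy" = "storage_policy_name" picks up the new TTL
ALTER RESOURCE "storage_policy_name" PROPERTIES(
    "cooldown_ttl" = "2h"
);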

2. Decision

A backend (BE) chooses rowsets according to the policy and migrates them to S3.

...