This page is meant as a template for writing a DSIP.

Status

Current state: [One of "Under Discussion", "Accepted", "Rejected"]

Discussion thread: 

JIRA or Github Issue: 

Released: <Doris Version>

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Cloud object storage is cheaper than multi-replica local storage, so moving cold data to S3 lets Doris store much more data at a lower price. More generally, Doris should not lose any feature as a result of putting cold data on S3.

Related Research

An existing implementation migrates data to S3: https://github.com/apache/incubator-doris/pull/9197. It migrates all the data of a tablet to S3, and once a tablet has been migrated, it can no longer be written to.

Detailed Design

This proposal aims to store cold rowsets in S3 without losing any feature, such as updates and schema changes. The work can be divided into four parts.

Policy

The current implementation has two mechanisms: cooldown and remote_cooldown. Cooldown migrates a partition from SSD to HDD, while remote_cooldown migrates a partition from local storage to S3.

Cooldown

storage_cooldown_time

Users can specify storage_cooldown_time for a table with either a CREATE TABLE statement or an ALTER TABLE statement.

CREATE TABLE example_db.table_name
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048) REPLACE,
    v2 SMALLINT SUM DEFAULT "10"
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2015-06-04 00:00:00"
);

ALTER TABLE example_db.table_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");


Users can also specify storage_cooldown_time for a single partition with a MODIFY PARTITION statement.

ALTER TABLE example_db.table_name MODIFY PARTITION partition_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");


storage_cooldown_seconds

storage_cooldown_time is an absolute time, but users sometimes want to set a TTL on a partition or table instead. This is what storage_cooldown_seconds and dynamic_partition.hot_partition_num are for. storage_cooldown_seconds is a config in fe.conf; oddly, it is not implemented as an attribute of a table or a partition.

Users can specify storage_cooldown_seconds in fe.conf. When the config is set, a partition on SSD is cooled down to HDD storage_cooldown_seconds seconds after the partition is created.

dynamic_partition.hot_partition_num

https://doris.apache.org/advanced/partition/dynamic-partition.html#noun-interpretation
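
As a rough illustration of the setting described on the linked page, the sketch below creates a dynamically partitioned table and sets dynamic_partition.hot_partition_num. The table name, columns, and all property values are hypothetical; only the property names come from the Doris dynamic partition documentation. The intent is that the most recent partitions are created on SSD and older ones on HDD; the linked page defines the exact semantics.

```sql
-- Sketch only: table name, columns, and values are placeholders.
CREATE TABLE example_db.dynamic_table_name
(
    k1 DATE,
    k2 INT,
    v1 INT SUM
)
AGGREGATE KEY(k1, k2)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k2) BUCKETS 32
PROPERTIES(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",   -- keep partitions from 3 days back
    "dynamic_partition.end" = "3",      -- pre-create partitions 3 days ahead
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32",
    "dynamic_partition.hot_partition_num" = "2"  -- number of recent partitions treated as hot
);
```

Unlike storage_cooldown_time, this setting is relative: as new daily partitions are created, the set of hot partitions moves forward with them.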

RemoteCooldown

remote_storage_cooldown_time controls when data is cooled down from local storage to S3.
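
A minimal sketch of how this property might be set at table creation is shown below. It assumes a companion property remote_storage_resource that points at a previously created S3 resource, as in the pull request cited under Related Research. That property name, the resource name remote_s3, the table name, and the timestamp are assumptions and are not specified on this page.

```sql
-- Sketch only: "remote_storage_resource" and "remote_s3" are assumptions;
-- this proposal names only remote_storage_cooldown_time.
CREATE TABLE example_db.remote_table_name
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048) REPLACE,
    v2 SMALLINT SUM DEFAULT "10"
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
    "remote_storage_resource" = "remote_s3",               -- hypothetical S3 resource name
    "remote_storage_cooldown_time" = "2022-12-01 00:00:00" -- absolute time at which data moves to S3
);
```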


StoragePolicy

CREATE RESOURCE "storage_policy_name"
PROPERTIES(
     "type" = "storage_policy",
     "cooldown_datetime" = "2022-06-01", -- time at which data is transferred to the medium
     "cooldown_ttl" = "1h", -- data is transferred to the medium after 1 hour
     "s3_*"
);


Decision

Action


Result

Scheduling

specific implementation steps and approximate scheduling.
