This page is meant as a template for writing a DSIP.

Status

Current state: [One of "Under Discussion", "Accepted", "Rejected"]

Discussion thread: 

JIRA or Github Issue: 

Released: <Doris Version>

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Cloud object storage is cheaper than multi-replica local storage, so moving cold data to S3 lets Doris store much more data at a lower price. More generally, Doris should not lose any feature as a result of moving cold data to S3.

Related Research

An existing implementation migrates data to S3: https://github.com/apache/incubator-doris/pull/9197. It migrates all of a tablet's data to S3, and once a tablet has been migrated, it can no longer be written to.

Detailed Design

This proposal aims to store cold rowsets in S3 without losing any feature, such as updates and schema changes. The work can be divided into four parts.

Policy

The current implementation provides cooldown and remote_cooldown. Cooldown migrates a partition from SSD to HDD, while remote_cooldown migrates a partition from local storage to S3.

Users can specify storage_cooldown_time for a table with either a CREATE TABLE statement or an ALTER TABLE statement.

CREATE TABLE example_db.table_name
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048) REPLACE,
    v2 SMALLINT SUM DEFAULT "10"
)
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2015-06-04 00:00:00"
);

ALTER TABLE example_db.table_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");


Users can also specify storage_cooldown_time for a partition with a MODIFY PARTITION statement (partition_name below is a placeholder for the target partition).


ALTER TABLE example_db.table_name MODIFY PARTITION partition_name SET ("storage_cooldown_time" = "2015-06-04 00:00:00");
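
After setting a cooldown time, it can help to confirm what each partition actually recorded. The statement below is a minimal sketch, assuming the same illustrative example_db.table_name used in the statements above. SHOW PARTITIONS is an existing Doris statement; its output is expected to include each partition's storage medium and cooldown time, though the exact column names may differ between versions.

```sql
-- Assumes the illustrative example_db.table_name table from the statements above.
-- Lists per-partition metadata, which should include the storage medium
-- and the cooldown time configured for each partition.
SHOW PARTITIONS FROM example_db.table_name;
```

Comparing the reported cooldown time with the value passed to storage_cooldown_time is a quick way to see whether a table-level or partition-level setting took effect.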


Decision

Action


Result

Scheduling

Specific implementation steps and approximate scheduling.
