Status

Current state: [One of "Under Discussion", "Accepted", "Rejected"]

Discussion thread: https://lists.apache.org/thread/wrwytwyod1mm7b5pn5zc64d3boy613k1

JIRA or Github Issue:

Released: <Doris Version>

Google Doc: <If the design in question is unclear or needs to be discussed and reviewed, a Google Doc can be used first to facilitate comments from others.>

Motivation

Currently, there are various interfaces for file IO operations in Doris:

The query layer has FileReader and FileWriter, with corresponding implementations for HDFS, S3, Broker, and local files (see Fig. 1 and Fig. 2).
The storage layer has a BlockManager that abstracts blocks, with WriteableFileBlock and ReadableFileBlock (see Fig. 3).
For directory management, there is an Env interface that covers directory operations, with RemoteEnv and PosixEnv implementations; BlockManager also contains some file operations, such as linking files and deleting blocks. In addition, for S3 and HDFS, classes such as S3StorageBackend provide their own file and directory operations, including mkdir, copy, and rm.

Fig. 1 FileReader

Fig. 2 FileWriter

Fig. 3 BlockManager

Having so many ways to perform IO causes the following problems:

It is messy: it is often unclear which interface to use, and many functions are duplicated under different abstraction names.
Adding a feature or fixing a bug requires changes in multiple places. For example, if we want reads from S3 to go through a local cache, the cache has to be added to every S3 read path.

We need to unify the IO stack to make it clearer and more extensible. In fact, IO access can be roughly divided into the following three types:

Directory operations: creating files, deleting files, listing files, etc.
File write operations
File read operations

These APIs can then be implemented for different storage backends:

Local file
S3 file
HDFS file
Broker

Once implemented, the unified IO stack can be used in the storage layer (hot and cold data separation, separation of storage and compute), the query layer (querying S3, querying HDFS), backup and recovery, etc.

When a new kind of file system is introduced, we only need to implement a new derived class for it; no interface in the upper layers needs to be modified.
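The three kinds of operations above could be expressed as three small abstract classes, with one derived class per backend. The following is only a minimal sketch under assumptions: the method names (create_file, open_file, delete_file, list, append, read_at) and the bool/string return types are hypothetical and chosen for brevity; they are not the interface this proposal commits to. A local backend is shown because it can be built on the C++17 standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical write interface: sequential appends, then close.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual bool append(const std::string& data) = 0;
    virtual bool close() = 0;
};

// Hypothetical read interface: positional reads.
class FileReader {
public:
    virtual ~FileReader() = default;
    // Returns up to `len` bytes starting at `offset` (fewer at end of file).
    virtual std::string read_at(std::size_t offset, std::size_t len) = 0;
};

// Hypothetical directory/file management interface. Paths are relative to the root.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual std::unique_ptr<FileWriter> create_file(const std::string& path) = 0;
    virtual std::unique_ptr<FileReader> open_file(const std::string& path) = 0;
    virtual bool delete_file(const std::string& path) = 0;
    virtual std::vector<std::string> list(const std::string& dir) = 0;
};

class LocalFileWriter : public FileWriter {
public:
    explicit LocalFileWriter(const std::filesystem::path& p)
            : _out(p, std::ios::binary | std::ios::trunc) {}
    bool append(const std::string& data) override {
        _out.write(data.data(), static_cast<std::streamsize>(data.size()));
        return _out.good();
    }
    bool close() override {
        _out.close();
        return !_out.fail();
    }
private:
    std::ofstream _out;
};

class LocalFileReader : public FileReader {
public:
    explicit LocalFileReader(std::filesystem::path p) : _path(std::move(p)) {}
    std::string read_at(std::size_t offset, std::size_t len) override {
        std::ifstream in(_path, std::ios::binary);
        in.seekg(static_cast<std::streamoff>(offset));
        std::string buf(len, '\0');
        in.read(buf.data(), static_cast<std::streamsize>(len));
        buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only bytes actually read
        return buf;
    }
private:
    std::filesystem::path _path;
};

// One derived class per backend; an S3 or HDFS backend would implement the same four methods.
class LocalFileSystem : public FileSystem {
public:
    explicit LocalFileSystem(std::filesystem::path root) : _root(std::move(root)) {
        assert(!_root.empty());
    }
    std::unique_ptr<FileWriter> create_file(const std::string& path) override {
        std::filesystem::create_directories((_root / path).parent_path());
        return std::make_unique<LocalFileWriter>(_root / path);
    }
    std::unique_ptr<FileReader> open_file(const std::string& path) override {
        return std::make_unique<LocalFileReader>(_root / path);
    }
    bool delete_file(const std::string& path) override {
        std::error_code ec;
        return std::filesystem::remove(_root / path, ec);
    }
    std::vector<std::string> list(const std::string& dir) override {
        std::vector<std::string> names;
        std::error_code ec;
        for (const auto& e : std::filesystem::directory_iterator(_root / dir, ec)) {
            names.push_back(e.path().filename().string());
        }
        return names;
    }
private:
    std::filesystem::path _root;
};
```

Upper layers would hold only FileSystem, FileReader, and FileWriter pointers, so supporting a new backend means adding classes like the three Local ones above without touching the callers.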

Related Research

some research related to the function, such as the advantages and disadvantages of the design, related considerations, etc.

Detailed Design

According to DSIP-010: Cooldown Data to S3, although the cooldown policy is defined at the table/partition level, the BE uploads data at the rowset level, so each rowset needs to be assigned its own file system. When a rowset is read or written, its FileSystem creates the appropriate FileReader or FileWriter to support IO on different storage media. Remote file systems can implement a local cache on SSD at different granularities (not only at the segment level), and rowsets are not aware of these local caches.
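One way to keep the cache invisible to rowsets is to wrap the remote reader in another reader that implements the same interface. The sketch below is an illustration under assumptions, not the proposed implementation: FakeRemoteReader stands in for an S3 reader, the cache is an in-memory map rather than files on SSD, and the fixed block size, class names, and read_at signature are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical read interface shared by remote and cached readers.
class FileReader {
public:
    virtual ~FileReader() = default;
    virtual std::string read_at(std::size_t offset, std::size_t len) = 0;
};

// Stand-in for a remote reader; counts requests so cache hits are observable.
class FakeRemoteReader : public FileReader {
public:
    explicit FakeRemoteReader(std::string content) : _content(std::move(content)) {}
    std::string read_at(std::size_t offset, std::size_t len) override {
        ++requests;
        if (offset >= _content.size()) return {};
        return _content.substr(offset, len);
    }
    int requests = 0;
private:
    std::string _content;
};

// Caches fixed-size blocks of the remote file (a granularity finer than a whole segment).
class CachedFileReader : public FileReader {
public:
    CachedFileReader(std::shared_ptr<FileReader> remote, std::size_t block_size)
            : _remote(std::move(remote)), _block_size(block_size) {
        assert(block_size > 0);
    }
    std::string read_at(std::size_t offset, std::size_t len) override {
        std::string out;
        std::size_t pos = offset;
        const std::size_t end = offset + len;
        while (pos < end) {
            const std::size_t idx = pos / _block_size;
            const std::string& block = load_block(idx);
            const std::size_t in_block = pos - idx * _block_size;
            if (in_block >= block.size()) break;  // reached end of file
            const std::size_t n = std::min(end - pos, block.size() - in_block);
            out.append(block, in_block, n);
            pos += n;
        }
        return out;
    }
private:
    // Fetches a block from the remote reader only on the first access.
    const std::string& load_block(std::size_t idx) {
        auto it = _blocks.find(idx);
        if (it == _blocks.end()) {
            it = _blocks.emplace(idx, _remote->read_at(idx * _block_size, _block_size)).first;
        }
        return it->second;
    }
    std::shared_ptr<FileReader> _remote;
    std::size_t _block_size;
    std::map<std::size_t, std::string> _blocks;
};

// Rowset-side code sees only FileReader and cannot tell whether a cache is present.
std::string read_range(FileReader& reader, std::size_t offset, std::size_t len) {
    return reader.read_at(offset, len);
}
```

Because the caller only depends on FileReader, the file system can decide per file whether to hand out a plain remote reader or a cached one, and the cache granularity can change without affecting rowset code.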

Each file system instance is bound to a root path. Each LocalFileSystem instance corresponds to a local disk (aka DataDir). Each S3FileSystem instance corresponds to an S3 resource in FE, which contains the endpoint, ak, sk, bucket, etc.
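The root-path binding can be pictured as follows. This is a sketch under assumptions: the resolve method, the RowsetMeta struct, the segment file naming, the prefix field, and the s3://bucket/prefix form of the S3 root are hypothetical illustrations; only the endpoint, ak, sk, and bucket fields come from the text above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical mirror of an S3 resource defined in FE.
struct S3Resource {
    std::string endpoint;
    std::string ak;
    std::string sk;
    std::string bucket;
    std::string prefix;  // assumption: an optional key prefix inside the bucket
};

// Every file system instance is bound to exactly one root path.
class FileSystem {
public:
    explicit FileSystem(std::string root) : _root(std::move(root)) {}
    virtual ~FileSystem() = default;
    const std::string& root() const { return _root; }
    // Resolves a path relative to the root of this instance.
    std::string resolve(const std::string& relative) const {
        if (_root.empty() || _root.back() == '/') return _root + relative;
        return _root + "/" + relative;
    }
private:
    std::string _root;
};

// One instance per local disk (DataDir).
class LocalFileSystem : public FileSystem {
public:
    explicit LocalFileSystem(std::string data_dir) : FileSystem(std::move(data_dir)) {}
};

// One instance per S3 resource.
class S3FileSystem : public FileSystem {
public:
    explicit S3FileSystem(S3Resource res)
            : FileSystem("s3://" + res.bucket + "/" + res.prefix), _res(std::move(res)) {}
    const S3Resource& resource() const { return _res; }
private:
    S3Resource _res;
};

// Hypothetical rowset metadata: each rowset carries the file system it lives on.
struct RowsetMeta {
    std::string rowset_id;
    std::shared_ptr<FileSystem> fs;
    std::string segment_path(int segment) const {
        return fs->resolve(rowset_id + "_" + std::to_string(segment) + ".dat");
    }
};
```

With this shape, cooling a rowset down would amount to uploading its files and pointing its metadata at an S3FileSystem instance, while the code that computes segment paths stays the same.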

Scheduling

Replace BlockManager with FileSystem, ReadableBlock with FileReader, and WritableBlock with FileWriter, and remove the dependency on Env. Move some StorageBackend interfaces (upload, download, ...) to RemoteFileSystem.
Rewrite the rowset IO path to use the FileSystem, FileWriter, and FileReader interfaces.
Replace all Env- and FileUtils-related calls with the FileSystem interface.
Unify the query layer's FileReader and FileWriter with the FileReader and FileWriter of the file system layer.

DSIP-006: Refactor IO stack