...

  • The query layer has FileReader and FileWriter interfaces, with corresponding implementations for HDFS, S3, Broker, and Local (see Fig. 1 and Fig. 2).
  • The storage layer has a BlockManager that abstracts blocks, with WriteableFileBlock and ReadableFileBlock (see Fig. 3).
  • For directory management, there is an Env interface that covers directory operations, implemented by RemoteEnv and PosixEnv; BlockManager also contains some operations for linking files and deleting blocks. In addition, for S3 and HDFS, classes such as S3StorageBackend provide their own file and directory operations, including mkdir, copy, and rm.


Fig. 1 FileReader


Fig. 2 FileWriter


Fig. 3 BlockManager

Having so many different ways to operate on files and directories causes the following problems:

...

some research related to the function, such as the advantages and disadvantages of the design, related considerations, etc.

Detailed Design

the detailed design of the function.

Scheduling

...

According to DSIP-010: Cooldown Data to S3, although the cooldown policy is defined at the Table/Partition level, data upload on the BE side is performed at the rowset level, so each rowset must be assigned its own file system. When a rowset is read or written, its FileSystem creates the appropriate FileReader or FileWriter to support IO on different storage media. Remote file systems can implement a local cache on SSD at different granularities (not only at the Segment level), and rowsets are not aware of these local caches.
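The relationship between a rowset and its file system can be illustrated with a minimal sketch. The class names FileSystem, FileReader, and FileWriter come from the text; every method signature, the InMemoryFileSystem stand-in, the Rowset shape, and the segment naming scheme are assumptions made for illustration and are not the actual BE interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal interfaces: names follow the text, signatures are assumed.
class FileWriter {
public:
    virtual ~FileWriter() = default;
    virtual void append(const std::string& data) = 0;
};

class FileReader {
public:
    virtual ~FileReader() = default;
    virtual std::string read_all() const = 0;
};

class FileSystem {
public:
    virtual ~FileSystem() = default;
    // Creates (or truncates) a file and returns a writer for it.
    virtual std::unique_ptr<FileWriter> create_file(const std::string& path) = 0;
    // Returns a reader, or nullptr if the file does not exist.
    virtual std::unique_ptr<FileReader> open_file(const std::string& path) = 0;
};

// In-memory stand-in for a real backend (local disk, S3, ...), illustration only.
class InMemoryFileSystem : public FileSystem {
public:
    std::unique_ptr<FileWriter> create_file(const std::string& path) override {
        std::string& buf = _files[path];
        buf.clear();
        return std::make_unique<Writer>(&buf);
    }
    std::unique_ptr<FileReader> open_file(const std::string& path) override {
        auto it = _files.find(path);
        if (it == _files.end()) return nullptr;
        return std::make_unique<Reader>(&it->second);
    }

private:
    struct Writer : FileWriter {
        explicit Writer(std::string* b) : buf(b) {}
        void append(const std::string& data) override { buf->append(data); }
        std::string* buf;
    };
    struct Reader : FileReader {
        explicit Reader(const std::string* b) : buf(b) {}
        std::string read_all() const override { return *buf; }
        const std::string* buf;
    };
    // std::map keeps element addresses stable, so the raw pointers above stay valid.
    std::map<std::string, std::string> _files;
};

// A rowset only knows its FileSystem; it never branches on the storage medium.
class Rowset {
public:
    Rowset(std::string id, std::shared_ptr<FileSystem> fs)
            : _id(std::move(id)), _fs(std::move(fs)) {
        assert(_fs != nullptr);
    }
    void write_segment(int seg, const std::string& data) {
        auto writer = _fs->create_file(segment_path(seg));
        writer->append(data);
    }
    std::string read_segment(int seg) const {
        auto reader = _fs->open_file(segment_path(seg));
        return reader ? reader->read_all() : std::string();
    }

private:
    // Hypothetical naming scheme, used only in this sketch.
    std::string segment_path(int seg) const {
        return _id + "_" + std::to_string(seg) + ".dat";
    }
    std::string _id;
    std::shared_ptr<FileSystem> _fs;
};
```

Because the rowset depends only on the abstract FileSystem, a remote implementation could add an SSD cache behind open_file without any change to Rowset, which is the sense in which rowsets are unaware of local caches.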


Each file system instance is bound to a root path. Each LocalFileSystem instance corresponds to a local disk (aka DataDir). Each S3FileSystem instance corresponds to an S3 Resource in FE, which contains the endpoint, ak, sk, bucket, etc.
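The root-path binding might look roughly like the following sketch. LocalFileSystem, S3FileSystem, and the endpoint/ak/sk/bucket fields are named in the text; the S3Conf struct, the join_path helper, the absolute_path and uri methods, and the "s3://" URI layout are hypothetical and only show how paths relative to a rowset could be resolved against the instance's root.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: joins two path parts with exactly one '/' between them.
inline std::string join_path(const std::string& root, const std::string& rel) {
    if (root.empty()) return rel;
    if (rel.empty()) return root;
    const bool root_slash = root.back() == '/';
    const bool rel_slash = rel.front() == '/';
    if (root_slash && rel_slash) return root + rel.substr(1);
    if (!root_slash && !rel_slash) return root + "/" + rel;
    return root + rel;
}

// One instance per local disk (DataDir); the root is the DataDir path.
class LocalFileSystem {
public:
    explicit LocalFileSystem(std::string data_dir) : _root(std::move(data_dir)) {}
    std::string absolute_path(const std::string& rel) const {
        return join_path(_root, rel);
    }

private:
    std::string _root;
};

// Assumed shape of the information carried by an S3 Resource in FE.
struct S3Conf {
    std::string endpoint;
    std::string ak;
    std::string sk;
    std::string bucket;
};

// One instance per S3 Resource; the root is a key prefix inside the bucket.
class S3FileSystem {
public:
    S3FileSystem(S3Conf conf, std::string root)
            : _conf(std::move(conf)), _root(std::move(root)) {
        assert(!_conf.bucket.empty());
    }
    std::string uri(const std::string& rel) const {
        return "s3://" + join_path(_conf.bucket, join_path(_root, rel));
    }

private:
    S3Conf _conf;
    std::string _root;
};
```

Callers pass only relative paths, so moving a rowset between disks or from local storage to S3 changes which instance it holds rather than the paths it uses.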

Scheduling

  1. Replace BlockManager with FileSystem, ReadableBlock with FileReader, and WritableBlock with FileWriter, and remove the dependency on Env. Move some StorageBackend interfaces (upload, download, ...) to RemoteFileSystem.
  2. Rewrite the rowset IO path to use the FileSystem, FileWriter, and FileReader interfaces.
  3. Replace all Env and FileUtils related calls with the FileSystem interface.
  4. Unify the query layer's FileReader and FileWriter with the file system layer's FileReader and FileWriter.

...
