...

  • The query layer has FileReader and FileWriter, with separate implementations for HDFS, S3, Broker, and Local files (see Fig. 1 and Fig. 2).
  • The storage layer has a BlockManager that abstracts blocks, with WriteableFileBlock and ReadableFileBlock (see Fig. 3).
  • For directory management, there is an Env interface that covers directory operations, implemented by RemoteEnv and PosixEnv. BlockManager also provides some file operations, such as linking files and deleting blocks. In addition, for S3 and HDFS, classes such as S3StorageBackend provide file and directory operations, including mkdir, copy, and rm.

Fig. 1 FileReader

Fig. 2 FileWriter

Fig. 3 BlockManager

Having so many different ways to operate on files causes the following problems:

...

Some research related to this function, such as the advantages and disadvantages of the design, related considerations, etc.

Detailed Design

...

According to DSIP-010: Cooldown Data to S3, although the cooldown policy is set at the Table/Partition level, the BE uploads data at the rowset level, so each rowset needs to be assigned its own FileSystem. When a rowset is read or written, its FileSystem creates the appropriate FileReader or FileWriter to perform IO on the corresponding storage medium. Remote file systems can implement a local cache on SSD at different granularities (not only at the Segment level), and rowsets are not aware of these local caches.
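A minimal C++ sketch of the per-rowset FileSystem idea, for illustration only. All names here (FileSystem, LocalFileSystem, InMemoryRemoteFileSystem, CachedFileSystem, open_file, create_file, read_all, write_segment, read_segment) are hypothetical and are not Doris's actual classes or signatures. An in-memory map stands in for a remote store such as S3, and a second map stands in for the SSD cache; the point it shows is that a Rowset only talks to its FileSystem and cannot tell whether a read was served from local disk, the remote store, or the cache.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interfaces; names are illustrative, not Doris's actual API.
struct FileReader {
    virtual ~FileReader() = default;
    virtual std::string read_all() = 0;
};
struct FileWriter {
    virtual ~FileWriter() = default;
    virtual void append(const std::string& data) = 0;
    virtual void close() = 0;
};
// One FileSystem per storage medium; it is the factory for readers and writers.
struct FileSystem {
    virtual ~FileSystem() = default;
    virtual std::unique_ptr<FileReader> open_file(const std::string& path) = 0;
    virtual std::unique_ptr<FileWriter> create_file(const std::string& path) = 0;
};

// Local disk: thin wrappers over std::ifstream / std::ofstream.
class LocalFileSystem : public FileSystem {
    struct Reader : FileReader {
        std::string path;
        explicit Reader(std::string p) : path(std::move(p)) {}
        std::string read_all() override {
            std::ifstream in(path, std::ios::binary);
            return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        }
    };
    struct Writer : FileWriter {
        std::ofstream out;
        explicit Writer(const std::string& p) : out(p, std::ios::binary) {}
        void append(const std::string& data) override { out << data; }
        void close() override { out.close(); }
    };
public:
    std::unique_ptr<FileReader> open_file(const std::string& p) override { return std::make_unique<Reader>(p); }
    std::unique_ptr<FileWriter> create_file(const std::string& p) override { return std::make_unique<Writer>(p); }
};
// Hypothetical stand-in for a remote store such as S3 (an in-memory map).
class InMemoryRemoteFileSystem : public FileSystem {
    std::map<std::string, std::string> objects_;
    struct Reader : FileReader {
        InMemoryRemoteFileSystem* fs; std::string path;
        Reader(InMemoryRemoteFileSystem* f, std::string p) : fs(f), path(std::move(p)) {}
        std::string read_all() override { ++fs->remote_reads; return fs->objects_.at(path); }
    };
    struct Writer : FileWriter {
        InMemoryRemoteFileSystem* fs; std::string path, buffer;
        Writer(InMemoryRemoteFileSystem* f, std::string p) : fs(f), path(std::move(p)) {}
        void append(const std::string& data) override { buffer += data; }
        void close() override { fs->objects_[path] = buffer; }  // object visible only after "upload"
    };
public:
    int remote_reads = 0;  // reads that actually reached the "remote" store
    std::unique_ptr<FileReader> open_file(const std::string& p) override { return std::make_unique<Reader>(this, p); }
    std::unique_ptr<FileWriter> create_file(const std::string& p) override { return std::make_unique<Writer>(this, p); }
};
// Caching decorator: wraps a remote FileSystem; a map stands in for the SSD cache.
class CachedFileSystem : public FileSystem {
    std::shared_ptr<FileSystem> remote_;
    std::map<std::string, std::string> cache_;
    struct Reader : FileReader {
        CachedFileSystem* fs; std::string path;
        Reader(CachedFileSystem* f, std::string p) : fs(f), path(std::move(p)) {}
        std::string read_all() override {
            auto it = fs->cache_.find(path);
            if (it != fs->cache_.end()) return it->second;  // cache hit: no remote IO
            std::string data = fs->remote_->open_file(path)->read_all();
            fs->cache_.emplace(path, data);
            return data;
        }
    };
public:
    explicit CachedFileSystem(std::shared_ptr<FileSystem> remote) : remote_(std::move(remote)) {}
    std::unique_ptr<FileReader> open_file(const std::string& p) override { return std::make_unique<Reader>(this, p); }
    std::unique_ptr<FileWriter> create_file(const std::string& p) override {
        cache_.erase(p);  // drop any stale cached copy before the file is rewritten
        return remote_->create_file(p);
    }
};
// A rowset holds only its FileSystem; it cannot tell local, remote, or cached IO apart.
class Rowset {
public:
    Rowset(std::string id, std::shared_ptr<FileSystem> fs) : id_(std::move(id)), fs_(std::move(fs)) {}
    void write_segment(int seg, const std::string& data) {
        auto writer = fs_->create_file(segment_path(seg));
        writer->append(data);
        writer->close();
    }
    std::string read_segment(int seg) { return fs_->open_file(segment_path(seg))->read_all(); }
private:
    std::string segment_path(int seg) const { return id_ + "_" + std::to_string(seg) + ".dat"; }
    std::string id_;
    std::shared_ptr<FileSystem> fs_;
};
```

Wrapping the remote FileSystem in a caching decorator is only one way to keep the cache invisible to rowsets. This sketch caches whole files, whereas a real cache could work at other granularities, as noted above.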

Scheduling


If we change the IO interface directly, it will affect many places, so I will divide the work into two steps:

1. Rewrite the IO stack in entirely new files and leave the current implementations alone, to make reviewing easier.
2. Use the new IO stack to replace current calls.