...

5. Single remote replica mode

Doc in Chinese: 冷热数据单一副本模式详细设计.docx (detailed design of the single replica mode for cold and hot data)

Description

Data is currently divided into hot and cold parts: hot data is stored on the local disk and cold data is stored in the remote cluster. Because segments are independent between BEs, each replica uploads its own copy of the cold data, so the remote cluster holds multiple copies and the data is redundant. The single remote replica mode is proposed to solve this problem.

A single remote replica, as the name implies, means that only one copy of the data is kept on the remote cluster. No additional layer of replica management is added on top of the remote file system's own replica management, which greatly reduces the storage space occupied.
To coordinate the replicas, FE selects one of them as the upload node. A replica that is not selected only checks whether the data of the current segment has already been uploaded to the cluster and, if so, deletes its local data. The selected replica uploads the data files normally and also uploads the current tablet data to the cluster so that the other replicas can check against it.
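
The division of work between the selected and non-selected replicas can be sketched as follows. This is a minimal illustration, not the Doris implementation: `RemoteCluster`, `Replica` and `cooldown_step` are hypothetical names, segments are modelled as plain strings, and local deletion on the uploading replica is left out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteCluster:
    """Hypothetical stand-in for remote storage: the set of uploaded segments."""
    segments: set = field(default_factory=set)


@dataclass
class Replica:
    replica_id: int
    local_segments: set


def cooldown_step(replica, cooldown_replica_id, remote):
    """Run one cooldown pass; return the segments deleted from local disk."""
    if replica.replica_id == cooldown_replica_id:
        # Selected replica: upload the data so that exactly one copy
        # exists on the remote cluster.
        remote.segments |= replica.local_segments
        return set()
    # Non-selected replica: never upload. Only check which segments are
    # already on the remote cluster and delete those locally.
    deleted = replica.local_segments & remote.segments
    replica.local_segments -= deleted
    return deleted
```

A non-selected replica that runs before the upload finishes simply deletes nothing and retries on a later pass.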

Figure: OrganizationChart

Main parameters

FE:

a) Tablet state
  cooldown_replica_id=-1 (persistent): The replica that is currently allowed to upload data. FE sends it to BE.
  cooldown_term=-1 (persistent): The term of cooldown_replica_id. It increases monotonically and prevents BE from applying a cooldown_replica_id that arrives out of order, or an outdated cooldown_replica_id from an FE in a split-brain state. FE sends it to BE together with cooldown_replica_id.
b) Replica state
  cooldowned_version=-1 (non-persistent): Carried when BE reports tablet info to FE. It serves as a reference when cooldown_replica_id has to be re-selected (e.g. select the replica with the largest cooldowned_version).
  cooldown_meta_id="" (non-persistent): Carried when BE reports tablet info to FE. It is used to determine whether the upload progress of all replicas has reached a consensus, which triggers the deletion task (see Synchronization meta for the principle).
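
How FE might use these fields can be sketched as below. The class and function names are hypothetical, and two details are assumptions rather than statements from this design: the term is bumped by exactly one on each re-selection, and an empty cooldown_meta_id is treated as "not reported yet".

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    replica_id: int
    cooldowned_version: int = -1   # reported by BE, non-persistent
    cooldown_meta_id: str = ""     # reported by BE, non-persistent


@dataclass
class TabletState:
    cooldown_replica_id: int = -1  # persistent
    cooldown_term: int = -1        # persistent


def reselect_cooldown_replica(tablet, replicas):
    """Pick the replica with the largest cooldowned_version and bump the term."""
    best = max(replicas, key=lambda r: r.cooldowned_version)
    tablet.cooldown_replica_id = best.replica_id
    tablet.cooldown_term += 1  # assumption: +1 keeps the term monotonically increasing
    return tablet


def replicas_agree_on_meta(replicas):
    """True when every replica reports the same non-empty cooldown_meta_id."""
    ids = {r.cooldown_meta_id for r in replicas}
    return len(ids) == 1 and "" not in ids
```

Under these assumptions, FE would only trigger the deletion task once `replicas_agree_on_meta` returns true.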

BE:

cooldown_replica_id=-1 (non-persistent)
cooldown_term=-1 (non-persistent)
cooldowned_version=-1 (non-persistent): The maximum version of the contiguous rowsets that the tablet has uploaded. It can be recalculated after BE restarts.
cooldown_meta_id="" (persistent): The uuid bound to the cold data meta. Every data upload or ColdDataCompaction generates a new meta with a new uuid, and the meta uploaded to s3 is {uuid, rowset_metas}. Replicas with the same cooldown_meta_id have exactly the same meta for the cold data part.
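
The BE-side handling can be sketched in the same spirit. `BeTablet`, `apply_cooldown_conf` and `write_cooldown_meta` are hypothetical names; rejecting a term that is equal to the current one (not only a smaller one) is an assumption, and the returned dict merely mirrors the {uuid, rowset_metas} shape described above without performing any s3 call.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class BeTablet:
    cooldown_replica_id: int = -1  # non-persistent
    cooldown_term: int = -1        # non-persistent
    cooldown_meta_id: str = ""     # persistent
    rowset_metas: list = field(default_factory=list)


def apply_cooldown_conf(tablet, replica_id, term):
    """Accept a cooldown conf from FE only if its term is newer."""
    if term <= tablet.cooldown_term:
        # Out-of-order or outdated message (e.g. from a split-brain FE): ignore.
        return False
    tablet.cooldown_replica_id = replica_id
    tablet.cooldown_term = term
    return True


def write_cooldown_meta(tablet, new_rowset_meta):
    """Build the meta for one upload or ColdDataCompaction with a fresh uuid."""
    tablet.rowset_metas.append(new_rowset_meta)
    tablet.cooldown_meta_id = str(uuid.uuid4())
    return {"uuid": tablet.cooldown_meta_id,
            "rowset_metas": list(tablet.rowset_metas)}
```

Because every meta gets a fresh uuid, two replicas that report the same cooldown_meta_id must have read the same uploaded meta.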

CooldownConfHandler

Flow Chart

CooldownHandler in FE

...