...

cluster-size N Wait for at least N initial members before completing cluster initialization and serving clients.

Use this option in a persistent cluster so that all brokers can exchange the status of their persistent store and do consistency checks before serving clients.

...

Each store is an independent replica of the cluster's state. If a broker crashes while there are other brokers running, its store is marked "dirty" because it will be out of date with respect to the rest of the cluster.

...

If the entire cluster is shut down by an administrator using the qpid-cluster -k command, then all brokers will shut down at exactly the same point with the same state in their stores. In this case the stores are marked "clean".

If the cluster is reduced to a single broker, and that broker is shut down, its store is marked clean since it is the only broker and therefore has the authoritative store.

When the cluster is restarted, brokers with clean stores will recover from their store; brokers with dirty stores will get an update from a clean broker.
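The restart rule above can be sketched in a few lines. This is an illustrative model only, not broker code: `StoreState`, `plan_restart`, the broker names and the action strings are all hypothetical. It assumes the only inputs are each broker's clean/dirty flag.

```python
from enum import Enum


class StoreState(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"


def plan_restart(stores: dict[str, StoreState]) -> dict[str, str]:
    """Map each broker name to the action it takes when the cluster restarts."""
    if not any(state is StoreState.CLEAN for state in stores.values()):
        # No clean store exists, so there is nothing authoritative to
        # recover from or to update the dirty brokers with.
        raise RuntimeError("no clean store: manual recovery required")
    return {
        broker: (
            "recover from own store"
            if state is StoreState.CLEAN
            else "get update from a clean broker"
        )
        for broker, state in stores.items()
    }


if __name__ == "__main__":
    plan = plan_restart(
        {"broker-a": StoreState.CLEAN, "broker-b": StoreState.DIRTY}
    )
    for broker, action in plan.items():
        print(f"{broker}: {action}")
```

The sketch raises an error when every store is dirty, because that is the case that needs manual intervention.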

...

The cluster-id identifies the persistent cluster state. It remains the same if the cluster is shut down and restarted. It ensures no accidental mixing of stores belonging to different clusters.

...

If there is any mismatch in these IDs, all members of the cluster will log a message and exit.
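A minimal sketch of the "log a message and exit" behaviour, under stated assumptions: each member reports a single ID string (for example its cluster-id), and the names `ids_match`, `check_or_exit` and the logger name are hypothetical. How the brokers actually exchange and compare IDs is not described here.

```python
import logging
import sys

log = logging.getLogger("cluster")


def ids_match(member_ids: dict[str, str]) -> bool:
    """Return True if every member reports the same ID value."""
    return len(set(member_ids.values())) <= 1


def check_or_exit(member_ids: dict[str, str]) -> None:
    """Log a message and exit if any member's ID differs from the others."""
    if not ids_match(member_ids):
        log.critical("ID mismatch between cluster members: %s", member_ids)
        sys.exit(1)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    check_or_exit({"broker-a": "id-1", "broker-b": "id-1"})  # IDs agree
    log.info("IDs are consistent")
```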

Manual recovery

In the unlikely event that all brokers in a cluster crash so close together that it is impossible to determine which was the last one to shut down, all their stores will be dirty.
In this case manual intervention is required to identify which store to recover from.

TODO: describe manual intervention. It has two parts: first, identify which store is the best one to start from; second, mark that store as clean by writing a UUID to the shutdown ID in the data directory.

Design details

Persistent restart scenarios:

...

If the new member has a non-empty store, the cluster-id must match the cluster. The new member gets an update from the cluster.

Manual Recovery

If the entire cluster fails then manual recovery is required.

While running, brokers will periodically (on every membership change and at some configured time interval) write a sequence number to disk.

Provide tools to examine broker data directories and determine if they belong to the same cluster (same cluster-id) and if so which is the latest based on the sequence number.

Recovery procedure is to mark the latest store as clean and restart the cluster.

TBD: how to identify the best store?
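The selection step of such a tool might look like the following sketch. It assumes the cluster-id and last written sequence number have already been read from each data directory; the on-disk layout is not specified here, so `StoreInfo`, `latest_store` and the example paths are hypothetical. Marking the chosen store as clean is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreInfo:
    path: str        # broker data directory (hypothetical example paths below)
    cluster_id: str  # cluster-id recorded in the store
    sequence: int    # last sequence number the broker wrote to disk


def latest_store(stores: list[StoreInfo]) -> StoreInfo:
    """Return the store with the highest sequence number.

    Refuses to choose if the stores do not all carry the same cluster-id.
    """
    if not stores:
        raise ValueError("no stores to examine")
    cluster_ids = {s.cluster_id for s in stores}
    if len(cluster_ids) != 1:
        raise ValueError(
            f"stores belong to different clusters: {sorted(cluster_ids)}"
        )
    # On a tie, max() returns the first store with the highest sequence.
    return max(stores, key=lambda s: s.sequence)


if __name__ == "__main__":
    candidates = [
        StoreInfo("/var/lib/qpidd/a", "cluster-1", 41),
        StoreInfo("/var/lib/qpidd/b", "cluster-1", 57),
        StoreInfo("/var/lib/qpidd/c", "cluster-1", 56),
    ]
    print(latest_store(candidates).path)
```

Two stores with equal sequence numbers are treated as equally recent here; whether that is sufficient is part of the open question above.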