
Work in progress

This site is in the process of being reviewed and updated.

h1. Principles

We have to build a multi-master replication system. That means that if we have N servers, the total number of connections between all those servers will be equal to N x ( N - 1 ). If we have 4 servers, we will have 12 possible connections. If we have 10 servers, we will have 90 possible connections, etc. For 100 servers, this is 9,900 possible connections! Even if this is only polynomial growth, the reality is totally different: we won't allow such a scheme, which is totally out of control. In the real world, having only two servers seems quite common, for small to large organizations. If more servers are needed - for scalability or to build a failover system - then an N-to-N communication network will not be implemented. We will generally select one or two "master" servers which will communicate with a few other servers, like in a chain.
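The growth of a full N-to-N mesh can be illustrated with a small sketch (the class and method names are just for illustration):

```java
// Illustration of how the number of directed replication links grows with
// the number of servers: each of the N servers connects to the N - 1
// others, giving N x (N - 1) links in total.
public class ReplicationLinks {

    // Number of directed connections in a full N-to-N replication mesh.
    public static int links(int n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        System.out.println(links(4));   // 4 servers   -> 12 links
        System.out.println(links(10));  // 10 servers  -> 90 links
        System.out.println(links(100)); // 100 servers -> 9900 links
    }
}
```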

To work around this kind of problem, we could also define a "master-master", responsible for dispatching the modifications. All servers are connected to the master-master and send it all their modifications (even if those modifications are also stored locally). The master-master then replicates these modifications. If the master-master is not responding, another master-master is chosen. This does not solve the problem of synchronization between groups of servers if a group is disconnected for some reason.

We also have to consider the following points:

  • Servers are generally located in different countries/subsidiaries and manage disjoint sets of data (local data)
  • Remote data can be managed through the use of Referrals, even if the performance is not as good as direct calls.
  • There are generally three different concerns administrators want to address: failover, scalability and performance.
  • Data is not modified very often, and tree moves should almost *never* occur (modifyRDN)
  • Systems which need MM replication, like Triplesec, assume that the replication is done on the fly.

h2. Connection between servers

h3. Between 2 servers

We have two cases: the servers are either connected or disconnected.

This is the simplest possible case, apart from a single server!

We will have to deal with two disconnected servers, and especially with retries when they get connected again.

The main consideration will be to replicate the data which are valid, as some of them may have been updated on both servers.

Another scenario is when an entry is modified on both servers and then replicated. Here is a schema which describes the different cases:

 
At T1, the entry is sent to the other server, and if it is the very same entry, we will have to do something:

  • If both entries' attributes are equal, simply sending an ack to the other server is enough.
  • If at least one attribute is different, we will have to check the CSN of each entry and select the oldest.
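The rule above can be sketched as follows. This is a hypothetical simplification, not the ApacheDS API: the CSN is modeled as a plain timestamp and the entry as a map of attributes.

```java
import java.util.Map;

// Hypothetical sketch of the conflict-resolution rule described above:
// when the same entry was modified on both servers, compare attributes
// first; if they differ, keep the version whose CSN is the oldest, as the
// rule above states. Entry and CSN are simplified placeholders.
public class ConflictResolver {

    // A CSN is modeled here as a monotonically increasing long timestamp.
    public record Entry(long csn, Map<String, String> attributes) {}

    // Returns the entry to keep after comparing the two copies.
    public static Entry resolve(Entry local, Entry remote) {
        if (local.attributes().equals(remote.attributes())) {
            // Same content: just ack the other server, keep either copy.
            return local;
        }
        // Different content: select the oldest CSN, per the rule above.
        return local.csn() <= remote.csn() ? local : remote;
    }
}
```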

h3. Between 3 servers

We will have 4 different cases:

  1. All servers are connected together
  2. One server is disconnected from one other server
  3. One server is totally disconnected
  4. All servers are disconnected

h4. Case 1: All servers are connected

In this scenario, the main problem arises when an entry is modified on more than one server before it has been replicated. We will have to decide which entry should be kept; we use the CSN for that.

h2. Inclusion into the Interceptors chain

Every modification is sent to all connected replicas. This is done once the modification has been validated locally, and only if everything went right.

h3. Interceptor chain description

The following schema shows how the interceptor chain works.

Here, a request is injected into the topmost interceptor, which handles it and, if needed, performs an associated action, before passing the request to the next interceptor, and so on until it reaches the nexus, where the object is managed.

If an exception or an error condition occurs in an interceptor, we go back up the chain either by throwing an exception or simply by returning.

An interceptor can be a pass-through for an incoming request (doing nothing with it).

h3. Existing interceptors

We have more than one chain of interceptors, to cover all the different operations.
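The chain mechanics described above can be sketched as a classic chain of responsibility. This is a minimal illustration, not the actual ApacheDS interceptor API; names and types are invented for the example:

```java
import java.util.List;

// Minimal chain-of-responsibility sketch of the interceptor mechanism
// described above: each interceptor may act on the request, then passes it
// down the chain until the nexus handles it; throwing an exception unwinds
// back up through the chain. Not the real ApacheDS classes.
public class InterceptorChainDemo {

    interface Interceptor {
        String handle(String request, Chain next);
    }

    // Walks the list of interceptors; the "nexus" is the end of the chain.
    static class Chain {
        private final List<Interceptor> interceptors;
        private final int index;

        Chain(List<Interceptor> interceptors, int index) {
            this.interceptors = interceptors;
            this.index = index;
        }

        String proceed(String request) {
            if (index == interceptors.size()) {
                return "nexus handled: " + request; // end of the chain
            }
            return interceptors.get(index)
                               .handle(request, new Chain(interceptors, index + 1));
        }
    }

    public static void main(String[] args) {
        // A pass-through interceptor does nothing but forward the request.
        Interceptor passThrough = (req, next) -> next.proceed(req);
        Chain chain = new Chain(List.of(passThrough, passThrough), 0);
        System.out.println(chain.proceed("add cn=test")); // nexus handled: add cn=test
    }
}
```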

h3. Using an interceptor for Mitosis

The idea is to use an interceptor when a request (add, mod, del, ...) is sent to the server. It should send a message to the other replicas only if everything went fine in all the other interceptors. It will then be a pass-through interceptor for the incoming request, which does all its work at the end.
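This "replicate only after the rest of the chain succeeded" behaviour can be sketched as below. All names here are hypothetical, not the actual Mitosis classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (illustrative names, not the real ApacheDS classes)
// of the interceptor described above: it is a pass-through on the way
// down, and only notifies the replicas once the rest of the chain,
// including the nexus, has completed without error.
public class ReplicationInterceptorDemo {

    // The remainder of the chain, ending at the nexus applying the change.
    interface Operation {
        void applyLocally();
    }

    static class ReplicationInterceptor {
        final List<String> sentToReplicas = new ArrayList<>();

        void intercept(String change, Operation rest) {
            // Pass-through: let every later interceptor and the nexus run first.
            // If applyLocally() throws, we never reach the replication step.
            rest.applyLocally();
            // Only reached if nothing went wrong: propagate to the replicas.
            sentToReplicas.add(change);
        }
    }
}
```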

h3. Potential problems

The main problem is that we should guarantee an ACID transaction, which means that we must set a lock on the object until it has been managed by the nexus.

h2. Add Entry operation

Suppose that we add an entry Ei in the server Si at time Ti, and that we have to propagate this change to server Sk.

This operation will have to deal with the following cases:

  1. The server Sk is not responding
  2. The entry Ei already exists in server Sk
  3. The entry Ei does not exist in server Sk
  4. The entry Ei has already been created on another server Sn at Tx, where Tx < Ti
  5. The entry Ei has been created on another server Sn at Tx, where Tx > Ti

The third case is the most general case.

h3. Case 1

If the remote server is not responding, we should store the entry somewhere and mark it as "pending for server Sk". We will have to build a mechanism to resend the entry when the server is up again.

We also have to take into consideration the fact that we are not dispatching the entry to a single remote server, but to a set of servers, and that more than one server can be unresponsive.

The best approach here would be to keep track of all the remaining remote servers within the entry: let's assume that for an entry Ei, we have to update N servers. We will store a list of remote servers associated with this entry, and when a server sends an Ack, we will remove its name from the list. When the list is empty, we can consider that the entry has been replicated on all servers.
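The bookkeeping just described can be sketched as below (illustrative names, not the real API): each entry keeps the set of servers that have not yet acknowledged it, and an empty set means the entry is fully replicated.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the per-entry ack tracking described above: for each
// replicated entry we keep the set of servers that have not yet
// acknowledged it; when the set becomes empty, the entry is considered
// replicated everywhere. Names are illustrative, not the real API.
public class PendingAckTracker {

    private final Map<String, Set<String>> pending = new ConcurrentHashMap<>();

    // Record that entryDn must be acknowledged by every server in targets.
    public void track(String entryDn, Set<String> targets) {
        pending.put(entryDn, new HashSet<>(targets));
    }

    // A server acknowledged the entry; returns true when no server remains.
    public boolean ack(String entryDn, String server) {
        Set<String> remaining = pending.get(entryDn);
        if (remaining == null) {
            return true; // nothing was pending for this entry
        }
        remaining.remove(server);
        if (remaining.isEmpty()) {
            pending.remove(entryDn);
            return true;
        }
        return false;
    }

    // Servers still pending for an entry (empty if fully replicated).
    public Set<String> remaining(String entryDn) {
        return pending.getOrDefault(entryDn, Set.of());
    }
}
```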

If one or more servers cannot be contacted, this list will keep the unresponsive servers' names, and a thread will regularly try to send the data to those servers, until either they respond or we reach a timeout. In this last case, we will have to mark the server as unavailable and inform the administrator. When this server is reconnected to the grid of LDAP servers, it will have to be synchronized with any server of the grid.
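The retry loop above could look like the following sketch, with the timeout simplified to a bounded number of attempts. All names are hypothetical:

```java
// Illustrative retry sketch for the mechanism described above: keep trying
// to resend the data to an unresponsive server until it answers or the
// attempt budget (standing in for a timeout) is exhausted, after which the
// server should be marked unavailable. Names are hypothetical.
public class RetrySender {

    // Abstraction over one attempt to push the pending data to a server.
    interface Sender {
        boolean trySend(String server);
    }

    // Returns true if the server finally accepted the data, false if the
    // budget was exhausted and the server should be marked unavailable.
    public static boolean resendUntil(Sender sender, String server,
                                      int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (sender.trySend(server)) {
                return true;
            }
            // A real implementation would sleep between attempts here.
        }
        return false; // budget exhausted: mark the server as unavailable
    }
}
```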
