Status
Current state: Accepted
Discussion thread: https://lists.apache.org/thread.html/rdc6a06316882d24b224444b81cd0cc5debfe7020c57ed6dd52334f9a%40%3Cdev.lucene.apache.org%3E
JIRA: SOLR-17065
Released: NA
Motivation
Large-scale Solr installations often require cross-data-center replication, both to reduce access latency and to support disaster recovery.
In the past, users have either designed their own solutions or tried to rely on the now-deprecated CDCR.
It would be valuable to have cross-data-center replication support within Solr that is offered and supported by the community. This would allow the effort around this shared problem to converge.
Public Interfaces
Briefly list any new interfaces that will be introduced as part of this proposal or any existing interfaces that will be removed or changed. The purpose of this section is to concisely call out the public contract that will come along with this feature.
The definition of a public interface is found on the main SIP page.
Proposed Changes
The XDC replication design would use an independent messaging layer between the clusters. This layer would allow Solr to concentrate on what it does best, instead of also having to behave like a message queue.
Solr would provide an abstraction that allows users to implement a solution for their messaging system of choice.
The Solr cluster that receives an update first would have the extra responsibility of versioning the documents and pushing them to the chosen messaging queue. Once that is done, mirroring logic outside of Solr would be responsible for copying the data to a local topic/queue for the other DR clusters. (Diagram Below)
Data Flow
Cross-DC Consumer
- Read packet
- Extract and send to local SolrCloud
  - If OK, move forward
  - If Error
    - Retry or
    - Ignore
Update Flow
- Send request from the client
- Receive by a Solr instance and process by the update request processor(s)
  - Is this the original recipient of the document? If yes:
    - Version the document
    - Index locally
    - Forward the document with the version
  - If this is not the original recipient
    - Use the existing version, index locally
- Handle Delete requests
  - Convert Delete by Query to Delete by IDs, with versions
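The versioning branch of the update flow can be illustrated with a minimal sketch. Everything here is hypothetical and none of it is a Solr API: `DocUpdate`, the `mirrored` flag used to tell whether this cluster is the original recipient, the `AtomicLong` standing in for version assignment, and the in-memory `localIndex` and `outbound` collections standing in for the index and the messaging queue. Delete handling is left out because it is still an open question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** One document update. All names in this sketch are hypothetical, not Solr APIs. */
final class DocUpdate {
    final String id;
    final long version;      // 0 means "not versioned yet"
    final boolean mirrored;  // true if the update arrived through the cross-DC queue

    DocUpdate(String id, long version, boolean mirrored) {
        this.id = id;
        this.version = version;
        this.mirrored = mirrored;
    }
}

/** Stand-in for the update request processor in one cluster. */
final class MirroringUpdateSketch {
    private final AtomicLong versionClock = new AtomicLong();  // stand-in for version assignment
    final Map<String, Long> localIndex = new HashMap<>();      // id -> indexed version
    final List<DocUpdate> outbound = new ArrayList<>();        // stand-in for the messaging queue

    /** Applies one update and returns the version that was indexed locally. */
    long process(DocUpdate update) {
        if (!update.mirrored) {
            // Original recipient: version the document, index locally,
            // then forward the document together with its version.
            long version = versionClock.incrementAndGet();
            localIndex.put(update.id, version);
            outbound.add(new DocUpdate(update.id, version, true));
            return version;
        }
        // Not the original recipient: keep the existing version, index locally,
        // and do not forward again.
        localIndex.put(update.id, update.version);
        return update.version;
    }
}
```

In this sketch a mirrored update is never forwarded again, which is one way to keep documents from looping between clusters; the proposal itself does not prescribe how that check is made.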
Admin requests
Admin requests would be replicated conditionally. Collection creation should be replicated so that users can start sending updates without having to remember to create collections manually in every cluster. The replication factor of the original request would be the same across the clusters.
Subsequent replica management requests, e.g. ADDREPLICA or DELETEREPLICA, should not be mirrored, as these are used to handle query/update scaling in specific DCs.
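A minimal sketch of such a mirroring rule follows. The `AdminAction` enum and `AdminMirrorPolicy` class are hypothetical names; the enum lists only the three Collections API actions mentioned here. Treating every action other than collection creation as "not mirrored" is an assumption of the sketch, not something the proposal states.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** The Collections API actions named in this section; a real list would be longer. */
enum AdminAction { CREATE, ADDREPLICA, DELETEREPLICA }

/** Hypothetical policy deciding which admin requests are mirrored to other DCs. */
final class AdminMirrorPolicy {
    // Assumption: only collection creation is mirrored; everything else stays local.
    private static final Set<AdminAction> MIRRORED =
            Collections.unmodifiableSet(EnumSet.of(AdminAction.CREATE));

    private AdminMirrorPolicy() {}

    /** Returns true if the action should be replicated to the other clusters. */
    static boolean shouldMirror(AdminAction action) {
        return MIRRORED.contains(action);
    }
}
```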
Architecture
Cross-DC Consumer
The Cross-DC consumer would be a standalone application that consumes data from a messaging queue and writes it to Solr.
Reading from the messaging queue and writing back to a retry queue would be abstracted out, allowing users to provide custom implementations based on their choice of queue.
Sending updates to Solr would be implemented as part of the consumer, in a manner that would allow users to extend the application to handle custom requests.
Interface SourceQueue
- readPacket()
- resubmit()
Interface TargetStore
- process()
  - Returns an enum: success, failure, retry, resubmit
Admin Interface to skip/rewind the queue pointer
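The two interfaces and the consumer loop could look roughly like the following sketch. The method names and enum values come from the lists above; the generic packet type `P`, the `Optional` return type, the `maxRetries` parameter, and the classes `CrossDcConsumerSketch` and `InMemoryQueue` are assumptions made for illustration. The sketch also assumes that "retry" means trying the same packet again immediately, while "resubmit" means writing it back to a retry queue; the proposal does not spell out that distinction. The admin interface for skipping or rewinding the queue pointer is not covered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Outcome of TargetStore.process(), matching the enum values listed above. */
enum Result { SUCCESS, FAILURE, RETRY, RESUBMIT }

/** Queue-side abstraction. The generic packet type P is an assumption. */
interface SourceQueue<P> {
    Optional<P> readPacket();  // empty when nothing is pending
    void resubmit(P packet);   // write the packet back to a retry queue
}

/** Solr-side abstraction: applies one packet to the local SolrCloud. */
interface TargetStore<P> {
    Result process(P packet);
}

/** Hypothetical consumer loop tying the two interfaces together. */
final class CrossDcConsumerSketch<P> {
    private final SourceQueue<P> queue;
    private final TargetStore<P> store;
    private final int maxRetries;
    final List<P> ignored = new ArrayList<>();  // packets dropped after an error

    CrossDcConsumerSketch(SourceQueue<P> queue, TargetStore<P> store, int maxRetries) {
        this.queue = queue;
        this.store = store;
        this.maxRetries = maxRetries;
    }

    /** Drains the queue once and returns how many packets were applied successfully. */
    int drain() {
        int applied = 0;
        for (Optional<P> next = queue.readPacket(); next.isPresent(); next = queue.readPacket()) {
            P packet = next.get();
            Result result = store.process(packet);
            // Assumption: RETRY means "try the same packet again right away".
            for (int attempt = 0; result == Result.RETRY && attempt < maxRetries; attempt++) {
                result = store.process(packet);
            }
            switch (result) {
                case SUCCESS:
                    applied++;                // if OK, move forward
                    break;
                case RESUBMIT:
                    queue.resubmit(packet);   // hand back to the retry queue
                    break;
                default:
                    ignored.add(packet);      // FAILURE, or retries exhausted: ignore
            }
        }
        return applied;
    }
}

/** In-memory queue used only to exercise the sketch. */
final class InMemoryQueue<P> implements SourceQueue<P> {
    final Deque<P> pending = new ArrayDeque<>();
    final List<P> retryQueue = new ArrayList<>();

    @Override public Optional<P> readPacket() { return Optional.ofNullable(pending.poll()); }
    @Override public void resubmit(P packet) { retryQueue.add(packet); }
}
```

Keeping `SourceQueue` and `TargetStore` free of any queue-specific or Solr-specific types is what would let users plug in their own messaging system, which is the stated goal of the abstraction.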
Solr Package
The Solr package would be deployed with the Solr instance. It would include an Update Request Processor that handles the following:
- Deletes, both by ID and by query. How to handle these is still a work in progress.
- Forwarding update requests to the messaging queue, if the request hasn’t already been replicated to other DCs.
Deploying and enabling
Based on the current proposal, deploying the XDC solution would involve deploying and configuring both the consumer and the plugin.
The plugin should be standard and agnostic of the messaging queue and of the other replicated clusters.
Information about the queue and the other clusters can be provided through either an independent znode or a cluster property.
Open Questions
Update Failure
Consider the following situation:
- Document is sent by the client.
- DC1 versions the document: [doc1] gets version1.
- Before writing to the message queue, the Solr instance dies.
- The document is now indexed locally, but does not exist anywhere else.
Proposed Solution: Require clients to verify that the update response comes back positive, and to retry (conditionally) if it does not.
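On the client side, this could be as simple as the following sketch. `RetryingClientSketch`, `sendWithRetry`, and its parameters are hypothetical; the `Callable` stands in for whatever call the client uses to send the update and check the response. The sketch treats every non-positive response or exception as retryable, which is an assumption: the proposal only says the retry is conditional and does not define the condition.

```java
import java.util.concurrent.Callable;

/** Hypothetical client-side helper: resend an update until it is acknowledged. */
final class RetryingClientSketch {
    private RetryingClientSketch() {}

    /**
     * Calls sendUpdate until it returns true or maxAttempts is reached.
     * Returns true if some attempt was acknowledged, false otherwise.
     */
    static boolean sendWithRetry(Callable<Boolean> sendUpdate, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(sendUpdate.call())) {
                    return true;  // positive response: the update was accepted
                }
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();  // preserve the interrupt and give up
                    return false;
                }
                // Assumption: any other failure (e.g. the node died) is retryable.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt);  // simple linear backoff
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In the failure scenario above, the client never receives a positive response from the instance that died, so a retry would resend the document and let the receiving instance version and forward it again.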
Handling deletes
Security considerations
Compatibility, Deprecation, and Migration Plan
- What impact (if any) will there be on existing users?
- If we are changing behavior how will we phase out the older behavior?
- If we need special migration tools, describe them here.
- When will we remove the existing behavior?
Test Plan
Describe in few sentences how the SIP will be tested. We are mostly interested in system tests (since unit-tests are specific to implementation details). How will we know that the implementation works as expected? How will we know nothing broke?
Rejected Alternatives
If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other way.