Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread.html/r186364d4d22a6301887b54023cb3db48a5324f197590a3b3e95535fd%40%3Cdev.solr.apache.org%3E

JIRA: SOLR-15636

Released: <Solr Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast). Confluence supports inline comments that can also be used.

Motivation

Many organizations are frustrated with SolrCloud deployments because of the perceived cost of managing a separate, dedicated Apache ZooKeeper ensemble. We can reduce this complexity by running our own embedded ZooKeeper ensemble, based on ZOOKEEPER-3874 and released with ZooKeeper 3.7.

This ensemble should be launched automatically from Solr processes and should configure its quorum information dynamically.

There is some overlap between the motivations of this SIP and SIP-5 Coordination Module + Apache Curator, but the two approaches should be complementary.

Public Interfaces

We will need to create APIs for retrieving quorum status from a Solr node. This may include determining whether the node is a serving member of a quorum, which quorum it is connected to, and information about the other quorum members (ports, addresses) that observers need in order to join. We may also need APIs for instructing nodes to join or leave a particular quorum.
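As a starting point for discussion, the quorum status described above could be modeled roughly as follows. This is only a sketch: the type names (QuorumRole, QuorumMember, QuorumStatus), their fields, and the idea of naming a quorum are all hypothetical and are not part of Solr or ZooKeeper.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical role of the local node's ZK server within a quorum. */
enum QuorumRole { PARTICIPANT, OBSERVER, NONE }

/** Hypothetical description of one quorum member, as an observer would need it to join. */
final class QuorumMember {
    final long serverId;
    final String host;
    final int clientPort;

    QuorumMember(long serverId, String host, int clientPort) {
        this.serverId = serverId;
        this.host = host;
        this.clientPort = clientPort;
    }
}

/** Hypothetical response body for a "quorum status" API on a Solr node. */
final class QuorumStatus {
    final String quorumName;          // which quorum this node is connected to
    final QuorumRole localRole;       // how the local node takes part in it
    final List<QuorumMember> members; // the other members' addresses and ports

    QuorumStatus(String quorumName, QuorumRole localRole, List<QuorumMember> members) {
        this.quorumName = quorumName;
        this.localRole = localRole;
        this.members = List.copyOf(members);
    }

    /** True only if the local ZK server votes in the quorum. */
    boolean isVotingMember() {
        return localRole == QuorumRole.PARTICIPANT;
    }

    /** Comma-separated host:port list in the format ZK clients accept. */
    String connectString() {
        return members.stream()
                .map(m -> m.host + ":" + m.clientPort)
                .collect(Collectors.joining(","));
    }
}
```

A join or leave API could accept the same QuorumMember shape as its request body, which would keep the read and write sides of the interface symmetric.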

The full extent of the necessary APIs is not yet determined.

We will need to expose additional ports from Solr nodes for ZK functionality. This will likely include the ZK secureClientPort, and possibly the serverPort, electionPort, Admin port and others.
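To make the port question concrete, here is a minimal sketch that derives the ZK ports from the Solr port and collects them as ZooKeeper configuration properties. The class name and the offsets for the quorum, election, and admin ports are assumptions made for illustration; only the +1000 offset mirrors the convention of today's embedded ZooKeeper. TLS keystore settings, which a secureClientPort would require, are omitted.

```java
import java.util.Properties;

/** Hypothetical helper: derives embedded ZK ports from the Solr port. */
final class EmbeddedZkPorts {
    // Solr port + 1000 matches the existing embedded ZK convention;
    // the remaining offsets are assumptions for this sketch only.
    static final int CLIENT_OFFSET = 1000;
    static final int QUORUM_OFFSET = 1001;
    static final int ELECTION_OFFSET = 1002;
    static final int ADMIN_OFFSET = 1003;

    /** Builds ZooKeeper configuration properties for one node. */
    static Properties forSolrPort(int solrPort, String host, String dataDir) {
        Properties p = new Properties();
        p.setProperty("dataDir", dataDir);
        // TLS-only client port; no plaintext clientPort is configured.
        p.setProperty("secureClientPort", Integer.toString(solrPort + CLIENT_OFFSET));
        // server.<id>=<host>:<quorumPort>:<electionPort>
        p.setProperty("server.1",
                host + ":" + (solrPort + QUORUM_OFFSET) + ":" + (solrPort + ELECTION_OFFSET));
        p.setProperty("admin.serverPort", Integer.toString(solrPort + ADMIN_OFFSET));
        return p;
    }
}
```

With a Solr port of 8983 this yields a secure client port of 9983, quorum and election ports of 9984 and 9985, and an admin port of 9986. Fixed offsets keep firewall rules predictable, at the cost of reserving a block of ports next to every Solr port.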

Proposed Changes

The work breaks down into several phases.

Migrate Unit Tests to use ZooKeeperServerEmbedded (ZKSE)

Currently, our unit tests use a fragile construction for an embedded ZooKeeper. To build confidence in an embedded ZooKeeper for production settings, we should ensure that our test framework uses the same APIs.

Migrate ZKRun implementation to use ZKSE

When we launch a Solr service in "cloud" mode without specifying a ZooKeeper host to connect to, it launches its own ZooKeeper server on a separate port.

This is the simplest use of an embedded ZooKeeper server that we currently have: it does not use quorums, and its lifecycle is tied to that of the parent Solr node.

Create an auto-clustering implementation for several ZKRun nodes

This approach may not be feasible for service discovery, but would be the ultimate goal of our efforts.

For example, we would start three Solr nodes each with ZKSE, and instruct all of the ZK servers to form a cluster. There may be ordering issues to resolve here, as well as concerns about service discovery for other Solr nodes.
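One way to sidestep part of the ordering problem is for every node to derive the same membership list from the same set of hosts. The sketch below sorts the hosts before assigning server IDs, so the result does not depend on the order in which nodes were listed or started. The class and method names are hypothetical; the line format follows ZooKeeper's dynamic configuration syntax, server.<id>=<host>:<quorumPort>:<electionPort>:<role>;<clientPort>. It assumes all nodes use the same ports and does not address how the host list is discovered in the first place.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: builds identical quorum membership lines on every node. */
final class QuorumMembership {

    /**
     * Returns one dynamic-configuration line per host. Hosts are sorted first,
     * so every node assigns the same server ID to the same host.
     */
    static List<String> serverLines(List<String> hosts, int quorumPort,
                                    int electionPort, int clientPort) {
        List<String> sorted = new ArrayList<>(hosts);
        Collections.sort(sorted);

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            int serverId = i + 1; // ZK server IDs start at 1 in this sketch
            lines.add("server." + serverId + "=" + sorted.get(i)
                    + ":" + quorumPort + ":" + electionPort
                    + ":participant;" + clientPort);
        }
        return lines;
    }
}
```

Sorting gives deterministic IDs only while the host set is stable. Adding or removing a host would renumber the servers, so a real implementation would more likely persist the IDs or rely on ZooKeeper's dynamic reconfiguration.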

Compatibility, Deprecation, and Migration Plan

Existing users will be able to continue running SolrCloud with an external ZooKeeper quorum.

Major Risks

ZooKeeper services launched this way may be subject to Solr availability: if a server is exhausted by too many queries or by bad queries, that may adversely affect the health of the whole cluster rather than causing an isolated failure on particular replicas. This should be mitigated by running multiple ZK services in a quorum that can tolerate individual node failures, but it may be enough motivation to use a larger default quorum size of 5 or 7 members instead of the minimal 3-node setup.
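The trade-off between quorum sizes follows from majority voting: a quorum of n voting members needs more than half of them to be available. The small sketch below works out how many failures each size tolerates. The class name is illustrative only.

```java
/** Illustrative arithmetic for majority quorums. */
final class QuorumMath {

    /** Smallest number of voting members that forms a majority of n. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("quorum size must be at least 1");
        }
        return n / 2 + 1;
    }

    /** Number of voting members that can fail while a majority remains. */
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " members: majority " + majority(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

A 3-node quorum tolerates one failed member, a 5-node quorum two, and a 7-node quorum three. An even size adds no tolerance over the next smaller odd size (4 members still tolerate only one failure), which is why odd sizes are the usual choice.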

Security considerations

When running our own ZK services, the security of ZK becomes our responsibility instead of something that we can delegate. The ZK servers that we start should be secure by default, using the available authentication methods and practices.

Test Plan

[ TBD ]

Rejected Alternatives

Continue to launch the embedded ZK process the same way that we do now. This is an unattractive proposal because we would remain tied to ZK internals, which are subject to change and are not part of ZooKeeper's public APIs.
SOLR-7099: bin/solr -cloud mode should launch a local ZK in its own process using zkcli's runzk option (instead of embedded in the first Solr process)
SOLR-7074: Simple script to start external Zookeeper
SOLR-6734: Standalone solr as *two* applications -- Solr and a controlling agent
