
Introduction

The Katta integration with Solr allows Hadoop to build indexes as shards, which are then replicated to N nodes/servers of a Solr cluster. This is useful for large Solr clusters that require failover, replication, and the ability to provision shards dynamically. Katta uses ZooKeeper to coordinate the creation and deployment of shards to Solr servers.

See:

  • http://issues.apache.org/jira/browse/SOLR-1395 (the Solr/Katta integration issue)
  • http://sourceforge.net/projects/katta/ (the Katta project)
  • http://hadoop.apache.org/zookeeper (Apache ZooKeeper)
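
As a rough illustration of the deployment flow described above, the sketch below shows how a shard built by a Hadoop job might be registered with the cluster and replicated to several nodes, and later swapped for an updated build. The interface and method names (ShardDeployer, addIndex, removeIndex), the index names, and the HDFS paths are hypothetical stand-ins for Katta's deploy-client API, not its exact signatures.

// Illustrative sketch only: these names stand in for Katta's deploy client
// and should not be read as exact Katta signatures.
public class ShardDeploymentSketch {

    /** Hypothetical stand-in for Katta's deploy client. */
    interface ShardDeployer {
        /** Register an index (e.g. a shard directory produced by a Hadoop job)
         *  under a logical name and replicate it to replicationLevel nodes. */
        void addIndex(String indexName, String shardPath, int replicationLevel);

        /** Undeploy an index, e.g. when a newer build of the shard replaces it. */
        void removeIndex(String indexName);
    }

    /** Deploy a freshly built shard with three replicas, then later swap in an
     *  updated build and retire the old one. */
    static void deploy(ShardDeployer deployer) {
        deployer.addIndex("articles-v1", "hdfs://namenode/indexes/articles-v1", 3);

        // ... after Hadoop rebuilds the index ...
        deployer.addIndex("articles-v2", "hdfs://namenode/indexes/articles-v2", 3);
        deployer.removeIndex("articles-v1");
    }
}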

Features

  • Uses Hadoop RPC, which is implemented with non-blocking (NIO) sockets underneath. This should scale better than the current HTTP approach when there are hundreds of nodes, because HTTP can add unnecessary overhead.
  • All current distributed Solr requests function properly with no changes (see the query sketch after this list).
  • Incremental indexing can be accomplished by creating new shards and deploying them into the Katta cluster, as in the deployment sketch above. The alternative is to update a shard already deployed on a Solr server (using Solr's normal XML-over-HTTP interface); on commit, the newly updated shard would be uploaded back into the Katta cluster and the old version of the shard removed.
  • The Solr/Katta integration has built-in failover.
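
To make the point about unchanged distributed requests concrete, here is a standard SolrJ distributed query of the kind the integration is claimed to support as-is. The host names and shard list are placeholders; SolrJ's CommonsHttpSolrServer and the shards request parameter are the stock Solr 1.4-era API, not anything Katta-specific.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedQueryExample {
    public static void main(String[] args) throws Exception {
        // Point SolrJ at any node in the cluster (placeholder URL).
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://solr1.example.com:8983/solr");

        // An ordinary distributed request: the shards parameter lists the
        // shards to query, exactly as in stock distributed Solr.
        SolrQuery query = new SolrQuery("title:katta");
        query.set("shards",
            "solr1.example.com:8983/solr,solr2.example.com:8983/solr");

        QueryResponse response = server.query(query);
        System.out.println("hits: " + response.getResults().getNumFound());
    }
}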