Apache Solr Documentation


This Unreleased Guide Will Cover Apache Solr 5.1


SolrJ is an API that makes it easy for Java applications to talk to Solr. SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods.

The center of SolrJ is the org.apache.solr.client.solrj package, which contains just five main classes. Begin by creating a SolrClient, which represents the Solr instance you want to use. Then send SolrQuery or other SolrRequest objects and get back SolrResponse objects.

SolrClient is abstract, so to connect to a remote Solr instance, you'll actually create an instance of either HttpSolrClient or CloudSolrClient. Both communicate with Solr via HTTP; the difference is that HttpSolrClient is configured using an explicit Solr URL, while CloudSolrClient is configured using the zkHost string for a SolrCloud cluster.

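The two kinds of client described above can be created roughly as follows. This is a minimal sketch that assumes the SolrJ 5.x constructors taking a URL or a ZooKeeper host string. Later SolrJ releases deprecate these constructors in favor of builder classes. The URL http://localhost:8983/solr/techproducts, the ZooKeeper host string and the collection name are placeholder values. Creating a client does not contact Solr; the first request does.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CreateClients {
    // Placeholder values: adjust them to your own installation.
    static final String SOLR_URL = "http://localhost:8983/solr/techproducts";
    static final String ZK_HOST = "zkServerA:2181,zkServerB:2181,zkServerC:2181/solr";

    // Single node: the client is tied to an explicit Solr URL (here, one core).
    static HttpSolrClient singleNodeClient() {
        return new HttpSolrClient(SOLR_URL);
    }

    // SolrCloud: the client locates live nodes through ZooKeeper.
    static CloudSolrClient cloudClient(String defaultCollection) {
        CloudSolrClient client = new CloudSolrClient(ZK_HOST);
        // Requests that do not name a collection are sent to this one.
        client.setDefaultCollection(defaultCollection);
        return client;
    }

    public static void main(String[] args) throws IOException {
        // SolrClient is Closeable, so try-with-resources releases its HTTP resources.
        try (SolrClient single = singleNodeClient();
             SolrClient cloud = cloudClient("techproducts")) {
            System.out.println("Clients created; no request has been sent yet.");
        }
    }
}
```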

Once you have a SolrClient, you can use it by calling methods like query(), add(), and commit().

Building and Running SolrJ Applications

The SolrJ API is included with Solr, so you do not have to download or install anything else. However, in order to build and run applications that use SolrJ, you have to add some libraries to the classpath.

At build time, the examples presented with this section require solr-solrj-x.y.z.jar to be in the classpath.

At run time, the examples in this section require the libraries found in the 'dist/solrj-lib' directory.

The Ant script bundled with this section's examples adds these libraries to the classpath as appropriate when building and running.

You can avoid most of the work of managing JAR files by using Maven instead of Ant. To include SolrJ in your application, add a dependency with groupId org.apache.solr, artifactId solr-solrj and version x.y.z (matching your Solr release) to the project's pom.xml.

If you are worried about the SolrJ libraries expanding the size of your client application, you can use a code obfuscator like ProGuard to remove APIs that you are not using.

Setting XMLResponseParser

SolrJ uses a binary format, rather than XML, as its default response format. Users of earlier Solr releases who wish to continue working with XML must explicitly set the client's parser to XMLResponseParser.
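Switching a client to XML responses looks roughly like this. The sketch assumes an HttpSolrClient built from a placeholder URL. setParser() and XMLResponseParser are the SolrJ pieces involved. The xmlClient() helper is hypothetical.

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class XmlParserExample {
    // Hypothetical helper: builds a client that requests and parses XML responses.
    static HttpSolrClient xmlClient(String url) {
        HttpSolrClient client = new HttpSolrClient(url);
        // Replace the default binary (javabin) parser with the XML parser.
        client.setParser(new XMLResponseParser());
        return client;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; point it at your own core or collection.
        try (HttpSolrClient client = xmlClient("http://localhost:8983/solr/techproducts")) {
            System.out.println("Parser: " + client.getParser().getClass().getSimpleName());
        }
    }
}
```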

Performing Queries

Use query() to have Solr search for results. You have to pass a SolrQuery object that describes the query, and you will get back a QueryResponse (from the org.apache.solr.client.solrj.response package).

SolrQuery has methods that make it easy to choose a request handler and send parameters to it. A very simple query uses the default request handler and sets only the q parameter.
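A minimal sketch of such a query follows. The helper name build() and the query string are illustrative. The code only constructs the SolrQuery and sends nothing to Solr.

```java
import org.apache.solr.client.solrj.SolrQuery;

public class SimpleQuery {
    // Builds a query for the default request handler with only q set.
    static SolrQuery build(String queryString) {
        SolrQuery parameters = new SolrQuery();
        parameters.set("q", queryString);
        return parameters;
    }

    public static void main(String[] args) {
        SolrQuery query = build("ipod");
        // Read the parameter back to confirm what will be sent.
        System.out.println("q = " + query.get("q"));
    }
}
```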

To choose a different request handler, for example, set the qt parameter.
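The same query routed to another handler might look like this. The handler name /spellCheckCompRH is only an illustration and must be defined in your solrconfig.xml. When qt starts with a slash, SolrJ's QueryRequest uses it as the request path instead of /select. SolrQuery.setRequestHandler() sets the same qt parameter and can be used in place of set("qt", ...).

```java
import org.apache.solr.client.solrj.SolrQuery;

public class HandlerQuery {
    // Builds a query aimed at a specific request handler (name is an example).
    static SolrQuery build(String queryString, String handler) {
        SolrQuery parameters = new SolrQuery();
        parameters.set("q", queryString);
        // A value starting with "/" becomes the request path, e.g. /spellCheckCompRH.
        parameters.set("qt", handler);
        return parameters;
    }

    public static void main(String[] args) {
        SolrQuery query = build("ipod", "/spellCheckCompRH");
        System.out.println("qt = " + query.get("qt"));
    }
}
```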

Once you have your SolrQuery set up, submit it with the client's query() method.

The client makes a network connection and sends the query. Solr processes the query, and the response is sent and parsed into a QueryResponse.

The QueryResponse contains the collection of documents that satisfy the query parameters. You can retrieve the documents directly with getResults(), and you can call other methods, such as getHighlighting() and getFacetFields(), to find out about highlighting or facets.
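Putting these steps together, a query can be submitted and its results read back as follows. This sketch assumes a running Solr with the techproducts example at the placeholder URL. The fieldValues() helper is hypothetical and only reads one stored field from each returned document. getResults() holds a single page of results (typically 10 rows by default), while getNumFound() reports the total number of matches.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class RunQuery {
    // Hypothetical helper: collects one field's value from each returned document.
    static List<Object> fieldValues(SolrDocumentList docs, String field) {
        List<Object> values = new ArrayList<>();
        for (SolrDocument doc : docs) {
            values.add(doc.getFieldValue(field));
        }
        return values;
    }

    public static void main(String[] args) throws SolrServerException, IOException {
        // Placeholder URL for a running Solr core.
        try (SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/techproducts")) {
            SolrQuery query = new SolrQuery();
            query.set("q", "ipod");
            // Sends the request over HTTP and parses the reply into a QueryResponse.
            QueryResponse response = solr.query(query);
            SolrDocumentList list = response.getResults();
            System.out.println("Total matches: " + list.getNumFound());
            System.out.println("ids on this page: " + fieldValues(list, "id"));
        }
    }
}
```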

Indexing Documents

Other operations are just as simple. To index (add) a document, all you need to do is create a SolrInputDocument and pass it along to the SolrClient's add() method.
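Indexing a single document and committing it might look like the sketch below. The field names id, name and price are assumptions that must exist in your schema (they do in the techproducts example). The document values are made up. Without the commit() call, or an autoCommit configured on the server, the new document is not visible to searches.

```java
import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddDocument {
    // Hypothetical helper: builds an input document with three schema fields.
    static SolrInputDocument buildDocument(String id, String name, String price) {
        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", id);
        document.addField("name", name);
        document.addField("price", price);
        return document;
    }

    public static void main(String[] args) throws SolrServerException, IOException {
        // Placeholder URL for a running Solr core.
        try (SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/techproducts")) {
            solr.add(buildDocument("552199", "Gouda cheese wheel", "49.99"));
            // Commit so the new document becomes visible to searches.
            solr.commit();
        }
    }
}
```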

Uploading Content in XML or Binary Formats

SolrJ lets you upload content in binary format instead of the default XML format. To upload in binary format, which is the same format SolrJ uses to fetch results, set a BinaryRequestWriter on the client.
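A minimal sketch of enabling binary uploads follows, assuming an HttpSolrClient with a placeholder URL. setRequestWriter() and BinaryRequestWriter are the SolrJ pieces involved. The binaryClient() helper is hypothetical. Once the writer is set, add() calls on that client send documents in the javabin format.

```java
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class BinaryUpload {
    // Hypothetical helper: builds a client whose update requests are written in binary form.
    static HttpSolrClient binaryClient(String url) {
        HttpSolrClient client = new HttpSolrClient(url);
        // Serialize update requests with javabin instead of XML.
        client.setRequestWriter(new BinaryRequestWriter());
        return client;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; point it at your own core or collection.
        try (HttpSolrClient client = binaryClient("http://localhost:8983/solr/techproducts")) {
            System.out.println("Binary uploads enabled for " + client.getBaseURL());
        }
    }
}
```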

Using the ConcurrentUpdateSolrClient

When implementing Java applications that bulk load many documents at once, consider ConcurrentUpdateSolrClient as an alternative to HttpSolrClient. ConcurrentUpdateSolrClient buffers all added documents and writes them into open HTTP connections. This class is thread safe. Although any SolrClient request can be made with this implementation, ConcurrentUpdateSolrClient is recommended only for /update requests.
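A bulk-loading sketch with ConcurrentUpdateSolrClient follows. It passes a queue size (10000) and thread count (4) to the SolrJ 5.x constructor; both are arbitrary illustrative values. makeDocuments() is a hypothetical helper that generates documents with only an id field. Because updates are sent in the background, the sketch calls blockUntilFinished() before committing. This client reports update failures through its handleError() method, which logs them by default, rather than throwing them from add().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoad {
    // Hypothetical helper: generates simple documents with only an id field.
    static List<SolrInputDocument> makeDocuments(int count) {
        List<SolrInputDocument> docs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "bulk-" + i);
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) throws SolrServerException, IOException {
        // Buffer up to 10000 requests, drained by 4 background threads (illustrative sizes).
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient("http://localhost:8983/solr/techproducts", 10000, 4)) {
            client.add(makeDocuments(100000));
            // Wait until buffered documents have been sent, then commit them.
            client.blockUntilFinished();
            client.commit();
        }
    }
}
```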

EmbeddedSolrServer

The EmbeddedSolrServer class provides an implementation of the SolrClient API that talks directly to a micro-instance of Solr running inside your Java application. This embedded approach is not recommended in most cases and is fairly limited in the set of features it supports; in particular, it cannot be used with SolrCloud or Index Replication. EmbeddedSolrServer exists primarily to help facilitate testing.

For information on how to use EmbeddedSolrServer please review the SolrJ JUnit tests in the org.apache.solr.client.solrj.embedded package of the Solr source release.

  1. We can add an example for CloudSolrServer on this page.

  2. Typo (there is no such thing as ConcurrentHttpSolrServer): "… The ConcurrentHttpSolrServer buffers all added documents ..." => "… The ConcurrentUpdateSolrServer buffers all added documents ..."

      1. What about concurrent/batch updates using SolrCloud?
        Does ConcurrentUpdateSolrServer connect via a ZooKeeper host string?

  3. Is there any documentation on how to use SolrJ to query SolrCloud? Do I have to manually figure out which nodes are up before trying to connect? Or do I have to set up an external load balancer that checks the availability of a node before the request is sent?

  4. In a SolrCloud + ZooKeeper cluster, when is it generally recommended to use CloudSolrServer, and when should LBHttpSolrServer be used?

    1. Questions about using Solr/SolrJ should be sent to the solr-user@lucene mailing list.

      Commenting on these pages should be focused on questions/comments/suggestions on the specific wording of documentation.

      (Although, FYI: not many people on the solr-user mailing list, or who read and respond to documentation suggestions, can read Chinese, so if you can translate your questions into English before sending them, you are more likely to get helpful responses.)

  5. Three of the links in the first paragraph (SolrClient,  HttpSolrClient, and CloudSolrClient) are dead, leading to 404 errors.  

    1. Looks like the javadoc base URL hasn't been updated to 5.0.0 yet; the links still point to 4.10.0. Those particular classes do not exist in 4.10; they will be new with 5.0 when it is released.

      Thanks for the heads up.  Because 5.0 javadocs don't exist yet, I don't think we want to update the javadoc base URL yet ... doing so will break all javadoc URLs in the entire guide, which would be a lot worse than having a few links like these not work.

       

  6. I think the HttpSolrClient sample code should use /solr/collection1 for the URL path, or maybe /solr/techproducts to reflect a core name used with the new bin/solr command.

    As I understand it, the legacy solr.xml format goes away in 5.0, taking defaultCoreName with it, and making the default core name hardcoded to collection1.  If my understanding is correct, any core name besides collection1 must be specified when creating the HttpSolrClient.

    1. I updated the URLs to include techproducts to be consistent with how the rest of the usage is already written.

      It's important to note, though, that this is just one way to use HttpSolrClient, where the client is tied to a specific collection. Alternatively, you can leave the HttpSolrClient init URL pointed to the root URL of your Solr instance and use SolrRequest.setPath("/techproducts/myhandlername"); to control which collection (and handler) the request goes to ... but since we didn't already have any examples of that sort of thing, I left it alone.

  7. The request handler path explanation uses some super-legacy method (setting 'qt' by name).

    It might be better to use SolrQuery.setRequestHandler() in the example if possible. I can see that behind the scenes it also temporarily uses qt parameter as storage. However, being explicit about this is quite confusing, especially when combined with a long legacy explanation in solrconfig.xml and the fact that the legacy setup is actually turned off.