Apache Solr Documentation

*** As of June 2017, the latest Solr Ref Guide is located at https://lucene.apache.org/solr/guide ***

Please note comments on these pages have now been disabled for all users.

This page serves as a place to organize thoughts and collect lists of things to try and fix / clean up.

If you are in the process of looking into something, or imminently plan to start looking into something – please put your name under it. If you complete something on this list, please delete it.

    • SOLR-8593: Parallel SQL now uses Apache Calcite as its SQL framework. As part of this change the default aggregation mode has been changed to facet rather than map_reduce. There have also been changes to the SQL aggregate response and some SQL syntax changes. Consult the documentation for full details.
      • The Parallel SQL Interface page needs to be updated for these changes. 
      • Pinged Kevin Risden, Joel, Dat for info on this.
    • SOLR-10087: StreamHandler now supports registering custom streaming expressions from the blob store (Kevin Risden)
      • Pinged Kevin Risden for more info on this - CT
    • SOLR-8593: Integrate Apache Calcite into the SQLHandler (Kevin Risden, Cao Manh Dat, Joel Bernstein)
      • Pinged Kevin Risden, Joel, Dat for info on this - CT
    • SOLR-10171: Add Constant Reduction Rules to Calcite Planner (Kevin Risden)
  • SOLR-9836: Add ability to recover from leader when index corruption is detected on SolrCore creation. (Mike Drob via Mark Miller)
    • IMO, this doesn't need docs, but not sure - CT
    • Hoss: Skimming the git diff, I note this comment:
      "Various recovery strategies can be specified via system properties -DCoreInitFailedAction={fromleader, none}"
      ...so it definitely seems like there are some end user options (not clear how many yet) added here that should be documented somewhere, but I'm not really sure where.
    • Hoss: OK, I audited the changes more thoroughly. The one sys property mentioned above is the only new user-facing option/choice: by default a core init failure does nothing, but if you set -DCoreInitFailedAction=fromleader on startup AND the type of failure is specifically a corrupt index AND you're in cloud mode, THEN it will attempt to recover from the leader. But where/how the hell do we document such a niche option?
  • SOLR-8542 Integrate Learning to Rank into Solr - write-up done as-far-as-intended, feedback welcome.
  • Backup/Restore
    • SOLR-9055: Make collection backup/restore extensible. (Hrishikesh Gadre, Varun Thacker, Mark Miller)
    • SOLR-9038: Add a command-line tool to manage the snapshots functionality (Hrishikesh Gadre via yonik)
  • Auth & security
    • SOLR-9324: Support Secure Impersonation / Proxy User for solr authentication (Gregory Chanan, Hrishikesh Gadre via yonik)
  • Streaming Expressions
    • SOLR-9077: Streaming expressions should support collection alias (Kevin Risden)
    • SOLR-9944: Map the nodes function name to the GatherNodesStream (Joel Bernstein)
      • Joel Bernstein: this requires changing everything that now refers to gatherNodes to nodes?
  • Miscellaneous pages
    • SOLR-9884: Add version to segments handler output (Steven Bower via Erick Erickson)
    • SOLR-9886: Add a 'enable' flag to caches to enable/disable caches (Pushkar Raste, noble)
    • SOLR-9918: Add SkipExistingDocumentsProcessor that skips duplicate inserts and ignores updates to missing docs (Tim Owen via koji)

  • SOLR-9901: Implement move in HdfsDirectoryFactory. (Mark Miller)
  • schema.xml
    • Since none of the examples that come with Solr have a schema.xml anymore, and haven't for a while, we should probably audit any mentions of "schema.xml" in the guide and update as appropriate (in many places just referring to "the Schema" might be fine)
      • I audited every mention of schema.xml in the ref guide and changed anything where it made sense to generalize things to something like "the Schema" as well as tweaking up a few places where it made sense to really call out the specific behavior of Managed Schema (ie: how schema file names are interpreted in Defining core.properties and CoreAdmin API) - Hoss
    • There are still a TON of references to schema.xml that I did not touch however, because they are on pages side by side with explicit & specific examples of schema.xml XML declarations (ie: how to define a field, or field type) ... we need to holistically decide what we want to do with this - replace them all with sample Schema API curl commands?
      • CT: I think replacing them with sample Schema API commands is the right thing to do, but it feels to me it would be better to do that when the new API structure is released (seems likely for 6.1). Otherwise, we'll have to touch all those places in back-to-back releases. If someone has the energy to do it all now, though, that would be fine with me.
  • SOLR-7893 - Document ZooKeeper SSL support (reopened): to be done once ZooKeeper is upgraded to 3.5.1 or newer
    • To set up SSL encryption, please follow the instructions in the ZooKeeper SSL User Guide. Note that the relevant settings can be added as SOLR_OPTS variables in solr.in.sh or solr.in.cmd

  • SOLR-7742: Support for Immutable ConfigSets (Gregory Chanan)
  • SOLR-7182: Make the Schema-API a first class citizen of SolrJ. The new SchemaRequest and its inner classes can be used to make requests to the Schema API. (Sven Windisch, Marius Grama via shalin)
  • Document JSON Facet API, SOLR-7214 (and SOLR-7306)
    • yonik's draft page: Faceted Search
    • SOLR-7800: JSON Facet API: the avg() facet function now skips missing values rather than treating them as a 0 value. The def() function can be used to treat missing values as 0 if that is desired. Example: facet:{ mean:"avg(def(myfield,0))" }

    • SOLR-7676: Faceting on nested objects / Block-join faceting with the new JSON Facet API. Example: Assuming books with nested pages and an input domain of pages, the following will switch the domain to books before faceting on the author field: authors:{ type:terms, field:author, domain:{toParent:"type:book"} } (yonik)
    • SOLR-8230: JSON Facet API: add "facet-info" into debug section of response when debugQuery=true (Michael Sun, yonik)
    • SOLR-8466: adding facet.method=uif to bring back UnInvertedField faceting, which used to work with facet.method=fc. It's more performant for rarely changing indexes. Note: it does not yet honor prefix and contains. (Jamie Johnson via Mikhail Khludnev) MK> I prefer to keep it undocumented.
    • SOLR-8312: Add domain size and numBuckets to facet telemetry info (facet debug info for the new Facet Module). (Michael Sun, yonik)
  • Some processor factories on Update Request Processors could/should probably link out to other pages in the guide on the relevant topics (ie: the more cross linking we have in the ref guide, the more likely folks are to find the details they are looking for)
  • we should re-org & re-think how we describe the various REST APIs - would be good to get Cassandra's take on this
    • we should consider having a top level section on "REST APIs"
    • Managed Resources currently lives under Managing Solr - but that's just because I couldn't think of a better place for it
    • Schema API currently lives under the schema section, but thematically it's got very similar use cases to Managed Resources (and it's only going to grow closer if/when we start supporting CREATE on field type).
  • Independent of organization, Managed Resources could use some love in terms of its "editorial voice"
  • Consider splitting this subsection out into its own page (similar to what we did for atomic updates): Uploading Data with Index Handlers#Nested Child Documents
  • "Search & Analytics APIs"
    • Comments from joel:
      • We should think of the best way to document SOLR-6150 and SOLR-5973. We could add them to the plugin page, but I prefer to call them "Search & Analytics APIs". There could be three sub-pages covering these two tickets: RankQuery API, AnalyticsQuery API, MergeStrategy API. Any thoughts? I'd also be happy to add another subpage called PostFilter API.
      • Joel: Docs are ready for this under Trunk changes. They just need to find a home in the docs somewhere
    • Comments from Hoss: I'm not convinced these are really appropriate for the ref guide:
      • most of the content on these pages seems like it should instead just be made part of the javadocs of the respective classes
      • they are extremely low level, and get into a lot of detail about how users can write plugins to do things w/o talking about any out of the box functionality
        • we don't have any other pages like that in the ref guide that I know of; at most we cover general topics with a discussion of the specific implementations in Solr and then have single sentences like "Custom entity processors can be written to extend or replace the ones supplied" or "The query parser plugins are all subclasses of QParserPlugin. If you have custom parsing needs, you may want to extend that class to create your own query parser" with links to the javadocs where they can learn more about the API they need to implement
  • The list of Solr clients on Client API Lineup should be updated
  • Spatial Search needs more how/when-to-use info about the various properties of the bbox fieldType.
  • With the default changed to use managed-schema in all the configsets as of 5.5, the Ref Guide could use a review in regard to specifying schema.xml as a file that needs to be edited. Suggestion is to remove specific references to the file and use "your schema" instead, where appropriate. Also need to pay attention to specific instructions for editing elements of the file and if there is need/desire/room to add info on how to accomplish the same results with the Schema API.
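
As one possible shape for the "same result with the Schema API" guidance suggested in this list, here is a minimal sketch that builds an add-field command instead of editing schema.xml. The host, collection name, field name, and helper function are hypothetical, and the request is only constructed, never sent.

```python
import json
from urllib.request import Request


def build_add_field_request(base_url, collection, name, field_type,
                            stored=True, indexed=True):
    """Build (but do not send) a Schema API request that adds one field."""
    body = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return Request(
        url=f"{base_url}/solr/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values, for illustration only.
req = build_add_field_request("http://localhost:8983", "mycollection",
                              "sell_by", "string")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The printed body is what would otherwise be passed to curl with -d, which is one way the existing schema.xml field declarations could be shown side by side with their Schema API equivalents.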

Issues from CHANGES.txt that were never doc'ed as part of their release:

  • Streaming Expressions
    • SOLR-9009: Adds ability to get an Explanation of a Streaming Expression (Dennis Gove)

    • SOLR-9103: Restore ability for users to add custom Streaming Expressions (Cao Manh Dat)
  • SOLR-2212: Add a factory class corresponding to Lucene's NoMergePolicy. (Lance Norskog, Cao Manh Dat via shalin)
  • Faceting

    • SOLR-8988: Adds query option facet.distrib.mco which when set to true allows the use of facet.mincount=1 in cloud mode. (Keith Laban, Dennis Gove)

  • Collections API

    • SOLR-7117: Provide an option to limit the maximum number of cores that can be created on a node by the Auto Add Replica feature. For this you can set a "maxCoresPerNode" property via the Cluster Property API (Varun Thacker, Mark Miller)

      • Available properties aren't listed in the docs, but probably should be.
  • Faceted Search

    • SOLR-9026: Extend facet telemetry support to legacy (non-json) facets under "debug/facet-debug" in the response. (Michael Sun, yonik)

  • On the Result Clustering page, the Chinese text in the Quick Start XML code block vanishes in the exported PDF.
  • Post Tool
    • SOLR-7546: bin/post (and SimplePostTool in -Dauto=yes mode) now sends rather than skips files without a known content type, as "application/octet-stream", provided it still is in the allowed filetypes setting. (ehatcher)
  • Running Solr on HDFS
    • SOLR-7437: Make HDFS transaction log replication factor configurable. (Mark Miller)
    • SOLR-6766: Expose HdfsDirectoryFactory Block Cache statistics via JMX. (Mike Drob, Mark Miller)
  • New "files" Example
  • JSON Request API
    • SOLR-7422: Optional flatter form for the JSON Facet API via a "type" parameter (yonik)
    • SOLR-7417: JSON Facet API - unique() is now implemented for numeric and date fields. (yonik)
    • SOLR-7473: Facet Module (JSON Facet API) range faceting now supports the "mincount" parameter in range facets to suppress buckets less than that count. (yonik)
    • SOLR-7477: Multi-select faceting support for the Facet Module via the "excludeTags" parameter which disregards any matching tagged filters for that facet. (yonik)
    • SOLR-7522: Facet Module - Implement field/terms faceting over single-valued numeric fields. (yonik)
    • SOLR-7553: Facet Analytics Module: new "hll" function that uses HyperLogLog to calculate distributed cardinality. (yonik)
    • SOLR-7443: Implemented range faceting over date fields in the new facet module (JSON Facet API). (yonik)
  • Velocity Search UI
    • SOLR-1723: VelocityResponseWriter improvements (Erik Hatcher)
    • SOLR-2035: Add a VelocityResponseWriter $resource tool for locale-specific string lookups. (Erik Hatcher)
  • Uploading Structured Data Store Data with the Data Import Handler (DIH)
    • SOLR-6258 / SOLR-6269: Added onRollback event handler hook to Data Import Handler (DIH). (ehatcher)
      • the other DIH event handlers (onImportStart and onImportEnd) have not yet been documented so there isn't yet a clean spot to put onError
    • SOLR-6263: Add DIH handler name to variable resolver as ${dih.handlerName}. (ehatcher)
      • no current documentation of variable resolvers, so need to build a section for this whole feature
  • Suggester

    • SOLR-7888: Analyzing suggesters can now filter suggestions by a context field (Arcadius Ahouansou, janhoy)
  • JSON Facet API
    • SOLR-8217: JSON Facet API: add "method" param to terms/field facets to give an execution hint for what method should be used to facet.  (yonik)
    • SOLR-8155: JSON Facet API - field faceting on a multi-valued string field without docValues (i.e. UnInvertedField implementation), but with a prefix or with a sort other than count, resulted in incorrect results. This has been fixed, and facet.prefix support for facet.method=uif has been enabled. (Mikhail Khludnev, yonik)
    • SOLR-8835: JSON Facet API: fix faceting exception on multi-valued numeric fields that have docValues. (yonik)
  • Solr CELL
    • SOLR-8166: Introduce possibility to configure ParseContext in ExtractingRequestHandler/ExtractingDocumentLoader (Andriy Binetsky via Uwe Schindler)
  • Analysis

    • LUCENE-6747: FingerprintFilter is a TokenFilter that outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens. Useful for normalizing short text in clustering/linking  tasks. (Mark Harwood, Adrien Grand)
  • SOLR-7217: HTTP POST body is auto-detected when the client is curl and the content type is form data (curl's default), allowing users to use curl to send JSON or XML without having to specify the content type. (yonik)
    • Hoss: Since quite a bit of discussion was forming about this, I've moved the existing comments into SOLR-7405 ... we should hash out if/where the guide should be updated for this in that jira.
  • SOLR-9090: Add directUpdatesToLeadersOnly flag to solrj CloudSolrClient. (Marvin Justice, Christine Poerschke)
  • SOLR-9038: Solr core snapshots: The current commit can be snapshotted which retains the commit and associates it with a name. The core admin API can create snapshots, list them, and delete them. Snapshot names can be referenced in doing a core backup, and in replication. Snapshot metadata is stored in a new snapshot_metadata/ dir. (Hrishikesh Gadre via David Smiley)
    • See note for SOLR-9326 below.
  • SOLR-9326 : Ability to create/delete/list snapshots at collection level. (Hrishikesh Gadre via yonik)
    • Per Hrishikesh in SOLR-9326, he wants to finish the other parts of this feature before providing dev notes for documentation. There are a couple more issues pending before this is done.
  • SOLR-9610: New AssertTool in SolrCLI for easier cross platform assertions from command line (janhoy)
    • Going to defer this to a later release. I'm not sure yet why someone would use this, or when.
  • SOLR-7216: Solr JSON Request API:
    • HTTP search requests can have a JSON body.
    • JSON request can also be passed via the "json" parameter.
    • Smart merging of multiple JSON parameters: query parameters starting with "json." will be merged into the JSON request.
    • Legacy query parameters can also be passed in the "params" block of the JSON request.
  • SOLR-7212: Parameter substitution / macro expansion across entire request. Substitution can contain further expansions and default values are supported. Example: q=price:[ ${low:0} TO ${high} ]&low=100&high=200 (yonik)
  • Doc SOLR-6125 (from 4.9) unless the syntax is changed by SOLR-6195
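
To make the SOLR-7212 example in this list concrete, the following is a rough sketch of ${name:default} substitution with repeated expansion. It illustrates the idea only and is not Solr's implementation: the function name, the depth limit, and the choice to leave unresolved macros untouched are all assumptions made for this sketch.

```python
import re

# Matches ${name} or ${name:default}; names and defaults may not contain braces here.
_MACRO = re.compile(r"\$\{([^{}:]+)(?::([^{}]*))?\}")


def expand_macros(value, params, max_depth=10):
    """Replace ${name} / ${name:default} using params, re-expanding up to max_depth times."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in params:
            return params[name]
        # Assumption for this sketch: with no value and no default, keep the macro as-is.
        return default if default is not None else match.group(0)

    for _ in range(max_depth):
        expanded = _MACRO.sub(substitute, value)
        if expanded == value:
            break  # nothing left to expand
        value = expanded
    return value


# The request from the CHANGES entry: q=price:[ ${low:0} TO ${high} ]&low=100&high=200
print(expand_macros("price:[ ${low:0} TO ${high} ]", {"low": "100", "high": "200"}))
```

Dropping the low parameter from the dictionary makes the ${low:0} default apply instead, which is the behavior the eventual documentation would most need to call out.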

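The JSON Request API notes (SOLR-7216) and the JSON Facet API examples quoted earlier on this page (SOLR-7800, SOLR-7676) can be pictured with a small sketch that assembles a request body. The facet expressions are copied from the CHANGES entries; the query, the rows value, the helper name, and the idea of combining both facets in one request are hypothetical, and nothing is sent to a server.

```python
import json


def build_json_request(query, facets, legacy_params=None):
    """Assemble a JSON Request API body with a facet block and optional legacy params."""
    body = {"query": query, "facet": facets}
    if legacy_params:
        # Legacy query parameters ride along in the "params" block.
        body["params"] = legacy_params
    return body


facets = {
    # SOLR-7800: treat missing values as 0 when averaging.
    "mean": "avg(def(myfield,0))",
    # SOLR-7676: switch the domain from pages to their parent books.
    "authors": {
        "type": "terms",
        "field": "author",
        "domain": {"toParent": "type:book"},
    },
}

body = build_json_request("*:*", facets, legacy_params={"rows": 0})
print(json.dumps(body, indent=2))
```

The syntax shown is the one given in the CHANGES entries and should be checked against the Solr version being documented before it is used in the guide.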
 

Sections that need an overhaul generally

(from a personal list kept by Cassandra Targett of pages that need to be clearer, need a lot more information or better formatting, and/or need to be reviewed for consistency with the rest of the guide)

 

PDF Format Fixes

  • On Upgrading Solr there's a line break in the middle of a word: "for the details of all chan<break>ges" that I can't make sense of.
  • Links to anchors on other pages work fine in wiki view, and work fine in html export, but cause links in the PDF export to go to the corresponding https://cwiki... URL
  • The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts. (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.) This looks like a general problem - see e.g. Pg 99&100: Under solr.HTMLStripCharFilterFactory, the links labeled "Major Changes from Solr 3 to Solr 4." go one page previous to the start of this section in the guide.
    • sarowe, ctargett, hoss, steffkes looked into it but couldn't figure out a good fix (shelved)
  • Widow control for tables? See p. 47-48, where the table header is on one page and the content on the other. Not sure if it's possible, though.
    • CT: there are widow/orphan options in CSS and I added them to the CSS for the PDF export (on 3 Apr 2015), but it's not clear if they are being used. If we find some specific examples we can revisit this now that we know there are rules to govern the behavior.
  • Solve the "word break problem", where we have to define 'word-break: break-all' so text doesn't print off the page.
  • Check monospace text highlighting & fix places it doesn't occur
  • Code box titles take up too much vertical space, and they look like they're crowding the content below them.
  • On pg. 184 (in the section "Language Analysis"), in the description of the "Serbian Normalization Filter", the input and the "tokenizer to filter" texts are missing many characters.  This may be the same old problem with display of non-Latin-1 characters.  Other language analysis examples should be audited for the same problem.
    • CT: yuck. This problem existed in the 5.0 version of the guide also, so is unrelated to the changes I made with the CSS.

 


9 Comments

  1. I really should get to SOLR-5926 as well; ComplexPhraseQueries have been in Solr for a while.

  2. I added the initial docs for SOLR-5244 to a new page under Searching called "Exporting Search Results"

  3. SOLR-6585 is an API change; do we need documentation in the ref guide?

      1. I don't see how it fits there. It is something a developer of a requesthandler can take advantage of. So it should probably go into the javadocs only

        1. it drastically impacts how users are affected by a requestHandler with a path based name (particularly since EVERY request handler shipped with Solr now implements this new interface automatically by subclassing RequestHandlerBase) and absolutely needed to be called out in the docs - but since you didn't seem to understand the significance of your change, I went ahead and added the bare minimum doc for you:

           

          https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604291&originalId=50860580

          1. Yeah, but those handlers always return null. And that makes them behave exactly the same way they used to. But it is helpful to know that it is possible to handle sub paths

  4. PathHierarchyTokenizerFactory supports 'reverse' and 'skip' options, not sure for how long.

  5. The JDBC driver is briefly touched on in the Parallel SQL page. But there is more documentation attached to JIRA tickets as PDFs. We can decide if we have time to get it included in the initial 6.0 release.