Apache Solr Documentation

This Unreleased Guide Will Cover Apache Solr 5.4

Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all specified in solrconfig.xml, are:

  1. Managed schema: Schema modifications are made through Solr APIs rather than manual edits - see Managed Schema Definition in SolrConfig.
  2. Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
  3. Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types.
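
The cascade described in items 2 and 3 can be illustrated with a small sketch. This is not Solr's implementation: it is a simplified Python model that assumes a single date format, collapses Integer/Long and Float/Double into one step each, and uses a hypothetical mapping table (`FIELD_TYPE_FOR_CLASS`) from value class to field type.

```python
from datetime import datetime

# Hypothetical mapping from guessed value class to schema field type,
# loosely modeled on the kind of type mapping a schemaless config defines.
FIELD_TYPE_FOR_CLASS = {
    "Boolean": "booleans",
    "Long": "tlongs",
    "Double": "tdoubles",
    "Date": "tdates",
    "String": "strings",  # fallback when no parser accepts the value
}


def guess_value_class(raw: str) -> str:
    """Run a raw string through a cascade of parsers; the first match wins."""
    text = raw.strip()
    if text.lower() in ("true", "false"):
        return "Boolean"
    try:
        int(text)
        return "Long"
    except ValueError:
        pass
    try:
        float(text)
        return "Double"
    except ValueError:
        pass
    try:
        # Assumption: only one date format is tried in this sketch.
        datetime.strptime(text, "%Y-%m-%d")
        return "Date"
    except ValueError:
        pass
    return "String"


def guess_field_type(raw: str) -> str:
    """Map the guessed value class to a schema field type name."""
    return FIELD_TYPE_FOR_CLASS[guess_value_class(raw)]
```

The order of the parsers matters: a value such as 14 would also parse as a double, so the integral parser has to run first.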

Using the Schemaless Example

The three features of schemaless mode are pre-configured in the data_driven_schema_configs config set in the Solr distribution. To start an example instance of Solr using these configs, run bin/solr start -e schemaless from the Solr installation directory.

This will launch a Solr server, and automatically create a collection (named "gettingstarted") that contains only three fields in the initial schema: id, _version_, and _text_.

You can use the /schema/fields Schema API to confirm this: curl http://localhost:8983/solr/gettingstarted/schema/fields will output a JSON response that lists only those three fields.
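
The same check can be scripted. The sketch below assumes the response is a JSON object with a top-level "fields" array whose entries carry a "name" key; the helper names and the trimmed sample body are illustrative, not captured output.

```python
import json
from urllib.request import urlopen


def field_names(schema_fields_json: str) -> list:
    """Extract sorted field names from a /schema/fields JSON response body."""
    payload = json.loads(schema_fields_json)
    return sorted(f["name"] for f in payload.get("fields", []))


def fetch_field_names(base_url: str = "http://localhost:8983/solr/gettingstarted") -> list:
    """Query a running Solr instance; requires network access to base_url."""
    with urlopen(base_url + "/schema/fields") as resp:
        return field_names(resp.read().decode("utf-8"))


# Illustrative response body, trimmed to the keys this sketch reads.
SAMPLE = '{"fields": [{"name": "_text_"}, {"name": "_version_"}, {"name": "id"}]}'
print(field_names(SAMPLE))
```

Calling fetch_field_names() against the example instance should return the same three names for a freshly created collection.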

The data_driven_schema_configs config set includes a copyField directive that copies all content into a predefined "catch-all" _text_ field, which enables single-field search across all fields' content. As a result, the index will be larger than it would be without the copyField. Once you have settled on your schema, consider removing the _text_ field and the corresponding copyField directive if you don't need them.
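
For orientation, the two schema entries in question look roughly like the fragment below. The attribute values are assumptions for illustration; check the managed-schema file of your own config set before removing anything.

```xml
<!-- Catch-all field (attribute values assumed): indexed for search, not stored -->
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- Copies the content of every field into _text_ -->
<copyField source="*" dest="_text_"/>
```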

Configuring Schemaless Mode

As described above, there are three configuration elements that need to be in place to use Solr in schemaless mode. In the data_driven_schema_configs config set included with Solr, these are already configured. If, however, you would like to configure schemaless mode yourself, make the following changes.

Enable Managed Schema

As described in the section Managed Schema Definition in SolrConfig, changing the schemaFactory will allow the schema to be modified by the Schema API. Your solrconfig.xml should declare ManagedIndexSchemaFactory as its schemaFactory (and the ClassicIndexSchemaFactory should be commented out or removed).
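
A minimal sketch of such a section follows. The resource name managed-schema is the conventional choice and is an assumption here; use whatever name your configuration expects.

```xml
<!-- Lets the Schema API and update processors modify the schema at runtime -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```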

Define an UpdateRequestProcessorChain

The UpdateRequestProcessorChain allows Solr to guess field types, and you can define the default field type classes to use. To start, you should define it as follows (see the javadoc links below for update processor factory documentation):
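
A trimmed sketch of such a chain is shown here. The chain name, the shortened list of date formats, and the particular type mappings are assumptions for illustration; the configuration shipped with your Solr release is the authoritative version.

```xml
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <!-- Value-based parsers: each converts the string values it recognizes -->
  <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>
  <!-- Adds unknown fields, mapping each value class to a field type -->
  <processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
    <str name="defaultFieldType">strings</str>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Boolean</str>
      <str name="fieldType">booleans</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.util.Date</str>
      <str name="fieldType">tdates</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Long</str>
      <str name="valueClass">java.lang.Integer</str>
      <str name="fieldType">tlongs</str>
    </lst>
    <lst name="typeMapping">
      <str name="valueClass">java.lang.Number</str>
      <str name="fieldType">tdoubles</str>
    </lst>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The field types named in the mappings (booleans, tdates, tlongs, tdoubles, strings) must exist in your schema for the chain to work.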

Javadocs for update processor factories mentioned above:

Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler

Once the UpdateRequestProcessorChain has been defined, you must instruct your UpdateRequestHandlers to use it when working with index updates (i.e., adding, removing, replacing documents). Here is an example using InitParams to set the defaults on all /update request handlers:
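
A minimal sketch, assuming the chain was named add-unknown-fields-to-the-schema; the value of update.chain must match whatever name you gave your own chain.

```xml
<!-- Applies the chain by default to every handler whose path starts with /update -->
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
```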

After all of these changes have been made, restart Solr (or reload the cores) to load the new solrconfig.xml definitions.

Examples of Indexed Documents

Once schemaless mode has been enabled (whether you configured it manually or are using data_driven_schema_configs), documents that include fields not defined in your schema will be added to the index, and the new fields will be added to the schema.

For example, adding a CSV document will cause its fields that are not in the schema to be added, with fieldTypes based on values:
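
As a sketch of what such a request can look like, the Python below builds a CSV update for the /update handler. The column names and values are made up for illustration (only the "Sold" column is referred to elsewhere on this page), and the base URL assumes the gettingstarted example collection.

```python
from urllib.request import Request, urlopen

# Illustrative CSV: the first line names the fields, the second is one document.
CSV_BODY = (
    "id,Title,Released,Rating,Sold\n"
    "doc-1,Example Album,1988-08-13,0.01,0\n"
)


def build_csv_update(base_url="http://localhost:8983/solr/gettingstarted"):
    """Build (but do not send) a CSV update request that commits immediately."""
    return Request(
        base_url + "/update?commit=true",
        data=CSV_BODY.encode("utf-8"),
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


request = build_csv_update()
# To send it to a running Solr instance:
#   with urlopen(request) as resp:
#       print(resp.read().decode("utf-8"))
```

With type guessing enabled, an integral value such as the 0 in the Sold column would be expected to produce a long-based field type, and a value such as 0.01 a double-based one.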

A successful request returns a response whose responseHeader reports a status of 0.

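The fields now in the schema can be listed by running curl http://localhost:8983/solr/gettingstarted/schema/fields again; the output includes the newly added fields alongside those that were already defined.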

You Can Still Be Explicit

Even if you want to use schemaless mode for most fields, you can still use the Schema API to pre-emptively create some fields, with explicit types, before you index documents that use them.

Internally, the Schema REST API and the Schemaless Update Processors both use the same Managed Schema functionality.
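
A sketch of pre-creating a field through the Schema API is shown below. The field name "Price" and the type "tdouble" are hypothetical; the type must already exist in your schema, and the base URL assumes the gettingstarted example collection.

```python
import json
from urllib.request import Request


def build_add_field(name, field_type,
                    base_url="http://localhost:8983/solr/gettingstarted",
                    stored=True):
    """Build (but do not send) a Schema API add-field request."""
    command = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return Request(
        base_url + "/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field: declare it explicitly before indexing documents that use it.
request = build_add_field("Price", "tdouble")
```

Because the field exists before any document arrives, the type-guessing processors never get a chance to pick a different type for it.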

Once a field has been added to the schema, its field type is fixed. As a consequence, adding documents with field value(s) that conflict with the previously guessed field type will fail. For example, after adding the above document, the "Sold" field has the fieldType tlongs, but the document below has a non-integral decimal value in this field:

This document will fail: Solr rejects the update with an error, because the value cannot be parsed as a long.
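
The behaviour can be modeled in a few lines. This is a toy in-memory model, not Solr code: the schema dictionary, the exception class, and the sample value 4.93 are all illustrative assumptions.

```python
class FieldTypeConflict(ValueError):
    """Raised when a value does not fit the type already fixed for a field."""


# Field types fixed by earlier documents; "Sold" was guessed as tlongs.
schema = {"Sold": "tlongs"}


def check_value(field: str, raw: str) -> None:
    """Reject values that cannot be parsed under the field's fixed type."""
    if schema.get(field) == "tlongs":
        try:
            int(raw)
        except ValueError:
            raise FieldTypeConflict(
                f"Error adding field '{field}'='{raw}'"
            ) from None


check_value("Sold", "7")  # accepted: integral value
try:
    check_value("Sold", "4.93")  # rejected: not a valid long
except FieldTypeConflict as err:
    print(err)
```

The point of the model is that the type is decided once, by the first value seen, and is not widened later. If a field may hold decimals, declare it explicitly before indexing.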

8 Comments

  1. Formatting nit: The 3rd to last code block (showing fields that have been added to the schema) needs formatting to match the others. Just add ":borderStyle=solid|borderColor=#666666" after the word 'code' and before the closing curly bracket.

  2. Would it be useful to mention that you can use the REST API to add a schema field in case you don't want that particular field to be type-guessed? Or is it implied?

    1. good note varun - it probably wasn't implied enough, but doesn't really need its own sub-section - so i took advantage of the "note" box formatting to give it a quick shout out.

  3. The update chain shown here differs a bit from what's in the Solr example schemaless directory. It's not the end of the world; it's similar. But in particular the order of the "*Update*" stages is different, which is a topic I still find a bit confusing. Which version is "better", canonical, etc.?

    1. The update chain shown here is exactly the same as the one in solrconfig.xml in the example-schemaless conf dir in the 4.10 release (candidate) - I'm guessing you're comparing to a previous release?  The order was changed for 4.10: http://svn.apache.org/r1610743.

      1. OK thanks.

        So in 4.10 these are now first:

        <processor class="solr.LogUpdateProcessorFactory" />
        <processor class="solr.DistributedUpdateProcessorFactory" />

        And this is still last:

        <processor class="solr.RunUpdateProcessorFactory"/>