Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all controlled through Solr's configuration, are:
- Managed schema: Schema modifications are made at runtime through Solr APIs, which requires the use of a schemaFactory that supports these changes. See Schema Factory Definition in SolrConfig for more details.
- Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
- Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types.
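As a rough illustration of the cascading value-based parsers described above, the following Python sketch tries a sequence of checks in order and returns the first class that accepts the value. It is not Solr's implementation: the function names, the single accepted date format, and the reduction to Boolean, Long, Double, and Date (with a String fallback) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed patterns for integral and decimal values (illustrative only).
_LONG = re.compile(r"[+-]?[0-9]+")
_DOUBLE = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?")


def _is_boolean(text):
    return text.lower() in ("true", "false")


def _is_long(text):
    return _LONG.fullmatch(text) is not None


def _is_double(text):
    return _DOUBLE.fullmatch(text) is not None


def _is_date(text):
    # Assumption: only one ISO 8601 layout is recognized in this sketch.
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


# Order matters: the first check that accepts the value wins.
PARSERS = [
    ("Boolean", _is_boolean),
    ("Long", _is_long),
    ("Double", _is_double),
    ("Date", _is_date),
]


def guess_value_class(text):
    """Return the name of the first value class that accepts `text`."""
    for class_name, accepts in PARSERS:
        if accepts(text):
            return class_name
    return "String"
```

Under these assumptions, guess_value_class("10") returns "Long" and guess_value_class("4.93") returns "Double", because the integral check runs before the decimal one.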
Using the Schemaless Example
The three features of schemaless mode are pre-configured in the data_driven_schema_configs config set in the Solr distribution. To start an example instance of Solr using these configs, run the following command:
This will launch a Solr server and automatically create a collection (named "gettingstarted") that contains only three fields in the initial schema:
You can use the /schema/fields Schema API to confirm this:
curl will output:
Because the data_driven_schema_configs config set includes a copyField directive that causes all content to be indexed in a predefined "catch-all" _text_ field, to enable single-field search that includes all fields' content, the index will be larger than it would be without the copyField. When you nail down your schema, consider removing the _text_ field and the corresponding copyField directive if you don't need them.
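The /schema/fields check mentioned above can also be made from a script. The sketch below uses only the Python standard library. The base URL http://localhost:8983/solr is an assumption about where the example instance listens, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def schema_fields_url(base_url, collection):
    """Build the Schema API URL that lists a collection's fields."""
    return f"{base_url.rstrip('/')}/{collection}/schema/fields"


def fetch_field_names(base_url="http://localhost:8983/solr",
                      collection="gettingstarted"):
    """Return the names of the fields currently defined in the schema.

    Requires a running Solr instance at base_url (assumed, not verified here).
    """
    with urlopen(schema_fields_url(base_url, collection)) as response:
        body = json.load(response)
    return [field["name"] for field in body["fields"]]
```

Calling fetch_field_names() before and after indexing sample documents is one way to watch the schema grow.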
Configuring Schemaless Mode
As described above, there are three configuration elements that need to be in place to use Solr in schemaless mode. In the data_driven_schema_configs config set included with Solr, these are already configured. If, however, you would like to implement schemaless mode on your own, you should make the following changes.
Enable Managed Schema
As described in the section Schema Factory Definition in SolrConfig, Managed Schema support is enabled by default, unless your configuration specifies that ClassicIndexSchemaFactory should be used.
You can configure the ManagedIndexSchemaFactory (and control the resource file used, or disable future modifications) by adding an explicit <schemaFactory/> like the one below. See Schema Factory Definition in SolrConfig for more details on the options available.
Define an UpdateRequestProcessorChain
The UpdateRequestProcessorChain allows Solr to guess field types, and you can define the default field type classes to use. To start, you should define it as follows (see the javadoc links below for update processor factory documentation):
Javadocs for update processor factories mentioned above:
Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler
Once the UpdateRequestProcessorChain has been defined, you must instruct your UpdateRequestHandlers to use it when working with index updates (i.e., adding, removing, replacing documents). Here is an example using InitParams to set the defaults on all /update request handlers:
After each of these changes has been made, Solr should be restarted (or you can reload the cores to load the new configuration).
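If you prefer a reload over a restart, a core reload can be requested over HTTP through the CoreAdmin API's RELOAD action. The following is a minimal sketch, assuming a standalone Solr instance at http://localhost:8983/solr and a core named gettingstarted. Both values and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def core_reload_url(core, base_url="http://localhost:8983/solr"):
    """Build the CoreAdmin URL that asks Solr to reload one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def reload_core(core, base_url="http://localhost:8983/solr"):
    """Send the reload request and return the HTTP status code.

    Requires a running Solr instance at base_url (assumed, not verified here).
    """
    with urlopen(core_reload_url(core, base_url)) as response:
        return response.status
```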
Examples of Indexed Documents
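Once schemaless mode has been enabled (whether you configured it manually or are using the data_driven_schema_configs config set), documents that include fields not defined in your schema will be indexed, and those fields will be added to the schema automatically with guessed field types.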
For example, adding a CSV document will cause its fields that are not in the schema to be added, with fieldTypes based on values:
Output indicating success:
The fields now in the schema (output from the /schema/fields Schema API):
You Can Still Be Explicit
Even if you want to use schemaless mode for most fields, you can still use the Schema API to pre-emptively create some fields, with explicit types, before you index documents that use them.
Internally, the Schema API and the Schemaless Update Processors both use the same Managed Schema functionality.
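As a sketch of pre-creating a field, the snippet below builds a Schema API add-field command and posts it to a collection's /schema endpoint. The base URL, the helper names, and the choice of field name and type in the usage note are assumptions for illustration. Use a field type that actually exists in your schema.

```python
import json
from urllib.request import Request, urlopen


def add_field_command(name, field_type, stored=True):
    """Build the JSON body for a Schema API add-field command."""
    return {"add-field": {"name": name, "type": field_type, "stored": stored}}


def add_field(collection, name, field_type,
              base_url="http://localhost:8983/solr"):
    """POST the add-field command and return Solr's parsed JSON response.

    Requires a running Solr instance at base_url (assumed, not verified here).
    """
    request = Request(
        f"{base_url.rstrip('/')}/{collection}/schema",
        data=json.dumps(add_field_command(name, field_type)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)
```

For example, add_field("gettingstarted", "Sold", "tlongs") would define the field explicitly before any document containing it is indexed, so its type is not left to guessing.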
Once a field has been added to the schema, its field type is fixed. As a consequence, adding documents with field value(s) that conflict with the previously guessed field type will fail. For example, after adding the above document, the "Sold" field has the fieldType tlongs, but the document below has a non-integral decimal value in this field:
This document will fail, as shown in this output:
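The rejection follows from parsing the new value against the already-fixed type. Below is a minimal sketch of that check, assuming a long-typed field accepts only integral values. The function name and the sample value are hypothetical, and this is not Solr's implementation or its error message.

```python
def check_long_value(field_name, raw_value):
    """Return raw_value as an int, or raise ValueError if it is not integral."""
    try:
        return int(raw_value)
    except ValueError:
        raise ValueError(
            f"field {field_name!r} is typed as long; cannot parse {raw_value!r}"
        ) from None
```

Here check_long_value("Sold", "0") succeeds, while check_long_value("Sold", "4.93") raises ValueError, mirroring how a decimal value conflicts with a field whose type was guessed from an earlier integral value.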