Apache Solr Documentation

This Unreleased Guide Will Cover Apache Solr 5.1

Solr supports location data for use in spatial/geospatial searches. Using spatial search, you can:

  • Index points or other shapes
  • Filter search results by a bounding box or circle or by other shapes
  • Sort or boost scoring by distance between points, or relative area between rectangles
  • Index and search multi-value time or other numeric durations

There are three main field types available for spatial search:

  • LatLonType (and its non-geodetic twin PointType)
  • SpatialRecursivePrefixTreeFieldType (RPT for short)
  • BBoxField

RPT offers more features than LatLonType and fast filter performance, although LatLonType is more appropriate when efficient distance sorting/boosting is desired. They can both be used simultaneously for what each does best – LatLonType for sorting/boosting, RPT for filtering.  

BBoxField is for indexing bounding boxes, querying by a box, specifying a search predicate (Intersects, Within, Contains, Disjoint, Equals), and applying a relevancy sort/boost like overlapRatio or simply the area.

For more information on Solr spatial search, see http://wiki.apache.org/solr/SpatialSearch.

Indexing and Configuration

For indexing geodetic points (latitude and longitude), supply the pair of numbers as a string with a comma separating them in latitude then longitude order. For non-geodetic points, the order is x,y for PointType, and for RPT you must use a space instead of a comma, or use WKT.
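
As a minimal sketch of the value formats described above, the Python helper below renders a coordinate pair as the string each field type expects. The function name format_point and its kind labels are made up for this illustration and are not part of Solr.

```python
def format_point(a, b, kind="latlon"):
    """Render a coordinate pair as an indexable string.

    kind is a hypothetical label used only in this sketch:
      "latlon"    -> geodetic point, "lat,lon" (latitude first)
      "point"     -> non-geodetic PointType, "x,y"
      "rpt_plain" -> non-geodetic RPT, "x y" (space, not comma)
      "wkt"       -> WKT point, "POINT(x y)"
    """
    if kind in ("latlon", "point"):
        return f"{a},{b}"
    if kind == "rpt_plain":
        return f"{a} {b}"
    if kind == "wkt":
        return f"POINT({a} {b})"
    raise ValueError(f"unknown kind: {kind}")


print(format_point(45.15, -93.85))       # geodetic, latitude first
print(format_point(3, 4, "rpt_plain"))   # non-geodetic RPT
```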

See the section SpatialRecursivePrefixTreeFieldType below for RPT configuration specifics.

Spatial Filters

There are 2 types of Spatial filters, which both support the following parameters:

Parameter

Description

d

the radial distance, usually in kilometers. (RPT & BBoxField can set other units via the setting distanceUnits)

pt

the center point using the format "lat,lon" if latitude & longitude. Otherwise, "x,y" for PointType or "x y" for RPT field types.

sfield

a spatial indexed field

score

(Advanced option; RPT and BBoxField field types only) If the query is used in a scoring context (e.g. as the main query in q), this local-param determines what scores will be produced. Valid values are:

  • none - A fixed score of 1.0. (the default)
  • kilometers - distance in kilometers between the field value and the specified center point
  • miles - distance in miles between the field value and the specified center point
  • degrees - distance in degrees between the field value and the specified center point
  • distance - distance between the field value and the specified center point in the distanceUnits configured for this field
  • recipDistance - 1 / the distance

When used with BBoxField, additional options are supported:

  • overlapRatio - The relative overlap between the indexed shape & query shape.
  • area - haversine based area of the overlapping shapes expressed in terms of the distanceUnits configured for this field
  • area2D - cartesian coordinates based area of the overlapping shapes expressed in terms of the distanceUnits configured for this field

filter

(Advanced option; RPT and BBoxField field types only) If you only want the query to score (with the above score local-param), not filter, then set this local-param to false.
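
To make the score and filter local-params more concrete, here is a small Python sketch that assembles a local-params prefix for use as the main query. The helper name local_params and the field name geo_rpt (standing in for some RPT field) are assumptions for illustration only.

```python
def local_params(parser, **params):
    """Build a Solr local-params prefix such as {!geofilt sfield=store d=5}.

    Values are inserted as-is, so a value containing spaces would need
    quoting by the caller.
    """
    parts = [f"{key}={value}" for key, value in params.items()]
    return "{!" + " ".join([parser] + parts) + "}"


# Score by distance without filtering; "geo_rpt" is a hypothetical RPT field.
q = local_params("geofilt", sfield="geo_rpt", pt="45.15,-93.85", d=50,
                 score="distance", filter="false")
print(q)
```

The resulting string would be sent as the q parameter, so that the distance becomes the document score.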

geofilt

The geofilt filter allows you to retrieve results based on the geospatial distance (AKA the "great circle distance") from a given point. Another way of looking at it is that it creates a circular shape filter.  For example, to find all documents within five kilometers of a given lat/lon point, you could enter &q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5. This filter returns all results within a circle of the given radius around the initial point.

bbox

The bbox filter is very similar to geofilt except it uses the bounding box of the calculated circle. See the blue box in the diagram below. It takes the same parameters as geofilt. Here's a sample query: &q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5. The rectangular shape is faster to compute and so it's sometimes used as an alternative to geofilt when it's acceptable to return points outside of the radius. However, if the ideal goal is a circle but you want it to run faster, then instead consider using the RPT field and try a large "distErrPct" value like 0.1 (10% radius). This will return results outside the radius but it will do so somewhat uniformly around the shape.
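
The relationship between the circle and its bounding box can be sketched in Python as follows. This uses a common textbook formula with an assumed Earth radius of 6371 km; it is not taken from Solr's source, and it only holds when the circle stays clear of the poles and the dateline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def circle_bounding_box(lat, lon, d_km):
    """Return (min_lat, min_lon, max_lat, max_lon) enclosing the circle.

    Valid only when the circle stays clear of the poles and the dateline.
    """
    angular = d_km / EARTH_RADIUS_KM  # circle radius in radians
    d_lat = math.degrees(angular)
    # Longitude lines converge toward the poles, so the box is wider in degrees.
    d_lon = math.degrees(
        math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return (lat - d_lat, lon - d_lon, lat + d_lat, lon + d_lon)


box = circle_bounding_box(45.15, -93.85, 5)
print(box)
```

The corners of such a box lie farther from the center than d (roughly 1.4 times d for a small circle), which is why bbox can return points outside the radius.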

When a bounding box includes a pole, the bounding box ends up being a "bounding bowl" (a spherical cap) that includes all values north of the lowest latitude of the circle if it touches the north pole (or south of the highest latitude if it touches the south pole).

Filtering by an arbitrary rectangle

Sometimes the spatial search requirement calls for finding everything in a rectangular area, such as the area covered by a map the user is looking at.  For this case, geofilt and bbox won't cut it.  This is somewhat of a trick, but you can use Solr's range query syntax for this by supplying the lower-left corner as the start of the range and the upper-right corner as the end of the range.  Here's an example:  &q=*:*&fq=store:[45,-94 TO 46,-93].  LatLonType does not support rectangles that cross the dateline, but RPT does.  If you are using RPT with non-geospatial coordinates (geo="false") then you must quote the points due to the space, e.g. "x y".
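
A short Python sketch of building that range filter from two corners follows. The helper name rectangle_filter is hypothetical, and the corners are given as (lat, lon) tuples.

```python
def rectangle_filter(field, lower_left, upper_right):
    """Build a range-style filter from (lat, lon) corner tuples."""
    (min_lat, min_lon), (max_lat, max_lon) = lower_left, upper_right
    return f"{field}:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"


fq = rectangle_filter("store", (45, -94), (46, -93))
print(fq)
```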

Optimization: Solr Post Filtering

Most likely, the fastest spatial filter is simply the RPT field type.  However, LatLonType with Solr post filtering can sometimes be faster, namely when the spatial query isn't worth caching and few documents match the non-spatial parts of the request (e.g. keyword queries and other filters).  To use Solr post filtering with LatLonType, use the bbox or geofilt query parsers in a filter query but specify cache=false and cost=100 (or greater) as local-params. Here's a short example:

&q=...mykeywords...&fq=...someotherfilters...&fq={!geofilt cache=false cost=100}&sfield=store&pt=45.15,-93.85&d=5
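
When a request like this is sent from client code, the braces, spaces, and repeated fq parameters need URL encoding. The Python sketch below shows one way to do that with the standard library; the keyword query and the inStock filter are placeholders invented for the example.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (not a dict) keeps both fq parameters and their order.
params = [
    ("q", "ipod"),                               # hypothetical keyword query
    ("fq", "inStock:true"),                      # hypothetical non-spatial filter
    ("fq", "{!geofilt cache=false cost=100}"),   # spatial post filter
    ("sfield", "store"),
    ("pt", "45.15,-93.85"),
    ("d", "5"),
]
query_string = urlencode(params)
print(query_string)

# Round-trip check: decoding recovers both filter queries in order.
decoded = parse_qs(query_string)
print(decoded["fq"])
```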

Distance Function Queries

There are four distance function queries: geodist, see below, usually the most appropriate; dist, to calculate the p-norm distance between multi-dimensional vectors; hsin, to calculate the distance between two points on a sphere; and sqedist, to calculate the squared Euclidean distance between two points. For more information about these function queries, see the section on Function Queries.
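
For intuition about what the last three functions compute, here is a Python sketch of the underlying math. The Python signatures are simplified for illustration and do not mirror the argument lists of the Solr functions.

```python
import math


def dist(power, a, b):
    """p-norm distance between two equal-length vectors (power >= 1)."""
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


def sqedist(a, b):
    """Squared Euclidean distance: the 2-norm without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hsin(radius, lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance; coordinates in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_phi = p2 - p1
    d_lam = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(d_lam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


print(dist(2, (0, 0), (3, 4)), sqedist((0, 0), (3, 4)))
print(hsin(6371.0, 0, 0, 0, 90))  # a quarter of the equator, in km
```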

geodist

geodist is a distance function that takes three optional parameters: (sfield,latitude,longitude). You can use the geodist function to sort results by distance or to return the distance as the document score.

For example, to sort your results by ascending distance, enter ...&q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist asc.

To return the distance as the document score, enter ...&q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc.

More Examples

Here are a few more useful examples of what you can do with spatial search in Solr.

Use as a Sub-Query to Expand Search Results

Here we will query for results in Jacksonville, Florida, or within 50 kilometers of 45.15,-93.85 (near Buffalo, Minnesota):

&q=*:*&fq=(state:"FL" AND city:"Jacksonville") OR {!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist()+asc

Facet by Distance

To facet by distance, you can use the Frange query parser:

&q=*:*&sfield=store&pt=45.15,-93.85&facet.query={!frange l=0 u=5}geodist()&facet.query={!frange l=5.001 u=3000}geodist()

There are other ways to do it too, like using a {!geofilt} in each facet.query.
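
If there are many distance rings, the facet.query values can be generated rather than typed by hand. The Python sketch below follows the same convention as the example above, starting each ring 0.001 past the previous upper bound; the function name is hypothetical.

```python
RING_GAP = 0.001  # offset that keeps adjacent rings from sharing a boundary


def distance_facet_queries(upper_bounds):
    """Turn sorted ring boundaries (in km) into frange facet.query values."""
    queries = []
    lower = 0
    for upper in upper_bounds:
        queries.append(f"{{!frange l={lower} u={upper}}}geodist()")
        lower = round(upper + RING_GAP, 3)  # start just past this ring
    return queries


for fq in distance_facet_queries([5, 3000]):
    print("facet.query=" + fq)
```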

Boost Nearest Results

Using the DisMax or Extended DisMax query parser, you can combine spatial search with the boost function to boost the nearest results:

&q.alt=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&bf=recip(geodist(),2,200,20)&sort=score desc
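
To see how the boost in this example falls off with distance, recall that Solr's recip(x,m,a,b) function computes a / (m*x + b). The Python sketch below evaluates it for a few sample distances in kilometers (the sample distances are arbitrary).

```python
def recip(x, m, a, b):
    """Reciprocal function as used by recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


# Boost contributed by recip(geodist(),2,200,20) at various distances.
for km in (0, 10, 50):
    print(km, recip(km, 2, 200, 20))
```

The boost is largest at distance zero and shrinks steadily as geodist() grows, so nearer documents score higher.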

SpatialRecursivePrefixTreeFieldType (abbreviated as RPT)

This field type offers several functional improvements over LatLonType:

  • Query by polygons and other complex shapes, in addition to circles & rectangles
  • Multi-valued indexed fields
  • Ability to index non-point shapes (e.g. polygons) as well as point shapes
  • Rectangles with user-specified corners that can cross the dateline
  • Multi-value distance sort and score boosting (warning: non-optimized)
  • Well-Known-Text (WKT) shape syntax (required for specifying polygons & other complex shapes)

RPT incorporates the basic features of LatLonType and PointType, such as lat-lon bounding boxes and circles, in addition to supporting geofilt, bbox, geodist, and range queries.

Schema configuration

To use RPT, the field type must be registered and configured in schema.xml. There are many options for this field type.  

Setting

Description

name

The name of the field type.

class

This should be solr.SpatialRecursivePrefixTreeFieldType. But be aware that the Lucene spatial module includes some other so-called "spatial strategies" other than RPT, notably TermQueryPT*, BBox, PointVector*, and SerializedDV. Solr requires a field type to parallel these in order to use them. The asterisked ones have them.

spatialContextFactory

If polygons or linestrings are required, then JTS Topology Suite is needed to implement them. It's a JAR file that you need to put on Solr's classpath (but not via the standard solrconfig.xml mechanisms). If you intend to use those shapes, set this attribute to com.spatial4j.core.context.jts.JtsSpatialContextFactory. Furthermore, the context factory has its own options which are directly configurable on the Solr field type here; follow the link to the Javadocs, and remember to look at the superclass's options in SpatialContextFactory as well. One option in particular you should most likely enable is autoIndex (i.e. use PreparedGeometry) as it's been shown to be a major performance boost for polygons. Further details about specifying polygons to index or query are at Solr's Wiki linked below.

geo

If true, the default, latitude and longitude coordinates will be used and the mathematical model will generally be a sphere. If false, the coordinates will be generic X & Y on a 2D plane using Euclidean/Cartesian geometry.

distanceUnits

Specifies the units for distance measurements wherever this field is used. This can be degrees, kilometers or miles. It is applied to nearly all distance measurements involving the field: maxDistErr, distErr, d, geodist and the score when score is distance, area, or area2D. However, it doesn't affect distances embedded in WKT strings (e.g. "BUFFER(POINT(200 10),0.2)"), which are still in degrees.

distanceUnits defaults to either "kilometers" if geo is "true", or "degrees" if geo is "false".

distanceUnits replaces the units attribute, which is now deprecated and mutually exclusive with this attribute.

distErrPct

Defines the default precision of non-point shapes (both index & query), as a fraction between 0.0 (fully precise) and 0.5. The closer this number is to zero, the more accurate the shape will be. However, more precise indexed shapes use more disk space and take longer to index. Bigger distErrPct values will make queries faster but less accurate.

maxDistErr

Defines the highest level of detail required for indexed data. If left blank, the default is one meter – just a bit less than 0.000009 degrees. This setting is used internally to compute an appropriate maxLevels (see below).

worldBounds

Defines the valid numerical ranges for x and y, in the format of ENVELOPE(minX, maxX, maxY, minY). If geo="true", the standard lat-lon world boundaries are assumed. If geo=false, you should define your boundaries.

distCalculator

Defines the distance calculation algorithm. If geo=true, "haversine" is the default. If geo=false, "cartesian" will be the default. Other possible values are "lawOfCosines", "vincentySphere" and "cartesian^2".

prefixTree

Defines the spatial grid implementation. Since a PrefixTree (such as RecursivePrefixTree) maps the world as a grid, each grid cell is decomposed to another set of grid cells at the next level. If geo=true then the default prefix tree is "geohash", otherwise it's "quad". Geohash has 32 children at each level, quad has 4. Geohash can only be used for geo=true as it's strictly geospatial.

maxLevels

Sets the maximum grid depth for indexed data. It's usually more intuitive to specify maxDistErr instead and let an appropriate maxLevels be computed from it.
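
The relationship between maxDistErr and maxLevels can be illustrated for the "quad" prefix tree, where each level halves the cell width. The Python sketch below is a simplified model that assumes a 360-degree-wide world and looks at cell width only; the actual computation inside Solr may differ in its details.

```python
def quad_levels_for(max_dist_err_deg, world_width_deg=360.0):
    """Smallest quad-tree depth whose cell width is <= max_dist_err_deg."""
    level = 1
    cell_width = world_width_deg / 2  # level 1 splits the world in two per axis
    while cell_width > max_dist_err_deg:
        cell_width /= 2               # each further level halves the cell width
        level += 1
    return level


print(quad_levels_for(0.000009))  # roughly one meter expressed in degrees
```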

Once the field type has been defined, define a field that uses it.

Because RPT has more advanced features, some of which are new and experimental, please review the Solr Wiki at http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 for more information about using this field type.

BBoxField

The BBoxField field type indexes a single rectangle (bounding box) per document field and supports searching via a bounding box.  It supports most spatial search predicates and has enhanced relevancy modes based on the overlap or area between the search rectangle and the indexed rectangle.  It's particularly useful for its relevancy modes.  To configure it in the schema, use a configuration like this:

BBoxField is actually based off of 4 instances of another field type referred to by numberType.  It also uses a boolean to flag a dateline cross.  Assuming you want to use the relevancy feature, docValues is required.  Some of the attributes are in common with the RPT field like geo, units, worldBounds, and spatialContextFactory because they share some of the same spatial infrastructure.

To index a box, add a field value to a bbox field that's a string in the WKT/CQL ENVELOPE syntax.  Example: ENVELOPE(-10, 20, 15, 10) which is minX, maxX, maxY, minY order.  The parameter ordering is unintuitive but that's what the spec calls for.
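
Because the ENVELOPE argument order is easy to get wrong, a tiny helper can take the coordinates in a more natural order and emit them in the order the syntax requires. This Python sketch uses a hypothetical function name and reproduces the example above.

```python
def envelope(min_x, max_x, min_y, max_y):
    """Format a box as WKT/CQL ENVELOPE(minX, maxX, maxY, minY)."""
    if min_y > max_y:
        raise ValueError("min_y must not exceed max_y")
    # Note the swap: the syntax wants maxY before minY.
    return f"ENVELOPE({min_x}, {max_x}, {max_y}, {min_y})"


print(envelope(min_x=-10, max_x=20, min_y=10, max_y=15))
```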

To search, you can use the {!bbox} query parser, or the range syntax e.g. [10,-10 TO 15,20], or the ENVELOPE syntax wrapped in parentheses with a leading search predicate.  The latter is the only way to choose a predicate other than Intersects.  For example: 

&q={!field f=bbox}Contains(ENVELOPE(-10, 20, 15, 10))

Now to sort the results by one of the relevancy modes, use it like this:

&q={!field f=bbox score=overlapRatio}Intersects(ENVELOPE(-10, 20, 15, 10))

The score local-param can be one of overlapRatio, area, and area2D.  area scores by the document area using surface-of-a-sphere (assuming geo=true) math, area2D uses simple width * height.  overlapRatio computes a [0-1] ranged score based on how much overlap exists relative to the document's area and the query area.  The javadocs of BBoxOverlapRatioValueSource have more info on the formula, if you're really curious.  There is an additional parameter queryTargetProportion that allows you to weight the query side of the formula relative to the index (target) side of the formula.  You can also use &debug=results to see useful score computation info.
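
One plausible reading of overlapRatio, sketched in Python for flat (area2D-style) rectangles: take the intersection area as a fraction of the query area and as a fraction of the indexed (target) area, then blend the two with queryTargetProportion. The weighted-sum form and the function below are assumptions for illustration; the javadocs of BBoxOverlapRatioValueSource remain the authority on the real formula.

```python
def overlap_ratio(query, target, query_target_proportion):
    """Blend intersection/query-area and intersection/target-area ratios.

    Boxes are (min_x, min_y, max_x, max_y) on a flat plane with non-zero
    area. The weighted-sum form is an assumption made for this sketch.
    """
    width = min(query[2], target[2]) - max(query[0], target[0])
    height = min(query[3], target[3]) - max(query[1], target[1])
    if width <= 0 or height <= 0:
        return 0.0  # boxes do not overlap

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    intersection = width * height
    query_ratio = intersection / area(query)
    target_ratio = intersection / area(target)
    p = query_target_proportion
    return p * query_ratio + (1 - p) * target_ratio


# Indexed box covers the left half of the query box.
print(overlap_ratio((0, 0, 10, 10), (0, 0, 5, 10), 0.25))
```

Under this reading, a higher queryTargetProportion rewards indexed boxes that cover more of the query, while a lower one rewards indexed boxes that lie mostly inside it.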

 

  1. SpatialRecursivePrefixTreeFieldType requires JTS. We should highlight this on the page and link to the page describing how to add jars to your Solr installation.

  2. No it doesn't.  It's only required if you go out of your way to configure the spatialContextFactory to the JTS one.  Read the docs above.

    1. Ah okay, a user was asking on IRC about errors thrown by the core on startup. I guess he copy/pasted the schema field from somewhere but didn't add the JTS jars. My bad.

  3. In "Spatial Filters" section, please add this parameter after "d", "sfield", "pt":

    score - The score parameter can be used with values "kilometer", "miles" or "degrees" for distances to be returned as the score in these units. With score as "distance", "area" or "area2d" (the latter two used together with bbox parser), the distances or the overlapped areas of the results are returned as the scores in the units specified in the 'distanceUnits' parameter of the spatial field.

     

    In schema configuration, please add:

    distanceUnits - This is used to specify the units for distance measurements used throughout the use of this field. This can be degrees, kilometers or miles. It is applied to nearly all distance measurements involving the field: maxDistErr, distErr, d, geodist and the score when score is 'distance', 'area', or 'area2d'. However, it doesn't affect distances embedded in WKT strings, like BUFFER(POINT(200 10),0.2), which are still in degrees. If 'distanceUnits' is not specified, it is set to "kilometers" if geo is "true".

     

    This is deprecated, maybe doc'd as follows:

    units - This is deprecated and currently can only be "degrees". Spatial fields originating from Solr 4 (e.g. SpatialRecursivePrefixTreeFieldType, BBoxField) have the 'units' attribute deprecated, now replaced with 'distanceUnits'.  If you keep units as "degrees" then it should be backwards compatible but you'll get a deprecation warning on startup. Both "units" and 'distanceUnits' must not be specified at once.

    1. I followed up with ishan about this a bit on IRC, and took a stab at some of these edits with a little bit of tweaking – notably I beefed up the details about "score" and pared down the "deprecation" details for "units" (if you're already using it, all you need to know is you can't also use distanceUnits; if you aren't already using it, good.)

      Very possible I missed something or screwed something up, so more sanity checking & copy editing would be helpful.

       

      1. Thanks Hoss, it looks good. Few more changes, please:

        1. The example field type definition for RPT and BBox should use distanceUnits=kilometers instead of units=degrees:

        <fieldType name="location_rpt"   class="solr.SpatialRecursivePrefixTreeFieldType"
                       spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
                       autoIndex="true"
                       distErrPct="0.025"
                       maxDistErr="0.001"
                       distanceUnits="kilometers" />
        and
        <field name="bbox" type="bbox" />
        <fieldType name="bbox" class="solr.BBoxField"
                geo="true" distanceUnits="kilometers" numberType="_bbox_coord" />
        <fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>
        2. For the score param, "overlapRatio" could be added after area and area2d:

        "overlapRatio" - the ratio of overlap between the query box and result shapes.

        3. Another param "filter" got missed out, could be nice to document it before the "score" param, and after the "d" and "sfield":

        filter - Although designed to be filters, the filtering behavior can be avoided by setting this parameter to "false", for cases where only the scores are needed. In such cases, the "d" parameter has no effect. If not specified, this is "true".

        Then the change to score:
        score - Although designed to be filters, when used in scoring contexts this parameter determines what scores will be produced. Valid values are:

         

        4. score param should be "area2D" instead of "area2d". Sorry for the oversight.

  4. Hi, in the documentation for 'prefixTree'  in the schema config table above, it says this:

    ==

    If geo=false then the default prefix tree is "geohash", otherwise it's "quad". Geohash has 32 children at each level, quad has 4. Geohash cannot be used for geo=false as it's strictly geospatial.

    ===

    Should geo=true when prefixTree=geohash?