...
Lucene has four underlying types that a docvalues field can have. Currently Solr uses three of these:
- NUMERIC: a single-valued per-document numeric type. This is like having a large long[] array for the whole index, though the data is compressed based upon the values that are actually used.
- For example, consider 3 documents with these values:

      doc[0] = 1005
      doc[1] = 1006
      doc[2] = 1005

In this example the field would use around 1 bit per document: the two distinct values differ by only 1, so a single bit is enough to tell them apart.
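
One way to picture this compression is to store each value as an offset from the smallest value in the field. The sketch below is a simplified illustration under that assumption, not Lucene's actual encoding; the class and method names (`NumericDeltaSketch`, `bitsPerValue`, `deltas`) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration only: this is NOT Lucene's codec, just the
// "store an offset from the minimum" idea applied to the example values.
final class NumericDeltaSketch {

    /** Bits needed per document if each value is stored as (value - min). */
    static int bitsPerValue(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        long min = Arrays.stream(values).min().getAsLong();
        long max = Arrays.stream(values).max().getAsLong();
        long range = max - min; // largest offset that has to be representable
        // Bit length of range; 0 when every document holds the same value.
        return 64 - Long.numberOfLeadingZeros(range);
    }

    /** The per-document offsets that would actually be bit-packed. */
    static long[] deltas(long[] values) {
        long min = Arrays.stream(values).min().getAsLong();
        return Arrays.stream(values).map(v -> v - min).toArray();
    }

    public static void main(String[] args) {
        long[] docs = {1005, 1006, 1005};
        System.out.println(bitsPerValue(docs));            // prints 1
        System.out.println(Arrays.toString(deltas(docs))); // prints [0, 1, 0]
    }
}
```

With the three example documents the offsets are 0, 1 and 0, which is why roughly one bit per document suffices.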
- SORTED: a single-valued per-document string type. This is like having a large String[] array for the whole index, but with an additional level of indirection. Each unique value is assigned a term number that represents its ordinal value. So each document really stores a compressed integer, and separately there is a "dictionary" mapping these term numbers back to term values.
- For example, consider 3 documents with these values:

      doc[0] = "aardvark"
      doc[1] = "beaver"
      doc[2] = "aardvark"

Value "aardvark" will be assigned ordinal 0, and "beaver" 1, creating these two data structures:

    doc[0] = 0
    doc[1] = 1
    doc[2] = 0

    term[0] = "aardvark"
    term[1] = "beaver"

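
The ordinal assignment for SORTED can be sketched in a few lines of plain Java. This is only an illustration of the two structures, not Lucene's implementation: the class `SortedOrdinalsSketch` and its fields are hypothetical, it assumes every document has a value, and it sorts with Java's natural `String` order, which may differ from the byte order Lucene uses for some non-ASCII terms.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical sketch of the SORTED layout: a sorted term dictionary plus
// one ordinal per document. Assumes every document has a non-null value.
final class SortedOrdinalsSketch {
    final String[] terms;  // term[ord] -> value, in sorted order
    final int[] docToOrd;  // doc[i]    -> ordinal into terms

    SortedOrdinalsSketch(String[] docValues) {
        // TreeSet removes duplicates and keeps the unique values sorted.
        terms = new TreeSet<>(Arrays.asList(docValues)).toArray(new String[0]);
        docToOrd = new int[docValues.length];
        for (int i = 0; i < docValues.length; i++) {
            // The position in the sorted dictionary is the ordinal.
            docToOrd[i] = Arrays.binarySearch(terms, docValues[i]);
        }
    }

    /** Follows the indirection: document -> ordinal -> term value. */
    String value(int doc) {
        return terms[docToOrd[doc]];
    }

    public static void main(String[] args) {
        SortedOrdinalsSketch s =
            new SortedOrdinalsSketch(new String[] {"aardvark", "beaver", "aardvark"});
        System.out.println(Arrays.toString(s.docToOrd)); // prints [0, 1, 0]
        System.out.println(Arrays.toString(s.terms));    // prints [aardvark, beaver]
    }
}
```

Because ordinals follow the sorted order of the terms, two documents can be compared by ordinal alone, without looking up the strings.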
- SORTED_SET: a multi-valued per-document string type. It is similar to SORTED, except each document has a "set" of values (in increasing sorted order). So it intentionally discards duplicate values (frequency) within a document and loses order within the document.
- For example, consider 3 documents with these values:

      doc[0] = "cat", "aardvark", "beaver", "aardvark"
      doc[1] =
      doc[2] = "cat"

Value "aardvark" will be assigned ordinal 0, "beaver" 1, and "cat" 2, creating these two data structures:

    doc[0] = [0, 1, 2]
    doc[1] = []
    doc[2] = [2]

    term[0] = "aardvark"
    term[1] = "beaver"
    term[2] = "cat"

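
The same idea extends to SORTED_SET by keeping a sorted, de-duplicated list of ordinals per document. The following is a rough sketch rather than Lucene's implementation: `SortedSetOrdinalsSketch` and its fields are hypothetical names, and Java's natural `String` order stands in for the term order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the SORTED_SET layout: one shared sorted dictionary
// and, per document, an increasing list of ordinals without duplicates.
final class SortedSetOrdinalsSketch {
    final String[] terms;     // term[ord] -> value, in sorted order
    final int[][] docToOrds;  // doc[i]    -> sorted, de-duplicated ordinals

    SortedSetOrdinalsSketch(List<List<String>> docValues) {
        TreeSet<String> dictionary = new TreeSet<>();
        for (List<String> values : docValues) {
            dictionary.addAll(values);
        }
        terms = dictionary.toArray(new String[0]);

        docToOrds = new int[docValues.size()][];
        for (int i = 0; i < docValues.size(); i++) {
            // A per-document TreeSet drops duplicates and the original order.
            TreeSet<String> unique = new TreeSet<>(docValues.get(i));
            int[] ords = new int[unique.size()];
            int n = 0;
            for (String value : unique) {
                ords[n++] = Arrays.binarySearch(terms, value);
            }
            docToOrds[i] = ords;
        }
    }

    public static void main(String[] args) {
        SortedSetOrdinalsSketch s = new SortedSetOrdinalsSketch(Arrays.asList(
            Arrays.asList("cat", "aardvark", "beaver", "aardvark"),
            Collections.<String>emptyList(),
            Arrays.asList("cat")));
        System.out.println(Arrays.deepToString(s.docToOrds)); // prints [[0, 1, 2], [], [2]]
    }
}
```

Note how the repeated "aardvark" in the first document contributes a single ordinal, and the document with no values maps to an empty list.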
- BINARY: a single-valued per-document byte[] array. This can be used for encoding custom per-document data structures.
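
As a rough idea of what a custom per-document structure could look like, the sketch below packs two doubles into one byte[] and reads them back. The layout (a latitude followed by a longitude) and the names `BinaryPayloadSketch`, `encode` and `decode` are purely hypothetical; the actual encoding is up to the application.

```java
import java.nio.ByteBuffer;

// Hypothetical per-document payload: two doubles packed into 16 bytes.
// The layout is an assumption made for illustration, not a Lucene format.
final class BinaryPayloadSketch {

    /** Packs the pair into a single byte[] suitable for a binary field. */
    static byte[] encode(double lat, double lon) {
        return ByteBuffer.allocate(2 * Double.BYTES)
                .putDouble(lat)
                .putDouble(lon)
                .array();
    }

    /** Reads the pair back in the same order it was written. */
    static double[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new double[] {buf.getDouble(), buf.getDouble()};
    }

    public static void main(String[] args) {
        byte[] payload = encode(48.85, 2.35);
        double[] pair = decode(payload);
        System.out.println(payload.length + " bytes: " + pair[0] + ", " + pair[1]);
    }
}
```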