

Overview

Currently, the gfsh lucene query uses the default StringQueryProvider, which supports the following syntax:

 
gfsh> search lucene --name=personIndex --region=/Person --queryString=john* --defaultField=name


StringQueryProvider cannot express a complex search that combines several conditions, especially when they are mixed with a numeric range search. For example:

Search for a person whose name must contain "john" and whose salary might be within 750000 to 80000.

To fulfill this requirement, we will pass a points config map, with an entry for each numeric field, to the built-in StringQueryProvider.

Approach

Lucene's StandardQueryParser can parse this syntax (see the "Gfsh command line syntax" section for details) if it is told which fields are numeric.

 

The numeric fields could be Integer, Float, or Double. 

To do that, a PointsConfigMap, which maps each field name to its numeric type, must be set on the parser.

Our index contains the list of indexed fields, and the meta-info about each field's type is saved by the serializer. We read this field-to-type mapping, create the PointsConfigMap from it, and set it on the parser.
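The field-to-type mapping step can be sketched in plain Java. This is only an illustration under stated assumptions: the NumericFieldTypes helper, the Person class, and its rating field are hypothetical, and the sketch uses reflection on the value class instead of the serializer meta-info that the proposal intends to use. The resulting map holds the name and type pairs from which PointsConfigMap entries could be built.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Geode or Lucene. It only illustrates
// collecting a field-name -> numeric-type mapping for the indexed fields.
final class NumericFieldTypes {

  // Hypothetical value class loosely mirroring the Person entries in the gfsh examples.
  static final class Person {
    String name;
    String email;
    int revenue;
    double rating; // assumed extra field, present only to show a second numeric type
  }

  /** Returns indexed field name -> boxed numeric type; non-numeric and unknown fields are skipped. */
  static Map<String, Class<? extends Number>> collect(Class<?> valueClass, String... indexedFields) {
    Map<String, Class<? extends Number>> result = new LinkedHashMap<>();
    for (String fieldName : indexedFields) {
      Field field;
      try {
        field = valueClass.getDeclaredField(fieldName);
      } catch (NoSuchFieldException e) {
        continue; // the class does not declare this field, so there is nothing to record
      }
      Class<?> type = field.getType();
      // Only the three numeric types named in this proposal are recognized.
      if (type == int.class || type == Integer.class) {
        result.put(fieldName, Integer.class);
      } else if (type == float.class || type == Float.class) {
        result.put(fieldName, Float.class);
      } else if (type == double.class || type == Double.class) {
        result.put(fieldName, Double.class);
      }
    }
    return result;
  }
}
```

Calling NumericFieldTypes.collect(NumericFieldTypes.Person.class, "name", "revenue") would yield a map with a single entry for revenue, since name is a String field and is therefore left to the normal text parsing.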

Challenge

  • The meta-info is saved inside each serializer, and there is no generic interface to read it. The code needs to be refactored.
  • HeterogeneousLuceneSerializer needs to be specified explicitly as the default serializer; it saves the meta-info in its private mappers.
  • FlatFormatSerializer is implemented in a different way. It has no mappers data structure.
  • Pdx data types are parsed by pdxMapper; how to get the meta-info from it is still unknown.

Gfsh command line syntax

 
# find an exact match for a numeric field
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1


# use 2 SHOULD conditions, which is equivalent to "A OR B"
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000 revenue=764000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1


# use 2 MUST conditions, which is equivalent to "A AND B". Lucene recognizes "+" as MUST
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>763000 +revenue<766000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1


# >=, <= are valid syntax for inclusive condition
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>=763000 +revenue<=766000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key766 | Person{name='Tom766 Zhou', email='tzhou766@example.com', revenue=766000, homepage='Page{id=766, c.. | 1
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# Lucene recognizes "-" as NOT. A NOT condition removes the entries it matches from the results.
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue<2000 revenue>9997000 -name=Tom9998*" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key0   | Person{name='Tom0 Zhou', email='tzhou0@example.com', revenue=0, homepage='Page{id=0, content="Hel.. | 1
key1   | Person{name='Tom1 Zhou', email='tzhou1@example.com', revenue=1000, homepage='Page{id=1, content=".. | 1
key9999| Person{name='Tom9999 Zhou', email='tzhou9999@example.com', revenue=9999000, homepage='Page{id=999.. | 1
Note: name=Tom9998* is equivalent to name:Tom9998* and to Tom9998* (when defaultField is name)
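The way SHOULD, MUST ("+") and NOT ("-") conditions combine in the examples above can be modeled with a small, self-contained sketch. It is an illustrative model only: ClauseModel and its members are hypothetical names, it works on a single integer revenue value, and it does not call Lucene or Geode. The rule it imitates is that an entry matches when every MUST condition holds, no NOT condition holds, and, if there is no MUST condition at all, at least one SHOULD condition holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of the clause semantics; it is not the Lucene implementation.
final class ClauseModel {
  enum Occur { SHOULD, MUST, MUST_NOT }

  static final class Clause {
    final Occur occur;
    final IntPredicate condition;

    Clause(Occur occur, IntPredicate condition) {
      this.occur = occur;
      this.condition = condition;
    }
  }

  /** True when the revenue value satisfies the clause list under the rule described above. */
  static boolean matches(int revenue, List<Clause> clauses) {
    boolean hasMust = false;
    boolean anyShouldHit = false;
    for (Clause clause : clauses) {
      boolean hit = clause.condition.test(revenue);
      if (clause.occur == Occur.MUST) {
        hasMust = true;
        if (!hit) return false; // a failed MUST rejects the entry
      } else if (clause.occur == Occur.MUST_NOT) {
        if (hit) return false; // a satisfied NOT rejects the entry
      } else {
        anyShouldHit |= hit;
      }
    }
    // Without any MUST clause, at least one SHOULD clause has to match.
    return hasMust || anyShouldHit;
  }

  /** Keeps the revenue values that match, preserving input order. */
  static List<Integer> select(List<Integer> revenues, List<Clause> clauses) {
    List<Integer> selected = new ArrayList<>();
    for (int revenue : revenues) {
      if (matches(revenue, clauses)) selected.add(revenue);
    }
    return selected;
  }
}
```

With revenues 763000, 764000, 765000 and 766000, two MUST clauses for revenue > 763000 and revenue < 766000 select 764000 and 765000, which corresponds to the "+revenue>763000 +revenue<766000" query above. Two SHOULD clauses for revenue = 763000 and revenue = 764000 select those two values, as in the "A OR B" example.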

Java API

There is no change to the Java API.

