Overview
Currently, the gfsh lucene query uses the default StringQueryProvider, which supports syntax such as:
gfsh> search lucene --name=personIndex --region=/Person --queryString=john* --defaultField=name
StringQueryProvider cannot express complex searches that combine several conditions, especially when they are mixed with numeric range searches. For example:
search for a person whose name must contain "john" and whose salary may fall within 750000 to 800000.
To meet this requirement, we will pass a points config map, with one entry per numeric field, to the built-in StringQueryProvider.
Approach
Lucene's StandardQueryParser can parse this syntax (see the chapter "Gfsh command line syntax" for details) once it is told which fields are numeric.
The numeric fields can be Integer, Long, Float, or Double.
To do that, the parser must be given a PointsConfigMap that maps each numeric field name to its type.
Our index holds the list of indexed fields, and the meta-info about each field's type is also saved, inside the serializers. The plan is to read this field-to-type meta-info, build the PointsConfigMap from it, and set it on the parser.
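The mapping step can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Geode's implementation: the `Person` class, the `NumericFieldTypes` helper, and its `numericTypesOf` method are hypothetical names, and reflection stands in for whatever meta-info the serializer actually holds. The resulting map is the information one would translate into Lucene `PointsConfig` entries before calling `setPointsConfigMap` on the StandardQueryParser.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object; field names mirror the gfsh examples in this document.
class Person {
  String name;
  String email;
  int revenue;
  float revenue_float;
  double revenue_double;
  long revenue_long;
}

// Hypothetical helper: derives a field-name -> numeric-type mapping for indexed fields.
final class NumericFieldTypes {

  // Primitive and boxed numeric types, mapped to the boxed class used as the "type" of a field.
  private static final Map<Class<?>, Class<? extends Number>> NUMERIC = new LinkedHashMap<>();

  static {
    NUMERIC.put(int.class, Integer.class);
    NUMERIC.put(Integer.class, Integer.class);
    NUMERIC.put(long.class, Long.class);
    NUMERIC.put(Long.class, Long.class);
    NUMERIC.put(float.class, Float.class);
    NUMERIC.put(Float.class, Float.class);
    NUMERIC.put(double.class, Double.class);
    NUMERIC.put(Double.class, Double.class);
  }

  private NumericFieldTypes() {}

  // Returns only the indexed fields that are declared on 'type' and are numeric.
  static Map<String, Class<? extends Number>> numericTypesOf(
      Class<?> type, List<String> indexedFields) {
    Map<String, Class<? extends Number>> result = new LinkedHashMap<>();
    for (String fieldName : indexedFields) {
      try {
        Field field = type.getDeclaredField(fieldName);
        Class<? extends Number> numeric = NUMERIC.get(field.getType());
        if (numeric != null) {
          result.put(fieldName, numeric);
        }
      } catch (NoSuchFieldException e) {
        // The indexed field is not declared on this class; skip it.
      }
    }
    return result;
  }
}
```

Calling `NumericFieldTypes.numericTypesOf(Person.class, fields)` with the seven field names from the `create lucene index` example would keep only the four `revenue*` fields; string fields and undeclared fields such as `address` are left out, so the parser would treat them as ordinary text fields.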
Challenge
- The meta-info is stored inside each serializer, and there is no generic interface for reading it. The code needs to be refactored.
- HeterogeneousLuceneSerializer, which keeps the meta-info in its private mappers, must be explicitly specified as the default serializer.
- FlatFormatSerializer is implemented differently and has no mappers data structure.
- PDX data types are parsed by pdxMapper; how to get the meta-info from it is still unknown.
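One possible shape for the refactoring named in the first bullet is a small interface that every serializer implements to expose its field types. The following is a minimal sketch with hypothetical names (`FieldTypeSource`, `MapperBackedSerializer`, `MapperlessSerializer`, `FieldTypeMerger`); none of them exist in Geode, and how a PDX-based serializer would populate such a map remains the open question listed above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: a serializer-agnostic way to expose field-type meta-info.
interface FieldTypeSource {
  Map<String, Class<?>> fieldTypes();
}

// Hypothetical serializer that keeps its meta-info in a private map of mappers.
final class MapperBackedSerializer implements FieldTypeSource {
  private final Map<String, Class<?>> mappers = new LinkedHashMap<>();

  MapperBackedSerializer register(String field, Class<?> type) {
    mappers.put(field, type);
    return this;
  }

  @Override
  public Map<String, Class<?>> fieldTypes() {
    // Read-only view, so callers cannot modify the serializer's private state.
    return Collections.unmodifiableMap(mappers);
  }
}

// Hypothetical serializer without a mappers structure: it reports no field types.
final class MapperlessSerializer implements FieldTypeSource {
  @Override
  public Map<String, Class<?>> fieldTypes() {
    return Collections.emptyMap();
  }
}

final class FieldTypeMerger {
  private FieldTypeMerger() {}

  // Merges meta-info from several sources; the first source to report a field wins.
  static Map<String, Class<?>> merge(List<FieldTypeSource> sources) {
    Map<String, Class<?>> merged = new LinkedHashMap<>();
    for (FieldTypeSource source : sources) {
      for (Map.Entry<String, Class<?>> entry : source.fieldTypes().entrySet()) {
        merged.putIfAbsent(entry.getKey(), entry.getValue());
      }
    }
    return merged;
  }
}
```

With an interface of this kind, the code that builds the PointsConfigMap would depend only on `fieldTypes()` and would not need to know which serializer is configured. A serializer that cannot report types simply contributes nothing.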
Gfsh command line syntax
No change to gfsh is needed. The existing gfsh parameters already support the numeric query syntax.
# create index with 4 numeric fields
gfsh> create lucene index --name=personIndex --region=/Person --field=name,email,address,revenue,revenue_float,revenue_double,revenue_long

# find an exact match for a numeric field
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# use 2 SHOULD conditions, which is equivalent to "A OR B"
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000 revenue=764000" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1

# use 2 MUST conditions, which is equivalent to "A AND B". Lucene recognizes "+" as MUST
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>763000 +revenue<766000" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1

# >=, <= are valid syntax for inclusive conditions
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>=763000 +revenue<=766000" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key766 | Person{name='Tom766 Zhou', email='tzhou766@example.com', revenue=766000, homepage='Page{id=766, c.. | 1
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# Another way to specify a range query
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue:[763000 TO 766000]" --defaultField=name

# Query on float, double, long fields. All 4 numeric types (integer, float, double, long) are supported
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue_float:[763000.0 TO 766000.0]" --defaultField=name
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue_double:[763000 TO 766000]" --defaultField=name
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue_long:[763000 TO 766000]" --defaultField=name

# Combination query to return a subset
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue_long:[763000 TO 766000] +revenue_float:[762000 TO 765000]" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# Lucene recognizes "-" as NOT. One NOT condition will reduce results.
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue<2000 revenue>9997000 -name=Tom9998*" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key0   | Person{name='Tom0 Zhou', email='tzhou0@example.com', revenue=0, homepage='Page{id=0, content="Hel.. | 1
key1   | Person{name='Tom1 Zhou', email='tzhou1@example.com', revenue=1000, homepage='Page{id=1, content=".. | 1
key9999| Person{name='Tom9999 Zhou', email='tzhou9999@example.com', revenue=9999000, homepage='Page{id=999.. | 1
Note: name=Tom9998* is equivalent to name:Tom9998* and to Tom9998* (when defaultField is name)

# Query with a numeric field in a JSON object
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=400000" --defaultField=name
key    | value                                                                                               | score
------ | --------------------------------------------------------------------------------------------------- | -----
key400 | Person{name='Tom400 Zhou', email='tzhou400@example.com', revenue=400000, homepage='Page{id=400, c.. | 1
json1  | PDX[8776019,__GEMFIRE_JSON]{revenue=400000, address=PDX[16524384,__GEMFIRE_JSON]{city=New York, p.. | 1

# Query on a numeric field in a nested object
gfsh>search lucene --region=/Customer --name=customerIndex --queryString="+contacts.revenue:[763000 TO 766000] +revenue:[762000 TO 765000]" --defaultField=name
Note: Both conditions take effect and display 3 (not 4) Customer objects.
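The way SHOULD, MUST ("+") and NOT ("-") conditions combine in the queries above can be modelled in a few lines of plain Java. This is an illustrative sketch only: `ClauseModel`, `Occur`, `Clause`, `matches`, and `filter` are hypothetical names, not Lucene classes, and the model ignores scoring. It assumes the usual boolean-query rule that every MUST clause has to match, no NOT clause may match, and when there is no MUST clause at least one SHOULD clause has to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of clause occurrence; it does not use Lucene itself.
final class ClauseModel {

  enum Occur { SHOULD, MUST, MUST_NOT }

  static final class Clause {
    final Occur occur;
    final IntPredicate test;

    Clause(Occur occur, IntPredicate test) {
      this.occur = occur;
      this.test = test;
    }
  }

  private ClauseModel() {}

  // Decides whether a single revenue value satisfies the combined clauses.
  static boolean matches(int revenue, List<Clause> clauses) {
    boolean hasMust = false;
    boolean anyShouldHit = false;
    for (Clause clause : clauses) {
      boolean hit = clause.test.test(revenue);
      switch (clause.occur) {
        case MUST:
          hasMust = true;
          if (!hit) {
            return false; // a failed MUST clause rejects the value
          }
          break;
        case MUST_NOT:
          if (hit) {
            return false; // a matching NOT clause rejects the value
          }
          break;
        default: // SHOULD
          anyShouldHit |= hit;
          break;
      }
    }
    // Without MUST clauses, at least one SHOULD clause has to match.
    return hasMust || anyShouldHit;
  }

  // Keeps the revenue values that satisfy the clauses, in input order.
  static List<Integer> filter(List<Integer> revenues, List<Clause> clauses) {
    List<Integer> kept = new ArrayList<>();
    for (int revenue : revenues) {
      if (matches(revenue, clauses)) {
        kept.add(revenue);
      }
    }
    return kept;
  }
}
```

Under this model, two MUST clauses `r -> r > 763000` and `r -> r < 766000` applied to the revenues 762000 through 767000 keep 764000 and 765000, which agrees with the key764 and key765 rows returned for `+revenue>763000 +revenue<766000`.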
Java API
There is no change to the Java API.