...

Currently, the gfsh lucene query uses the default StringQueryProvider, which provides the following syntax:

...

To fulfill the requirement, we will specify a points config map for each numeric field in the built-in StringQueryProvider.

Approach


Lucene's StandardQueryParser can parse the numeric syntax (see the "Gfsh command line syntax" section for details) if it is told which fields are numeric.

 

The numeric fields could be Integer, Float, or Double. 

To do that, the parser must be given a PointsConfigMap that maps each numeric field name to its type.

Our index contains the list of indexed fields. It also saves the meta-info of each field's type (currently inside the serializers, see Challenge). We need to read that field-to-type mapping, build the PointsConfigMap from it, and set it on the parser.
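The mapping step described above can be sketched with the JDK alone. This is only an illustration under stated assumptions: `NumericFieldConfig` and `PointsConfigSketch` are hypothetical names, and the input map stands in for whatever field-type meta-info the index can actually provide. With Lucene on the classpath, each entry would presumably become a `PointsConfig(NumberFormat, Class)` and the resulting map would be passed to `StandardQueryParser.setPointsConfigMap`.

```java
import java.text.NumberFormat;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical holder for what a points config needs: a number format and a numeric type. */
final class NumericFieldConfig {
  final NumberFormat format;
  final Class<? extends Number> type;

  NumericFieldConfig(NumberFormat format, Class<? extends Number> type) {
    this.format = format;
    this.type = type;
  }
}

final class PointsConfigSketch {
  /**
   * Builds a field name -> numeric config map from field name -> Java type meta-info.
   * Only Integer, Float and Double fields are kept; all other fields are skipped.
   */
  static Map<String, NumericFieldConfig> numericConfigs(Map<String, Class<?>> fieldTypes) {
    Map<String, NumericFieldConfig> configs = new LinkedHashMap<>();
    for (Map.Entry<String, Class<?>> entry : fieldTypes.entrySet()) {
      Class<?> type = entry.getValue();
      if (type == Integer.class) {
        configs.put(entry.getKey(),
            new NumericFieldConfig(NumberFormat.getIntegerInstance(Locale.ROOT), Integer.class));
      } else if (type == Float.class) {
        configs.put(entry.getKey(),
            new NumericFieldConfig(NumberFormat.getNumberInstance(Locale.ROOT), Float.class));
      } else if (type == Double.class) {
        configs.put(entry.getKey(),
            new NumericFieldConfig(NumberFormat.getNumberInstance(Locale.ROOT), Double.class));
      }
    }
    return configs;
  }
}
```

For a Person index with a String field `name` and an Integer field `revenue`, only `revenue` would end up in the returned map, so only that field would be treated as numeric by the parser.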

Challenge

  • The meta-info is saved inside each serializer, and there is no generic interface to read it. The code needs to be refactored.
  • We need to explicitly specify HeterogeneousLuceneSerializer as the default serializer; it saves the meta-info in its private mappers.
  • The FlatFormatSerializer is implemented differently. It has no mappers data structure.
  • PDX data types are parsed by pdxMapper; how to get their meta-info is still unknown.
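One possible shape for the refactoring mentioned in the first challenge is a small interface that every serializer implements to report its field types. The sketch below is purely hypothetical: `FieldTypeProvider` and `MapperBackedSerializer` are invented names for illustration, not existing Geode classes, and the real serializers may need a different contract.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical interface: lets any serializer report the Java type of each indexed field. */
interface FieldTypeProvider {
  Map<String, Class<?>> getFieldTypes();
}

/** Stand-in for a serializer that records field types as it sees values. */
final class MapperBackedSerializer implements FieldTypeProvider {
  private final Map<String, Class<?>> seenTypes = new LinkedHashMap<>();

  /** Remembers the type of the first non-null value seen for a field. */
  void record(String field, Object value) {
    if (value != null) {
      seenTypes.putIfAbsent(field, value.getClass());
    }
  }

  @Override
  public Map<String, Class<?>> getFieldTypes() {
    // Read-only view, so callers cannot modify the serializer's internal state.
    return Collections.unmodifiableMap(seenTypes);
  }
}
```

The point of such an interface is that the query side would depend only on `getFieldTypes()`, regardless of whether a serializer keeps mappers internally or not.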

Gfsh command line syntax

 
Code Block
# find an exact match for a numeric field
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1


# use 2 SHOULD conditions, which is equivalent to "A OR B"
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000 revenue=764000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1


# use 2 MUST conditions, which is equivalent to "A AND B". Lucene recognizes "+" as MUST
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>763000 +revenue<766000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1


# >= and <= are valid syntax for inclusive conditions
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>=763000 +revenue<=766000" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key766 | Person{name='Tom766 Zhou', email='tzhou766@example.com', revenue=766000, homepage='Page{id=766, c.. | 1
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# Lucene recognizes "-" as NOT. A NOT condition excludes matching entries from the results.
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue<2000 revenue>9997000 -name=Tom9998*" --defaultField=name
 key   |                                                                                                     | score
------ | --------------------------------------------------------------------------------------------------- | -----
key0   | Person{name='Tom0 Zhou', email='tzhou0@example.com', revenue=0, homepage='Page{id=0, content="Hel.. | 1
key1   | Person{name='Tom1 Zhou', email='tzhou1@example.com', revenue=1000, homepage='Page{id=1, content=".. | 1
key9999| Person{name='Tom9999 Zhou', email='tzhou9999@example.com', revenue=9999000, homepage='Page{id=999.. | 1
Note: name=Tom9998* is equivalent to name:Tom9998*, and to Tom9998* when defaultField is name
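To make the operator semantics in the examples above concrete, here is a small sketch that turns one integer comparison term into an inclusive range. It is not how Lucene parses the query; `IntRange` and `ComparisonTerm` are hypothetical helpers, and only integer values are handled. It shows why `>` and `<` exclude the boundary value while `>=` and `<=` include it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Inclusive integer range for one field; a hypothetical helper type for this sketch. */
final class IntRange {
  final String field;
  final int lower; // inclusive
  final int upper; // inclusive

  IntRange(String field, int lower, int upper) {
    this.field = field;
    this.lower = lower;
    this.upper = upper;
  }

  boolean contains(int value) {
    return lower <= value && value <= upper;
  }
}

final class ComparisonTerm {
  // Field name, then one of >=, <=, >, <, =, then an optionally signed integer.
  private static final Pattern TERM = Pattern.compile("(\\w+)(>=|<=|>|<|=)(-?\\d+)");

  /** Converts a term such as "revenue>763000" into inclusive bounds. */
  static IntRange toRange(String term) {
    Matcher m = TERM.matcher(term);
    if (!m.matches()) {
      throw new IllegalArgumentException("not an integer comparison term: " + term);
    }
    String field = m.group(1);
    int value = Integer.parseInt(m.group(3));
    switch (m.group(2)) {
      case ">=":
        return new IntRange(field, value, Integer.MAX_VALUE);
      case "<=":
        return new IntRange(field, Integer.MIN_VALUE, value);
      case ">": // exclusive lower bound becomes value + 1
        return new IntRange(field, Math.addExact(value, 1), Integer.MAX_VALUE);
      case "<": // exclusive upper bound becomes value - 1
        return new IntRange(field, Integer.MIN_VALUE, Math.subtractExact(value, 1));
      default: // "=" is an exact match
        return new IntRange(field, value, value);
    }
  }
}
```

Under this reading, `+revenue>763000 +revenue<766000` accepts 764000 and 765000 but not 763000 or 766000, which agrees with the result tables shown in the examples.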

Java API

There is no change to the Java API.