...
Currently, gfsh lucene query can only use the default StringQueryProvider, which provides syntax as
...
To fulfill the requirement, we will specify a points config map for each numeric field in the built-in StringQueryProvider.
Approach
Lucene's StandardQueryParser can parse numeric range syntax (see chapter "Gfsh command line syntax" for details) once it knows which fields are numeric. To tell it, the parser must be given a PointsConfigMap that maps each numeric field name to its type.
Our index contains the list of indexed fields, and the type of each field is also saved as meta-info (currently inside the serializer, see Challenge). We will read this field-to-type mapping, create the PointsConfigMap from it, and set it on the parser.
Assumptions
When specifying a numeric range, we set a few reasonable assumptions to limit the scope of the work:
- The numeric fields can be Integer, Float, or Double.
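The approach and assumption above can be sketched as follows. This is a minimal sketch, assuming the field-type meta-info can be read as a `Map<String, Class<?>>`; `NumericFieldConfig` and `PointsSpec` are hypothetical names, not existing Geode classes. It keeps only Integer, Float and Double fields (boxing primitives) and pairs each with a `NumberFormat`, which is the information Lucene's `PointsConfig(NumberFormat, Class<? extends Number>)` constructor takes. The code uses only the JDK so that it runs without Lucene on the classpath.

```java
import java.text.NumberFormat;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not an existing Geode class): converts the field-type meta-info
// of an index into the (format, type) pairs that Lucene's PointsConfig expects.
final class NumericFieldConfig {

  // Mirrors the two constructor arguments of Lucene's PointsConfig(NumberFormat, Class).
  static final class PointsSpec {
    final NumberFormat format;
    final Class<? extends Number> type;

    PointsSpec(NumberFormat format, Class<? extends Number> type) {
      this.format = format;
      this.type = type;
    }
  }

  // Maps a declared field type to one of the supported numeric types
  // (Integer, Float, Double), boxing primitives; returns null for anything else.
  static Class<? extends Number> numericType(Class<?> declared) {
    if (declared == Integer.class || declared == int.class) return Integer.class;
    if (declared == Float.class || declared == float.class) return Float.class;
    if (declared == Double.class || declared == double.class) return Double.class;
    return null;
  }

  // Builds the points config entries; non-numeric fields such as "name" are skipped,
  // so the parser keeps treating them as text.
  static Map<String, PointsSpec> fromFieldTypes(Map<String, Class<?>> fieldTypes) {
    Map<String, PointsSpec> result = new LinkedHashMap<>();
    for (Map.Entry<String, Class<?>> entry : fieldTypes.entrySet()) {
      Class<? extends Number> type = numericType(entry.getValue());
      if (type != null) {
        // Locale.ROOT keeps number parsing independent of the JVM's default locale.
        result.put(entry.getKey(), new PointsSpec(NumberFormat.getInstance(Locale.ROOT), type));
      }
    }
    return result;
  }
}
```

With Lucene on the classpath, each entry would become `new PointsConfig(spec.format, spec.type)` in the map passed to `StandardQueryParser.setPointsConfigMap`. Boxing matters because a field declared as `int` reports `int.class`, while the sketch always records the wrapper type.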
Challenge
- The meta-info is saved in each serializer, and there is no generic interface to retrieve it, so the code needs to be refactored.
- HeterogeneousLuceneSerializer must be explicitly specified as the default serializer, because it saves the meta-info in its private mappers.
- FlatFormatSerializer is implemented differently; it has no mappers data structure.
- PDX data is parsed by pdxMapper; how to get its meta-info is still unknown.
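The refactoring named in the first Challenge item could take roughly the shape below: one common interface that every serializer implements, so the query provider no longer reaches into serializer-specific private state. Both `FieldTypeProvider` and `ReflectionFieldTypeProvider` are hypothetical names. The reflection-based implementation is only a stand-in for what HeterogeneousLuceneSerializer's mappers record, and FlatFormatSerializer and PDX remain open questions that this sketch does not address.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common interface (not part of the Geode API) that every serializer
// would implement so the query provider can ask for field types uniformly.
interface FieldTypeProvider {
  // Returns the declared Java type of each indexed field, keyed by field name.
  Map<String, Class<?>> fieldTypes(Collection<String> indexedFields);
}

// Hypothetical implementation that reads the declared fields of a single value class.
final class ReflectionFieldTypeProvider implements FieldTypeProvider {
  private final Class<?> valueClass;

  ReflectionFieldTypeProvider(Class<?> valueClass) {
    this.valueClass = valueClass;
  }

  @Override
  public Map<String, Class<?>> fieldTypes(Collection<String> indexedFields) {
    Map<String, Class<?>> types = new LinkedHashMap<>();
    for (Field field : valueClass.getDeclaredFields()) {
      // Static fields belong to the class, not to a stored value, so skip them.
      if (Modifier.isStatic(field.getModifiers())) {
        continue;
      }
      if (indexedFields.contains(field.getName())) {
        types.put(field.getName(), field.getType());
      }
    }
    return types;
  }
}
```

A serializer that keeps its meta-info elsewhere, for example in private mappers, would implement the same interface by returning that data instead of using reflection.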
Gfsh command line syntax
```
# find an exact match for a numeric field
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000" --defaultField=name
key | | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# use 2 SHOULD conditions, which is equivalent to "A OR B"
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue=763000 revenue=764000" --defaultField=name
key | | score
------ | --------------------------------------------------------------------------------------------------- | -----
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1

# use 2 MUST conditions, which is equivalent to "A AND B". Lucene recognizes "+" as MUST
# > and < are exclusive bounds
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>763000 +revenue<766000" --defaultField=name
key | | score
------ | --------------------------------------------------------------------------------------------------- | -----
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1

# >= and <= are valid syntax for inclusive conditions
gfsh>search lucene --region=/Person --name=personIndex --queryString="+revenue>=763000 +revenue<=766000" --defaultField=name
key | | score
------ | --------------------------------------------------------------------------------------------------- | -----
key766 | Person{name='Tom766 Zhou', email='tzhou766@example.com', revenue=766000, homepage='Page{id=766, c.. | 1
key765 | Person{name='Tom765 Zhou', email='tzhou765@example.com', revenue=765000, homepage='Page{id=765, c.. | 1
key764 | Person{name='Tom764 Zhou', email='tzhou764@example.com', revenue=764000, homepage='Page{id=764, c.. | 1
key763 | Person{name='Tom763 Zhou', email='tzhou763@example.com', revenue=763000, homepage='Page{id=763, c.. | 1

# Lucene recognizes "-" as NOT. A NOT condition removes matches from the results.
gfsh>search lucene --region=/Person --name=personIndex --queryString="revenue<2000 revenue>9997000 -name=Tom9998*" --defaultField=name
key | | score
------ | --------------------------------------------------------------------------------------------------- | -----
key0 | Person{name='Tom0 Zhou', email='tzhou0@example.com', revenue=0, homepage='Page{id=0, content="Hel.. | 1
key1 | Person{name='Tom1 Zhou', email='tzhou1@example.com', revenue=1000, homepage='Page{id=1, content=".. | 1
key9999| Person{name='Tom9999 Zhou', email='tzhou9999@example.com', revenue=9999000, homepage='Page{id=999.. | 1
```
Note: `name=Tom9998*` is equivalent to `name:Tom9998*`, and also to `Tom9998*` when defaultField is name.
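To make the expected results above easy to check, the following sketch reproduces them in plain Java rather than Lucene. It assumes the Person region holds entries key0 to key9999 with name "Tom<i> Zhou" and revenue i * 1000, a pattern inferred from the output rows, and it applies Lucene's boolean clause rules: all MUST clauses must match; without MUST clauses, at least one SHOULD clause must match; any MUST_NOT match excludes the entry. Scoring is ignored, results come back in ascending key order rather than gfsh's display order, and a prefix test stands in for the `Tom9998*` wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model (no Lucene) of how SHOULD, MUST ("+") and MUST_NOT ("-") clauses combine.
final class BooleanClauseDemo {

  // Hypothetical sample entry: key<i> holds name "Tom<i> Zhou" and revenue i * 1000,
  // the pattern visible in the gfsh output above.
  static final class Person {
    final int id;
    final String name;
    final int revenue;

    Person(int id) {
      this.id = id;
      this.name = "Tom" + id + " Zhou";
      this.revenue = id * 1000;
    }
  }

  // Every MUST clause has to match; if there is no MUST clause, at least one SHOULD
  // clause has to match; an entry matching any MUST_NOT clause is dropped.
  static List<String> search(List<Predicate<Person>> must, List<Predicate<Person>> should,
      List<Predicate<Person>> mustNot) {
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
      Person p = new Person(i);
      boolean matches = must.isEmpty()
          ? should.stream().anyMatch(c -> c.test(p))
          : must.stream().allMatch(c -> c.test(p));
      if (matches && mustNot.stream().noneMatch(c -> c.test(p))) {
        keys.add("key" + p.id);
      }
    }
    return keys;
  }

  public static void main(String[] args) {
    List<Predicate<Person>> none = List.of();

    // "revenue=763000 revenue=764000": SHOULD + SHOULD, i.e. A OR B
    System.out.println(search(none, List.of(p -> p.revenue == 763000, p -> p.revenue == 764000), none));
    // prints [key763, key764]

    // "+revenue>763000 +revenue<766000": MUST + MUST with exclusive bounds
    System.out.println(search(List.of(p -> p.revenue > 763000, p -> p.revenue < 766000), none, none));
    // prints [key764, key765]

    // "+revenue>=763000 +revenue<=766000": MUST + MUST with inclusive bounds
    System.out.println(search(List.of(p -> p.revenue >= 763000, p -> p.revenue <= 766000), none, none));
    // prints [key763, key764, key765, key766]

    // "revenue<2000 revenue>9997000 -name=Tom9998*": two SHOULD clauses and one NOT
    System.out.println(search(none, List.of(p -> p.revenue < 2000, p -> p.revenue > 9997000),
        List.of(p -> p.name.startsWith("Tom9998"))));
    // prints [key0, key1, key9999]
  }
}
```

The last case shows why the NOT clause reduces the result: the two SHOULD clauses match key0, key1, key9998 and key9999, and the NOT clause then removes key9998.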
Java API
There is no change to the Java API.