Advanced Search Queries
CXF supports mapping the advanced query expressions to the typed Search API with the help of query language specific parsers.
Supported Query Languages
Feed Item Query Language
Feed Item Query Language (FIQL) has been supported since CXF 2.3.0.
For example, the following query
?_s=name==CXF;version=ge=2.2
lets users search for all the Apache projects with the name 'CXF' and a version greater than or equal to '2.2'. The initial '=' separates the name of the query parameter '_s' from the FIQL expression, while '==' and '=ge=' mean 'equal to' and 'greater than or equal to' respectively.
An expression such as "name==CXF*" can be used to do a partial equality check (in this example, the name should start with "CXF").
More complex composite expressions can also be expressed easily enough, examples:
Here is a summary of FIQL operators:
Operator    Description
==          Equal
!=          Not Equal
=lt=        Less Than
=le=        Less or Equal
=gt=        Greater Than
=ge=        Greater or Equal
;           AND
,           OR
The last two operators, "," (OR) and ";" (AND), are used to concatenate and build composite (possibly nested) expressions, while the first six operators are used to build so-called primitive expressions.
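To make the operator syntax concrete, here is a minimal Java sketch that splits a single primitive expression into its property name, operator and value with a regular expression. It only illustrates the syntax; it is not how the CXF FIQL parser is implemented, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CXF: splits one primitive FIQL expression.
class FiqlPrimitiveSketch {
    // A property name, one of the six comparison operators, then the value.
    private static final Pattern PRIMITIVE =
        Pattern.compile("^([^=!;,]+)(==|!=|=lt=|=le=|=gt=|=ge=)(.+)$");

    /** Returns {name, operator, value}, or throws if the text is not a primitive expression. */
    static String[] parse(String expression) {
        Matcher m = PRIMITIVE.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a primitive FIQL expression: " + expression);
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String exp : new String[] {"name==CXF", "version=ge=2.2", "name==CXF*"}) {
            String[] parts = parse(exp);
            System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
        }
    }
}
```

Composite expressions would additionally need the ";" and "," separators and parentheses to be handled, which this sketch deliberately leaves out.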
Starting from CXF 2.7.7, a single '=' operator can optionally be supported instead of '==': set the "fiql.support.single.equals.operator" contextual property to "true".
As you can see, FIQL is rich enough for service implementations to offer a more interesting search experience around well-known data while still keeping the complexity of URI expressions under control. This makes it simpler to share such URI queries and to use the same query language no matter which data store the service uses internally.
Note that when FIQL queries are passed via URI query parameters, either the '_search' or the '_s' query parameter has to be used to mark a FIQL expression so that it does not interfere with other optional query parameters. Starting from CXF 2.7.2 it is also possible to use the whole query component to convey a FIQL expression.
In that case no "_s" or "_search" query parameter is present; the whole query string after "?" is the actual FIQL expression.
Set the "search.use.all.query.component" contextual property to enable this option.
Alternatively the expressions can be encoded as URI path segments, see the sections below for more information.
Open Data Protocol
The $filter query can have a number of the logical operators, here is a summary of the operators supported in scope of Search API:
Operator    Description
eq          Equal
ne          Not Equal
lt          Less Than
le          Less or Equal
gt          Greater Than
ge          Greater or Equal
and         AND
or          OR
Please see the specification text for some examples.
Please note that OData protocol is not supported by CXF Search API, only the $filter query is supported (only logical operators for now) for querying the application data with CXF Search API. Users should work directly with Apache Olingo to get the OData protocol supported as part of the application flow.
Some of the following examples on this page may often refer to FIQL due to the fact FIQL has been supported for a long time, but the same examples will work with OData $filter expressions. For example, replace the "_s=name==CXF" query with "$filter=name eq CXF".
When to use advanced queries
Consider a typical query expression such as "a=avalue&c=cvalue". This can mean either "find all resources with 'a' and 'c' properties equal to 'avalue' and 'cvalue'" or "find all resources with an 'a' property equal to 'avalue' or a 'c' property equal to 'cvalue'". Whether multiple query properties are combined with "and" or "or" is application specific.
It is also difficult to capture conditional expressions with a custom language, for example "find all resources with an 'a' property less than 123", when the number of properties is large or the entities which can be searched are created dynamically.
Use FIQL or OData for capturing queries of simple or medium complexity, typically in cases where the set of properties that a user can specify is well known. For example, a book store resource will let users search books given a number of useful properties (those of Book and/or the Library a given book is available in, etc).
Furthermore, consider using FIQL/OData and SearchConditionVisitor for the purpose of generalizing the search code, when the number of properties and entities is large, dynamic, etc.
Dependencies and Configuration
The following dependency is required starting from CXF 2.6.0:
Additionally, starting from CXF 2.6.0, SearchContextProvider needs to be registered as jaxrs:provider.
Working with the queries
SearchContext needs to be injected into the application code and used to retrieve a SearchCondition representing the current FIQL/OData query. This SearchCondition can be used in a number of ways for finding the matching data.
In this section we assume that the data to be matched are already available in memory. The follow-up section on converting the queries will show how the queries can be converted to some other query language typed or text expression.
So, suppose a list or map of Book instances is available. Here is one possible approach:
Note that a searchContext.getCondition(Book.class) call may return an arbitrarily complex SearchCondition: it can be a simple primitive expression or a more complex, composite one.
Capturing the queries
For the query expression to be captured, a bean like Book.class is instantiated and has all the search properties injected into it. A complex composite expression will be 'injected' into a number of Book instances - something that may have to be optimized.
Note that by default, a bean such as the Book class needs to have a matching property for every property name found in the FIQL expression. For example, given a 'name==b;id==123' expression, the Book class would need to have 'name' and 'id' properties available. This strict mode is enabled by default because ignoring a property which cannot be captured may lead to a false or unexpected match; for example, if the Book 'name' property has been renamed to 'title', then ignoring the 'name' property will lead to a wider match. Thus, if the property does not exist, org.apache.cxf.jaxrs.ext.search.PropertyNotFoundException will be thrown; catching it lets the application return an empty response or retry in the more lax mode (see the next paragraph).
When a more lax parsing of FIQL expressions is expected, for example, where the primitive expressions are joined by "OR", using SearchBean (see one of the next subsections) or setting a contextual property "search.lax.property.match" will help. The former option is better when you need to know the list of all the properties which have been used in the expression, even those which will not be possible to use for the actual search; the latter option will simply have the unrecognized properties ignored.
Note that a "search.decode.values" property can be used to have the 'reserved' characters such as FIQL ',' or ';' characters passed as percent-encoded characters as part of the search property values.
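The reason percent-encoding helps can be seen in a small stand-alone Java sketch. It assumes a simplified parser that first splits a flat expression on ';' and only then decodes each part; the class and method names are hypothetical, and this is not the CXF parser.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not CXF code.
class DecodeValuesSketch {
    /** Splits on the FIQL AND separator first, then percent-decodes each part. */
    static List<String> splitThenDecode(String expression) {
        List<String> parts = new ArrayList<>();
        for (String part : expression.split(";")) {
            // Note: URLDecoder also turns '+' into a space (form-encoding rules).
            parts.add(URLDecoder.decode(part, StandardCharsets.UTF_8));
        }
        return parts;
    }

    public static void main(String[] args) {
        // "%3B" is an encoded ';' that belongs to the value, not to the expression syntax.
        System.out.println(splitThenDecode("title==a%3Bb;id==123"));
    }
}
```

Because the encoded ';' is only decoded after the split, it ends up inside the 'title' value instead of being read as an AND separator.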
Mapping of query properties to bean properties
As noted above, when a 'typed' bean such as Book.class is used to capture the expressions, a property found in the query expression that cannot be mapped to a specific Book property will lead to an exception being reported, or it can optionally be ignored. In reality, there are a number of reasons why a direct match between the properties found in query expressions and in capturing beans may not be ideal:
- Capturing beans may evolve independently of the actual queries; for example, a working query such as "name==b" will break if a Book 'name' gets renamed to 'title' which will make it difficult to have the queries bookmarked.
- Direct match will simply not work for cases where an actual bean property does not belong to the capturing bean itself but to one of its child properties; for example, a JPA2 Book entity may have an OwnerInfo bean with Name bean property which does contain a primitive 'name' property.
The preferred approach, when working with typed beans, is to register a bean properties map, using a "search.bean.property.map" contextual property or directly with SearchContext. For example, given
and the following map:
This lets users type and bookmark queries like this one without seeing them produce unexpected results:
Note, a property name such as "ownerinfo.name.name" uses '.' to let the parser navigate to the actual Name bean which has a 'name' property. This can be optimized in cases where the owner bean is known to have either a constructor or static valueOf() method accepting the 'name' property, for example, given
the mapping between "oname" and "ownerinfo.name" will work too.
You can also have many to one mappings, for example
Dealing with mistyped property names
Consider a case where a documented search property is named as 'address' (lower case) and a query contains a mistyped 'Address' instead. In this case, unless a "search.lax.property.match" property is set, PropertyNotFoundException will be thrown.
Supporting case-insensitive property mapping is easy: register the "search.bean.property.map" map (mentioned earlier) as a Java TreeMap with the case-insensitive String.CASE_INSENSITIVE_ORDER Comparator.
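A minimal sketch of such a map, using only the JDK, might look as follows. The class name and the single 'address' entry are illustrative assumptions, and registering the map under "search.bean.property.map" is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: builds the kind of map that could be registered as "search.bean.property.map".
class CaseInsensitivePropertyMapSketch {
    /** Builds a map whose keys are compared without regard to case. */
    static Map<String, String> newPropertyMap() {
        Map<String, String> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        // Query property name -> bean property name.
        map.put("address", "address");
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = newPropertyMap();
        System.out.println(map.get("Address")); // found despite the different case
        System.out.println(map.get("ADDRESS")); // found despite the different case
        System.out.println(map.get("adress"));  // a real typo is still not found
    }
}
```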
However, this will not help if the 'address' property was mistyped as 'adress'. In this case, "search.bean.property.map" might still be useful if a few more keys are added to cover typical typos, for example 'adress' - 'address', 'addres' - 'address', etc.
Starting from CXF 3.1.5, org.apache.cxf.jaxrs.ext.search.PropertyNameConverter is available and might be used for a more sophisticated conversion of mistyped property names to correct names.
The implementation can be registered as a "search.bean.property.converter" endpoint contextual property.
The parser properties are the ones which tell the parser how to treat the conversion of Date values and the unrecognized search property names.
As explained above, "search.lax.property.match" can be used to tell the parser that it should ignore the search property names which have no corresponding bean properties.
"search.date-format" and "search.timezone.support" tell the parser how to convert the date values, see the "Using dates in queries" section.
More properties may be supported in the future.
All of these properties can be set as endpoint contextual properties or directly with SearchContext.
Mapping of query properties to column/field names
When converting FIQL queries to SQL or other untyped query language expressions, as well as when using Lucene converter, it can be useful to be able to map between an actual query parameter and the column or field name. All FIQL converters shipped with CXF have constructors accepting a map for mapping the queries to columns/fields. See the next "SearchBean" section for one example.
Note this property is not the same as the one described in the "Mapping of query properties to bean properties" section. The latter (the one described in the previous section) is required for getting FIQL queries captured into typed, domain specific beans like Book, and it can be sufficient for JPA2 which also has annotations like @Column.
org.apache.cxf.jaxrs.ext.search.SearchBean is a utility bean class which can simplify analyzing the captured FIQL expressions and converting them to the other language expressions, in cases where having to update the bean class such as Book.class with all the properties that may need to be supported is not practical or the properties need to be managed manually. For example:
Converting the queries
SearchCondition can also be used to convert the search requirements (originally expressed in FIQL/OData) into other query languages.
A custom SearchConditionVisitor implementation can be used to convert SearchCondition objects into custom expressions or typed objects. CXF ships visitors for converting expressions to SQL, JPA 2.0 CriteriaQuery or TypedQuery, Lucene Query.
org.apache.cxf.jaxrs.ext.search.sql.SQLPrinterVisitor can be used for creating SQL expressions. For example:
Note that SQLPrinterVisitor can also be initialized with the names of columns and the field aliases map:
The fields map can help hide the names of the actual table columns/record fields from the Web frontend. For example, users will know that the 'level' property is available, while internally it will be converted to a LEVEL_COLUMN name.
Warning: Using the SQLPrinterVisitor may leave your service open to SQL injection attacks. Please take appropriate steps to avoid these attacks (for example validating queries using a custom PropertyValidator, or manually escaping the input values).
CXF 2.6.4 and CXF 2.7.1 introduce org.apache.cxf.jaxrs.ext.search.jpa.JPATypedQueryVisitor and org.apache.cxf.jaxrs.ext.search.jpa.JPACriteriaQueryVisitor which can be used to capture FIQL/OData expressions into javax.persistence.TypedQuery or javax.persistence.criteria.CriteriaQuery objects.
For example, given:
the following code can be used:
Using CriteriaQuery is preferred in cases when the actual result has to be shaped into a bean of different type, using one of JPA2 CriteriaBuilder's shape methods (array(), construct() or tuple()). For example:
Note that JPACriteriaQueryVisitor will automatically set aliases for an expression like "tuple.get('id', String.class)" to work.
JPACriteriaQueryVisitor will be enhanced to support more of JPA2 advanced constructs in time.
Or, instead of using Tuple, use a capturing bean like BeanInfo:
JPA2 typed converters also support join operations in cases when explicit collections are used, for example, given:
the following will find "all the books with good reviews written by Ted":
org.apache.cxf.jaxrs.ext.search.jpa.JPALanguageVisitor for converting FIQL/OData expressions into JPQL expressions has also been introduced.
Count expressions are supported at two levels.
First, one may want to get the count of records matching a given search expression. This can be done by checking the size of the result list:
However, this can be very inefficient for a large number of records, so using a CriteriaBuilder count operation is recommended, for example:
Second, only when using FIQL, a count extension can be used. For example, one may want to find 'all the books written by at least two authors or all the books with no reviews'.
If a collection entity such as BookReview has a non-primitive type, then typing "reviews==0" is all that is needed; otherwise a count extension needs to be used, for example: "count(authors)=ge=2"
Mapping of FIQL/OData expressions to Lucene (4.0.0-BETA) Query is supported starting from CXF 2.7.1. Please note that starting from CXF 3.0.2, the Lucene version has been upgraded to 4.9.0 in order to benefit from query builders and other improvements.
org.apache.cxf.jaxrs.ext.search.lucene.LuceneQueryVisitor can be used to support the default (content) field or specific custom field queries.
Queries for specific terms and phrases are supported.
Example, "find the documents containing a 'text' term":
Note that "new LuceneQueryVisitor<SearchBean>("ct", "contents");" is a simple constructor which creates a mapping between the "ct" name used in the query and the actual default content field. It is not required to use this mapping, but it is recommended because it keeps the query expression shorter and does not leak the actual internal Lucene field name.
All the FIQL operators have been mapped to related Lucene Query objects. Queries such as "Less than", or "Greater than and less than" will work fine against typed fields like "org.apache.lucene.document.IntField". The visitor can be configured with a "primitiveFieldTypeMap" map property to help it map a given query name, for example "id", to Integer.class.
Phrases are supported too. Suppose you have a few documents, each of them containing name and value pairs like "name=Fred", "name=Barry", and you'd like to list only the documents containing "name=Fred":
In this example, the visitor is requested to create Lucene org.apache.lucene.search.PhraseQuery against the specified contents field ("contents"). The visitor can also accept a contentsFieldMap map property when different phrases may need to be checked against different contents fields.
Starting from CXF 3.0.2, typed Date range queries are supported by LuceneQueryVisitor. However, this feature should be used together with 'primitiveFieldTypeMap' to tell the visitor which fields are temporal and should be treated as such in the filter expressions. For example:
LuceneQueryVisitor supports a wide range of date formats, while still providing the option to customize the format using the 'search.date-format' property. This property accepts a date/time pattern expression in the SimpleDateFormat format. Also, since CXF 3.0.2, LuceneQueryVisitor can be configured to use a Lucene analyzer. The reason to use an analyzer is that during Lucene query construction the visitor can then use the per-field filters and tokenizers, taking into account stemming, stop-words, lower-casing, etc., and so process the filter expression properly. For example:
Mapping of FIQL/OData expressions to LDAP queries as defined by RFC-4515 is supported starting from CXF 2.7.1 with the help of org.apache.cxf.jaxrs.ext.search.ldap.LdapQueryVisitor. Use this visitor when working with LDAP or OSGi.
Here is a summary of LDAP filter operators:
Operator    Description
=           Equal
!           Not Equal
<=          Less Or Equal
>=          Greater or Equal
&           AND
|           OR
FIQL "=le=" and "=lt=" will both map to "<=", while "=ge=" and "=gt=" to ">=".
The converter is created like all other converters:
Note that since CXF 3.2.5 the query values are encoded by default, to prevent possible LDAP injection attacks. If you want to support wildcard searching with the LdapQueryVisitor from CXF 3.2.5 onwards, it is necessary to set the 'encodeQueryValues' property of LdapQueryVisitor to 'false'.
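The following Java sketch shows the general escaping rule from RFC 4515 for filter values, which is the kind of encoding that prevents LDAP injection. It is an illustration only: the class and method names are hypothetical, and whether LdapQueryVisitor encodes exactly this set of characters is an assumption.

```java
// Hypothetical helper, not the CXF implementation: RFC 4515 escaping of a filter value.
class LdapValueEscapeSketch {
    /** Replaces the characters that are special in an LDAP filter with \XX hex escapes. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without escaping, this value would close the filter and append another clause.
        System.out.println("(cn=" + escape("*)(uid=*") + ")");
    }
}
```

Because '*' is escaped as well, it no longer acts as a wildcard, which is why wildcard searches require the encoding to be switched off.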
CXF 3.0.2 introduces an initial support for querying HBase databases. Please see this test for more information.
In cases when a custom conversion has to be done, a converter for doing the untyped (example, SQL) or typed (example, JPA2 TypedQuery) conversions can be provided.
If needed you can access a FIQL/OData query directly and delegate it further to your own custom FIQL handler:
Converting the queries with QueryContext
QueryContext is a helper context, available from CXF 2.7.1, which makes it simpler for the application code to get the converted query expression, with the actual converter/visitor registered as a jaxrs contextual property, for example:
and convert the queries:
where the client code may look like this:
Note that, because SQLPrinterVisitor will be shared between multiple requests, it has to be made thread-safe by injecting a thread-local org.apache.cxf.jaxrs.ext.search.visitor.SBThreadLocalVisitorState. This is not required when the visitor is created in the code on a per-request basis.
Custom visitors which are expected to be singletons and accumulate state between multiple visit calls have to be thread-safe. The utility class org.apache.cxf.jaxrs.ext.search.visitor.ThreadLocalVisitorState can be used for this.
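The idea behind thread-local visitor state can be sketched with the JDK's ThreadLocal alone. The class below is hypothetical and does not implement the CXF SearchConditionVisitor interface; it only shows how a shared singleton can keep the state accumulated by each request thread separate.

```java
// Hypothetical singleton "visitor", not a CXF class.
class ThreadSafeVisitorSketch {
    // Each thread gets its own buffer, so concurrent requests do not mix their output.
    private final ThreadLocal<StringBuilder> state = ThreadLocal.withInitial(StringBuilder::new);

    /** Appends a fragment to the state owned by the calling thread. */
    void visit(String fragment) {
        state.get().append(fragment);
    }

    /** Returns the text accumulated by the calling thread and clears that thread's state. */
    String getQuery() {
        String query = state.get().toString();
        state.remove();
        return query;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeVisitorSketch shared = new ThreadSafeVisitorSketch();
        Runnable request = () -> {
            shared.visit("name==");
            shared.visit(Thread.currentThread().getName());
            System.out.println(shared.getQuery());
        };
        Thread t1 = new Thread(request, "alice");
        Thread t2 = new Thread(request, "bob");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Each thread prints only the fragments it visited itself, in whatever order the two threads happen to finish.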
Plain queries to FIQL conversion
If you'd like to generalize the processing of search queries and use FIQL visitors, you may want to consider setting up a contextual property "search.use.plain.queries" to "true" and get the plain query expressions converted to FIQL expressions internally.
Also, by default, if a query property name ends with "From" then "=ge=" (greater or equals to) will be used, and if ends with "Till" then "=lt=" will be used, for example:
This allows plain query expressions to be mapped to typed bean properties and then used with all the existing converters.
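The suffix rule described above can be sketched in a few lines of plain Java. This is not the CXF implementation: the class name is hypothetical, and the sketch assumes that the "From"/"Till" suffix is stripped from the property name, that each parameter has a single value, and that the resulting primitive expressions are joined with ";" (AND).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the plain-query-to-FIQL idea, not CXF code.
class PlainQueryToFiqlSketch {
    /** Converts name/value pairs to a FIQL expression using the From/Till suffix rule. */
    static String toFiql(Map<String, String> params) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            String op = "==";
            if (name.endsWith("From")) {
                // Assumption: the suffix is dropped and the lower bound is inclusive.
                name = name.substring(0, name.length() - "From".length());
                op = "=ge=";
            } else if (name.endsWith("Till")) {
                // Assumption: the suffix is dropped; "=lt=" follows the text above.
                name = name.substring(0, name.length() - "Till".length());
                op = "=lt=";
            }
            parts.add(name + op + e.getValue());
        }
        return String.join(";", parts);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "CXF");
        params.put("ageFrom", "10");
        params.put("ageTill", "20");
        System.out.println(toFiql(params));
    }
}
```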
Search Expressions in URI Path segments
By default, a FIQL expression is expected to be available in either the '_s' or the '_search' query parameter.
For example, "find all the books with an 'id' property value less than 123":
Starting from CXF 2.6.2, it is possible to work with FIQL expressions included in URI path segments, for example, the same query can be expressed in a number of ways:
Such expressions can be captured in the code using JAX-RS annotations:
Note that if an expression is added to a URI path segment with a ";" character acting as a separator, for example "/books;id=lt=123", or if the expression itself includes ";", for example "/books[id=gt=123;id=lt=300]" ("find all the books with id greater than 123 and less than 300"), then the boolean contextual property "ignore.matrix.parameters" has to be set to "true" so that the runtime does not split the path segment into the path value and matrix parameters.
Queries involving multiple entities
Consider the query like "find chapters with a given chapter id from all the books with 'id' less than 123".
One easy way to manage such queries is to make FIQL and JAX-RS work together. For example:
In the previous section we had the properties of two entities, Book and Chapter, used in the query. The query was considered 'simple' because it was really only the simple book properties that were checked, and the only chapter property was a chapter id, assumed to be equal to a chapter list index.
Consider "Find all the chapters with id less than 5 for all the books with id greater than 300".
One way to handle this is to follow the example from the previous section with a few modifications:
In this case two conditions are created and the 2nd condition is used to filter the chapters from the books filtered by the 1st condition.
Perhaps a simpler approach, especially in case of JPA2, is to start looking for Chapters immediately, assuming Chapter classes have a one to one bidirectional relationship with Book:
Note this code assumes that "bookId" is mapped to "Book.id" property with the help of the contextual "search.bean.property.map" property as explained earlier.
First option is to have a bean capturing specific property values do a domain specific validation. For example, a Book.class may have its setName(String name) method validating the name value.
Another option is to inject a custom validator into a visitor which is used to build the untyped or typed query.
Finally, avoid letting users use properties whose values cannot be properly validated in the application code. Using a typed capturing bean like Book.class is a good way to limit the supported properties to the ones known to be related to Books.
Bean Validation 1.1 can also be used.
Building the queries
CXF 2.4.0 introduces SearchConditionBuilder which makes it simpler to build FIQL queries. SearchConditionBuilder is an abstract class that returns a FIQL builder by default:
Here is an example of building more complex queries:
Note, starting from CXF 2.7.1 the following can be used to make connecting multiple primitive expressions simpler:
More updates to the builder API are available on the trunk:
Using dates in queries
By default, the date values have to have the following format: "yyyy-MM-dd", for example:
A custom date format can be supported. Use the "search.date-format" contextual property; for example, "search.date-format"="yyyy-MM-dd'T'HH:mm:ss" will let users type:
If needed, "search.timezone.support" can be enabled to get the timezones supported too.
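To show what the default pattern and the custom pattern mentioned above accept, here is a small sketch based on java.text.SimpleDateFormat. The class name is hypothetical, and fixing the time zone to UTC is an assumption made only to keep the result reproducible; it says nothing about how CXF resolves time zones.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not CXF code: parses a date value with a SimpleDateFormat pattern.
class DateFormatSketch {
    static Date parse(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);                        // reject out-of-range fields such as month 13
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption, for reproducible results
        try {
            return format.parse(value);
        } catch (ParseException ex) {
            throw new IllegalArgumentException("Value does not match " + pattern + ": " + value, ex);
        }
    }

    public static void main(String[] args) {
        Date day = parse("yyyy-MM-dd", "2010-03-11");
        Date evening = parse("yyyy-MM-dd'T'HH:mm:ss", "2010-03-11T18:00:00");
        // The custom pattern carries a time of day, the default one does not.
        System.out.println((evening.getTime() - day.getTime()) / 3_600_000L + " hours apart");
    }
}
```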
At the moment, for custom date formats to be recognized by SearchConditionBuilder, FIQLSearchConditionBuilder has to be created explicitly:
A date value can also be specified as a duration relative to the current date/time, using its string representation "PnYnMnDTnHnMnS".
The resulting date is calculated as the current date plus the specified duration. For example:
This query will search for a date which is 90 days in the past or newer.
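The "current date plus duration" calculation can be reproduced with the JDK's javax.xml.datatype.Duration, which understands the "PnYnMnDTnHnMnS" lexical form. The sketch below assumes a negative duration value of "-P90D" for "90 days in the past"; the class name is hypothetical and this is not necessarily how CXF performs the calculation internally.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

// Hypothetical helper, not CXF code: resolves a duration relative to a given "now".
class RelativeDateSketch {
    /** Returns a copy of 'now' shifted by the given lexical duration, e.g. "-P90D". */
    static Calendar resolve(String lexicalDuration, Calendar now) {
        try {
            Duration duration = DatatypeFactory.newInstance().newDuration(lexicalDuration);
            Calendar result = (Calendar) now.clone();
            duration.addTo(result); // a negative duration moves the date into the past
            return result;
        } catch (DatatypeConfigurationException ex) {
            throw new IllegalStateException("No DatatypeFactory available", ex);
        }
    }

    public static void main(String[] args) {
        Calendar now = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        Calendar threshold = resolve("-P90D", now);
        long days = (now.getTimeInMillis() - threshold.getTimeInMillis()) / (24L * 60 * 60 * 1000);
        System.out.println("Threshold is " + days + " days before now");
    }
}
```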
Alternative query languages
Custom org.apache.cxf.jaxrs.ext.search.SearchConditionParser implementations can be registered as a "search.parser" contextual property starting from CXF 3.0.0-milestone2.
Please use a "search.query.parameter.name" contextual property to indicate to the runtime that an OData '$filter' query option needs to be checked for the query expression and a "search.parser" property to point to the instance of org.apache.cxf.jaxrs.ext.search.odata.ODataParser, as shown in this test, see the startServers function.
And here is also an XML Spring configuration example (using SearchBean in this specific case):
Also note that Apache Olingo offers its own visitor model which can be used to work with JPA2, etc.
Starting from CXF 3.0.2, content extraction support has been added to complement the search capabilities with text extraction from various document formats (PDF, ODF, DOC, TXT, RTF, ...). It is based on Apache Tika and is available in two forms: raw content extraction (TikaContentExtractor) and Lucene document content extraction (TikaLuceneContentExtractor).
The purpose of Tika content extractor is to provide the essential support of text extraction from supported document formats. Additionally, the metadata is being extracted as well depending on the document format (author, modified, created, pages, ...). The TikaContentExtractor accepts the list of supported parsers and returns the extracted metadata together with the desired extracted content format (by default raw text). For example:
By default, the TikaContentExtractor also performs the content type detection and validation, which could be turned off using the 'validateMediaType' constructor argument.
The TikaLuceneContentExtractor is very similar to TikaContentExtractor but instead of raw content and metadata it returns prepared Lucene document. However, in order to properly create the Lucene document which is ready to be indexed, TikaLuceneContentExtractor accepts an additional parameter, LuceneDocumentMetadata, with the field types and type converters. For example:
At this point, the document is ready to be analyzed and indexed. The TikaLuceneContentExtractor uses LuceneDocumentMetadata to create the properly typed document fields and currently supports DoubleField, FloatField, LongField, IntField, TextField (for content) and StringField (also used to store dates).
To demonstrate the full power of the CXF 3.0.2 content extraction and search capabilities, the demo project 'jax_rs_search' has been developed and is distributed in the samples bundle. The project can be found in the official Apache CXF GitHub repository. It integrates Apache CXF, Apache Lucene and Apache Tika, showing off some advanced features related to custom analyzers and different filter criteria (keyword and phrase search).