This Confluence has been LDAP enabled, if you are an ASF Committer, please use your LDAP Credentials to login. Any problems file an INFRA jira ticket please.

Child pages
  • Simple Faceted Search
Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 11 Next »

Lucene.Net Simple Faceted Search

What is faceted search?

Faceted search is the ability to find results based upon classifications of data which are not fixed. Facets can allow the user to filter data through multiple paths and different ordering. You can click on any combination of facets and arrive to the results from different paths.

e.g. e-commerce searches over clothes that have facets of price range, color, brands, sex, etc.

References to adding faceted to Lucene.Net.

Status: Prototyping inside the Lucene.Net_2_9_4g branch

Which means anything can be a breaking change before its put into a release.

This shouldn't prevent anyone from making their own version or taking the existing code and building on top of it, improving it. In fact we encourage that you do so and submit patches and influence the design.

Simple Faceted Search

A contrib project was created for implementing a simple faceted search inside of Lucene.Net:
svn repository:

The tests do show how to use the simple faceted search, look at the TestSimpleFacetedSearch.cs file and the "HowToUse" Method.

Sample Usage

The usage is pretty straight forward. The code sample below actually comes from the tests for simple faceted search, so we encourage you to look at the rest of the tests inside the project.

            // Pulled from the HowToUse method.

            // create the index reader.
            // * the code sample assumes you've already done this, which is what the _Reader variable references below

            Query query = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", new

            // pass in the reader and the names of the facets that you've created using fields in the documents.
            // the facets are determined

            SimpleFacetedSearch sfs = new SimpleFacetedSearch(_Reader, new string[] { "language", "category" });

            // then pass in the query into the search like you normally would with a typical search class.

            SimpleFacetedSearch.Hits hits = sfs.Search(query, 10);

            // what comes back is different than normal.
            // the result documents & hits are grouped by facets.

            // you'll need to iterate over groups of hits-per-facet.

            long totalHits = hits.TotalHitCount;
            foreach (SimpleFacetedSearch.HitsPerFacet hpg in hits.HitsPerFacet)
                long hitCountPerGroup = hpg.HitCount;
                SimpleFacetedSearch.FacetName facetName = hpg.Name;

                foreach (Document doc in hpg.Documents)
                    string text = doc.GetField("text").StringValue();

                    // replace with logging or your desired output writer
                    System.Diagnostics.Debug.WriteLine(">>" + facetName + ": " + text);


Under the hood.

Some things that will help understanding what the code actually does underneath the covers.

  • Cartesian Product - a set of products
  • FieldValuesBitSets
    • maps a field / facet name to a bit set (OpenBitSetDISI) and its underlying values. e.g. if the facet was color, it would put all the distinct values for this field into a list like "blue","cyan","fusca","butterscotch" and all those other fun random names of colors into one flattened list.
    • This basically helps facilitate storing the bit set combinations described by example in the facet algorithm section below
  • OpenBitSetDISI
    • DISI stands for Document Id Set Iterator

Basically the search class builds up and caches bitsets of possible combinations on the specified facets that are supplied in the constructor of the query.

If the data does not have a certain combination of facets, it omits these combinations from being stored.

e.g. if you only have "grey" shirts for "males" indexed and you have 0 "grey" shirts indexed for "females", it will omit the "grey" shirts for "female" combination in the possible combinations. "grey" is a valid color and "females" is a valid sex for under each facet, they are not a valid combination. These combinations will never have hits.

The SimpleFacetedSearch will consume the supplied query and then return the groups of facet combinations and their hits.

Facet Algorithm

Suppose, we want a faceted search on fields f1 f2 f3,
and their values in index are


































1- Find all possible values for f1 which are (A,B) , for f2 which are (I,J) and for f3 which are (1,2,3)
2- Find Cartesian Product of (A,B)X(I,J)X(1,2,3). (12 possible groups)
3- Eliminate the ones that surely result in 0 hits. (for ex, B J 2. since they have no doc. in common)

Tips & Tricks

  • SFS instances are tread-safe. So, the same instance can be shared among many threads like IndexReader
  • SFS should only be created when an IndexReader is opened/reopened. Creation with every search can be performance killer
  • Iterating over the thousands of results(e.g, 1000 facets x 25 Hits per facet) can be too costly. The old trick can be used here too, "just show the hit counts, and make a new search when user clicks a category".


Thanks to DIGY and Ben West for working on this and putting this together for the community.


if any of this information is out of date, please contact the developers list and notify them or if you have wiki access, please modify.

  • No labels