...
- Indexed Transactional HBase (ITH) uses secondary tables but manages them transparently for the user.
- Indexed HBase (IH) is new in 0.20.3. https://issues.apache.org/jira/browse/HBASE-2037 compares both approaches. Neither approach supports multi-valued attributes, so custom secondary tables are used for partition indices.
...
- For equality matches the scan starts at "=value\00" and stops at "=value\01" (exclusive). Because every row key separates the value from the rest of the key with a null byte, this range contains exactly the rows for that value.
- For greater-than matches the scan starts at "=value" without an upper bound.
- For less-than matches the scan has no lower bound and stops at "=value" with its last byte incremented by one.
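The three scan ranges above can be sketched as plain (start, stop) pairs. This is a minimal illustration, not the partition's actual code: it assumes the row key layout `"=" + value + 0x00 + UUID string` (inferred from the substring filter expression in this section) and that normalized values never contain a null byte. `index_key`, `increment_last_byte` and `scan` are hypothetical helpers; `scan` only simulates an HBase scan (inclusive start row, exclusive stop row) over an in-memory sorted list.

```python
import uuid
from typing import List, Optional, Tuple

# Assumed key layout: b"=" + normalized value + b"\x00" + 36-character UUID.
SEP = b"\x00"

# (inclusive start row, exclusive stop row); None means unbounded.
Range = Tuple[Optional[bytes], Optional[bytes]]


def index_key(value: bytes, entry_id: uuid.UUID) -> bytes:
    """Hypothetical helper that builds an index row key."""
    return b"=" + value + SEP + str(entry_id).encode("ascii")


def increment_last_byte(key: bytes) -> Optional[bytes]:
    """Return the first key that sorts after every key starting with `key`.

    Trailing 0xFF bytes cannot be incremented, so they are dropped and the
    carry moves to the previous byte. Returns None if no such key exists.
    """
    trimmed = key.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def equality_range(value: bytes) -> Range:
    # b"=value\x00" .. b"=value\x01": only keys whose value part is exactly `value`.
    return b"=" + value + b"\x00", b"=" + value + b"\x01"


def greater_than_range(value: bytes) -> Range:
    # Start at b"=value" with no stop row; keys for `value` itself are included.
    return b"=" + value, None


def less_than_range(value: bytes) -> Range:
    # No start row; stop at b"=value" with its last byte incremented.
    # With this layout the range also covers longer values starting with `value`.
    return None, increment_last_byte(b"=" + value)


def scan(sorted_keys: List[bytes], rng: Range) -> List[bytes]:
    """Simulate an HBase scan over an in-memory, lexicographically sorted key list."""
    start, stop = rng
    return [k for k in sorted_keys
            if (start is None or k >= start) and (stop is None or k < stop)]
```

For example, with two "banana" entries plus one "apple" and one "cherry" entry, `scan(keys, equality_range(b"banana"))` returns just the two "banana" keys, while `less_than_range(b"banana")` stops at `b"=bananb"`.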
For substring matches a server-side filter is used: "^=<value pattern>\00[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}$". The suffix matches the null separator followed by a UUID.
If the filter contains an initial pattern, the scan can also be bounded: it starts at "=value" and stops at "=value" with its last byte incremented by one, where value is the initial pattern.
It is not possible to obtain a candidate count from this type of index table in constant time. Instead, the (possibly bounded) range must be scanned and the matching rows counted.
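A minimal sketch of the substring case, under the same assumed key layout (`"=" + value + 0x00 + UUID`). It assumes LDAP-style substring parts (an initial pattern, any number of inner parts, a final pattern); `substring_filter`, `prefix_bounds` and `count_candidates` are hypothetical helpers, and Python's `re` stands in for the server-side regex filter. Counting has to visit every row in the range, which is why no constant-time candidate count is available.

```python
import re
import uuid
from typing import List, Optional, Tuple

# Null separator plus a 36-character UUID at the end of the row key.
UUID_SUFFIX = (rb"\x00[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}"
               rb"-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}$")


def index_key(value: bytes, entry_id: uuid.UUID) -> bytes:
    """Hypothetical helper: b"=" + value + b"\x00" + UUID string."""
    return b"=" + value + b"\x00" + str(entry_id).encode("ascii")


def substring_filter(initial: bytes, any_parts: List[bytes], final: bytes) -> "re.Pattern":
    """Build "^=<value pattern>" + UUID suffix from substring parts (empty parts are skipped)."""
    middle = b".*".join(re.escape(p) for p in any_parts if p)
    value_pattern = (re.escape(initial) + b".*"
                     + (middle + b".*" if middle else b"")
                     + re.escape(final))
    return re.compile(b"^=" + value_pattern + UUID_SUFFIX, re.DOTALL)


def prefix_bounds(initial: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Start/stop rows for an initial pattern; (None, None) means a full scan."""
    if not initial:
        return None, None
    start = b"=" + initial
    trimmed = start.rstrip(b"\xff")  # never empty: the key starts with b"="
    stop = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return start, stop


def count_candidates(sorted_keys: List[bytes], initial: bytes,
                     any_parts: List[bytes], final: bytes) -> int:
    """Count matches by walking the (possibly bounded) range row by row."""
    pattern = substring_filter(initial, any_parts, final)
    start, stop = prefix_bounds(initial)
    count = 0
    for key in sorted_keys:
        if start is not None and key < start:
            continue
        if stop is not None and key >= stop:
            break  # keys are sorted, nothing later can match
        if pattern.match(key):
            count += 1
    return count
```

For the filter `ban*a`, only rows between `b"=ban"` and `b"=bao"` are visited; without an initial pattern every row of the index table has to be tested.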
...
HBase sorts rows lexicographically by row key. To use the indices for greater-than and less-than filters, the byte representation of the normalized values must sort in the same order as the values themselves. http://brunodumon.wordpress.com/2010/02/17/building-indexes-using-hbase-mapping-strings-numbers-and-dates-onto-bytes/ provides a good overview.
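As one example of such a mapping, a common technique for signed 64-bit integers is a fixed-width big-endian encoding with the sign bit flipped. Plain two's complement would sort negative numbers after positive ones, and decimal text would sort "10" before "9". This is a sketch of that general technique, not necessarily the normalization used here; `encode_int64` and `decode_int64` are hypothetical names.

```python
import struct


def encode_int64(n: int) -> bytes:
    """Order-preserving 8-byte encoding of a signed 64-bit integer.

    Adding 2**63 flips the sign bit, mapping [-2**63, 2**63 - 1] onto
    [0, 2**64 - 1]; big-endian byte order then makes unsigned byte-wise
    comparison agree with numeric comparison.
    """
    return struct.pack(">Q", n + (1 << 63))


def decode_int64(b: bytes) -> int:
    """Inverse of encode_int64."""
    return struct.unpack(">Q", b)[0] - (1 << 63)
```

Because every encoded value has the same length, lexicographic byte order equals numeric order, so the range scans for greater-than and less-than filters stay correct. Dates can reuse this encoding, for example as epoch milliseconds.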