When dealing with sensitive data in a compliance-driven environment, certain data elements must be prohibited from appearing together in the same query. The intent is to prevent accidental leakage of sensitive data that could occur when those data elements are combined.


High-level requirements include:

  1. The current focus is on Hive, but this may be extended to HBase in the future.
  2. A Hive query should be blocked if two columns that are explicitly excluded in a Ranger policy are both part of the query plan.
  3. Administrators explicitly specify the columns to be excluded. The rule can be global or can be specified for certain users or groups.
  4. The rule applies to any kind of Hive query action (Select, Load, etc.).
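
The check described in requirements 2 and 3 can be sketched as follows. This is only an illustration of the intended logic, not Ranger's actual policy model or plugin API: the names `ExclusionRule` and `is_blocked` are hypothetical, columns are assumed to be identified by fully qualified strings such as `db.table.column`, and a rule with no users or groups is assumed to be global.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ExclusionRule:
    """Hypothetical rule: columns that must not appear together in one query."""
    columns: FrozenSet[str]
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()

    def applies_to(self, user: str, user_groups: Iterable[str]) -> bool:
        # Assumption: a rule with no users and no groups is global.
        if not self.users and not self.groups:
            return True
        return user in self.users or bool(self.groups & set(user_groups))


def is_blocked(plan_columns: Iterable[str],
               user: str,
               user_groups: Iterable[str],
               rules: Iterable[ExclusionRule]) -> bool:
    """Return True if the query plan touches two or more columns of an applicable rule."""
    plan = set(plan_columns)
    for rule in rules:
        if rule.applies_to(user, user_groups) and len(rule.columns & plan) >= 2:
            return True
    return False


# Example rules (made-up column names): one global, one limited to a group.
rules = [
    ExclusionRule(frozenset({"hr.emp.ssn", "hr.emp.name"})),
    ExclusionRule(frozenset({"fin.acct.number", "fin.acct.owner"}),
                  groups=frozenset({"analysts"})),
]
```

With these rules, a plan that references both `hr.emp.ssn` and `hr.emp.name` would be blocked for every user, while a plan that references only one of them would pass. Because the check operates on the set of columns in the query plan rather than on the statement text, the same logic would cover any Hive action (requirement 4), provided the plan's columns are available to the authorizer.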