...
Top K statistics are gathered along with partition level statistics. The interface IStatsAggregator needs to add a method aggregateStatsTopK() that reads multiple entries from the temporary storage:
Code Block |
---|
... public interface IStatsAggregator { ... /** * This method aggregates top K statistics. * * */ public List<String> aggregateStatsTopK(String keyPrefix, String statType); ... } |
Usage
Top K statistics are not enabled by default. The user can set the boolean variable hive.stats.topk.collect to be true to enable computing top K and putting top K into skewed information.
...