UPDATE STATISTICS is typically run on large tables with the SAMPLE clause. This ensures that a random sample of the table's data is used to compute statistics of adequate quality. By default the sampling ratio is 1% of all rows or 1 million rows, whichever is smaller. Sometime we have a sample table that was created for some other reason and wish to instruct UPDATE STATISTICS to use the existing table rather than create a new one. Also occasionally UPDATE STATISTICS may error out while creating or populating the sample table. The steps here can be used to create, populate and then use the resulting sample table for UPDATE STATISTICS.

 Our objective here is to run this command and obtain statistics for table tab1

UPDATE STATISTICS FOR TABLE tab1 ON EVERY COLUMN SAMPLE;

Step-by-step guide

  1. Create the sample table. Use CREATE TABLE tab1_sample LIKE tab1 WITH PARTITIONS;
  2. CQD HBASE_NUM_CACHE_ROWS_MAX '1' ; – or some other low number like 10 may work too.
  3. CQD PARALLEL_NUM_ESPS <num_of_partitions> – this is the number of salt values for salted tables
  4. UPSERT USING LOAD INTO tab1_sample SELECT * FROM tab1 SAMPLE RANDOM <x> PERCENT ROWS ; – <x> is computed to result in 1 million rows. We have to know approximate cardinality of tab1. This can be obtained by running an explain query like "EXPLAIN OPTION 'f' SELECT * FROM tab1"
  5. cqd USTAT_SAMPLE_TABLE_NAME 'trafodion.<sch-name>.tab1_sample' ;
  6. UPDATE STATISTICS FOR TABLE tab1 ON EVERY COLUMN ; 

 

There is no content with the specified labels