...

  1. Many APIs take a request structure rather than individual parameters. For these, ValidWriteIdList needs to be added to the request structure instead.
  2. Some APIs already take a ValidWriteIdList to invalidate outdated transactional statistics. Their signatures do not need to change; the existing ValidWriteIdList will be reused to validate cached entries in CachedStore.

The Thrift API will remain backward compatible; that is, a new server can handle an old client. If an old client issues a create_table call, the server receives the create_table request with validWriteIdList=null and will not cache the entry at all if it is a managed table.
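The backward-compatibility rule above can be sketched as follows. This is a minimal illustration, not the actual CachedStore code: the class CacheOnCreateSketch, its nested types, and the method onCreateTable are hypothetical names, the write-id list is treated as an opaque string, and caching non-managed tables regardless of the write-id list is an assumption inferred from the wording above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the server-side caching decision on create_table.
final class CacheOnCreateSketch {

    // Minimal stand-in for the metastore Table object (assumption).
    static final class TableInfo {
        final String name;
        final boolean managed;

        TableInfo(String name, boolean managed) {
            this.name = name;
            this.managed = managed;
        }
    }

    // A cached entry keeps the write-id list it was created with.
    static final class Entry {
        final TableInfo table;
        final String validWriteIdList; // opaque here; may be null for non-managed tables

        Entry(TableInfo table, String validWriteIdList) {
            this.table = table;
            this.validWriteIdList = validWriteIdList;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Returns true if the table was placed in the cache.
    boolean onCreateTable(TableInfo tbl, String validWriteIdList) {
        // An old client sends no write-id list. A managed table cached
        // without one could not be validated later, so it is not cached.
        if (tbl.managed && validWriteIdList == null) {
            return false;
        }
        cache.put(tbl.name, new Entry(tbl, validWriteIdList));
        return true;
    }

    boolean isCached(String tableName) {
        return cache.containsKey(tableName);
    }
}
```

With this sketch, a create_table call from an old client for a managed table leaves the cache untouched, while the same call from a new client that supplies a write-id list produces a cached entry.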

hive_metastore.thrift

Old API                                    | New API
create_table(Table tbl)                    | create_table(Table tbl,string validWriteIdList)
get_table(string dbname,string tbl_name)   | get_table(string dbname,string tbl_name,string validWriteIdList)

...

Old API                                                   | New API
createTable(Table tbl)                                    | createTable(Table tbl,String validWriteIdList)
getTable(String catName,String dbName,String tableName)   | getTable(String catName,String dbName,String tableName,String validWriteIdList)

Hive.java

The implementation details will be encapsulated in Hive.java, which will:

  1. Generate a new write id for every write operation involving managed tables. Because DbTxnManager caches the write id for each transaction, every query will generate at most one new write id per table, even if it consists of multiple Hive.java write API calls.
  2. Retrieve the table write id from the config, if it exists, for every read operation (for a managed table it is guaranteed to be in the config), and pass the write id to the HMS API.
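The "at most one new write id per table per query" behavior in step 1 can be illustrated with a small per-transaction cache. This is a sketch under stated assumptions, not DbTxnManager itself: the class TxnWriteIdCacheSketch and its methods are hypothetical, and an AtomicLong stands in for the metastore's write-id allocation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one instance represents one open transaction.
final class TxnWriteIdCacheSketch {

    // Stand-in for the metastore-side write-id allocator (assumption).
    private final AtomicLong allocator;

    // Write ids already allocated in this transaction, keyed by "db.table".
    private final Map<String, Long> writeIdsForTxn = new HashMap<>();

    // Counts how many times the allocator was actually consulted.
    private int allocations = 0;

    TxnWriteIdCacheSketch(AtomicLong allocator) {
        this.allocator = allocator;
    }

    // Returns the write id for the table, allocating one only on first use.
    long getTableWriteId(String dbName, String tableName) {
        String key = dbName + "." + tableName;
        return writeIdsForTxn.computeIfAbsent(key, k -> {
            allocations++;
            return allocator.incrementAndGet();
        });
    }

    int allocationCount() {
        return allocations;
    }
}
```

Repeated write calls against the same table within one transaction hit the map and reuse the id, so only the first call per table reaches the allocator.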

Other

All other components that invoke the HMS API directly (bypassing Hive.java) will be changed to invoke the newer HMS API. This includes HCatalog, Hive streaming, etc.

Use cases

Write

Hive needs to pass a ValidWriteIdList for every metastore write operation (table/partition). CachedStore will store the ValidWriteIdList along with the entry in the cache. Every Hive query (either DDL or DML) will retrieve a ValidWriteIdList at the beginning of the query. Let’s look at some examples.

...