Versions Compared

Comment: minor edits

...

The Thrift-based Hive service is the core of HS2 and is responsible for servicing Hive queries (e.g., from Beeline). Thrift is an RPC framework for building cross-platform services. Its stack consists of 4 layers: Server, Transport, Protocol, and Processor. You can find more details about the layers at https://thrift.apache.org/docs/concepts.

The usage of those layers in the HS2 implementation is described below.

...

The TThreadPoolServer allocates one worker thread per TCP connection. Each thread is always associated with a connection, even if the connection is idle. So there is a potential performance issue resulting from a large number of threads due to a large number of concurrent connections. In the future HS2 might switch to another server type for TCP mode, for example TThreadedSelectorServer. Here is an article about a performance comparison between different Thrift Java servers.
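The thread-per-connection behavior described above can be illustrated with a small sketch. It does not use Thrift or HS2 code: it is a plain-Java stand-in built on java.net.ServerSocket, and the class and method names (ThreadPerConnectionSketch, busyWorkersFor) are hypothetical. It opens a number of client connections that never send data and reports how many worker threads are tied up serving them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: a generic thread-per-connection server, not the Thrift TThreadPoolServer.
public class ThreadPerConnectionSketch {

    /** Opens idleClients connections that send nothing and returns the number of blocked workers. */
    public static int busyWorkersFor(int idleClients) throws Exception {
        AtomicInteger busy = new AtomicInteger();
        CountDownLatch allServed = new CountDownLatch(idleClients);
        ExecutorService workers = Executors.newCachedThreadPool();
        List<Socket> clients = new ArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();

        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket conn = server.accept();
                        // One worker is dedicated to each accepted connection.
                        workers.submit(() -> {
                            busy.incrementAndGet();
                            allServed.countDown();
                            try (Socket c = conn; InputStream in = c.getInputStream()) {
                                // Blocks here for as long as the client stays connected but idle.
                                while (in.read() != -1) { }
                            } catch (IOException ignored) {
                                // Connection dropped; the worker is released.
                            } finally {
                                busy.decrementAndGet();
                            }
                        });
                    }
                } catch (IOException serverClosed) {
                    // The server socket was closed; stop accepting.
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            for (int i = 0; i < idleClients; i++) {
                clients.add(new Socket(loopback, server.getLocalPort()));
            }
            allServed.await(5, TimeUnit.SECONDS);
            return busy.get();
        } finally {
            for (Socket s : clients) {
                s.close();
            }
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Blocked workers for 8 idle clients: " + busyWorkersFor(8));
    }
}
```

Every idle client keeps one worker blocked in read(), which is the cost that a selector-based server type is meant to avoid.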

...

HTTP mode is required when a proxy is needed between the client and server (for example, for load balancing or security reasons). For that reason HS2 supports HTTP mode in addition to TCP mode. You can specify the transport mode of the Thrift service through the Hive configuration property hive.server2.transport.mode.
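On the client side, the transport mode chosen on the server has to be matched in the JDBC connection URL. The sketch below builds such URLs. The helper class Hs2JdbcUrl is hypothetical, and the HTTP path cliservice is assumed to be the server's configured endpoint (it is the usual default, but check hive.server2.thrift.http.path on your deployment).

```java
// Hypothetical helper: builds a HiveServer2 JDBC URL for the given transport mode.
public class Hs2JdbcUrl {

    public static String forMode(String host, int port, String database, String transportMode) {
        String base = "jdbc:hive2://" + host + ":" + port + "/" + database;
        if ("binary".equalsIgnoreCase(transportMode)) {
            // TCP (binary) mode needs no extra session parameters.
            return base;
        }
        if ("http".equalsIgnoreCase(transportMode)) {
            // HTTP mode: the path is an assumption and must match the server's HTTP endpoint.
            return base + ";transportMode=http;httpPath=cliservice";
        }
        throw new IllegalArgumentException("Unknown transport mode: " + transportMode);
    }

    public static void main(String[] args) {
        System.out.println(forMode("localhost", 10000, "default", "binary"));
        System.out.println(forMode("localhost", 10001, "default", "http"));
    }
}
```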

...

The Protocol implementation is responsible for serialization and deserialization. HS2 is currently using TBinaryProtocol as its Thrift protocol for serialization. In the future other protocols may be considered, such as TCompactProtocol, based on more performance evaluation.
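To give a feel for why a compact protocol can be attractive, the sketch below compares two ways of encoding a 32-bit integer: a fixed 4-byte big-endian layout, as used for i32 values by a binary protocol, and a zigzag variable-length layout, as used by a compact protocol. This is a simplified illustration and not the Thrift implementation; the class IntEncodingSketch and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Illustration only: integer encodings in the style of a binary vs. a compact protocol.
public class IntEncodingSketch {

    /** Fixed-width encoding: always 4 bytes, big-endian. */
    public static byte[] fixedWidth(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Zigzag + varint encoding: small magnitudes take fewer bytes. */
    public static byte[] zigZagVarint(int value) {
        // Zigzag maps signed values to unsigned ones: 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
        int n = (value << 1) ^ (value >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, setting the high bit while more bytes follow.
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, 300, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + ": fixed=" + fixedWidth(v).length
                    + " bytes, varint=" + zigZagVarint(v).length + " bytes");
        }
    }
}
```

Small values shrink to one or two bytes, while values near the limits of the int range grow to five, which is one reason a protocol change would need to be backed by performance evaluation on real workloads.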

...

Source Code Description

The following sections help you locate some basic components of HiveServer2 in the source code.

Server Side

  • Thrift IDL file for TCLIService: https://github.com/apache/hive/blob/master/service-rpc/if/TCLIService.thrift.
  • TCLIService.Iface implemented by org.apache.hive.service.cli.thrift.ThriftCLIService class.
  • ThriftCLIService subclassed by org.apache.hive.service.cli.thrift.ThriftBinaryCLIService and org.apache.hive.service.cli.thrift.ThriftHttpCLIService for TCP mode and HTTP mode respectively.
  • org.apache.hive.service.cli.thrift.EmbeddedThriftBinaryCLIService class: Embedded mode for HS2. Don't get confused with embedded metastore, which is a different service (although the embedded mode concept is similar).
  • org.apache.hive.service.cli.session.HiveSessionImpl class: Instances of this class are created on the server side and managed by an org.apache.hive.service.cli.session.SessionManager instance.

  • org.apache.hive.service.cli.operation.Operation class: Defines an operation (e.g., a query). Instances of this class are created on the server and managed by an org.apache.hive.service.cli.operation.OperationManager instance.
  • org.apache.hive.service.auth.HiveAuthFactory class: A helper used by both HTTP and TCP mode for authentication. Refer to Setting Up HiveServer2 for various authentication options, in particular Authentication/Security Configuration and Cookie Based Authentication.

Client Side

  • org.apache.hive.jdbc.HiveConnection class: Implements the java.sql.Connection interface (part of JDBC). An instance of this class holds a reference to a SessionHandle instance which is retrieved when making Thrift API calls to the server.
  • org.apache.hive.jdbc.HiveStatement class: Implements the java.sql.Statement interface (part of JDBC). The client (e.g., Beeline) calls the HiveStatement.execute() method for the query. Inside the execute() method, the Thrift client is used to make API calls.
  • org.apache.hive.jdbc.HiveDriver class: Implements the java.sql.Driver interface (part of JDBC). The core method is connect() which is used by the JDBC client to initiate a SQL connection.
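From an application's point of view, the three classes above are reached through the standard java.sql interfaces. The following is a minimal sketch of that flow, assuming the Hive JDBC driver is on the classpath and an HS2 instance is reachable at the URL passed in. The class Hs2JdbcClientSketch and the method firstColumn are hypothetical names, and the empty password is a placeholder that depends on the authentication setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a plain JDBC client; with the Hive driver present, the
// interfaces below are backed by HiveDriver, HiveConnection, and HiveStatement.
public class Hs2JdbcClientSketch {

    /** Runs a statement and returns the first column of each result row as a string. */
    public static List<String> firstColumn(String jdbcUrl, String user, String sql)
            throws SQLException {
        List<String> values = new ArrayList<>();
        // DriverManager asks registered drivers to connect(); the connection opens a session.
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, "");
             Statement stmt = conn.createStatement()) {
            // execute() sends the query to the server through the Thrift client.
            if (stmt.execute(sql)) {
                try (ResultSet rs = stmt.getResultSet()) {
                    while (rs.next()) {
                        values.add(rs.getString(1));
                    }
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws SQLException {
        // Host, port, database, and user are placeholders for a local test setup.
        for (String name : firstColumn("jdbc:hive2://localhost:10000/default", "hive", "SHOW TABLES")) {
            System.out.println(name);
        }
    }
}
```

If no suitable driver is registered, or the server cannot be reached, the call fails with a java.sql.SQLException.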

Interaction between Client and Server

  • org.apache.hive.service.cli.SessionHandle class: Session identifier. Instances of this class are returned from the server and used by the client as input for Thrift API calls.
  • org.apache.hive.service.cli.OperationHandle class: Operation identifier. Instances of this class are returned from the server and used by the client to poll the execution status of an operation. 

...