In cloud based deployments, it is common that the user identity is federated and managed externally by an identity provider (e.g Okta, PingIndentity, Azure AD). Integrating with such external identity providers (IDP) would help adoption and unlock use-cases where Hive is deployed in a cloud based environment and doesn't need user managed authentication mechanisms (e.g Ldap, Kerberos). There are primarily two authentication protocols which are standardized with such external identity providers namely (SAML 2.0 and OpenID connect). This design specifically focuses on SAML 2.0 but it can be extended to support OpenID connect as well in the future.
In this approach, we rely on using SAML 2.0 protocol to exchange the authentication messages between the client and IDP and generate a SAML response object which is sent to the HiveServer2 from the IDP via browser as per the SAML specification. HiveServer2 can reuse the same port for its http connection and have a separate http path on it to receive the SAML Response. SAML 2.0 protocol is mainly browser dependent since it depends a lot on the redirect URLs which browser supports to control the SSO workflow. For the desktop based use-cases (JDBC/ODBC Driver running locally on a machine which has browser access) JDBC driver code will open a browser to execute this workflow. At the end of SSO workflow, the IDP will send back a signed SAML assertion to the HiveServer2’s assertion consumer service url (ACS). When the assertion is validated, HiveServer2 will generate a short lived token which is returned to the driver by redirecting the browser using a x-www-form-urlencoded response to a localhost port. This approach is similar to what is described in RFC8252 section 7.3. Driver will add the token to its authorization bearer header to make subsequent calls. HiveServer2 will validate the bearer token and the client identifier to allow the client to open a session.
In order to make sure that the SAML assertions received by HiveServer2 are valid, HiveServer2 needs to establish a trust relationship with the IDP which is going to sign the SAML assertions. This can be done by manual configuration in the HiveServer2. In order to configure the HiveServer2 with the IDP metadata, following steps will need to be done before the HiveServer2 is started.
Following new configurations will be added to the hive-site.xml which would need to be configured by the clients.
<property>
<name>hive.server2.authentication</name>
<value>SAML</value>
</property>
This configuration will be set to SAML to indicate that the server will use SAML 2.0 protocol to authenticate the user.
<property>
<name>hive.server2.saml2.idp.metadata</name>
<value>path_to_idp_metadata.xml</value>
</property>
This configuration will provide a path to the IDP metadata xml file.
<property>
<name>hive.server2.saml2.sp.entity.id</name>
<value>test_sp_entity_id</value>
</property>
This configuration should be same the service provider entity id as configured in the IDP. Some identity providers require this to be same as the ACS URL.
<property>
<name>hive.server2.saml2.group.attribute.name</name>
<value>group_attribute_name</value>
</property>
This configuration will be used to map the SAML attribute in the response to the groups of the user. This should be configured in the identity provider as the attribute name for the group information.
<property>
<name>hive.server2.saml2.group.filter</name>
<value>comma_separated_group_names</value>
</property>
This configuration will be used to configure the allowed group names.
<property>
<name>hive.server2.saml2.sp.callback.url</name>
<value>callback_url_of_hiveserver2</value>
</property>
The http URL endpoint where the SAML assertion is posted back by the IDP. Currently this must be on the same port as HiveServer2’s http endpoint and must be TLS enabled (https) on secure setups.
On the driver side the following new jdbc connection parameters will be defined/allowed.
For example, in case of browser the URL will look like
jdbc:hive2://HiveServer2-host:10001/default;transportMode=http;httpPath=cliservice;auth=browser
A token based URL will look like:
jdbc:hive2://HiveServer2-host:10001/default;transportMode=http;httpPath=cliservice;auth=token;token=<token_string>
The Jdbc connection parameters will be passed in over the connection URL.
Driver makes sure that the SSO integration is only allowed when TLS enabled between driver and the server. If the SSL is not enabled, the driver should error out (or optionally warn the users based on a configuration override). Currently, both these modes will only be supported in http mode.
When the auth is browser the Driver will open a available port on the machine where the driver is running. Driver will make a http post request with this port number in the request header (it could be the same as the OpenSessionReq). This request will get a HTTP 302 response with the location set to the SSO URI. Additionally, the server will set a client identifier in the response header. This client identifier header must be sent by the driver in all the subsequent requests to the server.
Once the SSO URI is available, Driver must take the following actions:
If the status is success, the token must be present and the driver will add the token as a bearer authorization HTTP header for the subsequent connection to open session with the key “Authorization: Bearer :”
If the status is error, the message field will include an appropriate error message which can be logged by the driver.
The driver will have a configurable timeout value which will be used to error out the connection attempt in case no response is received on the localhost:port until the given timeout. This configurable timeout value could be provided as a connection parameter. Additionally, the connection URL can provide a specific port to bind to on the localhost on the connection URL. For example, following URL will open a port 12345 and have a timeout of 120 seconds to complete the SSO flow.
jdbc:hive2://HiveServer2-host:10001/default;transportMode=http;httpPath=cliservice;auth=browser;samlResponsePort=12345;samlResponseTimeout=120
In the initial version the token returned by the server will be used for a one-time validation within the default period of 30 seconds (token will be valid for only 30 seconds) which could be configurable. The token will be used by the server to set a cookie which will be used for further requests. However, this is a server side implementation detail which client does not need to be aware of. When the session expires the server will return a HTTP 401 status code which will be used by the clients to re authenticate using the browser flow again. Unfortunately, RFC 7231 does have an explicit status code for session expiration. However, the clients can detect if the response code of 401 is due to session expiry by keeping track of whether we are sending a token in the header in the previous request or not. In such case the JDBC Application can choose to close the connection or re-authenticate as per the application logic.