Authors: Mridul Muralidharan (mmuralidharan[at]linkedin.com), Chandni Singh (chsingh[at]linkedin.com)

September 23

TL/DR;

This document details the proposal for supporting security and authentication in Celeborn.

Introduction

Apache Celeborn is a scalable remote shuffle solution which supports multiple compute platforms like Apache Spark, Apache Flink, etc running on different resource managers (Yarn, Kubernetes, standalone cluster).

A current limitation in Apache Celeborn, which is explored in this proposal, is the lack of over the wire encryption and support for authentication.

Currently, all over the wire communication is in the clear - between Celeborn’s own infrastructure components (between masters and/or workers) and between clients and Celeborn servers (master/worker).

Additionally, there is no authentication to validate if an incoming request should be allowed - for example, app1 could request and/or modify the data for app2 - either accidentally or maliciously.

This document uses Apache Spark as the motivating example, but the discussion applies to other platforms as well.

Out of scope

  • Authentication between infrastructure components within Celeborn are out of scope.
  • Support for any TTL for shared secret negotiated between client and master.
  • SPI interfaces to extend/customize Celeborn (like adding new SASL mechanisms).

Proposal

The two main aspects covered in this proposal are:

  • Architectural changes to Celeborn for supporting Authentication.
  • Protocol changes to Celeborn - between client, master, and workers.

We strongly recommend leveraging TLS for all communication, to ensure over-the-wire encryption (and allow sharing authentication secrets), prevent eavesdropping, MITM attacks, etc.

Architectural changes

  • Every spark application attempt must go through a successful registration flow with Celeborn master before any other messages are exchanged with the system.
    • An application attempt must register only once with Celeborn master.
  • Registration involves authentication, and negotiation of shared-secret to be used for connection authentication.
    • SASL is used for authentication, using a negotiated mechanism supported by both client and server.
  • After successful registration, Celeborn Master will propagate shared-secret for application-attempt to workers in a secured way, while the Spark driver does it to executors.
  • Every subsequent connection (to master or worker) must go through connection authentication first, when authentication is enabled.
  • All connections should be encrypted with TLS.


Note, the protocol changes would be handled by the client and server implementation, and Celeborn users would typically configure them appropriately - not send the specific protocol messages. In future, they would possibly extend them via SPI to customize behavior (for example, adding a new SASL mechanism, etc).

Protocol changes

To allow for evolution from current Celeborn deployments to the newer model, a static configuration will be added in Celeborn - that will enable this proposed protocol at client and server. Once enabled, this is a backwardly incompatible change.

Before the first request is made from an application attempt to Celeborn master, the following registration steps are required.

  1. The LifecycleManager creates a (TLS) connection with the Celeborn Master.


  1. The LifecycleManager will send a “AuthenticationInitiationRequest” to the Master, which includes:
    • Client protocol version
    • Whether authentication is enabled.
    • If authentication is enabled, SASL mechanisms client supports.
      1. If authentication is enabled, there MUST be at least one mechanism for client authentication and connection authentication.


  1. The Master responds with “AuthenticationInitiationResponse”, which includes:
    • Server protocol version
    • Whether authentication is enabled
    • If authentication is enabled, SASL mechanisms that server supports.
      1. If authentication is enabled, there MUST be at least one mechanism for client authentication and connection authentication.


  1. Either client or server can abort a connection if they are incompatible.
    • The client will abort the connection if master does not:
  1. Support a compatible protocol version, or 
  2. Server does not support authentication and client requires it, or
  3. If authentication is enabled, but there is no common SASL mechanism for either client_auth or connection_auth.
  • The server will abort the connection (after sending “AuthenticationInitiationResponse”) if client does not:
    1. Support a compatible protocol version, or 
    2. Client does not support authentication and server requires it, or
    3. If authentication is enabled, but there is no common SASL mechanism for either client_auth or connection_auth.


  1. The LifecycleManager must pick a client_auth SASL mechanism which is supported by both client and server, and proceeds to authentication. 
    • There can be multiple challenge/response exchanges during SASL negotiation.
    • SASL will end in a success or failure from the server.
    • Note: Server will respond with authentication failure for unsupported mechanisms.


  1. After successful authentication, LifecycleManager sends a “RegisterApplicationRequest” to the Master, with the shared_secret for connection authentication.


  1. The Master sends a “RegisterApplicationResponse” to the LifecycleManager - the registration is now complete.
    • The TLS connection can be optionally closed.
    • Every subsequent connection between the client and the server will require a successful connection authentication.


Once RegisterApplicationResponse has completed, LifecycleManager propagates the selected connection_auth SASL Mechanism and shared secret to the executors while Celeborn Master will propagate it to workers.

Appendix

Example

We illustrate the protocol flow below, initiated by the first mapper’s registerShuffle action and resulting in the Spark application's Lifecycle Manager sending its first RequestSlots request. In this example, the driver has terminated the connection after receiving a successful RegisterApplicationResponse - but typically would not do so.

Flow

Step


Direction

Message

Direction


Application Registration

1.

LifecycleManager


TLS Handshake


Master

2.

LifecycleManager


AuthenticationInitiationRequest 

{

  "version": 1.0,

  "auth_enabled": true,

  "sasl_mechanisms":[

    {

      "ANONYMOUS": ["client_auth"]

    },

    {

      "DIGEST-MD5": ["connection_auth"]

    }

  ]

}


Master

3.

LifecycleManager


AuthenticationInitiationResponse

{

  "version": 1.0,

  "auth_enabled": true,

  "sasl_mechanisms":[

    {

      "ANONYMOUS": ["client_auth"]

    },

    {

      "DIGEST-MD5": ["connection_auth"]

    }

  ]

}


Master

5 a.

LifecycleManager


SASLRequest
{

  "mechanism": "ANONYMOUS",

  "type": "client_auth",

  "payload": <data>

}


Master

5 b.

LifecycleManager


SASLResponse2

{

  "payload": null (indicates success)

}


Master

6

LifecycleManager


RegisterApplicationRequest

{

  "id": "application-123_0",

  "shared_secret": <data>

}


Master

7.

LifecycleManager


RegisterApplicationResponse

{

  "status": "SUCCESS"

}


Master

The below flow assumes that the LifecycleManager will create a new connection with the Master. This will not be the usual case as the connection between the LifecycleManager and Master is long-lived and the client will use the already authenticated connection.

Request Slots

8 a.

LifecycleManager


SASLRequest1

{

  "mechanism": "DIGEST-MD5",

  "type": "connection_auth"

}


Master

8 b.

LifecycleManager


SASLResponse2

{

  "payload": <challenge>

}


Master

8 c.

LifecycleManager


SASLRequest

{

  "payload": <challenge_response>

}


Master

8 d.

LifecycleManager


SASLResponse2

{

  "payload": null (indicates success)

}


Master

9.

LifecycleManager


RequestSlots


Master

10.

LifecycleManager


RequestSlotsResponse


Master


Reserve Slots

11 a.

LifecycleManager


SASLRequest1

{

  "mechanism": "DIGEST-MD5",

  "type": "connection_auth"

}


Worker

11 b.

LifecycleManager


SASLResponse2

{

  "payload": <challenge>

}


Worker

11 c.

LifecycleManager


SASLRequest

{

  "payload": <challenge_response>

}


Worker

11 d.

LifecycleManager


SASLResponse2

{

  "payload": null (indicates success)

}


Worker

12.

LifecycleManager


ReserveSlots


Worker

13.

LifecycleManager


ReserveSlotsResponse


Worker

1 The application client should set the unique applicationId as the Sasl username. On the server, this username will be used to look up the secret of the application.

2 SASLResponse is either a RpcResponse message in case of success; or a RpcFailure message in case of a failure.

Background: Current Shuffle process in Celeborn

  1. Mappers lazily ask LifecycleManager to registerShuffle.
  2. LifecycleManager requests slots from Master.
  3. Workers reserve slots and create corresponding files.
  4. Mappers get worker locations from LifecycleManager.
  5. Mappers push data to specified workers.
  6. Workers merge and replicate data to its peers.
  7. Workers flush the disk periodically.
  8. Mapper tasks accomplish and trigger MapperEnd events.
  9. When all mapper tasks are complete, workers commit files.
  10. Reducers ask for file locations.
  11. Reducers read shuffle data.

Authentication Initialization

Authentication initialization provides a framework for both client and server to declare their supported SASL mechanisms, and allow for both deployments to customize what mechanism to use and Celeborn to add newer or deprecate/remove older authentication mechanisms gracefully.

Client Authentication

In YARN mode, Apache Spark propagates its SecurityManager’s shared secret to the external shuffle service (ESS) in Node Manager, so that incoming connections can be authenticated using SASL DIGEST-MD5. This is done using YARN specific api’s when registering a container with the Node Manager.

Deployments moving from ESS based shuffle to Celeborn, especially in secure environments, will require equivalent functionality - but Celeborn cannot leverage YARN api’s in order to share secrets.

Client authentication is used to authenticate, and negotiate the shared secret for connection authentication.

Client authentication itself, on other hand, would typically be starting with a lack of mutual trust between Celeborn and the applications - in which case SASL ANONYMOUS over a TLS secured connection can be used. Note - if an application identifier has already been registered, the request will fail - so existing registrations cant be hijacked.

Connection Authentication

Connection authentication is done per connection. When a connection is created, based on the SASL connection_auth mechanism the client has selected (during registration), the client and the server exchange SASL messages to complete authentication - before the connection can be used for exchanging other protocol messages.

Propagation of shared secret to Workers

“How” shared secret is propagated

Master will propagate the shared secret, for connection authentication, to the Workers which were selected for the slots.

We have 2 options:

  1. Preferred: Master listens on a TLS socket which workers connect to - and this information is exchanged over a secure connection.
  2. Use a shared key, among all the Celeborn servers - the application secret should be encrypted using this key, when propagating to Workers.

“When” shared secret is propagated

We have 2 options: 

  • Preferred: Master pushes the information to the workers asynchronously and does not wait for it to be received before responding to the driver.
    • When the client connects to the Worker, the worker may not have the shared secret to authenticate the connection.
    • In this case, the worker will have to fetch this from the Master in order to authenticate the request.
    • The delay for an application to reserve slots is minimal, however, it may increase the connection establishment time.
    • Enables graceful restarts for workers without requiring the storage of application secrets in LevelDB.
  • Master responds to the client with the slots information only after it propagates the application’s shared secret information to the selected workers successfully.
    • This can add delay for the application to reserve slots.


Sharing the secret between Celeborn Masters

Celeborn uses the RAFT consensus algorithm for fault tolerance and durability for the master, and relies on Apache Ratis for the implementation. Ratis supports mutual authentication, which will be leveraged to secure communication between Celeborn masters.

Ratis also saves state information in local files. Storing encrypted application secrets in the Ratis store is beyond the current scope. Once we add support for encryption at rest, we can then implement encrypted secret storage within Ratis.

Propagation of shared secret to Mappers and Reducers

The selected SASL mechanism for connection authentication and shared secret needs to be propagated to the mappers/reducers by the LifecycleManager.

We have 2 options:

  1. LifecycleManager can send the secret to mappers and reducers (in addition to other metadata) as a response to "RegisterShuffle" and "FileLocationRequest". 
    • There is no need to introduce new request/response messages.


  1. ShuffleClients have a dedicated RPC to fetch this information from the LifecycleManager. If the LifecycleManager hasn’t registered the application with the Celeborn Master, then it will register the application and reply to the ShuffleClients.

We need to investigate further to determine the better option.

Backward Compatibility

Changes will be backward compatible to ensure seamless rolling-upgrades in the production environment.

  • We will use different ports for secured communications between components.
  • We want to have separate netty servers on the Master - one serving client requests and one serving internal components (workers and other masters). These changes will be flag-guarded by a config.




  • No labels