Ignite-3: .NET Native Compute and future support for other platforms.

ID: IEP-136
Author:
Sponsor:
Created:
Status: COMPLETED


Motivation

As a user, I want to implement Compute jobs in C#/.NET (or another language of my choice).

Use Cases

Case 1: “.NET Shop”

  • All business logic should be in the same language
  • My team does not have Java expertise
  • Compute jobs need to reuse existing .NET code (libs, DTOs, utils, etc)

Case 2: “Cross-Team Interop”

  • Different teams use different languages in a big company
  • I can call Compute jobs written by other teams, no matter which language they use

Description

Requirements

  • Any client/embedded API can call a job in any language (C# -> Python, Java -> C++, etc)
  • Reusable mechanism/protocol to run Compute jobs written in any language
  • One Ignite node can run jobs in many languages at the same time
    • Avoid Ignite 2.x scenario which had “Java node”, “C# node”, “C++ node” where C++ node can’t run C# jobs

Public API

The public API has two parts: job execution on the client (ICompute.SubmitAsync) and job implementation on the server.

Job Execution

Job execution API does not need to change. 

  • A deployment unit can contain binaries in any supported language
  • JobDescriptor#jobClassName can refer to a type name in any language

Ignite needs to know the job platform in order to invoke the correct executor (Java, .NET, Python, etc).

Add JobExecutionOptions.ExecutorType property

  • Explicitly set in the JobDescriptor (enum)
  • Automatically set in some cases (MyJob.class is passed in Java, typeof(MyJob) in .NET)

Pros:

  • Easy to implement
  • Explicit and clear to the user
  • Can mix multiple platforms in one deployment unit

Notes

  • Later, we could have ExecutorType.JavaSidecar, JavaRemote, etc
  • The same JobDescriptor can be used to run jobs on different executors => the property should be in JobExecutionOptions

JobExecutionOptions options = JobExecutionOptions.builder().executorType(JobExecutorType.DotNet).build();

JobDescriptor<Object, Object> jobDesc = JobDescriptor
       .builder("MyNamespace.MyJob, MyAssembly")
       .options(options)
       .build();
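
On the server side, the executor type from the options decides which executor runs the job. A minimal sketch of that dispatch follows. All names in it (`ExecutorTypeSketch`, `PlatformExecutor`, `ExecutorDispatchSketch`) are hypothetical and are not part of the Ignite API; it only illustrates the idea that the job class name is passed through as an opaque string and the executor type alone selects the platform.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the proposed executor type enum.
enum ExecutorTypeSketch { JAVA_EMBEDDED, DOTNET }

// Hypothetical executor abstraction: runs a job identified by its type name.
interface PlatformExecutor {
    String execute(String jobClassName, String arg);
}

class ExecutorDispatchSketch {
    private final Map<ExecutorTypeSketch, PlatformExecutor> executors =
            new EnumMap<>(ExecutorTypeSketch.class);

    void register(ExecutorTypeSketch type, PlatformExecutor executor) {
        executors.put(type, executor);
    }

    // Picks the executor by type; the job class name is not interpreted here.
    String run(ExecutorTypeSketch type, String jobClassName, String arg) {
        PlatformExecutor executor = executors.get(type);
        if (executor == null) {
            throw new IllegalStateException("No executor registered for " + type);
        }
        return executor.execute(jobClassName, arg);
    }
}
```

Because the lookup is keyed by executor type only, one deployment unit can hold binaries for several platforms without the dispatcher needing to inspect them.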

.NET New APIs: IComputeJob, IJobExecutionContext

public interface IComputeJob<TArg, TResult>
{
   IMarshaller<TArg>? InputMarshaller => null;
   IMarshaller<TResult>? ResultMarshaller => null;

   ValueTask<TResult> ExecuteAsync(IJobExecutionContext context, TArg arg, CancellationToken cancellationToken);
}

public interface IJobExecutionContext
{
   IIgnite Ignite { get; }
}

Example job implementation:

public class ToStringJob : IComputeJob<object?, string?>
{
   public async ValueTask<string?> ExecuteAsync(IJobExecutionContext context, object? arg, CancellationToken cancellationToken)
   {
       await context.Ignite.Tables.GetTablesAsync();

       return arg?.ToString();
   }
}

Interop: Client Protocol (TCP Socket)

  • Java server starts a .NET process which runs the client in a special mode that allows Compute job execution
    • The process is started only once, on demand (first .NET job), then reused
    • The server passes a secret as a process start argument to identify the “special” client
    • The client passes the secret back as a handshake extension
    • Auth is bypassed for this client
    • Later, we can allow “external” executors - in a separate container, remote machine, k8s replicaSet, etc
  • Client protocol already supports server -> client calls
  • Reusable approach for other languages
  • Separate process provides isolation
    • If the process crashes or hangs, it is restarted automatically
  • Multiple processes can be started if one is not enough to handle the load
  • SSL: supported
    • Requires additional configuration if client certs are required
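
The secret exchange described above could look roughly like the following on the server side. This is a sketch under assumptions: the class and method names are hypothetical, and the secret length and encoding are illustrative choices, not something this proposal specifies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

class ExecutorSecretSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a random secret that the server would pass as a process start argument.
    static String newSecret() {
        byte[] bytes = new byte[32]; // assumed length
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Checks the value received in the handshake extension against the expected secret.
    static boolean isComputeExecutor(String expectedSecret, String handshakeValue) {
        if (handshakeValue == null) {
            return false; // Regular client: no extension, normal authentication applies.
        }
        // MessageDigest.isEqual avoids an early exit on the first mismatching byte.
        return MessageDigest.isEqual(
                expectedSecret.getBytes(StandardCharsets.UTF_8),
                handshakeValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

Only a connection that presents the matching secret would be treated as the executor and skip authentication; every other client goes through the normal handshake.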

Execution Flow

  • Java server identifies the job platform (language) from the options
  • Starts the special client on demand
    • .NET runtime not required, zero overhead until you actually try to run a .NET job
    • No configuration required
  • Server sends deployment units location, job class name, argument to the client
  • Client loads the binaries, executes the job and returns the result
  • Client process keeps running for reuse
  • Client process goes down with the parent in case of node shutdown
    • Shutdown by idle timeout can be added too
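
The "start on demand, then reuse" part of this flow can be sketched as below. `ExecutorProcess` and `LazyExecutorSketch` are hypothetical names; a real implementation would presumably wrap `java.lang.Process` and also handle hangs, which this sketch does not.

```java
import java.util.function.Supplier;

// Hypothetical handle to the executor process; a real one could wrap java.lang.Process.
interface ExecutorProcess {
    boolean isAlive();
}

class LazyExecutorSketch {
    private final Supplier<ExecutorProcess> starter;
    private ExecutorProcess current;
    private int startCount;

    LazyExecutorSketch(Supplier<ExecutorProcess> starter) {
        this.starter = starter;
    }

    // Returns a live process, starting one on first use or after a crash.
    synchronized ExecutorProcess getOrStart() {
        if (current == null || !current.isAlive()) {
            current = starter.get();
            startCount++;
        }
        return current;
    }

    // Number of times a process has been started so far.
    synchronized int startCount() {
        return startCount;
    }
}
```

Nothing is started until the first call to `getOrStart()`, which matches the "zero overhead until you run a .NET job" goal.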

Later: More Efficient Transports

Reusing the client protocol is the quickest way to add platform compute to Ignite, and the most universal (it is easy to implement in any language). However, it is not the most performant (~40 us overhead per API call; see the Performance section below).

We expect to add an alternative, more performant mechanism in the future (such as embedded CLR with JNI interop, similar to Ignite 2.x).

Deployment Unit Handling

  • Reuse existing execution flow in Java
    • Schedule, acquire deployment units, etc. Only at the last step do we delegate to the platform executor.
  • Deployment unit lifetimes in .NET and Java are linked
    • When undeploying a unit, notify all platform executors to clean up accordingly

Error Handling

Exceptions in .NET jobs should be captured together with the full stack trace:

  • Passed back to the Java server and logged
  • Passed back to the caller (stack trace is handled with respect to SendServerExceptionStackTraceToClient config)
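
A minimal sketch of the second bullet, assuming the .NET executor reports the error as a message plus a stack trace string. The class and method names are hypothetical, and the boolean parameter merely stands in for the SendServerExceptionStackTraceToClient setting.

```java
class PlatformJobErrorSketch {
    // Builds the exception returned to the caller from the details reported by the .NET executor.
    // The stack trace is appended only when the configuration allows sending it to clients.
    static RuntimeException toCallerException(
            String message, String dotNetStackTrace, boolean sendStackTraceToClient) {
        String text = sendStackTraceToClient && dotNetStackTrace != null
                ? message + System.lineSeparator() + dotNetStackTrace
                : message;
        return new RuntimeException(".NET job failed: " + text);
    }
}
```

The server-side log would still receive the full trace regardless of the flag; only the copy sent to the caller is trimmed.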


Refer to Ignite 2.x which successfully intermixed .NET and Java stack traces in a way that is displayed nicely in most IDEs.

Packaging

  • Include the “special client” binary (.dll) with the database package (zip download)
  • The .dll is a “console app” in .NET terms - another project and assembly
  • The .dll is OS-independent and requires .NET runtime to run

.NET Runtime Dependency

  • Zip download - no runtime (no Java, no .NET)
  • RPM, DEB - no runtime (no Java, no .NET)
  • Docker - Slim, Medium, Fat images
    • Only Fat image includes all possible runtimes for Compute jobs

Implementing a .NET Compute Job

As a user, you follow these steps:

  • Create a new “class library” type project (dotnet new classlib)
    • Implement IComputeJob interface
    • Build the project
  • cli unit deploy myproj/bin/Release


.NET Job Execution Implementation Details

Assembly Loading and Versioning

Use AssemblyLoadContext to implement loading/unloading and support multiple versions of the same assembly. 

Every deployment unit should have a separate load context.
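
The "one load context per deployment unit" rule can be illustrated with a Java analogy, using `URLClassLoader` in place of AssemblyLoadContext. This is an analogy only: the .NET executor would not use class loaders, and the registry class below is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class UnitLoaderRegistrySketch {
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    // One isolated loader per unit name and version, created on first use.
    URLClassLoader loaderFor(String unitName, String version, URL[] binaries) {
        return loaders.computeIfAbsent(unitName + ":" + version, key -> new URLClassLoader(binaries));
    }

    // Called on undeploy: removes and closes the loader so the unit's code can be unloaded.
    boolean undeploy(String unitName, String version) {
        URLClassLoader loader = loaders.remove(unitName + ":" + version);
        if (loader == null) {
            return false;
        }
        try {
            loader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Keying by name and version is what lets two versions of the same unit stay loaded side by side.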

Expose Ignite API to Compute Jobs

Use the same “special” client instance that handles job execution and expose it in JobExecutionContext.

Client Protocol Changes

  • New handshake extension COMPUTE_EXECUTOR_ID
  • New feature flag PLATFORM_COMPUTE_JOB (send non-Java jobs to server; all clients are supposed to support this)
  • New feature flag PLATFORM_COMPUTE_EXECUTOR (execute jobs requested by server; only .NET client will support this for now)
  • New response header flag SERVER_OP_FLAG (indicates a server->client request)
    • Builds on existing notification mechanism
    • Generic, allows different server ops (e.g. start job, cancel job, etc)
  • New operation ClientOp.SERVER_OP_RESPONSE (client responds to a request with SERVER_OP_FLAG)
  • ServerOp enum
    • COMPUTE_JOB_EXEC
    • COMPUTE_JOB_CANCEL
    • DEPLOYMENT_UNIT_UNDEPLOY
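
On the executor side, handling these changes amounts to checking the response header flag and dispatching on the server operation. The sketch below shows that shape in Java. The bit position of `SERVER_OP_FLAG` and the handler descriptions are assumptions for illustration; the actual wire values are not given in this document.

```java
// Server operations listed in this proposal.
enum ServerOp {
    COMPUTE_JOB_EXEC,
    COMPUTE_JOB_CANCEL,
    DEPLOYMENT_UNIT_UNDEPLOY
}

class ServerOpDispatchSketch {
    // Assumed bit position; the real value is defined by the client protocol.
    static final int SERVER_OP_FLAG = 1 << 3;

    // True when the response header marks the message as a server -> client request.
    static boolean isServerOp(int responseFlags) {
        return (responseFlags & SERVER_OP_FLAG) != 0;
    }

    // Describes what the executor would do for each operation before
    // replying with ClientOp.SERVER_OP_RESPONSE.
    static String handle(ServerOp op) {
        switch (op) {
            case COMPUTE_JOB_EXEC:
                return "load unit binaries and execute the job";
            case COMPUTE_JOB_CANCEL:
                return "cancel the running job";
            case DEPLOYMENT_UNIT_UNDEPLOY:
                return "unload the unit's load context";
            default:
                throw new IllegalArgumentException("Unknown server op: " + op);
        }
    }
}
```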

Limitations

No limitations. We expect to be able to run any user code as part of a Compute Job with this approach.

Performance

Benchmark            Mode  Cnt      Score      Error   Units
execJavaLocal        avgt    3     24.401 ±   11.221   us/op
execDotNetLocal      avgt    3     67.523 ±   61.175   us/op
execJavaLocalClient  avgt    3     70.879 ±   52.371   us/op
execJavaRemote       avgt    3    100.285 ±   45.470   us/op
execDotNetRemote     avgt    3    128.691 ±   73.251   us/op

~40 us overhead for a local socket roundtrip, incurred:

  • Once for the job itself
  • Once for every API call from within the job
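
As a rough way to reason about this cost, the estimate below multiplies the roundtrip overhead by the number of roundtrips a job makes. The ~40 us constant is taken from the text above (the local figures differ by 67.523 - 24.401 ≈ 43 us, with wide error bars); the class and method names are hypothetical.

```java
class RoundtripOverheadSketch {
    // Approximate cost of one local socket roundtrip, in microseconds.
    static final double ROUNDTRIP_US = 40.0;

    // One roundtrip to start the job plus one per Ignite API call made inside it.
    static double estimatedOverheadUs(int apiCallsInJob) {
        return ROUNDTRIP_US * (1 + apiCallsInJob);
    }
}
```

For example, a job that makes three Ignite API calls would pay about 160 us of transport overhead under this model, so jobs that call the API heavily benefit most from a faster transport.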

Risks and Assumptions

None. Client protocol change will be handled via feature flag, preserving backwards and forwards compatibility.

Discussion Links

Reference Links

Tickets
