Ignite-3: .NET Native Compute and future support for other platforms.
ID | IEP-136 |
Author | |
Sponsor | |
Created | |
Status | |
As a user, I want to implement Compute jobs in C#/.NET (or another language of my choice).
- All business logic should be in the same language
- My team does not have Java expertise
- Compute jobs need to reuse existing .NET code (libs, DTOs, utils, etc)
- Different teams use different languages in a big company
- I can call Compute jobs written by other teams, no matter which language they use
- Any client/embedded API can call a job in any language (C# -> Python, Java -> C++, etc)
- Reusable mechanism/protocol to run Compute jobs written in any language
- One Ignite node can run jobs in many languages at the same time
- Avoid the Ignite 2.x scenario, which had “Java node”, “C# node”, and “C++ node”, where a C++ node can’t run C# jobs
There are two parts: job execution on the client (ICompute.SubmitAsync) and job implementation on the server.
Job execution API does not need to change.
- A deployment unit can contain binaries in any supported language
- JobDescriptor#jobClassName can refer to a type name in any language
Ignite needs to understand the job platform to invoke a correct executor (Java, .NET, Python, etc).
- Explicitly set in the JobDescriptor (enum)
- Automatically set in some cases (MyJob.class is passed in Java, typeof(MyJob) in .NET)
Pros:
- Easy to implement
- Explicit and clear to the user
- Can mix multiple platforms in one deployment unit
- Later, we could have ExecutorType.JavaSidecar, JavaRemote, etc
- The same JobDescriptor can be used to run jobs on different executors => the property should be in JobExecutionOptions
JobExecutionOptions options = JobExecutionOptions.builder().executorType(JobExecutorType.DotNet).build();
JobDescriptor<Object, Object> jobDesc = JobDescriptor
        .builder("MyNamespace.MyJob, MyAssembly")
        .options(options)
        .build();
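On the server side, the executor type from the options could drive a simple lookup of the matching platform executor. The following is a minimal sketch under that assumption, not the actual Ignite implementation: `SketchExecutorType`, `ExecutorDispatcher`, and the `(jobClassName, arg)` executor signature are hypothetical names introduced only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for the executor type option; the real enum may differ.
enum SketchExecutorType { JAVA, DOTNET }

// Routes a job to the executor registered for its platform.
final class ExecutorDispatcher {
    // Each executor takes (jobClassName, argument) and returns the job result.
    private final Map<SketchExecutorType, BiFunction<String, Object, Object>> executors =
            new EnumMap<>(SketchExecutorType.class);

    void register(SketchExecutorType type, BiFunction<String, Object, Object> executor) {
        executors.put(type, executor);
    }

    Object execute(SketchExecutorType type, String jobClassName, Object arg) {
        BiFunction<String, Object, Object> executor = executors.get(type);
        if (executor == null) {
            // No runtime available for this platform on the node.
            throw new IllegalStateException("No executor registered for " + type);
        }
        return executor.apply(jobClassName, arg);
    }
}
```

Keeping the platform in the options rather than in the job class name means the same lookup also covers future executor kinds without changing the descriptor format.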
New .NET APIs: IComputeJob, IJobExecutionContext
public interface IComputeJob<TArg, TResult>
{
    IMarshaller<TArg>? InputMarshaller => null;
    IMarshaller<TResult>? ResultMarshaller => null;
    ValueTask<TResult> ExecuteAsync(IJobExecutionContext context, TArg arg, CancellationToken cancellationToken);
}

public interface IJobExecutionContext
{
    IIgnite Ignite { get; }
}
Example job implementation:
public class ToStringJob : IComputeJob<object?, string?>
{
    public async ValueTask<string?> ExecuteAsync(IJobExecutionContext context, object? arg, CancellationToken cancellationToken)
    {
        await context.Ignite.Tables.GetTablesAsync();
        return arg?.ToString();
    }
}
- Java server starts a .NET process which runs the client in a special mode that allows Compute job execution
- The process is started only once, on demand (first .NET job), then reused
- The server passes a secret as a process start arg to identify the “special” client
- Client passes the secret back as a handshake extension
- Auth is bypassed for this client
- Later, we can allow “external” executors - in a separate container, remote machine, k8s replicaSet, etc
- Client protocol already supports server -> client calls
- Reusable approach for other languages
- Separate process provides isolation
- Process crashes or hangs - restart automatically
- Multiple processes can be started if one is not enough to handle the load
- SSL: supported
- Requires additional configuration if client certs are required
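The secret exchange described above can be sketched as follows. This is an illustrative outline only: `ExecutorSecret` and its methods are hypothetical names, the 32-byte length and Base64 encoding are assumptions, and the real handshake carries the value inside the client protocol rather than as a plain string comparison.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper: one random secret per started executor process.
final class ExecutorSecret {
    private final String secret;

    ExecutorSecret() {
        byte[] bytes = new byte[32]; // assumed length
        new SecureRandom().nextBytes(bytes);
        this.secret = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Value the server would pass as a process start argument.
    String asProcessArgument() {
        return secret;
    }

    // Checks the value the client sends back in the handshake extension.
    boolean matches(String handshakeValue) {
        if (handshakeValue == null) {
            return false;
        }
        // MessageDigest.isEqual gives a timing-safe comparison.
        return MessageDigest.isEqual(
                secret.getBytes(StandardCharsets.UTF_8),
                handshakeValue.getBytes(StandardCharsets.UTF_8));
    }
}
```

A connection whose handshake value matches would be treated as the executor client and skip regular authentication; any other connection would go through the normal auth path.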
- Java server identifies the job platform (language) from the options
- Starts the special client on demand
- .NET runtime not required, zero overhead until you actually try to run a .NET job
- No configuration required
- Server sends deployment units location, job class name, argument to the client
- Client loads the binaries, executes the job and returns the result
- Client process keeps running for reuse
- Client process goes down with the parent in case of node shutdown
- Shutdown by idle timeout can be added too
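The "start on demand, then reuse, restart after a crash" behavior can be sketched with a small lazy holder. The names `ExecutorProcess` and `LazyExecutorProcess` are hypothetical, and the process start itself is abstracted behind a `Supplier`; a real starter would presumably wrap `java.lang.Process`.

```java
import java.util.function.Supplier;

// Hypothetical handle to a running executor process.
interface ExecutorProcess {
    boolean isAlive();
}

// Starts the executor process on first use and reuses it while it is alive.
final class LazyExecutorProcess {
    private final Supplier<ExecutorProcess> starter;
    private ExecutorProcess current;

    LazyExecutorProcess(Supplier<ExecutorProcess> starter) {
        this.starter = starter;
    }

    // Synchronized so that concurrent first jobs do not start two processes.
    synchronized ExecutorProcess getOrStart() {
        if (current == null || !current.isAlive()) {
            // First platform job, or the previous process crashed: start a new one.
            current = starter.get();
        }
        return current;
    }
}
```

Nothing is started until `getOrStart()` is first called, which matches the "zero overhead until you actually try to run a .NET job" goal.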
Client protocol reuse is the quickest way to add platform compute to Ignite and the most universal, since it is easily implemented in any language. However, it is not the most performant: each API call adds ~40 us of overhead (see the Performance section below).
We expect to add an alternative, more performant mechanism in the future (such as embedded CLR with JNI interop, similar to Ignite 2.x).
- Reuse existing execution flow in Java
- Schedule, acquire deployment units, etc. Only at the last step do we delegate to the platform executor.
- Deployment unit lifetimes in .NET and Java are linked
- When undeploying a unit, notify all platform executors to clean up accordingly
Exceptions in .NET jobs should be captured together with the full stack trace:
- Passed back to the Java server and logged
- Passed back to the caller (stack trace is handled with respect to SendServerExceptionStackTraceToClient config)
Refer to Ignite 2.x, which successfully intermixed .NET and Java stack traces in a way that displays nicely in most IDEs.
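One way to model the two destinations of a platform job failure (server log versus caller) is shown below. This is a rough sketch with hypothetical names (`PlatformJobException`, `PlatformErrors.forCaller`); the boolean parameter stands in for the SendServerExceptionStackTraceToClient setting, and the way the trace is actually transported is not specified here.

```java
// Hypothetical exception carrying the remote (.NET) stack trace as text.
final class PlatformJobException extends RuntimeException {
    private final String remoteStackTrace;

    PlatformJobException(String message, String remoteStackTrace) {
        super(message);
        this.remoteStackTrace = remoteStackTrace;
    }

    String remoteStackTrace() {
        return remoteStackTrace;
    }
}

final class PlatformErrors {
    // The server logs the original exception with the full trace.
    // The copy returned to the caller keeps the trace only when the setting allows it.
    static PlatformJobException forCaller(PlatformJobException e, boolean sendStackTraceToClient) {
        return sendStackTraceToClient ? e : new PlatformJobException(e.getMessage(), null);
    }
}
```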
- Include the “special client” binary (.dll) with the database package (zip download)
- The .dll is a “console app” in .NET terms - another project and assembly
- The .dll is OS-independent and requires .NET runtime to run
- Zip download - no runtime (no Java, no .NET)
- RPM, DEB - no runtime (no Java, no .NET)
- Docker - Slim, Medium, Fat images
- Only Fat image includes all possible runtimes for Compute jobs
As a user, you follow these steps:
- Create a new “class library” type project (dotnet new classlib)
- Implement IComputeJob interface
- Build the project
- cli unit deploy myproj/bin/Release
Use AssemblyLoadContext to implement loading/unloading and support multiple versions of the same assembly.
Every deployment unit should have a separate load context.
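The bookkeeping behind "one load context per deployment unit" can be sketched independently of the platform. The example below is written in Java for consistency with the other sketches, while the real executor would hold .NET AssemblyLoadContext instances; `UnitContextRegistry`, the `name:version` key format, and the factory/unloader callbacks are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Keeps one isolated load context per deployment unit version.
final class UnitContextRegistry<C> {
    private final Map<String, C> contexts = new ConcurrentHashMap<>();
    private final Function<String, C> factory;
    private final Consumer<C> unloader;

    UnitContextRegistry(Function<String, C> factory, Consumer<C> unloader) {
        this.factory = factory;
        this.unloader = unloader;
    }

    // The key combines name and version, so two versions of the same unit never share a context.
    C getOrCreate(String unitName, String version) {
        return contexts.computeIfAbsent(unitName + ":" + version, factory);
    }

    // Called when the server reports an undeploy; returns true if a context was unloaded.
    boolean undeploy(String unitName, String version) {
        C ctx = contexts.remove(unitName + ":" + version);
        if (ctx == null) {
            return false;
        }
        unloader.accept(ctx);
        return true;
    }
}
```

Keying by unit and version is what lets multiple versions of the same assembly coexist, and the `undeploy` path is where a server-side undeploy notification would release the binaries.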
Use the same “special” client instance that handles job execution, and expose it through IJobExecutionContext.Ignite.
- New handshake extension COMPUTE_EXECUTOR_ID
- New feature flag PLATFORM_COMPUTE_JOB (send non-Java jobs to server; all clients are supposed to support this)
- New feature flag PLATFORM_COMPUTE_EXECUTOR (execute jobs requested by server; only .NET client will support this for now)
- New response header flag SERVER_OP_FLAG (indicates a server->client request)
- Builds on existing notification mechanism
- Generic, allows different server ops (e.g. start job, cancel job, etc)
- New operation ClientOp.SERVER_OP_RESPONSE (client responds to a request with SERVER_OP_FLAG)
- ServerOp enum
COMPUTE_JOB_EXEC
COMPUTE_JOB_CANCEL
DEPLOYMENT_UNIT_UNDEPLOY
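A client-side decoder for these additions might look like the sketch below. The operation names come from the list above, but the numeric codes and the bit position of SERVER_OP_FLAG are placeholders chosen for illustration; the proposal does not assign values here.

```java
// Operation names follow the proposal; the numeric codes are hypothetical.
enum ServerOp {
    COMPUTE_JOB_EXEC(1),
    COMPUTE_JOB_CANCEL(2),
    DEPLOYMENT_UNIT_UNDEPLOY(3);

    final int code;

    ServerOp(int code) {
        this.code = code;
    }

    static ServerOp fromCode(int code) {
        for (ServerOp op : values()) {
            if (op.code == code) {
                return op;
            }
        }
        throw new IllegalArgumentException("Unknown server op code: " + code);
    }
}

final class ResponseFlags {
    // Hypothetical bit position for the server->client request flag.
    static final int SERVER_OP_FLAG = 1 << 3;

    // True when the incoming message is a request from the server rather than a response.
    static boolean isServerOp(int flags) {
        return (flags & SERVER_OP_FLAG) != 0;
    }
}
```

A client that sees the flag set would read the op code, run the matching handler, and reply with ClientOp.SERVER_OP_RESPONSE.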
No limitations. We expect to be able to run any user code as part of a Compute Job with this approach.
Benchmark            Mode  Cnt    Score    Error  Units
execJavaLocal        avgt    3   24.401 ± 11.221  us/op
execDotNetLocal      avgt    3   67.523 ± 61.175  us/op
execJavaLocalClient  avgt    3   70.879 ± 52.371  us/op
execJavaRemote       avgt    3  100.285 ± 45.470  us/op
execDotNetRemote     avgt    3  128.691 ± 73.251  us/op
~40 us overhead for a local socket roundtrip
- Once for the job itself
- Once for every API call from within the job
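As a rough worked example of this cost model: the benchmark gap between execDotNetLocal and execJavaLocal is 67.523 - 24.401 = 43.122 us, which is in line with the ~40 us figure, although the error margins are wide. The helper below is a back-of-the-envelope estimate only; `OverheadEstimate` is a hypothetical name, and it assumes every API call costs exactly one roundtrip.

```java
// Back-of-the-envelope estimate of the latency added by the sidecar executor.
final class OverheadEstimate {
    // Approximate local socket roundtrip cost quoted in the text, in microseconds.
    static final double ROUNDTRIP_US = 40.0;

    // One roundtrip for the job itself plus one for every API call made from within the job.
    static double addedLatencyUs(int apiCallsInJob) {
        if (apiCallsInJob < 0) {
            throw new IllegalArgumentException("apiCallsInJob must be >= 0");
        }
        return ROUNDTRIP_US * (1 + apiCallsInJob);
    }
}
```

Under these assumptions, a job that makes three API calls would add about 160 us over the same job running in-process, so jobs that call back into Ignite many times are the ones most affected.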
None. Client protocol changes will be handled via feature flags, preserving backwards and forwards compatibility.