ID | IEP-135 |
---|---|
Author | |
Sponsor | |
Created | |
Status | DRAFT |
Users with Apache Ignite 2 clusters will require a reliable process to move their existing data into an Apache Ignite 3 cluster.
This tool will focus only on data stored in Apache Ignite 2 persistent caches. Migration of memory-only data is out of scope. The conversion from Apache Ignite 2 caches to Apache Ignite 3 table schemas can be done based on cache configurations.
Installing and configuring an Ignite 3 cluster with similar configurations/topologies to an existing Ignite 2 cluster is outside the scope of this document.
This feature will be offered as a CLI tool that will roughly follow this process:

1. Read the cache data directly from the Ignite 2 persistence files, using a stripped-down Ignite 2 context.
2. Translate the Ignite 2 cache configurations into SQL DDL and create the target tables in the Ignite 3 cluster.
3. Map each cache record (CacheDataRow) to an Ignite 3 Tuple.
4. Stream the mapped Tuples into the Ignite 3 cluster.

The steps are discussed in more detail below.
This component must use a stripped-down version of Ignite 2 to read the persistence files without starting a full cluster. For instance, the network capabilities of Ignite are not required for this use case. The StandaloneGridKernalContext provides an example of a stripped-down version of the AI2 GridKernalContext, which can be used as a reference implementation.
The context should correctly load/restore the following:

- the metastore, which holds the persisted cache configurations;
- the binary metadata and marshaller mappings, needed to deserialize BinaryObject values;
- the page store, so that the cache data rows can be read from the partition files.
This component should not attempt to recover data from the Write Ahead Log (WAL) or execute any other maintenance process over the persistence files.
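As a rough illustration, here is a minimal sketch of bootstrapping such a context, assuming the `StandaloneGridKernalContext` constructor used by the Ignite 2 WAL reader utilities and the default work-directory layout; everything else is illustrative:

```java
import java.io.File;
import org.apache.ignite.internal.GridKernalContext;
import org.apache.ignite.internal.processors.cache.persistence.wal.reader.StandaloneGridKernalContext;
import org.apache.ignite.logger.NullLogger;

// Sketch: bootstrap a stripped-down kernal context over an existing
// Ignite 2 work directory, with no networking or discovery involved.
public class StandaloneContextSketch {
    public static void main(String[] args) throws Exception {
        File workDir = new File(args[0]); // e.g. the --ignite2-directory value

        // Default sub-folders holding binary metadata and marshaller
        // mappings; both are needed to deserialize BinaryObject values.
        File binaryMeta = new File(workDir, "binary_meta");
        File marshaller = new File(workDir, "marshaller");

        GridKernalContext ctx =
            new StandaloneGridKernalContext(new NullLogger(), binaryMeta, marshaller);

        // From here, the metastore and the cache configurations can be
        // read through the context, as discussed below.
    }
}
```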
The target tables need to be explicitly created before streaming data to the Ignite 3 cluster. Unlike its predecessor, Ignite 3 only allows creating tables explicitly through the SQL API. As a result, we need a component that translates the Ignite 2 cache configurations into SQL `CREATE TABLE` statements.
The Ignite 2 cache configurations can be found in the node's persistent work directory or in the cluster configuration. In any case, the correct way to fetch these configurations is through the GridKernalContext API after the context has finished reading the metastore.
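For illustration, a hedged sketch of enumerating the persisted configurations through the context; it assumes the cache processor of the (possibly extended) standalone context has been initialized from the metastore:

```java
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.internal.GridKernalContext;
import org.apache.ignite.internal.processors.cache.DynamicCacheDescriptor;

// Sketch: list the persisted cache configurations via the kernal context.
class CacheConfigReaderSketch {
    static void printCaches(GridKernalContext ctx) {
        for (DynamicCacheDescriptor desc : ctx.cache().cacheDescriptors().values()) {
            CacheConfiguration<?, ?> cfg = desc.cacheConfiguration();
            System.out.println(cfg.getName() + " -> " + cfg.getQueryEntities());
        }
    }
}
```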
The Query Entities (QEs) in the cache configuration carry information about the key fields, which must be annotated with the primary key constraint. A QE also supports other constraints, such as NOT NULL, which should be respected as well. Field aliases should also be supported.
We have identified the following cache configuration scenarios:
Scenario 0:
Scenario 1:
Scenario 2:
Scenario 3:
Given each scenario, the SQL generator must:
Scenario 0:
Scenario 1:
Scenario 2:
Scenario 3:
The SQL Generator should also be exposed as a separate CLI command to enable the client to manually tune the table schema before migrating data.
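To make the translation concrete, here is a hedged sketch of rendering a QueryEntity into a `CREATE TABLE` statement. The `QueryEntity` getters are standard Ignite 2 API, but the generator itself (including the type mapping) is purely illustrative and skips quoting, null checks, and aliases on key fields:

```java
import java.util.Map;
import java.util.StringJoiner;
import org.apache.ignite.cache.QueryEntity;

// Sketch: translate an Ignite 2 QueryEntity into an Ignite 3
// CREATE TABLE statement. Simplified for illustration.
public class DdlGeneratorSketch {
    static String createTableFor(QueryEntity qe) {
        StringJoiner cols = new StringJoiner(", ");

        for (Map.Entry<String, String> field : qe.getFields().entrySet()) {
            // Respect aliases where present.
            String name = qe.getAliases().getOrDefault(field.getKey(), field.getKey());
            // Respect the NOT NULL constraint.
            boolean notNull = qe.getNotNullFields() != null
                && qe.getNotNullFields().contains(field.getKey());

            cols.add(name + ' ' + toSqlType(field.getValue()) + (notNull ? " NOT NULL" : ""));
        }

        // Key fields become the PRIMARY KEY constraint.
        String pk = String.join(", ", qe.getKeyFields());

        return "CREATE TABLE " + qe.getTableName() + " (" + cols + ", PRIMARY KEY (" + pk + "))";
    }

    private static String toSqlType(String javaTypeName) {
        // Hypothetical mapping; a real implementation needs a complete type table.
        switch (javaTypeName) {
            case "java.lang.Integer": return "INT";
            case "java.lang.Long": return "BIGINT";
            case "java.lang.String": return "VARCHAR";
            default: throw new IllegalArgumentException("Unmapped type: " + javaTypeName);
        }
    }
}
```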
The mapping between CacheDataRows and Ignite 3 Tuples can be done in two steps. First, the CacheDataRow key and value (KeyCacheObject and CacheObject) must be adapted into a Tuple. This is straightforward once the CacheObject implementation has been analyzed: for BinaryObjectImpl instances, the row field names can be mapped by retrieving the BinaryObjectImpl 'rawType'. Native Java types, on the other hand, might not have a defined field name in the CacheObject, as discussed previously. In this case, we can map the value to a placeholder and resolve the actual field name in the next step.
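A minimal sketch of this first step, assuming the key/value has already been unwrapped from its CacheObject (e.g. via `CacheObject#value`); the `VAL` placeholder name is an assumption, to be resolved against the schema in the second step:

```java
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.binary.BinaryType;
import org.apache.ignite.table.Tuple;

// Sketch: adapt an unwrapped cache key or value into an Ignite 3 Tuple.
public class TupleAdapterSketch {
    static Tuple toTuple(Object unwrapped) {
        Tuple tuple = Tuple.create();

        if (unwrapped instanceof BinaryObject) {
            BinaryObject obj = (BinaryObject) unwrapped;
            BinaryType type = obj.type(); // carries the field names

            for (String fieldName : type.fieldNames())
                tuple.set(fieldName, obj.field(fieldName));
        } else {
            // Native Java types carry no field name: map the value to a
            // placeholder and resolve the real column name in step two.
            tuple.set("VAL", unwrapped);
        }

        return tuple;
    }
}
```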
The second step is to map the original field names to the target table schema. The schema can be retrieved using the 'ClientTable#getLatestSchema' method. During this step, mismatches between the Tuple's field names or types and the schema can occur and would otherwise lead to undefined behavior; a mismatch implies either that we are dealing with Scenario 3 or that the schema was altered erroneously. As a result, we should allow the user to choose what to do with the mismatched columns/fields:

- ABORT (default)
- SKIP_RECORD
- IGNORE_COLUMN
- PACK_EXTRA

In any case, a detailed error message should be presented to the user.
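A hedged sketch of how these policies might be applied per mismatched field; the policy semantics shown (especially PACK_EXTRA stashing values into an extra column) are assumptions, and all names are illustrative:

```java
import org.apache.ignite.table.Tuple;

// Illustrative policy names, mirroring the --mode CLI parameter below.
enum MismatchPolicy { ABORT, SKIP_RECORD, IGNORE_COLUMN, PACK_EXTRA }

class MismatchHandlerSketch {
    /** Returns true if the record should still be migrated. */
    static boolean handle(MismatchPolicy policy, String field, Tuple row, Tuple extras) {
        switch (policy) {
            case ABORT:
                // Stop the whole migration with a detailed error message.
                throw new IllegalStateException(
                    "Field '" + field + "' does not match the target schema");
            case SKIP_RECORD:
                return false; // drop this record, continue with the next one
            case IGNORE_COLUMN:
                return true;  // keep the record, drop the offending column
            case PACK_EXTRA:
            default:
                // Assumed semantics: stash the value into an extra payload column.
                extras.set(field, row.value(field));
                return true;
        }
    }
}
```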
After mapping, the Tuples are published to the Ignite 3 cluster through the DataStreamerTarget interface.
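A minimal end-to-end sketch of this step, assuming the `DataStreamerItem`-based `streamData` signature available in recent Ignite 3 builds; the table name, address, and sample tuple are illustrative:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SubmissionPublisher;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.table.DataStreamerItem;
import org.apache.ignite.table.RecordView;
import org.apache.ignite.table.Tuple;

// Sketch: publish mapped tuples into the target Ignite 3 table.
public class StreamerSketch {
    public static void main(String[] args) throws Exception {
        try (IgniteClient client = IgniteClient.builder()
                .addresses("127.0.0.1:10800") // the --cluster-endpoint-url value
                .build()) {
            // RecordView implements DataStreamerTarget<Tuple>.
            RecordView<Tuple> view = client.tables().table("PERSON").recordView();

            CompletableFuture<Void> done;
            try (var publisher = new SubmissionPublisher<DataStreamerItem<Tuple>>()) {
                done = view.streamData(publisher, null); // default streamer options

                // In the real tool, these tuples come from the mapping step.
                for (Tuple tuple : List.of(Tuple.create().set("ID", 1).set("NAME", "Alice")))
                    publisher.submit(DataStreamerItem.of(tuple));
            }

            done.join(); // completes once all items are flushed
        }
    }
}
```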
Common parameters
Parameter | Description |
---|---|
--ignite2-directory | Work directory of one or more Ignite 2 nodes |
--ignite2-node-consistent-id | Consistent ID of the Ignite 2 node |
--ignite2-config-xml | Ignite 2 configuration XML |
Command spec | Description |
---|---|
migration-tools cache list <--ignite2-directory> <--ignite2-node-consistent-id> <--ignite2-config-xml> | List available caches on the node |
Command spec | Description |
---|---|
migration-tools cache migrate <--ignite2-directory> <--ignite2-node-consistent-id> <--ignite2-config-xml> | Migrate a cache from an Ignite 2 work dir into an Ignite 3 cluster |
Parameters
Parameter | Description |
---|---|
--name | The name of the cache to migrate |
--cluster-endpoint-url | The URL of the Ignite 3 cluster; at least one URL must be provided |
--mode | Mapping error handling policy: ABORT (default), SKIP_RECORD, IGNORE_COLUMN, PACK_EXTRA |
--rate-limiter | Limits the number of migrated records per second; uses a very basic rate limiter implementation and may be prone to bursts |
--no-save-progress | Disables saving a progress file at the end of the run |
--resume-from | Resumes the migration from the provided progress file |
--retry-limit | Retries the migration up to N times on retriable errors; 0 (default) does not retry. Implies saving progress |
--retry-backoff | Waits N seconds between retry attempts. Default: 0 (retry immediately) |
Command spec | Description |
---|---|
migration-tools cache ddl <--ignite2-directory> <--ignite2-node-consistent-id> <--ignite2-config-xml> [--name] | Generate the corresponding SQL DDL creation script for persisted cache configurations |
Parameters
Parameter | Description |
---|---|
--name | Generates the DDL statement only for the provided cache; by default, statements for all caches are generated |
The migration tools CLI should be distributed in a Zip package similar to the one used for the main Ignite CLI. The distribution should provide two binaries: one for Windows and one for Unix-based systems.
The methods discussed in this document assume that all applications and workloads have been stopped and that the cluster has been shut down gracefully, so that the latest checkpoints contain the latest data before any attempt to restore the Apache Ignite 2 data.
The cache migration process also requires a live Apache Ignite 3 cluster: the target cluster. Installing and configuring this cluster is out of the scope of this document. The migration tool should be able to recreate the caches/tables automatically for most use cases using the SQL DDL generator. However, the client may optionally create or alter the tables before running the persistent data migration tool. Nonetheless, the success of the migration depends on the compatibility of the AI3 table schema with the mapping process.
We decided not to rely on Snapshots for this tool.