ID: IEP-36
Author: Denis Magda
Sponsor: Denis Magda
Created:
Status: IN PROGRESS


Motivation

The Ignite codebase and release packages mix core capabilities with 3rd-party integrations. This leads to the following problems:

  • A cumbersome and continuously growing codebase with many 3rd-party dependencies.
  • Some of the integrations are questionable and should no longer be supported by the community at all.
  • The evolution of integrations is bound to Ignite release cycles even when no changes are needed in the core.
  • The Ignite community has to support everything (testing, releasing, fixing, continued development), which requires experts in particular integrations on a permanent basis. This doesn't work in practice.


The goal of the IEP is to solve these challenges by doing the following:

  • Split the existing Ignite codebase into the core and modules (key integrations to be supported by the Ignite community).
  • Select key modules that the community will continue to test and support. Move other integrations out of community support/control.
  • Ensure that modules can be released and evolve separately from the Ignite core.
  • Modules will live in their own repositories, which makes it possible to support different versions of a product Ignite integrates with via branching (for instance, there is demand to support different Spark versions, but this is challenging with the present monolithic architecture).

Description

Below you can find a definition of the Ignite core, the list of modules to be supported by the community, and the integrations that will move out of the community's control.

Ignite Core

The Ignite core is a set of features and components that define the project's key capabilities and benefits, such as distributed memory-centric storage, RDBMS acceleration, transactions, and more. Most of these components are developed from scratch by the community and have minimal dependencies on 3rd parties:

  • Memory-Centric Storage, Native Persistence, RDBMS Integration (CacheStore for RDBMS)
  • Key-Value APIs (NO JCache support)
  • SQL
  • Compute Grid
  • Service Grid
  • Machine Learning APIs
  • Advanced queries (scan, continuous)
  • Transactions
  • Data Structures and Atomics
  • Ignite Messaging
  • Core Streaming APIs such as IgniteDataStreamer
  • Logging
  • Metrics and Tracing Framework
  • Command line tools and scripts such as Visor and control.sh
  • Standard (aka thick) clients: Java, .NET, C++.
  • Spring Core (needed for configuration).

Ignite Modules

Ignite modules are integrations that will be developed, supported, and released by the Ignite community. These integrations are important and have either significant or growing demand. Presently, the list is as follows:

  • Spark Integration (to be discussed)
  • SpringData and SpringBoot
  • TensorFlow Integration
  • Cassandra Integration

Requirements and Installation

Each module has to satisfy the following criteria:

  • Each module should be located in a separate GitHub repository.
  • A module can be released separately from the core.
  • A module has to be tested with existing testing tools like TeamCity.
  • Each module is tested for every Ignite core release. A new version of a module is released together with the core release if changes are required.
  • Module versioning doesn't need to be aligned with the core. A compatibility matrix between the core and the modules will be maintained.


A module needs to be released and packaged in the form of:

  • Maven/NuGet/npm or another artifact, depending on the programming language.
  • Docker image (for selected plugins, to be decided on a case-by-case basis).
  • Separate downloadable binary.


Even though the modules are released and packaged separately, there has to be an easy way to move modules' binaries to the Ignite core folder:

  • A user downloads the ignite-module-x plugin (with all of the JARs needed).
  • A script is executed to add it to the Ignite core folder, for example: ./ignite_plugin.sh install /Downloads/ignite-module-x
  • The ignite-module-x libs/JARs are installed into IGNITE_HOME/libs.
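The installation step described above could be sketched as follows. This is a hypothetical Python illustration, not the ignite_plugin.sh script the proposal names: the function `install_module` and the assumption that every `*.jar` inside the downloaded package belongs in IGNITE_HOME/libs are illustrative choices.

```python
import shutil
from pathlib import Path
from typing import List


def install_module(module_dir: str, ignite_home: str) -> List[str]:
    """Copy every JAR of a downloaded module into IGNITE_HOME/libs.

    Hypothetical sketch of the "install" step; not the actual ignite_plugin.sh.
    Returns the file names of the copied JARs.
    """
    source = Path(module_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"module directory not found: {source}")

    libs = Path(ignite_home) / "libs"
    libs.mkdir(parents=True, exist_ok=True)  # create IGNITE_HOME/libs if it is missing

    installed = []
    # rglob also finds JARs kept in sub-folders of the module package (e.g. its dependencies).
    for jar in sorted(source.rglob("*.jar")):
        shutil.copy2(jar, libs / jar.name)  # overwrites an existing file with the same name
        installed.append(jar.name)
    return installed
```

A call such as `install_module("/Downloads/ignite-module-x", os.environ["IGNITE_HOME"])` would mirror the command shown in the list. Because the copy is flat, two JARs with the same file name in different sub-folders would overwrite each other; a real installer would likely need to detect such conflicts and version clashes with JARs already in IGNITE_HOME/libs.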

Thin Clients

Thin clients are a special type of Ignite modules:

  • Thin clients for the following programming languages: Java, .NET, C++, Node.js, Python, PHP.
  • JDBC and ODBC drivers


Thin clients should be stored in separate repositories and packaged in different forms: binaries and Maven/NuGet/npm artifacts (depending on the language). The clients can be released independently of the core.

Independent Integrations

Below is a list of existing integrations that won't be turned into Ignite modules but will rather be moved to separate GitHub repositories and won't be maintained by the Ignite community for every core release. If the community later sees demand for an unsupported integration, it can be taken back and officially supported (testing, development, releases, compatibility with the core) as an Ignite module.

We discussed this on the dev list [1] and agreed to create a new repository for hosting our Ignite integrations.

`[1]` http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Proposal-for-Ignite-Extensions-as-a-separate-Bahir-module-or-Incubator-project-td44064.html

As discussed [2], with respect to releases, all the extensions need to be verified for an upcoming release and updated if needed (with a version increase only for those that were updated).

`[2]` http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-dependencies-and-release-process-for-Ignite-Extensions-td44478.html
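The release rule from [2] (verify every extension, increase the version only of those that were updated) can be sketched as below. This is a hypothetical Python illustration: the function names and extension names are made up, and bumping the minor component is an assumption, since the discussion does not say which part of the version changes.

```python
from typing import Dict, Set


def bump_minor(version: str) -> str:
    """Assumed bump policy: "1.4.2" -> "1.5.0"."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def plan_release(current: Dict[str, str], updated: Set[str]) -> Dict[str, str]:
    """Return the version each extension ships with in the upcoming release.

    current: extension name -> currently released version (every extension is verified).
    updated: names of the extensions that needed changes for this release.
    """
    unknown = updated - set(current)
    if unknown:
        raise ValueError(f"unknown extensions: {sorted(unknown)}")
    # Only extensions that changed get a new version; the rest keep theirs.
    return {
        name: bump_minor(version) if name in updated else version
        for name, version in current.items()
    }
```

For example, with `{"ignite-module-a": "1.0.0", "ignite-module-b": "2.3.1"}` and only `ignite-module-b` updated, the plan keeps `1.0.0` for the first extension and moves the second to `2.4.0`.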

The list of integrations to be moved to independent GitHub repositories that will not be supported by the community for every Ignite release:

Integration Name | +1 (explain if needed) | -1 (with explanation)
Kafka
Twitter




ZeroMQ

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov 

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all streaming tools/modules should be kept in one place (as part of Apache Ignite (AI) or as a separate Streaming AI project)

RocketMQ

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all streaming tools/modules should be kept in one place (as part of AI or as Streaming AI separate project)

Storm

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all streaming tools/modules should be kept in one place (as part of AI or as Streaming AI separate project)

Flume

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all ETL tools/modules should be kept in one place (as part of AI or as a separate ETL AI project). I mean that Flume is a tool for loading big datasets into AI.

Flink

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all streaming tools/modules should be kept in one place (as part of AI or as Streaming AI separate project)

MQTT

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko

Saikat Maitra


Camel

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Alexey Zinoviev

Pavel Kovalenko

Saikat Maitra


Hibernate

Denis A. Magda - Spring Data has much bigger adoption in Ignite deployments. I don't see a lot of traction with Hibernate. It's hard to maintain it in various flavors: Ignite ships several modules for different Hibernate versions. Better to have it as an independent GitHub project with forks for specific Hibernate versions.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Alexey Zinoviev - I suppose it's a useful feature for wide adoption among Java developers who use AI not as a cache but as a database.

JMS

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Saikat Maitra

Alexey Zinoviev - all streaming tools/modules should be kept in one place (as part of AI or as Streaming AI separate project). Also, I didn't see the Kafka Integration in this list

AOP-Based Grid

Denis A. Magda - low usage, better to have as an independent Github project that can be maintained by anybody.

Alexey Kuznetsov

Alexey Goncharuk - maybe drop it altogether, because moving it into a separate project may be painful: it has a lot of internal API usages.

Sergey Kozlov

Alexey Zinoviev 

Pavel Kovalenko


JSR-107(JCache)

Denis A. Magda - I don't see any value in supporting this JSR other than claiming compliance with the specification. It's better to have a much cleaner Ignite key-value API without any dependencies influenced by the specification.

Alexey Goncharuk

Sergey Kozlov

Pavel Kovalenko

Anton Vinogradov

Alexey Zinoviev - we should ask the user community about this; I have heard many times that the JCache implementation is important for Java developers.

Ivan Pavlukhin - It is quite natural for me to imagine integrating with Ignite through some kind of standard API. The situation with JCache is similar to JDBC. AFAIR, Spring has a JCache integration. If we are going to evolve the caching trait, then we should support easy integration with Spring. If there are alternatives to JCache, then we should consider them.

OSGi

Denis A. Magda - this integration is already broken and badly maintained. Haven't come across anybody who uses OSGi in the projects Ignite is targeted for.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko


YARN

Denis A. Magda - not sure it's useful any longer and should be supported by the community.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev What was the purpose of this integration?

Pavel Kovalenko


Mesos

Denis A. Magda - not sure it's useful any longer and should be supported by the community.

Alexey Kuznetsov

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev What was the purpose of this integration?

Pavel Kovalenko


.NET: Legacy Entity Framework and ASP.NET integrations

Denis A. Magda - outdated, needs to be replaced with a newer version.

Alexey Kuznetsov

Pavel Tupitsyn - integrations with legacy technologies; they also block the .NET Core migration.

Sergey Kozlov

Alexey Zinoviev


Scalar

Alexey Goncharuk - not used, brings an unnecessary dependency on Scala, and causes library conflicts.

Sergey Kozlov

Alexey Zinoviev



The list of integrations to be removed completely (not even moved to an independent GitHub repository):

Integration | +1 (explain if needed) | -1 (with explanation)

Redis and Memcached protocols support

Denis A. Magda - not sure why these two were supported in the first place.

Sergey Kozlov - thin clients provide full and rich replacement

Alexey Zinoviev


ignite-clients module

Denis A. Magda - we already have thin clients; this module duplicates their features with fewer capabilities.

Sergey Kozlov

Alexey Zinoviev


Hadoop Accelerator and IGFS

Denis A. Magda - community has already voted for the removal.

Sergey Kozlov

The community has already voted for the removal.

Alexey Zinoviev IGFS should be moved to the separate package in the first

AOP-Based Grid

Alexey Goncharuk Unused, hard to move to a separate module due to many internal API usages

Sergey Kozlov

Alexey Zinoviev


Ignite Schedule

Alexey Goncharuk same issues as with local caches

Sergey Kozlov - it can be implemented in user code

Alexey Zinoviev

Pavel Kovalenko Ignite integration with distributed schedulers like Airflow or Oozie can be a better decision.


APIs for Removal

As part of the modularization, which needs to be considered for Ignite 3.0, it's worth listing all the APIs that the community plans to remove in Ignite 3.0. The APIs can belong both to the Ignite core and to modules that will stay in Ignite and be officially supported by the community:


API | +1 (explain if needed) | -1 (with explanation)
Already deprecated APIs
Local caches

Denis A. Magda

Alexey Kuznetsov - Can we "emulate" a local cache with a partitioned cache whose node filter is limited to one node?

Alexey Goncharuk - A local cache is meaningless in a distributed system, especially when a transaction is involved: suppose the prepare step has completed and a node with a local cache goes down. According to 2PC, we cannot proceed until the node comes back up.

Anton Vinogradov

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko


Spatial indexes

Denis A. Magda - the API is broken and not suited for production. Can be designed from scratch with Ignite 3.1 and 3.2.

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev - It could be removed only in 3.1, when the new implementation is provided; otherwise, we will be left without spatial indexes.

Full-text search

Denis A. Magda - the API is broken and not suited for production. Can be designed from scratch with Ignite 3.1 and 3.2.

Alexey Goncharuk

Alexey Zinoviev - It could be removed only in 3.1, when the new implementation is provided; otherwise, we will be left without full-text indexes.

Checkpointing SPI

Denis A. Magda - Ignite caches/tables can be used to store the checkpoints. This API is redundant.

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev 

Pavel Kovalenko


Ignite.lock

Alexey Goncharuk - The distributed lock concept is broken (see the corresponding Aphyr video). The usual lock usage pattern is lock, update cache, unlock. The same semantics can be achieved using pessimistic transactions with key locking, but then it is clear to an end user how a lock can fail in a distributed system (transaction rollback).

Anton Vinogradov

Sergey Kozlov

Pavel Kovalenko - Very fragile API. Not tested well. Some failures can make such locks impossible to control. It can be brought back and reworked after a consensus algorithm such as Raft is implemented.

Denis A. Magda - unless an alternative solution is provided as part of the existing transactional APIs. There has to be a clear migration guide for Ignite.lock users.

Alexey Zinoviev - It could be removed only in 3.1, when the new implementation is provided.

Cache.lock

Alexey Goncharuk - Same arguments as with Ignite.lock, but an even more redundant API.

Anton Vinogradov

Sergey Kozlov

Pavel Kovalenko

Alexey Zinoviev - It could be removed only in 3.1, when the new implementation is provided.

Ignite Data Structures

Pavel Kovalenko Can be marked as not safe and reworked after implementation of some consensus algorithm like Raft.

Ivan Pavlukhin - The implementation quality is really insufficient. In my opinion, data structures in their current form should not be used in production. We can deprecate the current implementation as a first step. Later on, we can add all the necessary implementations as they become ready; the required development effort seems non-trivial.

Denis A. Magda - that's basic functionality of every IMDG and in-memory cache. We can't remove it. Instead, let's plan activities that improve or reconsider the current implementations.

Pavel Tupitsyn

Sergey Kozlov

Alexey Zinoviev - Agree with Denis A. Magda; also, we should extend the list of basic structures (ready to help).

GAR files

"Force server mode" for client nodes


Daemon nodes

Denis A. Magda - Visorcmd has to be preserved and updated to another protocol

Alexey Kuznetsov - Visorcmd should be merged with control.sh

Alexey Goncharuk

Anton Vinogradov

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko


CacheRebalanceMode.NONE and Rebalance Delay

Denis A. Magda - it can break data consistency in a cluster. Also, remove force rebalance mode as it can be used only if rebalance delay is set.

Alexey Goncharuk - The mode does not make sense and cannot be explained to an end user.

Anton Vinogradov

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko


Indexing SPI

Denis A. Magda - it's highly unlikely that anybody used this. The community supports all the querying engines on its own.

Alexey Goncharuk

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko

Ivan Pavlukhin


QueryEntities and Annotations based configuration of SQL

Denis A. Magda - Alexey Goncharuk is going to propose an alternate API that unites these concepts.

Sergey Kozlov - requires a full redesign; it should be fully compatible with JDBC SQL and vice versa.

Alexey Zinoviev

Pavel Kovalenko

Ivan Pavlukhin


@CentralizedAffinityFunction

Alexey Goncharuk This API is no longer needed after exchange merge was introduced

Sergey Kozlov

Alexey Zinoviev

Pavel Kovalenko


IgniteCache.localPeek, localEntries, localClear, localSize

Ivan Pavlukhin - These methods look more like debugging tools and have confusing semantics. At the very least, they should be moved outside of the IgniteCache facade.

Anton Vinogradov - Such APIs are useful for tests and bug analysis.


Risks and Assumptions

  • Challenges with modifying the existing build and testing procedures.
  • Release policies have to be updated to ensure that the module and core version compatibility matrix is updated regularly.

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-Modularization-td42486.html

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-Proposal-for-Ignite-Extensions-as-a-separate-Bahir-module-or-Incubator-project-td44064.html

http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSS-dependencies-and-release-process-for-Ignite-Extensions-td44478.html


Reference Links

This initiative is also related to the discussion of Apache Ignite APIs update/removal - Apache Ignite 3.0 Wishlist

Tickets





2 Comments

  1. Anton Vinogradov, do you think that IgniteCache.localPeek (and friends) must be placed on the IgniteCache facade?

    1. Ivan Pavlukhin I understand the idea that Ignite is distributed and it does not matter how it keeps the data, but why not?

      For example, localSize is a good way to see how much data you have on this node. Do we have another facade for it?

      Let's discuss each API method separately.

      Anyway, that's not a top-priority question.