IDIEP-142
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

As a user, I want to leverage DNS to keep the client connected to all cluster nodes, even in the face of topology changes.

Ignite 2.x cluster discovery does not work for me: it relies on cluster().nodes(), which returns internal node IP addresses that are not reachable by external clients due to network configuration.

(Current state: the configured host name is resolved to a single IP address only once.)

Use Cases

DNS-based Client Cluster Discovery

Provide a single hostname in the client configuration, and the client connects to all cluster nodes and handles topology changes.

In detail:

  • DNS is configured so that a single cluster hostname resolves to multiple IP addresses, each corresponding to an active cluster node.
  • DNS records are updated (either manually or via orchestration tooling, details are out of scope here) to reflect changes in cluster topology.
  • The client resolves the configured hostname into the set of IP addresses and re-resolves periodically to detect topology changes.
  • The client establishes connections to all resolved IP addresses to enable existing partition awareness failover mechanisms.
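The resolution step above needs nothing beyond the standard JDK: InetAddress.getAllByName returns every address a name currently maps to. A minimal sketch (the class and method names are illustrative, not part of the client API):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DnsResolve {
    /**
     * Resolves a cluster hostname to all addresses it currently maps to.
     * A resolution failure is treated as "no addresses right now" so the
     * caller can keep its existing connections and retry later.
     */
    static List<String> resolveAll(String host) {
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.toList());
        } catch (UnknownHostException e) {
            return List.of();
        }
    }

    public static void main(String[] args) {
        // "localhost" always resolves; a real deployment would pass the
        // cluster hostname, e.g. the headless service name from the scenario below.
        System.out.println(resolveAll("localhost"));
    }
}
```

The client would then open a connection to each returned address, feeding the existing partition awareness machinery.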

Real-World Scenario: Kubernetes Headless Service

The Kubernetes headless service approach is commonly used by distributed systems (Redis, Cassandra, Kafka, etc.).

  • One host name “my-db.default.svc.cluster.local” resolves to all pod addresses
    • Only “ready” pods are returned by default
      • If a node is not in the cluster yet, it is not “ready”
  • Pods come and go, their IPs can change, K8s handles that
  • From inside the container (e.g. on Ignite server) we don’t know the external host name/IP of the pod

Description

  1. When a host name is provided in IgniteClientConfiguration#addresses, the client should resolve it and use all returned IP addresses as potential node addresses.
  2. The client should re-resolve all known host names and connect to newly discovered addresses:
    1. On any connection error (it might indicate a node failure)
    2. On a primary replica change (it might indicate a topology change)
    3. On a timer (to handle DNS caching)
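The re-resolution step can be sketched as a small helper that diffs the freshly resolved set against the addresses already known, so all three triggers (connection error, primary replica change, timer) can share one code path. The class name and structure below are illustrative assumptions, not the actual client internals:

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AddressRefresher {
    private final Set<String> known = ConcurrentHashMap.newKeySet();
    private final String host;

    public AddressRefresher(String host) {
        this.host = host;
    }

    /**
     * Re-resolves the hostname and returns only the addresses not seen
     * before; the caller opens connections to those. Invoked on connection
     * errors, primary replica changes, and periodically on a timer.
     */
    public List<String> refresh() {
        List<String> discovered = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                if (known.add(a.getHostAddress()))
                    discovered.add(a.getHostAddress());
            }
        } catch (Exception ignored) {
            // Resolution failed: keep current connections, retry on next trigger.
        }
        return discovered;
    }
}
```

The timer trigger would simply schedule refresh() on a ScheduledExecutorService; a second call with an unchanged DNS answer returns an empty list, so repeated triggers are cheap.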


Note: we could use the DNS record TTL value to trigger a refresh, but the Java standard library does not provide an API to read it, and we want to avoid adding a third-party dependency just for that.
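While the per-record TTL is not accessible, the JVM's own positive-lookup cache duration can be tuned via the networkaddress.cache.ttl security property, so that a timer-based refresh actually reaches the DNS server instead of hitting a stale cache. A minimal sketch (the 30-second value is an arbitrary example, not a recommendation):

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // networkaddress.cache.ttl (seconds) controls how long the JVM
        // caches successful hostname lookups. Lowering it lets periodic
        // re-resolution observe DNS updates sooner.
        Security.setProperty("networkaddress.cache.ttl", "30");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

This property must be set before the first lookup of the affected name; it is a process-wide setting, so documenting it for users may be preferable to setting it from library code.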

Out of Scope

  • Reconnect/retry/failover - already implemented
  • Partition awareness - works over active connections and is not concerned with discovery logic

Risks and Assumptions

N/A

Discussion Links

Tickets

