| ID | IEP-142 |
| Author | |
| Sponsor | |
| Created | |
| Status | DRAFT |
Motivation
As a user, I want to leverage DNS to keep the client connected to all cluster nodes, even in the face of topology changes.
Ignite 2.x cluster discovery does not work for me: it relies on cluster().nodes(), which returns internal node IP addresses that external clients cannot reach because of the network configuration.
(Current state: the host name is resolved only once, to a single IP address.)
Use Cases
DNS-based Client Cluster Discovery
Provide a single host name in the client configuration; the client then connects to all nodes and handles topology changes.
In detail:
- DNS is configured so that a single cluster hostname resolves to multiple IP addresses, each corresponding to an active cluster node.
- DNS records are updated (either manually or via orchestration tooling, details are out of scope here) to reflect changes in cluster topology.
- The client resolves the configured hostname into the set of IP addresses and re-resolves periodically to detect topology changes.
- The client establishes connections to all resolved IP addresses so that the existing partition awareness and failover mechanisms can use them.
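The resolve-and-compare step described above needs nothing beyond the JDK. The following is a minimal sketch; `DnsTopology` and its methods are hypothetical names, not an Ignite API. `InetAddress.getAllByName` returns every address that DNS currently reports for the name (subject to the JVM's lookup cache), and comparing two successive resolutions gives the addresses the client should connect to.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for detecting topology changes through DNS. */
final class DnsTopology {
    /** Resolves a host name to every IP address DNS currently returns (subject to the JVM lookup cache). */
    static Set<InetAddress> resolve(String host) throws UnknownHostException {
        return new LinkedHashSet<>(Arrays.asList(InetAddress.getAllByName(host)));
    }

    /** Addresses present in the new resolution but not in the previous one: candidates to connect to. */
    static Set<InetAddress> added(Set<InetAddress> before, Set<InetAddress> after) {
        Set<InetAddress> result = new LinkedHashSet<>(after);
        result.removeAll(before);
        return result;
    }

    /** Addresses that disappeared from DNS, e.g. a pod that is no longer ready. */
    static Set<InetAddress> removed(Set<InetAddress> before, Set<InetAddress> after) {
        return added(after, before);
    }
}
```

Removed addresses are reported only for completeness. In this proposal, dropping or retrying connections is left to the existing reconnect logic.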
The Kubernetes (K8s) headless service approach is commonly used for distributed systems (Redis, Cassandra, Kafka, etc.):
- One host name “my-db.default.svc.cluster.local” resolves to all pod addresses
- Only “ready” pods are returned by default
- If a node is not in the cluster yet, it is not “ready”
- Pods come and go and their IPs can change; K8s keeps the DNS records up to date
- From inside the container (e.g., on an Ignite server node), we don’t know the external host name or IP of the pod
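With a headless service, a single client address entry such as `my-db.default.svc.cluster.local:10800` would expand to one endpoint per ready pod. The sketch below assumes entries of the form `host` or `host:port`; the class name `AddressExpander` and the default port 10800 are illustrative assumptions, and IPv6 literals are not handled.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: turns configured address strings into concrete endpoints. */
final class AddressExpander {
    /** Illustrative default port, used when an entry has no explicit port. */
    static final int DEFAULT_PORT = 10800;

    /** Expands each "host" or "host:port" entry into one endpoint per resolved IP address. */
    static Set<InetSocketAddress> expand(List<String> addresses) throws UnknownHostException {
        Set<InetSocketAddress> endpoints = new LinkedHashSet<>();
        for (String entry : addresses) {
            int colon = entry.lastIndexOf(':');
            String host = colon < 0 ? entry : entry.substring(0, colon);
            int port = colon < 0 ? DEFAULT_PORT : Integer.parseInt(entry.substring(colon + 1));
            // A host name may resolve to several IPs (e.g. a K8s headless service); keep them all.
            for (InetAddress ip : InetAddress.getAllByName(host)) {
                endpoints.add(new InetSocketAddress(ip, port));
            }
        }
        return endpoints;
    }
}
```

Because `InetSocketAddress` equality compares the resolved IP and the port, two entries that point at the same node collapse into one endpoint. In this sketch one unresolvable entry aborts the whole expansion; a real client would probably skip it and keep the rest.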
Description
- When a host name is provided in IgniteClientConfiguration#addresses, the client should resolve it and use all returned IPs as potential node addresses
- Resolve all known host names again and connect to newly discovered addresses:
  - On any connection error (might indicate node failure)
  - On primary replica change (might indicate topology change)
  - On a timer (to handle DNS caching)
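The three refresh triggers listed above can all funnel into one re-resolution routine. This is a minimal sketch: `DnsRefresher`, its constructor callbacks, and the trigger method names are hypothetical, not Ignite APIs. The resolver callback is assumed to re-resolve every host name from the client's configured addresses, and the connector callback to open a connection to one address.

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: re-resolves host names on each trigger and connects to new addresses. */
final class DnsRefresher implements AutoCloseable {
    private final Callable<Set<InetSocketAddress>> resolver; // re-resolves all configured host names
    private final Consumer<InetSocketAddress> connector;     // opens a connection to one address
    private final Set<InetSocketAddress> known = new HashSet<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "dns-refresh");
        t.setDaemon(true);
        return t;
    });

    DnsRefresher(Callable<Set<InetSocketAddress>> resolver, Consumer<InetSocketAddress> connector) {
        this.resolver = resolver;
        this.connector = connector;
    }

    /** Timer trigger: picks up changes that DNS caching may have hidden. */
    void start(long periodMillis) {
        timer.scheduleWithFixedDelay(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Swallow, so that one failure does not cancel all future scheduled runs.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Trigger: any connection error (might indicate node failure). */
    int onConnectionError() {
        return refresh();
    }

    /** Trigger: primary replica change (might indicate topology change). */
    int onPrimaryReplicaChange() {
        return refresh();
    }

    /** Re-resolves and connects to addresses not seen before; returns how many were new. */
    synchronized int refresh() {
        Set<InetSocketAddress> resolved;
        try {
            resolved = resolver.call();
        } catch (Exception e) {
            return 0; // DNS failure: keep existing connections and wait for the next trigger
        }
        int added = 0;
        for (InetSocketAddress addr : resolved) {
            // Addresses are never forgotten here; reconnect/failover stays with existing logic.
            if (known.add(addr)) {
                connector.accept(addr);
                added++;
            }
        }
        return added;
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

For simplicity the error and replica-change triggers run `refresh()` on the caller's thread; a real client would likely move DNS lookups off its I/O threads.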
Note: we could use the DNS record TTL to trigger a refresh, but the Java standard library provides no API to obtain it, and we want to avoid adding a third-party dependency just for that.
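Since the record TTL is not available, the timer period has to be chosen some other way. One option, an assumption rather than part of this proposal, is to account for the JVM's own lookup cache: the `networkaddress.cache.ttl` security property sets how many seconds `InetAddress` caches successful lookups, so re-resolving more often than that mostly returns cached results. `RefreshInterval` is a hypothetical name.

```java
import java.security.Security;
import java.time.Duration;

/** Hypothetical helper for choosing the DNS re-resolution timer period. */
final class RefreshInterval {
    /** Security property (seconds) controlling how long the JVM caches successful lookups. */
    static final String CACHE_TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Returns the configured period, raised to the JVM cache TTL when that is longer. */
    static Duration choose(Duration configured) {
        String value = Security.getProperty(CACHE_TTL_PROPERTY);
        if (value == null) {
            return configured; // unset: the JDK applies an implementation-specific default
        }
        long seconds;
        try {
            seconds = Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return configured; // malformed value: ignore it
        }
        if (seconds <= 0) {
            // 0 disables caching; -1 caches forever, in which case re-resolution
            // through InetAddress would never observe DNS changes at all.
            return configured;
        }
        Duration cacheTtl = Duration.ofSeconds(seconds);
        return configured.compareTo(cacheTtl) < 0 ? cacheTtl : configured;
    }
}
```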
Out of Scope
- Reconnect/retry/failover - already implemented
- Partition awareness - works over active connections, not concerned with discovery logic
Risks and Assumptions
N/A
Discussion Links
Tickets