Current state: "Accepted"
Discussion thread: here
JIRA: KAFKA-9648
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
In a large-scale Kafka cluster that serves a massive number of clients, a preferred leader election (e.g. after a broker restart) can cause many clients to open connections to a single broker within a short period.
Sometimes this fills up the SYN backlog of the Acceptor socket. When that happens, further incoming connections are handled differently depending on the `tcp_syncookies` kernel parameter in Linux:
- Further SYN packets are dropped (`tcp_syncookies = 0`)
  - Typically this is not a critical problem because clients retry the connection (subject to `tcp_syn_retries`)
  - However, the retries delay the successful connection, so this should be avoided as far as possible
- SYN packets are handled with "SYN cookies" (`tcp_syncookies = 1`)
  - In short, SYN cookies are a stateless way to handle SYNs without consuming the SYN backlog
  - This is known to cause a subtle bug in which producers slow down because of an inconsistent window-scaling factor between client and broker
Both outcomes are undesirable, and both can be mitigated by increasing the backlog size passed to `ServerSocket#bind()` as necessary.
We propose a new KafkaConfig:
- An integer config whose value is passed as the backlog parameter to `ServerSocket#bind()`
- Add a new integer config `socket.listen.backlog.size` with a default value of 50
- Pass `socket.listen.backlog.size` to `ServerSocket#bind()` when creating the Acceptor
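The proposed change can be illustrated with a minimal sketch. It is not the actual broker code: the class name `BacklogBindSketch`, the method `openServerSocket`, and the constant `DEFAULT_LISTEN_BACKLOG` are hypothetical names chosen for illustration, and only the standard `java.nio` / `java.net` APIs are used. It shows a listen backlog value being handed to `ServerSocket#bind()` when the listening socket is created.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BacklogBindSketch {
    // Hypothetical stand-in for the value of socket.listen.backlog.size;
    // 50 is the default proposed above.
    static final int DEFAULT_LISTEN_BACKLOG = 50;

    // Opens a non-blocking listening socket and passes the requested
    // backlog to ServerSocket#bind(SocketAddress, int).
    static ServerSocketChannel openServerSocket(String host, int port, int listenBacklog)
            throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        try {
            channel.configureBlocking(false);
            channel.socket().bind(new InetSocketAddress(host, port), listenBacklog);
        } catch (IOException e) {
            // Do not leak the channel if configuration or binding fails.
            channel.close();
            throw e;
        }
        return channel;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, which keeps the sketch runnable anywhere.
        try (ServerSocketChannel channel =
                 openServerSocket("localhost", 0, DEFAULT_LISTEN_BACKLOG)) {
            System.out.println("Listening on " + channel.getLocalAddress());
        }
    }
}
```

The backlog passed to `bind()` is only a request to the operating system. On Linux the kernel caps it at `net.core.somaxconn`, so raising `socket.listen.backlog.size` beyond that limit would likely have no effect unless the kernel parameter is raised as well.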
Compatibility, Deprecation, and Migration Plan
- No impact
Rejected Alternatives
- Increase the static backlog size without introducing a new config
  - Increasing the backlog size may consume more memory, so the appropriate value depends on the environment