This page summarizes the results of my analysis of the optimal value of max.in.flight.requests.per.connection, as well as the performance impact of acks=all.
Test Setup
- 3 brokers on AWS, d2.xlarge instances: 3x2TB locally attached disks. 32GB RAM, 4 Xeon cores
- 1 client machine in same availability zone.
- Each performance run produced 10GB of data.
Tests were run against Kafka commit 6bd73026.
Goal
- Understand the performance curve for different values of max.in.flight.requests.per.connection (abbreviated max.inflight below). We expect better throughput and latency for higher values of this variable, but when do the benefits tail off?
- If we want to support max.inflight > 1 when enabling idempotence, should we pick a single value and not allow further configuration? If so, what should this value be?
- Understand the effect of acks=all compared to acks=1. If it is slower, why? Can we make acks=all the default?
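The two goals above amount to sweeping two producer settings against each other. A minimal sketch of that sweep follows. The config keys `acks` and `max.in.flight.requests.per.connection` are the real producer settings, but the value range for the in-flight sweep and the helper name `producer_config_matrix` are assumptions made for illustration, not the exact list used in these runs.

```python
from itertools import product

# Real producer config keys; the value ranges are assumed for illustration.
ACKS_VALUES = ["1", "all"]
MAX_IN_FLIGHT_VALUES = [1, 2, 3, 4, 5]  # assumed sweep, not the page's exact list


def producer_config_matrix(acks_values, max_in_flight_values):
    """Return one producer config dict per (acks, max.in.flight) combination."""
    return [
        {
            "acks": acks,
            "max.in.flight.requests.per.connection": str(in_flight),
        }
        for acks, in_flight in product(acks_values, max_in_flight_values)
    ]


configs = producer_config_matrix(ACKS_VALUES, MAX_IN_FLIGHT_VALUES)
for cfg in configs:
    print(cfg)
```

Each dict could then be written out as key=value pairs for whichever load generator drives the run. This page does not say which one was used.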
Summary of results
p95 Latency
acks=1 | acks=all |
---|---|
Throughput
acks=1 | acks=all |
---|---|
Observations
- Throughput and latency show big improvements from max.inflight=1 to max.inflight=2, but the performance plateaus thereafter.
- Throughput degrades slightly from acks=1 to acks=all.
- There is a major (2x) degradation in p95 latency from acks=1 to acks=all, except for 64-byte messages.
- The plots above are for 9 partitions. If you keep increasing the number of partitions, the differences between acks=1 and acks=all, and between max.inflight=1 and max.inflight=2, become smaller and smaller.
- This is not surprising: as the number of partitions increases, the payload of each ProduceRequest gets bigger, so the relative overhead of additional operations per request gets smaller.
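The amortization argument in the last observation can be made concrete with a toy cost model: each request pays a fixed cost plus a cost proportional to its payload. This is only a sketch. The function name `relative_overhead` and every number below are hypothetical and were not measured in these tests.

```python
def relative_overhead(fixed_cost_us, per_byte_cost_us, payload_bytes):
    """Fraction of a request's total cost that is fixed per-request overhead."""
    variable_cost_us = per_byte_cost_us * payload_bytes
    return fixed_cost_us / (fixed_cost_us + variable_cost_us)


# Hypothetical numbers: 200 us fixed cost per request, 0.01 us per payload byte,
# and 16 KiB of batched data per partition carried in each request.
FIXED_US = 200.0
PER_BYTE_US = 0.01
BYTES_PER_PARTITION = 16 * 1024

for partitions in (1, 9, 90):
    payload = partitions * BYTES_PER_PARTITION
    share = relative_overhead(FIXED_US, PER_BYTE_US, payload)
    print(f"{partitions:>3} partitions: fixed overhead is {share:.1%} of request cost")
```

Under this model the fixed share shrinks monotonically as the payload grows, so any extra per-request work (such as waiting for replication under acks=all) matters less in relative terms.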
More on acks=1 and acks=all
For the run above, the p50 latency for acks=1 and acks=all is counterintuitive: it is actually better for acks=all, and it is worse for max.inflight=4 than for max.inflight=3.
acks=1 | acks=all |
---|---|
At this time, we have no explanation for the difference in performance between acks=all and acks=1:
- Broker metrics for both runs are similar (NetworkProcessorAvgIdlePercent, RequestHandlerIdlePercent, TotalProduceTime, etc.)
- GC logs are similar in terms of object allocations, the number of collections per second, and pause times.
Conclusion
From these tests, we can conclude the following:
- We should optimize the producer for max.inflight=2. The data suggests that there is really no benefit to any other value, especially when there is low latency between the client and the broker.
- We don't understand the behavior of acks=all and acks=1 across different workloads and across the entire latency spectrum. We should leave the default as is.
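One way to see why max.inflight=2 could be enough when client-to-broker latency is low is a simplified pipelining model. This is an illustrative assumption, not something derived from the measurements. It assumes a connection is limited either by how many requests can be outstanding per round trip or by the rate at which the broker services that connection's requests one at a time. The function name and the numbers are hypothetical.

```python
def requests_per_second(max_in_flight, rtt_ms, service_ms):
    """Simplified pipelining model for a single connection.

    pipeline_limit: at most max_in_flight requests complete per (rtt + service).
    service_limit:  the broker handles one request per service_ms.
    """
    pipeline_limit = max_in_flight / (rtt_ms + service_ms)  # requests per ms
    service_limit = 1.0 / service_ms                        # requests per ms
    return 1000.0 * min(pipeline_limit, service_limit)


# Hypothetical numbers: 0.5 ms round trip (same availability zone),
# 1 ms of broker-side work per request.
RTT_MS = 0.5
SERVICE_MS = 1.0

for n in range(1, 6):
    rate = requests_per_second(n, RTT_MS, SERVICE_MS)
    print(f"max.inflight={n}: {rate:.0f} requests/s")
```

In this model the rate stops improving once max_in_flight reaches 1 + rtt/service. When the round trip is no longer than the service time, that threshold is 2, which is consistent with the plateau observed above. A higher-latency link would push the threshold up.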