Scaling Kafka Processing

Scaling Problem

🧐 How do we scale Kafka processing?

Horizontally, by increasing the number of consumers in a consumer group.

🧐 So can we have an unlimited number of consumers?

No. Within a group, number_of_consumers <= number_of_partitions — any consumers beyond the partition count sit idle.
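A quick way to see this limit is to simulate how partitions get spread across a consumer group. This is a simplified round-robin sketch (the real assignment is performed by Kafka's group coordinator using a configurable assignor), but it shows why extra consumers get nothing to do:

```python
def assign_partitions(num_partitions, consumers):
    """Simplified round-robin partition assignment within one consumer group.
    Each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 6 partitions, 8 consumers: at most 6 consumers get work, 2 sit idle.
result = assign_partitions(6, [f"c{i}" for i in range(8)])
idle = [c for c, parts in result.items() if not parts]
print(idle)  # c6 and c7 receive no partitions
```

Adding a 9th or 10th consumer changes nothing: parallelism is capped by the partition count, which is why the partition count is the real scaling knob.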

🧐 So can we have a very large number of partitions?

  • Yes, but it is recommended to use a calculated number of partitions, since each partition adds overhead on the broker's resources.
  • Choosing the right number of partitions is key: you want it neither too small nor too big.
  • 😰 Going too small? You won't be able to achieve the required throughput and will end up needing more partitions later. Increasing the partition count on a live application can be very challenging (for example, it changes which partition a given key maps to).
  • 😱 Going too big? Overhead on broker resources, i.e. network bandwidth, disk space, etc. Your data infra team won't like it.

✌️ How to calculate the number of partitions?

  • Figure out the expected processing throughput for the entire topic. This depends on your production rate, which in turn depends on the type of data you are producing to the topic.
  • Figure out the maximum throughput you can achieve per partition, i.e. for processing a batch of events from a single partition. Consumers are usually the slow side, since they may call databases, other services, etc., so the bottleneck is often that underlying database or service.
  • Now apply the formula: number_of_partitions = topic_throughput_expected / max_throughput_per_partition
  • e.g. (10 GB/sec) / (100 MB/sec) = 100 partitions. This is a rough calculation, so you can also go a bit higher, say 150, to leave extra room (best to check with the infra team how much the broker overhead increases).

Note: While doing the throughput calculation, consider future traffic patterns, not only current traffic.
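The sizing steps above can be sketched as a small helper. The function name and the 1.5x headroom factor are illustrative choices, not from the post; tune the headroom to your own growth forecast and broker budget:

```python
import math

def partitions_needed(topic_throughput_mb, per_partition_throughput_mb, headroom=1.5):
    """number_of_partitions = topic throughput / max per-partition throughput,
    padded with a headroom factor for future traffic growth."""
    base = math.ceil(topic_throughput_mb / per_partition_throughput_mb)
    return math.ceil(base * headroom)

# Example from the post: 10 GB/sec topic, 100 MB/sec per partition.
print(partitions_needed(10_000, 100, headroom=1.0))  # 100, the raw formula result
print(partitions_needed(10_000, 100))                # 150 with 1.5x headroom
```

Rounding up matters: a topic that needs 1.01 partitions' worth of throughput still needs 2 partitions.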


The GeekNarrator
