In the Land of Streams — Kafka Part 1: A Producer’s Message

A Kafka Streaming Ledger

Message Lifecycle: PPC (Produce, Persist, Consume)
  1. Ingest data files (with click event data) into Kafka
  2. Explain how the producing side works
  3. Producer configuration tuning
  4. Throughput vs Latency

👀 Show me the Code 👀

Producing to Kafka
Producer Internals
  1. The message is serialized using the specified serializer.
  2. The partitioner determines which partition the message should be routed to.
  3. Internally, the producer keeps message buffers: there is one buffer per partition, and each buffer can hold many batches of messages destined for that partition.
  4. Finally, the I/O threads pick up these batches and send them over to the brokers.
    At this point, our messages are in flight from the client to the brokers. The brokers have send/receive network buffers from which the network threads pick up the messages and hand them over to an I/O thread that actually persists them to disk.
  5. On the leader broker, the messages are written to disk and sent to the followers for replication. One thing to note here is that the messages are first written to the PageCache and are only periodically flushed to disk.
    (Note: losing messages between the PageCache and disk is an extreme case, but it is still worth being aware of.)
  6. The followers (in-sync replicas) store the messages and send an acknowledgment back once they have replicated them.
  7. A RecordMetadata response is sent back to the client.
  8. If a failure occurred and we didn’t receive an ACK, we check whether retries are enabled and, if so, resend the message.
  9. The client receives the response.
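
Steps 1 to 4 above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the Kafka client: `serialize`, `choose_partition`, and `ToyAccumulator` are hypothetical names, JSON is an assumed wire format, CRC32 stands in for the real partitioner hash, and batches are capped by record count here, whereas the real producer bounds them by bytes.

```python
import json
import zlib
from collections import defaultdict


def serialize(value):
    """Step 1: turn a Python object into bytes (JSON is an assumed format)."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def choose_partition(key, num_partitions):
    """Step 2: map a key to a partition with a stable hash.

    CRC32 is a stand-in; it is not the hash Kafka's partitioner uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyAccumulator:
    """Step 3: one buffer per partition, each holding a list of batches."""

    def __init__(self, num_partitions, max_batch_records):
        self.num_partitions = num_partitions
        self.max_batch_records = max_batch_records
        self.buffers = defaultdict(list)  # partition -> list of batches

    def append(self, key, value):
        partition = choose_partition(key, self.num_partitions)
        batches = self.buffers[partition]
        # Open a new batch when there is none yet or the last one is full.
        if not batches or len(batches[-1]) >= self.max_batch_records:
            batches.append([])
        batches[-1].append(serialize(value))
        return partition

    def drain(self):
        """Step 4: hand every buffered batch to the (imaginary) I/O thread."""
        drained = [(p, b) for p in sorted(self.buffers) for b in self.buffers[p]]
        self.buffers.clear()
        return drained


accumulator = ToyAccumulator(num_partitions=3, max_batch_records=2)
for i in range(5):
    accumulator.append(key="user-1", value={"event": "click", "seq": i})

batches = accumulator.drain()
```

Because all five records share the key `user-1`, they land in the same partition and are split into batches of at most two records each, which is the grouping the I/O threads would then ship to the broker.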

Let’s better illustrate this with an example.
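
A minimal sketch of the acknowledgment and retry path (steps 7 to 9), assuming a hypothetical in-memory broker stub rather than a real Kafka client. `FlakyBroker` and `send_with_retries` are made-up names for illustration, and the returned dictionary only loosely mimics what a RecordMetadata response carries.

```python
class FlakyBroker:
    """Hypothetical broker stub that fails the first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.log = []  # records the stub has "persisted"

    def append(self, record):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("no ACK received")
        self.log.append(record)
        # Loosely mimics the partition/offset pair in a metadata response.
        return {"partition": 0, "offset": len(self.log) - 1}


def send_with_retries(broker, record, retries):
    """Send once, then resend up to `retries` more times on failure."""
    attempts = 0
    while True:
        attempts += 1
        try:
            metadata = broker.append(record)
            return metadata, attempts
        except ConnectionError:
            # Give up once the retry budget is exhausted.
            if attempts > retries:
                raise


broker = FlakyBroker(failures=2)
metadata, attempts = send_with_retries(broker, b"click-event", retries=3)
```

With two simulated failures and a budget of three retries, the third attempt succeeds and the record is stored exactly once. With `retries=0`, the first failure would surface to the caller instead.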

Wrapping Up

  • Think about your requirements and tune the trade-off between throughput and latency accordingly.
  • Think about the guarantees you need your producers to provide; e.g., for exactly-once semantics, idempotence and/or transactions might be your friends.
  • One detail not mentioned earlier but good to know: if you want to build multi-threaded apps, it's best to create one producer instance and share it among threads.
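
To make the first two points concrete, here is a small sketch that uses Kafka's producer property names as plain dictionary keys. The values are illustrative assumptions rather than recommendations, and `idempotence_problems` is a hypothetical helper that only inspects explicitly set keys (client defaults vary by version, so missing keys are treated as unset).

```python
# Producer settings expressed with Kafka's producer property names.
# The values are illustrative starting points, not recommendations.
THROUGHPUT_PROFILE = {
    "batch.size": 65536,        # larger batches, fewer requests
    "linger.ms": 50,            # wait up to 50 ms for a batch to fill
    "compression.type": "lz4",  # trade CPU for smaller payloads
    "acks": "1",
}

LATENCY_PROFILE = {
    "batch.size": 16384,
    "linger.ms": 0,             # send as soon as a record is available
    "compression.type": "none",
    "acks": "1",
}

IDEMPOTENT_PROFILE = {
    "enable.idempotence": True,
    "acks": "all",
    "retries": 2147483647,
    "max.in.flight.requests.per.connection": 5,
}


def idempotence_problems(config):
    """Return the reasons a config does not explicitly satisfy idempotence.

    Missing keys count as unset instead of falling back to client defaults.
    """
    problems = []
    if config.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be 'all'")
    if not config.get("retries", 0) > 0:
        problems.append("retries must be greater than 0")
    if not 1 <= config.get("max.in.flight.requests.per.connection", 0) <= 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")
    return problems


print(idempotence_problems(IDEMPOTENT_PROFILE))  # no problems reported
print(idempotence_problems(LATENCY_PROFILE))     # lists what is missing
```

The throughput profile waits longer and sends bigger, compressed batches, while the latency profile sends each record as soon as possible. The idempotent profile adds the settings that idempotence depends on.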

Giannis Polyzos

Streaming Data Architect @ Aiven ~ Event Streaming, Stateful Stream Processing and Cloud Native Data Architectures https://www.linkedin.com/in/polyzos/