Kafka

Apache Kafka

  • Apache Kafka is a publish/subscribe messaging system
  • Also known as a distributed commit log
  • Each commit can be time-stamped
  • Messages are queued into Topics
  • Each Topic may contain multiple Partitions

Producers

  • Messages may be written to random/specific Partitions
  • Fire-and-forget: Send a message to the server and don’t care if it arrives successfully or not.
  • Synchronous send: Send a message, and wait to see if the send was successful or not.
  • Asynchronous send: Send method with a callback function, which gets triggered when it receives a response from the Kafka broker.

Consumers

  • Subscribes to one or more topics
  • No guarantee of message time-ordering across a topic, just within a single partition
  • Commit offsets automatically or at certain intervals
  • Work as part of a consumer group, where each partition is only consumed by one member
  • If a consumer fails, the remaining group members will rebalance the partitions
  • Multiple consumer groups may consume same topic independently
  • Kafka buffers data and allows consumers to operate in asynchronous multirate systems

Broker

  • A single Kafka server is called a broker.
  • Partitions are replicated in different brokers. Default topic replication factor is 3. Redundancy of messages in case of broker failure.
  • One broker becomes leader for a partition. Leadership changes in case of broker failure.
  • Producers and consumers connect to the leader

Kafka configuration

Several notable settings in Kafka deployment are mentioned below.

  • Producer ConfigMap
      "message.max.bytes": 100000000 //100MB. Defaults to 1MB.
    
  • Consumer ConfigMap
      "group.id": group //Consumer group id 
      "auto.offset.reset": "earliest" //or “latest”  
    
  • Kafka Config
      KAFKA_LOG_CLEANUP_POLICY=delete #Cleanup policy for segments beyond the retention window
      KAFKA_LOG_RETENTION_MINUTES=1 #Minutes to keep a log file before deleting. Default 168 hours.
    
  • Always retrieve the latest message by resetting the committed offset. The example below is in Golang.
      // LatestOffset resets consumer offset to the latest message in the topic
      func LatestOffset(c *kafka.Consumer) {
    
          // Record the current topic-partition assignments
          tpSlice, _ := c.Assignment()
    
          //Obtain the last message offset for all topic-partition
          for index, tp := range tpSlice {
              _, high, _ := c.QueryWatermarkOffsets(*(tp.Topic), tp.Partition, 100)
              tpSlice[index].Offset = kafka.Offset(high)
          }
    
          //Consume the last message in topic-partition
          c.Assign(tpSlice)
      }
    

Updated:

Leave a comment