Using custom Kafka headers for advanced message processing

Tinybird is happy to announce that support for Kafka headers is coming next month! Kafka headers are key-value pairs that can be attached to Kafka messages to provide metadata and additional information about the message. Read on to learn more about the use cases headers support and how Tinybird will implement this new feature.
Alejandro Martín
Product Manager
May 16, 2023

One of the critical challenges in event-driven architectures is efficient message routing. Kafka headers are key-value pairs that can be attached to Kafka messages to provide metadata and additional information about the message, helping solve this problem.

Imagine a scenario where you need to route messages to specific services, applications, or tenants based on custom criteria. By storing routing information in headers, such as target service identifiers, you gain the ability to implement sophisticated routing and filtering mechanisms on the consumer side. Kafka headers enable seamless message distribution and streamline your event-driven workflows. 

To date, Tinybird has not supported Kafka headers within our native Kafka connector. But in the coming weeks, we’ll launch support for Kafka headers and the message routing efficiencies they enable.

Read on to learn more about the use cases Kafka headers support, and how we’ve chosen to implement this new feature in Tinybird. And if you happen to be at Kafka Summit, drop by our booth (S3) and say “hi.” We would love to meet you, and we have some fun goodies for you!

What are Kafka message headers?

Message headers are a key feature of Apache Kafka that allows you to attach metadata to individual records (or messages) in the form of key-value pairs. 

Kafka headers are analogous to HTTP headers, which provide additional metadata about the request being transferred, such as content type, compression details, and rate limit status. Kafka header metadata is likewise kept separate from the message key and value, which typically carry the main content of the message.

How can Kafka headers be used?

Custom headers in Apache Kafka can be tailored to specific application requirements by adding context or metadata to your streaming data. Here are some common use cases for message headers in Apache Kafka:

  • Message routing: Headers can be used to store routing information, such as the target service, application, or tenant. This enables more sophisticated routing and filtering mechanisms on the consumer side, based on the header information.
  • Message versioning: Storing the version of a message schema in a header can help manage schema evolution and backward compatibility. Consumers can use the version information to handle different message versions accordingly.
  • Correlation and tracing: Headers can store correlation IDs and trace information for distributed tracing systems, helping track the flow of messages across multiple services and applications. This metadata can help diagnose and troubleshoot issues in distributed systems.
  • User or client identifier: Storing a user or client identifier as a custom header can help track which user or system generated a particular message. This information can be useful for analytics, auditing, and troubleshooting purposes.
  • Content type: Indicating the type or format of the message value (e.g., JSON, Avro, XML) allows consumers to understand how to deserialize and process the message correctly.
  • Priority or importance level: Attaching a priority level to messages as a custom header can be beneficial for systems that need to process messages based on their urgency or importance. Consumers can use this information to prioritize or filter messages during consumption.
  • Compression: If a message payload is compressed, the compression algorithm (e.g., Gzip, Snappy, LZ4) can be specified in a header to inform consumers about the decompression method required.
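As a minimal sketch of the routing use case, the snippet below dispatches a message based on a header. It assumes headers arrive as a list of ``(key, bytes)`` pairs, which is how Python clients such as kafka-python and confluent-kafka expose them; the ``target-service`` header name, the handler names, and the "last value wins" rule for repeated keys are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Record headers modeled as (key, raw bytes) pairs.
Headers = List[Tuple[str, bytes]]


def header_value(headers: Headers, key: str) -> Optional[str]:
    """Return the last value stored under `key`, decoded as UTF-8, or None."""
    # Header keys may repeat; letting the last one win is an assumption.
    for name, value in reversed(headers):
        if name == key and value is not None:
            return value.decode("utf-8")
    return None


def route(headers: Headers, handlers: Dict[str, Callable[[bytes], str]], payload: bytes) -> str:
    """Dispatch the payload to the handler named by the `target-service` header."""
    target = header_value(headers, "target-service")  # hypothetical header name
    handler = handlers.get(target) if target is not None else None
    if handler is None:
        return "dropped"  # no routing header, or no handler registered for it
    return handler(payload)


# Hypothetical consumer-side handlers, one per target service.
handlers = {
    "billing": lambda payload: f"billing got {len(payload)} bytes",
    "search": lambda payload: f"search got {len(payload)} bytes",
}

headers = [("target-service", b"billing"), ("schema-version", b"2")]
print(route(headers, handlers, b'{"amount": 10}'))
```

The consumer never has to deserialize the payload to decide where it goes, which is the main appeal of keeping routing information in headers.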

In addition to the use cases shared above, you may need to add encrypted metadata, or support feature flags and experiment IDs to aid your development process. If you have a system that handles multi-lingual or region-specific data, your custom header could specify a locale or language to help on the consumer side of your stream.

As you can see, there are many great uses for Kafka message headers, and utilizing them can improve the flexibility, traceability, and interoperability of your event-driven systems.

Note that while Kafka headers are completely customizable, you should be mindful of the trade-offs of incorporating them. Adding headers increases the size of your Kafka messages, and that will impact the storage and network overhead in your Kafka cluster. 

It's crucial to strike a balance between enriching messages with useful metadata and maintaining optimal performance and resource usage. Deciding which headers you use will depend on the unique requirements of your application.
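To get a feel for this trade-off, here is a rough back-of-the-envelope sketch. It only sums the key and value bytes of each header and ignores the small length prefixes the record format adds, so treat the result as a lower bound. The header names and the traffic volume are hypothetical.

```python
from typing import List, Tuple

Headers = List[Tuple[str, bytes]]


def header_overhead_bytes(headers: Headers) -> int:
    """Rough lower bound on the bytes that headers add to one record."""
    # Counts only the UTF-8 key bytes and the raw value bytes; per-header
    # length prefixes in the record format are deliberately ignored.
    return sum(len(key.encode("utf-8")) + len(value) for key, value in headers)


headers = [("event-type", b"created"), ("trace-id", b"4bf92f3577b34da6")]
per_message = header_overhead_bytes(headers)
messages_per_day = 50_000_000  # hypothetical traffic volume

print(per_message, "bytes per message")
print(per_message * messages_per_day / 1e9, "GB/day before compression and replication")
```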

An example of using Kafka headers effectively

Let’s say you have a ``contact`` entity in your Kafka streams, and that entity can be described with ``created``, ``updated``, and ``deleted`` events. It's common practice to use the same Kafka topic for all the event types that apply to the same entity, keyed by the entity's ID, so that Kafka delivers each contact's events in order (ordering is guaranteed within a partition) and consumers can derive the correct status for the contact.

In this case, you could use a custom Kafka header to indicate the type of event and help you analyze the data downstream. If you include the type of event (e.g. created, updated, or deleted) in the header, you could easily count the number of active contacts simply by subtracting the total number of ``deleted`` events from the total number of ``created`` events. Having this data available in the headers makes it much easier to identify these event types with minimal effort.
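The arithmetic can be sketched in a few lines. This example assumes each record carries a hypothetical ``event-type`` header and that headers are available as ``(key, bytes)`` pairs; the sample records are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Headers = List[Tuple[str, bytes]]


def count_event_types(records: Iterable[Headers]) -> Counter:
    """Tally records by the value of their `event-type` header."""
    counts: Counter = Counter()
    for headers in records:
        for key, value in headers:
            if key == "event-type":  # hypothetical header name
                counts[value.decode("utf-8")] += 1
    return counts


# Made-up sample: three contacts created, one updated, one deleted.
records = [
    [("event-type", b"created")],
    [("event-type", b"created")],
    [("event-type", b"updated")],
    [("event-type", b"created")],
    [("event-type", b"deleted")],
]

counts = count_event_types(records)
active_contacts = counts["created"] - counts["deleted"]
print(active_contacts)
```

Note that the message payloads are never parsed here; the header alone is enough to classify each event.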

How will Tinybird support Kafka headers? 

Kafka headers are a powerful way to enhance streams with useful metadata. Since Kafka headers have so many use cases, and several customers have asked us to include them in Tinybird, we will be adding support for Kafka headers in all of our Kafka-based Connectors (support for Kafka, Confluent, and soon Redpanda).

{%tip-box title="Note"%}At Tinybird, we’re obsessed with developer experience. We want to make sure that all our new releases directly solve customer needs in a simple and intuitive manner. What we share below represents our current plan for implementing Kafka headers. If you think you’d want to use Kafka headers in your Tinybird Workspaces and have feedback on our approach, please let us know! You can reach us in our Slack community.{%tip-box-end%}

As of today, we already include optional metadata columns ``__topic``, ``__partition``, ``__offset``, ``__timestamp``, and ``__key`` for Kafka Data Sources. We also include an optional ``__value`` column which stores the entire unparsed contents of the Kafka message. These metadata columns already provide valuable information that can be used to enhance your analysis downstream.

To further enhance Kafka Data Sources, we’ll add an optional ``__headers`` column when you create a Kafka Data Source. When you set up a Kafka Data Source in Tinybird, you’ll see this optional column when you configure your Data Source schema.

A screenshot showing how Tinybird processes Kafka headers and raw values into a Data Source
Figure - Including Kafka header contents in a __headers column will be analogous to including the message contents in a __value column.

If you choose to enable the ``__headers`` column, your Kafka headers will be written to that column with a simple key-value JSON structure. Here is an example header that might be written to the ``__headers`` column in an example ``banking_transactions`` Data Source:
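As a rough illustration of what a flat key-value JSON rendering could look like, the Python sketch below serializes a set of headers into a single JSON object. The header names and values are hypothetical, and the exact serialization Tinybird produces may differ from this sketch.

```python
import json
from typing import Dict, List, Tuple

Headers = List[Tuple[str, bytes]]


def headers_to_json(headers: Headers) -> str:
    """Render headers as one flat JSON object with string values."""
    # Assumes header values are UTF-8 text; repeated keys keep the last value.
    flat: Dict[str, str] = {key: value.decode("utf-8") for key, value in headers}
    return json.dumps(flat)


# Hypothetical headers on a banking_transactions message.
headers = [("event-type", b"transfer"), ("currency", b"EUR"), ("schema-version", b"2")]
print(headers_to_json(headers))
# prints {"event-type": "transfer", "currency": "EUR", "schema-version": "2"}
```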

We then recommend using the ``JSONExtract()`` function to access individual key-value header entries in your SQL nodes:
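In a Tinybird Pipe this lookup would be written in SQL. To show the logic outside of SQL, here is a Python sketch of the same idea: parse the JSON stored in the headers column and pull out one key. The ``extract_header`` helper and the header names are hypothetical and are not part of any Tinybird API.

```python
import json
from typing import Optional


def extract_header(headers_json: str, key: str) -> Optional[str]:
    """Return one header value from a JSON object string, or None if absent."""
    try:
        parsed = json.loads(headers_json)
    except json.JSONDecodeError:
        return None  # malformed or empty column value
    if not isinstance(parsed, dict):
        return None  # only flat key-value objects are expected here
    value = parsed.get(key)
    return None if value is None else str(value)


row_headers = '{"event-type": "transfer", "currency": "EUR"}'
print(extract_header(row_headers, "currency"))
```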

{%tip-box title="Avro encoded headers"%}Avro-encoded messages are already supported in Tinybird, so Avro-encoded headers will also be supported. Regardless of the header format, headers will always be rendered as JSON in your Tinybird Data Source.{%tip-box-end%}

From there, you can publish any queries you’ve built using Kafka headers as API endpoints. This can be especially useful when you want to map query parameters in your APIs to Kafka header metadata.

Building on the example above, you could augment your ``JSONExtract()`` function to dynamically return results based on header metadata supplied through a query parameter, as below:
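The real endpoint would be a SQL node with a templated parameter. As a language-neutral sketch of the behavior, the Python below filters rows by a header value passed in as a parameter and aggregates the matching messages. The rows, header names, and the ``amount`` field are all made up for illustration.

```python
import json
from typing import Dict, List

# Hypothetical rows from a banking_transactions Data Source.
rows: List[Dict[str, str]] = [
    {"__headers": '{"event-type": "transfer", "currency": "EUR"}', "__value": '{"amount": 120.0}'},
    {"__headers": '{"event-type": "deposit", "currency": "EUR"}', "__value": '{"amount": 50.0}'},
    {"__headers": '{"event-type": "transfer", "currency": "USD"}', "__value": '{"amount": 30.0}'},
]


def total_amount(rows: List[Dict[str, str]], header_key: str, header_value: str) -> float:
    """Sum `amount` over rows whose header `header_key` equals `header_value`."""
    total = 0.0
    for row in rows:
        headers = json.loads(row["__headers"])
        if headers.get(header_key) == header_value:
            total += json.loads(row["__value"])["amount"]
    return total


# `header_value` plays the role of the API query parameter.
print(total_amount(rows, "event-type", "transfer"))
```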

Conclusion

Overall, Kafka headers are a powerful way to enhance Kafka streams with useful metadata, providing greater flexibility, traceability, and interoperability in event-driven systems.

Custom headers are used to tailor messages to specific application requirements, such as user or client identification, message priority, message expiration, encryption or signing information, locale or language, and feature flags or experiment IDs.

With the upcoming addition of support for Kafka headers in Tinybird, you’ll be able to access them as a ``__headers`` column in your Tinybird Data Sources and extract individual key-value pairs in your SQL queries to enhance your published APIs.

If you’re new to Tinybird, give it a try! You can start for free on our Build Plan, which is more than enough for small projects and has no time limit, and use the Kafka connector to quickly ingest data from your Kafka topics.