Topic compaction
To use compaction:
- You need to give messages keys, as topic compaction in Pulsar takes place on a per-key basis (i.e. messages are compacted based on their key). For a stock ticker use case, the stock symbol (e.g. AAPL or GOOG) could serve as the key (more on this below). Messages without keys will be left alone by the compaction process.
- Compaction can be configured to run automatically, or you can manually trigger compaction using the Pulsar administrative API.
- Your consumers must be configured to read from compacted topics (Java consumers, for example, have a readCompacted setting that must be set to true). If this configuration is not set, consumers will still be able to read from the non-compacted topic.
The classic example of a topic that could benefit from compaction would be a stock ticker topic through which consumers can access up-to-date values for specific stocks. Imagine a scenario in which messages carrying stock value data use the stock symbol as the key (GOOG, AAPL, TWTR, etc.). Compacting this topic would give consumers on the topic two options:
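- They can read from the original, non-compacted topic if they need access to all of the topic’s messages, including older values.
- They can read from the compacted topic if they only want to see the most up-to-date messages.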
Thus, if you’re using a Pulsar topic called stock-values, some consumers could have access to all messages in the topic (perhaps because they’re performing some kind of number crunching of all values in the last hour) while the consumers used to power the real-time stock ticker only see the compacted topic (and thus aren’t forced to process outdated messages). Which variant of the topic any given consumer pulls messages from is determined by the consumer’s configuration.
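The difference between the two views can be illustrated without a broker. The sketch below is a plain-Java simulation, not Pulsar code: the Tick class and the compact method are hypothetical names introduced here, and the ordering of the surviving messages (by the position of each key's latest message) is an assumption of the sketch. It only shows the per-key rule that the newest message for a key replaces older ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    /** Hypothetical stand-in for a keyed message on the stock-values topic. */
    static final class Tick {
        final String key;
        final double price;

        Tick(String key, double price) {
            this.key = key;
            this.price = price;
        }
    }

    /** Returns only the latest message for each key; the input log is left untouched. */
    static List<Tick> compact(List<Tick> log) {
        Map<String, Tick> latest = new LinkedHashMap<>();
        for (Tick t : log) {
            // Remove first so the key moves to the position of its newest message.
            latest.remove(t.key);
            latest.put(t.key, t);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Tick> log = List.of(
                new Tick("GOOG", 100.0),
                new Tick("AAPL", 200.0),
                new Tick("GOOG", 101.5),
                new Tick("TWTR", 30.0),
                new Tick("AAPL", 199.0));

        // Non-compacted view: every message, e.g. for number crunching.
        System.out.println("full log size: " + log.size());

        // Compacted view: one message per stock symbol, e.g. for the live ticker.
        for (Tick t : compact(log)) {
            System.out.println(t.key + " " + t.price);
        }
    }
}
```

A consumer that needs history would work with the full list, while a ticker would only need the result of compact.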
For example, to trigger compaction when the backlog reaches 100MB:
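Assuming the namespace is my-tenant/my-namespace (a placeholder), the threshold can be set with the set-compaction-threshold subcommand of pulsar-admin namespaces:

```shell
# my-tenant/my-namespace is a placeholder; substitute your own namespace.
$ bin/pulsar-admin namespaces set-compaction-threshold \
  --threshold 100M my-tenant/my-namespace
```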
Configuring the compaction threshold on a namespace will apply to all topics within that namespace.
In order to run compaction on a topic, you need to use the topics compact command for the pulsar-admin CLI tool. Here’s an example:
$ bin/pulsar-admin topics compact \
persistent://my-tenant/my-namespace/my-topic
The pulsar-admin tool runs compaction via the Pulsar REST API. To run compaction in its own dedicated process, i.e. not through the REST API, you can use the pulsar compact-topic command. Here’s an example that points the command at a broker configuration file:
$ bin/pulsar compact-topic \
--broker-conf /path/to/broker.conf \
--topic persistent://my-tenant/my-namespace/my-topic
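If the broker configuration is in the default location (conf/broker.conf), the --broker-conf flag can be omitted:
$ bin/pulsar compact-topic \
--topic persistent://my-tenant/my-namespace/my-topic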
When should I trigger compaction?
How often you trigger compaction will vary widely based on the use case. If you want a compacted topic to be extremely speedy on read, then you should run compaction fairly frequently.
Pulsar consumers and readers need to be configured to read from compacted topics. The sections below show you how to enable compacted topic reads for Pulsar’s language clients.
In order to read from a compacted topic using a Java consumer, the readCompacted parameter must be set to true. Here’s an example consumer for a compacted topic:
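A minimal sketch of such a consumer follows, using the builder API of the Pulsar 2.x Java client. The service URL, topic name, and subscription name are placeholders chosen for illustration, not values from this document. Note that readCompacted applies to persistent topics with a single active consumer (exclusive or failover subscriptions); the default exclusive subscription type is used here.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class CompactedTopicConsumerExample {
    public static void main(String[] args) throws PulsarClientException {
        // Assumed broker address; adjust for your cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Consumer<byte[]> compactedTopicConsumer = client.newConsumer()
                .topic("persistent://my-tenant/my-namespace/my-topic") // placeholder topic
                .subscriptionName("ticker-subscription")               // hypothetical name
                .readCompacted(true)                                   // read the compacted view
                .subscribe();

        // Receive a single message and acknowledge it.
        Message<byte[]> msg = compactedTopicConsumer.receive();
        System.out.println(msg.getKey() + ": " + msg.getValue().length + " bytes");
        compactedTopicConsumer.acknowledge(msg);

        compactedTopicConsumer.close();
        client.close();
    }
}
```

Leaving out the readCompacted(true) line gives a consumer that reads the non-compacted topic instead.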
As mentioned above, topic compaction in Pulsar works on a per-key basis. That means that messages that you produce on compacted topics need to have keys (the content of the key will depend on your use case). Messages that don’t have keys will be ignored by the compaction process. Here’s an example Pulsar message with a key:
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageBuilder;

Message<byte[]> msg = MessageBuilder.create()
        .setContent(someByteArray)
        .setKey("some-key") // placeholder key, e.g. a stock symbol
        .build();
The example below shows a message with a key being produced on a compacted Pulsar topic:
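The sketch below is one way to do this with the Pulsar 2.x Java client. It uses the producer's newMessage() builder (TypedMessageBuilder) rather than the MessageBuilder class shown above, which newer clients no longer recommend. The service URL, topic name, key, and payload are placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

public class KeyedProducerExample {
    public static void main(String[] args) throws PulsarClientException {
        // Assumed broker address; adjust for your cluster.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        Producer<byte[]> compactedTopicProducer = client.newProducer()
                .topic("persistent://my-tenant/my-namespace/my-topic") // placeholder topic
                .create();

        // The key is what compaction groups on; here it is the stock symbol.
        compactedTopicProducer.newMessage()
                .key("GOOG")
                .value("1234.56".getBytes(StandardCharsets.UTF_8)) // placeholder payload
                .send();

        compactedTopicProducer.close();
        client.close();
    }
}
```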