ZooKeeper and BookKeeper administration

    • ZooKeeper is responsible for a wide variety of configuration- and coordination-related tasks.
    • BookKeeper is responsible for persistent storage of message data.

    ZooKeeper and BookKeeper are both open-source projects.

    Each Pulsar instance relies on two separate ZooKeeper quorums.

    • Local ZooKeeper operates at the cluster level and provides cluster-specific configuration management and coordination. Each Pulsar cluster needs to have a dedicated ZooKeeper cluster.
    • Configuration Store operates at the instance level and provides configuration management for the entire system (and thus across clusters). The configuration store quorum can be provided by an independent cluster of machines or by the same machines used by local ZooKeeper.

    ZooKeeper manages a variety of essential coordination- and configuration-related tasks for Pulsar.

    Deploying a Pulsar instance requires you to stand up one local ZooKeeper cluster per Pulsar cluster.

    To begin, add all ZooKeeper servers to the quorum configuration specified in the conf/zookeeper.conf file. Add a server.N line for each node in the cluster to the configuration, where N is the number of the ZooKeeper node. A three-node cluster therefore needs three lines, server.1 through server.3.
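
    The server.N lines for a three-node cluster can be generated with a short shell sketch. The helper name zk_server_lines is hypothetical, and the ports 2888 (peer) and 3888 (leader election) are ZooKeeper's conventional values, assumed here rather than taken from this page.

```shell
#!/bin/sh
# Hypothetical helper: print one server.N line per host, numbering from 1.
# Usage: zk_server_lines PEER_PORT ELECTION_PORT HOST...
zk_server_lines() {
    peer_port=$1
    election_port=$2
    shift 2
    n=1
    for host in "$@"; do
        echo "server.${n}=${host}:${peer_port}:${election_port}"
        n=$((n + 1))
    done
}

# Ports 2888/3888 are assumed conventional defaults; adjust to your setup.
zk_server_lines 2888 3888 \
    zk1.us-west.example.com \
    zk2.us-west.example.com \
    zk3.us-west.example.com
```

    The printed lines can then be appended to conf/zookeeper.conf on every node, so that all servers share the same view of the quorum.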

    On each host, specify the node's ID in its myid file, which is in the server's data/zookeeper folder by default (this location is configurable).

    On a ZooKeeper server at zk1.us-west.example.com, for example, you could set the myid value like this:

    $ mkdir -p data/zookeeper
    $ echo 1 > data/zookeeper/myid

    On zk2.us-west.example.com the command would be echo 2 > data/zookeeper/myid and so on.
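
    When host names follow the zkN pattern used here, the ID can be derived from the host name instead of being typed on every machine. The sketch below assumes exactly that naming convention; the helper names zk_id_from_hostname and write_myid are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the numeric suffix of the first DNS label,
# e.g. zk2.us-west.example.com -> 2. Fails if the label has no such suffix.
zk_id_from_hostname() {
    label=${1%%.*}          # first DNS label, e.g. "zk2"
    id=${label##*[!0-9]}    # drop everything up to the last non-digit
    if [ -z "$id" ]; then
        echo "no numeric suffix in host name: $1" >&2
        return 1
    fi
    echo "$id"
}

# Hypothetical helper: create DATA_DIR and write the derived ID to DATA_DIR/myid.
# Usage: write_myid HOSTNAME DATA_DIR
write_myid() {
    myid_value=$(zk_id_from_hostname "$1") || return 1
    mkdir -p "$2"
    echo "$myid_value" > "$2/myid"
}
```

    For example, write_myid zk2.us-west.example.com data/zookeeper has the same effect as the mkdir and echo 2 commands shown above.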

    Once each server has been added to the zookeeper.conf configuration and has the appropriate myid entry, you can start ZooKeeper on all hosts (in the background, using nohup) with the pulsar-daemon CLI tool:

    $ bin/pulsar-daemon start zookeeper

    Deploying configuration store

    The ZooKeeper cluster configured and started up in the section above is a local ZooKeeper cluster used to manage a single Pulsar cluster. In addition to a local cluster, however, a full Pulsar instance also requires a configuration store for handling some instance-level configuration and coordination tasks.

    If you're deploying a single-cluster instance, then you will not need a separate cluster for the configuration store. If, however, you're deploying a multi-cluster instance, then you should stand up a separate ZooKeeper cluster for configuration tasks.

    Single-cluster Pulsar instance

    If your Pulsar instance will consist of just one cluster, then you can deploy a configuration store on the same machines as the local ZooKeeper quorum but running on different TCP ports.

    To deploy a ZooKeeper configuration store in a single-cluster instance, add the same ZooKeeper servers used by the local quorum to the conf/global-zookeeper.conf configuration file, using the same method as for local ZooKeeper, but make sure to use a different port (2181 is the default for ZooKeeper). Here's an example that uses port 2184 for a three-node ZooKeeper cluster:

    clientPort=2184
    server.1=zk1.us-west.example.com:2185:2186
    server.2=zk2.us-west.example.com:2185:2186
    server.3=zk3.us-west.example.com:2185:2186

    As before, create the myid file for each server at data/global-zookeeper/myid.

    Multi-cluster Pulsar instance

    When deploying a global Pulsar instance, with clusters distributed across different geographical regions, the configuration store serves as a highly available and strongly consistent metadata store that can tolerate failures and partitions spanning whole regions.

    The key here is to make sure the ZK quorum members are spread across at least 3 regions and that other regions are running as observers.

    For example, let's assume a Pulsar instance with the following clusters: us-west, us-east, us-central, eu-central, and ap-south. Let's also assume that each cluster has its own local ZK servers, named after the cluster in the same way as the hosts above (for example, zk1.us-west.example.com).

    In this scenario, we want to pick the quorum participants from a few clusters and let all the others be ZK observers. For example, to form a 7-server quorum, we can pick 3 servers from us-west, 2 from us-central, and 2 from us-east.

    This guarantees that writes to the configuration store remain possible even if one of these regions is unreachable.

    The ZK configuration is the same on all servers and lists every quorum participant and observer.
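
    As a rough sketch, a shared configuration of this shape could be generated as follows. Several things here are assumptions rather than facts from this page: three ZK servers per cluster, host names of the form zkN.<cluster>.example.com, the ports reused from the single-cluster example (2184, 2185, 2186), and the hypothetical helper name config_store_servers. The :observer suffix on a server.N line is standard ZooKeeper syntax for marking non-voting members.

```shell
#!/bin/sh
# Hypothetical helper: print server.N lines for a list of clusters.
# Each argument is CLUSTER:PARTICIPANTS:TOTAL. The first PARTICIPANTS servers
# of a cluster are voting members; the remaining ones are marked as observers.
config_store_servers() {
    n=1
    for spec in "$@"; do
        cluster=${spec%%:*}
        rest=${spec#*:}
        participants=${rest%%:*}
        total=${rest#*:}
        i=1
        while [ "$i" -le "$total" ]; do
            line="server.${n}=zk${i}.${cluster}.example.com:2185:2186"
            if [ "$i" -gt "$participants" ]; then
                line="${line}:observer"
            fi
            echo "$line"
            i=$((i + 1))
            n=$((n + 1))
        done
    done
}

# 7 voting members (3 + 2 + 2) across three regions; everything else observes.
echo "clientPort=2184"
config_store_servers us-west:3:3 us-central:2:3 us-east:2:3 \
    eu-central:0:3 ap-south:0:3
```

    With this split, losing any single region leaves at least 4 of the 7 voting members reachable, which is still a majority.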

    Additionally, ZK observers will need to have:

    peerType=observer

    Starting the service

    Once your configuration store configuration is in place, you can start up the service using the pulsar-daemon CLI tool:

    $ bin/pulsar-daemon start configuration-store

    ZooKeeper configuration

    In Pulsar, ZooKeeper configuration is handled by two separate configuration files found in the conf directory of your Pulsar installation: conf/zookeeper.conf for local ZooKeeper and conf/global-zookeeper.conf for the configuration store.

    Local ZooKeeper

    Configuration for local ZooKeeper is handled by the conf/zookeeper.conf file. The table below shows the available parameters:

    Configuration Store

    Configuration for configuration store is handled by the conf/global-zookeeper.conf file. The table below shows the available parameters:

    BookKeeper is responsible for all durable message storage in Pulsar. BookKeeper is a distributed write-ahead log (WAL) system that guarantees read consistency of independent message logs called ledgers. Individual BookKeeper servers are also called bookies.

    BookKeeper provides persistent message storage for Pulsar.

    Each Pulsar cluster needs to have its own cluster of bookies. The BookKeeper cluster shares a local ZooKeeper quorum with the Pulsar cluster.

    Configuring bookies

    BookKeeper bookies can be configured using the conf/bookkeeper.conf configuration file. The most important aspect of configuring each bookie is ensuring that the zkServers parameter is set to the connection string for the Pulsar cluster's local ZooKeeper.

    Starting up bookies

    You can start up a bookie in two ways: in the foreground or as a background daemon.

    To start up a bookie in the foreground, use the bookkeeper CLI tool:

    $ bin/bookkeeper bookie

    To start a bookie in the background, use the pulsar-daemon CLI tool:

    $ bin/pulsar-daemon start bookie

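    To verify that a bookie is working properly, you can run the bookiesanity command of the BookKeeper shell (bin/bookkeeper shell bookiesanity). This creates a new ledger on the local bookie, writes a few entries, reads them back, and finally deletes the ledger.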

    Bookie hosts are responsible for storing message data on disk. In order for bookies to provide optimal performance, it's essential that they have a suitable hardware configuration. There are two key dimensions to bookie hardware capacity:

    • Disk I/O capacity (read/write)
    • Storage capacity

    Message entries written to bookies are always synced to disk before returning an acknowledgement to the Pulsar broker. To ensure low write latency, BookKeeper is designed to use multiple devices:

    • A journal to ensure durability. For sequential writes, it's critical to have fast fsync operations on bookie hosts. Typically, small and fast solid-state drives (SSDs) should suffice, or hard disk drives (HDDs) with a RAID controller and a battery-backed write cache. Both solutions can reach fsync latency of ~0.4 ms.

    • A ledger storage device is where data is stored until all consumers have acknowledged the message. Writes will happen in the background, so write I/O is not a big concern. Reads will happen sequentially most of the time and the backlog is drained only in case of consumer drain. To store large amounts of data, a typical configuration will involve multiple HDDs with a RAID controller.

    Configuring BookKeeper

    Configurable parameters for BookKeeper bookies can be found in the conf/bookkeeper.conf file.

    Minimum configuration changes required in conf/bookkeeper.conf are:

    # Change to point to journal disk mount point
    journalDirectory=data/bookkeeper/journal
    # Point to ledger storage disk mount point
    ledgerDirectories=data/bookkeeper/ledgers
    # Point to local ZK quorum
    zkServers=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    # Change the ledger manager type
    ledgerManagerType=hierarchical

    To change the ZooKeeper root path used by BookKeeper, set zkLedgersRootPath=/MY-PREFIX/ledgers instead of appending the prefix to the connection string (zkServers=localhost:2181/MY-PREFIX).

    In Pulsar, you can set persistence policies, at the namespace level, that determine how BookKeeper handles persistent storage of messages. Policies determine four things:

    • The number of acks (guaranteed copies) to wait for each ledger entry
    • The number of bookies to use for a topic
    • How many writes to make for each ledger entry
    • The throttling rate for mark-delete operations
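
    The first three values are related: BookKeeper expects the ensemble size to be at least the write quorum, and the write quorum to be at least the ack quorum. A minimal sketch of that check, using a hypothetical helper named check_persistence, might look like this:

```shell
#!/bin/sh
# Hypothetical helper: verify ensemble >= write quorum >= ack quorum.
# Usage: check_persistence ENSEMBLE WRITE_QUORUM ACK_QUORUM
check_persistence() {
    ensemble=$1
    write_quorum=$2
    ack_quorum=$3
    if [ "$ensemble" -ge "$write_quorum" ] && [ "$write_quorum" -ge "$ack_quorum" ]; then
        echo "ok"
    else
        echo "invalid: need ensemble >= write quorum >= ack quorum" >&2
        return 1
    fi
}

# 3 bookies per topic, 2 copies of each entry, wait for 2 acks.
check_persistence 3 2 2
```

    Running a check like this before applying a policy catches combinations such as an ack quorum larger than the ensemble, which cannot be satisfied.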

    Set persistence policies

    You can set persistence policies for BookKeeper at the namespace level.

    pulsar-admin

    Use the set-persistence subcommand and specify a namespace as well as any policies that you want to apply. The available flags are:

    Flag                            Description                                                         Default
    -a, --bookkeeper-ack-quorum     The number of acks (guaranteed copies) to wait on for each entry    0
    -e, --bookkeeper-ensemble       The number of bookies to use for topics in the namespace            0
    -w, --bookkeeper-write-quorum   How many writes to make for each entry                              0
    -r, --ml-mark-delete-max-rate   Throttling rate for mark-delete operations (0 means no throttle)    0

    Example
    $ pulsar-admin namespaces set-persistence my-tenant/my-ns \
      --bookkeeper-ensemble 3 \
      --bookkeeper-write-quorum 2 \
      --bookkeeper-ack-quorum 2

    REST API

    POST /admin/v2/namespaces/:tenant/:namespace/persistence
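
    A request to this endpoint could be put together as in the sketch below. The first three JSON field names are taken from the get-persistence output on this page; the managedLedgerMaxMarkDeleteRate field name, the broker address localhost:8080, and the helper name persistence_payload are assumptions, so check them against your deployment before use.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a persistence policy.
# Usage: persistence_payload ENSEMBLE WRITE_QUORUM ACK_QUORUM MARK_DELETE_RATE
persistence_payload() {
    printf '{"bookkeeperEnsemble": %d, "bookkeeperWriteQuorum": %d, "bookkeeperAckQuorum": %d, "managedLedgerMaxMarkDeleteRate": %s}\n' \
        "$1" "$2" "$3" "$4"
}

# Print the payload for inspection.
persistence_payload 3 2 2 0.0

# To send it, assuming a broker web service at localhost:8080:
# curl -X POST "http://localhost:8080/admin/v2/namespaces/my-tenant/my-ns/persistence" \
#     -H 'Content-Type: application/json' \
#     -d "$(persistence_payload 3 2 2 0.0)"
```

    The curl call is left commented out so that the sketch can be run without a broker.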

    Java

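    // Assumes an existing PulsarAdmin client named admin.
    // Values must satisfy ensemble >= write quorum >= ack quorum.
    String namespace = "my-tenant/my-ns";
    int bkEnsemble = 3;          // bookies used for each topic
    int bkQuorum = 2;            // write quorum: copies written per entry
    int bkAckQuorum = 2;         // ack quorum: acks to wait for per entry
    double markDeleteRate = 0.7; // throttling rate for mark-delete operations
    PersistencePolicies policies =
        new PersistencePolicies(bkEnsemble, bkQuorum, bkAckQuorum, markDeleteRate);
    admin.namespaces().setPersistence(namespace, policies);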

    You can see which persistence policy currently applies to a namespace.

    pulsar-admin

    Use the get-persistence subcommand and specify the namespace.

    Example
    $ pulsar-admin namespaces get-persistence my-tenant/my-ns
    {
      "bookkeeperEnsemble": 1,
      "bookkeeperWriteQuorum": 1,
      "bookkeeperAckQuorum": 1,
      "managedLedgerMaxMarkDeleteRate": 0.0
    }

    REST API

    GET /admin/v2/namespaces/:tenant/:namespace/persistence

    Java

    This diagram illustrates the role of ZooKeeper and BookKeeper in a Pulsar cluster: