Deploying Pulsar on Kubernetes

    Pulsar can be easily deployed in Kubernetes clusters, either in managed clusters on Google Kubernetes Engine or Amazon Web Services, or in a custom cluster.

    The deployment method shown in this guide relies on YAML definitions for Kubernetes resources. The kubernetes subdirectory of the Pulsar source package holds resource definitions for:

    • A two-bookie BookKeeper cluster
    • A three-node ZooKeeper cluster
    • A three-broker Pulsar cluster
    • A monitoring stack consisting of Prometheus, Grafana, and the Pulsar dashboard
    • A pod from which you can run administrative commands using the pulsar-admin CLI tool

    To get started, install a source package from the downloads page.

    Please note that the Pulsar binary package does not contain the YAML resources necessary to deploy Pulsar on Kubernetes.

    If you'd like to change the number of bookies, brokers, or ZooKeeper nodes in your Pulsar cluster, modify the replicas parameter in the spec section of the appropriate Deployment or StatefulSet resource.
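    As an illustration, the relevant portion of a StatefulSet definition looks roughly like this (a sketch only; the names and labels below are illustrative, not the exact contents of the shipped YAML files):

    ```yaml
    # Illustrative fragment -- edit the replicas value in the shipped YAML file
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: bookie
    spec:
      serviceName: bookkeeper
      replicas: 3   # increase or decrease to scale the bookie cluster
    ```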

    Pulsar on Google Kubernetes Engine

    Google Kubernetes Engine (GKE) automates the creation and management of Kubernetes clusters in Google Compute Engine (GCE).

    Prerequisites

    To get started, you'll need:

    • A Google Cloud Platform account, which you can sign up for on the Google Cloud website
    • An existing Cloud Platform project
    • The Google Cloud SDK (in particular the gcloud and kubectl tools).

    Create a new Kubernetes cluster

    You can create a new GKE cluster using the container clusters create command for gcloud. This command enables you to specify the number of nodes in the cluster, the machine types of those nodes, and more.

    As an example, we'll create a new GKE cluster in the us-central1-a zone. The cluster will be named pulsar-gke-cluster and will consist of three VMs, each with two locally attached SSDs. These SSDs will be used by bookie instances, one for the BookKeeper journal and the other for storing the actual message data.
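    A command along these lines creates such a cluster (a sketch; the machine type shown is an assumption, so choose one that fits your workload):

    ```shell
    # Sketch: create a three-node GKE cluster with two local SSDs per node
    gcloud container clusters create pulsar-gke-cluster \
      --zone=us-central1-a \
      --machine-type=n1-standard-8 \
      --num-nodes=3 \
      --local-ssd-count=2
    ```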

    By default, bookies will run on all the machines that have locally attached SSD disks. In this example, all of those machines will have two SSDs, but you can add different types of machines to the cluster later. You can control which machines host bookie servers using labels.
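    Node labels can be applied with kubectl; the label key and node name below are purely hypothetical, so substitute whatever selector your bookie resource definition actually uses:

    ```shell
    # Hypothetical example: label a node so that bookies can be scheduled onto it
    kubectl label nodes gke-pulsar-node-1 pulsar-component=bookie
    ```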

    Dashboard

    You can observe your cluster in the Kubernetes Dashboard by downloading the credentials for your Kubernetes cluster and opening up a proxy to the cluster:

    $ gcloud container clusters get-credentials pulsar-gke-cluster \
      --zone=us-central1-a \
      --project=your-project-name
    $ kubectl proxy

    By default, the proxy will be opened on port 8001. Now you can navigate to http://localhost:8001/ui in your browser to access the dashboard. At first your GKE cluster will be empty, but that will change as you begin deploying Pulsar components, either component by component using kubectl or using Helm.

    Pulsar on Amazon Web Services

    You can run Kubernetes on Amazon Web Services (AWS) in a variety of ways. A very simple way involves using the Kubernetes Operations (kops) tool.

    You can find detailed instructions for setting up a Kubernetes cluster on AWS in the kops documentation.

    When you create a cluster using those instructions, your kubectl config in ~/.kube/config (on macOS and Linux) will be updated for you, so you probably won't need to change your configuration. Nonetheless, you can ensure that kubectl can interact with your cluster by listing the nodes in the cluster:

    $ kubectl get nodes

    If kubectl is working with your cluster, you can proceed to deploy Pulsar components, either component by component using kubectl or using Helm.

    Pulsar on a custom Kubernetes cluster

    Pulsar can be deployed on a custom, non-GKE Kubernetes cluster as well. You can find detailed documentation on how to choose a Kubernetes installation method that suits your needs in the Kubernetes docs.

    The easiest way to run a Kubernetes cluster is to do so locally. To install a mini local cluster for testing purposes, running in local VMs, you can either:

    • Use Minikube to run a single-node Kubernetes cluster
    • Create a local cluster running on multiple VMs on the same machine

    For the first option, start Minikube with a VM driver, e.g. kvm2 on Linux or hyperkit or VirtualBox on macOS:

    minikube start --memory=8192 --cpus=4 \
      --kubernetes-version=v1.10.5

    Then set kubectl to use Minikube:

    kubectl config use-context minikube

    In order to use the Kubernetes Dashboard with your local Kubernetes cluster on Minikube, run:

    $ minikube dashboard

    The command will automatically open a webpage in your browser. At first your local cluster will be empty, but that will change as you begin deploying Pulsar components, either component by component using kubectl or using Helm.

    Multiple VMs

    For the second option, follow the instructions for running Kubernetes using CoreOS on Vagrant. We'll provide an abridged version of those instructions here.

    $ git clone https://github.com/pires/kubernetes-vagrant-coreos-cluster
    $ cd kubernetes-vagrant-coreos-cluster
    # Start a three-VM cluster
    $ NODES=3 USE_KUBE_UI=true vagrant up

    Create SSD disk mount points on the VMs using this script:

    for vm in node-01 node-02 node-03; do
        NODES=3 vagrant ssh $vm -c "sudo mkdir -p /mnt/disks/ssd0"
        NODES=3 vagrant ssh $vm -c "sudo mkdir -p /mnt/disks/ssd1"
    done

    Bookies expect two logical devices to be available for mounting, one for the journal and one for persistent message storage. In this VM exercise, we created two directories on each VM to serve that purpose.
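    The two directories map to bookie storage roughly as follows (an illustrative fragment; the shipped resource definitions configure this for you):

    ```
    # bookkeeper.conf (illustrative): map the two mounts to journal and ledger storage
    journalDirectory=/mnt/disks/ssd0
    ledgerDirectories=/mnt/disks/ssd1
    ```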

    Once the cluster is up, you can verify that kubectl can access it:

    $ kubectl get nodes
    NAME           STATUS                     AGE       VERSION
    172.17.8.101   Ready,SchedulingDisabled   10m       v1.6.4
    172.17.8.102   Ready                      8m        v1.6.4
    172.17.8.103   Ready                      6m        v1.6.4
    172.17.8.104   Ready                      4m        v1.6.4

    In order to use the Kubernetes Dashboard with your local Kubernetes cluster, first use kubectl to create a proxy to the cluster:

    $ kubectl proxy

    Now you can access the web interface at localhost:8001/ui. At first your local cluster will be empty, but that will change as you begin deploying Pulsar components, either component by component using kubectl or using Helm.

    Deploying Pulsar components

    Now that you've set up a Kubernetes cluster, either on GKE, on AWS, or on a custom cluster, you can begin deploying the components that make up Pulsar. The YAML resource definitions for Pulsar components can be found in the kubernetes folder of the Pulsar source package.

    In that package, there are different sets of resource definitions for different environments.

    • deployment/kubernetes/google-kubernetes-engine: for Google Kubernetes Engine (GKE)
    • deployment/kubernetes/aws: for AWS
    • deployment/kubernetes/generic: for a custom Kubernetes cluster

    To begin, cd into the appropriate folder.

    Deploy ZooKeeper

    You must deploy ZooKeeper as the first Pulsar component, as it is a dependency for the others.

    $ kubectl apply -f zookeeper.yaml

    Wait until all three ZooKeeper server pods are up and have the status Running. You can check on the status of the ZooKeeper pods at any time:

    $ kubectl get pods -l component=zookeeper
    NAME      READY     STATUS    RESTARTS   AGE
    zk-0      1/1       Running   0          18m
    zk-1      1/1       Running   0          17m
    zk-2      0/1       Running   6          15m

    This step may take several minutes, as Kubernetes needs to download the Docker image on the VMs.

    Initialize cluster metadata

    Once ZooKeeper is running, you need to initialize the metadata for the Pulsar cluster in ZooKeeper. This includes system metadata for BookKeeper and Pulsar more broadly. There is a Kubernetes job in the cluster-metadata.yaml file that you only need to run once:

    $ kubectl apply -f cluster-metadata.yaml

    For the sake of reference, that job runs the following command on an ephemeral pod:

    $ bin/pulsar initialize-cluster-metadata \
      --cluster local \
      --zookeeper zookeeper \
      --configuration-store zookeeper \
      --web-service-url http://broker.default.svc.cluster.local:8080/ \
      --broker-service-url pulsar://broker.default.svc.cluster.local:6650/

    Once cluster metadata has been successfully initialized, you can then deploy the bookies, brokers, monitoring stack (Prometheus, Grafana, and the Pulsar dashboard), and Pulsar cluster proxy:

    $ kubectl apply -f bookie.yaml
    $ kubectl apply -f broker.yaml
    $ kubectl apply -f monitoring.yaml
    $ kubectl apply -f admin.yaml

    You can check on the status of the pods for these components either in the Kubernetes Dashboard or using kubectl:

    $ kubectl get pods -w -l app=pulsar

    Set up tenants and namespaces

    Once all of the components are up and running, you'll need to create at least one Pulsar tenant and at least one namespace.

    You can create tenants and namespaces (and perform any other administrative tasks) using the pulsar-admin pod that is already configured to act as an admin client for your newly created Pulsar cluster. One easy way to perform administrative tasks is to create an alias for the pulsar-admin tool installed on the admin pod.

    $ alias pulsar-admin='kubectl exec pulsar-admin -it -- bin/pulsar-admin'

    Now, any time you run pulsar-admin, you will be running commands from that pod. This command will create a tenant called ten:

    $ pulsar-admin tenants create ten \
      --admin-roles admin \
      --allowed-clusters local

    This command will create a ns namespace under the ten tenant:

    $ pulsar-admin namespaces create ten/ns

    To verify that everything has gone as planned:

    $ pulsar-admin tenants list
    public
    ten
    $ pulsar-admin namespaces list ten
    ten/ns

    Now that you have a namespace and tenant set up, you can move on to experimenting with your Pulsar cluster from within the cluster or using a Pulsar client.

    Experimenting with your cluster

    First, create an alias to use the pulsar-perf tool via the admin pod:

    $ alias pulsar-perf='kubectl exec pulsar-admin -it -- bin/pulsar-perf'

    Now, produce messages:

    $ pulsar-perf produce persistent://public/default/my-topic \
      --rate 10000

    Similarly, you can start a consumer to subscribe to and receive all the messages on that topic:

    $ pulsar-perf consume persistent://public/default/my-topic \
      --subscriber-name my-subscription-name

    You can also view stats for the topic using the pulsar-admin tool:

    $ pulsar-admin persistent stats persistent://public/default/my-topic

    Monitoring

    The default monitoring stack for Pulsar on Kubernetes consists of Prometheus, Grafana, and the Pulsar dashboard.

    If you deployed the cluster to Minikube, the following monitoring ports are mapped on the Minikube VM:

    • Prometheus port: 30003
    • Grafana port: 30004
    • Dashboard port: 30005

    You can use minikube ip to find the IP address of the Minikube VM, and then use the mapped ports to access the corresponding services. For example, you can access the Pulsar dashboard at http://$(minikube ip):30005.

    Prometheus

    All Pulsar metrics in Kubernetes are collected by a Prometheus instance running inside the cluster. Typically, there is no need to access Prometheus directly. Instead, you can use the Grafana interface that displays the data stored in Prometheus.
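    If you do need to reach Prometheus directly, a port-forward along the same lines as the Grafana one should work (a sketch; the component=prometheus label and port 9090 are assumptions about how the monitoring resources are defined):

    ```shell
    # Sketch: forward the Prometheus web port to localhost (label value is an assumption)
    kubectl port-forward \
      $(kubectl get pods -l component=prometheus -o jsonpath='{.items[*].metadata.name}') 9090
    ```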

    Grafana

    In your Kubernetes cluster, you can use Grafana to view dashboards for Pulsar namespaces (message rates, latency, and storage), JVM stats, ZooKeeper, and BookKeeper. You can access the pod serving Grafana using kubectl's port-forward command:

    $ kubectl port-forward \
      $(kubectl get pods -l component=grafana -o jsonpath='{.items[*].metadata.name}') 3000

    You can then access the dashboard in your web browser at localhost:3000.

    Pulsar dashboard

    While Grafana and Prometheus are used to provide graphs with historical data, the Pulsar dashboard reports more detailed current data for individual topics.

    For example, you can have sortable tables showing all namespaces, topics, and broker stats, with details on the IP address for consumers, how long they've been connected, and much more.

    You can access the pod serving the Pulsar dashboard using kubectl's port-forward command:

    $ kubectl port-forward \
      $(kubectl get pods -l component=dashboard -o jsonpath='{.items[*].metadata.name}') 8080:80

    You can then access the dashboard in your web browser at localhost:8080.

    Client connections

    Once your Pulsar cluster is running on Kubernetes, you can connect to it using a Pulsar client. You can fetch the IP address for the Pulsar proxy running in your Kubernetes cluster using kubectl:

    $ kubectl get service broker-proxy \
      --output=jsonpath='{.status.loadBalancer.ingress[*].ip}'

    If the IP address for the proxy were, for example, 35.12.13.198, you could connect to Pulsar using pulsar://35.12.13.198:6650.
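    The same construction can be scripted; the IP below is the illustrative value from the sentence above, so in practice substitute the output of the kubectl command:

    ```shell
    # Sketch: build the client service URL from the proxy's external IP
    PROXY_IP=35.12.13.198   # illustrative; use the value reported by kubectl
    SERVICE_URL="pulsar://${PROXY_IP}:6650"
    echo "$SERVICE_URL"   # prints pulsar://35.12.13.198:6650
    ```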

    You can find client documentation for the Java, Python, and C++ clients.

    Deploying Pulsar components (helm)

    Pulsar also provides a Helm chart for deploying a Pulsar cluster to Kubernetes. Before you start, make sure you follow the Helm documentation to install helm.

    Assume you have cloned the Pulsar repo under a PULSAR_HOME directory.

    Minikube

    • Go to the Pulsar helm chart directory
    • Install the helm chart to a Kubernetes cluster on Minikube:

      helm install --values pulsar/values-mini.yaml ./pulsar

    • Web service URL:
    • Pulsar service URL: pulsar://$(minikube ip):30002/