OpenEBS for Cassandra
Apache Cassandra is a distributed NoSQL database management system designed to handle large amounts of data across nodes, providing high availability with no single point of failure. It uses asynchronous masterless replication allowing low latency operations for all clients. Cassandra is deployed usually as a on Kubernetes and requires persistent storage for each instance of Cassandra. OpenEBS provides persistent volumes on the fly when Cassandra instances are scaled up.
Advantages of using OpenEBS for Cassandra database:
- No need to manage the local disks, they are managed by OpenEBS
- Large size PVs can be provisioned by OpenEBS and Cassandra
- Cassandra sometimes need highly available storage, in such cases OpenEBS volumes can be configured with 3 replicas.
- If required, take backup of the Cassandra data periodically and back them up to S3 or any object storage so that restoration of the same data is possible to the same or any other Kubernetes cluster
Note: Cassandra can be deployed both as deployment
or as . When Cassandra deployed as statefulset
, you don’t need to replicate the data again at OpenEBS level. When Cassandra is deployed as , consider 3 OpenEBS replicas, choose the StorageClass accordingly.
Deployment model
As shown above, OpenEBS volumes need to be configured with three replicas for high availability. This configuration work fine when the nodes (hence the cStor pool) is deployed across Kubernetes zones.
Install OpenEBS
Configure cStor Pool
After OpenEBS installation, cStor pool has to be configured. If cStor Pool is not configured in your OpenEBS cluster, this can be done from . During cStor Pool creation, make sure that the maxPools parameter is set to >=3. Sample YAML named openebs-config.yaml for configuring cStor Pool is provided in the Configuration details below. If cStor pool is already configured, go to the next step.
Create Storage Class
You must configure a StorageClass to provision cStor volume on given cStor pool. StorageClass is the interface through which most of the OpenEBS storage policies are defined. In this solution we are using a StorageClass to consume the cStor Pool which is created using external disks attached on the Nodes. Since Cassandra is a StatefulSet application, it requires only one replication at the storage level. So cStor volume
replicaCount
is 1. Sample YAML named openebs-sc-disk.yamlto consume cStor pool with cStor volume replica count as 1 is provided in the configuration details below.
Post deployment Operations
Monitor OpenEBS Volume size
It is not seamless to increase the cStor volume size (refer to the roadmap item). Hence, it is recommended that sufficient size is allocated during the initial configuration.
Monitor cStor Pool size
As in most cases, cStor pool may not be dedicated to just Cassandra database alone. It is recommended to watch the pool capacity and add more disks to the pool before it hits 80% threshold. See .
openebs-config.yaml
openebs-sc-disk.yaml