Managing stateful workloads in Kubernetes can be a formidable challenge. This article grew out of our experience deploying Elasticsearch pods on the NVMe-based local SSDs of AWS I3 instances — a task significantly more demanding than the relatively straightforward process of running Elasticsearch on EBS (Elastic Block Store).

The core difficulty was the persistent problem of ephemeral disk locality in our Kubernetes cluster, which had impeded our efforts to scale our infrastructure and improve operational efficiency. Despite extensive research, I found a noticeable lack of comprehensive technical guides on local storage provisioning in Kubernetes clusters, so I decided to document what we learned.
The Problem We Faced
Our scenario involved deploying Elasticsearch hot pods on NVMe-based I3 instances, which demands an exceptional level of performance because of Elasticsearch’s read- and write-heavy nature. We had three main requirements:
- Data Persistence: It was crucial to ensure that local data wouldn’t be deleted if a pod was restarted. We needed to attach the data to the pod to maintain data integrity.
- Scalability: Our future plans included running multiple hot pods on the same machine. To achieve this, we needed a solution that could bifurcate multiple volumes on the disk and fulfill relevant pod requests through the underlying volumes.
- Performance: Given Elasticsearch’s read- and write-intensive nature, performance was a paramount requirement: the storage had to keep up with Elasticsearch workloads. We therefore decided to use the native NVMe storage of I3 machines rather than EBS — a choice that introduced the challenges this article covers.
Solutions Evaluated
We explored three different storage provisioner options in Kubernetes to address our requirements. Here are the solutions we considered:
1. Using HostPath:
HostPath is a relatively straightforward solution for local storage in Kubernetes. It allows you to use the local file system of the node where your pod is running as the storage backend. This means you can directly access and use the host’s file system.
The hostPath volume type is also single-node: a pod on one node cannot access a hostPath volume on another node. One way around this limitation is to force pods onto the same node(s) — for example via a StatefulSet or DaemonSet, or by pinning a Deployment’s pods with nodeAffinity / nodeSelector.
In addition, hostPath offers no volume-limit mechanism: if one pod consumes a large amount of disk, other pods on the same node are also impacted.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
spec:
  volumes:
    - name: elasticsearch-data
      hostPath:
        path: /var/data/elasticsearch
  containers:
    - name: elasticsearch-container
      image: elasticsearch:7.9.3
      volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
```
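The node-pinning workaround described above can be sketched with a nodeSelector. This is a minimal sketch, not our production manifest; the hostname label value is an assumption you would replace with your own node’s name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
spec:
  # Pin the pod to one node so it always finds the same hostPath data.
  # The hostname value below is illustrative.
  nodeSelector:
    kubernetes.io/hostname: ip-10-0-0-1.ec2.internal
  volumes:
    - name: elasticsearch-data
      hostPath:
        path: /var/data/elasticsearch
        type: DirectoryOrCreate   # create the directory if it does not exist
  containers:
    - name: elasticsearch-container
      image: elasticsearch:7.9.3
      volumeMounts:
        - name: elasticsearch-data
          mountPath: /usr/share/elasticsearch/data
```

Note that this only restores locality; it does nothing about the lack of capacity isolation between pods sharing the node’s disk.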
Pros:
- Relatively easy to set up
Cons:
- Not suitable for multiple pods on the same machine sharing the same hostPath.
- Lack of advanced features for storage management.
2. Kubernetes Local Storage Provisioner:
Kubernetes Local Storage Provisioner helps manage local storage on the nodes where your applications run. Local storage means the hard drives or storage space directly available on the physical servers or computers in your Kubernetes cluster.
Here’s what it does:
- Local Volumes: It deals with storage devices — disks, partitions, or directories — that are physically present on the node where Kubernetes is running. These local volumes must be provisioned in advance as PV/PVC pairs; they cannot be created on the fly.
- Node-Aware: Unlike some other ways of using local storage, this provisioner knows which node in the cluster holds the storage and takes that into account when scheduling. For example, if a node has a 50 GB PV and a pod has a PVC claiming 40 GB, the Kubernetes control plane will schedule the pod onto the node that has the 50 GB PV.
- Node Locking: The significant advantage here is that once your application uses this local storage, Kubernetes always runs it on the node where that storage lives, because the claim is bound to the PersistentVolume on that node. This prevents data loss from the pod being rescheduled to a different node.
In the above diagram, we can see the following:
- StatefulSet 1 uses underlying partition for storage needs
- Hence after restarts, pod remains on the same node
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: "elasticsearch-service"
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: registry.k8s.io/elasticsearch
          command:
            - "/bin/sh"
          args:
            - "-c"
            - "sleep 100000"
          volumeMounts:
            - name: elasticsearch-data
              mountPath: /usr/test-pod
  volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "local-storage"
        resources:
          requests:
            storage: 30Gi
```
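Because the local-storage class uses the no-provisioner placeholder, matching PersistentVolumes must exist in advance (created by hand or by the static provisioner) for the claims above to bind. A minimal sketch of such a PV — the mount path and node name here are assumptions for illustration:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-ssd0
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    # Path where the NVMe disk is mounted on the node (illustrative).
    path: /mnt/disks/ssd0
  # Local PVs require nodeAffinity: this is what pins the claim (and hence
  # the pod) to the node that physically owns the disk.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - ip-10-0-0-1.ec2.internal   # node that owns the disk (illustrative)
```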
Pros:
- Supports creating PVCs on local ephemeral storage.
- Provides basic deprovisioning and disk space management.
- Open source and relatively easy to manage.
Cons:
- May not offer the level of control required for bifurcating volumes and fulfilling specific pod requests.
- Capacity management is not baked in.
3. OpenEBS with Local LVM Provisioner:
OpenEBS is an open-source project that offers more advanced storage management capabilities. It is designed for scenarios like ours, where data persistence, scalability, and high performance are all required. OpenEBS lets you create storage classes that provision local volumes on nodes with local SSDs, and adds features such as volume bifurcation and performance tuning. These capabilities come at the cost of added setup and configuration complexity.
As the diagram shows:
- StatefulSets can be assigned to different block devices or different partitions
- On restarts, StatefulSets are assigned to the same node due to existing PV claims
```yaml
# Install OpenEBS (e.g. via Helm) first. The StorageClass below follows the
# shape of the OpenEBS LVM LocalPV provisioner; the volume group name is an
# assumption and must match a VG prepared on the node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-local-lvm
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  volgroup: "lvmvg"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: "elasticsearch-service"
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: registry.k8s.io/elasticsearch
          command:
            - "/bin/sh"
          args:
            - "-c"
            - "sleep 100000"
          volumeMounts:
            - name: elasticsearch-data
              mountPath: /usr/test-pod
  volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: openebs-local-lvm
        resources:
          requests:
            storage: 30Gi
```
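Because the LVM provisioner carves logical volumes out of a shared volume group on the node, several claims can be satisfied from the same NVMe disk — the volume-bifurcation requirement above. A hedged sketch of two independent claims against the same class (names are illustrative):

```yaml
# Each PVC becomes its own logical volume inside the node's volume group,
# so both pods get isolated capacity on the same physical disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-hot-pod-a-data
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: openebs-local-lvm
  resources:
    requests:
      storage: 30Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-hot-pod-b-data
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: openebs-local-lvm
  resources:
    requests:
      storage: 30Gi
```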
Pros:
- Offers advanced storage management capabilities.
- Allows for bifurcating volumes and fulfilling specific pod requests.
- Open source and manageable.
Cons:
- Can introduce complexity, since more services are involved than in the other solutions.
- Requires more setup and configuration.
Evaluation Against Requirements
Let’s evaluate these solutions against our specific requirements:
| Requirement | HostPath | Kubernetes Local Storage Provisioner | OpenEBS with Local LVM |
|---|---|---|---|
| Support for creating PVCs on local ephemeral storage | Can be achieved | Yes | Yes |
| Disk space management | Limited | Basic | Strong |
| Disk performance | Same as the node's disk | Same as the node's disk | Same as the node's disk |
| Open source | Yes | Yes | Yes |
| Easy to manage | Yes | Yes | Somewhat difficult |
Conclusion
Choosing the right storage provisioner for your Elasticsearch hot pods on NVMe-based I3 instances is a critical decision. The choice depends on your specific needs and trade-offs between complexity and control.
- HostPath: Simple but limited scalability and volume management.
- Kubernetes Local Storage Provisioner: Basic support with some limitations.
- OpenEBS with Local LVM: Offers advanced control, scalability, and performance, but requires more configuration.
Considering our requirements, OpenEBS with Local LVM came out on top. It provides the flexibility and control we needed, including the ability to bifurcate volumes and fulfill specific pod requests.