Storage in Kubernetes

Introduction

Storage matters to every stateful application. In a typical on-premises setup, we usually have some dedicated storage, e.g. a network file server or disk arrays.

These storage systems hold a bunch of volumes. We expose those volumes to our servers, the servers use them as local disks, and our applications happily use them without knowing how they are implemented or made available.

This model works very well for important business applications, because the external storage can do advanced things such as replication and backups, and all of this complexity is hidden from our app, which just thinks it is reading and writing to a local disk on the server.

That was then. Along came Docker and Kubernetes, and despite massively changing the way we build, ship and run our apps, the storage element really hasn’t changed.

Kubernetes itself doesn’t do storage. So what we ended up with is a model similar to the on-premises setup, where an app leverages storage from an external system. And this is a good thing: Kubernetes can keep leveraging all the field-tested persistent storage mechanisms and technology of specialized storage systems.

If you are running on a public cloud, it is very common for that external storage to be one of the storage services your cloud offers, e.g. EBS from AWS or Azure Files.

In this post, we will learn some basics of how to handle storage in Kubernetes. I will be using Elastic Kubernetes Service (EKS) to run my cluster, but the principles are the same even if you use a different cloud provider or an on-prem NFS server.

Kubernetes Volumes

Our app is running inside a container, which is inside a Pod. As we know, a container’s file system is structured in layers. The image layers are read-only, and the container can only write data to its writable container layer. However, the writable layer is not suitable for stateful applications because its data is lost when the container is removed.

Kubernetes solves this issue with the concept of a volume. A volume is a piece of storage that is mounted inside one or more containers and is accessible from a mount path. The container reads and writes data in the volume through that mount path, so the data is not lost even if the container stops and starts again. This mechanism is very similar to Docker volumes.
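To make the idea of a volume and a mount path concrete, here is a minimal sketch of a Pod manifest. The Pod name, container name, image, and paths are hypothetical and are not taken from the accounting application; the only point is how a volume declared under volumes is attached to a container under volumeMounts.

```yaml
# Hypothetical Pod: all names, the image, and the paths are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Write a file into the mounted volume, then stay alive.
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: scratch        # must match the name of a volume declared below
          mountPath: /data     # where the volume appears inside the container
  volumes:
    - name: scratch
      emptyDir: {}             # a simple volume type that lives as long as the Pod
```

Anything the container writes under /data lands in the volume rather than in the container's writable layer.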

Kubernetes supports many types of volumes, and a Pod can use any number of volume types simultaneously. Ephemeral volume types, e.g. emptyDir or configMap, have the lifetime of a Pod, but persistent volumes, e.g. NFS or EBS, exist beyond the lifetime of a Pod. When a Pod ceases to exist, Kubernetes destroys its ephemeral volumes; however, it does not destroy persistent volumes. For any kind of volume in a given Pod, data is preserved across container restarts.

There are different plugins for different storage systems, but Kubernetes abstracts the way we access that storage through volume plugins. Volumes are all declared the same way; only the parameters differ. Most of these plugins are built in to Kubernetes. If you want more information about these plugins, you can check the Kubernetes docs on this link.
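As a rough illustration of "declared the same way, only the parameters differ", the fragment below shows two entries from a Pod's volumes list, one ephemeral and one backed by an NFS server. The volume names, the server address, and the export path are placeholders, not values from this post's cluster.

```yaml
# Fragment of a Pod spec (not a complete manifest).
# The volume names, NFS server, and export path are placeholders.
volumes:
  - name: cache
    emptyDir: {}                      # ephemeral, no parameters needed
  - name: shared-files
    nfs:
      server: nfs.example.internal    # hypothetical NFS server address
      path: /exports/accounting       # hypothetical exported directory
```

Both entries have a name plus one key that selects the volume type; a container mounts either of them in exactly the same way through volumeMounts.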

Define Volumes

There are a couple of ways to define a volume in Kubernetes:

  • Define it directly in the Pod (this comes with a portability issue: because of this coupling, we have to change the Pod definition even if only the storage changes)
  • Define it using storage objects (listed below), statically or dynamically, independent of the Pod
    • PersistentVolume
    • PersistentVolumeClaim
    • StorageClass
    • StatefulSet (to scale stateful application)
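For contrast with the inline approach, the second option might look roughly like the sketch below: a claim is created as its own object, and the Pod only refers to it by name. The claim name, size, and access mode are assumptions made for illustration; this post does not use them.

```yaml
# Hypothetical claim: the name, size, and access mode are illustrative only.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: accounting-db-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
# Fragment of a Pod spec (not a complete manifest):
# the Pod names the claim and knows nothing about the storage behind it.
volumes:
  - name: db-data
    persistentVolumeClaim:
      claimName: accounting-db-data
```

Because the Pod only carries the claim name, the underlying storage can change without touching the Pod definition.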

To keep things simple, we will go with the first option and define it directly inside the Pod definition. In later posts we will cover the other options.

Setting the Scene

For this post, I am assuming you are already familiar with the basics of Docker, Kubernetes and EKS. For some background info, please check the previous post, where we deployed an EKS Kubernetes cluster and an Angular, .NET Core and Postgres accounting application.

The application source code, Dockerfiles and Kubernetes YAML can be downloaded from this git repository (branch: k8storage). This branch uses the code from the previous post as a starting point.

I also updated the docker-compose file and removed the Docker volume part from it (the volume will be managed by Kubernetes). Notice that I updated the version numbers of the Docker images as well.

These container images are pushed to a public Docker Hub repo, so you can use them directly for your testing.

Next, I updated the image version numbers in the corresponding Kubernetes YAML files in the eks folder:

I will be using EKS from AWS. You may be using a different cloud provider and a different application, but the storage principles are the same.

With all these changes in place, I can deploy the application to the EKS cluster using the following command:

kubectl apply -f eks/

We can see that our various Kubernetes resources are created, and we can access the application using the public address of the load balancer:

And here is our application, which we can navigate around and add some data to.

You can access the application on this link. However, I may delete the cluster, and then the link might not work. In that case, you have all the source code in the repo and can spin it up yourself if you want.

This is almost the same setup we built in the previous post. With this baseline in place, we can now explore Kubernetes storage.

Kubernetes Storage

So, our application is running and we can enter some data into it. However, sooner or later the container is going to restart, because the Pod has to migrate, the node needs maintenance, or the container simply crashes. With the current setup, where we do not have any volumes defined, all the data entered so far will be lost.

To demo that, add some new data to the application and then kill the container inside the Pod with the following command, which uses the Pod name and container name to run a shell command:

kubectl exec pod/accounting-db-d7585d9d7-2f4z6 -c accounting-db-ctr -- /bin/sh -c "kill 1"

Here is the output when a pod is updated:

And now if we refresh the web application, we can see that the data we entered is gone:

So, without any volume setup, the data is gone if the container restarts. For a stateful application, this is not going to work.

Kubernetes Volume with EmptyDir

Let’s start with a very simple type of volume, i.e. emptyDir. An emptyDir volume is ephemeral storage and is not useful for persisting application data. However, it is simple to start with and will help us understand the first steps in setting up volumes.

There are some use cases for emptyDir volumes, e.g. saving a cache or configuration, sharing data between containers in a Pod, or cases where you want the data to be gone after some time.
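The data-sharing use case could be sketched as follows: two containers in one Pod mount the same emptyDir volume at different paths, so one can read what the other writes. The Pod, the container names, the image, and the paths are all hypothetical and unrelated to the accounting application.

```yaml
# Hypothetical Pod: all names, the image, and the paths are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-sharing-demo
spec:
  containers:
    - name: writer
      image: busybox:1.36
      # Append a timestamp to a file in the shared volume every 5 seconds.
      command: ["sh", "-c", "while true; do date >> /out/log.txt; sleep 5; done"]
      volumeMounts:
        - name: shared
          mountPath: /out
    - name: reader
      image: busybox:1.36
      # Create the file if it does not exist yet, then follow it.
      command: ["sh", "-c", "touch /in/log.txt; tail -f /in/log.txt"]
      volumeMounts:
        - name: shared       # same volume, different mount path
          mountPath: /in
  volumes:
    - name: shared
      emptyDir: {}
```

For the cache use case, emptyDir also accepts a medium: Memory setting and a sizeLimit, which keep the data in RAM and cap how much the volume can hold.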

The following is the updated YAML file with the emptyDir volume setup:

First we define a volume by giving it a name and the emptyDir type. Then we mount this volume inside the container using volumeMounts. For the mount, we reference the volume by name and also set a mountPath for its location inside the container.
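The actual manifest lives in the eks folder of the repository. As a rough sketch of what such a Deployment might look like, the example below reuses the accounting-db and accounting-db-ctr names that appear in the kubectl commands in this post; the image, label, password, volume name, and mount path are assumptions (the path is the default data directory of the official postgres:16 image, which may differ from the image used here).

```yaml
# Sketch only: the image, label, password, volume name, and mount path are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: accounting-db
spec:
  replicas: 1
  selector:
    matchLabels:
      app: accounting-db
  template:
    metadata:
      labels:
        app: accounting-db
    spec:
      containers:
        - name: accounting-db-ctr
          image: postgres:16                  # placeholder for the post's database image
          env:
            - name: POSTGRES_PASSWORD
              value: change-me                # placeholder; use a Secret in practice
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: db-data                   # references the volume declared below
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: db-data
          emptyDir: {}                        # created empty when the Pod starts
```

With this in place the database files are written to the volume, so they outlive a restart of the container but not the Pod itself.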

We can now apply the changes:

Now, kill the database container again using the command shown before, and notice that this time the data is not lost even though the container restarts:

However, if we delete the Pod itself, we will lose the data. Let’s demo this by deleting the Pod as shown in the example below:

kubectl delete pod accounting-db-d7585d9d7-2f4z6

As there is a Deployment behind this Pod, it will start a new Pod as a replacement, and this new Pod gets a fresh, empty emptyDir volume without the data we entered. So, if we refresh the browser, we will see that the data is gone.

Summary

In this post, we learned that Kubernetes doesn’t provide storage itself. However, it has a nice plugin system for working with external storage through volumes, in a way similar to on-premises solutions. This keeps Kubernetes free from storage implementation concerns. There are different types of volumes, some for temporary storage and others for more robust persistence, and there are different vendors with their own specific implementations. Kubernetes has nice ways to deal with these setups, and in this post we took the first step and started with the emptyDir volume type.

We also learned that emptyDir is ephemeral storage and not suitable for stateful applications. In the next posts, we will continue from this point and learn about other volume types and how to use them in Kubernetes.

You can download the source code from this git repository (branch: k8storage).

Let me know if you have some questions or comments. Till next time, Happy Coding.

