1.5. Scaling

In this lab, we are going to show you how to scale applications on Kubernetes. We also show how Kubernetes makes sure that the requested number of Pods is up and running, and how an application can tell the platform that it is ready to receive requests.

Task 1.5.1: Scale the example frontend application

Our example frontend is not really production-ready with only one replica running. Let us scale it up to three.

If we want to scale our example application, we have to tell the Deployment that we want to have three running replicas instead of one. Let’s have a closer look at the existing ReplicaSet:

kubectl get replicasets --namespace <namespace>

Which will give you an output similar to this:

NAME                            DESIRED   CURRENT   READY   AGE
example-frontend-755c89fcd8     1         1         1       110s

Or for even more details:

kubectl get replicaset <replicaset> -o yaml --namespace <namespace>

The ReplicaSet shows how many instances of a Pod are desired, current and ready.
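
If you just want those three numbers, you can read them straight from the ReplicaSet's spec and status. This is an optional sketch using jsonpath; use the ReplicaSet name from the output above:

# prints the desired, current and ready replica counts on one line
kubectl get replicaset <replicaset> --namespace <namespace> \
  -o jsonpath='{.spec.replicas} {.status.replicas} {.status.readyReplicas}{"\n"}'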

Now we scale our application to three replicas:

kubectl scale deployment example-frontend --replicas=3 --namespace <namespace>

Check the number of desired, current and ready replicas:

kubectl get replicasets --namespace <namespace>
NAME                            DESIRED   CURRENT   READY   AGE
example-frontend-755c89fcd8     3         3         3       4m33s

Look at how many Pods there are:

kubectl get pods --namespace <namespace>

Which gives you an output similar to this:

NAME                                   READY   STATUS    RESTARTS   AGE
example-frontend-755c89fcd8-d2nbz      1/1     Running   0          5m2s
example-frontend-755c89fcd8-f6hkb      1/1     Running   0          31s
example-frontend-755c89fcd8-qg499      1/1     Running   0          31s
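
The Deployment itself reports the same numbers in condensed form:

kubectl get deployment example-frontend --namespace <namespace>

Which should show an output similar to this:

NAME               READY   UP-TO-DATE   AVAILABLE   AGE
example-frontend   3/3     3            3           6m12s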

Because we changed the number of replicas with the kubectl scale deployment command, the example-frontend Deployment in the cluster now differs from your local deployment_example-frontend.yaml file. Update the file so that it matches the cluster again by setting replicas to 3:

[...]
metadata:
  labels:
    app: example-frontend
  name: example-frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-frontend
[...]
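
To make sure the file and the cluster are back in sync, you can optionally compare your local manifest against the live object; kubectl diff prints nothing if there is no difference:

kubectl diff -f deployment_example-frontend.yaml --namespace <namespace>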

Check for uninterruptible Deployments

Let’s look at our existing Service. We should see all three corresponding Endpoints:

kubectl describe service example-frontend --namespace <namespace>
Name:                     example-frontend
Namespace:                <namespace>
Labels:                   app=example-frontend
Annotations:              <none>
Selector:                 app=example-frontend
Type:                     ClusterIP
IP:                       10.43.91.62
Port:                     <unset>  5000/TCP
TargetPort:               5000/TCP
Endpoints:                10.36.0.10:5000,10.36.0.11:5000,10.36.0.9:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------

Scaling of Pods is fast, as Kubernetes simply creates new containers.
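
You can watch the Service's Endpoints change while you scale by fetching the endpoints resource directly (-w keeps watching, stop it with Ctrl+C):

kubectl get endpoints example-frontend -w --namespace <namespace>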

You can check the availability of your Service while you scale the number of replicas up and down in your browser: http://example-frontend-<namespace>.<appdomain>.

Now execute the following loop command in a second terminal (Terminal -> New Terminal):

URL=$(kubectl get ingress example-frontend -o go-template="{{ (index .spec.rules 0).host }}" --namespace <namespace>)
while true; do sleep 1; curl -s https://${URL}/pod/; date "+ TIME: %H:%M:%S,%3N"; done

Scale from 3 replicas down to 1. The output shows which Pods are still alive and responding to requests:

example-frontend-755c89fcd8-d2nbz TIME: 16:42:55,485
example-frontend-755c89fcd8-d2nbz TIME: 16:42:56,501
example-frontend-755c89fcd8-f6hkb TIME: 16:42:57,518
example-frontend-755c89fcd8-f6hkb TIME: 16:42:58,533
example-frontend-755c89fcd8-d2nbz TIME: 16:42:59,544
example-frontend-755c89fcd8-f6hkb TIME: 16:43:00,553
example-frontend-755c89fcd8-w8kbx TIME: 16:43:01,560
example-frontend-755c89fcd8-d2nbz TIME: 16:43:02,567
example-frontend-755c89fcd8-d2nbz TIME: 16:43:03,583
example-frontend-755c89fcd8-f6hkb TIME: 16:43:04,599
example-frontend-755c89fcd8-d2nbz TIME: 16:43:05,611
example-frontend-755c89fcd8-f6hkb TIME: 16:43:06,617
example-frontend-755c89fcd8-w8kbx TIME: 16:43:07,624
example-frontend-755c89fcd8-d2nbz TIME: 16:43:08,632
example-frontend-755c89fcd8-w8kbx TIME: 16:43:09,639
example-frontend-755c89fcd8-w8kbx TIME: 16:43:10,645
example-frontend-755c89fcd8-d2nbz TIME: 16:43:11,653

The requests get distributed amongst the three Pods. As soon as you scale down to one Pod, there should be only one remaining Pod that responds.
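
The scale-down works exactly like the scale-up from before, just with a different replica count. Scale back up to 3 afterwards so that the next test starts from the same state:

kubectl scale deployment example-frontend --replicas=1 --namespace <namespace>
kubectl scale deployment example-frontend --replicas=3 --namespace <namespace>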

Let’s make another test: what happens if we trigger a new rollout of the Deployment while our request generator is still running? Go back to your first terminal and execute:

kubectl rollout restart deployment example-frontend --namespace <namespace>

During a short period we won’t get a response:

example-frontend-755c89fcd8-f6hkb TIME: 16:48:06,960
example-frontend-755c89fcd8-w8kbx TIME: 16:48:07,967
example-frontend-755c89fcd8-d2nbz TIME: 16:48:08,978
example-frontend-755c89fcd8-d2nbz TIME: 16:48:09,994
example-frontend-755c89fcd8-w8kbx TIME: 16:48:11,002
example-frontend-755c89fcd8-w8kbx TIME: 16:48:12,009
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html> TIME: 16:48:13,015
example-frontend-755c89fcd8-d2nbz TIME: 16:48:14,021
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html> TIME: 16:48:15,032
example-frontend-755c89fcd8-d2nbz TIME: 16:48:16,048
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html> TIME: 16:48:17,062
 TIME: 16:48:18,076
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
example-frontend-6c8744cdc8-k8vmw TIME: 16:48:31,183
 TIME: 16:48:32,190
example-frontend-6c8744cdc8-t5654 TIME: 16:48:33,199
example-frontend-6c8744cdc8-srbzf TIME: 16:48:34,206
example-frontend-6c8744cdc8-srbzf TIME: 16:48:35,212
example-frontend-6c8744cdc8-t5654 TIME: 16:48:36,219
example-frontend-6c8744cdc8-srbzf TIME: 16:48:37,226

In our example, we use a very lightweight Pod. A more heavyweight application that needs longer to start and become ready would, of course, leave an even larger gap.

In the following section we are going to look at how a Deployment can be configured so that the application remains available during updates.

Uninterruptible Deployments

The rolling update strategy makes it possible to deploy Pods without interruption: the new version of the application is deployed and started, and as soon as it reports that it is ready, Kubernetes forwards requests to the new Pod instead of the old one. The old Pod is then terminated.
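
You can read the strategy a Deployment currently uses directly from the live object; if nothing is configured explicitly, Kubernetes defaults to a RollingUpdate strategy with maxSurge and maxUnavailable of 25% each:

kubectl get deployment example-frontend --namespace <namespace> -o jsonpath='{.spec.strategy}{"\n"}'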

Additionally, container health checks help Kubernetes to precisely determine what state the application is in.

Basically, there are two different kinds of checks that can be implemented:

  • Liveness probes are used to find out if an application is still running
  • Readiness probes tell us if the application is ready to receive requests (which is especially relevant for the above-mentioned rolling updates)

These probes can be implemented as HTTP checks, container execution checks (the execution of a command or script inside a container) or TCP socket checks.
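
As an illustration only (we will add the real probe to our Deployment in the next task), the probe variants look roughly like this inside a container spec; the command, path and port values here are placeholders:

# illustrative placeholders, not yet part of our Deployment
livenessProbe:
  httpGet:            # HTTP check: any 2xx/3xx response counts as alive
    path: /health
    port: 5000
readinessProbe:
  exec:               # container execution check: exit code 0 means ready
    command: ["sh", "-c", "test -f /tmp/ready"]
# a TCP socket check (tcpSocket: with a port) would be the third variant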

In our example, we want the application to tell Kubernetes that it is ready for requests with an appropriate readiness probe.

Our example application has a health check implemented under the path /health.
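
If you would like to see that endpoint for yourself, you can reach it through a port-forward. This is an optional check and assumes the application answers on port 5000 with an HTTP 200, just as the readiness probe below will expect:

# in one terminal: forward local port 5000 to a Pod of the Deployment
kubectl port-forward deployment/example-frontend 5000:5000 --namespace <namespace>

# in a second terminal: call the health endpoint and print the HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000/health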

Task 1.5.2: Availability during deployment

In our Deployment we can add an update strategy. There we define that at least 75% of our replicas always have to be available during an update: maxUnavailable: 25%. With three replicas, this percentage rounds down to zero Pods that may be unavailable at any time, while maxSurge: 25% rounds up to at most one additional Pod that may be created on top of the desired count.

Please add the following lines to the spec section of your deployment_example-frontend.yaml file:

...
  selector:
    matchLabels:
      app: example-frontend
  # start to copy here
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  # stop to copy here
...

Besides the update strategy, we also define a check so that our container only receives traffic once it is ready: the so-called readiness probe.

Insert the readiness probe at .spec.template.spec.containers, above the resources line, in your local deployment_example-frontend.yaml file as well:

...
containers:
  - image: quay.io/songlaa/example-web-python:latest
    imagePullPolicy: Always
    name: example-frontend
    # start to copy here
    readinessProbe:
      httpGet:
        path: /health
        port: 5000
        scheme: HTTP
      initialDelaySeconds: 10
      timeoutSeconds: 1
    # stop to copy here
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 50m
        memory: 128Mi
...

Apply the file with:

kubectl apply -f deployment_example-frontend.yaml --namespace <namespace>

Wait until all Pods of the new Deployment are ready; a convenient way to do this is shown below. We are now going to verify that a redeployment of the application does not lead to an interruption.
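
kubectl can block until the rollout has finished, so you don't have to watch the Pod list manually:

kubectl rollout status deployment example-frontend --namespace <namespace>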

Set up the loop again to periodically check the application’s response (you don’t have to set the $URL variable again if it is still defined):

URL=$(kubectl get ingress example-frontend -o go-template="{{ (index .spec.rules 0).host }}" --namespace <namespace>)
while true; do sleep 1; curl -s https://${URL}/pod/; date "+ TIME: %H:%M:%S,%3N"; done

Restart your Deployment with:

kubectl rollout restart deployment example-frontend --namespace <namespace>

This time the loop in the second terminal should keep returning Pod names without any 503 errors: Kubernetes only sends requests to a new Pod once its readiness probe succeeds, and the old Pods are kept around until then.

Self-healing

Via the ReplicaSet we told Kubernetes how many replicas we want. So what happens if we simply delete a Pod?

Using kubectl get pods, look for a running Pod (status Running) that you can bear to kill.

Show all Pods and watch for changes:

kubectl get pods -w --namespace <namespace>

Now delete a Pod (in another terminal) with the following command:

kubectl delete pod <pod> --namespace <namespace>

Observe how Kubernetes instantly creates a new Pod in order to fulfill the desired number of running instances.
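
The replacement is also visible in the namespace's events, where you should see the old Pod being stopped and a new one being created and started (sorted by time here):

kubectl get events --sort-by=.metadata.creationTimestamp --namespace <namespace>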