Kubernetes manifest settings to prevent service unavailability
Once you build a CI/CD pipeline that runs automatically from build to deployment, developers worry that a problematic container image will be deployed and cause a service outage. Web sites in particular should be checked with a scraping tool like Selenium, but that is hard for most start-up companies because of limited human resources. However, adding a few settings to the Kubernetes manifest will reduce the likelihood of problems.
First, configure the rolling update strategy like this. With maxSurge: 1 and maxUnavailable: 0, Kubernetes adds one new pod at a time and never takes an old pod out of service until its replacement is ready.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
Second, configure the liveness probe as follows. If the TCP check on port 80 fails, the kubelet restarts the container.
livenessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 20
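If the application serves HTTP, an httpGet probe is a possible alternative sketch: it verifies the app actually answers a request, not just that the port is open. The /healthz path here is an assumption, not something from the manifest above; use whatever health endpoint your application exposes.

```yaml
# Alternative sketch: an HTTP check confirms the app responds, not merely
# that the port accepts connections. The /healthz path is an assumption --
# substitute your application's real health endpoint.
livenessProbe:
  httpGet:
    path: /healthz
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 20
```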
If the running application is simple, this is enough, but a complex one needs more configuration. For example, a cache server whose pods have to finish pre-loading the cache before going into service should also have a readiness probe.
Create an emptyDir volume in advance to share the readiness-check file between containers.
volumes:
  - name: "readiness-check"
    emptyDir: {}
Then add a sidecar container that runs a script which performs the cache pre-loading and afterwards creates an empty file for the readiness check.
- name: init-cache
  image: myregistry/webcache-init:202007301510
  command: ["/bin/ash", "-c", "/init.sh && tail -f /dev/null"]
  volumeMounts:
    - mountPath: "/tmp/readiness-check"
      name: "readiness-check"
Also add the volume mount and the readiness probe to the main app container. The probe's cat command fails until the sidecar creates the init_cache_ready file, so the pod receives no traffic until the cache has been loaded.
- name: webcache
  image: myregistry/webcache:202007301510
  imagePullPolicy: IfNotPresent
  ports:
    - containerPort: 80
  readinessProbe:
    exec:
      command:
        - cat
        - /tmp/readiness-check/init_cache_ready
    initialDelaySeconds: 900
    periodSeconds: 5
  volumeMounts:
    - mountPath: "/tmp/readiness-check"
      name: "readiness-check"
By the way, Kubernetes 1.16 introduced a new feature, the startup probe, which looks like a more flexible way to configure this. I haven't actually tried it yet because my production cluster is still on 1.14.
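As an untested sketch (it requires 1.16 or later), a startup probe could replace the long initialDelaySeconds on the readiness probe: the kubelet holds off the other probes until the startup probe succeeds, and failureThreshold times periodSeconds sets the maximum time allowed for startup.

```yaml
# Sketch only -- requires Kubernetes 1.16+ (startupProbe was alpha in 1.16).
# Gives the container up to 30 * 30s = 15 minutes to finish cache loading;
# readiness and liveness probes begin only after this probe first succeeds.
startupProbe:
  exec:
    command:
      - cat
      - /tmp/readiness-check/init_cache_ready
  failureThreshold: 30
  periodSeconds: 30
```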
OK, here is the full YAML file. It also includes a podAntiAffinity rule so that replicas are spread across different nodes, and a preStop sleep so that in-flight connections can drain before a container stops.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webcache
  namespace: app
  labels:
    app: webcache
spec:
  replicas: 4
  selector:
    matchLabels:
      app: webcache
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: webcache
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - webcache
              topologyKey: "kubernetes.io/hostname"

      containers:
        - name: webcache
          image: myregistry/webcache:202007301510
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
          readinessProbe:
            exec:
              command:
                - cat
                - /tmp/readiness-check/init_cache_ready
            initialDelaySeconds: 900
            periodSeconds: 5

          resources:
            requests:
              cpu: 10m
              memory: 64Mi
            limits:
              cpu: 1000m
              memory: 1000Mi

          livenessProbe:
            tcpSocket:
              port: 80
            initialDelaySeconds: 15
            periodSeconds: 20

          lifecycle:
            preStop:
              exec:
                command: ["/bin/ash", "-c", "sleep 10"]

          volumeMounts:
            - mountPath: "/tmp/readiness-check"
              name: "readiness-check"

        - name: init-cache
          image: myregistry/webcache-init:202007301510
          command: ["/bin/ash", "-c", "/init.sh && tail -f /dev/null"]

          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 100m
              memory: 32Mi

          volumeMounts:
            - mountPath: "/tmp/readiness-check"
              name: "readiness-check"

      volumes:
        - name: "readiness-check"
          emptyDir: {}