Application Observability and Maintenance (15%)
This domain covers monitoring, debugging, and maintaining applications in Kubernetes.
Probes
Liveness Probe
Determines whether a container is still healthy. If the probe fails `failureThreshold` times in a row, the kubelet restarts the container according to the Pod's `restartPolicy`.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-liveness
spec:
  containers:
  - name: app
    image: myapp:v1
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
      successThreshold: 1
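Readiness Probe
Determines whether a container is ready to receive traffic. While the probe fails, the Pod is removed from the endpoints of matching Services; the container is not restarted.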
apiVersion: v1
kind: Pod
metadata:
  name: app-with-readiness
spec:
  containers:
  - name: app
    image: myapp:v1
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
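Startup Probe
Used for slow-starting containers. Liveness and readiness probes are disabled until the startup probe succeeds. If it does not succeed within `failureThreshold` × `periodSeconds`, the container is killed and handled according to the Pod's `restartPolicy`.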
apiVersion: v1
kind: Pod
metadata:
  name: app-with-startup
spec:
  containers:
  - name: app
    image: myapp:v1
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
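To make the startup window concrete, the sketch below computes how long a container may take to start before the kubelet gives up on it. The helper name `startup_budget_seconds` is hypothetical, and the result is an approximate upper bound that ignores probe timeouts and scheduling jitter.

```shell
#!/bin/sh
# Approximate upper bound on the time a startup probe allows:
#   initialDelaySeconds + failureThreshold * periodSeconds
# startup_budget_seconds is a hypothetical helper, not part of kubectl.
startup_budget_seconds() {
  failure_threshold=$1
  period_seconds=$2
  initial_delay=${3:-0}   # initialDelaySeconds defaults to 0 when omitted
  echo $(( initial_delay + failure_threshold * period_seconds ))
}

# Values from the startupProbe above: failureThreshold=30, periodSeconds=10
startup_budget_seconds 30 10   # prints 300
```

With these values the application gets roughly five minutes to start before the liveness probe takes over.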
Probe Types
| Type | Description |
|---|---|
| `httpGet` | HTTP GET request to specified path and port |
| `tcpSocket` | TCP connection to specified port |
| `exec` | Execute command in container |
| `grpc` | gRPC health check |
# TCP Socket probe
livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 10

# Exec probe
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

# gRPC probe
livenessProbe:
  grpc:
    port: 50051
  initialDelaySeconds: 10
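Probe Parameters
| Parameter | Description | Default |
|---|---|---|
| `initialDelaySeconds` | Delay after container start before the first probe | 0 |
| `periodSeconds` | How often to probe | 10 |
| `timeoutSeconds` | Seconds before a probe attempt times out | 1 |
| `failureThreshold` | Consecutive failures before action is taken | 3 |
| `successThreshold` | Consecutive successes needed to be considered healthy after a failure (must be 1 for liveness and startup probes) | 1 |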
Logging
Viewing Logs
# View pod logs
kubectl logs nginx
# View specific container logs
kubectl logs nginx -c sidecar
# Follow logs
kubectl logs -f nginx
# View previous container logs (after restart)
kubectl logs nginx --previous
# View last N lines
kubectl logs nginx --tail=100
# View logs since time
kubectl logs nginx --since=1h
kubectl logs nginx --since-time=2024-01-01T00:00:00Z
# View logs from all pods with label (only the last 10 lines per pod unless --tail is set)
kubectl logs -l app=nginx
# View logs from all containers in pod
kubectl logs nginx --all-containers
Logging Architecture
┌─────────────────────────────────────────────────────┐
│                        Node                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Pod 1    │  │    Pod 2    │  │    Pod 3    │  │
│  │ stdout/err  │  │ stdout/err  │  │ stdout/err  │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         │                │                │         │
│         └────────────────┼────────────────┘         │
│                          │                          │
│                    ┌─────▼─────┐                    │
│                    │ Container │                    │
│                    │  Runtime  │                    │
│                    └─────┬─────┘                    │
│                          │                          │
│                    ┌─────▼─────┐                    │
│                    │ Log Files │                    │
│                    │ /var/log/ │                    │
│                    └───────────┘                    │
└─────────────────────────────────────────────────────┘
Debugging
Debug Commands
# Describe pod (events, status)
kubectl describe pod nginx
# Get pod details
kubectl get pod nginx -o yaml
kubectl get pod nginx -o wide
# Check events
kubectl get events --sort-by='.lastTimestamp'
kubectl get events --field-selector involvedObject.name=nginx
# Execute command in container
kubectl exec nginx -- ls /app
kubectl exec -it nginx -- /bin/sh
# Copy files to/from container
kubectl cp nginx:/var/log/app.log ./app.log
kubectl cp ./config.yaml nginx:/app/config.yaml
# Port forward
kubectl port-forward pod/nginx 8080:80
kubectl port-forward svc/nginx 8080:80
# Debug with ephemeral container
kubectl debug nginx -it --image=busybox --target=nginx
Common Issues
| Issue | Debug Steps |
|---|---|
| ImagePullBackOff | Check image name, registry access, pull secrets |
| CrashLoopBackOff | Check logs, probe configuration, resource limits |
| Pending | Check events, node resources, taints/tolerations |
| OOMKilled | Compare memory usage against the limit; raise the memory limit or reduce the application's memory use |
| CreateContainerConfigError | Check ConfigMaps, Secrets references |
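The table can be turned into a small lookup for quick triage. This is only a sketch: `triage_hint` is a hypothetical helper, not a kubectl feature, and it encodes nothing beyond the rows above.

```shell
#!/bin/sh
# Map a pod status or reason to the first checks suggested in the table.
# triage_hint is a hypothetical helper written for illustration.
triage_hint() {
  case "$1" in
    ImagePullBackOff)           echo "check image name, registry access, pull secrets" ;;
    CrashLoopBackOff)           echo "check logs, probe configuration, resource limits" ;;
    Pending)                    echo "check events, node resources, taints/tolerations" ;;
    OOMKilled)                  echo "check memory usage and memory limits" ;;
    CreateContainerConfigError) echo "check ConfigMap and Secret references" ;;
    *)                          echo "start with: kubectl describe pod <name>" ;;
  esac
}

triage_hint CrashLoopBackOff

# Against a live cluster, the waiting reason of the first container could be
# fed in like this (assumes a pod named nginx and a reachable cluster):
#   triage_hint "$(kubectl get pod nginx \
#     -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}')"
```

Unrecognized values fall through to `kubectl describe pod`, which is usually the first place to look because it shows recent events.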
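Pod Status Phases
| Phase | Description |
|---|---|
| `Pending` | Pod accepted by the cluster, but one or more containers are not yet running (includes scheduling and image pulls) |
| `Running` | Pod bound to a node and all containers created; at least one is running, starting, or restarting |
| `Succeeded` | All containers terminated successfully and will not be restarted |
| `Failed` | All containers terminated, and at least one terminated in failure |
| `Unknown` | Pod state could not be obtained, typically because of a communication error with the node |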
Monitoring
Resource Metrics
# View node resource usage
kubectl top nodes
# View pod resource usage
kubectl top pods
kubectl top pods -A
kubectl top pods --containers
# Sort by CPU/memory
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
Metrics Server
Required for kubectl top commands:
# Check if metrics server is running
kubectl get pods -n kube-system | grep metrics-server
# Install metrics server (if needed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Application Maintenance
Updating Applications
# Update image
kubectl set image deployment/nginx nginx=nginx:1.22
# Update environment variable
kubectl set env deployment/nginx ENV=production
# Update resources
kubectl set resources deployment/nginx --limits=cpu=200m,memory=512Mi
# Patch resource
kubectl patch deployment nginx -p '{"spec":{"replicas":5}}'
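The limits in the `kubectl set resources` example use Kubernetes quantity suffixes: `m` means thousandths of a CPU core, and `Ki`/`Mi`/`Gi` are powers of 1024 bytes. The sketch below converts such values to plain numbers; both helper names are hypothetical, and only whole-number inputs with these suffixes are handled.

```shell
#!/bin/sh
# Convert a memory quantity with a binary suffix to bytes.
# mem_to_bytes and cpu_to_millicores are hypothetical helpers for illustration.
mem_to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;                 # assume the value is already in bytes
  esac
}

# Convert a CPU quantity to millicores (whole cores or an "m" suffix only).
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

mem_to_bytes 512Mi       # prints 536870912
cpu_to_millicores 200m   # prints 200
```

So `cpu=200m` is one fifth of a core, and `memory=512Mi` is 536870912 bytes.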
Scaling
# Manual scaling
kubectl scale deployment nginx --replicas=5
# Autoscaling
kubectl autoscale deployment nginx --min=2 --max=10 --cpu-percent=80
HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
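The autoscaler derives its target from the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), then clamps the result to `minReplicas` and `maxReplicas`. The sketch below reproduces that arithmetic for one metric. The helper name `hpa_desired_replicas` is hypothetical, and the calculation is simplified: it ignores the controller's tolerance band, stabilization windows, and unready pods.

```shell
#!/bin/sh
# Simplified HPA calculation for one metric:
#   ceil(current_replicas * current_util / target_util), clamped to [min, max]
# hpa_desired_replicas is a hypothetical helper, not part of kubectl.
hpa_desired_replicas() {
  current_replicas=$1
  current_util=$2      # observed average utilization, in percent
  target_util=$3       # averageUtilization from the HPA spec
  min=$4
  max=$5
  # Integer ceiling division
  desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 4 replicas at 120% CPU with the 80% target and 2..10 bounds from above
hpa_desired_replicas 4 120 80 2 10   # prints 6
```

When several metrics are listed, as with `cpu` and `memory` in the manifest above, the autoscaler evaluates each one and uses the largest resulting replica count.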
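Key Concepts to Remember
- Liveness - Is the container still healthy? Restart it if not
- Readiness - Is the container ready for traffic? Remove it from Service endpoints if not
- Startup - For slow-starting containers; holds off liveness and readiness checks until it succeeds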
- kubectl logs - View container output
- kubectl describe - Detailed resource info with events
- kubectl top - Resource usage (requires metrics-server)
Practice Questions
- What happens when a liveness probe fails?
- How do you view logs from a previous container instance?
- What is the difference between readiness and liveness probes?
- How do you execute a command in a running container?
- What probe type would you use for a database container?