HACKER Q&A
📣 nk-devtron

How to automate the Kubernetes application debugging process?


No-brainer things I find myself doing again and again while debugging Kubernetes issues:

1. Debugging pods stuck in CrashLoopBackOff

2. Checking events, logs, labels, pods, services

3. Looking for older events which are gone :(

4. Frequently logging into the cloud provider's dashboard to figure out if there is an issue on the provider's side

5. Figuring out why traffic is not being received by downstream applications

6. Ensuring services are selecting the right pods

7. Launching a pod to run curl/dnsutils/awscli

8. For an externally exposed service, figuring out whether the ingress is routing traffic correctly and that no other config is superseding it

9. Exec'ing into a pod to check whether configmap/secret changes are reflected, or killing the pod if feeling too lazy to check

10. Figuring out why a node is not ready

11. Checking RAM and CPU utilization

12. Figuring out how this application is deployed: Helm, Argo CD, Flux, Tekton, wf

13. Checking if the manifest has changed recently and comparing versions to spot misconfiguration

14. Comparing the manifest with another environment's manifest to be sure no new config parameter has been missed

15. Building a mental model of the application's context boundary
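
To make item 6 concrete, here is a minimal sketch of the selector check, assuming the Service and Pod objects have already been fetched as JSON (for example with `kubectl get ... -o json`) from the same namespace. The function names and the sample data are hypothetical, and only equality-based selectors are handled, which is what a Service's `spec.selector` uses.

```python
def selector_matches(selector, labels):
    """Return True if every key/value in the selector appears in the labels."""
    if not selector:
        # A Service without a selector does not select any pods by itself.
        return False
    return all(labels.get(key) == value for key, value in selector.items())


def pods_selected_by(service, pods):
    """Return the names of pods whose labels satisfy the service's selector."""
    selector = service.get("spec", {}).get("selector") or {}
    return [
        pod["metadata"]["name"]
        for pod in pods
        if selector_matches(selector, pod["metadata"].get("labels") or {})
    ]


# Hypothetical sample objects, trimmed to the fields the check reads.
service = {
    "metadata": {"name": "web"},
    "spec": {"selector": {"app": "web", "tier": "frontend"}},
}
pods = [
    {"metadata": {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}}},
    {"metadata": {"name": "web-2", "labels": {"app": "web"}}},  # missing "tier"
]

print(pods_selected_by(service, pods))  # prints ['web-1']
```

An empty result for a Service that should have backends usually points at a label typo or a selector that was changed without updating the pod template.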

Have you felt the same? I want to automate this. Which feature should I implement first?
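
As one possible starting point for items 13 and 14, the sketch below compares two manifests that have already been parsed into Python dicts. The helper names (`flatten`, `diff_manifests`) and the two sample ConfigMaps are made up for illustration; this is a plain structural diff, not how any particular tool does it.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"a.b[0].c": value} pairs.

    Empty dicts and lists produce no entries, and keys that themselves
    contain dots are not escaped, so paths are for reading, not parsing.
    """
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def diff_manifests(left, right):
    """Report paths present on only one side and paths whose values differ."""
    a, b = flatten(left), flatten(right)
    return {
        "only_in_left": sorted(a.keys() - b.keys()),
        "only_in_right": sorted(b.keys() - a.keys()),
        "changed": {
            path: (a[path], b[path])
            for path in sorted(a.keys() & b.keys())
            if a[path] != b[path]
        },
    }


# Hypothetical manifests for two environments.
staging = {"kind": "ConfigMap", "data": {"LOG_LEVEL": "debug", "CACHE_TTL": "60"}}
prod = {"kind": "ConfigMap", "data": {"LOG_LEVEL": "info"}}

result = diff_manifests(staging, prod)
print(result["only_in_left"])  # parameters that never made it to prod
print(result["changed"])       # parameters whose values differ
```

Here `data.CACHE_TTL` shows up as present only in staging, which is exactly the "new config parameter has been missed" case. The same function works for comparing a manifest against its own previous revision.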


  👤 bg24 Accepted Answer ✓
You may have captured these as part of some bullet points already:

16/ Network connectivity. Make sure that the resource is accessible. It could be the API server, a controller, or an application pod.

18/ What changed in the software versions (install/remove/update)?

19/ Is your DNS working correctly?

20/ Does the pod have the right permissions (RBAC)?
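
The connectivity and DNS checks (16 and 19) can be sketched with nothing but the Python standard library, assuming the script runs from a pod inside the cluster. The function names `resolves` and `can_connect` are hypothetical; they only answer "does the name resolve" and "does a TCP connection open", not whether the application behind the port is healthy.

```python
import socket


def resolves(hostname):
    """Return the sorted addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

From a debug pod, `resolves("kubernetes.default.svc")` returning an empty list would suggest a cluster DNS problem, while a resolved name combined with `can_connect(...)` returning `False` points more toward a network policy, a Service, or the target pod itself.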

👤 streetcat1
You should write an operator for inner checks.

This might be helpful:

https://learnk8s.io/troubleshooting-deployments

I am planning to do the same for a platform that I am building, which is deployed on-prem. Let me know if this is an open source project.


👤 prakarsh
Following this thread, will post some Kubernetes debugging issues.