As someone who hopped on the K8s bandwagon back in the early days (circa early 2017): _do not_ go into production with Kubernetes if you're still asking this question.
Just a few of the issues I've run into over the past 2 1/2 years or so:
- Kubernetes DNS flaking out completely
- Kubernetes DNS flaking out occasionally (for ~5 percent of queries)
- Giving out too many permissions, causing pods to be deleted without a clear reason why, often taking down production traffic or logging with it
- Giving out too few permissions, making our deployment infrastructure depend on a few linchpins rather than sharing the production burden
- probably a dozen different logging aggregation systems, none of which strike a balance between speed and CPU cost
- probably a half-dozen different service meshes, all of which suck (with the exception of linkerd, which is actually quite good)
- Teams with bad sanitization practices leaking credentials all over the place
- Running Vault in Kubernetes (really, don't ever do this)
- Disks becoming unattached from their pods for no discernible reason, only to be re-attached minutes later with no explanation
- At least one major production outage on every single Kubernetes-based system I've built that can be directly attributed to Kubernetes
- Etcd failovers
- Etcd replication failures
- Privilege escalation due to an unsecured Jenkins builder causing credential exfiltration (this one was _super_ fun to fix)
Kubernetes is a powerful tool, and I've helped run some massive (1,000+ nodes, 5,000+ pods across 3 AZs) systems based on K8s, but it took me a solid year of experimenting and tinkering to feel even remotely comfortable putting anything based on K8s into production. If you haven't run into any "major" issues, you're going to very soon. I can only wish you good luck.
Use tools like kustomize to reduce proliferation of duplicate k8s resource files.
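For example, a minimal overlay kustomization.yaml might look roughly like this (the directory layout and labels are made up for illustration, and exact field names vary a bit between kustomize versions):

```yaml
# overlays/production/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Reuse the shared base manifests instead of copying them per environment
resources:
  - ../../base

# Apply environment-specific settings in one place
namespace: myapp-production
commonLabels:
  env: production

# Patch only the fields that differ from the base
patchesStrategicMerge:
  - replica-count.yaml
```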
Do make sure you are using liveness and readiness probes (health checks).
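A sketch of what that looks like on a container spec, assuming the app exposes /healthz and /ready endpoints (paths, ports, and timings below are placeholders):

```yaml
# Fragment of a Deployment's pod template (hypothetical app)
containers:
  - name: myapp
    image: example.com/myapp:1.2.3
    ports:
      - containerPort: 8080
    # Restart the container if it stops responding
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    # Only route traffic to the pod once the app reports it is ready
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```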
Definitely take care to specify resource requests and limits.
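Something like this on every container, tuned to what the workload actually uses (the numbers here are purely illustrative):

```yaml
# Fragment of a container spec; values are placeholders
resources:
  requests:
    cpu: 250m        # the scheduler reserves this much for the pod
    memory: 256Mi
  limits:
    cpu: "1"         # the container is throttled above this
    memory: 512Mi    # the container is OOM-killed above this
```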
Do use annotations to control cloud-provider resources, rather than manually tweaking provider resources that get auto-generated from bare k8s manifests with no annotations.
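A common case is steering how the cloud provider builds the load balancer for a Service. The annotation below is the AWS one for an internal load balancer; other providers use their own keys, so treat this as a sketch rather than a recipe:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    # Ask the cloud controller for an internal (non-public) load balancer,
    # instead of hand-editing the generated LB in the provider's console.
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 8080
```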
Aggregate your logs.
- If you're planning on using GKE, you'll have to expose your apps using Ingress (this way you can use GCP's L7 load balancing with HTTPS). However, this architecture has a number of limits (e.g. a hard cap of 1,000 forwarding rules per project, each Ingress creates a forwarding rule, and a k8s Ingress can't refer to a Service in another namespace), so make sure you use namespaces wisely; see the Ingress sketch after this list.
- Try to learn, and teach people on your team, about requests and limits. If you don't use them carefully, you'll end up wasting a lot of resources. Also, make sure you have Prometheus and Grafana set up to give you some visibility.
- Set up Heptio's Velero; it's a lifesaver, especially when running in a managed environment where you have no access to etcd. It can be used to back up your whole cluster and migrate workloads between clusters. If, for some reason, you end up deleting a cluster by mistake, it will be much easier to recover its workloads using Velero.
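As a sketch of the GKE setup mentioned above, an Ingress that fronts a Service through GCP's L7 load balancer looks roughly like this (host, service name, and the reserved static-IP name are placeholders; the apiVersion matches the Ingress API of that era):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    # GKE provisions an external HTTP(S) load balancer for this Ingress;
    # each Ingress consumes forwarding rules from the project quota.
    kubernetes.io/ingress.class: "gce"
    kubernetes.io/ingress.global-static-ip-name: "myapp-ip"   # hypothetical reserved IP
spec:
  tls:
    - secretName: myapp-tls            # certificate used for HTTPS termination
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /*
            backend:
              serviceName: myapp
              servicePort: 80
```

And for the Velero recommendation, backups can be driven from the CLI or declaratively; a minimal Backup resource might look like the following. The namespace list and TTL are illustrative, and the field names are assumed from Velero's Backup CRD, so double-check them against your Velero version:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: nightly-full-backup      # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - "*"                        # back up every namespace
  ttl: 720h0m0s                  # keep the backup for 30 days
```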
If you're asking these kinds of questions you shouldn't be using kubernetes.
If you are going to use it, be ready to have an engineer on your team go full-time DevOps, or be ready to hire someone who knows k8s. That'll run around $110k to $140k.
But really, don't use it. The gospel you hear is from engineers who already invested their careers in it. Buyer beware.
Use namespaces and logically bounded clusters. Get your monitoring and tracing, plus a dashboard to visualize them, figured out now.
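A sketch of a per-team namespace with a ResourceQuota attached, so one team can't starve the others (names and numbers are placeholders):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments            # hypothetical team namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "20"           # total CPU the namespace may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"                  # cap on pod count in the namespace
```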
Kubernetes is enormously configurable, so your environment matters a lot when deciding what's a must-have versus a nice-to-have.
If you're not on a managed offering, make sure you go through every component's flags and configure things like reserved resources, forbid hostPath usage, set pod security policies (do not allow root), etc.
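For the pod-security part, a restricted PodSecurityPolicy along these lines blocks root containers and hostPath volumes. This is only a sketch from the PSP API of that era (policy/v1beta1), and it does nothing until you also bind it to service accounts via RBAC:

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot       # refuse containers that run as root
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # hostPath is deliberately absent from the allowed volume types
  volumes:
    - configMap
    - secret
    - emptyDir
    - persistentVolumeClaim
    - downwardAPI
    - projected
```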
Also, avoid service meshes until you fully understand how to use “vanilla” Kubernetes, don’t add this complexity from day 1 because debugging cluster issues can get a lot harder.
Rolling out K8s should not be the goal. It's a toolset, and an expensive, bleeding-edge one at that. It's also very much geared toward operators, not developers, so you'll likely need to build guardrails on top of it.
There are lots of good reasons to use K8s but make sure you know why you are.