Simplified K8s Troubleshooting with App-centric Abstraction

Feature of the month

Go from struggling with cryptic K8s errors to troubleshooting with easy, actionable messages

July 31, 2020 - by Jayalakshmi Elango

k8s troubleshooting & kubernetes developer tools

Imagine this. You have just cleared the hurdle of deploying your app onto Kubernetes. You build a Docker image, push the image to Docker Hub, do the security scanning, run through several unit tests, and finally hit deploy. Before you can catch your breath, your ops team reports a bug, a Kubernetes pod that is running but not ready, or an error with the cluster.

So you thought deploying to K8s was the complicated part? Here comes troubleshooting to prove you wrong.

As most of us know, deployment failure messages on Kubernetes are complex and not intuitive to debug. Users need to piece the error together from pod logs, pod events (describe pod), pod status, and so on. Developers spend a lot of time and effort demystifying the cryptic messages K8s throws up, valuable time that could otherwise have gone into building the core functionality of the application.

Does this sound familiar? We hear you!

K8s is complex. This makes application deployment to K8s challenging, time-consuming, tedious and error-prone. But it doesn’t have to be that way.

We relate to this pain point because of our first-hand experience deploying to K8s, and we asked ourselves: what if there were an abstraction layer on top of K8s that simplified deployment in an application-centric way? Such an abstraction could bring higher-level objects and actions that are intuitively understood by developers and DevOps professionals alike, making deployment, maintenance and troubleshooting a breeze!

How HyScale simplifies K8s Troubleshooting

HyScale’s core offering is an open source tool that helps you deploy apps to Kubernetes within minutes. When an issue occurs during deployment, HyScale’s app-centric abstraction helps users troubleshoot rapidly: it automatically runs the necessary checks behind the scenes and reports the potential cause of the error in general developer terminology. For example, if not all of the desired pods are running as expected, it checks whether those pods are initialized. If they are not, it throws an error stating, ‘Init-containers failed to execute, check your pre-start commands in hspec’. If the pods are initialized, it goes on to check further conditions, such as whether the pods are scheduled, whether any pods are pending, whether the cluster is full, whether any PVC is pending, and so on.
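The order of checks described above can be sketched in a few lines of Python. This is only an illustration of the decision flow, not HyScale's actual implementation: the `PodState` class, its fields, and every message except the init-container one quoted above are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodState:
    """Hypothetical, simplified view of a pod (not HyScale's data model)."""
    name: str
    initialized: bool = True             # init containers completed
    scheduled: bool = True               # pod was assigned to a node
    running: bool = True                 # containers are up
    insufficient_resources: bool = False # scheduler found no capacity
    pvc_pending: bool = False            # a volume claim is still unbound


def diagnose(pods: List[PodState]) -> str:
    """Apply the checks in the order described in the text, return one message."""
    if all(p.running for p in pods):
        return "All desired pods are running"

    # First check: did the init containers succeed?
    for pod in pods:
        if not pod.initialized:
            return ("Init-containers failed to execute, "
                    "check your pre-start commands in hspec")

    # Next checks: scheduling, cluster capacity, pending volume claims.
    for pod in pods:
        if not pod.scheduled:
            if pod.insufficient_resources:
                return "Cluster is full, pending pods cannot be scheduled"
            if pod.pvc_pending:
                return "A PersistentVolumeClaim is still pending"
            return "Pod is pending and not yet scheduled"

    # Illustrative fallback when none of the earlier checks matched.
    return "Pods are scheduled but not running, inspect container status"
```

Returning at the first failed check mirrors how a flow chart is read: each answer narrows the cause before the next question is asked.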

The tool tries to identify the potential problem and recommend possible fixes for the identified problem. Essentially, HyScale abstracts complex K8s error messages by translating them into developer-friendly language.

The following flow chart shows the Kubernetes troubleshooting flow for any HyScale deployment. Some of the errors that occur in a plain Kubernetes deployment are prevented by HyScale because it automates manifest generation. These prevented errors are greyed out in the flow chart.

[Flow chart: Kubernetes troubleshooting flow for a HyScale deployment, also available for download in PDF format]

HyScale enables app-centric troubleshooting in two ways.

1) Deployment: This helps an end-user troubleshoot the entire HyScale deployment. It runs as part of the deployment workflow in case of a deployment failure.

2) Service status: The end-user will be able to make out what exactly went wrong with their service in the case of a failed deployment.

For example, consider one scenario from each workflow. In the service status workflow, if any other resources of the service exist, HyScale checks the manifests in the kubectl apply log. In the deployment workflow, if the deployment is successful and the pod has an IP address as per the Kubernetes networking model, HyScale detects that there may be an issue with the kubelet and directs the user to contact the cluster administrators to avoid networking problems.

Behind the Scenes: Troubleshooting Abstraction

Let us take a look at what happens once the application is deployed, and how HyScale abstracts the troubleshooting process. Whenever Kubernetes reports an error, the HyScale tool automatically runs through the flow chart, checking each condition in turn. After a complete scan, HyScale reports where things could have gone wrong, such as an invalid start command or an invalid health check, using simplified error messages that are easy to understand and act on.

Just to put things in perspective, here is a comparison of the error messages K8s throws up versus what HyScale shows you.

| K8s error message | HyScale error message |
| --- | --- |
| CrashLoopBackOff | Service observed to be crashing. Please verify the startCommands in hspec or CMD in Dockerfile |
| CrashLoopBackOff | Service container exited abruptly. Possible errors in ENTRYPOINT/CMD in Dockerfile or missing ENTRYPOINT |
| CrashLoopBackOff ⇄ Running | Health checks specified for service failed 3 times in succession |
| ImagePullBackOff | Incorrect registry credentials |
| ImagePullBackOff/ErrImagePull | Invalid Image name or tag provided. Recheck the image name or tag in service spec |
| ImagePullBackOff/ErrImagePull | Missing target registry credentials for |
| Pending | Cannot accommodate new services as the cluster is full. Please contact your cluster administrator to add cluster capacity or deploy to a different cluster |
| Pending | Cannot provision new volumes, no storage class configured in your cluster. Please contact your cluster administrator |
| Pending | Deployment is still in progress, service is not yet ready. Try querying after sometime |
| OOMKilled | Out of memory errors. Not enough memory to run . Increase the memory limits in service spec and try redeploying |
| Error | Service startup commands failed with exitcode 1. Possible errors in startCommands in service spec or ENTRYPOINT/CMD in Dockerfile |
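Because one K8s status can map to several HyScale messages, the status alone is not enough to pick a message; some extra finding from the scan is needed. Here is a rough Python sketch of such a lookup. The message strings are copied from the comparison above, while the cause labels (`bad_start_command`, `cluster_full` and so on), the `translate` function and the fallback text are hypothetical names used only for illustration.

```python
from typing import Dict, Tuple

# Keys are (K8s status, hypothetical cause label found by the scan).
# Values are HyScale messages taken from the comparison above.
MESSAGES: Dict[Tuple[str, str], str] = {
    ("CrashLoopBackOff", "bad_start_command"):
        "Service observed to be crashing. Please verify the "
        "startCommands in hspec or CMD in Dockerfile",
    ("ImagePullBackOff", "bad_credentials"):
        "Incorrect registry credentials",
    ("Pending", "cluster_full"):
        "Cannot accommodate new services as the cluster is full. "
        "Please contact your cluster administrator to add cluster "
        "capacity or deploy to a different cluster",
    ("Pending", "no_storage_class"):
        "Cannot provision new volumes, no storage class configured "
        "in your cluster. Please contact your cluster administrator",
}


def translate(status: str, cause: str) -> str:
    """Return the friendly message for a status/cause pair, if known."""
    # The fallback wording is an assumption, not a HyScale message.
    return MESSAGES.get((status, cause), f"Unrecognized condition: {status}")


print(translate("Pending", "no_storage_class"))
```

Keying on the pair rather than on the status alone is what lets `Pending` resolve to a capacity problem in one case and a storage problem in another.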

Here are a couple of HyScale commands for easy debugging:

  • hyscale get service status -s <myservice> -n <my-namespace> -a <my-app-name> is used to view the status of a particular deployed service
  • hyscale get service logs -s <service_name> -a <app_name> -n <namespace> is used to check the pod logs of a service for troubleshooting
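If you want to call these commands from a script, one option is to build the argument list in Python and hand it to `subprocess`. The sketch below assumes the second bullet refers to `hyscale get service logs`; the helper name `service_command` and the sample service, namespace and app names are made up for illustration.

```python
import shlex
from typing import List


def service_command(action: str, service: str, namespace: str, app: str) -> List[str]:
    """Build the argv for `hyscale get service <action>`.

    `action` is expected to be "status" or "logs", matching the two
    commands listed above.
    """
    if action not in ("status", "logs"):
        raise ValueError(f"unsupported action: {action}")
    return ["hyscale", "get", "service", action,
            "-s", service, "-n", namespace, "-a", app]


cmd = service_command("status", "myservice", "my-namespace", "my-app-name")
print(shlex.join(cmd))
# To actually run it (requires the hyscale CLI on PATH):
#   subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems, which matters here because quotes copied from a web page are often typographic quotes that a shell treats as literal characters.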

With simplified troubleshooting comes faster time to recovery and a flatter learning curve for dev teams, who can quickly fix deployment errors such as a Kubernetes deployment not creating pods, or troubleshoot Kubernetes ingress issues. HyScale’s app-centric troubleshooting auto-analyzes various types of cryptic Kubernetes errors and converts them into understandable, actionable alerts that save time and effort. Not just that, developers can also easily decode complicated terms and tie them back to the right YAMLs and labels for easy debugging.

Do you have any troubleshooting-related feature requests or feedback that can make HyScale better? Let us know on our GitHub community. Here is a simple guide to get you started with HyScale.
