Senior Consultant - DevOps | MLOps
Infosys
GitOps Consultant and Kubernetes Developer
Capgemini
Cloud DevOps Consultant
Tata Consultancy Services
Azure
Azure DevOps Server

Kubernetes
Terraform

PowerShell

Bash

Python

Azure Cloud

AKS

EKS
Docker

Helm
Jenkins

Nexus

SonarQube

ArgoCD

Ansible
Yes. So, I am Adi. I have around six to seven years of experience as a DevOps engineer. On my current project, I designed the deployment architecture for the application and the CI/CD architecture. The project was a microservice application that needed to be deployed on a Kubernetes cluster in Azure, so we used Azure Kubernetes Service. The deployment architecture we created was simple. We used the Azure Application Gateway as the entry point of the application, and behind the gateway we used Azure Kubernetes Service. Inside the cluster, we had around 20 microservices, and we deployed all 20 onto Azure Kubernetes Service. For each microservice, we created a deployment YAML file and a service YAML file. Each deployment had 3 replicas for high availability. We used a ClusterIP service to load-balance across the replicas, and we used the ingress controller and an ingress object to expose it. Config maps and secrets were also used. In the deployment YAML file, we implemented readiness probes. The reason was that our application took around 2 to 3 minutes to start. The pod went into the running state as soon as it was started, but the application inside was only ready after about 3 minutes, because it runs some processes at startup. So we used the readiness probe on the container to keep traffic away until the application was in the ready state. The second thing we did was create the CI/CD pipelines for the different microservices, and we used Azure Pipelines for that. Our code was stored in Azure Repos in Azure DevOps, and our tasks, user stories, and sprints were created in Azure Boards. So we used the Azure DevOps product completely. Let me tell you the pipeline details. In the pipelines, we had multiple stages. The first stage was to check out the repository. 
The second stage was the Docker build. Then we had a SonarQube scan for static code analysis. Then we scanned the Docker image using the Trivy vulnerability scanner. Then we pushed that image to the container registry and deployed it to Azure Kubernetes Service. This was the complete deployment architecture and CI/CD architecture that I planned and we created. As for the branching strategy, we used the Git flow branching strategy for the applications. Whenever a new requirement was worked on, the developer created a new feature branch. Once the feature was developed, he merged it into the develop branch. The develop branch was deployed onto the dev environment. Once the dev environment was good enough, he merged it into a release branch, and the release branch was deployed to a QA environment. The QA environment was tested by our testing team. They did manual testing, and there was some automation testing. For automation testing, there was a pipeline.
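The deployment described above (3 replicas, with a readiness probe holding traffic back while the application starts) could be sketched as follows. This is a minimal illustration, not the project's actual manifest: the service name `orders`, the image path, the port, the `/health` endpoint, and the probe timings are all assumptions. The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def build_deployment(name: str, image: str, port: int) -> dict:
    """Return an apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 3,  # three replicas for high availability
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            # The pod receives no Service traffic until this
                            # probe succeeds. Timings are assumed: 60 s delay
                            # plus 18 attempts x 10 s covers a ~3 minute start.
                            "readinessProbe": {
                                "httpGet": {"path": "/health", "port": port},
                                "initialDelaySeconds": 60,
                                "periodSeconds": 10,
                                "failureThreshold": 18,
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical microservice name and image, for illustration only.
manifest = build_deployment("orders", "example.azurecr.io/orders:1.0.0", 8080)
print(json.dumps(manifest, indent=2))
```

The probe only controls whether the pod is added to the Service endpoints; it does not restart the container, which would be the job of a liveness probe.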
To prevent unauthorized access to a Kubernetes cluster, the first thing I will do is implement Role-Based Access Control (RBAC) in the cluster, so that even if someone gets into the cluster, they have limited access. For example, if some users only need to create pods, we'll give them only create-pod access. If other users only need read-only access, we will give them only read-only access. So we will plan it properly and do that. Another thing we need to do is secure the token that is used to communicate with the API server. If we are using a managed Kubernetes cluster on Azure or AWS, we can obtain the token through the Azure CLI or the AWS CLI, which protects it. However, if we are hosting the cluster on a bare-metal server, we need to protect the master server and prevent public connectivity to the master node of the cluster. So there are two major things to do. First, restrict access to the master node, and second, enable RBAC in the cluster.
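The two access levels mentioned above (create-only and read-only on pods) might look like this as RBAC manifests. This is a sketch under assumptions: the role names, the namespace `dev`, and the user `alice` are hypothetical, and a real setup would likely bind to groups from the identity provider rather than to individual users.

```python
import json

RBAC_API = "rbac.authorization.k8s.io/v1"


def pod_role(name: str, namespace: str, verbs: list) -> dict:
    """Namespaced Role that allows only the given verbs on pods."""
    return {
        "apiVersion": RBAC_API,
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        # Pods live in the core API group, written as the empty string.
        "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": verbs}],
    }


def bind_role(role: dict, user: str) -> dict:
    """RoleBinding that grants the given Role to a single user."""
    meta = role["metadata"]
    return {
        "apiVersion": RBAC_API,
        "kind": "RoleBinding",
        "metadata": {"name": f"{meta['name']}-{user}", "namespace": meta["namespace"]},
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": meta["name"],
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


# Hypothetical names for illustration.
reader = pod_role("pod-reader", "dev", ["get", "list", "watch"])
creator = pod_role("pod-creator", "dev", ["create"])
binding = bind_role(reader, "alice")
print(json.dumps([reader, creator, binding], indent=2))
```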
How do you configure Horizontal Pod Autoscaling based on custom metrics? Yeah. So, by default, we set up Horizontal Pod Autoscaling using metrics such as CPU and memory. These are the basic metrics. But if you want to use custom metrics, there is an option for that in the HPA settings. What I have done in my project: we had an Azure Event Hub. The application was reading the messages that were continuously coming into the Event Hub topic. The requirement was that when the number of messages in the topic increased, we wanted to scale up the number of pods, and when the number of messages went down, we wanted to scale down the number of pods. For that, we used KEDA, which we deployed in the cluster. KEDA was authenticated to the Event Hub; we gave it the Event Hub credentials. So KEDA was watching the number of messages the Event Hub was currently getting. Once everything was set up, the horizontal autoscaler was configured to check whether the number of messages was going above 200 per minute, and if so, we would increase the number of pods. This kind of setup can be done using KEDA, as we did in our project.
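To make the scaling decision concrete, here is a small sketch of the replica calculation the horizontal autoscaler applies to a metric like this one: desired replicas = ceil(current replicas x current value / target value). It is simplified (it ignores the autoscaler's tolerance band and stabilization window), and reading the 200-messages-per-minute threshold as a per-pod target is an assumption, as are the replica limits.

```python
import math


def desired_replicas(
    current_replicas: int,
    messages_per_minute: float,
    target_per_pod: float = 200.0,  # assumed reading of the 200/min threshold
    min_replicas: int = 1,          # assumed lower bound
    max_replicas: int = 10,         # assumed upper bound
) -> int:
    """Simplified HPA rule: ceil(current * currentValue / targetValue)."""
    # With an average-value target, the current value is the load per pod.
    current_per_pod = messages_per_minute / current_replicas
    desired = math.ceil(current_replicas * current_per_pod / target_per_pod)
    # Clamp to the configured replica limits.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, 900))   # load well above target: scale up
print(desired_replicas(5, 150))   # load below target: scale down
```

In the setup described above, KEDA supplies the Event Hub message count as the metric and manages the autoscaler object, so this arithmetic happens inside the cluster rather than in application code.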
What strategies would you employ to ensure zero-downtime deployments when transitioning from Tanzu Kubernetes to AKS? Okay, so why transition from Tanzu Kubernetes to AKS? The thing is, Tanzu Kubernetes runs on bare metal, and we need to manage everything ourselves. So we are going from an unmanaged cluster, Tanzu Kubernetes, to a completely managed cluster, AKS. So first, we need to assess the environment. We need to see what applications are running, how many pods, what deployments, what cluster roles, what config maps, what secrets, and whatever other Kubernetes objects and Ingress controllers are running. We have to plan which Ingress controller was used in Tanzu Kubernetes and which Ingress controller we will use in Azure Kubernetes Service. So, one by one, we need to understand the existing environment and what is running in Kubernetes. Once everything is understood, we can start migrating things one by one. For example, first, let's deploy the cluster roles and everything else that is required. Then let's see what config maps are running there; we will export the config maps and apply them on the AKS cluster. One by one, we will migrate the objects to Azure Kubernetes Service. And at last, we will migrate all the deployments, pods, replica sets, stateful sets, and daemon sets. Whatever workloads are running in Tanzu Kubernetes, we will migrate completely, one by one. While we are migrating, we will not take the application down. What will happen is that we set up the complete application again on AKS. For example, our application is running on one domain name, and that domain is linked to the public IP address of the Tanzu Kubernetes Ingress controller. 
So now, once the complete application and the Ingress controller are deployed in AKS and everything is set up, with the ingress object, ingress rules, and everything written in AKS, then, at last, I will try to access it using kubectl, with the kubectl proxy command. I will try to access the application running in AKS using kubectl port-forward. And if I see that everything is fine, the migration is done, and the complete application is running fine in AKS, then we will do a DNS switch from Tanzu Kubernetes to AKS. This will ensure a zero-downtime deployment. As long as the DNS is pointing to Tanzu Kubernetes, the application will be served from there, and customers will be able to use it from there. Once the application is completely set up on AKS, we will do the DNS switch. And once the DNS is switched to AKS, the users will be accessing the application directly from the Azure Kubernetes cluster.
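The "one by one" migration order described above (roles and configuration first, workloads last) can be sketched as a simple sort over exported manifests. The exact ordering list is an assumption for illustration; the answer only fixes that cluster roles and config maps come early and workloads come last.

```python
# Assumed ordering: access control and configuration before workloads.
MIGRATION_ORDER = [
    "Namespace",
    "ClusterRole",
    "ClusterRoleBinding",
    "ConfigMap",
    "Secret",
    "PersistentVolumeClaim",
    "Service",
    "Ingress",
    "DaemonSet",
    "StatefulSet",
    "Deployment",
]


def sort_for_migration(manifests: list) -> list:
    """Order exported manifests so dependencies are applied before workloads."""
    rank = {kind: index for index, kind in enumerate(MIGRATION_ORDER)}
    # Kinds not in the list are applied last; sorted() keeps ties stable.
    return sorted(manifests, key=lambda m: rank.get(m["kind"], len(MIGRATION_ORDER)))


# Hypothetical export from the source cluster.
exported = [
    {"kind": "Deployment", "metadata": {"name": "orders"}},
    {"kind": "ConfigMap", "metadata": {"name": "orders-config"}},
    {"kind": "ClusterRole", "metadata": {"name": "orders-reader"}},
    {"kind": "Secret", "metadata": {"name": "orders-db"}},
]
ordered = sort_for_migration(exported)
print([m["kind"] for m in ordered])
```

The DNS record is changed only after the last item has been applied and verified, which is what keeps the old cluster serving traffic throughout.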
What are the benefits of using Helm charts in Kubernetes, and how would you manage dependencies in a Helm chart? So, the benefit of using a Helm chart in Kubernetes is templating. For example, we need to write a deployment YAML file, and for four microservices, we have to write four deployment YAML files. The thing is, there is no way to create variables in those deployment YAML files or in any plain Kubernetes manifest. So, from the templating perspective, the Helm chart comes into the picture. Let's take the example of a microservice application with four microservices. For each of the four microservices, we have a deployment YAML, a service YAML, an ingress YAML, and a config map YAML, all the things we need for that specific application. If we are not using a Helm chart, we have to create separate deployment YAML files and everything else for each one. With Helm charts, we get templates. Helm is a kind of templating engine, so we can create variables. For example, if we want a variable for the image, we can place the variable in the deployment YAML file, and we can control the values of those variables using a single values YAML file. So now, if we have four environments, we create one complete Helm chart for our microservice application and use the same chart everywhere. The only difference is that we create a different values YAML file for each environment. Using a single Helm chart and different values YAML files, we can deploy to dev, QA, prod, and so on. Now, to manage the dependencies: for example, the first microservice is dependent on the second, and the second is dependent on the third. This can also be done using the chart. 
With the proper syntax while writing the templates folder of the Helm chart, we can plan this and manage the dependencies properly. It's possible in the Helm chart.
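The "one chart, many values files" idea can be illustrated with a small analogy. This is not Helm itself: Helm uses Go templates and YAML values files, whereas the sketch below uses Python's `string.Template` to show the same separation between a shared template and per-environment values. The image path, tags, and replica counts are made up.

```python
from string import Template

# One shared template, standing in for a file under the chart's templates folder.
DEPLOYMENT_TEMPLATE = Template("image: $repository:$tag\nreplicas: $replicas\n")

# One set of values per environment, standing in for separate values files.
VALUES = {
    "dev": {"repository": "example.azurecr.io/orders", "tag": "1.5.0-rc1", "replicas": 1},
    "qa": {"repository": "example.azurecr.io/orders", "tag": "1.5.0-rc1", "replicas": 2},
    "prod": {"repository": "example.azurecr.io/orders", "tag": "1.4.2", "replicas": 3},
}


def render(environment: str) -> str:
    """Fill the shared template with the values for one environment."""
    return DEPLOYMENT_TEMPLATE.substitute(VALUES[environment])


for env in VALUES:
    print(f"# {env}")
    print(render(env))
```

With Helm, the equivalent step is passing a different values file to the same chart at install or upgrade time.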
What is a Kubernetes operator, and how does it simplify cluster application management? A Kubernetes operator is a concept that simplifies application management in the cluster. To understand this, let's take the example of Prometheus. If we want to deploy Prometheus in the cluster by hand, it is quite difficult. We need to set up everything ourselves: we need to create a deployment, we need to create a service, and we have to create whatever else is required for running Prometheus. But there is a Prometheus operator. The Prometheus operator helps us deploy the complete Prometheus package in the cluster. For that, we just need to write one YAML manifest file with a kind of Prometheus, and that will deploy the complete Prometheus application inside the cluster. It makes it very simple. So let's take an example. Say we have to deploy four Prometheus instances in our cluster, in different namespaces. Before the Prometheus operator, we would need to install and manage Prometheus in the separate namespaces, one by one. With the Prometheus operator, we just need to deploy the operator in the cluster, and it gives us new CRDs. Once we have the new CRDs in place, we can just write the API version, give the kind as Prometheus, and in the spec give some details about the Prometheus instance we want. We can then create this manifest file in all four namespaces to deploy Prometheus easily and manage it in a proper way. So the Kubernetes operator helps a lot in managing the application. I just took Prometheus as an example, but if we want, we can also create our own Kubernetes operator. 
For example, if we have an application that we want to deploy in Kubernetes, we can create our own operator. We need to create the CRDs for it, and then we deploy the operator in the cluster, which makes those CRDs available. After that, whenever we want to deploy the application, we just need to give the kind and the application name in a manifest. Once we apply that manifest file, the operator will deploy whatever is required for that application.
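The four-namespace Prometheus example might be sketched like this: one small custom resource per namespace, which the Prometheus operator then expands into the underlying workloads. The namespace names, the instance name, and the spec values are assumptions for illustration; a real instance would usually need more spec fields.

```python
import json


def prometheus_instance(namespace: str) -> dict:
    """Custom resource handled by the Prometheus operator's CRD."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "Prometheus",
        "metadata": {"name": "prometheus", "namespace": namespace},
        # Minimal, assumed spec: one replica and a service account to run as.
        "spec": {"replicas": 1, "serviceAccountName": "prometheus"},
    }


# Hypothetical namespaces, one Prometheus instance in each.
NAMESPACES = ["team-a", "team-b", "team-c", "team-d"]
instances = [prometheus_instance(ns) for ns in NAMESPACES]
print(json.dumps(instances, indent=2))
```

The point of the operator is that these few lines per namespace replace the hand-written deployment, service, and configuration objects for each instance.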
Explain the concept of a pod life cycle and the states a pod can be in. So, if I talk about the pod: first of all, the pod is the smallest unit in a Kubernetes cluster. The second thing is what happens when we deploy it. The first time we create the pod, it will be in the pending state. So, the first state of the pod is pending. While it is pending, the scheduler is checking which node is available and so on. So, first, pending. Second, if everything is done, the pod will be scheduled, and then it will be in the running state. So, if everything goes well, the pod will be in the running state. If the task is complete, for example, if we are running one command in the pod and that command finishes, then the pod will be in the completed state. But if the pod runs continuously, then it will stay running. If the application exits because an exception occurs or something else happens, then the pod will be in the failed state. There can be different states like CrashLoopBackOff and ImagePullBackOff. But, majorly, we can say the pod will be in a failed state. Under the failed state, we have CrashLoopBackOff. If something happens to the application in the container running inside the pod, then the pod will be in CrashLoopBackOff. Then we need to check the logs to understand and fix the issue. The second thing that can happen is ImagePullBackOff. 
For example, there is some issue while pulling the image from the container registry. It might be an authentication issue. It might be that the container registry is not working. It might be that we are not able to connect to the container registry. Whatever the issue, we will get ImagePullBackOff, and then we can check the events. Kubernetes gives events for that, and we can check them using the kubectl describe command and get all the details. There is one more state, evicted. Let's take an example. The pod is running on one of the nodes, and that node goes down, or the CPU or memory on that node is unavailable. Something happens to that node. At that time, the pod will be marked as evicted. That means the CPU and memory that were required for that pod were not available due to some issue on the node. So, there are 5 states in total. We started from pending, then it could be running, completed, failed, and evicted. And under failed, we have seen 2 or 3 more, CrashLoopBackOff and ImagePullBackOff, and there can be more.
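As a rough sketch of how the states above show up in a pod's status object: the API itself reports a phase (Pending, Running, Succeeded, Failed, or Unknown), while CrashLoopBackOff and ImagePullBackOff appear as container waiting reasons and Evicted appears as the pod-level reason; kubectl typically displays a Succeeded pod as Completed. The helper below is hypothetical and works on the parsed JSON that `kubectl get pod -o json` returns; the sample pods are made up.

```python
def summarize_pod(pod: dict) -> str:
    """Return the most specific state found in a parsed pod object."""
    status = pod.get("status", {})
    # Evicted pods carry the reason at pod level.
    if status.get("reason") == "Evicted":
        return "Evicted"
    # Back-off states are reported per container as a waiting reason.
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting")
        if waiting and waiting.get("reason"):
            return waiting["reason"]
    # Otherwise fall back to the pod phase.
    return status.get("phase", "Unknown")


# Made-up status snippets for illustration.
crashing = {
    "status": {
        "phase": "Running",
        "containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}],
    }
}
evicted = {"status": {"phase": "Failed", "reason": "Evicted"}}
finished = {"status": {"phase": "Succeeded"}}

for sample in (crashing, evicted, finished):
    print(summarize_pod(sample))
```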
To create a persistent volume claim in Kubernetes, the first thing we need to consider is where we want to create the persistent volume. For example, if we are using Azure Kubernetes Service, we have different options, such as creating storage in an Azure storage account, creating a blob container, or creating an Azure disk. According to that choice, we need to see if the storage driver is available in our cluster. For example, if we plan to use an Azure disk and attach it to a pod as a persistent volume, then we need to check that a storage driver for Azure Disk is available to use and that it is authenticated to Azure. The storage driver should be authenticated to Azure and should be able to provision a resource in Azure, such as a new disk. Otherwise, it won't work. If that is already done, we'll just write the manifest files for the persistent volume claim and the persistent volume, and then we attach the volume to the deployments or pods where it is required.
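A persistent volume claim for the Azure Disk case could be sketched as below. The claim name, the size, and the storage class name `managed-csi` are assumptions; the class name in particular should be checked against what `kubectl get storageclass` lists in the target cluster.

```python
import json


def build_pvc(name: str, size_gi: int, storage_class: str = "managed-csi") -> dict:
    """PersistentVolumeClaim requesting a dynamically provisioned disk."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # A disk can be mounted read-write by a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            # Assumed class name; it selects the driver that creates the disk.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


# Hypothetical claim for illustration.
claim = build_pvc("orders-data", 20)
print(json.dumps(claim, indent=2))
```

With dynamic provisioning through a storage class, the driver creates the persistent volume itself once the claim is applied, so a separate persistent volume manifest is only needed for pre-existing disks.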
How would you handle disaster recovery and backup strategies for stateful applications running on Kubernetes in Azure? Okay, so for disaster recovery for stateful applications, the first thing we'll do is create a stateful set. For example, say we have a stateful application such as SonarQube. So let's suppose we have one stateful application running in a stateful set. A stateful set gives us a few good things that help the stateful application run properly. A database is also a stateful application. Whenever we have a database, we want to create a cluster of database instances, and one instance of the database needs to know what the other instance is doing. For that, the names of the pods need to stay the same so that one pod can connect to another very easily. The connection itself is done by the application. But if we create a deployment for that, the pod IDs would change, and the pod names would change. That's a problem. So for that, we'll use a stateful set. Then we attach a volume. In the stateful set, we attach a volume to each of the pods that are running. Let's suppose there are two pods running our SonarQube application: SonarQube pod number 1 and SonarQube pod number 2. Each pod has one volume attached. So pod 1 is attached to one volume, and pod 2 is attached to another volume. Now, what happens if pod 1 goes down? You want the backup and disaster recovery setup specifically, so let's discuss that. 
So let's suppose pod 1 goes down. What will happen? When pod 1 goes down, the stateful set will create a new pod with the same name, connected to the same volume. So the new pod will be up and running with the same name and the same volume. Once that is done, the application in the pod simply picks up the volume. Whatever is there in the volume, it will read it and continue working from there. It will not start the work again. It will read the volume from where the previous pod left off and start working from the same place. So that is how we will do it. Another thing we can do for disaster recovery is take backups of the volumes. Let's assume that these volumes are in a storage account, as blob containers that are connected as persistent volumes to those pods. Then we can enable GRS replication on that storage account, so the data will be replicated to a secondary region. That is something we can use from Azure to make the data, or the volume, highly available even in the case of a disaster.
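The "same name, same volume" behaviour comes from how a stateful set names things: pods are called `<statefulset>-<ordinal>` and each claim created from a volume claim template is called `<template>-<statefulset>-<ordinal>`. A short sketch of that convention, with a hypothetical stateful set named `sonarqube` and a template named `data`:

```python
def pod_name(statefulset: str, ordinal: int) -> str:
    """Stable pod name: the stateful set name plus a zero-based ordinal."""
    return f"{statefulset}-{ordinal}"


def claim_name(template: str, statefulset: str, ordinal: int) -> str:
    """Claim created from a volume claim template for one specific pod."""
    return f"{template}-{pod_name(statefulset, ordinal)}"


# Hypothetical two-replica stateful set.
for ordinal in range(2):
    print(pod_name("sonarqube", ordinal), "->", claim_name("data", "sonarqube", ordinal))
```

Because a replacement pod gets the same ordinal, it resolves to the same claim name and therefore reattaches to the data its predecessor wrote. Note that ordinals start at 0, so the "pod 1" and "pod 2" of the answer correspond to ordinals 0 and 1.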
Implementing a service mesh in a Kubernetes environment offers several advantages. For instance, in a microservice-based environment, we need to know which application's traffic is going to which microservice and which application is communicating with which application. We need to know the complete network flow and traffic flow of the application. This is where a service mesh comes into play. A service mesh like Linkerd provides visibility into the communication between microservices. When we install Linkerd on a cluster, a sidecar is created for each pod running in the cluster. This sidecar acts as an intermediary between the main application container and the external world. Whenever traffic is sent from one pod to another, it goes through the sidecar first. The mesh then builds a complete map of the traffic on its UI, allowing us to see which pod is sending traffic to which application and how communication is happening between them. This provides several benefits, including:
- Visibility into communication between microservices
- Ability to create ingress and services to expose them
- A complete map of the mesh on the Linkerd dashboard
When choosing a service mesh, we need to consider factors such as:
- The complexity of our application and the level of visibility required
- The performance impact of introducing an additional layer of communication
- The ease of use and management of the service mesh
- The level of support and integration with our existing Kubernetes environment
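As a toy model of the traffic map mentioned above: if every sidecar reports which service called which, the mesh can aggregate those reports into edges with call counts. This is only an illustration of the idea, not Linkerd's API or data format; the record fields and service names are invented.

```python
from collections import Counter


def traffic_map(requests: list) -> Counter:
    """Count calls per (source, destination) pair reported by the sidecars."""
    return Counter((r["source"], r["destination"]) for r in requests)


# Invented request records, as a sidecar might report them.
observed = [
    {"source": "frontend", "destination": "orders"},
    {"source": "frontend", "destination": "orders"},
    {"source": "orders", "destination": "payments"},
    {"source": "frontend", "destination": "catalog"},
]

edges = traffic_map(observed)
for (source, destination), calls in edges.items():
    print(f"{source} -> {destination}: {calls} calls")
```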
How do you approach performance testing for deployments in Kubernetes, and how does it influence capacity planning? For the performance testing of deployments, we first deployed our application. Then we set up the testing tool on a separate virtual machine. From that machine, we sent simulated traffic to the application running in Kubernetes using JMeter. We monitored the application's performance in the Grafana dashboard. Once we ran the JMeter test, which we had written to simulate a specific amount of traffic, we checked the Grafana dashboard to see how the application was performing. We monitored the CPU and memory usage to determine if the cluster was able to handle the load. According to the results of the performance test, we planned the capacity of the cluster. For example, if we were sending the traffic of 10,000 users from JMeter, we would check the Grafana dashboard to see if the CPU and memory usage was within normal limits. If it was, then the cluster was able to handle the load. Otherwise, we would know that we needed a larger cluster to handle the increased traffic. If we wanted to cater to more users, we would increase the number of nodes in the cluster. We could also set up autoscaling to automatically add more nodes as needed. Alternatively, we could increase the resources allocated to each node. So we could use either horizontal scaling or vertical scaling. Horizontal scaling involves increasing the number of nodes, while vertical scaling involves increasing the resources allocated to each node. For example, if our application was handling a large amount of traffic, we would use horizontal scaling to add more nodes. However, if our application was a machine learning model that required a lot of CPU and memory, we would use vertical scaling to increase the resources allocated to each node. In some cases, we might need to use both horizontal and vertical scaling to ensure that the application was able to handle the increased traffic. 
By running these performance tests, we are able to determine the capacity of our cluster and plan for future growth. We can then increase the number of nodes, or the resources allocated to each node, to ensure that the application is able to handle the increased traffic.
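The capacity-planning step can be reduced to a small calculation: take the CPU the cluster used during the JMeter run, scale it to the target user count, and divide by the usable capacity of one node. This sketch assumes CPU use grows linearly with users, which real systems often do not, and all figures (12 cores for 10,000 users, 8-core nodes, 70% usable headroom) are invented for illustration.

```python
import math


def nodes_needed(
    target_users: int,
    tested_users: int,
    cpu_cores_used: float,
    cores_per_node: int,
    headroom: float = 0.7,  # assumed: plan to use at most 70% of each node
) -> int:
    """Estimate node count by scaling measured CPU use linearly with users."""
    cores_per_user = cpu_cores_used / tested_users
    required_cores = cores_per_user * target_users
    usable_cores_per_node = cores_per_node * headroom
    return math.ceil(required_cores / usable_cores_per_node)


# Invented measurement: 10,000 simulated users consumed 12 cores in total.
print(nodes_needed(10_000, 10_000, 12, 8))  # nodes for the tested load
print(nodes_needed(25_000, 10_000, 12, 8))  # nodes for 2.5x the users
```

The same calculation can be repeated for memory, and the larger of the two results taken as the node count.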