
Fateh Khan demonstrates proficiency in a wide array of technologies, including Terraform, Git, Helm charts, and Kubernetes, as DevOps and SRE Engineer Manager at V4You Technologies. Showcases exceptional performance as an Infrastructure SRE, delivering reliable 24x7 infrastructure and application operations, meeting business expectations, and serving as a management escalation point during major incidents. Expertise in automating infrastructure with Terraform, implementing CI/CD pipelines with Git, GitHub Actions, and Jenkins, and maintaining Helm charts for application deployment. Implemented Kubernetes on Tanzu to optimize container orchestration, ensuring the security and availability of microservices. Experienced in Infrastructure as Code (IaC) using Ansible, with hands-on experience on the AWS, GCP, and Azure cloud platforms and a strong background in server hardening, networking, and troubleshooting. Skilled in system administration, automation, and performance tuning in Unix environments. Designed and implemented disaster recovery plans, ensuring business continuity and data integrity in high-pressure environments. Led a diverse team of global application reliability, infrastructure, and operations engineers, delivering effective talent management practices and fostering a continuous learning culture.
DevOps and SRE Engineer Manager
V4YOU Technologies
Sr. DevOps Engineer & Release Engineer
Intelly Labs Private Limited
Server Administrator & DevOps Engineer
IDS Logic Pvt. Ltd.
IT Executive
Ryddx Pharmetry (P) Ltd
System Administrator
Mindz Technology
Terraform

Git

Helm Charts

Kubernetes

Ansible

AWS

GCP
Azure

GitHub Actions
Docker

Docker-Compose

Helm

Prometheus
Grafana

Loki

Zabbix

CloudWatch

Vercel

ArgoCD

Nginx

HAProxy

IIS

SQL

NoSQL

SonarQube

ElasticSearch

Varnish

VPN

Proxmox

VMware

Hyper-V

Vagrant

VirtualBox
Hi, my name is Fateh Khan, and I've been working as a DevOps engineer and SRE manager for 7 years. Along with that, I've worked with multiple organizations where I held roles as a senior server administrator and a marketing engineer as well. I have experience in Kubernetes, where I've worked closely with GKE and EKS. I have expertise in monitoring and deploying applications in both containerized and non-containerized ways. I'm very proficient in scaling up infrastructure using Terraform and other tools like Calliope in Python. Additionally, I have experience in GitOps, where I advised on administering Git and managing DevOps practices, along with monitoring. I also have experience in databases, where I've provided administration for both SQL and NoSQL. Thank you.
So, Helm is a package manager that helps manage your Kubernetes applications. You can populate the configuration using values and install the chart on any environment you want. We just need to change the input variables using the helm command. The biggest benefit is that we don't have to work with the manifests again and again, and Helm takes care of the release and can roll back if there's an issue. Other than that, we can also use Helm in GitOps practices with Argo CD, where every single component of the Helm release is managed by Argo CD itself.
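The per-environment install described here can be sketched in Python as a small command builder. This is a minimal sketch, not the speaker's actual tooling: the release name, chart path, and `values-<environment>.yaml` naming are hypothetical, and it assumes the Helm 3 CLI, where `--atomic` rolls a failed upgrade back.

```python
import shlex


def helm_upgrade_command(release, chart, environment, namespace=None, overrides=None):
    """Build a `helm upgrade --install` command for one environment.

    Assumption: each environment has its own values file named
    values-<environment>.yaml (a hypothetical convention).
    """
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace or environment,
        "--values", f"values-{environment}.yaml",
        "--atomic",  # Helm 3: roll back automatically if the upgrade fails
    ]
    # Individual input variables can still be overridden on the command line.
    for key, value in sorted((overrides or {}).items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    for env in ("staging", "production"):
        command = helm_upgrade_command(
            "web", "./charts/web", env, overrides={"image.tag": "1.4.2"}
        )
        print(shlex.join(command))
```

The same chart is reused for every environment; only the values file and the overrides change.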
So if you want to attach a storage class to a stateless application and you are using GKE or EKS, we do get the option to use services like persistent disks on GKE and the EFS and EBS services on EKS, where you can request the volume using a persistent volume claim and attach it as a disk to the pod itself. The moment any pod dies and a new pod gets spun up, the data remains on the volume and it is attached to the newer pod, which will be available for the deployment itself. Other than that, it is also possible to attach the persistent storage at runtime in the pod itself. First, we have to claim the storage, and then we have to attach it as a PVC.
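The claim-then-attach flow can be sketched as two manifests built as Python dictionaries. This is a minimal sketch under assumptions: the names, image, mount path, and the storage class `standard` are hypothetical, and the real storage class depends on the cluster (GKE and EKS ship different defaults).

```python
import json


def pvc_manifest(name, size, storage_class):
    """Step 1: claim the storage with a PersistentVolumeClaim."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_manifest(name, image, claim_name, mount_path):
    """Step 2: attach the claimed volume to the pod by referencing the PVC."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so this output can be piped
    # to `kubectl apply -f -`.
    print(json.dumps(pvc_manifest("app-data", "10Gi", "standard"), indent=2))
    print(json.dumps(pod_manifest("app", "nginx:1.27", "app-data", "/var/lib/app"), indent=2))
```

Because the pod only references the claim by name, a replacement pod with the same spec mounts the same data.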
So, we can use horizontal pod autoscaling to scale up the environment if the traffic crosses the defined threshold. We can use metrics over there, and we have to define a manifest that targets the deployment, which works at the level of selectors. Let's say the deployment has the selector label application 1; we are going to define the HPA with the API version, the kind, which will be HorizontalPodAutoscaler, the name, the selector, and then the metrics. We can define the metrics at the CPU level and at the memory level. Other than that, we can also define the capacity, meaning how many replicas we want the pods to scale up to. So anytime the load crosses the defined threshold, the HPA will scale up the deployment itself.
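As an illustration, here is a sketch of an `autoscaling/v2` HPA manifest built as a Python dictionary, together with the scaling rule from the Kubernetes documentation, `desired = ceil(current * currentMetric / targetMetric)`. The deployment name and thresholds are hypothetical. The real controller also applies a tolerance band and stabilization windows, which this sketch leaves out.

```python
import math


def hpa_manifest(deployment, min_replicas, max_replicas, cpu_target_percent):
    """HorizontalPodAutoscaler that targets a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
        },
    }


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Simplified HPA rule: scale proportionally, then clamp to the bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas at 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

A memory metric can be added as a second entry in `metrics` with `"name": "memory"`.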
So we can use blue-green deployment to deploy the application. A new deployment takes place in the green environment every time, and we have to update the DNS to make the switch happen. First, we deploy the new version alongside the current one. Once that deployment has succeeded and we've tested that everything is running fine, we update the DNS to point to it.
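The order of operations (deploy, test, then switch DNS) can be sketched as a guarded cutover. This is a minimal sketch: the in-memory dictionary stands in for a real DNS provider, and `is_healthy` is a hypothetical callback for whatever tests are run against the new environment.

```python
def cut_over(dns_records, hostname, new_target, is_healthy):
    """Point hostname at new_target only if the new environment passes its checks.

    dns_records is a plain dict standing in for a DNS provider (an assumption
    for illustration). Returns True if traffic was switched.
    """
    if not is_healthy(new_target):
        # Leave DNS untouched, so the current (blue) environment keeps serving.
        return False
    dns_records[hostname] = new_target
    return True


if __name__ == "__main__":
    records = {"app.example.com": "blue.example.com"}
    switched = cut_over(records, "app.example.com", "green.example.com",
                        is_healthy=lambda target: True)
    print(switched, records)
```

Keeping the old environment running after the switch means a rollback is just another DNS update back to the previous target.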
So what are the strategies you would employ to ensure zero downtime during the transition?
So zero downtime is nothing but a practice where we deploy the application, transfer the data, and shift the traffic to the newer version. For zero downtime, we can create the same deployment set and the same application deployment on the AKS side and point the DNS entries over there. Once the DNS is pointed, the application will be running from AKS itself. We will keep the Tanzu application running until we have verified that everything is fine.
So when we set up a Kubernetes pipeline, we have to be sure about which application we are deploying, whether it will use Helm or plain manifests, and whether GitOps operations are involved. If GitOps operations are involved, we need to decide which tool we will use, whether it will be Spinnaker, Argo CD, GitHub Actions itself, or Jenkins. Other than that, we also have to look at the deployment's replica set and its storage. If the application is being deployed and an API has been deprecated on the Kubernetes upgrade side, we need to add conditions so that if the cluster version is this, we install this version of the API, and if the cluster version is that, we install that version of the API. While deploying the application, we also have to make sure that the current stable version is running absolutely fine after doing smoke tests. We also have to ensure that the Helm charts are running properly. Then after that, we can proceed with the deployment.
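The version condition can be sketched with Ingress as the example resource, since `networking.k8s.io/v1` Ingress became available in Kubernetes 1.19 and the `v1beta1` Ingress APIs stopped being served in 1.22. The function names are hypothetical. In a Helm chart the same check is normally written in the template with `.Capabilities`.

```python
import re


def parse_major_minor(version):
    """Extract (major, minor) from strings such as 'v1.27.3-gke.100' or '1.19'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def ingress_api_version(cluster_version):
    """Pick the Ingress apiVersion that the given cluster version serves.

    Simplification: clusters older than 1.14 (which would need
    extensions/v1beta1) are not handled here.
    """
    if parse_major_minor(cluster_version) >= (1, 19):
        return "networking.k8s.io/v1"
    return "networking.k8s.io/v1beta1"


if __name__ == "__main__":
    for version in ("v1.18.20", "v1.27.3-gke.100"):
        print(version, "->", ingress_api_version(version))
```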
I'm unsure about this consideration.
Kubernetes relies heavily on a container runtime. In earlier versions it was running on Docker, and after that Kubernetes started removing the Docker mechanism from the cluster itself. Now, clusters run containerd as the default runtime. The deployments are managed through a proxy to the API server, basically, which sends all the inputs and outputs to the API server. The scheduler is responsible for placing the application on the nodes, and the API server is responsible for managing and replacing the current deployments. An etcd data store is there to hold the key and value of every deployment that has taken place within the cluster itself.
So, when we implement a service mesh, it gives you more control on the service side, where you can control the entire traffic flow and the network, basically, where you want to send each request. It also gives you the entire network diagram, with Kiali as a dashboard, if we are using Istio. Other than that, it basically works with service discovery. As long as service discovery is working, the system keeps running, and Istio will send traffic to the pod only after the service has successfully initialized. So the biggest advantage of using service mesh technology is that it allows you to fully control the network. You can specify that if a request comes from a particular source, it is blocked or allowed for a particular service. These are the best practices and the features that Istio can provide for a service mesh.
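The block-or-allow rule per source can be sketched as an Istio `AuthorizationPolicy` built as a Python dictionary. This is a minimal sketch, assuming Istio's `security.istio.io/v1beta1` API and an `app` label on the workload. The service and namespace names are hypothetical.

```python
import json


def deny_policy(service, namespace, blocked_namespaces):
    """AuthorizationPolicy that denies requests to one service from given namespaces."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{service}-deny", "namespace": namespace},
        "spec": {
            # Assumption: the workload's pods carry the label app=<service>.
            "selector": {"matchLabels": {"app": service}},
            "action": "DENY",
            "rules": [{
                "from": [{"source": {"namespaces": list(blocked_namespaces)}}],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(deny_policy("payments", "prod", ["untrusted"]), indent=2))
```

Switching `action` to `ALLOW` turns the same rule shape into an allow-list for the service.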