Vetted Talent

Fateh Khan

Fateh Khan demonstrates proficiency in a wide array of technologies, including Terraform, Git, Helm charts, and Kubernetes, in the role of DevOps and SRE Engineer Manager at V4You Technologies. Showcases exceptional performance as an Infrastructure SRE, delivering reliable 24x7 infrastructure and application operations, meeting business expectations, and serving as a management escalation point during major issues. Expertise in automating infrastructure with Terraform, implementing CI/CD pipelines with Git, GitHub Actions, and Jenkins, and maintaining Helm charts for application deployment. Implemented Kubernetes Tanzu to optimize container orchestration, ensuring the security and availability of microservices. Experienced in Infrastructure as Code (IaC) using Ansible, with hands-on experience in the AWS, GCP, and Azure cloud platforms and a strong background in server hardening, networking, and troubleshooting. Skilled in disaster recovery planning, system administration, automation, and performance tuning in Unix environments. Designed and implemented disaster recovery plans, ensuring business continuity and data integrity in high-pressure environments. Led a diverse team of global application reliability, infrastructure, and operations engineers; delivered effective talent management practices; fostered a continuous learning culture.

  • Role

    Senior DevOps Engineer

  • Years of Experience

    7 years

Skillsets

  • DevOps - 6 Years
  • Kubernetes - 6 Years
  • Docker - 6 Years
  • AWS - 4 Years
  • CI/CD - 5 Years
  • GKE - 3 Years
  • Ansible
  • Terraform
  • Google Cloud
  • Monitoring
  • Git
  • Jenkins
  • Linux

Vetted For

14 Skills
  • Senior Kubernetes Support Engineer (Remote), AI Screening
  • 50%
  • Skills assessed: CI/CD Pipelines, Excellent problem-solving skills, Kubernetes architecture, Strong communication skills, Ansible, Azure Kubernetes Service, Grafana, Prometheus, Tanzu, Tanzu Kubernetes Grid, Terraform, Azure, Docker, Kubernetes
  • Score: 45/90

Professional Summary

7 Years
  • Mar, 2024 - Present (2 yr 2 months)

    DevOps and SRE Engineer Manager

    V4YOU Technologies
  • Aug, 2021 - Feb, 2022 (6 months)

    Sr. DevOps Engineer & Release Engineer

    Intelly Labs Private Limited
  • Jan, 2020 - Jul, 2021 (1 yr 6 months)

    Server Administrator & DevOps Engineer

    IDS Logic Pvt. Ltd.
  • Aug, 2017 - Sep, 2018 (1 yr 1 month)

    IT Executive

    Ryddx Pharmetry (P) Ltd
  • Sep, 2018 - Jan, 2020 (1 yr 4 months)

    System Administrator

    Mindz Technology

Applications & Tools Known

  • Terraform
  • Git
  • Helm Charts
  • Kubernetes
  • Ansible
  • AWS
  • GCP
  • Azure
  • GitHub Actions
  • Docker
  • Docker-Compose
  • Helm
  • Prometheus
  • Grafana
  • Loki
  • Zabbix
  • CloudWatch
  • Vercel
  • ArgoCD
  • Nginx
  • HAProxy
  • IIS
  • SQL
  • NoSQL
  • SonarQube
  • ElasticSearch
  • Varnish
  • VPN
  • Proxmox
  • VMware
  • Hyper-V
  • Vagrant
  • VirtualBox

Work History

7 Years

DevOps and SRE Engineer Manager

V4YOU Technologies
Mar, 2024 - Present (2 yr 2 months)
    • Designing, implementing & maintaining systems and infrastructure to ensure high reliability, availability, and performance.
    • Developing & deploying automation tools and frameworks to automate repetitive tasks and streamline operations.
    • Setting up and managing monitoring systems to track the health and performance of services; configuring alerts and escalations to quickly respond to incidents and minimize downtime.
    • Leading incident response and post-mortem analysis to identify root causes of outages and implement preventive measures.
    • Conducting capacity planning and performance tuning to ensure systems can handle current and future loads.
    • Scaling infrastructure as needed to accommodate growth.
    • Implementing infrastructure as code (IaC) practices using tools like Terraform, Ansible, or Chef to provision and manage infrastructure in a consistent and repeatable manner.
    • Collaborating with development teams to automate deployment processes and implement continuous integration and continuous deployment (CI/CD) pipelines.
    • Delivering day-to-day backup and recovery support activities including server availability & administrative processes.
    • Implementing security best practices to protect systems and data; performing security audits and vulnerability assessments.
    • Working closely with cross-functional teams including developers, system administrators, and product managers to understand requirements, prioritize tasks, and deliver solutions.
    • Building, managing, and improving the build infrastructure for global software development engineering teams including implementation of build scripts, continuous integration infrastructure and deployment tools.
    • Managing Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins.
    • Leading and mentoring a team of 9 members; guiding them to perform better.

Sr. DevOps Engineer & Release Engineer

Intelly Labs Private Limited
Aug, 2021 - Feb, 2022 (6 months)
    • Administered monitoring and alerting systems for smooth maintenance of engineering environments.
    • Ensured quick restoration of services during outage through strong troubleshooting skills.
    • Responded to alerts and performed root cause analysis; validated changes within specified maintenance windows.

Server Administrator & DevOps Engineer

IDS Logic Pvt. Ltd.
Jan, 2020 - Jul, 2021 (1 yr 6 months)
    • Created freestyle and pipeline CI projects in Jenkins to deploy Node.js and PHP applications.
    • Configured LAMP and LEMP servers on Ubuntu and CentOS using Ansible.
    • Wrote Bash scripts to back up production and staging servers.
    • Configured Ubuntu systems using Bash scripts; hosted web applications on testing and production servers.
    • Used Docker to provide ready-to-build environments for the dev team.
    • Administered Windows 10 and Windows Server 2012, IIS, DHCP, DNS, and Active Directory; resolved errors in the Windows environment.
    • Deployed ASP.NET code to staging and production servers.
    • Delivered server support to both foreign and domestic web hosting clients; configured Linux testing servers.
    • Worked with Apache, Nginx, MySQL, and UFW Firewall, and with server panels such as Plesk and cPanel.
    • Managed the company's internal SonicWALL firewall; configured VPN, mapped ports, and applied network policies.
    • Resolved network-related issues such as duplicate IP addresses, IP address exhaustion, and DNS problems.

System Administrator

Mindz Technology
Sep, 2018 - Jan, 2020 (1 yr 4 months)
    • Worked with Git, Docker, Jenkins, and GitLab and maintained version control; deployed a Jenkins freestyle project.
    • Configured Windows 7/8/10 and Windows Server 2012 for SQL, DHCP, and AD.
    • Applied policies to domain users; managed Active Directory users; reset passwords and assigned shared drives to user accounts.
    • Installed software and configured user environment variables on Windows and Linux; set up web servers such as LAMP and XAMPP.
    • Hosted web applications on testing and production servers; backed up and restored SQL Server databases and created databases.
    • Supported a Fortigate 60e firewall; assigned IP addresses to users through IP policies; created firewall policies.

IT Executive

Ryddx Pharmetry (P) Ltd
Aug, 2017 - Sep, 2018 (1 yr 1 month)
    • Kept all systems up to date; installed and upgraded software; resolved computer and user issues.
    • Assembled and disassembled desktops and laptops; configured mail in Thunderbird, Outlook, and Apple Mail; set up ESSL Biometric.

Achievements

  • Successfully delivered projects for Oyo Japan, Dunzo and Cars24.
  • Awarded Best Employee for 3 months.
  • Implemented cost optimization strategies that led to annual savings exceeding $20,000 for two clients.
  • Leading three of the company's major projects at once, following best practices.
  • Implemented a comprehensive monitoring solution using Grafana, Prometheus, and Datadog for V4You Technologies clients.
  • Optimized CI/CD pipelines, reducing deployment time and increasing overall efficiency.
  • Introduced automated server provisioning, implemented robust DR plans and conducted security audits.

Major Projects

5 Projects

OYO Japan Tabist

Mar, 2024 - Present (2 yr 2 months)
    Utilized Terraform for managing and provisioning AWS infrastructure resources. Created and maintained infrastructure components such as Lambda functions, DNS configurations, EKS clusters, RDS databases, AWS Secrets Manager, Parameter Store, ECR repositories, EC2 instances, and OIDC configurations. Implemented CI/CD pipelines using GitHub Actions to automate build, test, and deployment processes, and applied GitOps practices using ArgoCD with Kubernetes and Git repositories.

Digiboxx

Jan, 2023 - Jan, 2024 (1 yr)
    • Managed data center operations to ensure continuous system availability.
    • Migrated the entire application from Bare Metal VM to VMware Tanzu.
    • Implemented Kubernetes, Helm, and Argo CD from scratch to streamline application management.
    • Set up Grafana, Prometheus, and Loki for comprehensive monitoring of the entire application.
    • Converted freestyle Jenkins jobs into Pipelines, enabling seamless deployment without reliance on the DevOps or SRE team.
    • Set up a new MinIO cluster on bare metal infrastructure to expand current storage capabilities.
    • Implemented diverse backup solutions; deployed backup systems using software options, Proxmox Backup, and Acronis.
    • Collaborated with the developers and the SecOps team and kept the company's other products up to date with recommended practices.

Cars24

Jan, 2022 - Jan, 2023 (1 yr)
    • Collaborated with Google Cloud and migrated services from AWS to GCP.
    • Created pipelines for GKE and ECS and deployed EKS using Jenkins and TeamCity.
    • Optimized costs for GCP and AWS.
    • Set up services: RDS, Cloud Run, Cloud Functions, Cloud Build, Vault, load balancers with templates, Cloud Scheduler, Pub/Sub, and Eventarc.
    • Upgraded current cluster to latest Kubernetes version.
    • Created dashboard and monitored on Datadog.

Go Empyrean

Jan, 2022 - Dec, 2022 (11 months)
    • Managed the company's on-prem infrastructure using a Proxmox cluster.
    • Added nodes to the cluster, created VMs, and managed applications running on the provisioned VMs.
    • Configured a MySQL master-slave setup for the application; deployed PHP and Java applications using Jenkins.
    • Used Nginx and HAProxy as load balancer and web server.
    • Patched and upgraded Ubuntu servers from version 14.04 to 22.

MSK (Memorial Sloan Kettering Cancer Center)

Jan, 2021 - Jan, 2022 (1 yr)
    • Used Kubernetes Tanzu for container orchestration and management.
    • Developed and deployed microservices using Helm charts and ArgoCD on VMware Tanzu infrastructure.
    • Scaled up and managed Kubernetes clusters in a production environment.
    • Maintained high availability, scalability, and security of microservices deployed on Kubernetes.
    • Analyzed application requirements and designed efficient deployment strategies.
    • Automated infrastructure provisioning and configuration using Infrastructure as Code (IaC) principles.
    • Monitored and optimized resource utilization within Kubernetes clusters.
    • Troubleshot issues related to containerized applications and infrastructure.
    • Managed OpenVPN server.

Education

  • Master of Computer Applications

    Vivekanand Global University (2024)
  • Bachelor of Computer Applications (BCA)

    JECRC University (2023)

AI-interview Questions & Answers

Hi, my name is Fateh Khan, and I've been working as a DevOps engineer and SRE manager for 7 years. Along with that, I've worked with multiple organizations where I held roles as a senior server administrator and a marketing engineer as well. I have experience in Kubernetes, including close work with GKE and EKS. I have expertise in monitoring and deploying applications in both containerized and non-containerized ways. I'm very proficient in scaling up infrastructure using Terraform and other tools like Calliope in Python. Additionally, I have experience in GitOps, where I've worked on administering Git and managing DevOps practices, along with monitoring. I also have experience in databases, where I've provided administration for both SQL and NoSQL. Thank you.

So, Helm is a package manager that helps manage your Kubernetes applications. You can populate the configuration using values and install the chart in any environment you want. We just need to change the input variables using the helm command. The biggest benefit is that we don't have to work with manifests again and again, and Helm can roll a release back if there's an issue. Other than that, we can also use Helm in GitOps practices with Argo CD, where every component of the Helm release is managed by Argo CD itself.
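The point about changing only the input values per environment can be illustrated with a small sketch. It is not Helm's implementation, only a simplified model of how chart defaults and an override file combine: nested maps merge key by key and the override wins. The chart values, image name, and override contents are hypothetical.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Return base with override applied; nested maps merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical chart defaults, standing in for a chart's values.yaml.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "example/app", "tag": "1.0.0"},
    "service": {"type": "ClusterIP", "port": 80},
}

# Hypothetical per-environment overrides, standing in for a file passed with -f.
production_overrides = {
    "replicaCount": 3,
    "image": {"tag": "1.1.0"},
}

production_values = merge_values(chart_defaults, production_overrides)
print(production_values)
```

With Helm itself, the equivalent step is usually a command such as `helm upgrade --install <release> <chart> -f <override file>`, and `helm rollback <release> <revision>` returns to an earlier revision.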

So if you want to attach a storage class to a stateless application and you are using GKE or EKS, you get the option to use services like Persistent Disk on GCP and EFS and EBS on AWS, where you can attach the volumes using a volume claim and mount them as a disk in the pod itself. When a pod dies and a new pod is spun up, the data remains on the same volume and is attached to the new pod, which is then available to the deployment. Other than that, it is also possible to attach the persistent storage at runtime in the pod itself. First, we have to claim the storage, and then we have to attach it as a PVC.
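The claim-then-attach flow described above can be sketched as two manifests, built here as Python dictionaries and printed as JSON (which `kubectl apply -f` accepts alongside YAML). The claim name, storage class `standard`, size, image, and mount path are all hypothetical; the storage classes actually available depend on the cluster.

```python
import json


def pvc_manifest(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim requesting `size` from `storage_class`."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_manifest(name: str, image: str, claim_name: str, mount_path: str) -> dict:
    """Build a Pod that mounts the claim at `mount_path`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


# Hypothetical names and sizes, for illustration only.
claim = pvc_manifest("app-data", "standard", "10Gi")
pod = pod_manifest("app", "example/app:1.0.0", "app-data", "/var/lib/app")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [claim, pod]}, indent=2))
```

Because the pod refers to the claim by name rather than to a specific disk, a replacement pod that references the same claim sees the same data.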

So, we can use horizontal pod autoscaling to scale up the environment if the defined threshold gets crossed. We can use metrics there, and we have to define a manifest for the deployment, which works at the level of selectors. Let's say the deployment has the selector label application 1; we then define the HPA with the API version, the kind, which is HorizontalPodAutoscaler, the name, the selector, and then the metrics. We can define the metrics by CPU level and by memory level. Other than that, we can also define the capacity, that is, how far the pods should scale up in number of replicas. So any time the load crosses the defined threshold, it scales up the deployment itself.
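A minimal sketch of such an autoscaler follows, assuming the `autoscaling/v2` API, where the target workload is identified by `scaleTargetRef` (kind and name). The deployment name `application-1`, the replica bounds, and the 60% CPU target are hypothetical. The second function applies the replica formula given in the Kubernetes documentation, desired = ceil(current replicas × current metric / target metric), clamped to the configured bounds; it leaves out the controller's tolerance and stabilization behaviour.

```python
import json
import math


def hpa_manifest(deployment: str, min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """Build an autoscaling/v2 HorizontalPodAutoscaler targeting a Deployment by name."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                    },
                }
            ],
        },
    }


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA formula: scale in proportion to load, then clamp to the bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical deployment name and thresholds.
manifest = hpa_manifest("application-1", min_replicas=2, max_replicas=10, cpu_percent=60)
print(json.dumps(manifest, indent=2))
print(desired_replicas(4, 90, 60, 2, 10))
```

A memory-based rule would be a second entry in `metrics` with `"name": "memory"`.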

So we can use blue-green deployment to deploy the application. Every time, the new version is deployed as a separate group alongside the current one. First, we deploy the application, and once it is deployed and we've tested that everything is running fine, we update the DNS to switch the traffic over.
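The order of operations in a blue-green release (deploy to the idle side, verify, then switch traffic) can be sketched as follows. The `Router` class is a hypothetical stand-in for whatever directs live traffic, such as a DNS record or a Service selector, and the `deploy` and `healthy` callables are placeholders for real deployment and health-check steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical stand-in for the thing that points traffic at one colour."""
    live: str = "blue"

    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"


def blue_green_release(router: Router,
                       deploy: Callable[[str], None],
                       healthy: Callable[[str], bool]) -> bool:
    """Deploy to the idle colour and switch traffic only if it passes the check."""
    target = router.idle()
    deploy(target)
    if not healthy(target):
        # Traffic never moved, so the live colour keeps serving users.
        return False
    router.live = target
    return True


router = Router()
deployed = []
switched = blue_green_release(router, deployed.append, lambda colour: True)
print(switched, router.live, deployed)
```

The previous colour stays deployed after the switch, so rolling back is a matter of pointing the router at it again.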

So what are the strategies you would employ to ensure zero downtime during the transition? So zero downtime is nothing but a practice where we deploy the application and transfer the data and the traffic to the newer version. For zero downtime, we can create the same deployment set and the same application deployment on the AKS side and point the DNS entries there. Once the DNS is pointed, the application will be running from AKS itself. We will keep the Tanzu application running until we have verified that everything is fine.

So when we set up a Kubernetes pipeline, we have to be sure about which application we are deploying, whether it will use Helm or plain manifests, and whether GitOps operations are involved. If GitOps operations are involved, we need to decide which tool we will use, whether it will be Spinnaker, Argo CD, GitHub Actions itself, or Jenkins. Other than that, we also have to look at the deployment replica set and the storage set. If the application is being deployed and an API has been decommissioned on the Kubernetes upgrade side, we need to add conditions so that if the cluster version is this, we install this version of the API, and if the cluster version is that, we install that version of the API. So, while deploying the application, we also have to make sure that the current stable version is running absolutely fine after doing smoke tests. We also have to ensure that the Helm charts are running properly. Then after that, we can proceed with the deployment.
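The version condition described above can be sketched with Ingress as the example resource: `networking.k8s.io/v1` has been available since Kubernetes 1.19, and the older `extensions/v1beta1` and `networking.k8s.io/v1beta1` Ingress versions were removed in 1.22. The function names are hypothetical, and the sketch assumes a 1.x cluster version string.

```python
import re


def parse_minor(version: str) -> int:
    """Extract the minor number from strings such as 'v1.24.3' or '1.27.3-gke.100'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(2))


def ingress_api_version(cluster_version: str) -> str:
    """Pick the Ingress apiVersion to render for a given 1.x cluster version."""
    minor = parse_minor(cluster_version)
    if minor >= 19:
        return "networking.k8s.io/v1"
    if minor >= 14:
        return "networking.k8s.io/v1beta1"
    return "extensions/v1beta1"


for version in ("v1.13.5", "1.18.0", "v1.24.3"):
    print(version, ingress_api_version(version))
```

In a Helm chart, the same decision is commonly made inside the template with the built-in `.Capabilities` object rather than in pipeline code.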

I'm unsure about this consideration.

Kubernetes relies heavily on a stable environment. In earlier versions, around 1.21, it ran on Docker, and after that the Docker mechanism was replaced in the cluster itself. Now clusters run containerd as the default runtime. Deployments are managed through the API server, which receives all the inputs and outputs. The scheduler is responsible for placing the application on a node, and the API server is responsible for managing and replacing the current deployments. The etcd data store holds the name and key of every deployment that has taken place within the cluster itself.

So, implementing a service mesh gives you more control on the service side, where you can control the entire traffic flow and the network, basically where you want requests to be sent. It also gives you the entire network diagram, with Kiali as a dashboard if we are using Istio. Other than that, it basically works with service discovery. As long as service discovery is working, the system keeps running, and Istio sends data to the pod only after the service has initialized successfully. So the biggest advantage of using service mesh technology is that it allows you to fully control the network. You can specify that if a request comes from a particular source, it is blocked or allowed for a particular service. These are the best practices and features that Istio can provide for a service mesh.
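Allowing or blocking requests by source can be sketched as a policy manifest, assuming the mesh referred to here is Istio. An `AuthorizationPolicy` with action `ALLOW` admits only the listed sources to the selected workload. The workload label, namespace, and service-account principal below are hypothetical.

```python
import json


def allow_only_from(app_label: str, namespace: str, allowed_principal: str) -> dict:
    """Build an Istio AuthorizationPolicy admitting one source identity to one workload."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{app_label}-allow", "namespace": namespace},
        "spec": {
            # Applies to pods carrying this label in the policy's namespace.
            "selector": {"matchLabels": {"app": app_label}},
            "action": "ALLOW",
            "rules": [
                {"from": [{"source": {"principals": [allowed_principal]}}]}
            ],
        },
    }


# Hypothetical workload and caller identity.
policy = allow_only_from("payments", "default", "cluster.local/ns/default/sa/frontend")
print(json.dumps(policy, indent=2))
```

Matching on `principals` relies on workload identities, so it assumes mutual TLS is enabled between the services involved.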