Vetted Talent

Rishabh Khandelwal

With 3+ years of experience, skilled in the tools and technologies used for agile development today. Has worked as an SRE across DevOps, HPC, cloud computing, data protection, and CI/CD workflows, and is keen to keep learning.
  • Role

    Software Engineer

  • Years of Experience

    6 years

Skillsets

  • AWS
  • Grafana
  • Hyper-V
  • Jenkins
  • Livy
  • MLFlow
  • OpenShift
  • Oracle VirtualBox
  • Prometheus
  • Rancher
  • Terraform
  • Zabbix
  • Kubeflow
  • GitLab CI/CD
  • Azure
  • Elasticsearch
  • ELK Stack
  • GCP
  • Kasten K10
  • Kibana
  • Linux
  • Logstash
  • RHEL
  • Spark
  • Ubuntu
  • Veeam
  • Docker - 5 Years
  • Kubernetes - 5 Years
  • Kubernetes - 4 Years
  • Python - 5 Years
  • GitHub - 3 Years
  • GitLab
  • VMware vSphere
  • Windows Server
  • ArgoCD
  • AWS CloudFormation
  • Bash
  • C
  • C++
  • Git
  • GitHub Actions

Vetted For

8 Skills
  • DevOps Engineer (Remote), AI Screening
  • 44%
  • Skills assessed: AWS Certified DevOps Engineer, Certified Kubernetes Administrator, financial applications, RabbitMQ, Terraform, AWS, GCP, Kubernetes
  • Score: 40/90

Professional Summary

6 Years
  • Jul, 2025 - Present 11 months

    Senior System Engineer

    Zensar Technologies
  • Jun, 2021 - Jul, 2025 4 yr 1 month

    Software Engineer

    SAN Data Systems
  • Sep, 2020 - Apr, 2021 7 months

    Jr. DevOps Engineer

    HOPS HEALTHCARE
  • May, 2020 - Jul, 2020 2 months

    DevOps Assembly Lines

    LinuxWorld Informatics Pvt Ltd
  • Jul, 2020 - Sep, 2020 2 months

    Flutter Training

    LinuxWorld Informatics Pvt Ltd
  • Jul, 2020 - Sep, 2020 2 months

    Ansible

    LinuxWorld Informatics Pvt Ltd
  • May, 2019 - Jun, 2019 1 month

    Electrical Engineering Intern

    National Engineering Industries Ltd. (NBC Bearings)

Applications & Tools Known

  • Git
  • Python
  • Docker
  • Kubernetes
  • AWS (Amazon Web Services)
  • Google Cloud Platform
  • Azure
  • Azure Active Directory
  • Terraform
  • Jenkins
  • Helm
  • Spinnaker
  • Zabbix
  • Ansible
  • Veeam
  • GitHub
  • Rancher
  • OpenStack
  • Ubuntu
  • CentOS
  • Windows
  • Tomcat
  • Nginx
  • ArgoCD
  • Hyper-V
  • VMware ESXi
  • vSAN
  • ELK Stack
  • Prometheus
  • Grafana

Work History

6 Years

Senior System Engineer

Zensar Technologies
Jul, 2025 - Present 11 months

Software Engineer

SAN Data Systems
Jun, 2021 - Jul, 2025 4 yr 1 month
  • Automated deployment and lifecycle management (LCM) operations for Azure Local (formerly Microsoft Azure Stack HCI), covering Arc-enabled VMs, AKS clusters, SQL-MI, and AVD, using Python with REST APIs, which reduced manual setup time and improved consistency.
  • Ran build-to-build tests in HPE EZAI Essentials for continuous improvement.
  • Worked with developers on troubleshooting and on system and error log analysis to test new features in the MLOps tools, reducing QA effort and speeding up builds.
  • Designed a POC for a hybrid HPC cloud bursting solution, enabling seamless migration of high-performance computing workloads from on-premises data centers to AWS and GCP with private connectivity and storage redundancy.
  • Architected and deployed scalable data storage solutions using WEKA and Scality clusters, facilitating efficient data hydration between on-prem infrastructure and AWS cloud storage.
  • Designed a POC solution for HPE ProLiant DL380 Gen10 servers in Google Anthos.
  • Provided technical support for on-prem VMware vSphere infrastructure in both production and development environments, optimizing performance and resource allocation.
  • Analyzed and tested functionality in Hitachi Virtual Storage as a Service (vSTaaS) and generated test reports from the outcomes.
  • Provided Quality Assurance (QA) and functionality testing support for the development of Hitachi Kubernetes Service (HKS), identifying and resolving bugs to improve product stability.
  • Designed and tested a comprehensive Data Protection as a Service (DPaaS) offering using Veeam and Kasten K10 for Kubernetes, ensuring robust backup and disaster recovery for critical workloads.

Jr. DevOps Engineer

HOPS HEALTHCARE
Sep, 2020 - Apr, 2021 7 months
  • Streamlined software delivery by managing daily product deployments to QA and production servers using CI/CD principles, which improved release frequency by 40%.
  • Led the migration of a monolithic Docker application to a microservices architecture on Kubernetes across on-premises and AWS infrastructure, enhancing scalability and system resilience.
  • Drove and established a centralized server monitoring system providing real-time metrics and alerting for over 100 servers to ensure 99.9% uptime for the production application.
  • Deployed and configured an intrusion detection and prevention system (IDS/IPS) to monitor network traffic and generate security reports, improving the security posture of live production servers.

Ansible

LinuxWorld Informatics Pvt Ltd
Jul, 2020 - Sep, 2020 2 months

Flutter Training

LinuxWorld Informatics Pvt Ltd
Jul, 2020 - Sep, 2020 2 months

DevOps Assembly Lines

LinuxWorld Informatics Pvt Ltd
May, 2020 - Jul, 2020 2 months

Electrical Engineering Intern

National Engineering Industries Ltd. (NBC Bearings)
May, 2019 - Jun, 2019 1 month

Achievements

  • Google Cloud Skills Boost: https://www.cloudskillsboost.google/public_profiles/d6ceb27c-6740-4d47-a965-046efe7b0804

Major Projects

5 Projects

Automated Workload Provisioning and Lifecycle Management Operations

    Created Python scripts that use REST APIs to automate workload provisioning and lifecycle management operations for Azure Local workloads: virtual machines, AKS clusters, SQL-MI, and AVD session hosts.

Hybrid Cloud Bursting for High-Performance Computing (HPC)

    Architected a managed services solution for bursting HPC virtual machine workloads from on-premises environments to AWS/GCP during peak demand, achieving zero downtime for critical business operations.

Data Protection as a Service (DPaaS)

    Designed the architecture and a POC for backup and disaster recovery of enterprise data across physical, virtual, and multi-cloud infrastructure. Tested a comprehensive offering using Veeam and Kasten K10 for Kubernetes, ensuring robust backup and disaster recovery for critical workloads.

Centralized Monitoring and Alerting System

    Implemented a central metrics monitoring and alerting system using Zabbix for physical and virtual Linux and Windows servers to continuously monitor application performance and resource utilization.

MLOps Pipeline for Automated Model Training

    Created an end-to-end MLOps pipeline using Jenkins and Git to automate the training, validation, and deployment of a CNN machine learning model, ensuring consistent and reproducible results.

Education

  • Bachelor of Engineering, Electronics and Electrical Engineering

    M.B.M. Engineering College (2020)

Certifications

  • AWS

    Amazon Web Services (Dec, 2021)
  • Linux

    CKAD - CNCF (Jan, 2022)
  • Google

    CKA - CNCF (Sep, 2021)
  • AWS Cloud Practitioner

  • Microsoft AZ-900

  • KCNA

  • Expertise in Docker

  • OpenShift Applications DO101

  • Aviatrix Multi-Cloud Associate

  • CKA

Interests

  • Badminton
  • Games
  • Watching Movies

AI-interview Questions & Answers

    For stateful services, we use a persistent storage system and create a storage class in Kubernetes, which persists the data and maintains data consistency. The basic requirement here is to define a storage class and set up the persistent volumes, and the persistent volume claims behind them, for every service that needs persistent data.
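
    As a rough illustration of the storage class and persistent volume claim mentioned above, the following Python sketch builds the two manifests as plain dictionaries and prints them as JSON, which kubectl also accepts. The object names, the size, and the provisioner string are hypothetical placeholders, not values from the interview; a real cluster would use its own CSI provisioner.

```python
import json


def storage_class(name: str, provisioner: str) -> dict:
    """Build a StorageClass manifest for dynamic volume provisioning."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,  # cluster-specific; placeholder below
        "reclaimPolicy": "Retain",  # keep the volume after the claim is deleted
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(name: str, storage_class_name: str, size: str) -> dict:
    """Build a PersistentVolumeClaim that requests storage from the class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names and provisioner, for illustration only.
    sc = storage_class("stateful-data", "example.com/hypothetical-provisioner")
    pvc = volume_claim("orders-db-data", sc["metadata"]["name"], "10Gi")
    for manifest in (sc, pvc):
        print(json.dumps(manifest, indent=2))
```

    Each stateful service would get its own claim against the shared class, so the data outlives any single pod.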

    So while migrating from ECS to GKE, first, we need to ensure that the stateful application is always up. The consideration here is that if we have instances running in ECS, they cannot all be stopped at the same time during the migration, because that would cause sudden downtime for the application. So we decrease the number of instances, or compute nodes, in ECS gradually while increasing the number of nodes in GKE, so that when the application goes down in ECS, the same pods and the same application are already starting up on the GKE side, and there is no downtime during the migration.
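
    The gradual scale-down and scale-up described above can be sketched as a simple step plan. This is a minimal sketch under stated assumptions: the function name and the batch size are hypothetical, and the plan only counts instances; it does not call any ECS or GKE API. Capacity is added on the target before it is removed from the source, so the combined count never drops below the original total.

```python
from __future__ import annotations


def migration_steps(total: int, batch: int) -> list[tuple[int, int]]:
    """Return (source_count, target_count) after each scaling action.

    The target is scaled up by one batch first, then the source is
    scaled down by the same amount, so total capacity never dips.
    """
    if total < 0 or batch <= 0:
        raise ValueError("total must be >= 0 and batch must be > 0")
    steps: list[tuple[int, int]] = []
    source, target = total, 0
    while source > 0:
        move = min(batch, source)
        target += move  # bring up replacement capacity first
        steps.append((source, target))
        source -= move  # then drain the old instances
        steps.append((source, target))
    return steps


if __name__ == "__main__":
    for source, target in migration_steps(total=5, batch=2):
        print(f"source={source} target={target}")
```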

    Container resource limits. Here, we define the memory and CPU limits for the container. Using sidecar containers, we can continuously monitor resource usage in an application, which shows the actual usage of the current container. Based on this data, we can set resource limits for that container to get optimum performance for the cost, which can lead to cost savings as well.
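
    One way to turn monitored usage into limits, as described above, is to take the typical usage as the request and the observed peak plus some headroom as the limit. The sketch below assumes exactly that policy; the function name, the 20% headroom, and the sample values are hypothetical, and a real setup would read the samples from a monitoring system.

```python
import math
import statistics


def recommend_limits(samples: list[int], headroom_pct: int = 20) -> dict:
    """Suggest a request and a limit from observed usage samples.

    samples: usage readings in one unit (for example MiB or millicores).
    request: the median usage, rounded up.
    limit:   the peak usage plus headroom_pct percent, rounded up.
    """
    if not samples:
        raise ValueError("at least one usage sample is required")
    peak = max(samples)
    # Integer ceiling division avoids floating-point rounding surprises.
    limit = -(-(peak * (100 + headroom_pct)) // 100)
    request = math.ceil(statistics.median(samples))
    return {"request": request, "limit": limit}


if __name__ == "__main__":
    memory_mib = [210, 250, 240, 300]  # hypothetical readings
    print(recommend_limits(memory_mib))
```

    Setting the limit close to real peak usage is what produces the cost savings: nodes are no longer sized for limits the container never reaches.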

    For message processing in the RabbitMQ queue, we ensure that the data comes in on one side and that, when it is consumed, it is properly accessible to the customer and available to multiple users in parallel. This allows for different parallelism setups, which can reduce latency.

    When setting up autoscaling in GCP, we can collect metrics to monitor CPU and RAM utilization, and load balancing can be implemented according to that utilization. CPU, RAM, and disk storage are the three basic sources of metrics, and the scaling can be adjusted based on them.
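
    A utilization-based scaling decision like the one described above can be sketched with a proportional rule: scale the current replica count by the ratio of observed to target utilization, and take the largest answer across metrics. This has the same shape as the rule documented for the Kubernetes Horizontal Pod Autoscaler; whether a given GCP autoscaler behaves identically is an assumption here, and the function and parameter names are hypothetical.

```python
def desired_replicas(
    current: int,
    observed_pct: dict,
    target_pct: dict,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Compute a replica count from utilization percentages per metric.

    For each metric: ceil(current * observed / target). The highest
    result wins, then it is clamped to [min_replicas, max_replicas].
    """
    if current <= 0:
        raise ValueError("current replica count must be positive")
    candidates = []
    for metric, target in target_pct.items():
        if target <= 0:
            raise ValueError(f"target for {metric} must be positive")
        observed = observed_pct[metric]
        # Integer ceiling division keeps the result exact.
        candidates.append(-(-(current * observed) // target))
    wanted = max(candidates) if candidates else current
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical readings: CPU is over target, memory is under.
    print(desired_replicas(4, {"cpu": 90, "memory": 50}, {"cpu": 60, "memory": 70}))
```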

    The second task in that is debugging and variable engine installation is not, which tends to be a potential failure, and can lead to the failure of the complete playbook code.

    So in this container, I can see that the memory is limited to 512 megabytes, and the CPU is limited to 200 or 2 CPUs. Once the utilization of this pod exceeds these limits, it can lead to failure of the pod: if the container uses more resources than these limits allow, the pod will tend to fail.
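
    In Kubernetes the two limits behave differently when exceeded: a container that goes over its memory limit is terminated (OOM-killed), while a container that demands more CPU than its limit is throttled rather than killed. The sketch below encodes only that distinction. The function name is hypothetical, and reading the CPU limit as 200 millicores is an assumption, since the answer above leaves the value ambiguous.

```python
def limit_outcome(resource: str, demand: int, limit: int) -> str:
    """Describe what happens when demand is compared with a container limit.

    demand and limit use MiB for "memory" and millicores for "cpu".
    """
    if resource not in ("memory", "cpu"):
        raise ValueError(f"unsupported resource: {resource}")
    if demand <= limit:
        return "ok"
    # Memory cannot be reclaimed from a running process, so the container
    # is killed; CPU time is simply rationed, so the container slows down.
    return "oom-killed" if resource == "memory" else "throttled"


if __name__ == "__main__":
    print(limit_outcome("memory", demand=600, limit=512))
    print(limit_outcome("cpu", demand=350, limit=200))  # assumed 200m limit
```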

    For interservice communication in Istio, the first thing to consider is that all the services need to be of the ClusterIP service type. There is no need to use NodePort or the LoadBalancer service type while using Istio. The second thing would be to set up the NGINX ingress so that it points to only one endpoint.