Vetted Talent

Rishabh Khandelwal

With 4+ years of experience, skilled in the tools and technologies used today for agile development. Has worked as an SRE across DevOps, HPC, cloud computing, data protection, and CI/CD workflows, and is still eager to learn more.
  • Role

    Software Engineer

  • Years of Experience

    4.8 years

Skillsets

  • Jenkins
  • Bash
  • C
  • C++
  • ELK Stack
  • GCP
  • Git
  • GitHub Actions
  • GitLab CI/CD
  • Grafana
  • Hyper-V
  • Azure
  • Linux
  • Livy
  • MLflow
  • OpenShift
  • Oracle VirtualBox
  • Prometheus
  • Rancher
  • Terraform
  • Zabbix
  • Spark OSS
  • Kubernetes
  • Docker - 5 Years
  • Python - 5 Years
  • GitHub - 3 Years
  • GitLab
  • VMware vSphere
  • Windows Server
  • ArgoCD
  • AWS
  • AWS CloudFormation

Vetted For

8 Skills
  • DevOps Engineer (Remote) - AI Screening
  • Result: 44%
  • Skills assessed: AWS Certified DevOps Engineer, Certified Kubernetes Administrator, financial applications, RabbitMQ, Terraform, AWS, GCP, Kubernetes
  • Score: 40/90

Professional Summary

4.8 Years
  • Jun, 2021 - Present · 4 yr 3 months

    Software Engineer

    SanData System / RedCloud Computing Pvt. Ltd.
  • Sep, 2020 - May, 2021 · 8 months

    Jr. DevOps Engineer / System Administrator

    SLK Techlabs Pvt. Ltd.

Applications & Tools Known

  • Git
  • Python
  • Docker
  • Kubernetes
  • AWS (Amazon Web Services)
  • Google Cloud Platform
  • Azure
  • Azure Active Directory
  • Terraform
  • Jenkins
  • Helm
  • Spinnaker
  • Zabbix
  • Ansible
  • Veeam
  • GitHub
  • Rancher
  • OpenStack
  • Ubuntu
  • CentOS
  • Windows
  • Tomcat
  • Nginx
  • ArgoCD
  • Hyper-V
  • VMware ESXi
  • vSAN
  • ELK Stack
  • Prometheus
  • Grafana

Work History

4.8 Years

Software Engineer

SanData System / RedCloud Computing Pvt. Ltd.
Jun, 2021 - Present · 4 yr 3 months
    Automated deployment and lifecycle management operations of Azure Local using Python and REST APIs. Designed and implemented solutions for hybrid HPC Cloud Bursting and scalable data storage using AWS and GCP. Provided technical support for VMware vSphere and performed QA testing for Kubernetes services and data protection solutions.
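
Deployment automation like this usually triggers a management operation over REST and then polls until the operation reaches a terminal state. A minimal sketch of such a polling helper, assuming a hypothetical API whose operation-status endpoint returns states like "Running", "Succeeded", or "Failed":

```python
import time
from typing import Callable

def wait_for_operation(get_status: Callable[[], str],
                       timeout_s: float = 300.0,
                       poll_interval_s: float = 1.0) -> str:
    """Poll a lifecycle operation until it reaches a terminal state.

    get_status: callable that fetches the operation state, e.g. a
    GET on /operations/<id> of a (hypothetical) management REST API.
    Raises TimeoutError if no terminal state is seen in time.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status()
        if state in ("Succeeded", "Failed"):
            return state
        time.sleep(poll_interval_s)
    raise TimeoutError("operation did not finish in time")
```

Injecting the status fetch as a callable keeps the retry logic testable without any network access.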

Jr. DevOps Engineer / System Administrator

SLK Techlabs Pvt. Ltd.
Sep, 2020 - May, 2021 · 8 months
    Managed product deployments using CI/CD principles, migrated monolithic Docker applications to microservices architecture on Kubernetes, established centralized monitoring systems, and configured IPS/IDS systems to improve server security.

Achievements

  • Google Cloud Skills Boost: https://www.cloudskillsboost.google/public_profiles/d6ceb27c-6740-4d47-a965-046efe7b0804

Major Projects

7 Projects

Automated Workload Provisioning and Lifecycle Management Operations

    Created Python scripts to automate workload provisioning and lifecycle management operations using REST APIs for Azure Local workloads.

Hybrid Cloud Bursting for High Performance Computing (HPC)

    Designed a solution for migrating HPC workloads from on-premises to AWS/GCP to achieve zero downtime.

Kubernetes Application Operations and Services (KAOPS)

    Developed a platform for deploying and managing Kubernetes applications with monitoring and AIOps on OpenShift.

Google Anthos Managed Services

    Designed a SaaS platform using Google Anthos for managing multiple Kubernetes clusters with security constraints.

Data Protection as a Service (DPaaS)

    Architected a solution for enterprise data backup and disaster recovery using Veeam and Kasten K10 for Kubernetes.

Centralized Monitoring and Alerting System

    Implemented centralized monitoring using Zabbix for physical and virtual servers to ensure performance and uptime.

MLOps Pipeline for Automated Model Training

    Built an end-to-end MLOps pipeline using Jenkins and Git to automate CNN model training and deployment.

Education

  • Bachelor of Engineering

    M.B.M. Engineering College, Jodhpur (2020)

Certifications

  • AWS

    Amazon Web Services (Dec, 2021)
  • CKAD

    CNCF (Jan, 2022)
  • CKA

    CNCF (Sep, 2021)
  • AWS Cloud Practitioner
  • Microsoft AZ-900
  • KCNA
  • Expertise in Docker
  • OpenShift Applications (DO101)
  • Aviatrix Multi-Cloud Associate

Interests

  • Badminton
  • Games
  • Watching Movies

AI-interview Questions & Answers

    For stateful services, we use persistent storage: define a StorageClass in Kubernetes, which will persist all the stateful data and maintain data consistency. The basic requirement is defining the StorageClass, setting up PersistentVolumes, and creating PersistentVolumeClaims for every service that needs persistent data.
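
The claim setup described above can be sketched as a manifest builder; the claim name, StorageClass name, and size below are illustrative assumptions:

```python
def pvc_manifest(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict.

    The claim binds a PersistentVolume provisioned by the named
    StorageClass, giving the pod durable storage across restarts.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }
```

Serializing the dict to YAML or JSON and applying it with kubectl would create the claim.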

    While migrating from ECS to GKE, the first thing to ensure is that the stateful application stays up throughout. The consideration is that the machines running in ECS cannot all be stopped at once, because that would cause sudden downtime for the application. Instead, we can gradually decrease the number of compute nodes on the ECS side while increasing the number of nodes on the GKE side, so that as capacity drains from ECS, the same pods and the same application are already starting up on the GKE side, and there is no downtime during the migration.
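
The gradual capacity shift described above can be sketched as a step plan; the node counts and step granularity are illustrative assumptions:

```python
def migration_plan(total_nodes: int, steps: int) -> list:
    """Plan a gradual capacity shift from ECS to GKE.

    Returns a list of (ecs_nodes, gke_nodes) per step. Combined
    capacity stays at total_nodes throughout, so the application
    never loses serving capacity during the migration.
    """
    plan = []
    per_step = total_nodes // steps
    moved = 0
    for i in range(1, steps + 1):
        # last step absorbs any remainder so ECS ends at zero
        moved = total_nodes if i == steps else moved + per_step
        plan.append((total_nodes - moved, moved))
    return plan
```

In practice each step would also wait for the new GKE nodes to pass health checks before draining the next ECS batch.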

    Container resource limits: here we define limits on the memory and CPU a container may use. Using sidecar containers, we can continuously monitor the actual resource usage of the application, which tells us the current container's consumption. Based on that data, we can decide what resource limits to set for the container, getting optimal performance for the cost, which can lead to cost savings as well.
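
Turning observed usage into a limit recommendation could look like the sketch below; the percentile and headroom factor are illustrative assumptions, not a standard:

```python
import math

def recommend_limit(samples: list, percentile: float = 0.95,
                    headroom: float = 1.2) -> float:
    """Suggest a container resource limit from observed usage samples
    (e.g. collected by a monitoring sidecar): take a high percentile
    of usage and add headroom, so the limit is tight but not so tight
    that normal spikes kill the container.
    """
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, math.ceil(percentile * len(ordered)) - 1)
    return ordered[idx] * headroom
```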

    For message processing in a RabbitMQ queue, we can ensure that data comes in on one side only, and that it remains properly accessible to the consumer while being read. When multiple consumers access the queue, different degrees of parallelism can be configured so that latency is reduced.
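
The consumer parallelism mentioned above can be sketched with a stdlib-only worker pool; a real RabbitMQ consumer would use a client library such as pika, so treat this as an assumption-laden stand-in:

```python
import queue
import threading

def consume_parallel(messages, handle, workers: int = 4):
    """Fan queued messages out to a pool of worker threads, the way
    multiple consumers on one RabbitMQ queue cut per-message latency.
    handle is called once per message, concurrently across workers.
    """
    q = queue.Queue()
    for m in messages:
        q.put(m)

    def worker():
        while True:
            try:
                m = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            handle(m)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```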

    For setting up autoscaling in GCP, we can collect metrics by monitoring CPU and RAM utilization, and implement load-balancing methods according to that utilization. CPU, RAM, and disk storage are the three basic metrics to obtain, and scaling can be adjusted based on them.
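
A threshold-based scaling decision on those metrics might look like this sketch; the 80%/30% thresholds are illustrative assumptions, not GCP defaults:

```python
def scale_decision(cpu_pct: float, ram_pct: float, current: int,
                   min_n: int = 1, max_n: int = 10) -> int:
    """Return the new instance count: scale out when CPU or RAM is
    hot, scale in when both are cool, otherwise hold steady."""
    if (cpu_pct > 80 or ram_pct > 80) and current < max_n:
        return current + 1
    if cpu_pct < 30 and ram_pct < 30 and current > min_n:
        return current - 1
    return current
```

Managed autoscalers evaluate essentially this loop continuously against the collected metrics.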

    Here, the second task in the playbook, the engine installation, is a potential point of failure, and its failure would cause the complete playbook to fail.

    In this container, the memory is limited to 512 megabytes and the CPU is limited to 2 CPUs. Once the pod's utilization goes above these limits, it can lead to failure of the pod: if the container requires resources beyond these limits, the pod will fail.
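
Checking usage against such limits means parsing Kubernetes quantity strings first; a minimal sketch covering the common suffixes (not the full Kubernetes quantity grammar):

```python
def parse_quantity(q: str) -> float:
    """Parse a Kubernetes resource quantity ('512Mi', '200m', '2')
    into a base unit: bytes for memory, CPUs for cpu."""
    suffixes = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
                "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9}
    for suf, mult in suffixes.items():
        if q.endswith(suf):
            return float(q[:-len(suf)]) * mult
    return float(q)

def exceeds_limit(usage: str, limit: str) -> bool:
    """True when observed usage is above the configured limit, which
    for memory makes the container an OOM-kill candidate."""
    return parse_quantity(usage) > parse_quantity(limit)
```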

    For inter-service communication in Istio, the first thing to consider is that all the services should be of the ClusterIP service type; there is no need to use NodePort or any other load-balancer service type when using Istio. The second thing is to set up NGINX as the ingress with a single ingress endpoint, so that traffic enters through only one endpoint.
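
A ClusterIP Service of the kind described can be sketched as a manifest builder; the service name, selector label, and port are illustrative assumptions:

```python
def clusterip_service(name: str, app: str, port: int) -> dict:
    """Build a ClusterIP Service manifest as a plain dict.

    ClusterIP keeps the service reachable only inside the cluster,
    which suits a mesh where external traffic enters through a
    single ingress endpoint.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": app},
            "ports": [{"port": port, "targetPort": port}],
        },
    }
```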