Vetted Talent

Rottela Sudheer Kumar

Completed post-graduation with a Master of Computer Applications (MCA). I enjoy listening to motivational songs, watching mind-blowing movies, and reading self-help and spiritual books. I mentor new aspirants and help with their career growth. Received 2nd rank during intermediate and a gold medal during graduation. I also write poems and received 2nd prize in an Infosys poem competition.

Received the Customer Delight Award, the Game Changer Award, and many appreciations during my career journey.


Worked as a Site Reliability Engineer at Infosys BPM. The following are my key roles and responsibilities:


- Implementing practices such as SLAs for accessibility, availability, and response time to make systems/services reliable.

- Creating automated processes for operational tasks by calculating and evaluating whether the systems/services are within SLA.

- Monitoring and logging to measure the performance of systems/services and detect issues before they occur or at an early stage.

- Providing on-call support to find the improvements required in existing systems.

- Performing root cause analysis (RCA) when issues are detected and providing additional protection to systems.

- Documenting post-incident reviews after an issue or outage for future reference.

- Working towards the same principles and goals as DevOps: fast releases while allowing fast changes.

- Providing knowledge transfer (KT) sessions for new joiners and guiding them on domain-specific technologies.
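
A minimal sketch of the kind of automated SLA check described in the responsibilities above, assuming availability is measured as uptime minutes over a reporting window. The names `SlaReport` and `evaluate_availability` and the 99.9% default target are illustrative assumptions, not details of the actual systems.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    """Result of comparing measured availability with an SLA target."""
    availability_pct: float
    target_pct: float

    @property
    def within_sla(self) -> bool:
        # The service meets its SLA when measured availability reaches the target.
        return self.availability_pct >= self.target_pct


def evaluate_availability(total_minutes: int,
                          downtime_minutes: int,
                          target_pct: float = 99.9) -> SlaReport:
    """Compute availability for a reporting window and compare it with the target.

    The 99.9% default is a hypothetical target chosen for illustration.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    availability = 100.0 * (total_minutes - downtime_minutes) / total_minutes
    return SlaReport(availability_pct=availability, target_pct=target_pct)
```

For a 30-day window (43,200 minutes), calling `evaluate_availability(43200, 30).within_sla` reports whether 30 minutes of downtime still fits a 99.9% target. In practice the downtime figure would come from monitoring data rather than being passed in by hand.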

  • Role

    AWS DevOps Engineer (SRE)

  • Years of Experience

    5 years

Skillsets

  • Bash
  • ServiceNow
  • ELK Stack
  • Splunk
  • Unix
  • Puppet
  • Prometheus
  • Nagios
  • Grafana
  • Jira
  • Ansible - 1 Year
  • AWS
  • MySQL
  • Python - 2 Years
  • Kubernetes - 2 Years
  • Jenkins - 2 Years
  • Git - 2 Years
  • Docker - 1 Year
  • Terraform - 1 Year

Vetted For

26 Skills
  • Senior Software Engineer - Site Reliability (Remote), AI Screening
  • 28%
  • Skills assessed: Chef, gitlabci, OpenShift, PagerDuty, Pingdom, Puppet, Salt, smashtest, TravisCI, twelve factor development, Agile Methodology, Ansible, CircleCI, Infrastructure as Code (IaC), NewRelic, Terraform, C#, Cloud Server (Google / AWS), Docker, Git, JavaScript, Jenkins, Kubernetes, PHP, Python, Ruby
  • Score: 25/90

Professional Summary

5 Years
  • Dec, 2023 - Present (1 yr 9 months)

    AWS DevOps Engineer (SRE)

    JPMC (Payroll Company: Snapminds)
  • Mar, 2019 - Nov, 2023 (4 yr 8 months)

    Technical Specialist (SRE)

    Infosys BPM

Applications & Tools Known

  • Chef
  • Nagios
  • Prometheus
  • Docker
  • Kubernetes
  • Ansible
  • Terraform
  • Puppet
  • Git
  • GitHub
  • Jenkins
  • ServiceNow
  • Jira
  • MySQL
  • Slack

Work History

5 Years

AWS DevOps Engineer (SRE)

JPMC (Payroll Company: Snapminds)
Dec, 2023 - Present (1 yr 9 months)
    Monitored and troubleshot production issues, identified root causes, and implemented solutions to prevent recurrence, ensuring high system availability. Worked closely with cross-functional teams to identify and resolve performance bottlenecks, leveraging strong analytical and problem-solving skills. Developed and maintained comprehensive system documentation, including runbooks, standard operating procedures, and system diagrams. Participated in on-call rotations to provide 24/7 support for production systems, ensuring minimal downtime. Stayed updated with the latest advancements in Site Reliability Engineering, applying innovative approaches to maintain a competitive edge.

Technical Specialist (SRE)

Infosys BPM
Mar, 2019 - Nov, 2023 (4 yr 8 months)
    Leveraged cloud platforms such as AWS and Azure to deploy and manage scalable applications. Utilized automation tools like Chef to streamline operations and enhance system efficiency. Demonstrated strong understanding of Linux operating systems and command-line tools to manage and troubleshoot systems. Implemented and managed monitoring and logging solutions using tools like Nagios, Prometheus, and ELK stack to ensure system health and performance. Collaborated with development teams to optimize system performance and scalability, contributing to overall system reliability.

Achievements

  • Designed and implemented a comprehensive monitoring system using Prometheus and Grafana.
  • Automated alerting processes, reducing incident response time by 30%.
  • Integrated monitoring solutions with Slack for real-time notifications.
  • Conducted training sessions for team members on using the new monitoring tools.
  • Optimized Kubernetes deployments to improve scalability and reliability.
  • Reduced deployment times by 50% and increased system uptime by 20%.
  • Collaborated with development teams to ensure smooth rollouts and minimal disruptions.
  • Developed an automated backup solution using AWS Backup and Python scripts.
  • Ensured data integrity and compliance with company policies.
  • Conducted regular disaster recovery drills, reducing recovery times by 40%.
  • Documented all backup processes and trained junior engineers.

Major Projects

3 Projects

Enterprise Monitoring System Implementation

Dec, 2023 - Present (1 yr 9 months)
    Designed and implemented a comprehensive monitoring system using Prometheus and Grafana. Automated alerting processes, reducing incident response time by 30%. Integrated monitoring solutions with Slack for real-time notifications. Conducted training sessions for team members on using the new monitoring tools.

Kubernetes Deployment Optimization

Mar, 2019 - Nov, 2023 (4 yr 8 months)
    Optimized Kubernetes deployments to improve scalability and reliability. Reduced deployment times by 50% and increased system uptime by 20%. Collaborated with development teams to ensure smooth rollouts and minimal disruptions.

Automated Backup and Recovery Solution

Mar, 2019 - Nov, 2023 (4 yr 8 months)
    Developed an automated backup solution using AWS Backup and Python scripts. Ensured data integrity and compliance with company policies. Conducted regular disaster recovery drills, reducing recovery times by 40%. Documented all backup processes and trained junior engineers.

Education

  • Master of Computer Applications (MCA)

    Sri Krishnadevaraya University (2019)

Certifications

  • Customer Delight Award

    Infosys BPM
  • Customer Delight Award

    Infosys BPM
  • Insta Award

    Infosys BPM

AI-interview Questions & Answers

Can you help me understand more about your background by giving a brief introduction? Yeah. Uh, myself, Sudheer Kumar, uh, I have 5 years of experience as a site reliability engineer. I've worked, uh, on the media domain and the finance domain. Currently, I'm working on finance domain. We support, uh, in a 24/7 model. My core, uh, responsibilities are, like, uh, monitoring, uh, finding RCA, that means root cause analysis, and on-call support and, uh, um, deploying the, uh, sprints. Uh, and and the main technologies that we use were, uh, Jenkins, Ansible, Terraform, and Kubernetes, Docker, and, uh, Python scripting and uh, Linux, all these things. Thank you.

How would you refactor a monolithic application to microservices while considering twelve-factor development? No. I don't have idea on this one.

Given a high-traffic ecommerce application, how would you leverage auto scaling in AWS to maintain high availability? Uh, for an ecommerce application, we have to, uh, uh, here, as per the question, they're asking, how would you leverage auto scaling in AWS to maintain high availability? So auto scaling, uh, for EC2 instances, we'll have to maintain, uh, uh, 2 availability zones. We'll deploy in 1 availability zone, uh, and we will maintain the other availability zone as a as a backup or something like that. If 1 availability zone goes down, we'll have to, uh, up we'll have to make the other availability zone up so that we can provide high availability. This is what my understanding. Thank you.

What is the role of a liveness probe in Kubernetes, and how can it aid in self-healing deployments? No. I don't have a lot.

Demonstrate how you would improve this application's performance by applying the twelve-factor development principles. No. I don't have

When working with Google Cloud, how do you ensure that the infrastructure is scalable and follows the IaC practices? We have to ensure, uh, we have do we ensure that the infrastructure is scalable and follows the infrastructure as code practices? You for this, we can use Terraform. We can ensure by use we we can ensure, uh, infrastructure things, and we can scale using Terraform application. That that that's it. I can tell. Thank you.

Analyze this JavaScript function that is meant to return an array of strings split by commas. Why might it not work as expected for all inputs? Here, we see that, uh, there is one. A string is missing in the input. Like, apple is there, banana is there, orange is there. After that, uh, one string is missing, so, uh, it might not work as expected for all inputs.

Given this Python function intended for monitoring services, can you explain why it, uh, might fail to print "service is up" even when the service is actually running? Okay. Import requests. Except requests. Except. No. Not sure. Uh, how can I tell? Here, we're calling example.com. And in the try section, it is calling. But, uh, after that try, except is there. Fine. Not sure why it will queue some, uh, fail fail, uh, I I'm not sure.

Not sure.

I don't have idea.

Not sure.