SRE Manager
Cigniti Technologies, Mar 2023 - Present (2 yr 3 months)
Project Title: SRE for Insulet Apps
Monitoring and Ensuring 24x7 System Health: Monitoring Insulet OMNIPOD applications and their dependent services in a 24x7 model using Datadog, PagerDuty, and AWS CloudWatch Logs.
Responsibilities:
Led an SRE team responsible for the availability, latency, and performance of Insulet applications and their dependent services.
Implemented comprehensive monitoring using Datadog, PagerDuty, and AWS CloudWatch.
Used Agile Scrum methodology to plan and execute project iterations.
Gathered and analyzed metrics from application services for performance tuning and fault detection.
Defined, measured, and maintained SLOs and SLIs for application performance and reliability.
Designed and implemented anomaly detection and synthetic monitoring for microservices in Datadog.
Collaborated closely with the R&D team to resolve production issues quickly and with minimal disruption.
Responded to incidents, led RCA discussions, and drove improvements to prevent similar incidents from recurring.
Developed automation tools and scripts to eliminate manual, repetitive tasks and improve system resilience.
Used Python scripting to automate the creation of services, event rules, and monitors in Datadog and PagerDuty, streamlining operational processes.
Identified and addressed performance bottlenecks in software systems, databases, and network infra