Designed, deployed, and managed multi-cloud infrastructure across Microsoft Azure, Amazon Web Services, Google Cloud Platform, and Alibaba Cloud, including virtual machines, networking, storage, security, and Kubernetes clusters.
Planned and executed Kubernetes cluster upgrades (AKS, GKE, EKS), both manually and via Terraform, along with in-place upgrades of dependent tools and add-ons, keeping downtime minimal and the platform stable.
Implemented and maintained Infrastructure as Code (IaC) using Terraform and Ansible to standardize provisioning, upgrade, and decommissioning workflows across environments.
Performed SSL/TLS certificate renewals, Service Principal (SPN) and credential rotations, and security hardening to maintain compliance and uninterrupted service access.
Led cloud resource lifecycle management, including decommissioning of VMs, storage, and services, cleanup of unused resources, and subscription hygiene, to improve cost efficiency and governance.
Automated operational tasks such as Jira ticket creation, Kubernetes pod log collection, and health checks using Python, reducing manual effort by ~60%.
Managed and optimized CI/CD pipelines using Azure DevOps, Jenkins, and GitHub Actions, supporting application teams with build, release, and deployment troubleshooting.
Designed and maintained monitoring and alerting frameworks using Azure Monitor, Datadog, Grafana, Dynatrace, Wormly, and Prisma Cloud, improving incident detection and reducing MTTR.
Integrated and managed secrets, storage, and data services, including Azure Key Vault, Vault, Portworx, Redis, MinIO, and AWS Secrets Manager, for secure and scalable workloads.
Supported identity and access management (IAM) across cloud platforms by enforcing RBAC, MFA, and secure access policies, and by assisting with cross-cloud authentication integrations.
Configured high-availability and scalability components such as load balancers, application gateways, availability sets, auto-scaling groups, and Kubernetes scaling policies.
Conducted disaster recovery planning, failover testing, and upgrade validation to ensure business continuity for critical workloads.
Participated in 24x7 on-call rotations, handled incidents and service requests via Jira, CMP, and ServiceNow, and led root cause analysis (RCA) while meeting strict SLAs/SLOs.
Delivered monthly operational and uptime reviews, documented runbooks and SOPs in Confluence, and ensured smooth handovers and efficient incident triage across shifts.