Designed and implemented secure VPC architectures with public and private subnets, an Internet Gateway, and a NAT Gateway, configuring Route Tables and Route 53 DNS and applying network security best practices to keep cloud environments highly available and well segmented.
Automated production EC2 backups with AWS Backup.
Used AWS Systems Manager with EC2/EKS autoscaling to start and stop stateful services and cluster nodes in dev, and built AWS Lambda and EventBridge workflows with SNS alerts so developers could scale on demand without console access, reducing weekend and off-hour AWS costs.
Managed and optimized AWS IAM Policies, Roles, and Permission Boundaries, enforcing least-privilege access to harden cloud environments and keep them compliant.
Performed advanced Linux system administration, including process optimization, performance tuning, and in-depth troubleshooting across distributed systems, to maintain high availability, reliability, and efficient resource utilization.
Performed zero-downtime Kubernetes upgrades and provisioned EKS clusters with Terraform, integrating ArgoCD for GitOps-based continuous delivery.
Deployed Prometheus, AWS Load Balancer Controller, and Cluster Autoscaler with Helm, and configured internal and internet-facing ingress load balancers for stateful and stateless applications.
Implemented secure internal-to-external routing using Nginx as a reverse proxy, and deployed core infrastructure services (RabbitMQ, Prometheus, Redis, Kafka, ClickHouse, Ops Manager, Hazelcast, and MongoDB) via Ansible.
Built scalable CI/CD pipelines with GitHub Actions and Jenkins that automate build, test, deployment, Docker image creation, ECR uploads, and multi-environment version syncing, improving release speed by 50%.
Migrated Java and Node.js applications from AMD to ARM architecture, cutting AWS cost by 10%.
Improved pipeline visibility by integrating Slack notifications at each pipeline stage.
Collaborated with QA, Backend, and Frontend teams to ensure smooth, reliable releases across all environments.
Deployed and managed Prometheus-Grafana stacks with Prometheus federation for real-time Kubernetes and stateful service metrics using node_exporter, migrated dashboards, and implemented alert routing that sends warning alerts to Slack and critical alerts to PagerDuty.
Configured CloudWatch alarms to meet SOC 2 compliance requirements and deployed a scalable ELK stack for centralized log management, supporting platform-wide observability.
Developed automated MongoDB backup workflows using Bash and cron that export and compress full database snapshots and securely push them to AWS S3 on a daily schedule.
Created Bash scripts for monitoring agent setup with CloudWatch and node_exporter integration.
Automated data migrations between MongoDB and ClickHouse using Kafka and Python, improving data transfer performance by 90%.
Reduced AWS cost by 20% through proactive resource right-sizing, Kubernetes optimization, and fine-tuning of CPU and memory requests, limits, and autoscaling configurations.