Maintained a central FastAPI-based ML gateway that routed inference traffic across multiple NLP and multimodal services using RabbitMQ and Kubernetes, serving millions of requests per day across high-throughput, multi-language workloads.
Led end-to-end deployment of NLP models (classifiers, paraphrasers, grammar checkers, translators) across multiple GCP clusters and environments, enabling safe rollouts and rollbacks using ArgoCD and Datadog.
Built an automated GitLab CI/CD pipeline to promote fine-tuned Vertex AI models from the model registry to production, and integrated LiteLLM-based cost observability to track and optimize LLM inference spend, increasing developer efficiency by 70%.
Designed resilient LLM inference pipelines with Langfuse-driven prompt versioning and automated cross-vendor fallback, ensuring graceful failover during rate-limit and availability incidents and reducing user-facing failures during traffic spikes.
Led complex embedder-discriminator deployments to support personalization research, collaborating with data, backend, and research teams to safely productionize experimental models and accelerate research-to-production timelines.
Implemented batch processing for AI content detection workloads, improving throughput and reducing the number of Kubernetes pods required, which lowered infrastructure costs.