- Built LangChain-based pipelines for document retrieval and context-aware question answering, integrating HuggingFace Transformers for embedding and generation tasks (retrieval sketch below).
- Fine-tuned open-source LLMs (Mistral, LLaMA) using PEFT with LoRA and QLoRA, enabling efficient on-premise inference while significantly reducing GPU memory requirements (fine-tuning sketch below).
- Built end-to-end MLOps pipelines for multiple models, covering training, inference, and production deployment.
- Improved model accuracy through spatial and temporal analytics, identifying performance gaps and implementing post-processing fixes.
- Created an Agentic AI platform with Google ADK for Streamingo Atlas, a multi-agent FAQ system that automates documentation search and user support (agent sketch below).
- Developed internal Python packages to standardize model training, inference, and pipeline management, speeding up team development (interface sketch below).
- Restructured the inference workflow to run projects in parallel, cutting daily processing time by 5 hours (parallelism sketch below).
- Tuned CUDA settings for the YOLO and MTO models, improving inference speed and GPU efficiency (CUDA sketch below).
- Set up CI/CD pipelines with Bitbucket and Jenkins to automate builds, tests, and deployments.
- Led the migration from a service-based setup to a cloud-native architecture using Helm and Kubernetes, improving scalability.
- Consolidated all GPU machines into a single Kubernetes cluster, removing manual GPU management and improving resource utilization.
- Managed GCP infrastructure (GKE, GCS, networking), optimizing cloud operations and reducing annual costs by 10-12%.
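A minimal sketch of the retrieval and question-answering flow from the first bullet, assuming the langchain, langchain-community, langchain-huggingface, and faiss-cpu packages; the model names, document chunks, and query are illustrative placeholders, not the production configuration.

```python
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain.chains import RetrievalQA

# Placeholder chunks; a real pipeline would produce these with a loader and splitter.
docs = [
    "Passwords can be reset from the account settings page.",
    "Exports are available in CSV and JSON formats.",
]

# HuggingFace sentence-transformer embeddings feeding a FAISS index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(docs, embeddings)

# A local HuggingFace generation model wrapped as a LangChain LLM.
llm = HuggingFacePipeline.from_model_id(
    model_id="mistralai/Mistral-7B-Instruct-v0.2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)

# Retrieval-augmented QA: the top-k chunks are injected into the prompt as context.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
)
print(qa.invoke({"query": "How do I reset my password?"})["result"])
```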
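A minimal QLoRA-style sketch for the fine-tuning bullet, assuming the transformers, peft, and bitsandbytes packages; the base model, LoRA hyperparameters, and target modules are illustrative, not the exact values used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization: the frozen base weights stay quantized,
# which is what drives the large GPU memory savings.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these are trained.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total weights
```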
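A minimal sketch of an ADK agent in the spirit of the FAQ system, assuming the google-adk package; the search_docs tool, instruction text, and model name are hypothetical stand-ins for the actual Streamingo Atlas agents.

```python
from google.adk.agents import Agent

def search_docs(query: str) -> dict:
    """Hypothetical tool: look up the query in the documentation index."""
    # A real implementation would query a search backend or vector store.
    return {"status": "success", "results": ["Relevant documentation snippet."]}

# ADK wraps plain Python functions as tools the agent can call.
faq_agent = Agent(
    name="faq_agent",
    model="gemini-2.0-flash",
    instruction=(
        "Answer user questions about the product. "
        "Use the search_docs tool before answering."
    ),
    tools=[search_docs],
)
```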
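A sketch of the kind of shared contract such an internal package might expose so every project wires up training and inference the same way; the class and method names are hypothetical, not the actual internal API.

```python
from abc import ABC, abstractmethod

class ModelPipeline(ABC):
    """Common interface each project's pipeline implements (hypothetical)."""

    @abstractmethod
    def train(self, dataset_path: str) -> None:
        """Run the project-specific training loop."""

    @abstractmethod
    def predict(self, batch: list) -> list:
        """Run inference on a batch and return predictions."""

class DetectionPipeline(ModelPipeline):
    # One concrete project; orchestration code only sees ModelPipeline.
    def train(self, dataset_path: str) -> None:
        ...

    def predict(self, batch: list) -> list:
        return []
```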
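A minimal sketch of the parallel restructuring, assuming per-project inference jobs that previously ran back to back; run_project and the project names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_project(name: str) -> str:
    """Hypothetical: launch one project's inference batch and block until done."""
    ...
    return name

projects = ["project_a", "project_b", "project_c"]

# Submitting all projects at once lets slow jobs overlap instead of serializing.
with ThreadPoolExecutor(max_workers=len(projects)) as pool:
    futures = {pool.submit(run_project, p): p for p in projects}
    for fut in as_completed(futures):
        print(f"{futures[fut]} finished")
```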
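A sketch of typical PyTorch-side CUDA knobs for this kind of inference tuning, assuming torch, torchvision, and a CUDA-capable GPU; the resnet18 model is a stand-in, and the exact settings applied to the YOLO and MTO models may differ.

```python
import torch
import torchvision

# Autotune convolution kernels when input shapes are fixed across batches.
torch.backends.cudnn.benchmark = True
# Allow TF32 math on Ampere and newer GPUs for faster matmuls/convs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Stand-in for a detection model; FP16 halves memory and speeds up inference.
model = torchvision.models.resnet18().cuda().half().eval()
images = torch.randn(8, 3, 224, 224, device="cuda", dtype=torch.half)

with torch.inference_mode():  # disables autograd bookkeeping entirely
    out = model(images)
```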