Principal ML Engineer
Cars24 · Nov 2020 - Aug 2025 (4 yr 9 months)
- Incubated and led ML Engineering, MLOps, Computer Vision, and Data Engineering efforts to develop, maintain, and deploy ML systems at scale.
- Trained and deployed domain-specific small LLMs/VLMs using LoRA, enabling efficient structured data extraction through adapter switching.
- Architected a scaled POC that consumes STT model output and generates customized responses via ChatGPT, enabling a comprehensive 360-degree customer view.
- Enabled multi-tool support for the chat agent, integrating image and hybrid search using Typesense and Milvus (vector databases).
- Built a Small Language Model (SLM) that extracts entities from text at real-time throughput, serving as an efficient custom tool for agentic interactions.
- Led and expanded a cross-functional AI/ML practice spanning Data Engineering, Deep Learning, ML Engineering, and MLOps.
- Developed and deployed fine-tuned STT models (Hindi, Kannada, Tamil, Telugu), achieving a real-time factor (RTF) of over 100x and reducing latency.
- Optimized deep learning deployment workflows using Triton, TensorRT, ONNX, and mixed/half precision, maximizing hardware utilization.
- Implemented ensemble model scheduling with zero-copy memory transfers between models, significantly improving inference performance.
- Led the migration of the observability stack from managed Google services to a self-managed Grafana/Alloy/Loki setup.
- Enabled GPU-accelerated image decoding and encoding with FFmpeg, increasing throughput and reducing memory copies.
- Spearheaded the evolution of the Snowflake data architecture, ensuring data privacy compliance and managing migrations and upgrades.
- Architected real-time data pipelines with StarRocks and Kafka for Data Science, powering cost-effective recommendations.
- Evaluated Delta Lake and Iceberg table formats, enabling compute-storage separation and support for multiple query engines.