Raghav Bansal

I’m a Machine Learning Engineer with almost 4 years at Samsung, building production-grade Machine Learning systems across on-device, cloud, and generative AI pipelines. I’ve worked on multilingual Natural Language Processing (NLP) models, Deep Learning, Retrieval-Augmented Generation (RAG) systems, and deployed Large Language Models (LLMs) at scale using Triton Inference Server and AWS.

My strengths include on-device AI (TensorFlow Lite, model quantization, unsupervised clustering), contextual search (Word2Vec, BERT), and robust ML APIs integrated with Android and web services.

I’m currently open to opportunities focused on LLMs, on-device intelligence, Deep Learning, or scalable Machine Learning infrastructure.

  • Role

    Software, FastAPI, Machine Learning Engineer

  • Years of Experience

    3.6 years

Skillsets

  • Fine-tuning
  • MCP
  • model quantization
  • NLP
  • Ollama
  • Room
  • S3
  • SQS
  • Triton Inference Server
  • Bash
  • LLMs
  • Hugging Face
  • Jenkins
  • Linux
  • NumPy
  • pandas
  • PostgreSQL
  • PyTorch
  • RAG
  • Scikit-learn
  • Auto Scaling
  • Python
  • Java
  • SQL
  • AWS
  • Docker
  • Flask
  • Kotlin
  • MLOps
  • TensorFlow
  • ChromaDB
  • EC2
  • FAISS
  • FastAPI
  • Gerrit
  • Git
  • Jira
  • LangChain

Professional Summary

3.6 Years
  • Jul 2022 - Present (3 yr 8 months)

    Software Engineer, Machine Learning Engineer

    Samsung R&D Institute India
  • Feb 2022 - Jul 2022 (5 months)

    R&D Intern, Machine Learning Engineer

    Samsung R&D Institute

Work History

3.6 Years

Software Engineer, Machine Learning Engineer

Samsung R&D Institute India
Jul 2022 - Present (3 yr 8 months)

    Health Orchestrator | FastAPI, LangGraph, MCP, PostgreSQL, WebSockets, Android, LLMs
      • Architected a multi-agent orchestration layer that handles requests over WebSockets, reducing response latency by 30% and eliminating agent collisions through deterministic sequencing.
      • Engineered PostgreSQL-backed session memory that persists context across 20+ turn interactions, improving clinical instruction recall by 40% and enforcing strict agent boundaries.
      • Deployed MCP-governed guardrails with custom hallucination checks, blocking 100% of non-compliant tool calls and securing tool-chain execution for safety-critical queries.

    Samsung Personal Health Records (PHR) | Triton, EC2, SQS, S3, Auto Scaling, Docker, LLMs
      • Orchestrated a scalable LLM inference service on AWS EC2 using Triton Inference Server, with custom AMIs and Auto Scaling groups that reduced compute costs by 20%.
      • Engineered an asynchronous processing pipeline using SQS and S3 that decouples data ingestion, reducing end-to-end latency by 70% and increasing throughput 3x.

    Samsung Internet Browser Semantic Search | Python, ML, RAG, NLP, TensorFlow, FastAPI, LLMs
      • Engineered a hybrid semantic-search and autocorrect system by ensembling BERT contextual embeddings with Word2Vec and Doc2Vec, achieving 97% top-k accuracy on complex queries.
      • Productionized a 94-language polyglot NLU model served via FastAPI and integrated with Android (Retrofit), optimizing inference for real-time multilingual support.
      • Architected a RAG pipeline using dense vector embeddings and GPT for contextual query generation, improving intent resolution rates by 30% through semantic grounding.

    Smart Vibration and Ringtone Adjustment System | TensorFlow Lite, CNN, Android, On-device ML
      • Engineered an ultra-compact 719 KB on-device CNN for surface classification (98.8% accuracy), running inference within 610 KB RAM and 10 MB ROM to drive adaptive haptic feedback.

R&D Intern, Machine Learning Engineer

Samsung R&D Institute
Feb 2022 - Jul 2022 (5 months)
    Text Summarization Model Development | Encoder-Decoder NLP, Python: Developed a text summarizer using an encoder-decoder NLP architecture with an attention mechanism, achieving 97% accuracy.

Major Projects

2 Projects

Smart Vibration and Ringtone Adjustment System

    Trained a CNN model on accelerometer data for real-time surface classification, achieving 98.8% accuracy and enabling automatic vibration/ringtone adjustment. Collected and processed the sensor data and optimized the TensorFlow model for on-device inference within 10 MB ROM and 610 KB RAM.

Samsung Cloud Emergency Backup

    Developed an on-device, ML-based emergency backup using TensorFlow Lite and unsupervised clustering, reducing backup latency by 50%.

Education

  • Bachelor of Technology in Computer Science and Engineering

    Guru Nanak Dev Engineering College (2022)