Senior Data Scientist
Innovaccer
Data Science Associate Consultant
ZS Associates
Spearheaded a team of 2 MLEs and 1 SDE in Analytics R&D, working on GenAI and NLP initiatives from POC to production
and fostered strong collaboration across Product, Engineering, GTM, and key business stakeholders to align technical
execution with product vision.
Architected a GenAI-powered Pop Health Copilot using Chain-of-Thought prompting and multi-agent orchestration for
NL2SQL, insights generation, and visualizations in a conversational interface.
Boosted NL2SQL accuracy with 2,500+ custom instructions, dynamic guideline selection, and abstraction over 37+ tables,
achieving 89% query acceptance and reducing latency from 130s to 40s.
Engineered a scalable chatbot architecture with MongoDB (conversation logs), Redis (session management), AWS S3
(knowledge base), and Snowflake (insights), ensuring 99% uptime and 56% faster response time.
Led evaluation of vector DBs (Pinecone, FAISS, Milvus, ChromaDB), selecting Milvus and cutting resource utilization by 82% via
optimized data loading strategies.
Built a prompt and config versioning system with CI/CD integration, reducing release cycles from 2-3 days to under 30 minutes
and enabling rapid NLP experimentation.
Accelerated OCR inference by optimizing the PARSeq model with TensorRT and deploying it via Triton Inference Server (FP16
quantized), achieving 23-25 RPS with <1s response time, a 2.8x speedup over the baseline.
Carried out POCs for SLMs (small language models) such as Qwen 2.5 and NuExtract 1.5 using vLLM, Triton, SageMaker, and RunPod,
to assess scalability, cost, and performance for future GenAI workloads.
Evaluated LLM observability frameworks (LiteLLM, TrueFoundry) with senior architects to improve traceability and monitoring.
Built an NLP-based information retrieval system using BioBERT and fine-tuned spaCy NER to extract insights from clinical trial
documents (PDFs, ClinicalTrials.gov, PubMed); reduced manual effort by 60% and cut trial design time from weeks to hours;
featured on Cision PR Newswire.
Developed a sales optimization pipeline on Dataiku using patient-level data, feature engineering, XGBoost, and unconstrained
optimization to generate timely triggers and rank physicians, achieving 74% recall and boosting sales by 18%.
Automatic Text Summarization using Deep Learning
Solution of Differential Equation using Newton-Raphson Method - Python