Soumendu Bhattacharjee, Ph.D. in Physics

I bring extensive experience in managing data science projects and leading teams, with a particular focus on building advanced Retrieval-Augmented Generation (RAG) systems, including Agentic RAG and Graph-based RAG frameworks. My work has centered on integrating LLMs with graph-based knowledge systems to enhance retrieval accuracy and content generation, enabling enterprises to navigate and leverage complex data effectively.

  • Role

    Senior Data Scientist

  • Years of Experience

    6.6 years

Skillsets

  • Elasticsearch
  • LangGraph
  • Keras
  • K-Means
  • Jupyter Notebook
  • InceptionV3
  • GRU
  • Gretel
  • Gradient Boosting
  • Few-shot prompting
  • Fast.ai
  • LASSO
  • Elastic Net
  • EfficientNetB7
  • DuckDB
  • DBSCAN
  • CNN
  • CatBoost
  • Azure Blob
  • Azure
  • AWS DocumentDB
  • AWS
  • ResNet50
  • Tab2Img
  • StarDist
  • VGG19
  • VGG16
  • UNet
  • U2-Net
  • Tableau
  • SFTP
  • SARIMA
  • Ridge
  • ARIMA
  • Prophet
  • Power BI
  • OpenCV
  • NumPy
  • MongoDB
  • MobileNet
  • Mean-shift
  • MCP
  • LLM
  • LightGBM
  • Airflow
  • XGBoost
  • Scikit-learn
  • PyTorch
  • Git
  • Azure Data Factory
  • Streamlit
  • RNN
  • Random Forest
  • LSTM
  • BERT
  • AWS Glue
  • Python
  • Jira
  • Docker
  • C++
  • C
  • TensorFlow
  • SQL
  • Postgres
  • AdaBoost
  • TimeGAN
  • SVM
  • SQL Server
  • Spark
  • Snowflake
  • S3
  • Redshift
  • Redis
  • pandas
  • Matplotlib
  • kNN
  • Kafka
  • GPT
  • FastAPI
  • Cron scheduling
  • BigQuery
  • Azure Cosmos DB

Professional Summary

6.6 Years
  • Apr, 2025 - May, 2025 1 month

    Senior AI Engineer

    Emids
  • Feb, 2024 - Apr, 2025 1 yr 2 months

    Data Scientist

    Flexday Solutions LLC
  • Jul, 2023 - Jan, 2024 6 months

    Data Scientist

    SmartHelio
  • Jun, 2017 - Jun, 2021 4 yr

    Postdoctoral Fellow

    TRIUMF, University of British Columbia (UBC)
  • Aug, 2021 - Feb, 2022 6 months

    Research Scientist

    Czech Technical University
  • Mar, 2022 - Apr, 2023 1 yr 1 month

    Senior Data Scientist

    Sortera Alloys
  • Apr, 2016 - May, 2017 1 yr 1 month

    Research Associate

    Inter University Accelerator Centre

Applications & Tools Known

  • Jira

Work History

6.6 Years

Senior AI Engineer

Emids
Apr, 2025 - May, 2025 1 month
    Fine-tuned LLMs for classification tasks using Azure DevOps. Performed multi-label classification on large healthcare documents stored in Azure Blob using Spark. Used Azure Data Factory to automate preprocessing pipelines and summary generation workflows.

Data Scientist

Flexday Solutions LLC
Feb, 2024 - Apr, 2025 1 yr 2 months
    Provided supply chain optimization solutions. Integrated large language models (LLMs) with knowledge graphs for enterprise content navigation. Enhanced RAG system performance using a graph-based approach. Built Q&A systems for complex documents and report generation for legal documents and news articles using LLMs. Fine-tuned LLMs using LoRA and ran LLMs locally with Ollama. Built a RAG agent for Q&A on tabular data. Developed a LangGraph research agent. Performed web scraping and data cleaning for LLM input. Led the data science team and addressed diverse business problems. Deployed code using Docker and FastAPI. Worked with MongoDB, Redis, Kafka, and Postgres for real-time content streaming. Built and deployed Streamlit apps using data from S3, Azure Blob, and SFTP. Managed team operations and orchestrated jobs using Airflow and cron scheduling. Performed speech-to-text analysis on customer support data and conducted A/B testing.

Data Scientist

SmartHelio
Jul, 2023 - Jan, 2024 6 months
    Led projects on fault detection in PV systems using autoencoder models. Supervised teams for long-term GHI prediction using multivariate LSTM, XGBoost, and CatBoost. Managed connector fault detection projects using Boosting, Random Forest classifiers (RFC), and SVM. Applied Fourier transforms for noise reduction. Converted tabular data into images for classification using MobileNet, CNN, VGG16, and VGG19. Generated synthetic time series data using Gretel and TimeGAN. Conducted anomaly detection, feature selection, and simulation of PV systems. Presented work to clients and trained ML models on AWS SageMaker. Used AWS S3 and Glue for data pipelines, stored results in Snowflake and Redshift, and scheduled batch evaluations with Airflow.

Senior Data Scientist

Sortera Alloys
Mar, 2022 - Apr, 2023 1 yr 1 month
    Performed classification on tabular datasets using Random Forest, Boosting, LightGBM, and SVM. Conducted image classification with deep neural networks (InceptionV3, ResNet50, MobileNet, CNN, VGG16, VGG19, EfficientNetB7). Worked on image segmentation for biological images using UNet and StarDist. Conducted feature selection, model optimization, and hyperparameter tuning. Developed automated systems for classification from raw spectroscopic data. Analyzed tabular data by converting it to images with Tab2Img. Performed denoising with Fourier analysis. Supervised junior members and presented results to technical and non-technical audiences. Conducted text data analysis and time series analysis. Built pipelines integrating SQL Server, BigQuery, and DuckDB. Used S3, MongoDB, and Redis for hybrid cloud-data storage workflows.

Research Scientist

Czech Technical University
Aug, 2021 - Feb, 2022 6 months
    Analyzed data and optimized ML models (supervised and unsupervised) for large-scale scientific experiments. Developed and implemented controls and electronics for detection systems. Prepared scientific proposals and documentation, and presented analysis results at conferences. Collaborated with global groups and supervised junior members.

Postdoctoral Fellow

TRIUMF, University of British Columbia (UBC)
Jun, 2017 - Jun, 2021 4 yr
    Analyzed data and optimized ML models for large-scale scientific experiments. Published results in high-impact journals. Maintained and improved hardware and software equipment. Managed experimental campaigns and prepared scientific proposals and documentation. Supervised undergraduate and master's students and collaborated with group members.

Research Associate

Inter University Accelerator Centre
Apr, 2016 - May, 2017 1 yr 1 month
    Analyzed data and optimized ML models for large-scale scientific experiments. Published results in high-impact journals. Prepared scientific proposals and documentation, and presented results at conferences. Supervised undergraduate and master's students and collaborated with other groups. Conducted Monte Carlo simulations for gamma-ray detectors using C++.

Major Projects

3 Projects

Fault Detection in PV System

    Used autoencoders with different feature sets to detect faults in photovoltaic systems.

Long-term GHI Prediction

    Forecasted global horizontal irradiance (GHI) using multivariate LSTM, XGBoost, and CatBoost models.

RAG System Enhancement

    Enhanced RAG systems using advanced knowledge graphs and developed Q&A tools for enterprise applications.

Education

  • Ph.D. in Physics

    University of Calcutta