Soumendu Bhattacharjee, Ph.D. in Physics

I bring extensive experience in managing data science projects and leading teams, with a particular focus on building advanced Retrieval-Augmented Generation (RAG) systems, including Agentic RAG and Graph-based RAG frameworks. My work has centered on integrating LLMs with graph-based knowledge systems to enhance retrieval accuracy and content generation, enabling enterprises to navigate and leverage complex data effectively.

  • Role

    Senior Data Scientist

  • Years of Experience

    6.6 years

Skillsets

  • Azure Blob
  • Jupyter Notebook
  • Gradient Boosting
  • Fast.ai
  • Elasticsearch
  • Elastic Net
  • DuckDB
  • DBSCAN
  • CNN
  • CatBoost
  • K-Means
  • Azure
  • AWS DocumentDB
  • AWS
  • ARIMA
  • AdaBoost
  • Python
  • SVM
  • SQL Server
  • Ridge
  • LangChain
  • Haystack
  • StarDist
  • UNet
  • U2Net
  • Tableau
  • SFTP
  • SARIMA
  • Spark
  • Prophet
  • Power BI
  • OpenCV
  • NumPy
  • MongoDB
  • Mean-shift
  • LASSO
  • Keras
  • Git
  • Azure Data Factory
  • Streamlit
  • RNN
  • Random Forest
  • LSTM
  • BERT
  • Airflow
  • PyTorch
  • Jira
  • Docker
  • C++
  • C
  • TensorFlow
  • SQL
  • Kafka
  • Snowflake
  • S3
  • Redshift
  • Redis
  • Postgres
  • pandas
  • Matplotlib
  • kNN
  • GPT
  • FastAPI
  • Cron scheduling
  • BigQuery
  • Azure Cosmos DB
  • AWS Glue
  • XGBoost
  • Scikit-learn

Professional Summary

6.6 Years
  • Jul, 2025 - Sep, 2025 2 months

    Senior Software Engineer - AI

    Velsera
  • Apr, 2025 - May, 2025 1 month

    Senior AI Engineer

    Emids
  • Feb, 2024 - Apr, 2025 1 yr 2 months

    Data Scientist

    Flexday Solutions
  • Aug, 2021 - Feb, 2022 6 months

    Research Scientist

    Czech Technical University
  • Mar, 2022 - Apr, 2023 1 yr 1 month

    Senior Data Scientist

    Sortera Alloys
  • Jul, 2023 - Jan, 2024 6 months

    Data Scientist

    SmartHelio
  • Jun, 2017 - Jun, 2021 4 yr

    Postdoctoral Fellow

    TRIUMF, University of British Columbia
  • Apr, 2016 - May, 2017 1 yr 1 month

    Research Associate

    Inter University Accelerator Centre

Applications & Tools Known

  • Jira

Work History

6.6 Years

Senior Software Engineer - AI

Velsera
Jul, 2025 - Sep, 2025 2 months
    Performed entity extraction from PubMed papers for cancer research using LLMs. Leveraged the multimodal capabilities of LLMs to enhance entity extraction performance. Addressed the challenge of maintaining 100% recall by designing a hybrid approach. Utilized BioBERT and SNOMED CT ontology-based methods to improve entity extraction accuracy. Researched and experimented with U-Net, Mask R-CNN, and Mask2Former for semantic, instance, and panoptic segmentation of satellite images. Developed an application for automated satellite image segmentation to support large-scale analysis.

Senior AI Engineer

Emids
Apr, 2025 - May, 2025 1 month
    Fine-tuned LLMs for classification tasks using Azure DevOps. Performed multi-label classification on large healthcare documents stored in Azure Blob using Spark. Used Azure Data Factory to automate preprocessing pipelines and summary generation workflows.

Data Scientist

Flexday Solutions
Feb, 2024 - Apr, 2025 1 yr 2 months
    Provided supply chain optimization solutions. Integrated large language models (LLMs) with knowledge graphs for enterprise content navigation. Enhanced RAG system performance using graph-based approaches. Built Q&A systems and chatbots using LLMs. Generated reports for legal documents and news articles. Fine-tuned LLMs using LoRA and ran LLMs locally with Ollama. Built a RAG agent for tabular-data Q&A and a research agent with LangGraph. Performed web scraping and data cleaning for LLM input. Led the data science team and addressed diverse business problems. Deployed code using Docker and FastAPI. Worked with MongoDB, Redis, Kafka, and Postgres for real-time content streaming. Built and deployed Streamlit apps using data from S3, Azure Blob, and SFTP. Managed operations and orchestrated jobs using Airflow and cron scheduling. Performed speech-to-text analysis and A/B testing.

Data Scientist

SmartHelio
Jul, 2023 - Jan, 2024 6 months
    Used autoencoder models for fault detection in PV systems. Supervised a team working on long-term GHI prediction using multivariate LSTM, XGBoost, and CatBoost. Managed connector fault detection projects using boosting, random forest classifier (RFC), and SVM algorithms. Applied Fourier transforms for noise reduction. Transformed tabular data into images and performed classification using MobileNet, CNN, VGG16, and VGG19. Generated synthetic time series data using Gretel and TimeGAN. Built classifiers by converting tabular data into image data and applying noise removal algorithms. Supervised junior colleagues and led multiple projects. Presented work to clients weekly. Trained and tested ML models on AWS SageMaker. Used AWS S3 and Glue for data ingestion and processing. Stored results in Snowflake and Redshift. Scheduled batch model evaluations using Airflow.

Senior Data Scientist

Sortera Alloys
Mar, 2022 - Apr, 2023 1 yr 1 month
    Performed classification tasks on tabular datasets using Random Forest, boosting, LightGBM, and SVM. Conducted image classification with deep neural networks for over 1,500 classes using InceptionV3, ResNet50, MobileNet, CNN, VGG16, VGG19, and EfficientNetB7. Explored combining CNNs with Random Forest and XGBoost. Performed cell and image segmentation on biological images using UNet, U2Net, and StarDist. Conducted feature selection using PCA, Random Forest, and chi-square. Developed automated systems for classification from raw spectroscopic data. Analyzed tabular data by converting it to images with Tab2Img. Applied Fourier analysis for denoising time series. Supervised junior members and presented work to technical and non-technical audiences. Analyzed text data using TF-IDF, Word2Vec, GloVe, VADER, RNN, LSTM, GRU, BERT, and GPT. Conducted time series analysis using ARIMA, SARIMA, Prophet, LSTM, Conv1D, and Conv2D. Built pipelines integrating SQL Server, BigQuery, and DuckDB. Used S3, MongoDB, and Redis for hybrid cloud data storage.

Research Scientist

Czech Technical University
Aug, 2021 - Feb, 2022 6 months
    Performed data analysis and optimized ML models for large-scale scientific experiments. Developed controls and electronics for detection systems. Prepared scientific proposals and documentation. Presented analysis results at conferences. Collaborated with global groups and supervised junior members.

Postdoctoral Fellow

TRIUMF, University of British Columbia
Jun, 2017 - Jun, 2021 4 yr
    Analyzed data and optimized ML models for scientific experiments. Published results in high-impact journals. Maintained and improved experimental hardware and software. Managed experimental campaigns and documentation. Presented results at conferences. Supervised undergraduate and master's students. Collaborated with colleagues and group members.

Research Associate

Inter University Accelerator Centre
Apr, 2016 - May, 2017 1 yr 1 month
    Analyzed data and optimized ML models for scientific experiments. Published results in high-impact journals. Prepared scientific proposals and documentation. Presented results at conferences. Supervised undergraduate and master's students and collaborated with other groups. Performed Monte Carlo simulations for gamma-ray detectors using C++.

Major Projects

3 Projects

Fault Detection in PV System

    Used autoencoders with different feature sets to detect faults in photovoltaic systems.

Long-term GHI Prediction

    Forecasted global horizontal irradiance (GHI) using multivariate LSTM, XGBoost, and CatBoost models.

RAG System Enhancement

    Enhanced RAG systems using advanced knowledge graphs and developed Q&A tools for enterprise applications.

Education

  • Ph.D. in Physics

    University of Calcutta