Vetted Talent

Sayan Mukhopadhyay
Capitalising on deep domain knowledge in Data Science & Machine Learning, and on leadership experience, to help companies and clients open new business avenues and reach new horizons. Targeting senior-level assignments in Machine Learning, Solution Architecture, or FinTech with an organization of high repute.
  • Role

    Architect - AI/ML

  • Years of Experience

    21.8 years

  • Professional Portfolio

    View here

Skillsets

  • Rust
  • TextBlob
  • Terraform
  • SVN
  • Storm
  • Spring
  • Spark
  • Socket.IO
  • SOAP
  • Snowflake
  • Shell
  • WCF
  • REST
  • Redis
  • React
  • Play
  • Oracle
  • OpenMP
  • OpenCV
  • NLTK
  • MySQL
  • D3
  • DialoGPT
  • C#.NET
  • Unity
  • three.js
  • Scikit-learn
  • POSIX
  • OpenGL
  • Node
  • Google APIs
  • Dash
  • MPI
  • C3
  • Kafka
  • Vivado
  • Suricata
  • Nagios
  • Kali Linux
  • Cacti
  • ZeroMQ
  • XGBoost
  • TensorFlow
  • ActiveMQ
  • PyTorch
  • pandas
  • Neo4j
  • MongoDB
  • Hive
  • Hadoop
  • Go
  • Elixir
  • C++
  • Airflow
  • SQL
  • Scala
  • Python
  • Kubernetes
  • Java
  • GCP
  • Elasticsearch
  • Docker
  • Azure
  • Flask
  • Milvus
  • MapReduce
  • LightFM
  • LangGraph
  • LangChain
  • Keras
  • Jenkins
  • Git
  • FPGA
  • AWS
  • FastAPI
  • Falcon
  • Django
  • dbt
  • CUDA
  • ChromaDB
  • BERT
  • Ansible
  • Angular

Vetted For

  • Digital Data Scientist (AI Screening)
  • Score: 60/100

Professional Summary

21.8 Years
  • Feb, 2026 - Present 4 months

    Architect - AI/ML

    Confidential
  • Nov, 2016 - Present 9 yr 7 months

    Machine Learning, Big Data, Cloud, LLM

    Remote Consultant
  • Jun, 2015 - Nov, 2016 1 yr 5 months

    Manager Data Science

    Abzooba
  • May, 2013 - Aug, 2013 3 months

    Lead Architect - Data Mining

    Wedoria
  • Oct, 2013 - Jul, 2014 9 months

    Technical Architect Security Tool Group

    Mphasis
  • Jul, 2014 - Jun, 2015 11 months

    Data Scientist / Sr. Consultant Analytics & Big Data

    TCG Digital
  • May, 2012 - Apr, 2013 11 months

    Senior Technical Analyst Machine Learning

    Pubmatic
  • Sep, 2011 - Apr, 2012 7 months

    Senior Software Engineer

    CA Technology
  • Nov, 2010 - Sep, 2011 10 months

    Technical Manager

    FairFest Media
  • Jan, 2000 - Dec, 2003 3 yr 11 months

    Software Engineer

    Total Computer System
  • Jun, 2008 - Feb, 2010 1 yr 8 months

    Technical Consultant (Risk & Algo Trading)

    Credit-Suisse
  • Mar, 2010 - Nov, 2010 8 months

    Senior Consultant

    PayPal

Applications & Tools Known

  • MySQL
  • JavaScript
  • Python
  • C++
  • PHP
  • Java
  • Play
  • Neo4j
  • AWS
  • Node.js
  • Selenium
  • R
  • MongoDB
  • Hadoop
  • Spark
  • Pentaho
  • Kibana
  • Tableau
  • NLTK
  • Docker
  • Kubernetes
  • Nagios
  • Azure
  • Socket
  • IPC
  • HTTP
  • REST
  • SOAP
  • Spring
  • Flask
  • Falcon
  • Django
  • GDB
  • Makefile
  • CUDA
  • Visual Studio
  • PyCharm
  • MATLAB
  • Oracle
  • Sybase
  • SQLite
  • Elasticsearch
  • Hive
  • Keras
  • pandas
  • Snowflake
  • Airflow
  • dbt
  • Linux
  • Windows
  • Router
  • Switch
  • NMAP
  • Kali Linux
  • Suricata
  • GCP
  • Terraform
  • Cryptography
  • Analytics
  • BERT
  • Shell
  • SQL
  • Scala
  • Elixir
  • Go
  • Rust
  • Unity
  • UML
  • WCF
  • Unix
  • Git
  • Cacti

Work History

21.8 Years

Architect - AI/ML

Confidential
Feb, 2026 - Present 4 months
    Working in Gen AI, Physics-inspired ML, and Physical AI.

Machine Learning, Big Data, Cloud, LLM

Remote Consultant
Nov, 2016 - Present 9 yr 7 months
    Spearheading the creation of data science capabilities for diverse clients. Architected cloud-native pipelines, established CI/CD, and mentored cross-functional teams to ensure trustworthy AI delivery.
    SRMB Steel Plant (Jan 2026 - Feb 2026): Built an AI system to measure the weight of moving and vibrating objects.
    ABP Weddings | Solution Architect (Apr 2025 - Aug 2025): Architected and developed a comprehensive Proof of Concept (POC), successfully handing over codebases to frontend, backend, and database teams. Engineered a scalable RAG-based recommendation system using Milvus and a FastAPI-based AI service to flag unethical profile images. Integrated Truecaller SDK for phone number verification in React Native.
    McDonald's | Data Science Engineer (Dec 2023 - Dec 2024): Built and deployed a production-grade Dash dashboard for time series forecasting, incorporating statistical, ML, and deep learning models via the Nixtla framework. Validated a third-party recommendation engine for performance and accuracy.
    Ulventech | ML Engineer (Aug 2023 - Nov 2023): Developed a dating site chatbot using LangGraph and ChromaDB for data retrieval and conversation flow. Constructed a robust data pipeline to generate hedge fund trade signals from market data.
    Gotoko | Lead Data Scientist (Nov 2022 - Jun 2023): Built a collaborative filtering-based recommendation engine for an e-commerce platform and developed a sales target optimization module using linear programming (Python, GCP, BigQuery, scikit-learn).
    Symphony AI | Lead Data Scientist (Jan 2021 - Aug 2022): Architected a customer churn prediction product for a major media client using Snowflake, Azure Databricks, PySpark, and TensorFlow to identify at-risk customers.
    Pubmatic | Lead Algorithm Analyst (Apr 2019 - Jan 2021): Designed a multi-layered experimental framework from scratch to apply Six Sigma principles to ML models and trading algorithms using Go, Scala, Python, and Spark.
    Sulvo | ML Architect (Nov 2016 - Feb 2018): Built a high-performance deep learning system from the ground up to predict ad prices for online users, scaling to 22 million daily predictions globally within a 120ms latency target (GCP, TensorFlow, Redis, Falcon).
    Other Engagements: RLHF for LLMs (Turing, Outlier); Network Anomaly Detection (Whiz Hack via AWS SageMaker); Video Recommender System (Future Today).

Manager Data Science

Abzooba
Jun, 2015 - Nov, 2016 1 yr 5 months
    Directed a team of 7 data scientists, managing the full project lifecycle from design to implementation, consistently delivering complex initiatives 4 weeks early. Architected a credit scoring model using social media data and a HIPAA-compliant Health Insurance Claim Status Prediction System, boosting model accuracy by 72%. Implemented backend sharding and caching to support 3x peak loads, achieving 99.95% availability with automated failover mechanisms. Designed a Topical Crawler for web harvesting using graph traversal algorithms and NLP.

Data Scientist / Sr. Consultant Analytics & Big Data

TCG Digital
Jul, 2014 - Jun, 2015 11 months
    Led a sentiment analysis project for brand development utilizing a Naive Bayes classifier on Mahout/Hadoop, and deployed a Neural Network-based passenger load predictor for two aviation clients. Implemented a K-Means clustering solution with Needleman-Wunsch sequence matching to resolve data anomalies for an electronics manufacturer, cutting error rates by 80%.

Technical Architect Security Tool Group

Mphasis
Oct, 2013 - Jul, 2014 9 months
    Managed and coached a 12-engineer team to build a complex transaction monitoring system on an open-source stack for 9 enterprise clients (including Coca-Cola and Verizon). Designed enhanced system architecture, including Nagios Active Checks, Netflow integration, and security controls, reducing incident rates by 90%.

Lead Architect - Data Mining

Wedoria
May, 2013 - Aug, 2013 3 months
    Served as Lead Architect for data mining initiatives at ABP Group.

Senior Technical Analyst Machine Learning

Pubmatic
May, 2012 - Apr, 2013 11 months
    Developed the "Nostradamus Approximation Framework," applying predictive approximation techniques for big data queries to enhance the Hadoop platform, improving query performance by 100%.

Senior Software Engineer

CA Technology
Sep, 2011 - Apr, 2012 7 months
    Worked as Senior Software Engineer on Network Management System projects.

Technical Manager

FairFest Media
Nov, 2010 - Sep, 2011 10 months
    Held the role of Technical Manager, overseeing development and systems.

Senior Consultant

PayPal
Mar, 2010 - Nov, 2010 8 months
    Served as Senior Consultant via CSC supporting payments systems.

Technical Consultant (Risk & Algo Trading)

Credit-Suisse
Jun, 2008 - Feb, 2010 1 yr 8 months

Software Engineer

Total Computer System
Jan, 2000 - Dec, 2003 3 yr 11 months
    Served as Software Engineer building applications for different educational clients.

Achievements

  • Golden Award Silver 2020 Codility
  • Selected for National Math Olympiad India
  • Ranked 352 in West Bengal Engineering Entrance Examination and 50 in Graduate Aptitude Test in Engineering (Instrumentation)
  • B certificate holder by NCC (Army)
  • Managed and developed highly effective analytical solutions for a system receiving 100 million new records on a daily basis on behalf of a leading online advertising company.
  • Designed and developed a parser for FIX format file in C++ that improved efficiency ten-fold in comparison with Unix grep command; deployed to support high frequency trading servers of a major investment bank.
  • Led the development of an enterprise network management system, involving complex bug fixing and development of new features in one core of heterogeneous, distributed codes.
  • A parallel algorithm for molecular dynamics simulation
  • Variance of difference as a distance like measure in synchronous time series microarray data clustering
  • Advanced Data Analytics Using Python, Apress, Sayan Mukhopadhyay (book; 1st and 2nd editions)

Testimonial

Abzooba

Pubmatic

https://www.linkedin.com/in/sayan-mukhopadhyay-61634511/

Major Projects

4 Projects

Ad Price Predictor System

    Developed from data collection to ML model for Sulvo Ad Price Prediction.

Nostradamus Approximation Framework

    Inventory estimation for Pubmatic.

METAL - Trading Time Risk Analysis Tool

    Developed for Credit-Suisse, focused on Risk Analysis.

Real Time Latency Monitoring

    High-frequency trading latency monitoring for Credit-Suisse.

Education

  • M.Tech (Research) in Computational & Data Science

    Indian Institute of Science (IISc) (2014)
  • B.Eng. in Instrumentation & Electronics

    Jadavpur University (2004)

Certifications

  • AWS

    Udemy (Jan, 2023)
  • Angular

    Code Academy (Dec, 2015)
  • SAS Certified Base Programmer

  • NCC B Certificate

  • Cryptography from Coventry University

Interests

  • Acting
  • Exercise
  • Writing
  • Watching Movies

AI-interview Questions & Answers

    My name is, and I did my B.Tech in electronics and instrumentation engineering from Jadavpur University and M.Tech (research) in computational and data science from IISc Bengaluru. I worked as a full-time employee at Credit Suisse, PayPal, CA Technology, Mphasis, and TCG Digital. After November 2016, I started my freelancing career. I worked for startups like Sulvo and Future Today, then for mid-sized companies like Pubmatic and Symphony AI, and for a big company like Crossover. Technology-wise, my main skills are machine learning and data analytics. I was part of the Credit Suisse risk analytics team. I was a senior technical analyst in machine learning, a data scientist at TCG Digital, and a manager of data science. My other field is infrastructure. I worked in the data center team at Credit Suisse and later was promoted to the team. I worked at CA Technology on a product that is basically a network monitoring tool. Then I was a technical architect in the security tool group of Mphasis. So I have experience in all aspects of data. I can call myself a full-stack data analytics professional: I can do the front end, I can do the back end, and I can do everything in between.

    We were building a price prediction system for a startup. In online ad bidding, if your prediction is low and you ask a low price, the ad sells automatically. In the next iteration, when you train your model on that data, the prices in the data are lower, so the next iteration's prediction is lower too. In this way the price gradually goes down, and the site's revenue goes down with it. The solution we proposed looks at the revenue. If the revenue is going up, we do nothing: we don't fix it until it's broken. If the revenue is going down, we look at the situation. If we are selling more, that means we are selling at a low price, so we make our prediction a little bit higher. If we can't sell enough ads, that means we are asking a high price, so we make our prediction a little bit lower. In the ad industry, this price is known as the floor of ad prices, and we gave this algorithm the name "Dancing Floor." It's implemented on the Google Cloud Platform, and the prediction model is a TensorFlow model. In the ad industry, there's a practice of keeping the model the same and sometimes running without a model. Here, however, you can run the price prediction for a long time without stopping the model.
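    The adjustment rule described in this answer can be sketched roughly as follows. This is a minimal illustration, not the production "Dancing Floor" code: the function name, the parameter names, and the 5% step size are all assumptions made for the example.

```python
def adjust_floor(predicted_price, revenue_now, revenue_prev,
                 sold_now, sold_prev, step=0.05):
    """Nudge a predicted floor price using revenue and sales trends.

    All names and the 5% step are hypothetical, chosen for illustration.
    """
    if revenue_now >= revenue_prev:
        # Revenue is not falling: leave the prediction alone.
        return predicted_price
    if sold_now > sold_prev:
        # Selling more while revenue falls: the price is too low, raise it.
        return predicted_price * (1 + step)
    # Selling the same or fewer ads while revenue falls:
    # the price is too high, lower it.
    return predicted_price * (1 - step)
```

    Calling `adjust_floor` once per training iteration would keep the prediction from drifting steadily downward, which is the failure mode the answer describes.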

    A signal is time series data. So I will look at the autocorrelation function, and where the autocorrelation function peaks, I will take that lag as the periodicity. I will also look at the moving average, and from the moving average you can see the trend. I will also remove the noise. If neither trend nor periodicity is there, then the ARMA model, the standard statistical time series model, is used. If the trend is there, then the ARIMA model, meaning the autoregressive integrated moving average model, handles the trend. And if the periodicity is also there, then a seasonal model is used. If you want to treat it as a deep learning problem, we can use a recurrent neural network for this kind of time series signal. Apart from deep learning, with signal processing you can take the Fourier transform of the signal, look at the frequencies, and do the analysis in the frequency domain. This is also a problem where you learn from the errors: you build a prediction model for the errors, and along with predicting the value, you predict the error, correct the value, and give the answer. That will increase your accuracy.
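    The first two checks in this answer, autocorrelation for periodicity and a moving average for trend, can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, and picking the single lag with the highest autocorrelation inside a caller-chosen lag range is a simplification of reading an ACF plot.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

def estimate_period(series, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))

def moving_average(series, window):
    """Simple moving average; a drifting result suggests a trend."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Synthetic signal with a 12-step cycle and no trend.
signal = [math.sin(2 * math.pi * t / 12) for t in range(120)]
period = estimate_period(signal, 2, 20)
trend = moving_average(signal, 12)
```

    Averaging over one full period flattens the cycle, so whatever remains in `trend` reflects the longer-term movement of the series.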

    Yeah, I worked on a project with about 24 million people, where the goal was to target their actions. For people who bought the product, we make a column with 1, otherwise 0. Against this column, I look at the correlation of the features and take the features that are correlated. Those features are used for clustering with the Euclidean distance function. Each cluster is then classified by product, meaning whether a customer buys that product or not, and we calculate the probability of buying the product. Above some threshold of the probability, we choose those customers. For those customers who are already buying a lot, we do conventional collaborative filtering, also known as correlation filtering; but correlation is not a distance, so we take the p-value as the distance. We did this in Spark. For the classification, we used naive Bayes and random forest.
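    The "probability per cluster, then threshold" step can be illustrated with a small sketch. It assumes cluster labels already exist, and it uses each cluster's observed purchase rate as a stand-in for the classifier probability; the answer itself names naive Bayes and random forest for that step. The function name and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict

def select_targets(cluster_ids, bought, threshold=0.5):
    """Return indices of customers in clusters whose purchase rate
    meets the threshold.

    cluster_ids[i] is the cluster of customer i; bought[i] is 1 or 0.
    The cluster purchase rate is a simplified proxy for a classifier's
    predicted probability.
    """
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for cid, b in zip(cluster_ids, bought):
        totals[cid] += 1
        buyers[cid] += b
    rate = {cid: buyers[cid] / totals[cid] for cid in totals}
    return [i for i, cid in enumerate(cluster_ids) if rate[cid] >= threshold]

# Cluster 0 has a 2/3 purchase rate, cluster 1 has 1/3.
targets = select_targets([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 1])
```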

    There is a data validation framework that you can use for these checks. You check that the data has the right sample size, start date, and end date. To validate the data, you check whether the event date falls between the start date and the end date or not. You can run this test for each row of the column, and for each row you get a result that is True or False. So this is one thing we can do. For the trend, you can apply visualization and data analysis techniques: you look at the average of the last 10 points. For seasonality, you can find it with the autocorrelation function: if the autocorrelation is high, that means there is seasonality or periodicity. With the Fourier transform you can also find the seasonality: in the frequency domain, the frequencies with the highest values give you the time periods. So you check the ACF, the Fourier frequency domain, and significant changes. Those things should be checked to verify the accuracy of the model, and you will also find out whether the data is accurate.
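    The row-level date check from this answer might look like the sketch below. The field names `start_date`, `event_date`, and `end_date` are assumptions for illustration; the answer does not name the columns or the validation framework.

```python
from datetime import date

def event_in_window(row):
    """True when the event date lies between start and end, inclusive."""
    return row["start_date"] <= row["event_date"] <= row["end_date"]

# Hypothetical rows: the second event falls after its end date.
rows = [
    {"start_date": date(2024, 1, 1), "event_date": date(2024, 1, 15),
     "end_date": date(2024, 1, 31)},
    {"start_date": date(2024, 1, 1), "event_date": date(2024, 2, 2),
     "end_date": date(2024, 1, 31)},
]
flags = [event_in_window(r) for r in rows]
```

    Each row yields one True or False flag, so failing rows can be counted or filtered before any modelling starts.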

    I prefer free software, and I use JavaScript. I used to build dashboards with the Google Visualization API, but some clients have problems with sending data to Google, so you can use D3 or C3 instead; these are JavaScript libraries. On the back end, I write the HTTP services mainly with the Python Flask library, and sometimes I use FastAPI. If high performance is required, I use Falcon. If it is Google Cloud Platform, there is Google Data Studio. You can put the data in a Google Sheet and make charts like in Excel; whatever you can do in Excel, you can also do in Google Data Studio. On Amazon, if you have a database there, AWS has a lot of data analytics and visualization tools. I am somewhat familiar with conventional BI tools like Tableau and Power BI. Apart from that, I know front-end logic like tables; all these things I can build by myself using AngularJS or React. Actually, I do JavaScript at an average level. So I'm not a very good front-end developer, but I can build a front end. On the back end, I am an expert: I worked for very big companies in development. And on the data side, I have worked with data companies, so I can handle data very well.

    Basically, I work in pandas and process CSV files. I dump the data from Python to Excel. Then in Excel, I use formulas like COUNT. I did correlation analysis, and I created different kinds of charts. I also use pivot tables, which convert rows to columns. So that's what I can do in Excel. I have experience with Excel; I'm not very good, but I can handle things in it. I also know a little bit of Power BI scripting, that is, Power BI automation. I have not worked with it, but I heard that Python is available in Excel. If that is available, I am an expert in it.

    That's a good question. Business people don't go into the technology; they look at the impact, such as how much lift there is in the revenue or in the audience, so you should concentrate on that and emphasize it. For technical people, go to the architecture diagram, sequence diagram, class diagram, these kinds of things. The algorithm should be explained in a lucid manner to the business people when they're interested in how it's done. Apart from that, for a technical audience, I'll make it short and formal, and leave out anything that isn't needed. It's also very good to look at the audience's LinkedIn profiles, see their capabilities and expertise, and keep your presentation within those domains. That would be a good idea. Do some homework.

    So, large volumes of data. I worked in Matthews in 2013 and 2014, and they handled 100,000,000 records every day. It was for formatting I did. Then I worked in PySpark: I worked for a company called Future Today, and in PySpark I developed an ad price prediction system for them. Then I worked for another company, and at that time I worked in Scala Spark as well. I also worked in Scala Spark at Future Today. So that is one way. Another way, apart from Spark, is in machine learning itself. There is a thing called transfer learning: you separate the data into small chunks, and each small chunk updates your model. So your old training is not discarded. Instead, all the earlier training is retained, and the new chunk updates your model. That is done through transfer learning. With deep learning it is possible, and with Bayesian models it is also possible. So that can be used.
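    The chunk-by-chunk updating described in this answer is commonly called incremental or online learning. A minimal sketch with a one-feature linear model trained by stochastic gradient descent is shown below; the model, the learning rate, and the synthetic data are all assumptions chosen for illustration, not the setup used in the projects mentioned.

```python
def update(weights, chunk, lr=0.01):
    """Run one SGD pass over a chunk of (x, y) pairs for y ~ w0 + w1 * x.

    The incoming weights carry everything learned from earlier chunks,
    so old training is kept and only refined.
    """
    w0, w1 = weights
    for x, y in chunk:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return (w0, w1)

# Synthetic stream generated from y = 2x + 1, fed in chunks of 100 rows.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(100)] * 100
weights = (0.0, 0.0)
for start in range(0, len(data), 100):
    weights = update(weights, data[start:start + 100])
```

    Because only one chunk is in memory at a time, the same loop works when the full dataset is too large to load at once.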

    For Google data, I will fetch the data, whether it's in BigQuery or another database, into a Google Sheet. Whatever you can do in Excel, you can do in Google Sheets. In Google Data Studio you can also build beautiful plots, and you can change a parameter and the plot will change. Google Data Studio also gives you a Google dataset. I work with the Google BigQuery database. Among other storage, like Google Cloud Storage, I have worked with Bigtable. Google also has DoubleClick; I worked on it for a short time, and I did the analytics the same way. From there, we would take all the impression data for the whole day, process it in batch mode every day, and run it on an AWS cluster. I did that, did analytics on the data, and built a recommendation system for which ad we should recommend to which people so that we