Vetted Talent

Bharath Shroff

Results-driven professional with 5+ years of experience in AI, data science, and software engineering, consistently leveraging cutting-edge technologies to drive innovation. Proven expertise in automating financial and data processes, building scalable solutions, and delivering actionable insights for global stakeholders. Skilled in AI/ML, Python, RAG, Next.js, and cloud platforms like Databricks and Azure. Adept at enhancing decision-making through advanced analytics, end-to-end application development, and agile methodologies, with a strong foundation in project management and client-focused solutions.

Role
Data Scientist
Years of Experience
6 years

Skillsets

REST API - 2 Years
React Js - 2 Years
Scala - 1 Year
Next Js - 1 Year
Selenium - 2 Years
MLOps - 1 Year
LLMs - 1 Year
K-Means - 1 Year
Backend - 2 Years
Financial reports - 1 Year
Node Js - 1 Year
PowerBI - 2 Years
MySQL - 5 Years
Git - 4 Years
RAG
Data engineering and manipulation
Tableau - 1 Year
Reporting - 3 Years
Relational Database - 5 Years
PyTorch - 1 Year
Python - 6 Years
SQL - 5 Years
PySpark - 5 Years
Cloud - 1 Year
Databricks - 5 Years
Odoo
Big Data - 5 Years
Data Engineering - 5 Years
MLFlow - 1 Year
JavaScript - 4 Years
React Native - 1 Year
Databricks cloud
Finance - 1 Year
Restful APIs - 5 Years
AI - 3 Years
Data Engineer - 5 Years
Data warehouse - 5 Years
Azure - 2 Years
API - 3 Years

Vetted For

10 Skills

Roles & Skills
Results
Details

Python Developer (AI/ML & Cloud Services) - Remote
AI Screening
66%

Skills assessed: GCP/Azure, Microservices, Django/Flask, Neo4j, RESTful APIs, AWS, Docker, Kubernetes, Machine Learning, Python
Score: 59/90

Professional Summary

6 Years

Aug, 2024 - Present 1 yr 10 months
Contract Data Scientist
MCSquared AI
Aug, 2024 - Oct, 2024 2 months
AI Innovation Specialist - Finance
Trilogy
May, 2022 - Jul, 2024 2 yr 2 months
Full Time Data Scientist
MCSquared AI
May, 2018 - Jul, 2018 2 months
RnD Intern
DELL EMC
Jun, 2019 - Jul, 2021 2 yr 1 month
Associate IT Consultant
ITC Infotech
Aug, 2021 - Apr, 2022 8 months
Full Stack Developer Volunteer
Isha Foundation
May, 2016 - Jul, 2016 2 months
RnD Intern
Computer Institute of Japan

Applications & Tools Known

Odoo
Apache
NumPy
WordPress
Palantir Foundry
Databricks
Azure Data Factory
Power BI
Next JS
LangChain
React Native
Git
DevOps
Selenium
PowerShell
Scala
Kaggle
Scrapy
SVM
Naive Bayes
Tkinter

Work History

6 Years

Contract Data Scientist

MCSquared AI

Aug, 2024 - Present 1 yr 10 months

Led the team to build a Databricks pipeline feeding a map-view dashboard that shows proximity hotspots of leads around business-provided site locations, leveraging the Bing Maps API and third-party real-world data sources such as Citeline, HealthVerity, and IQVIA.

AI Innovation Specialist - Finance

Trilogy

Aug, 2024 - Oct, 2024 2 months

Built an LLM chatbot for deriving financial insights, with a React frontend and an Express JS backend that updated the RAG vector DB whenever new files were uploaded, reducing manual analysis time by an hour.

Full Time Data Scientist

MCSquared AI

May, 2022 - Jul, 2024 2 yr 2 months

Deployed a machine learning survival model to production on Databricks, replacing the previous XGBoost model. Built on the medallion architecture, the pipeline retrains the model every month on new data and automatically archives the retrained model or promotes it to production after comparing it with the champion model, using MLFlow for model versioning and the C-score to evaluate model performance.

Full Stack Developer Volunteer

Isha Foundation

Aug, 2021 - Apr, 2022 8 months

Developed a web application on the open-source, Python-based Odoo framework, streamlining processes and digitizing multiple forms that hundreds of visitors previously filled in by hand, saving hours of work for both visitors and staff.

Associate IT Consultant

ITC Infotech

Jun, 2019 - Jul, 2021 2 yr 1 month

Deployed end-to-end modules using Git DevOps for Continuous Deployment across the 4 stages (DEV->QA->UAT->PROD), ensuring seamless transitions and operational efficiency for MLOps.

RnD Intern

DELL EMC

May, 2018 - Jul, 2018 2 months

Developed Python scripts for automated reporting, flagging approximately 100 high-priority reports daily, enhancing efficiency in report management.

RnD Intern

Computer Institute of Japan

May, 2016 - Jul, 2016 2 months

Helped improve the accuracy of multi-class classification of emails, achieving 70%+ accuracy.

Achievements

Football Secretary (IIT Hyderabad)
Inter IIT Football Captain
Participated in Table Tennis Inter-Departmental / Inter-Year Tournaments

Major Projects

7 Projects

Melanoma Classification

Achieved an 85% AUC score in identifying melanoma using Convolutional Neural Network (CNN) models.

Network traffic analysis ITC Infotech

Oct, 2020 - Oct, 2020

Extracted insights by transforming Apache access logs and visualizing them through plots, which showed that traffic originates from 10 different countries. Processed 6 million+ rows of server logs taken from open-source Apache server logs. Done as part of a PySpark training.

Network traffic analysis

Oct, 2020 - Oct, 2020

Extracted insights by transforming 6 million+ rows of Apache server logs and visualizing them through plots, which showed that traffic originates from 10 different countries.

Machine Learning Library from scratch

Aug, 2020 - Aug, 2020

Implemented a few ML algorithms using only NumPy, with the intention of developing a deep understanding of them: 3 regression models and 3 classification models, with no libraries other than NumPy (math library). Also implemented 9 normalization algorithms for data standardization in an effort to understand them.

Image classification of fruits

May, 2020 - Jul, 2020 2 months

Multi-class classification of fruits from images, using a Kaggle dataset with 90,380 annotated images. Leveraged pretrained models such as VGG, ResNet, AlexNet, and MobileNet to obtain a mobile-deployable model.

Tic-tac-toe Extended 2player

Apr, 2019 - Apr, 2019

Implementation of an advanced version of the Tic-Tac-Toe game in Python. Learnt about this two-layered Tic-Tac-Toe from a friend; we used to play it on the back of our notebooks. Implemented as a side project during college. It is played manually by 2 people as of now, with the ambitious objective of using ML as future scope.

IITH Main Website

Jan, 2019 - Mar, 2019 2 months

Built our college website from scratch using WordPress templating, which included integrating content from over 10 departments.

Education

Bachelor of Technology, Mechanical Engineering
Indian Institute of Technology (IIT) Hyderabad (2019)

Certifications

Microsoft Certified: Azure Data Engineer Associate (DP-200, DP-201) | Microsoft | 2021

AI-interview Questions & Answers

Hi, my name is Bharath Shroff and I'm from Bangalore, Karnataka. I started my career as an associate IT consultant, where my responsibilities were those of a data engineering role, and I worked with two clients. For the first client, I helped build an event-driven pipeline in Azure Data Factory: every day a file would be uploaded, triggering a pipeline of notebooks that took the raw data, applied transformations, generated analytics, and pushed the results to Power BI and Synapse Analytics, to be consumed by further stakeholders. The second was mainly on Azure Databricks, creating a similar data pipeline. After that, I worked at Isha Foundation for a considerable amount of time, where I built the website that helped digitize their process. It was a very manual process: every time a person came to the Isha Yoga Center, they had to fill in a handwritten form, which took hours of work from the team and the participants as well. So we created a digital profile storing all that information, and integrated different activities, like accommodation or other programs, through their APIs into a common website where the visitor could simply book. For this, I used Python and Odoo. Odoo is an open-source framework, so I got exposed to a lot of full-stack development, building both the backend and the frontend. Then I switched to MCSquared, where I worked as a data scientist. There, I also worked with two clients. The first client had their own data platform, Palantir, where I worked on preparing visualizations, essentially a POC of visualizations that stakeholders would be interested in. That involved some health checks on the data, data drift monitoring, and similar KPIs.
With the second client, it was again on Azure Databricks, but this involved identifying data vendors from which we could buy data, and using the client's proprietary data for competitor analysis and other analyses that would help grow their business. In my latest project, the one I'm currently working on, we've built an LLM agent that you can ask questions, and it creates SQL queries and fetches data from the required database. So, yeah, it's been a good journey with very varied experiences and tech stacks. Thank you.

I would instrument and improve the reliability of a distributed task using AWS Step Functions, which plays a role in AWS similar to that of Azure Data Factory. This service helps orchestrate pipelines, and I can use AWS SageMaker to automate the machine learning and data processing logic, with notebooks written in AWS Glue containing the actual Python code.

Redis cache is one of the industry-leading standards here, and it would help drastically optimize the performance of any cloud platform. Edge caching helps too: it stores relevant data on the edge devices, with near real-time retrieval speed. And if the AI model itself is small enough to be hosted on the edge device, the round trip in which each query goes to the server, the server uses the AI model to generate the response, and the response is served back is avoided, so both the latency and the server load are greatly reduced. That is why minimizing the model size, so that it can be hosted on an edge device, is worthwhile.

When designing a low-latency API that serves machine learning predictions, the user experience perspective is important, and in particular the perceived delay. So, as we start getting the response, we show each word as it arrives, and once the whole response is generated, we format it. I think that's what major UIs and other low-latency systems do. Using vector databases also helps speed up the process.
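The idea of showing words as they arrive can be sketched with a plain Python generator. This is a minimal illustration, not the system described in the answer: `generate_tokens` is a hypothetical stand-in for a model that emits its response incrementally, and the sample sentence is made up.

```python
from typing import Iterator, List


def generate_tokens(full_response: str) -> Iterator[str]:
    """Hypothetical stand-in for a model that produces its answer word by word."""
    for word in full_response.split():
        yield word


def stream_to_client(tokens: Iterator[str]) -> str:
    """Show each word as soon as it arrives, then return the complete text."""
    shown: List[str] = []
    for token in tokens:
        # Partial output the user sees immediately, lowering perceived latency.
        print(token, end=" ", flush=True)
        shown.append(token)
    print()
    # Full response, available for final formatting once generation ends.
    return " ".join(shown)


final_text = stream_to_client(generate_tokens("The forecast for next quarter is stable"))
```

The total generation time is unchanged; only the time until the first visible word shrinks, which is what the user perceives.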

When structuring a Python code base for an ML project with SOLID principles in mind, it's important to accommodate flexibility in the data and in training and retraining the model as the data is updated. What I've used is the data architecture of bronze, silver, and gold layers: the bronze layer contains the raw data; the silver layer contains feature engineering or feature extraction, basically all the features we want to feed into a machine learning model; and the gold layer has the filtered data just before it goes into the model, and it's also where the predictions are created. Beyond that, we would want a retraining process. I may be a bit biased towards MLflow here, but Apache Airflow or similar tools would also work. We retrain the model on new data and compare it against the champion model on metrics relevant to the particular use case, then either archive the previous model or continue with the champion, based on which one is performing better. All this helps build a self-sustaining pipeline that maintains the data as well as the quality of predictions, and the accuracy would improve because, generally, the more data an ML model has, the better its accuracy.
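The champion-model comparison described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `ModelRecord`, `choose_champion`, the `min_gain` threshold, and the metric values are all hypothetical, and no MLflow API is used.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelRecord:
    name: str
    metric: float  # assumed "higher is better", measured on the same holdout set


def choose_champion(
    champion: ModelRecord, challenger: ModelRecord, min_gain: float = 0.0
) -> Tuple[ModelRecord, ModelRecord]:
    """Return (model kept in production, model to archive)."""
    if challenger.metric > champion.metric + min_gain:
        return challenger, champion  # promote the retrained model
    return champion, challenger  # keep the current champion


# Made-up scores for illustration only.
current = ModelRecord("model_v1", 0.71)
retrained = ModelRecord("model_v2", 0.74)
winner, archived = choose_champion(current, retrained)
print(f"production: {winner.name}, archived: {archived.name}")
```

In a scheduled retraining job, a function like this would run after evaluation, and the result would drive whatever promote or archive step the model registry provides.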

To optimise a Python application's interaction with S3, the goal is that S3 access, which can be one of the more expensive operations, doesn't block the application, which is essential for user experience. S3 buckets support parallel access by default, so multiprocessing or multithreading would work: the Python application uses a different thread for each user, or even for each prediction, and each thread accesses the S3 bucket independently and in parallel. By default, Python applications are sequential, so parallelising the access makes better use of S3's native support for parallel reads and writes.
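A minimal sketch of the threaded approach, assuming the per-object download is wrapped in a function passed in by the caller. `fetch_all` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real S3 read so the example runs without credentials. With boto3, one might instead pass a function that calls `client.get_object(Bucket=..., Key=key)["Body"].read()`; that wiring is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(
    keys: Iterable[str],
    fetch_one: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch many objects concurrently; each key is handled by a worker thread."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so keys and bodies line up.
        bodies = pool.map(fetch_one, key_list)
        return dict(zip(key_list, bodies))


def fake_fetch(key: str) -> bytes:
    """Stand-in for a network read of one object (assumption for illustration)."""
    return f"contents of {key}".encode()


results = fetch_all(["a.csv", "b.csv", "c.csv"], fake_fetch)
print(sorted(results))
```

Threads suit this case because the work is I/O-bound: while one thread waits on the network, others can proceed.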

I have worked mainly with SQL, so I don't know about graph queries. But this query, with its question-mark property and question-mark value, is not a valid SQL query at least. The backslash quote doesn't make sense; it's not correct Python syntax. We don't need the backslash, just three quotes would do. As for the query itself, I don't know if we should be using commas, and for the WHERE condition, I'm not sure what exactly the condition should be. So this query doesn't look right to me.

Neo4j is a graph database, so it suits any use case that involves maintaining relationships in a node or graph representation, like a social media network where you have friends, friends of friends, and so on. In a graph, one node is connected to another, the way your friend is connected to another friend. This setup is ideal for these kinds of scenarios, and the machine learning in this case inherently has access to these relationships. It can compare nodes not only by their individual attributes but through their relationships as well, which helps the model learn. With the usual table structure, integrating the relationship aspect would require additional work: explaining how one row is related to another isn't straightforward to teach an ML model using a tabular or columnar structure.

Neo4j, like I said before, is a graph database, so implementing a knowledge graph would be very straightforward. Assuming the use case is well suited to a graph, Neo4j natively supports nodes and relationships, so the machine learning model can readily capture how the knowledge graph is structured and leverage that to enhance its predictions.

The project I worked on initially used XGBoost alongside scikit-learn, but for the use case a survival model was a much better fit. There is another library in the scikit ecosystem called scikit-survival, which we used to tailor the model to our use case. That made more sense than traditional machine learning algorithms, which are mainly suited to classification problems or, of course, regression.
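Survival models are commonly evaluated with a concordance index, which is probably what the "C-score" in the work history refers to (an assumption). The sketch below is a simplified pure-Python version that ignores tied event times; the sample data is made up. scikit-survival ships its own implementation (`sksurv.metrics.concordance_index_censored`), which would be the practical choice.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs where the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up data: the third subject is censored (event not observed).
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[True, True, False, True],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(round(c_index, 2))  # 0.8
```

A value of 1.0 means the predicted risk ordering matches the observed ordering on every comparable pair, and 0.5 is what random scores would achieve on average.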

I've worked with FastAPI, and it natively supports asynchronous programming. There is a tricky part, though: if you declare a function as async and it does blocking work, it effectively runs sequentially. That was a major point of confusion, not debate, which was clarified in a PyCon talk, I believe in Ireland, where the speaker explained exactly how to use this for asynchronous purposes. Basically, you define the function as is, without manually specifying async, and FastAPI runs it concurrently for you. It's essential to keep any API asynchronous so that one user's query doesn't block another's, and to optimize server load and compute by reducing CPU idle time.
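The effect described here can be reproduced with `asyncio` alone, without FastAPI. In this sketch (handler names and delays are illustrative assumptions), a blocking call inside an `async def` function stalls the event loop so concurrent requests run one after another, while offloading the same call to a worker thread lets them overlap. FastAPI's documentation describes a similar arrangement: plain `def` path operations are run in a thread pool, whereas `async def` ones run directly on the event loop.

```python
import asyncio
import time


def blocking_work(delay: float) -> None:
    """Stand-in for blocking code, such as a synchronous database driver call."""
    time.sleep(delay)


async def handler_blocking(delay: float) -> None:
    blocking_work(delay)  # blocks the event loop; other handlers cannot progress


async def handler_offloaded(delay: float) -> None:
    # Runs the blocking call in a worker thread (Python 3.9+).
    await asyncio.to_thread(blocking_work, delay)


async def timed(handler, delay: float = 0.2, n: int = 3) -> float:
    """Run n concurrent 'requests' and return the total wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(handler_blocking))
offloaded_elapsed = asyncio.run(timed(handler_offloaded))
print(f"blocking: {blocking_elapsed:.2f}s, offloaded: {offloaded_elapsed:.2f}s")
```

The blocking variant takes roughly the sum of the delays, while the offloaded variant takes roughly one delay. An `async def` handler is still appropriate when everything inside it is awaited non-blocking I/O.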
