Vetted Talent

Manisha

Dynamic and results-oriented Data Analyst with a proven track record of leveraging advanced analytics and machine learning to drive transformative business outcomes. I am eager to bring my data science and analytics expertise to your team, delivering actionable insights and driving innovation to propel company growth.

Role
Freelance Data Science & GenAI Projects
Years of Experience
5 years
Professional Portfolio
View here

Skillsets

Machine Learning
Tableau
SQL
Salesforce
ROAS optimization
Recommendation Systems
R
Python
Predictive Analytics
Power BI
OpenAI
NLP
Marketing Mix Modeling
A/B testing - 1.5 Years
Looker
LangChain
Generative AI
Gemini AI
GCP
Data Analytics
Business Analysis
BigQuery
Azure
Airflow
CRM - 2 Years

Vetted For

10 Skills

Roles & Skills
Results
Details

Data Scientist AI Screening
69%

Skills assessed :Exploratory data analysis, AWS (SageMaker), JAX, NumPy, PyTorch, Scikit-learn, TensorFlow, AWS, Git, Python
Score: 62/90

Professional Summary

5 Years

Apr, 2025 - Present 6 months
Freelance Data Science & GenAI Projects
Freelance Data Science & GenAI Projects
Jun, 2024 - Feb, 2025 8 months
Data Science-Account Manager
Paragon Dentsu India
Sep, 2022 - Apr, 2024 1 yr 7 months
Data Scientist
ISDC Global
Dec, 2020 - Sep, 2022 1 yr 9 months
Academic Mentor
Extramarks Education

Applications & Tools Known

Python
MySQL
Tableau CRM
Microsoft Excel
SAS
Git
Jira
Visual Studio Code
SPSS
Pandas
BeautifulSoup
EDA
Matplotlib
Seaborn
Plotly
PowerBI
Salesforce
Zoho
SQL
Apache Hive
Microsoft Teams
Zoom
Microsoft Power BI
VS Code
Canva
Zoho people
LinkedIn
Midjourney
Salesforce Einstein Analytics
Kubeflow

Work History

5 Years

Freelance Data Science & GenAI Projects

Apr, 2025 - Present 6 months

Delivered corporate and professional training sessions on Python, SQL, Databricks, Machine Learning, and BI tools to working professionals across industries. Conducted GenAI workshops and masterclasses for 150+ participants, covering LLMs, prompt engineering, chatbot development, and real-world use cases. Designed and developed training content, case studies, and project guides for upskilling programs, ensuring high engagement and industry relevance. Mentored early-career and mid-level professionals on live projects involving GenAI, recommendation systems, and image recognition. Consulted on freelance ROAS optimization projects, helping businesses improve marketing ROI using MMM techniques and performance dashboards. Built GenAI-powered chatbots for mental health and career support, using Streamlit and Gradio for seamless user interaction. Developed summarization and content-generation tools tailored for educational and e-commerce clients, increasing content production speed by 60%.

Data Science-Account Manager

Paragon Dentsu India

Jun, 2024 - Feb, 2025 8 months

Automated Google Trends reporting using PyTrends, Airflow, and BigQuery for brand monitoring. Led Marketing Mix Modeling (MMM) to assess media impact and optimize ROAS for key clients. Achieved a 30% increase in sales for Habitaclia, Clicars, and Fotocasa by providing data-backed campaign strategies. Developed ROI-focused KPI dashboards (Power BI) to support GRP-based planning and improve marketing decision cycles. Improved campaign effectiveness by 20% via MMM UI enhancements and real-time performance monitoring. Enabled Pandora's record-high Mother's Day sales through precision targeting and ROAS forecasting.

Data Scientist

ISDC Global

Sep, 2022 - Apr, 2024 1 yr 7 months

Built ML-driven recommendation systems, boosting user engagement by 35%. Performed churn prediction and segmentation analysis using SQL and Python. Applied NLP on customer feedback to enhance satisfaction scores by 15%. Migrated more than 120K data points to GCP and optimized architecture using A/B testing.

Academic Mentor

Extramarks Education

Dec, 2020 - Sep, 2022 1 yr 9 months

Supported sales & marketing teams by developing Python-based marketing analytics dashboards, improving lead conversion by 20%. Designed predictive recommendation systems, driving personalized sales growth. Optimized SQL database performance, reducing query time by 30%, resulting in faster decision-making.

Achievements

Developed ML data products increasing customer engagement by 49%
Improved lead conversion rates by 30% through Salesforce data analysis
Improved operational efficiency by 12% through data-driven process improvements

Major Projects

5 Projects

Macy's Inc CIM Simplification

Aug, 2023 - Jan, 2024 5 months

Orchestrated the migration of 120,783 data points to a cloud environment, achieving enhanced scalability and performance.
Implemented A/B testing methodologies to assess cloud architecture configurations, resulting in optimized infrastructure and cost efficiency.

Tools: Google Cloud Platform (GCP), BigQuery, Google Cloud Storage (GCS), Cloud SQL, Airflow, Oracle, Data Definition Language (DDL).

Medical Appointment Data Analysis

ISDC Global

May, 2023 - Jun, 2023 1 month

Identified factors contributing to patient no-shows in medical appointments, enabling healthcare providers to improve patient care and resource utilization.
Presented actionable insights derived from A/B testing and statistical analysis, driving operational efficiency and cost savings for healthcare organizations.

Customer Satisfaction Analysis

ISDC Global

Nov, 2022 - May, 2023 6 months

Extracted customer satisfaction survey data from the company's database using SQL queries, focused on relevant metrics such as product/service ratings, feedback comments, and demographic information.
Cleaned and preprocessed the extracted data using SPSS to ensure data quality and consistency, handling missing values and outliers as necessary.
Utilized SPSS for statistical analysis to uncover patterns and relationships within the customer satisfaction data.
Generated descriptive statistics and visualizations to illustrate key insights and trends in customer satisfaction levels.
Created interactive dashboards and reports in Power BI to present findings clearly and intuitively for stakeholders.
Collaborated with cross-functional teams to develop strategies and initiatives based on the insights gained from the analysis, such as product/service improvements, customer engagement initiatives, and targeted marketing campaigns.

Student Performance Prediction System

Extramarks Education Pvt. Ltd

Aug, 2021 - Aug, 2022 1 yr

Successfully developed a student performance prediction system that provides actionable insights to educators and administrators, enabling them to identify at-risk students early and implement targeted interventions to support their academic success.

Cleaned and preprocessed large datasets containing student information using SQL and Python libraries like Pandas to ensure data quality and consistency.
Used machine learning techniques, including regression and clustering, to build predictive models that identify patterns and trends in student performance data.
Implemented A/B testing methodologies to assess the effectiveness of different model configurations and fine-tune algorithms for optimal performance.
Created detailed reports and interactive dashboards using Tableau to visualize key metrics and performance indicators for stakeholders.
Collaborated closely with product and engineering teams to integrate the predictive model into educational platforms and recommend actionable insights to improve student outcomes.

Learning Path Personalization

Extramarks Education Pvt. Ltd

May, 2022 - Aug, 2022 3 months

Collaborated with the salesforce and marketing teams to gather comprehensive student interaction data, including demographic information, learning preferences, and past engagement metrics.
Used SQL queries to extract relevant data from Salesforce databases and preprocessed the data using Python libraries like Pandas to prepare it for analysis.
Analyzed the collected data to identify patterns and trends in student behavior, gaining insights into their learning preferences and needs.
Developed machine learning recommendation algorithms in Python, focusing on collaborative filtering and content-based filtering, to recommend personalized learning paths based on individual student profiles.
Met directly with stakeholders to understand their requirements and gather feedback on the initial prototype of the recommendation system.
Optimized the recommendation system based on feedback from stakeholders and ongoing monitoring of key performance indicators, such as recommendation accuracy and user satisfaction.

Education

E.M.Tech in Cloud Computing
IIT Patna (2027)
PG Diploma in Data Science
Imarticus Learning (2018)
M.Sc. Biotechnology
Birla Institute of Technology, Mesra (2017)
B.Sc. Botany (Hons)
Banaras Hindu University (2014)

Certifications

Certified Machine Learning Specialist
Certified Associate Member of the Institute of Analytics, UK
Institute of Analytics (Dec, 2022)
DataCamp certificates in SQL, Google Data Studio, Python, R, Tableau
Udemy certificates in Advanced Excel, A to Z of Excel, Python, Forecasting, R, Big Data

Interests

Cooking

Learning

Travelling

AI-interview Questions & Answers

Yes. Okay. Hi, and a very good morning, oh, sorry, good evening to all of you. This is Manisha, and I am currently based out of program. I'm currently working with ISDC Global in the role of data analyst. Here I work with the operational team, where we provide database solutions to clients based on their needs and requirements. Apart from that, we are also involved in the training segment, teaching data analytics and business analytics tools to working professionals and to grad and postgrad students. Because my organization is UK based, they have closed the operational segment in India as of now, so I am looking for another role. Prior to this, I was working with Extramarks Education. Extramarks Education is an EdTech organization where my profile was academic mentor. I was handling the complete sales and marketing operations, optimizing their data to enhance sales. We were also recommending customized products, that is, product recommendations based on the specific needs and requirements of the customer. Apart from that, I was handling a team of mentors. Altogether there were 10 members in my team, and I was presenting their complete weekly progress report and the weekly reporting system: how many calls they had done, what the status was, what the update was. We were doing all this analysis on a regular basis, for which we used Python, Tableau, and sometimes SQL for retrieving most of the information. Okay. Prior to this, I was working with the Brilliance Academy in the profile of operation analyst, where I was preparing reports on the students.
So, basically, I was creating the reporting of the complete data related to the organization. I was also involved in understanding and tracking the employees' progress reports on the basis of their attendance, their basic details, their salary, updates, and everything. For the students involved with the organization, I was preparing their basic reports, again on the basis of attendance, the weekly tests, the assignments, the classes they were doing, and all these things. We were presenting that in the form of dashboards and our vulnerable reports, like customized reports, to the parents. Overall, I would say with ISDC, the optimization in the very recent report that we presented has definitely given me a winch of achieving 49% success rate, followed by Extramarks, where I achieved approximately 40 to 50% success rate, and similarly in my previous role with Brilliance Academy too.

What is my approach to using Git for version control? Now, this is something I need to explain. Okay. So, talking about Git: when I use Git for version control, my approach follows a structured workflow to track changes and to collaborate with the different team members, to make sure that the project's integrity is maintained and the project is completely streamlined. The basic steps that I followed for processing my data scripts: I first initialized the repository, so I started by using git init. Then I used git add to stage any changes, anything I wanted to include or update. Then I committed the changes using git commit and created branches, giving each a specific branch name. After all these things, we did the merge and rebase by using git merge. Okay. So that was obviously needed, and you can even go through my GitHub profile link to see all these things. I have not updated a few works on it, but very recently I have done one project, and I followed all these steps thoroughly and properly; that is clearly visible and you can definitely see it. After that, we used git push for the collaboration process, and review and feedback went on on a continuous basis. The final documentation was also done by including the README files. So this is all about the Git version control that I have used in processing my data scripts.

Okay. So, how would you approach fine-tuning a pretrained model on a new dataset using the transfer learning technique with Keras? Okay. This is difficult to answer, as I have not come across this particular step yet in my career journey. But to answer this question, I can say that the first basic step in the fine-tuning approach is selecting the pretrained model. So first, we choose a pretrained model that is very much suitable for the task, which can be done on the large dataset, using something like MobileNet or Inception; this is what I remember right now. Then we can proceed with the next step by loading the pretrained model using Keras. Okay? So we can load its architecture and its weights using Keras, and then we can freeze the layers of the pretrained model. After that, we can add different layers or modify the model. We can remove any layer as per the requirement, and we can set all of this up on the basis of the units or variables that we are using for the particular task, like binary classification or multiclass classification. Then we can compile the model and perform augmentation on the new dataset to understand the training images, or to understand the complete analysis of the report. Then finally, I think we can train the model and evaluate its performance by adjusting the parameters.

Okay. So, explain how checkpointing works in training a deep learning model with PyTorch and when it becomes crucial. Okay. So checkpointing is basically a technique in PyTorch that is used to manage memory usage during the training process, particularly when we are dealing with a large model. This process involves periodically saving intermediate states of the model during training and reloading them if needed in future. How does this work? During the training process, at the predefined intervals that we have set for the model, or after the number of iterations that we have chosen, the current state of the model, including all its parameters, is saved to disk. It means whatever changes we are making, or have made, will be saved to disk. After saving this state, all this information about the model, the memory occupied by the model and the optimizer can be released. So this basically helps in preventing out-of-memory errors, and it also makes it possible to train larger models on GPUs with limited memory. After releasing memory, training can continue from the saved checkpoints: the saved state is loaded again into the model, and training resumes from the last saved iteration. Again, why is it important? Because of memory. Checkpointing is crucial because memory constraints are a big concern, especially when we are training a large model or dealing with a large dataset. Sometimes it happens that we run out of memory, so we need to keep checking and keep track of all these things.
Second, sometimes training sessions take a long time, so we are at risk of losing some information. To prevent that, we need checkpointing.
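For reference, the save-and-resume part of this answer can be sketched as follows. This is a minimal, framework-agnostic illustration using only the Python standard library, not code from the interview. The function name `train`, the toy one-parameter model, and the JSON checkpoint format are all assumptions made for the example. In PyTorch, the saved state would typically be the model's and optimizer's `state_dict()` written with `torch.save`.

```python
import json
import os
import tempfile


def train(total_steps, checkpoint_path, save_every=5, stop_after=None):
    """Toy loop: fit w so that w * 2.0 is close to 6.0 with gradient descent.

    Saves {"step", "w"} every `save_every` steps and resumes from the
    checkpoint file if one already exists. All names here are hypothetical.
    """
    state = {"step": 0, "w": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved step

    x, y, learning_rate = 2.0, 6.0, 0.05
    while state["step"] < total_steps:
        gradient = 2 * (state["w"] * x - y) * x  # d/dw of (w*x - y)^2
        state["w"] -= learning_rate * gradient
        state["step"] += 1

        if state["step"] % save_every == 0:
            # Write to a temporary file first, then rename, so an interruption
            # during the write does not leave a half-written checkpoint.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)

        if stop_after is not None and state["step"] >= stop_after:
            break  # simulate an interruption part-way through training
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    print(train(20, path, stop_after=7))  # interrupted run
    print(train(20, path))                # resumes from the step-5 checkpoint
```

In this sketch the interrupted run loses only the steps taken since the last checkpoint, which is the long-training risk described at the end of the answer.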

Highlight the differences between TensorBoard with TensorFlow and Visdom with PyTorch for model visualization and training. Visdom with PyTorch for model visualization. Difficult question. Definitely, it is. Okay. So, talking about TensorBoard: first, TensorBoard is basically for visualizing machine learning models and for monitoring all the metrics that are used during the training process. Second, Visdom is a very similar visualization library, but it is designed specifically for use with PyTorch. This is the basic difference I can remember right now. Now, talking about the integration part: TensorBoard is very much integrated with TensorFlow, and Visdom is a kind of standalone library for visualization and is not integrated with PyTorch. Then, TensorBoard basically provides a user-friendly interface with a wide range of visualizations, whereas Visdom offers a simple and flexible interface for creating different visualizations. But, yes, sometimes it does require manual configuration. Talking about the compatibility of Visdom and TensorBoard: TensorBoard, as we know, is primarily integrated with TensorFlow, but it can also be used with other frameworks. Okay. Right now, I do not remember the names of those other frameworks, but, yes, it can be used with other frameworks. Whereas Visdom can be used with any DL framework, that is, deep learning framework: PyTorch, TensorFlow, and so on. So these are the basic differences that I remember right now.

What would be your methodology to test and evaluate the robustness of a machine learning model developed using scikit-learn against data drift? Okay. So, to test and evaluate the robustness of a machine learning model developed using sklearn against data drift, I can follow some basic steps. First, I can clearly define what data drift constitutes, what exactly it covers, and in what context we are trying to understand the problem. Then we'll collect the baseline data, which is going to represent a sample of the initial training data that was used while developing the machine learning model. Then we are going to establish a monitoring mechanism: we implement a mechanism that keeps tracking the incoming data, the data that is being uploaded, and can detect any deviations from the baseline distributions. This may also involve statistical methods, such as monitoring feature distributions or monitoring concept drift. After that, we can set a threshold for the acceptable drift in the data distribution. These thresholds can be determined based on domain knowledge or even the historical dataset. And, of course, because we have a continuous flow of data, we need to continuously monitor the new, incoming data. This will help us in comparing the feature distributions.
And if any drift that it detects exceeds the predefined threshold value, it triggers alerts to the relevant stakeholders. Okay. And then we will develop, deploy, and evaluate the model performance.
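The baseline, compare, and threshold steps described in this answer can be sketched as below. This is a minimal illustration under stated assumptions, not the candidate's method. The answer does not name a statistic, so the Population Stability Index (PSI) is swapped in here as one common choice. The function names, the equal-width binning, and the 0.2 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math
import random


def population_stability_index(baseline, incoming, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature.

    Bins are equal-width and derived from the baseline's range. Incoming
    values outside that range are clamped into the edge bins.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features

    def bin_fractions(values):
        counts = [0] * n_bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), n_bins - 1)
            counts[index] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(count / len(values), eps) for count in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(incoming)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def check_drift(baseline, incoming, threshold=0.2):
    """Compare incoming data with the baseline and flag drift above the threshold."""
    psi = population_stability_index(baseline, incoming)
    return {"psi": psi, "drifted": psi > threshold}


if __name__ == "__main__":
    random.seed(0)
    training_sample = [random.gauss(0, 1) for _ in range(5000)]
    shifted_batch = [random.gauss(1.5, 1) for _ in range(5000)]
    print(check_drift(training_sample, shifted_batch))
```

In a monitoring job, `check_drift` would run per feature on each new batch, and a `drifted` result would trigger the stakeholder alert mentioned in the answer.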

Examine the Python code intended to deploy a machine learning model with Docker. Identify the issue in the Dockerfile related to best practice in constructing Docker images for Python applications. So: FROM python:3.8-slim, COPY app, WORKDIR app, RUN pip install --no-cache-dir requirements.txt, EXPOSE 80, CMD python app.py. Okay. So, from the information that has been provided here, there are a few issues in the provided Dockerfile that are related to best practices. The first thing: the pip install command is installing dependencies directly from requirements.txt, as is clearly visible here, without specifying which versions it will work with and which versions are recommended. Okay. Second, there is the use of --no-cache-dir in the pip install, which we can clearly see. This particular command can potentially introduce a security risk, because it is not correct and it bypasses the caching mechanism. So that is obviously going to help improve performance, but is useless. Next, we can see we are exposing port 80 without providing any context or explanation, which is not sufficient here. It is also a very good practice to include comments and documentation explaining these specific ports: why we are exposing that particular port, why we need it, and why we are going to use it. Third, the last line shows the CMD instruction, that is, the command instruction. Using CMD to specify the command to run the application is acceptable, but using ENTRYPOINT might be more appropriate in some cases. So, yes, this is what we have observed here.

Given a Python code block which is designed to test a machine learning model's accuracy, what would you change to follow best coding practice regarding variable names and readability? Okay. So there is this image. import sklearn.metrics as metrics. def test_model: model, X_test, y_test. y_pred. Accuracy rate. Okay. return acc. model_acc. Where is it defined? Okay. Accuracy, it's defined. test_model, train model, test model, test label. There are a few changes that I can make in this particular code. The first thing is I can rename the variable acc to accuracy for more clarity. If someone is looking at the code for the first time, he or she should understand this part. Then, instead of model_acc, I can use model_accuracy to provide a more descriptive name for the variable, again just to make clear that it stores the accuracy result. Third, we can add a docstring to test_model. This is one thing that we can do to provide complete documentation about what exactly this function is about: its purpose, the different arguments, and what the return value is, improving the code's readability and maintainability, and also documenting each and every function.
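The renaming and docstring changes proposed in this answer might look like the sketch below. The original code block shown in the interview is not part of this page, so everything here is a reconstruction under assumptions. The names `evaluate_model_accuracy` and `ThresholdModel` are hypothetical. Accuracy is computed in plain Python so the example runs without scikit-learn; `sklearn.metrics.accuracy_score` returns the same fraction of correct predictions.

```python
from typing import Any, Sequence


def evaluate_model_accuracy(model: Any, test_features: Sequence, test_labels: Sequence) -> float:
    """Return the fraction of test samples the model labels correctly.

    Args:
        model: Any object exposing a ``predict(features)`` method.
        test_features: Held-out input samples.
        test_labels: Ground-truth labels, in the same order as ``test_features``.

    Returns:
        Accuracy as a float between 0.0 and 1.0.
    """
    predicted_labels = model.predict(test_features)
    correct_count = sum(
        1 for predicted, actual in zip(predicted_labels, test_labels) if predicted == actual
    )
    return correct_count / len(test_labels)


class ThresholdModel:
    """Hypothetical stand-in model: predicts 1 when the input is positive."""

    def predict(self, features):
        return [1 if value > 0 else 0 for value in features]


# Three of the four predictions match the labels.
model_accuracy = evaluate_model_accuracy(ThresholdModel(), [-2, -1, 1, 3], [0, 0, 1, 0])
print(model_accuracy)  # 0.75
```

The descriptive names (`predicted_labels`, `model_accuracy`) replace the abbreviations the answer objects to, and the docstring records the purpose, arguments, and return value.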

When designing a machine learning system involving complex event processing, how would you factor in JAX's efficiency for high-performance computation? JAX's efficiency. I do not properly remember this particular answer, so I don't think I would be able to answer this part. I just know that JAX is basically a built-in function in programming, and it is used for performing different kinds of functions. Also, it is immutable: once it is defined, it cannot be changed. Okay. And it can compile and optimize computation, so its execution is faster and it also gets better scalability.

Okay. So, how can... These questions are so tough. Okay. So I remember the answer, but it's difficult for me to analyze and put them together to form a proper answer. Okay. So, how can containerization with Docker enhance the deployment process of machine learning models developed using Python libraries? So the one thing that I can answer here is that Docker containers encapsulate the entire runtime environment. That basically includes Python libraries, dependencies, system configurations, and so on. Okay? Because of all this, the risk of deployment errors due to environment discrepancies is minimized. Okay. Second, it also provides isolation, which allows machine learning models to run in isolated environments. I think this much. And the other factors that I know: yes, scalability is good. Version control is available. There is dependency management; it basically eliminates any need to install dependencies manually. Okay. This is one thing that can be done with Docker. Then, the last thing is that Docker containers are lightweight and consume minimal system resources. That makes them very efficient for deploying machine learning models in resource-constrained or any other cloud environments. This is one thing that can be done. Second, it integrates seamlessly with tools like DevOps, and also with practices that enable automation of the complete deployment process.
