Vetted Talent

Manisha

A dynamic and results-oriented Data Analyst with a proven track record of leveraging advanced analytics and machine learning to drive transformative business outcomes, I am eager to bring my data science and analytics expertise to your team, delivering actionable insights and driving innovation to propel company growth.

  • Role: Data Scientist
  • Years of Experience: 5 years
  • Professional Portfolio: View here

Skillsets

  • Data Analysis - 5 Years
  • MySQL - 5 Years
  • Data Manipulation - 3 Years
  • A/B testing - 1.5 Years
  • Deep Learning - 1.5 Years
  • CRM - 2 Years
  • Data Visualization - 4.5 Years
  • Deployment - 3 Years
  • Statistical analysis - 4 Years
  • SPSS - 1 Year

Vetted For

10 Skills

Data Scientist - AI Screening
  • Result: 69%
  • Skills assessed: Exploratory data analysis, AWS (SageMaker), JAX, NumPy, PyTorch, Scikit-learn, TensorFlow, AWS, Git, Python
  • Score: 62/90

Professional Summary

5 Years
  • Sep, 2022 - Apr, 2024 (1 yr 7 months)

    Data Analyst

    ISDC Global
  • Dec, 2020 - Sep, 2022 (1 yr 9 months)

    Academic Mentor

    Extramarks Education PVT LTD
  • Aug, 2018 - Aug, 2020 (2 yr)

    Operational Associate

    Brilliance Academy

Applications & Tools Known

  • Python
  • MySQL
  • Tableau CRM
  • Microsoft Excel
  • SAS
  • Git
  • Jira
  • Visual Studio Code
  • SPSS
  • Pandas
  • BeautifulSoup
  • EDA
  • Matplotlib
  • Seaborn
  • Plotly
  • Microsoft Power BI
  • Salesforce
  • Zoho
  • SQL
  • Apache Hive
  • Microsoft Teams
  • Zoom
  • Canva
  • Zoho People
  • LinkedIn
  • Midjourney
  • Salesforce Einstein Analytics
  • Kubeflow

Work History

5 Years

Data Analyst

ISDC Global
Sep, 2022 - Apr, 2024 (1 yr 7 months)
    • Spearheaded the development of ML data products, driving a 49% increase in customer engagement through targeted product recommendations.
    • Led cross-functional teams to implement advanced analytics solutions, significantly improving operational efficiency and cost savings.
    • Applied deep learning and statistical analysis techniques to derive actionable insights, contributing to revenue growth and business optimization.
    • Maintained comprehensive documentation of experiments and results, facilitating seamless knowledge transfer and team collaboration.
    • Led customer segmentation and churn prediction initiatives, driving improved customer retention rates and overall satisfaction.

Academic Mentor

Extramarks Education PVT LTD
Dec, 2020 - Sep, 2022 (1 yr 9 months)
    • Optimized sales and marketing strategies through in-depth Salesforce data analysis, leading to a 30% improvement in lead conversion rates.
    • Developed custom dashboards and reports through Salesforce using Tableau, empowering stakeholders with actionable insights for informed decision-making.
    • Collaborated cross-functionally to deliver data-driven solutions that drove business growth and enhanced customer satisfaction.
    • Ensured data accuracy and integrity within the Salesforce platform, enabling data-driven decision-making at all levels of the organization.
    • Developed a product recommendation engine, driving increased sales through personalized recommendations based on individual preferences. Monitored algorithm performance and conducted A/B testing, optimizing recommendations and maximizing customer satisfaction.
    • Forecasted sales volumes and predicted demand fluctuations, enabling proactive inventory management and strategic pricing decisions. Integrated A/B testing into sales forecasting models, identifying optimal pricing strategies to drive revenue growth and market share.

Operational Associate

Brilliance Academy
Aug, 2018 - Aug, 2020 (2 yr)
    • Identified areas for process improvement through data analysis, resulting in a 12% increase in operational efficiency and cost savings.
    • Designed and maintained dashboards and visualizations, providing business stakeholders with actionable insights to drive strategic initiatives.
    • Transformed raw data into valuable insights, enabling informed decision-making and driving continuous improvement across operational processes.

Achievements

  • Developed ML data products increasing customer engagement by 49%
  • Improved lead conversion rates by 30% through Salesforce data analysis
  • Facilitated operational efficiency by 12% through data-driven process improvements

Major Projects

5 Projects

Macy's Inc CIM Simplification

Aug, 2023 - Jan, 2024 (5 months)
    • Orchestrated the migration of 120,783 data points to a cloud environment, achieving enhanced scalability and performance.
    • Implemented A/B testing methodologies to assess cloud architecture configurations, resulting in optimized infrastructure and cost efficiency.

    Tools: Google Cloud Platform (GCP), BigQuery, Google Cloud Storage (GCS), Cloud SQL, Airflow, Oracle, Data Definition Language (DDL).

Medical Appointment Data Analysis

ISDC Global
May, 2023 - Jun, 2023 (1 month)
    • Identified factors contributing to patient no-shows in medical appointments, enabling healthcare providers to improve patient care and resource utilization.
    • Presented actionable insights derived from A/B testing and statistical analysis, driving operational efficiency and cost savings for healthcare organizations.

Customer Satisfaction Analysis

ISDC Global
Nov, 2022 - May, 2023 (6 months)
    • Extracted customer satisfaction survey data from the company's database using SQL queries, focusing on relevant metrics such as product/service ratings, feedback comments, and demographic information.
    • Cleaned and preprocessed the extracted data using SPSS to ensure data quality and consistency, handling missing values and outliers as necessary.
    • Utilized SPSS for statistical analysis to uncover patterns and relationships within the customer satisfaction data.
    • Generated descriptive statistics and visualizations to illustrate key insights and trends in customer satisfaction levels.
    • Created interactive dashboards and reports using the data visualization tool Power BI to present findings clearly and intuitively to stakeholders.
    • Collaborated with cross-functional teams to develop strategies and initiatives based on the insights gained from the analysis, such as product/service improvements, customer engagement initiatives, and targeted marketing campaigns.

Student Performance Prediction System

Extramarks Education Pvt. Ltd
Aug, 2021 - Aug, 2022 (1 yr)

    Successfully developed a student performance prediction system that provides actionable insights to educators and administrators, enabling them to identify at-risk students early and implement targeted interventions to support their academic success.

    • Cleaned and preprocessed large datasets containing student information using SQL and Python libraries like Pandas to ensure data quality and consistency.
    • Utilized machine learning techniques, including regression and clustering, to build predictive models that identify patterns and trends in student performance data.
    • Implemented A/B testing methodologies to assess the effectiveness of different model configurations and fine-tune algorithms for optimal performance.
    • Created detailed reports and interactive dashboards using Tableau to visualize key metrics and performance indicators for stakeholders.
    • Collaborated closely with product and engineering teams to integrate the predictive model into educational platforms and recommend actionable insights to improve student outcomes.

Learning Path Personalization

Extramarks Education Pvt. Ltd
May, 2022 - Aug, 2022 (3 months)
    • Collaborated with the Salesforce and marketing teams to gather comprehensive student interaction data, including demographic information, learning preferences, and past engagement metrics.
    • Utilized SQL queries to extract relevant data from Salesforce databases and preprocessed it using Python libraries like Pandas to prepare it for analysis.
    • Analyzed the collected data to identify patterns and trends in student behavior, gaining insights into their learning preferences and needs.
    • Developed machine learning recommendation algorithms in Python, focusing on techniques such as collaborative filtering and content-based filtering, to recommend personalized learning paths based on individual student profiles.
    • Collaborated with stakeholders in direct meetings to understand their requirements and gather feedback on the initial prototype of the recommendation system.
    • Optimized the recommendation system based on stakeholder feedback and ongoing monitoring of key performance indicators, such as recommendation accuracy and user satisfaction.

Education

  • Diploma in Data Science

    Imarticus (2020)
  • M.Sc Biotechnology

    Birla Institute of Technology, Mesra, Ranchi (2017)
  • B.Sc Botany(Hons)

    Banaras Hindu University, Varanasi, UP (2014)

Certifications

  • Certified Machine Learning Specialist

  • Certified Associate Member of the Institute of Analytics, UK

    Institute of Analytics (Dec, 2022)
  • DataCamp certificates in SQL, Google Data Studio, Python, R, Tableau

  • Udemy certificates in Advanced Excel, A to Z of Excel, Python, Forecasting, R, Big Data

Interests

  • Cooking
  • Learning
  • Travelling

AI-Interview Questions & Answers

    Hi, good evening everyone. This is Manisha. I am currently working with ISDC Global in the role of Data Analyst, where I deal with the operations team to provide database solutions to clients based on their needs and requirements. We are also involved in the training segment, teaching working professionals and graduate and postgraduate students data analytics and business analytics tools. My organization is UK-based and has now closed this operational segment in India, so I am looking for another role. Prior to this, I worked with Extramarks Education, an EdTech organization, in the role of Academic Mentor. I handled the complete sales and marketing operations, optimizing their data to enhance sales, and we also recommended customized products based on the specific needs and requirements of each customer. In addition, I managed a team of ten mentors, preparing their complete weekly progress reports: how many calls they had made, their statuses, and updates. We did this analysis on a regular basis using Python, Tableau, and sometimes SQL for retrieving most of the information.
    Before that, I worked with Brilliance Academy in the role of Operations Analyst, where I prepared reports covering all the data related to the organization. I was involved in understanding and tracking employees' progress reports based on their attendance, basic details, salary, and updates, and I also prepared basic reports for the students enrolled with the organization, based on attendance, weekly tests, assignments, and classes. We presented these as dashboards and customized reports for the parents. Overall, I would say the most recent report we presented at ISDC gave me a real sense of achievement with a 49% success rate, followed by Extramarks, where I achieved approximately 40 to 50%, and similarly in my previous role with Brilliance Academy.

    What is my approach to using Git for version control? When I use Git for version control, my approach follows a structured workflow to track changes and collaborate with different team members, making sure the project's integrity is maintained and the product is completely streamlined. The basic steps I followed for processing my data scripts: I first initialize the repository using git init. Then I use git add to stage any changes or anything I want to include or update. Then I commit the changes using git commit, and create branches, giving each a specific branch name. After that, we merge and rebase using git merge. You can go through my GitHub profile link to see all of this; I have not uploaded some of my work there, but I recently completed one project in which all these steps were followed thoroughly and properly, and that is clearly visible. After that, we used git push for the collaboration process, with review and feedback happening on a continuous basis, and the final documentation was done by including README files. That is how I have used Git version control when processing my data scripts.

    How would you approach fine-tuning a pretrained model on a new dataset using the transfer learning technique with Keras? This is difficult to answer; I have not come across this particular task yet in my career journey. But to answer the question: the first basic step in the fine-tuning approach is selecting the pretrained model. We choose a pretrained model that is well suited to the task, one trained on a large dataset, using architectures like MobileNet or Inception (that is what I remember right now). We then proceed by loading this pretrained model, its architecture and its weights, using Keras. Then we freeze the different layers of the pretrained model. After that, we can add new layers, modify them, or remove any layer as required, and we decide this based on the units or variables we are using for the particular task, such as binary classification or multiclass classification. Then we compile the model, perform augmentation on the new dataset to prepare the training images, and finally, I think, we train the model and evaluate its performance by adjusting the parameters.
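The freeze-and-extend steps described above can be sketched as follows. This is a minimal illustration assuming TensorFlow 2.x; the input shape, the single-unit sigmoid head, and `weights=None` (to avoid a download here; in practice one would use `weights="imagenet"`) are illustrative choices, not details from the interview.

```python
# Sketch of transfer-learning fine-tuning with Keras, as described above.
import tensorflow as tf

# 1. Select a pretrained base architecture (trained on a large dataset).
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)

# 2. Freeze the pretrained layers so their weights are not updated.
base.trainable = False

# 3. Add new task-specific layers (binary classification head here).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 4. Compile; on real data, train the head, then optionally unfreeze a few
#    top layers of the base and re-train with a much lower learning rate.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

After training the new head, a common follow-up is to unfreeze a few top layers of `base` and continue training at a much lower learning rate, which is the fine-tuning step proper.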

    Explain how checkpointing works when training deep learning models with PyTorch, and when it becomes crucial. Checkpointing is a technique in PyTorch that is used to manage memory usage during the training process, particularly when we are dealing with large models. The process involves periodically saving intermediate states of the model during training and reloading them if needed in the future. How does it work? During training, at the predefined intervals or number of iterations we have set, the current state of the model, including all its parameters, is saved to disk; whatever changes we have made will be saved to disk. After saving this state, the memory occupied by the model and the optimizer can be released, which helps prevent out-of-memory errors and makes it possible to train larger models on GPUs with limited memory. After releasing the memory, training can continue from the saved checkpoint: the saved state is reloaded into the model and training resumes from the last saved iteration.
    Why is it important? First, because memory constraints are a big concern, especially when training large models or working with large datasets; it sometimes happens that we run out of memory, so we need to keep checking these things. Second, training sessions sometimes take a long duration, so we are at risk of losing information; to prevent that, we need checkpointing.
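The save-and-resume cycle described above can be sketched as follows. To keep the example self-contained it uses a plain dict and pickle as a stand-in for PyTorch's torch.save/torch.load; in real PyTorch code the saved states would come from model.state_dict() and optimizer.state_dict().

```python
import os
import pickle
import tempfile

def save_checkpoint(path, iteration, model_state, optimizer_state):
    # In PyTorch this would be torch.save({...}, path).
    with open(path, "wb") as f:
        pickle.dump({"iteration": iteration,
                     "model_state": model_state,
                     "optimizer_state": optimizer_state}, f)

def load_checkpoint(path):
    # In PyTorch this would be torch.load(path).
    with open(path, "rb") as f:
        return pickle.load(f)

# Periodically save during training, then resume from the saved iteration.
path = os.path.join(tempfile.gettempdir(), "ckpt.pkl")
save_checkpoint(path, iteration=500,
                model_state={"w": [0.1, 0.2]},      # illustrative weights
                optimizer_state={"lr": 1e-3})
ckpt = load_checkpoint(path)          # reload the saved state
resume_from = ckpt["iteration"] + 1   # training resumes after iteration 500
```

Once the checkpoint is written, the in-memory model and optimizer can be released and rebuilt later from the saved state, which is the memory-relief behavior described in the answer.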

    Highlight the differences between TensorBoard with TensorFlow and Visdom with PyTorch for model visualization and training. A difficult question, definitely. First, TensorBoard is basically for visualizing machine learning models and for monitoring all the metrics being used during the training process. Second, Visdom is a very similar visualization library, but it is designed specifically for use with PyTorch; that is the basic difference I can remember right now. On the integration side, TensorBoard is tightly integrated with TensorFlow, whereas Visdom is a kind of standalone library for visualization and is not integrated into PyTorch itself. TensorBoard provides a user-friendly interface with a wide range of visualizations, whereas Visdom offers a simpler, more flexible interface for creating different visualizations, though it does sometimes require manual configuration. On compatibility: TensorBoard, as we know, is primarily integrated with TensorFlow but can also be used with other frameworks (right now I do not remember their names), whereas Visdom can be used with any deep learning framework, including PyTorch and TensorFlow. These are the basic differences I remember right now.

    What would be your methodology to test and evaluate the robustness of a machine learning model developed using scikit-learn against data drift? I would follow these basic steps. First, clearly define what constitutes data drift in the given context, so that we understand the problem. Then collect baseline data representing a sample of the initial training data that was used while developing the machine learning model. Then establish a monitoring mechanism: implement a mechanism that keeps tracking the incoming data and can detect any deviations from the baseline distributions. This may involve statistical methods such as comparing feature distributions or monitoring for concept drift. After that, set thresholds for acceptable drift in the data distribution; these thresholds can be determined from domain knowledge or from historical data. And because there is a continuous flow of data, we need to continuously monitor the newly incoming data, which helps us compare the feature distributions.
    If any drift exceeds the predefined threshold, an alert is triggered to the relevant stakeholders. We then redevelop, redeploy, and re-evaluate the model's performance.
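The monitoring step above, comparing incoming feature distributions against the baseline, can be sketched with a two-sample Kolmogorov-Smirnov test from SciPy on a single feature. The p-value threshold and the synthetic data are illustrative assumptions, not details from the answer.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(baseline, incoming, p_threshold=0.05):
    """Flag drift when the KS test rejects 'same distribution'."""
    statistic, p_value = ks_2samp(baseline, incoming)
    return bool(p_value < p_threshold)  # True -> trigger an alert

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=2000)  # initial training sample
shifted = rng.normal(loc=1.0, scale=1.0, size=2000)   # mean has drifted

print(detect_drift(baseline, baseline))  # False: identical samples
print(detect_drift(baseline, shifted))   # True: distribution shifted
```

In a production monitor this check would run per feature on each batch of incoming data, with thresholds tuned from domain knowledge or historical data as described above.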

    Examine the Python code intended to deploy a machine learning model with Docker, and identify the issues in the Dockerfile related to best practices in constructing Docker images for Python applications. The Dockerfile reads roughly: FROM python:3.8-slim; COPY . /app; WORKDIR /app; RUN pip install --no-cache-dir -r requirements.txt; EXPOSE 80; CMD python app.py. From the information provided, there are a few issues in this Dockerfile relative to best practices. First, the pip install command installs dependencies directly from requirements.txt, as is clearly visible, without specifying which versions it will work with or recommends. Second, there is the use of --no-cache-dir in the pip install; in my view this bypasses the caching mechanism, which helps performance but can potentially be a concern. Third, port 80 is exposed without any context or explanation, which is not sufficient here; it is good practice to include comments and documentation explaining which specific ports are exposed, why we need them, and how they will be used. Finally, the last line uses CMD to specify the command to run the application, which is acceptable, but using ENTRYPOINT might be more appropriate in some cases. That is what I have observed here.
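One possible cleaned-up version of the dictated Dockerfile, addressing the documentation and CMD points raised above. The file names (app.py, requirements.txt) are as spoken in the answer; the dependency-first layer ordering is a common best-practice addition, not something stated there.

```dockerfile
FROM python:3.8-slim

WORKDIR /app

# Copy the dependency list first so the install layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Document why the port is exposed: the app serves HTTP on port 80.
EXPOSE 80

# Exec form avoids a wrapping shell; ENTRYPOINT could be used instead
# if the image should always run this one application.
CMD ["python", "app.py"]
```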

    Given a Python code block designed to test a machine learning model's accuracy, what would you change to follow best coding practice regarding variable names and readability? The image shows roughly: import sklearn.metrics as metrics; define a test_model function taking the model, x_test, and y_test; compute y_pred and an accuracy value; and return acc. There are a few changes I would make to this code. First, I would rename the variable acc to accuracy for more clarity, so that someone looking at the code for the first time understands it. Second, I would replace model_acc with model_accuracy to give the variable a more descriptive name, again to make clear that it stores the accuracy result. Third, we can add a docstring to test_model to provide complete documentation of what the function is about, its purpose, the different types of arguments, and the return value, improving the code's readability and maintainability.

    When designing a machine learning system involving complex event processing, how would you factor in JAX's efficiency for high-performance computation? I do not properly remember this answer, so I don't think I would be able to answer it fully. I just know that JAX is a library used for performing numerical computation. Its arrays are immutable: once defined, they cannot be changed in place. It can also compile and optimize computation, so its execution is faster, and it gets better scalability.
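The two properties mentioned, compiled execution and immutable arrays, can be illustrated with a short sketch; the function and values are purely illustrative.

```python
import jax
import jax.numpy as jnp

@jax.jit                      # compile the function with XLA for fast execution
def scaled_sum(x):
    return jnp.sum(x * 2.0)

x = jnp.arange(4.0)           # [0., 1., 2., 3.]
total = scaled_sum(x)         # traced and compiled on first call -> 12.0

# JAX arrays are immutable: .at[...].set(...) returns a NEW array
# instead of modifying x in place.
y = x.at[0].set(10.0)
```

The jit-compiled function is optimized once and reused on subsequent calls, which is where the speed and scalability mentioned in the answer come from.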

    How can containerization with Docker enhance the deployment process of machine learning models developed using Python libraries? These questions are tough; I remember the answer, but it is difficult for me to put it together properly. The one thing I can say here is that Docker containers encapsulate the entire runtime environment, which includes the Python libraries, dependencies, system configurations, and so on. Because of this, the risk of deployment errors due to environmental discrepancies is minimized. Second, it provides isolation, allowing machine learning models to run in isolated environments. Also, scalability is good, version control is available, and there is dependency management, which basically eliminates any need to install dependencies manually. Finally, Docker containers are lightweight and consume minimal system resources, which makes them very efficient for deploying machine learning models on constrained resources or in cloud environments, and Docker integrates seamlessly with DevOps tools and practices that enable automation of the complete deployment process.