Vetted Talent

Sachin Mishra

Experienced Data Scientist and Mentor with a strong background in Machine Learning, NLP, and Computer Vision. With over 2.5 years of hands-on expertise in developing and implementing cutting-edge solutions, I have successfully led a team of junior Data Scientists and Analysts, providing guidance and mentorship to drive exceptional results. With a proven track record of leveraging data-driven insights to solve complex problems, I bring a unique combination of technical expertise and leadership skills to create impactful solutions. Seeking opportunities to contribute my skills and knowledge in a dynamic and challenging environment.
  • Role

    Data Scientist

  • Years of Experience

    3 years

  • Professional Portfolio

    View here

Skillsets

  • GCP
  • automation
  • AI
  • Streamlit
  • Flask
  • GitHub
  • SAP
  • Azure
  • APIs
  • MLOps
  • LinkedIn
  • Cloud
  • Troubleshooting
  • Tableau
  • Leadership
  • CI/CD
  • UI
  • Windows
  • Random Forest
  • Database management
  • Matplotlib
  • Git
  • Training
  • Docker
  • Python - 3 Years
  • Database
  • Statistics
  • NLP
  • MongoDB
  • Deep Learning
  • Scrum
  • ML
  • R
  • AWS
  • Python Programming
  • C
  • Communication
  • API
  • Power BI
  • Code Review
  • Computer Vision
  • SQL
  • MySQL
  • Agile
  • FastAPI

Vetted For

12 Skills
  • Data Scientist (Remote), AI Screening: 71%
  • Skills assessed: Communication Skills, Jira, Retrieval-Augmented Generation, Computer Vision, Deep Learning, PyTorch, TensorFlow, GitLab, Machine Learning, NLP, NoSQL, Python
  • Score: 64/90

Professional Summary

3 Years
  • Jul, 2022 - Present (3 yr 2 months)

    Data Scientist

    Data Society
  • Jun, 2022 - Present (3 yr 3 months)

    Data Analyst/Scientist Mentor

    Digikull
  • Feb, 2022 - Oct, 2022 (8 months)

    Data Science Intern

    Ineuron.ai
  • Jul, 2021 - Jul, 2022 (1 yr)

    SAP Analyst

    Tata Consultancy Services Ltd

Applications & Tools Known

  • Python
  • GCP
  • AI
  • Git
  • AWS (Amazon Web Services)
  • Docker
  • NLP
  • MongoDB
  • CodeClouds
  • Tableau CRM
  • Azure
  • Flask

Work History

3 Years

Data Scientist

Data Society
Jul, 2022 - Present (3 yr 2 months)
  • Delivered diverse training projects covering Python, NLP-based clustering, Computer Vision, and web scraping, completing all projects on time; recognized with an Efficient Employee Award for consistently meeting project goals.
  • Wrote production-grade code for a Convolutional Neural Network (CNN) using a transfer-learning approach on pretrained VGG-16 and MobileNet models to classify fruit images.
  • Worked on an NLP project performing sentiment analysis on company policy documents.
  • Created a Python package called assessment-creator to automate tasks within the organization, reducing manual working hours.
  • Built Tableau dashboards for the North Carolina government, using Tableau Prep flow builder to create the data flows.

Data Analyst/Scientist Mentor

Digikull
Jun, 2022 - Present (3 yr 3 months)
  • Facilitated learning and growth as a mentor and instructor, delivering engaging lessons and practical examples on Python programming, machine learning, statistics, and Tableau.
  • Developed and implemented a comprehensive curriculum for Python programming, machine learning, statistics, and Tableau, catering to students with diverse backgrounds and skill levels.
  • Mentored and coached junior data analysts, providing guidance on best practices, troubleshooting techniques, and code review to support their professional growth.
  • Guided students through their learning journey with individualized support and feedback, helping them grasp complex concepts and apply them effectively.
  • Designed and conducted hands-on coding exercises, projects, and assessments to evaluate students' understanding and proficiency.
  • Introduced and implemented MLOps methodology in the organization for the first time, using MLflow.

Data Science Intern

Ineuron.ai
Feb, 2022 - Oct, 2022 (8 months)
  • Completed general training in MySQL, Python, statistics, Tableau, and machine learning.
  • Cleaned and formatted Big Mart sales data (over 8,524 rows and 12 columns) to make it ready for analysis.
  • Developed interactive dashboards to visualize Key Performance Indicators (KPIs) and provided business recommendations.
  • Built and compared machine learning models such as Linear Regression, Lasso Regression, SVM, and Random Forest to predict sales across the different Big Mart stores.

SAP Analyst

Tata Consultancy Services Ltd
Jul, 2021 - Jul, 2022 (1 yr)
  • Created and maintained client data day to day in the SAP HANA database across the Production, Development, and Quality systems for smooth business function.
  • Prepared a BI dashboard in Tableau for tracking monthly incidents, tasks, and change requests; this improved the tracking of tasks assigned to different teams and reduced SLA time by 30%.
  • Worked closely with the engineering and business teams using Scrum/Agile methodology.
  • Created SAP BI objects, provided necessary roles and authorizations to clients, and monitored process chains.

Achievements

  • Accomplished diverse training projects encompassing Python, NLP-based clustering, Computer Vision, and web scraping, delivering all projects on time.
  • Recognized with an Efficient Employee Award for consistently meeting project goals.

Major Projects

13 Projects

Sports Celebrity and Data Scientist

Image Classification

Credit Score Classification

Pet-Image Classification using CNN

Python Pypi Package

NLP Emotion Detection

Deployed

Atliq Hardware Sales/Pro Dashboard

Aadhar Card Masking & Information Retrieval

NLP Food Order App

NLP-Text Summarization Using Pegasus Model

Content Based Movie Recommender Engine


Education

  • BACHELOR OF ENGINEERING, Electronics and Telecommunication

    Thakur College of Engineering & Technology, Mumbai, Maharashtra (2021)

AI Interview Questions & Answers

Yeah, sure, I can give my background. My name is Sachin Mishra, and I have been working as a Data Scientist at Data Society for the last two years. Before that, I worked at TCS as an SAP analyst. Overall, I have three years of experience in the data field, and I'm passionate about solving data problems. So that's all about me.

How would you set up an experiment to compare the effectiveness of different embedding techniques for NLP? Well, we have different kinds of embedding techniques in NLP: TF-IDF, bag of words, and modern techniques like OpenAI embeddings. To set up the experiment, I would start with a very basic comparison, and it would also depend on the problem statement. For example, if I have to solve a basic classification problem, say some kind of sentiment analysis, and my task is just that classification, then I can create the embeddings with TF-IDF or bag of words, and then see how much accuracy and what F1 score I'm getting; based on that I can compare the results. But say the problem statement involves a bunch of documents I want to classify: there, the number of words and the semantic understanding required are not well served by TF-IDF or bag-of-words techniques, so I can go ahead and use state-of-the-art embeddings like OpenAI embeddings, or maybe some open-source model embeddings. In that way I would set up an experiment and compare the effectiveness of the different embedding techniques.
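The basic comparison described here can be sketched in plain Python. The two tiny documents, the vocabulary handling, and the smoothed IDF variant are all illustrative choices, not the setup from any real project; in practice a library vectorizer would be used instead of hand-rolled functions.

```python
# Illustrative sketch: bag-of-words vs TF-IDF vectors for two toy documents.
import math
from collections import Counter

docs = [
    "the movie was great great fun",
    "the movie was a boring movie",
]
vocab = sorted({w for d in docs for w in d.split()})

def bow(doc):
    """Raw term counts over the shared vocabulary."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

def tfidf(doc):
    """Term counts reweighted by inverse document frequency."""
    counts = Counter(doc.split())
    n = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d.split())
        idf = math.log(n / df) + 1  # smoothed variant (one common choice)
        vec.append(counts[w] * idf)
    return vec

print(bow(docs[0]))
print(tfidf(docs[0]))
```

Feeding both representations to the same downstream classifier and comparing accuracy/F1, as described above, is what makes the experiment a fair side-by-side comparison.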

Explain your approach to implementing a hybrid recommendation system combining collaborative filtering and content-based methods. Okay, yeah. I have worked on a recommendation system; one project I worked on was about suggesting movies, which Netflix also does, and I was trying to build the same kind of replica. I used the collaborative filtering method, not the content-based method. In collaborative filtering, what we usually do is group similar kinds of users, and then we recommend content to a user if that user belongs to a particular cluster. For content-based methods, we need the user's own data, so when we initially launch this kind of recommendation system we don't have any data and have to wait until we collect some. Once we have the data, then based on what the user likes and what kind of taste the user has, say emotional movies or drama movies or whatever genre, we can provide recommendations based on the content and not only collaborative filtering. So by combining collaborative and content-based filtering, we can achieve a very good hybrid recommendation system that can work in a production system.
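One common way to combine the two signals mentioned here is a weighted blend of the scores each method produces. The scores, movie titles, and the 0.6/0.4 weighting below are all made up for the sketch; real systems would learn the weight or use more sophisticated fusion.

```python
# Hypothetical per-movie scores from each method, already normalized to [0, 1].
collab_scores = {"Inception": 0.9, "Titanic": 0.4, "Up": 0.7}
content_scores = {"Inception": 0.6, "Titanic": 0.8, "Up": 0.5}

alpha = 0.6  # weight on collaborative filtering (illustrative choice)
hybrid = {m: alpha * collab_scores[m] + (1 - alpha) * content_scores[m]
          for m in collab_scores}

# Rank movies by the blended score, highest first.
ranked = sorted(hybrid, key=hybrid.get, reverse=True)
print(ranked)  # ['Inception', 'Up', 'Titanic']
```

The blend also addresses the cold-start issue described above: with no interaction history, alpha can be set low so content similarity dominates, then raised as collaborative data accumulates.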

How do convolutional neural networks handle image data differently from fully connected neural networks? Yeah, so CNNs are specialized neural networks designed for images and videos, though they mostly work with image data. The way they differ from a normal, fully connected neural network is that we can feed images directly into a CNN, and it has different kinds of filters with which it identifies patterns within the images. For a fully connected network, on the other hand, we have to convert the image into pixels and do a flatten operation before passing it in, and there is no concept of filters. So that's how they are different.
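The contrast can be shown with a tiny pure-Python example: a convolution slides one small shared filter across the image, while a fully connected layer first flattens the image into a single long vector. The 3x3 image and 2x2 kernel are toy values for illustration only.

```python
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # a 2x2 edge-like filter (illustrative)

def conv2d(img, k):
    """Valid (no-padding) 2D convolution with a single shared kernel."""
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(img[i + a][j + b] * k[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

print(conv2d(image, kernel))  # local patterns, 4 shared weights total

# A fully connected layer instead sees one flat 9-element vector,
# with a separate weight per input pixel.
flat = [p for row in image for p in row]
print(len(flat), "inputs to a fully connected layer")
```

The same 4 kernel weights are reused at every position, which is why CNNs need far fewer parameters than a fully connected layer over the raw pixels.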

What techniques would you use to handle an imbalanced dataset in a supervised learning context? Yeah, there are several techniques to handle an imbalanced dataset. SMOTE is one technique we can use. Then we have techniques like downsampling and upsampling: if we have, say, two classes, 1 and 0, that we want to classify, and the majority class is 1, then we can either downsample the 1s or upsample the 0s. Those are some techniques for handling an imbalanced dataset in supervised learning.
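The upsampling idea can be sketched with plain random duplication; SMOTE itself (which synthesizes new minority samples by interpolation) lives in libraries like imbalanced-learn, so simple sampling with replacement is shown here as the most basic baseline. The toy dataset is made up.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy imbalanced dataset of (features, label) rows; class 1 is the majority.
data = [([i], 1) for i in range(8)] + [([i], 0) for i in range(2)]

minority = [row for row in data if row[1] == 0]
majority = [row for row in data if row[1] == 1]

# Upsample the minority class by sampling with replacement until balanced.
upsampled = minority + [random.choice(minority)
                        for _ in range(len(majority) - len(minority))]
balanced = majority + upsampled

print(sum(1 for _, y in balanced if y == 0), "vs",
      sum(1 for _, y in balanced if y == 1))  # 8 vs 8
```

Downsampling is the mirror image: randomly drop majority rows until the class counts match, trading data for balance.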

What strategy would you use to streamline a deep learning model to run efficiently on a mobile device? Okay. Honestly, I have never worked on machine learning on edge devices; that is not something I have seen in my three years of experience. But I have worked on a mini-project in which I had to build a deep learning model that could take a picture of plant leaves and classify whether the leaves had some kind of problem or were okay. I used a TensorFlow Lite (TF Lite) model, created an Android app, and the model was working fine. So TF Lite is one framework we can use to deploy our models on mobile devices; that's what I used in my mini-project.

Can you explain what the following Python function is intended to do and identify any potential error that would prevent it from functioning correctly? Yeah. Looking at this function, it basically helps us calculate precision, and the formula is given by true positives divided by true positives plus false positives. In the try block we call calculate_precision, passing the true positive count as 42 and the false positive count as 0, and then we print the precision result. In this case we get a precision of 1, because the false positive count is 0 and everything is true positives. The ZeroDivisionError handled in the except block can only occur if both the true positive and the false positive counts are 0; in that case we obviously get a division-by-zero error, control goes to the except block, and the exception message about calculating precision is printed. So that's the only potential error that would prevent it from functioning correctly: both counts being 0.
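The exact snippet was not captured in the transcript, only read aloud, so the code below is a reconstruction from that description; the function and variable names are inferred, not taken from the original.

```python
def calculate_precision(true_positive, false_positive):
    """Precision = TP / (TP + FP); raises ZeroDivisionError if both are 0."""
    return true_positive / (true_positive + false_positive)

try:
    result = calculate_precision(42, 0)  # 42 / (42 + 0) -> 1.0
    print("precision:", result)
except ZeroDivisionError:
    print("error calculating precision: division by zero")

# The failure mode described in the answer: both counts zero.
try:
    calculate_precision(0, 0)
except ZeroDivisionError:
    print("raised as expected when TP == FP == 0")
```

As the answer notes, the (42, 0) call succeeds with precision 1.0; only the degenerate all-zero case trips the except branch.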

In this Python code snippet, the goal is to sort a dictionary by its values. Summarize the code and explain any issues that might arise with the current implementation. Okay. We have import operator; d = {'apple': 50, 'banana': 30, 'cherry': 20}; sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True); print(sorted_d). So the goal is to sort the dictionary by its values. If we wanted ascending order we would expect cherry first, then banana, then apple, because the values are 20, 30, and 50. sorted(d.items()) will sort the pairs, with the key given by operator.itemgetter(1); I'm not able to fully recall itemgetter(1) off the top of my head, and reverse=True will reverse the order, giving the dictionary in descending order of its values. Overall the code looks good and can serve the purpose of sorting the dictionary by its values, but since I'm not sure about itemgetter(1), I would have to run it in a code environment before I can say whether it can cause any kind of problem.
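Running the reconstructed snippet resolves the uncertainty voiced in the answer: operator.itemgetter(1) returns a callable that picks index 1 from each (key, value) pair, i.e. the value, so it is the right sort key here (and any truthy value passed to reverse behaves like True).

```python
import operator

d = {"apple": 50, "banana": 30, "cherry": 20}
# itemgetter(1) selects each pair's value as the sort key.
sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_d)  # [('apple', 50), ('banana', 30), ('cherry', 20)]
```

One issue worth flagging: sorted() returns a list of tuples, not a dict, so if a dictionary is needed downstream the result must be wrapped in dict(sorted_d).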

What method would you use to scale feature extraction for millions of images efficiently in a distributed computing environment? Yeah, I can go and use PySpark. I would write a Python function that extracts features from the images, and then in the distributed computing environment the images can be continuously streamed and fed in. The Python code would extract the features, and then I could convert them into a pandas DataFrame or maybe a Spark DataFrame, whatever output format is required. That's what I can think of right now.
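The map-a-function-over-images pattern described here can be sketched with the standard library; in a real cluster the same extract_features function would be applied via a Spark map over an RDD or DataFrame, but a thread pool shows the shape on one machine. The "images" are stand-in lists of pixel values, and extract_features is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(image):
    """Placeholder features: mean and max pixel value of one image."""
    return (sum(image) / len(image), max(image))

# Four tiny fake "images", each a flat list of pixel intensities.
images = [[i, i + 1, i + 2] for i in range(0, 12, 3)]

# Apply the extractor across workers, analogous to a distributed map.
with ThreadPoolExecutor(max_workers=4) as pool:
    features = list(pool.map(extract_features, images))

print(features)  # [(1.0, 2), (4.0, 5), (7.0, 8), (10.0, 11)]
```

In PySpark the equivalent would be something like rdd.map(extract_features), with the resulting rows collected into a Spark DataFrame rather than a local list.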

What are your preferred tools for automating the deployment of machine learning models and ensuring version control? Yeah, I have several tools. For version control I would use GitHub, or whatever code versioning system the company is using; in my current organization we are using Bitbucket, so that is what I use for version control of the code. For automating the deployment of a machine learning model, we can automate it via a Jenkins pipeline, or through whatever cloud tooling we are on, like Google Cloud Platform. Another tool is MLflow, which we can use for automating the deployment of a machine learning model; that is one of the tools we are using right now in my current company, so that is one we can use for sure. And we can also go for Terraform: once we have the Dockerfile and everything ready, we can use Terraform to automate the entire process. Those are the tools I would definitely be using.

How would I approach data versioning when working with large datasets in machine learning experiments? So, just like code versioning and model versioning, we should go for dataset versioning as well. For data versioning I would use DagsHub together with MLflow; that is my go-to tooling. Using that, for whatever model I'm building and whatever experiments I've done, I ensure that a given dataset version belongs to a given experiment. If the dataset changes and my model changes, then using DagsHub and MLflow experiment tracking I tie, say, dataset version 2 to model version 2. In that way I make sure I have the proper data version as well as the proper model version.