
Dynamic and results-oriented Data Analyst with a proven track record of leveraging advanced analytics and machine learning to drive transformative business outcomes. I am eager to bring my data science and analytics expertise to your team, delivering actionable insights and driving innovation to propel company growth.
Freelance Data Science & GenAI Projects
Data Science-Account Manager, Paragon Dentsu India
Data Scientist, ISDC Global
Academic Mentor, Extramarks Education
Python
MySQL
Tableau CRM
Microsoft Excel
SAS
Git
Jira
Visual Studio Code
SPSS
Pandas
BeautifulSoup
EDA
Matplotlib
Seaborn
Plotly
Power BI
Salesforce
Zoho
SQL
Apache Hive
Microsoft Teams
Zoom
Canva
Zoho People
Midjourney
Salesforce Einstein Analytics
Kubeflow
Tools: Google Cloud Platform (GCP), BigQuery, Google Cloud Storage (GCS), Cloud SQL, Airflow, Oracle, Data Definition Language (DDL).
Successfully developed a student performance prediction system that provides actionable insights to educators and administrators, enabling them to identify at-risk students early and implement targeted interventions to support their academic success.
Hi, and a very good morning, or good evening, to all of you. This is Manisha, and I am currently based out of this program. I am working with ISDC Global in the role of data analyst, where I work with the operations team to provide database solutions to clients based on their needs and requirements. Apart from that, we are also involved in training working professionals and graduate and postgraduate students in data analytics and business analytics tools. Because my organization is UK-based and has closed its India operations for now, I am looking for a new role. Before this, I worked with Extramarks Education, an EdTech organization, as an academic mentor. I handled sales and marketing operations, optimizing their data to enhance sales, and we recommended customized products based on each customer's specific needs and requirements. I also led a team of mentors and presented their complete weekly progress reports through a weekly reporting system, covering how many calls they had made, their status, and updates. These analyses were done on a regular basis using Python, Tableau, and sometimes SQL to retrieve most of the information. Before that, I worked with Brilliance Academy as an operations analyst, where I created reports on the organization's data. I also tracked employee progress reports, including attendance, basic details, salary, and updates. For the students at the organization, I prepared basic reports based on attendance, weekly tests, assignments, and classes.
We presented these in the form of dashboards and customized reports for the parents. Overall, at ISDC, optimizing the reports has helped me achieve a 49% success rate; at Extramarks I achieved a success rate of approximately 40 to 50%, and the results were similar in my previous role with Brilliance Academy.
What is my approach to using Git for version control? Okay. So, my approach to using Git for version control is structured around tracking changes and collaborating with team members to ensure the project's integrity and a streamlined process. The basic steps I follow for data processing scripts are as follows. First, I initialize the repository with the git init command. Then I use git add to stage new files, updates, or any modifications. Next, I commit the staged changes with git commit. I also create a branch with a descriptive name using git branch, and after that I use git merge (or git rebase) to integrate that branch back into the main line of work. You can view all these steps on my GitHub profile. Although I haven't updated it recently, I completed a project very recently, and all the steps I followed are clearly visible there. After that, I use git push to share the work for collaboration and continuous review and feedback. Finally, I complete the documentation by including README files. This is my approach to using Git for version control when processing data scripts.
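A minimal sketch of that workflow, assuming git is installed and on the PATH. It drives the git command line from Python with subprocess inside a throwaway temporary repository; the file name clean_data.py, the branch name feature/drop-nulls, and the commit identity are hypothetical placeholders. git push is left out because it needs a configured remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    # -c sets a throwaway identity so commits work on a fresh machine.
    cmd = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com", *args]
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def demo_workflow() -> list:
    """Walk through init -> add -> commit -> branch -> merge; return commit subjects."""
    if shutil.which("git") is None:
        raise RuntimeError("git is not installed")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")

        # Stage and commit a first version of a (hypothetical) data script.
        script = repo / "clean_data.py"
        script.write_text("print('load raw data')\n")
        git(repo, "add", "clean_data.py")
        git(repo, "commit", "-m", "Add data cleaning script")
        default_branch = git(repo, "rev-parse", "--abbrev-ref", "HEAD")

        # Develop a change on its own named branch.
        git(repo, "checkout", "-b", "feature/drop-nulls")
        script.write_text(script.read_text() + "print('drop null rows')\n")
        git(repo, "commit", "-am", "Drop rows with missing values")

        # Merge the feature branch back, keeping an explicit merge commit.
        git(repo, "checkout", default_branch)
        git(repo, "merge", "--no-ff", "-m", "Merge feature/drop-nulls", "feature/drop-nulls")

        # `git push` would follow here once a remote is configured (omitted).
        return git(repo, "log", "--pretty=%s").splitlines()
```

Using --no-ff keeps the branch visible in the history as its own merge commit; a rebase-based workflow would instead replay the feature commits onto the default branch for a linear history.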
So, how would you approach fine-tuning a pretrained model on a new dataset using transfer learning with Keras? Okay. This is a difficult question to answer, because I have not come across this particular step yet in my career. But to answer it, the first basic step is selecting a pretrained model that is well suited to the task and was trained on a large dataset, for example an architecture such as MobileNet or Inception. Next, we load this pretrained model, with its architecture and weights, using Keras. Then we freeze the layers of the pretrained model. After that, we can add new layers or modify existing ones, removing any layer as required, and we size the output layer according to the task, such as binary or multiclass classification. Then we compile the model and apply data augmentation to the training images in the new dataset. Finally, we train the model and evaluate its performance, adjusting the parameters as needed.
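A minimal Keras sketch of these steps, assuming TensorFlow 2.x with its bundled Keras. MobileNetV2 stands in for the "MobileNet or Inception" choice; the 160x160 input size, the augmentation layers, the dropout rate, the learning rates, and the helper names build_transfer_model, fine_tune, and _loss_for are illustrative assumptions. The fine_tune function shows the common second stage of unfreezing the top of the base network and retraining with a small learning rate.

```python
from tensorflow import keras


def _loss_for(num_classes):
    # Assumes integer class labels in the multiclass case.
    return "binary_crossentropy" if num_classes == 2 else "sparse_categorical_crossentropy"


def build_transfer_model(num_classes, weights="imagenet", input_shape=(160, 160, 3)):
    """Frozen pretrained MobileNetV2 base plus a new, trainable classification head.

    Returns (model, base) so the base can be partially unfrozen later.
    """
    # Load the pretrained architecture and weights without its original classifier.
    base = keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # freeze every pretrained layer

    inputs = keras.Input(shape=input_shape)
    # Data augmentation: these layers are only active during training.
    x = keras.layers.RandomFlip("horizontal")(inputs)
    x = keras.layers.RandomRotation(0.1)(x)
    # MobileNetV2 expects pixels scaled from [0, 255] to [-1, 1].
    x = keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(x)
    # training=False keeps the base's BatchNorm layers in inference mode.
    x = base(x, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(0.2)(x)

    # Size the output layer for the task: binary vs. multiclass classification.
    if num_classes == 2:
        outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    else:
        outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss=_loss_for(num_classes),
        metrics=["accuracy"],
    )
    return model, base


def fine_tune(model, base, num_classes, n_trainable=20, learning_rate=1e-5):
    """Unfreeze the top `n_trainable` base layers and recompile with a small learning rate."""
    base.trainable = True
    for layer in base.layers[:-n_trainable]:
        layer.trainable = False
    # Recompiling is required for the trainable changes to take effect.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss=_loss_for(num_classes),
        metrics=["accuracy"],
    )
```

Training would then be a call such as model.fit(train_ds, validation_data=val_ds, epochs=10), where train_ds and val_ds are hypothetical tf.data datasets of (image, label) pairs; fine_tune is called only after the new head has been trained, so large early gradients do not disturb the pretrained weights.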
Okay. So, explain how checkpointing works in training deep learning models with PyTorch and when it becomes crucial. Checkpointing in PyTorch means periodically saving the intermediate state of training to disk so that it can be reloaded later. At predefined intervals, for example every few iterations or at the end of each epoch, the current state is saved: the model's parameters (its state_dict), the optimizer's state, and the current epoch or iteration count. If training is interrupted, or we want to continue later, the saved state is reloaded into the model and optimizer, and training resumes from the last saved iteration instead of starting over. A related technique, activation (or gradient) checkpointing with torch.utils.checkpoint, addresses memory usage: instead of keeping every intermediate activation in memory for the backward pass, it recomputes some of them during backpropagation. This trades extra computation for lower memory use, which helps prevent out-of-memory errors and makes it possible to train larger models on GPUs with limited memory. Checkpointing becomes crucial when memory constraints are a big concern, such as when training large models or working with large datasets, and when training sessions run for a long time, because without saved checkpoints a failed run loses all of its progress.
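A minimal sketch of both ideas, assuming PyTorch 1.11 or later (for the use_reentrant argument). The toy model CheckpointedMLP, the checkpoint dictionary keys, and the helpers save_checkpoint and load_checkpoint are illustrative choices, not a fixed PyTorch format; torch.save, torch.load, state_dict, load_state_dict, and torch.utils.checkpoint.checkpoint are the real APIs involved.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Toy model whose hidden blocks use activation checkpointing."""

    def __init__(self, dim: int = 32, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Activations inside `block` are not stored for backward; they are
            # recomputed during backpropagation, trading compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return self.head(x)


def save_checkpoint(path, model, optimizer, epoch, loss):
    """Save everything needed to resume training after `epoch`."""
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
            "loss": loss,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer state; return the next epoch to run."""
    checkpoint_data = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint_data["model_state_dict"])
    optimizer.load_state_dict(checkpoint_data["optimizer_state_dict"])
    return checkpoint_data["epoch"] + 1
```

In a training loop, save_checkpoint would typically run at the end of each epoch, and on restart the loop would begin at the epoch returned by load_checkpoint. Saving the optimizer state matters for optimizers such as Adam, whose running statistics would otherwise restart from zero.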
Compare TensorBoard with TensorFlow and Visdom with PyTorch for model visualization and training. Difficult question, definitely. Okay. So, talking about TensorBoard: firstly, TensorBoard is a toolkit for visualizing machine learning models and monitoring the metrics logged during the training process. Secondly, Visdom is a similar visualization tool from Facebook Research that is commonly used with PyTorch. This is the basic difference I can remember right now. Talking about integration, TensorBoard is tightly integrated with TensorFlow, whereas Visdom is a standalone visualization server: it is not built into PyTorch, so the training code has to send data to it explicitly. TensorBoard provides a user-friendly interface with a wide range of built-in visualizations, whereas Visdom offers a simple and flexible interface for creating custom visualizations, although it sometimes requires manual configuration. Talking about compatibility, TensorBoard is primarily integrated with TensorFlow, but it can also be used with other frameworks; PyTorch, for example, includes torch.utils.tensorboard. Visdom, on the other hand, can be used with any deep learning framework, including PyTorch and TensorFlow, because it accepts generic data such as NumPy arrays. So, these are the basic differences that I remember right now.
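To make the TensorBoard side concrete, here is a minimal sketch of logging a training curve with TensorFlow's own summary API, assuming TensorFlow 2.x. The helper name log_training_curve, the tag "train/loss", and the loss values are hypothetical. The resulting logs are viewed with tensorboard --logdir pointing at the same directory; in PyTorch, the equivalent call is SummaryWriter.add_scalar from torch.utils.tensorboard. Visdom is not shown because it needs a separately running server.

```python
import tensorflow as tf


def log_training_curve(logdir, losses):
    """Write one scalar summary per training step for TensorBoard to display."""
    writer = tf.summary.create_file_writer(logdir)
    # Summaries written inside this block go to `writer`'s event file.
    with writer.as_default():
        for step, loss in enumerate(losses):
            tf.summary.scalar("train/loss", loss, step=step)
    writer.flush()
```

Calling log_training_curve("logs/run1", [0.9, 0.6, 0.4]) would produce an event file under logs/run1, and TensorBoard would plot the three points under the "train/loss" tag.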
How would you test and evaluate the robustness of a machine learning model developed using scikit-learn against data drift? To do this, I would follow a few basic steps. First, I would clearly define what constitutes data drift in this context: which features or relationships matter, and what problem we are trying to understand. Then I would collect baseline data, a representative sample of the initial training data that was used to develop the model. Next, I would implement a monitoring mechanism that keeps tracking the incoming data and detects any deviations from the baseline distributions. This can involve statistical methods that compare feature distributions, as well as monitoring for concept drift, where the relationship between the features and the target changes. After that, I would set thresholds for acceptable drift in the data distribution; these can be determined from domain knowledge or from historical data. Because new data arrives continuously, we need to keep monitoring it and comparing its feature distributions with the baseline. If the detected drift exceeds the predefined threshold, the system triggers an alert to the relevant stakeholders. Finally, based on those findings, we retrain or redevelop the model, redeploy it, and evaluate its performance again.
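A minimal sketch of the monitoring and threshold steps, assuming NumPy and SciPy are available. It uses a two-sample Kolmogorov-Smirnov test per feature, which is one common choice for comparing feature distributions rather than the only one; the function name detect_drift, the feature names, and the p-value threshold of 0.01 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(baseline, incoming, feature_names, p_threshold=0.01):
    """Flag features whose incoming distribution differs from the baseline.

    Runs a two-sample Kolmogorov-Smirnov test per column and returns a dict
    mapping each drifted feature name to its KS statistic (0 = identical
    empirical distributions, 1 = completely separated).
    """
    baseline = np.asarray(baseline)
    incoming = np.asarray(incoming)
    drifted = {}
    for column, name in enumerate(feature_names):
        result = ks_2samp(baseline[:, column], incoming[:, column])
        # A small p-value means the two samples are unlikely to share a distribution.
        if result.pvalue < p_threshold:
            drifted[name] = float(result.statistic)
    return drifted
```

In a monitoring job, detect_drift would run on each new batch of incoming data against the stored training sample, and a non-empty result would trigger the alert. With many features, a stricter threshold (or a multiple-testing correction) helps avoid false alarms.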
Examine the Python code intended to deploy a machine learning model with Docker, and identify the issue in the Dockerfile related to best practices in constructing Docker images for Python applications. The Dockerfile reads: FROM python:3.8-slim, a COPY of the app into /app, WORKDIR /app, RUN pip install --no-cache-dir -r requirements.txt, EXPOSE 80, and CMD "python app.py". This Dockerfile has a few issues related to best practices. First, the pip install command installs dependencies from requirements.txt without specifying versions; unless that file pins exact versions, builds are not reproducible. Also, because the whole application is copied before this step, any code change invalidates Docker's layer cache and forces every dependency to be reinstalled, so it is better to copy requirements.txt and install the dependencies first, then copy the rest of the code. Second, the --no-cache-dir option on pip install is not a security risk: it stops pip from keeping its download cache inside the image, which makes the image smaller, so it is actually a good practice. Third, port 80 is exposed without any context or explanation. EXPOSE only documents which port the application listens on; it does not publish the port. So it is good practice to include a comment explaining why this particular port is exposed and to make sure it matches the port the app actually uses. Fourth, the last line uses CMD to specify the command that runs the application, which is acceptable, but using ENTRYPOINT might be more appropriate in some cases. So, this is what I have observed here.
Given a Python code block designed to test a machine learning model's accuracy, what would you change to follow best coding practices regarding variable names and readability? Okay. So, the code imports sklearn.metrics as metrics and defines a test_model function that takes the model, X_test, and y_test, computes the predictions (y_pred) and the accuracy, and returns it as acc; the result of calling test_model with the trained model, test data, and test labels is stored in model_acc. There are a few changes I would make to this code. First, I would rename acc to accuracy for clarity, so that someone reading the code for the first time understands it. Second, I would rename model_acc to model_accuracy, a more descriptive variable name that makes clear what result is being stored. Third, I would add a docstring to test_model documenting its purpose, its arguments, and its return value, which improves readability and maintainability; ideally every function should be documented this way.
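One way the refactored code might look, assuming the original computed accuracy with sklearn.metrics.accuracy_score. The function name evaluate_model_accuracy and the parameter names are illustrative choices that apply the renaming and docstring suggestions above.

```python
from sklearn.metrics import accuracy_score


def evaluate_model_accuracy(model, features_test, labels_test):
    """Compute the classification accuracy of a fitted model on a held-out test set.

    Args:
        model: A fitted estimator with a ``predict`` method, such as a
            scikit-learn classifier.
        features_test: Feature matrix of the test set.
        labels_test: True labels of the test set.

    Returns:
        float: Fraction of test samples predicted correctly, from 0.0 to 1.0.
    """
    predicted_labels = model.predict(features_test)
    accuracy = accuracy_score(labels_test, predicted_labels)
    return float(accuracy)
```

The call site then reads as model_accuracy = evaluate_model_accuracy(trained_model, X_test, y_test), where trained_model, X_test, and y_test stand for whatever fitted model and test split the surrounding code already has.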
When designing a machine learning system involving complex event processing, how would you factor in JAX for efficient, high-performance computation? I do not properly remember the answer to this one, so I may not be able to answer it fully. What I know is that JAX is a Python library for high-performance numerical computing, built around composable function transformations such as jit, grad, and vmap. Its arrays are immutable: once they are defined, they cannot be changed in place. JAX can also compile and optimize computations with jax.jit, which makes execution faster and helps with scalability.
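A minimal JAX sketch of the points above, assuming jax is installed. The event-scoring function, its logistic form, and the names score_event, score_batch, and set_weight are hypothetical; the sketch only illustrates jax.jit compilation, jax.vmap batching, and immutable array updates, not a complete event-processing design.

```python
import jax
import jax.numpy as jnp


@jax.jit
def score_event(weights, features):
    """Logistic score for a single event's feature vector.

    jax.jit traces this function once per input shape/dtype and compiles it
    with XLA; later calls with the same shapes reuse the compiled code.
    """
    return jax.nn.sigmoid(jnp.dot(weights, features))


# vmap maps the single-event function over a batch axis without a Python loop;
# in_axes=(None, 0) shares `weights` and splits `events` along axis 0.
score_batch = jax.jit(jax.vmap(score_event, in_axes=(None, 0)))


def set_weight(weights, index, value):
    """JAX arrays are immutable, so an 'update' returns a new array."""
    return weights.at[index].set(value)
```

Batching events with vmap and compiling with jit is what lets the same scoring logic run efficiently on a CPU, GPU, or TPU without rewriting it as explicit loops.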
So, how can containerization with Docker enhance the deployment process of machine learning models developed using a Python library? One thing I can say is that a Docker container encapsulates the entire runtime environment, including Python libraries, dependencies, and system configuration, which minimizes deployment errors caused by differences between environments. Second, containers provide isolation, allowing each machine learning model to run in its own environment; I think that's a key point. Docker also helps with scalability, versioning (images can be tagged by version), and dependency management, since it eliminates the need to install dependencies manually on each machine. Third, Docker containers are lightweight compared with virtual machines and consume relatively few system resources, which makes them efficient for deploying machine learning models in various cloud environments. Finally, Docker integrates well with CI/CD pipelines and DevOps practices, enabling automation of the complete deployment process.