
Senior NLP Scientist | Proven Leader in Driving Business Impact with AI
I am a highly motivated Senior NLP Scientist with 7+ years of experience in leveraging cutting-edge Natural Language Processing (NLP) techniques to solve complex business problems and generate significant ROI. My expertise lies in building and deploying scalable NLP solutions across various industries, including healthcare, e-commerce, and pharmaceuticals.
Throughout my career, I have consistently demonstrated a strong ability to:
Lead and mentor high-performing data science teams. I have experience spearheading teams of data scientists and engineers in designing, developing, and deploying NLP solutions.
Deliver impactful results through NLP innovation. I have a proven track record of building NLP models that have increased user engagement by 20%, resolved 300,000 customer pain points, and boosted revenue by 3.7%.
Bridge the gap between technical expertise and business needs. I excel at translating complex NLP concepts into actionable insights for non-technical stakeholders, ensuring data-driven decision making.
Embrace new technologies and stay ahead of the curve. I am passionate about staying current with the latest advancements in NLP, actively utilizing tools like Hugging Face, OpenAI, and PySpark to tackle Big Data challenges.
I am confident that my leadership skills, technical depth, and business acumen make me a valuable asset to any team looking to leverage the power of NLP for real-world impact.
Lead Data Scientist, MRESULT
Senior NLP Scientist, NAVANA TECH
NLP Scientist, NAVANA TECH
Machine Learning Developer, RELIANCE JIO
ChatGPT
Hugging Face
Dialogflow
GCP
AWS
Git
Azure
Tableau
Keras
Tensorflow
NLTK
Sklearn
Pandas
Flask
Kaldi
Could you help me understand more about your background by giving a brief introduction?
Describe any project where you have extensively used LLMs in production. What have you achieved using the LLM, and what challenges did you face? The project I have worked on with LLMs is the redaction project, for Pfizer, and it is used in production. The task: we have very sensitive medical data. For example, say the name of a patient is Raju; Raju is suffering from acidity, he is taking this medicine this many times, this is his phone number, this is his email address, and all of it is stored in the database. This information is very sensitive, and if you want to use it for any purpose, say a machine learning purpose, developing a model, or any internal use, you can't use the data directly; it does not follow the compliance guidelines. So the first thing you need to do is redact it. Redacted means all the crucial information is hidden: the name Raju is replaced with a dummy name, and his age, phone number, and email address are masked with placeholders like "XXX", so the data can still be used for modeling but can no longer be linked back to the person. To do this, we used large language models. Now, what is the major challenge here, and why can't we do this with classical machine learning? Because, say, a patient's name is Helen, and there is also a medicine whose name is Helen 10/20 mg.
Identifying whether that is a person's name or a medicine name is very important, because we do not want to redact the medicine name; we want the dosages so that we can train a model later on. To identify them, we used LLMs. LLMs understand context very well, and even if you give a name the LLM has not seen, it can recognize it. So we used the LLM to redact the names. We also fine-tuned the model to improve performance and accuracy further; like I said, this is medical data, so it is very important. Since we could not use the real data, we created dummy data with the help of the LLM itself, using prompt engineering. After creating the data, we redacted it, and we used the same data to fine-tune the model. We fine-tuned an OpenAI model through Azure OpenAI. In the end, we got very good accuracy, around 97.3%: phone numbers, email addresses, and person names were redacted very reliably, with a 97.3% F1 score.
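The pipeline described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: regexes handle the structured fields (emails, phone numbers), and a small stub stands in for the LLM entity detector, which in the real project decides from context that "Helen" is a patient rather than the drug "Helen 10 mg". The sample record and names are made up.

```python
import re

# Regexes for the structured PII fields described in the project.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def detect_person_names(text):
    """Stand-in for the LLM entity detector. A real LLM would return
    contextual spans; this stub just checks a fixed name list."""
    return [n for n in ("Raju", "Helen") if n in text]

def redact(text):
    # Mask structured fields first, then person names.
    text = EMAIL_RE.sub("XXX@XXX", text)
    text = PHONE_RE.sub("XXX-XXX", text)
    for name in detect_person_names(text):
        text = text.replace(name, "PATIENT")
    return text

record = "Raju has acidity. Contact: raju@example.com, +91 98765 43210."
print(redact(record))  # PATIENT has acidity. Contact: XXX@XXX, XXX-XXX.
```

The medicine name and dosage survive untouched, which is the whole point of using a context-aware detector instead of blanket masking.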
Okay. What steps would you take to mitigate Snowflake compute cost while running complex JSON aggregations? I have not worked with Snowflake, so I would not be able to answer this one.
Imagine you have a complex SQL query consisting of multiple joins that is running slower than expected. What areas of the components, query, or data involved would you look into first to optimize the performance? Okay, so the query selects c.*, a.*, and b.value from table a, table b, and table c, filtered on status, and we want to optimize it. The first thing: we are using a.* and c.* here; instead of selecting star, we can select only the particular columns we need. That is one optimization. Then we have two joins, table b on a.id = b.id and table c on c.id = a.id, so we should make sure those join keys are indexed. Finally, the filter status = 'active' should sit in the WHERE clause rather than HAVING, so rows are filtered before the joins and any aggregation, which makes it faster.
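The three optimizations mentioned can be demonstrated with SQLite from the standard library. This is a hedged sketch: the table and column names (a, b, c, id, status, value, note) mirror the transcript but the schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE b (id INTEGER, value INTEGER);
CREATE TABLE c (id INTEGER, note TEXT);
CREATE INDEX idx_b_id ON b(id);  -- index the join keys
CREATE INDEX idx_c_id ON c(id);
INSERT INTO a VALUES (1, 'active'), (2, 'inactive');
INSERT INTO b VALUES (1, 10), (2, 20);
INSERT INTO c VALUES (1, 'x'), (2, 'y');
""")

# Narrow column list instead of a.* and c.*; WHERE (not HAVING)
# filters rows before the joins touch them.
rows = cur.execute("""
    SELECT a.id, b.value, c.note
    FROM a
    JOIN b ON b.id = a.id
    JOIN c ON c.id = a.id
    WHERE a.status = 'active'
""").fetchall()
print(rows)  # [(1, 10, 'x')]
```

Only the 'active' row survives, and with the indexes in place the join lookups avoid full table scans on b and c.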
How would you leverage different elements to create a WhatsApp chatbot for travel inquiries?
I think this platform is not very smooth; I have missed that question, and a lot of other questions. How do you design a scalable Spark job to handle degradation? I have not designed Spark clusters, but I have used Spark jobs through PySpark. I have used it where we take a large volume of data, apply map-reduce on top of it, and make it very fast. So I have worked on that, but not on the design side.
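The map-reduce pattern mentioned above can be illustrated in plain Python, without a Spark cluster. This is a toy word-count sketch: the map step emits (key, 1) pairs, a shuffle groups them by key, and the reduce step sums per key; the input lines are made up.

```python
from functools import reduce
from itertools import groupby

lines = ["spark makes big data fast", "big data needs spark"]

# Map: emit a (word, 1) pair for every word.
pairs = [(w, 1) for line in lines for w in line.split()]

# Shuffle: sort and group the pairs by key.
pairs.sort(key=lambda kv: kv[0])
grouped = {k: [v for _, v in g]
           for k, g in groupby(pairs, key=lambda kv: kv[0])}

# Reduce: sum the counts for each word.
counts = {k: reduce(lambda x, y: x + y, vs) for k, vs in grouped.items()}
print(counts["spark"], counts["big"])  # 2 2
```

In PySpark the same idea would run distributed over partitions; the per-key reduction is what lets the work parallelize.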
Suppose there is a PyTorch model for image classification that is yielding lower accuracy than expected. What steps within the machine learning model evaluation process would you examine to debug and improve the model performance? Okay, so we have a model classifier with a cross-entropy loss, we have used the Adam optimizer, and there is a loop over epochs and image/label batches calling optimizer.zero_grad(); then we evaluate the model. While evaluating, we found it is yielding low accuracy. The first thing that comes to my mind is changing the hyperparameters, starting with the learning rate. We can also try changing the loss function, and we can swap the Adam optimizer for different optimizers and see. Then we can add techniques to improve accuracy, like early stopping and checkpoints. We should check whether it is overfitting or underfitting: if it is overfitting, we can apply the techniques I mentioned, like tuning the learning rate, and we can stop training early so it doesn't overfit. If it is underfitting, we can try to augment the data or add more data, which is natural for an image classifier.
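The early-stopping idea from this answer can be sketched framework-free, so it runs without PyTorch. This hypothetical helper reports the epoch at which training would stop once the validation loss has failed to improve for a given number of epochs; the loss values are invented.

```python
def early_stop_index(val_losses, patience=2):
    """Return the epoch index at which training stops: the first epoch
    where validation loss has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch  # new best: reset the clock
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs: stop
    return len(val_losses) - 1  # never triggered: ran to completion

losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.64]
print(early_stop_index(losses))  # 4
```

Here the best loss occurs at epoch 2, so with patience 2 training halts at epoch 4, before the slight rebound turns into overfitting; in a real PyTorch loop the checkpoint from the best epoch would be restored.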
You are given the following SQL query, intended to calculate an average rate, but it is missing a clause and is currently returning an incorrect result. Identify the missing clause.
There is a Python snippet that uses pandas; the code is supposed to merge two DataFrames on a key and compute the mean of the column "score" from the merged DataFrame. However, it does not work as expected. We have two DataFrames: one has an id column and a score column, the other has an id column and a value column. You can see the column names differ: "value" in one and "score" in the other. The id columns are 1, 2, 3 and 1, 2, 4. If we do pd.merge, it merges on id as an inner join, so only ids 1 and 2 match in both frames. Since the column names "score" and "value" are different, the fix is to make them the same before computing the mean.
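The fix described above looks like this in pandas. This is a hedged reconstruction: the transcript does not show the original snippet, so the frames below are invented to mirror it (ids 1, 2, 3 versus 1, 2, 4, with the second frame's column named "value").

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3]})
df2 = pd.DataFrame({"id": [1, 2, 4], "value": [10, 20, 40]})

# Align the column names: the merged frame must expose a 'score'
# column for the mean computation to work.
df2 = df2.rename(columns={"value": "score"})

# pd.merge defaults to an inner join on the shared key, so only
# ids 1 and 2 survive.
merged = pd.merge(df1, df2, on="id")
print(list(merged["id"]), merged["score"].mean())  # [1, 2] 15.0
```

Alternatively, `pd.merge(df1, df2, left_on="id", right_on="id")` with the original column kept as "value" would also work if the mean were computed on "value" instead; the bug is purely the name mismatch.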