
Senior Data Scientist, Walmart Global Tech
Machine Learning Engineer III, Procore Technologies
Sr Data Scientist, IQVIA
Asst System Engineer, Tata Consultancy Services
Software Engineer - Machine Learning, Gartner
Data Scientist II, Tata 1mg
Python

MySQL

PostgreSQL
Jira

LLM

LangChain
Pyspark

T-SQL

Oracle SQL Developer
Docker

Git

Pandas

Jupyter

Core Java

Scikit-learn

Keras

Pandas

NLTK

OpenAI

AWS Batch

EC2

ECS

S3

Lambda

SQL

Oracle

Redis
Jenkins

Jupyter

PyTorch

Pandas

Gensim

NLTK

OpenAI

ECS

Lambda

SQL
Hello. Hi. I'm Tanig Gupta, and I'm currently working with IQVIA as a senior data scientist, where I'm helping healthcare professionals make better decisions through AI insights for patient wellness. Prior to IQVIA, I worked with companies like Tata 1mg, which is an Indian healthcare startup, Gartner, and Tata Consultancy Services. So I got really good exposure to working with a variety of datasets, ranging from tabular data to natural language processing and computer vision problems, right from Tata Consultancy Services itself. At that time, I also took a deep dive into artificial intelligence by completing my master's in that field. Then I joined Gartner, where I worked in the customer analytics area, building multiple ML data pipelines for customer prioritization and customer risk analysis. I was also a contributor to a document recommendation engine, which is available on the gartner.com website. Then I joined IQVIA. At IQVIA, I'm primarily working with healthcare datasets, which are electronic medical records. With those, we're trying to predict rare diseases in patients, so that doctors can identify them beforehand and save patients. I'm also working on benchmarking multiple large language models for healthcare datasets. I'm working with a range of 7-billion-parameter models and fine-tuning them on a PubMed-based dataset. I'm doing the benchmarking by training multiple models and choosing which one works best for the healthcare dataset. So, yeah, that's all about me and my past experience. I did my undergrad, a bachelor's program, in India at Maharishi Markandeshwar University in Ambala. I also have a degree from Sarimpot, which is in Uttar Pradesh. So, yeah, that's all about me. Thank you.
What method would you employ to combine the predictions of multiple machine learning models? The method I would choose is stacking. When we have multiple machine learning models, we can definitely choose stacking as an option, where we leverage the power of the different models: all of the models contribute their predictions, and we choose the one that has the highest probability. So we actually make the predictions and choose the one with the maximum score or the maximum probability. That prediction is the one that gets chosen. This is what we call stacking of the models.
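The selection rule described in this answer (every model predicts, and the prediction with the highest probability wins) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the function names and the example probabilities are hypothetical, and `soft_vote` (averaging probabilities across models) is added purely for comparison. Stacking in the usual sense goes one step further and trains a separate meta-model on the base models' outputs, which this sketch does not do.

```python
def pick_most_confident(model_probs):
    """Return (model, label, probability) for the single most confident prediction."""
    best = None
    for model_name, probs in model_probs.items():
        for label, p in probs.items():
            if best is None or p > best[2]:
                best = (model_name, label, p)
    return best


def soft_vote(model_probs):
    """Average each class probability across models and return the top label."""
    totals = {}
    for probs in model_probs.values():
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    n = len(model_probs)
    averaged = {label: total / n for label, total in totals.items()}
    return max(averaged, key=averaged.get), averaged


# Hypothetical class probabilities from three models for one sample.
example = {
    "logreg": {"spam": 0.55, "ham": 0.45},
    "forest": {"spam": 0.30, "ham": 0.70},
    "boost": {"spam": 0.92, "ham": 0.08},
}
print(pick_most_confident(example))
print(soft_vote(example)[0])
```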
Can you improve the training speed of a deep reinforcement learning model without compromising its performance? I think in this case I should be using batch normalization layers, so I can speed up the training process.
I would approach building a neural network model to process multilingual text data by considering sequence modeling with RNNs. I can choose RNN layers, such as LSTM or GRU, to process this multilingual data. I can play around with the architecture, for example using 2 LSTMs or 1 GRU, or stacking multiple LSTMs. I can also leverage the large pre-trained models available today and fine-tune them on a particular task.
The right balance between precision and recall for a classification problem. To find the right balance, I have to plot the precision-recall curve. Considering a binary classification problem, I'm choosing the right threshold to divide my dataset, so I can plot recall and precision against the threshold and see where they intersect. But it's also driven by the business point of view: maybe I'm going to focus more on recall than on precision, or maybe more on precision than on recall. In that case, my approach could vary, and the threshold I choose to classify my data points can vary as well. One way to start is the AUC score, which gives me the overall robustness of my model, and then I can simply keep checking my precision and recall at each threshold, maybe at 10%, 20%, 30%, and so on. Then I see what makes more sense to the business, and accordingly I choose the right threshold and the right balance between precision and recall.
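The threshold sweep described here can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function names and the sample labels and scores are hypothetical, and returning a precision of 1.0 when nothing is predicted positive is just one common convention.

```python
def precision_recall_at(y_true, scores, threshold):
    """Compute (precision, recall) when scores >= threshold are labeled positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    # Convention: with no positive predictions, precision is reported as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep(y_true, scores, thresholds):
    """Return a list of (threshold, precision, recall) rows for inspection."""
    return [(t, *precision_recall_at(y_true, scores, t)) for t in thresholds]


# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.1]
for t, p, r in sweep(y_true, scores, [0.1, 0.2, 0.3, 0.5, 0.7]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Reading the rows side by side makes the business trade-off concrete: a lower threshold raises recall at the cost of precision, and a higher threshold does the opposite.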
Convert a trained model to a more mobile-friendly format. Yeah, we can do quantization for that. For a mobile-friendly format, we can use TensorFlow Lite to build the model, which gives us a very lightweight model. We can also convert the model to ONNX format, which is designed for tackling these kinds of problems. The quantized model is very friendly for mobile devices and for low-power devices like edge devices as well. In that case, it works absolutely fine.
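To make the idea of quantization concrete, here is a simplified sketch of symmetric 8-bit quantization of a list of weights in plain Python. It is an illustration of the general principle only and is not the exact scheme used by TensorFlow Lite or ONNX tooling; the function names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against all-zero weights, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical float weights.
weights = [-1.0, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a scale step per weight.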
We have loaded the model. We are loading the model here. Okay? Then we have the static method. In the prediction method, we are making a prediction: if the model is None, then we raise an exception, "model not found." Else, we return the model's predictions. Okay. So, first of all, this is a static method, and we're trying to use a class variable in it, which is not possible. Either the method has to be a class method, not a static method, or we simply remove the static method decorator here and define the model as self.model, which is then available across the whole class. I can use that one. Or I simply make it a class method, because I can't just use the class variable inside a static method. That violates the rules of a static method. So this is the main design problem here.
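The code under review is not shown in the transcript, so the following is a hypothetical reconstruction of the fix being proposed, with an assumed class name `Predictor` and an assumed `load_model` helper. A static method receives neither `self` nor `cls`, so it has no implicit handle on class state; switching to `@classmethod` gives the method `cls` to read the shared model from.

```python
class Predictor:
    # Class-level attribute shared by all instances (hypothetical name).
    model = None

    @classmethod
    def load_model(cls, model):
        """Store a callable model on the class."""
        cls.model = model

    @classmethod
    def predict(cls, features):
        """Use the shared model; cls gives access to class state."""
        if cls.model is None:
            raise RuntimeError("model not found")
        return cls.model(features)
```

With this shape, calling `Predictor.predict(...)` before `Predictor.load_model(...)` raises the "model not found" error, and afterwards it returns whatever the stored model returns.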
The Python function is intended for feature scaling in a machine learning preprocessing pipeline; identify and explain potential issues. So we're taking a data frame and a column. Okay. We're computing the minimum and maximum of it. Okay. We are simply scaling it manually: column minus the minimum value, divided by the max minus the minimum value. Okay. So it's converted not to a z-score, but to min-max scaling. All right. First of all, this function only works for features of a numerical nature, that is, continuous features. But I'm not sure how we are using it, because there is a possibility that a categorical feature comes in with the data frame. In that case, this function won't work and will produce errors, because we can't simply extract a minimum or maximum of a categorical feature. So that's the first point. And the intent of this function is machine learning preprocessing, so I think that's a major issue. Also, in this case I'm modifying my data frame directly: the column is completely overwritten with the min-max scaled values. Okay. The min-max logic itself is fine. But we could use the scikit-learn implementation of the min-max scaler as well. That would also work, and it would be more optimized and faster, because it works in a vectorized fashion. So these are the two points I would consider to improve this particular function.
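The function under review is not included in the transcript, so the sketch below is a hypothetical plain-Python version of min-max scaling that addresses the points raised: it rejects non-numeric (for example categorical) values up front and returns a new list rather than overwriting its input. It also guards against a constant column, where max minus min is zero and the formula would otherwise divide by zero; that guard is an addition not mentioned in the answer. The name `min_max_scale` is an assumption.

```python
from numbers import Real


def min_max_scale(values):
    """Scale numeric values to [0, 1] using (x - min) / (max - min)."""
    if not values:
        return []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        raise TypeError("min-max scaling needs numeric values only")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid dividing by zero, map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([2, 4, 6]))
print(min_max_scale([5, 5]))
```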
How would you use PyTorch to implement a feature that could perform style transfer between 2 images? To implement a feature that performs style transfer between 2 images, I might want to implement GANs here. So I think I can implement GANs to do that.
The use of graph neural networks and their potential applications. Graph neural networks can be used when we want to build a solution that links multiple documents, or maybe multiple datasets, and we want to learn something from that variety of data. Maybe I have 3 or 4 datasets to solve a particular problem, and I want to establish the connections between them and learn from those connections. In those kinds of scenarios, I can learn with graph networks.
How might you apply convolutional neural networks to an unconventional dataset such as audio time series? I can apply them, but I think time series is a sequential problem, so I don't think CNNs alone are the right fit for time series. For time series, we can apply RNNs rather than CNNs. But for an audio kind of problem, when you want to extract some high-level information from the audio, we can definitely leverage CNNs. We can leverage CNNs not only for images, but also for audio and textual data. That is when we want to extract some high-level features and then pass them on to RNNs, because audio is also a sequential problem. So we can add initial CNN layers to extract the high-level information, then pass it on to a sequential model.
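The idea of using convolution as a front-end feature extractor on a sequence can be illustrated with a one-dimensional convolution in plain Python. This is a toy sketch with hypothetical names and a hand-picked kernel, not a trained CNN layer: it slides a small kernel over the signal, applies a ReLU, and yields a feature sequence that a sequential model could consume next.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product, the core operation of a 1D CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0, v) for v in values]


# Hypothetical audio-like signal with one onset (0 -> 1) and one offset (1 -> 0).
signal = [0, 0, 1, 1, 0, 0]
onset_kernel = [-1, 1]  # responds when the signal rises

features = relu(conv1d(signal, onset_kernel))
print(features)  # one active position, where the signal rises
```

The output is still ordered in time, which is why such convolutional features can be handed to an RNN or another sequential model afterwards.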