
Machine Learning Engineer III, Procore Technologies
Sr AI Engineer, IQVIA
Data Scientist II, Tata 1mg
Asst System Engineer, Tata Consultancy Services
Software Engineer - Machine Learning, Gartner Inc.
Python

MySQL

PostgreSQL
Jira

LLM

LangChain
Pyspark

T-SQL

Oracle SQL Developer
Docker

Git

Pandas

Jupyter

Core Java

Scikit-learn

Keras

Pandas

NLTK

OpenAI

AWS Batch

EC2

ECS

S3

Lambda

SQL

Oracle

Redis
Jenkins

Jupyter

PyTorch

Pandas

Gensim

NLTK

OpenAI

ECS

Lambda

SQL
Hello. Hi. I'm Tanig Gupta, and I'm currently working at IQVIA as a senior data scientist, where I help health care professionals make better decisions for patient wellness through AI insights. Prior to IQVIA, I worked at companies like Tata 1mg, which is an Indian health care startup, Gartner, and Tata Consultancy Services. So I got really good exposure to a variety of datasets, ranging from tabular data to natural language processing and computer vision problems, right from Tata Consultancy Services itself. At that time, to dive deeper into the field of artificial intelligence, I also enrolled in and completed my master's in artificial intelligence. Then I joined Gartner, where I worked in the customer analytics area, building multiple ML pipelines for customer prioritization and customer risk analysis, and also a recommendation engine, which is a document recommendation engine for the gartner.com website. And then I joined IQVIA. At IQVIA, I primarily work with health care data, which is electronic medical records. With that, we are trying to predict rare diseases in patients so that doctors can identify them beforehand and we can help the patient early. I'm also working on natural language processing, benchmarking multiple large language models on health care datasets only. We are looking at a 7-billion-parameter model, and I'm trying to fine-tune it on a PubMed-based dataset. I'm doing the benchmarking by training multiple models and choosing which one works best for the health care dataset. So, yeah, that's all about me and my past experience.
As for academics, I did my undergrad, my bachelor's, in India, at Maharishi Markandeshwar University in Ambala. I hail from Saharanpur, which is in Uttar Pradesh. So, yeah, that's all about me. Thank you.
What method would you employ to combine predictions of multiple machine learning models? The method I would choose is stacking. When we have multiple machine learning models, we can definitely choose stacking as an option, where we leverage the power of the different models and come up with the best result, with all of these models contributing, and choose the prediction that has the highest probability. So we actually make the predictions: maybe I have three models, one decision tree, one random forest, one XGBoost, and then one learning solution as well. Out of these four, I make the predictions and choose the one with the maximum score or the maximum probability. Those predictions are chosen. This is what we call stacking of the machine
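The selection rule described in this answer can be sketched in plain Python. Note that stacking in the usual sense trains a separate meta-model on the base models' predictions; picking the single most confident model is a simpler rule, shown here next to soft voting (averaging probabilities) for comparison. The model names and probabilities below are made up for illustration.

```python
# Hypothetical class-probability outputs [P(class 0), P(class 1)] from three
# models for one sample. Names and numbers are illustrative assumptions.
probs = {
    "decision_tree": [0.10, 0.90],
    "random_forest": [0.80, 0.20],
    "xgboost": [0.75, 0.25],
}

def max_confidence(probs):
    """Return the class predicted by the single most confident model."""
    best = max(probs.values(), key=max)
    return best.index(max(best))

def soft_vote(probs):
    """Average the probabilities across models, then take the argmax."""
    n_classes = len(next(iter(probs.values())))
    avg = [sum(p[c] for p in probs.values()) / len(probs) for c in range(n_classes)]
    return avg.index(max(avg))

# Here max_confidence(probs) picks class 1 (the decision tree is the most
# confident), while soft_vote(probs) picks class 0 (two of three models agree).
```

The two rules can disagree, as in this toy case, which is one reason a trained meta-model or an averaging scheme is often preferred over trusting the most confident model alone.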
Can you improve the training speed of a deep reinforcement learning model without compromising its performance? Speed. I think in this case I should use batch normalization layers as well, so I can speed up the process.
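For reference, the computation a batch normalization layer performs on one feature across a batch can be sketched as follows. This is a minimal illustration in plain Python, not any framework's API; `gamma`, `beta`, and `eps` are the usual learnable scale, learnable shift, and numerical-stability constant, and the default values are assumptions.

```python
def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # Scale and shift after normalizing; eps guards against division by zero.
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]
```

Keeping activations in a stable range like this is what typically allows larger learning rates and faster convergence.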
How do you approach building a neural network model to process multilingual text data? I think in this case the multilingual data would call for sequence-to-sequence modeling. I can choose RNNs, recurrent networks; I can go with LSTM or GRU layers. I can play around with the architecture: maybe two LSTMs or one GRU would make sense, or maybe stacking multiple LSTMs. So, yeah, my approach would be sequence modeling using RNNs. I can also leverage the large language models available today, the pre-trained models, and maybe fine-tune one on a particular task. That is also something I can think of.
Right balance between precision and recall for a classification problem. So for the right balance, I have to plot the precision-recall curve. I mean, choosing the right balance between precision and recall is actually choosing the right threshold to divide my dataset, considering a binary classification problem. There I can plot recall and precision and see where they intersect. But it is also driven by the business point of view: maybe I'm going to focus more on recall, not on precision, or maybe more on precision, not on recall. In that case, my approach could vary, or the threshold I choose to classify my data points can vary. One way to choose is the AUC score, which gives the overall robustness of my model, and then I can simply keep checking my precision and recall at each threshold. Maybe I take 10% as the threshold and divide my dataset, then 20%, 30%, and so on. Then I see what makes more sense to the business, and accordingly I choose the right threshold and the right balance between the
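The threshold sweep described above can be sketched in plain Python. The labels, scores, and the `pick_threshold` helper are hypothetical, chosen only to show how a business requirement (here, a minimum recall) turns into a threshold choice.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are labelled positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative labels and model scores (assumed data).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

# Evaluate thresholds 0.1, 0.2, ..., 0.9 as (precision, recall) pairs.
sweep = {t / 10: precision_recall_at(y_true, scores, t / 10) for t in range(1, 10)}

def pick_threshold(sweep, min_recall):
    """Highest-precision threshold whose recall meets the business floor."""
    ok = {t: pr for t, pr in sweep.items() if pr[1] >= min_recall}
    return max(ok, key=lambda t: ok[t][0]) if ok else None
```

A recall-focused use case would call `pick_threshold` with a high `min_recall`; a precision-focused one could flip the roles of the two metrics in the same way.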
Convert a trained model to a more mobile-friendly format. Yeah. We can do quantization for that. First of all, for a mobile-friendly format, we can use TensorFlow Lite to build the model. This will actually be a very lightweight model. We can also convert the model to the ONNX format, which is designed to tackle these kinds of problems, with a quantized model, which is very friendly for mobile, for low-end devices, or maybe edge devices as well. In that case, those work absolutely fine to improve the
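To make the quantization idea concrete, here is a minimal sketch of affine 8-bit quantization in plain Python. It only illustrates the arithmetic; it is not the TensorFlow Lite or ONNX API, and real toolchains use their own calibration and per-channel schemes. The function names are assumptions.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with an affine (scale, zero_point) scheme."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(x - zero_point) * scale for x in q]
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of one quantization step (`scale`).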
Part of a machine learning prediction service. What is the potential design problem here? How could you address it? Prediction service. Model is None. Static method: load model. We are loading the model. Okay. We have another static method here, predict, where we make the model prediction: if model is None, then we raise the exception model not found; else, we return the model predictions. Okay. I think instead of having the model as a class variable, although we are loading it here... Okay. This is a static method, first of all, and we're trying to use a class variable, which is not possible. Either load model has to be a class method, not a static method, or we simply remove the static method here and define the model as self.model, which is available across the whole class. I can use that one. Or I simply make it a class method, because I cannot have a static method that uses the class variable. I cannot do that; it violates the rules of a static method. So this is the main design problem here.
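The snippet under discussion is not part of the transcript, so the following is only a hypothetical reconstruction of the class-method fix the answer proposes. A static method receives neither `self` nor `cls`, so a bare reference to `model` inside it would raise `NameError`; it could only reach the attribute through the explicit class name. All names here (`PredictionService`, `ModelNotFoundError`, `DoublingModel`) are assumptions.

```python
class ModelNotFoundError(Exception):
    """Raised when predict is called before a model has been loaded."""

class DoublingModel:
    """Hypothetical stand-in for a real trained model."""
    def predict(self, features):
        return [2 * x for x in features]

class PredictionService:
    model = None  # class-level attribute shared by all callers

    @classmethod
    def load_model(cls, model):
        # cls gives explicit access to the class attribute.
        cls.model = model

    @classmethod
    def predict(cls, features):
        if cls.model is None:
            raise ModelNotFoundError("call load_model first")
        return cls.model.predict(features)
```

The other option mentioned in the answer, storing the model on the instance as `self.model`, avoids shared mutable class state entirely and tends to be easier to test.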
The Python function is intended for feature scaling in a machine learning preprocessing pipeline. Identify and explain potential issues. So this is feature scaling, with the data frame and the column. Okay, we're doing min-max scaling of it. We are simply normalizing it manually: the column minus the minimum value, divided by the maximum minus the minimum value. Okay. So it's converted not to a z-score but to the min-max scale. Okay. First of all, this function only works for features that are numerical in nature, that is, continuous features. But I'm not sure how we are using it, because there is a possibility that the data frame has categorical features coming in. In that case, this function won't work and will produce errors, because at no point can we simply extract a minimum or maximum of a categorical feature. So that's the first point, and given that the function is intended for feature scaling in machine learning preprocessing, I think that's a major issue. Also, in this case I'm completely converting my data frame: the column is changed and converted to min-max scaled values. The min-max function itself is fine. But we could leverage the scikit-learn implementation of MinMaxScaler as well. That would also work, and it would be more optimized and faster, because it works in a vectorized fashion. So these are the two points I would consider to improve this particular function.
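The function being reviewed is not shown in the transcript. As a rough sketch of a more defensive version, the plain-Python function below addresses the two points raised (non-numeric input and overwriting the original column) and one further issue the answer does not mention: a constant column, where max minus min is zero. The name `min_max_scale` and the choice to return zeros for a constant column are assumptions.

```python
from numbers import Real

def min_max_scale(column):
    """Return a new list scaled to [0, 1]; the input list is not modified."""
    if not column:
        return []
    # Reject categorical or other non-numeric values up front.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in column):
        raise TypeError("min-max scaling needs numeric values")
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: avoid dividing by zero.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

In a real pipeline the minimum and maximum would be learned on the training split and reused on the test split, which is what a fitted scaler object provides.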
How would you use PyTorch to implement a feature that could perform style transfer between two images? Implement a feature that could perform style transfer between two images. I might want to implement GANs here. So I think I can implement GANs to do the same
Use of graph neural networks and potential use. Graph neural networks can be used when we want to establish a link between multiple documents, or maybe multiple datasets, and want to learn something from a variety of datasets. Maybe I have three or four datasets to solve a particular problem, and I want to establish a connection between them for learning. In those kinds of scenarios, I can learn from graph networks.
How might you apply convolutional neural networks to an unconventional dataset such as audio time series? I can apply them, but I think time series is a sequential problem, so I don't think we can apply CNNs to time series. For time series we can apply RNNs, but not CNNs. But for audio problems, when you want to extract some high-level information from the audio, we can definitely leverage CNNs. We can leverage CNNs not only for images but for audio and textual data as well, when we want to extract some high-level features and then pass them on to RNNs, because audio is also a sequential problem. In the initial layers, we can add CNNs to extract the high-level information, then pass it on to sequential modeling
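The feature-extraction step mentioned here comes down to sliding a small kernel along the time axis. A minimal plain-Python sketch of that operation follows; the signal and the edge-detecting kernel are illustrative assumptions, and a real model would learn its kernels. The same operation is also commonly applied directly to time series, not only ahead of a recurrent layer.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a 1D conv layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A step up and a step down in a toy signal.
signal = [0, 0, 1, 1, 1, 0, 0]
# This kernel responds to changes between neighbouring samples.
edges = conv1d(signal, [-1, 1])
```

The output is nonzero only where the signal changes, which is the kind of local pattern a convolutional front end can hand to a sequence model.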