
Principal Data Scientist / Head of AI / Mentor with extensive experience in building AI use cases from scratch and deploying them to production. Strong experience in GenAI, LLM fine-tuning, RAG, vector databases, NLP, machine learning, deep learning, transfer learning, working with LLMs, MLOps, and deployment using AWS, Airflow, Databricks, PySpark, pipelining, and containerization. Effective and proactive communicator with experience in leading teams and projects. Expertise in computer vision for OCR-related information extraction from images, PDF parsing, XML parsing, box detection, entity detection and recognition, data mining, data tagging, data analysis, feature selection and model selection, model building, model validation, model threshold validation, and log analysis.
Principal AI Consultant & Advisory (Part Time) | OrthoQuant
Lead Data Science (Contract) | The Weather Channel
Senior Applied AI Engineer | Work Fusion
NLP Scientist | Senseforth AI Research
NLP Scientist | Theatro Labs
Principal Data Scientist, Head of Data Science | Oorwin Labs
Senior Software Engineer - Data Scientist | Infosys
Co-founder & CTO | Lumino AI
Consultant AI Lead (Contract) | MezmerMedia
Python
AWS (Amazon Web Services)

ML

NLP

OCR

Deep Learning

Business Analytics

DevOps

Computer Vision

Artificial Intelligence

Data Visualization

Generative AI
Docker
Azure

AWS

Tableau

GraphQL
Flask
Gunicorn

Solr

Scrapy

GCP

Airflow

MLFlow

Haystack

LangChain

Weaviate

GPU
Okay, so this is Sai Vignan Mayala. I have around 8.5 years of experience in the field of data science. I'm very good at working on real-world applications. I have developed products from scratch. I use Python. I use machine learning, deep learning, NLP, and generative AI. I have led teams. I have good experience even in deploying models and architectures to production. I'm good at understanding products and at taking them all the way to production. So, that's kind of my background. I worked at Infosys and at a couple of startups like Senseforth and Theatro. And then I worked at Oorwin. I had a long stint of four years at Oorwin. As a core team member, I built AI platforms and pipelines from scratch. I work with AWS. I work with Hugging Face Transformers. I'm very good at working with OpenAI, generative AI, and RAG models, even the latest Llama models. I know how to integrate them, how to deploy them, and how to use them on AWS Fargate and AWS Bedrock. So, I'm good at understanding the end-to-end architecture, I have good experience, and I'm very good at hands-on work as well as leadership. That's my background. Thank you.
So the selection of a loss function depends purely on the use case. It could be a regression loss, binary cross-entropy, categorical cross-entropy, or a max-margin loss, depending on the use case. Suppose I take SVM as an example. There we go with the max-margin loss, the hinge loss, which is driven by the points nearest to the hyperplane. In deep learning, too, which loss function is appropriate depends on the use case. We can even write our own loss functions. I have used raw metrics to build loss functions. And in deep learning I have used many standard losses as well, like categorical cross-entropy. I used binary cross-entropy. I even used skewing loss entropy. So I have used a lot of new techniques in loss implementation also. That has worked very well from my perspective.
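As a minimal sketch of two of the losses named above, the hinge (max-margin) loss used with SVMs and binary cross-entropy can be written in plain Python. The function names and the label conventions (-1/+1 for hinge, 0/1 for cross-entropy) are assumptions made for this illustration, not code from the interview.

```python
import math

def hinge_loss(y_true, scores):
    """Mean hinge loss. Labels are -1 or +1; scores are raw margins w.x + b."""
    losses = [max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)]
    return sum(losses) / len(losses)

def binary_cross_entropy(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy. Labels are 0 or 1; probs are predicted P(y=1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# A point well outside the margin costs nothing; one inside the margin is penalized.
svm_loss = hinge_loss([1, -1], [2.0, -0.5])
# A coin-flip prediction costs log(2) per example.
bce_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

The hinge loss only penalizes points that sit inside the margin or on the wrong side of the hyperplane, which is why the nearest points matter most for an SVM.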
Yeah, so to train my model on a large dataset, I would primarily use GPUs or a machine with high RAM. Obviously, using cloud infrastructure like AWS is also an option. The techniques I use for a large model start with setting up the deep learning architecture: the input layer, the hidden layers, and the output layer. Then, while implementing the architecture, I follow best practices: batch normalization, dropout to randomly drop nodes so the model generalizes better, and activation functions such as ReLU. So these are all techniques in deep learning that I would apply when training on large datasets, because finding patterns is very important in large datasets. Applying normalization, activation functions like ReLU, or even increasing the number of nodes or layers, all of these enhance the learning of the model. And, obviously, the loss function and the optimizer I use also affect training. On the compute side, I would need good RAM, and I would not start directly with the full dataset. I would take some samples from it, using a sampling strategy to get representative samples, then train the model and check whether it is really working against my metrics. If it works well on the small dataset, at least clearing a minimum bar on the p-value, then I can go ahead and load the larger dataset. So that check is the bare minimum.
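The "take some samples first" step can be sketched as a stratified sample, which keeps the class proportions of the full dataset in the pilot subset. This is one possible sampling strategy, chosen here for illustration; the function name, the 5% fraction, and the toy spam/ham data are all assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw roughly `fraction` of the rows from every class, at least one per class."""
    rng = random.Random(seed)  # fixed seed so the pilot subset is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Toy imbalanced dataset: 10% "spam", 90% "ham".
rows = [("text %d" % i, "spam" if i % 10 == 0 else "ham") for i in range(1000)]
pilot = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.05)
```

If the model clears the agreed metric threshold on `pilot`, the same training code can then be pointed at the full dataset.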
So, to continuously update an NLP model with incoming data, you would obviously implement something like active learning. Active learning is a technique where we continuously train an NLP model as the data comes in, and we put a threshold on the predictions. Suppose I'm doing NER, entity recognition in NLP. Whenever new data comes in, we try to classify it with the NER model, but we put a threshold on it, and we build something like an entity classifier, a sub-model that classifies in real time. If the confidence is at the bare minimum or near the border, we do not accept that value, and we send it to human feedback or to a reinforcement learning reward model. So there are multiple techniques again: a reward model, human feedback, or threshold-based and rule-based handling. All of these can be applied to the NER values that are not being detected confidently, the ones at the border, far away from a confident prediction probability. These values can be continuously added to the model through the MLOps engine, which is also part of my active learning technique. So it continuously feeds the data and trains on it. If required, it brings in a human in the loop, RLHF or a reward model, or some rules. Right? So, I can use rules. I can use rewards. I can use human feedback. All of this is applied while prediction is happening. Based on a set of rules and the probabilities, we filter the predictions and send them to the active learning or MLOps engine to train again. So that is the practical approach. I took NER as the example, how we detect a particular word. Every day we use new words, and the NLP model has to understand the new words as well. Right? So retraining can be triggered accordingly.
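A minimal sketch of the threshold step described above: confident predictions are accepted, borderline ones go to a review queue, and reviewed items become new training examples. The 0.8 threshold, the field names, and the sample tokens are hypothetical, picked only to make the flow concrete.

```python
def triage(predictions, threshold=0.8):
    """Split model predictions into auto-accepted and needs-review by confidence."""
    accepted, needs_review = [], []
    for item in predictions:
        target = accepted if item["confidence"] >= threshold else needs_review
        target.append(item)
    return accepted, needs_review

def apply_feedback(needs_review, corrections):
    """Turn reviewed items into (text, label) training examples.

    `corrections` maps text to the label chosen by a human or a rule;
    items without a correction keep the model's own label.
    """
    return [(item["text"], corrections.get(item["text"], item["label"]))
            for item in needs_review]

predictions = [
    {"text": "Paris", "label": "LOC", "confidence": 0.97},
    {"text": "rizz", "label": "ORG", "confidence": 0.41},  # new word, low confidence
]
accepted, needs_review = triage(predictions)
new_examples = apply_feedback(needs_review, {"rizz": "O"})
```

In a real pipeline, `new_examples` would be appended to the training set and picked up by the next scheduled retraining run.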
So, versioning: how would you manage the versioning of both data and models? I would obviously use DVC, data version control, for the data, and Git for version control of the code. I would use these to version my data and models and push and pull them as needed. That's the practical technique. In MLOps, when I'm continuously iterating and training a new model, I create the new model and make sure its version is recorded in Git or DVC. These are quite straightforward techniques for versioning, and I can even use S3 for custom versioning. So all of this is quite manageable in practice.
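The idea behind content-based versioning can be sketched in a few lines: derive a version tag from the bytes of the dataset and the model, and store it next to the Git commit of the code. This is an illustration of the concept only, not DVC's actual implementation or file format; the function names, the manifest layout, and the placeholder commit id are assumptions.

```python
import hashlib
import json

def content_version(data: bytes) -> str:
    """Short version tag derived from the content itself."""
    return hashlib.sha256(data).hexdigest()[:12]

def build_manifest(dataset: bytes, model: bytes, code_commit: str) -> str:
    """Record which data, model, and code belong together for one training run."""
    manifest = {
        "data_version": content_version(dataset),
        "model_version": content_version(model),
        "code_commit": code_commit,  # e.g. the Git commit the model was trained from
    }
    return json.dumps(manifest, sort_keys=True)

v1 = content_version(b"label,text\n1,hello\n")
v1_again = content_version(b"label,text\n1,hello\n")
v2 = content_version(b"label,text\n1,hello\n0,bye\n")
manifest = build_manifest(b"label,text\n1,hello\n", b"fake-model-weights", "abc1234")
```

Identical content always yields the same tag, so a changed tag signals that the data or the model really changed; such a manifest could also be used as part of an S3 object key for custom versioning.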
For testing and development of generative models, there are three factors, a triplet: helpful, honest, and harmless. These are the three things a generative AI model has to get right to be working properly. It should not be hallucinating. It should not be harmful: when people ask dangerous questions, such as how to make a bomb, it should not answer. And it should be honest: it should not give wrong facts, for example saying that the president of India is Modi. Can generative models create something new? Right? So the 3H formula is what we have to make sure of. And the way to make sure it is giving the right answers is human feedback while you're building it, and RLHF, reinforcement learning from human feedback. There's also something called constitutional AI, where you define sets of rules and then run a secondary check in the testing phase to make sure it is giving the right results. And for scores, we have ROUGE metrics. We have benchmarks. So we can run the benchmarks, check the ROUGE metrics, and make sure the model is good enough to deploy. Right? So for testing, the 3H formula, which I mentioned already, plus ROUGE metrics and benchmarks. These things help produce a good model. And for deploying, we have to deploy safely on AWS servers with custom models, or use third parties with custom APIs. So a robust deployment that auto-scales behind a load balancer based on the load. Right? So these are the strategies I would use.
Yeah. So this is a simple issue. The error says the transformer model has no attribute from_pretrained. That means either your import is wrong, you know, the import of the model class from the Hugging Face library, or the name of the imported class is wrong, so that is the primary reason. Secondly, that particular class may not be available for that model. Sometimes we use AutoModelForSequenceClassification. Sometimes we use a Llama model for token classification. So it depends on the model, and it depends on the library. And if the error still comes up after checking all of this, that means the thing you imported doesn't have that functionality at all. Or your import is something like importing the transformers library under an alias xyz. Right? Then you have to call it through xyz, and calling it any other way will not work. So basically, from_pretrained will not work if the class doesn't support loading from a pre-trained model. So that's an easy issue.
Why might this loss function be inappropriate? So it's a custom loss function. That's great. And the loss is defined as the reduce-mean of the absolute value of y_true minus y_pred. So why are you using mean absolute error for a generative model? I don't think this is an effective way to decrease the loss for a generative model. A mean absolute error is not suitable for generative models. Obviously, y_true and y_pred in a generative model are words. Right? They're not numbers. So you can't really compute y_true minus y_pred in a generative model. You have to use something like the ROUGE metric, or maybe something like the mean of the absolute of the number of true words matching minus the number of predicted words matching. Suppose I'm predicting "here is my house" and the actual is "here is the house". So three words have matched, "here", "is", and "house", and "my" and "the" haven't matched. So three out of four words have matched. So you can say that the number of true words matching is 3 and the number of predicted words matching is 3. So 3 minus 3, the absolute of that is 0, and the mean of that is 0. So this count-difference loss comes out as 0, while the word overlap itself, which is what ROUGE measures, is three out of four.
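For the example pair in this answer, a unigram-overlap score in the style of ROUGE-1 can be computed directly. The sketch below is a simplified version written for illustration (whitespace tokenization, lowercase, no stemming); an established ROUGE implementation would normally be used in practice.

```python
from collections import Counter

def rouge1(reference: str, candidate: str) -> dict:
    """Unigram overlap between a reference and a candidate sentence."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared words
    recall = overlap / max(1, sum(ref.values()))
    precision = overlap / max(1, sum(cand.values()))
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("here is the house", "here is my house")
```

Three of the four words overlap ("here", "is", "house"), so precision, recall, and F1 are all 0.75 for this pair. ROUGE is an evaluation metric rather than a differentiable training loss; generative models are typically trained with token-level cross-entropy and evaluated with metrics like this afterwards.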
Design a very high-level architecture for a scalable generative AI system focused on text generation. Okay. The high-level architecture. So we have a front end, and the front end calls the back end. The back end has integrations with the AI servers. The AI servers deal with RAG methods, through LangChain or something similar. The RAG method handles how to access the models, like OpenAI or Llama 2, which are hosted on some GPUs or maybe behind third-party APIs. And you should have some vector databases for this. For doing some manual actions, you have to access some microservices. And then you have to have Airflow or MLOps engines, which continuously feed in inputs. You have to have the reward model being trained, and sub-models such as entity classifiers. So they're all subsets of the system. We link each of them: the AI server to LangChain, or to a RAG method, I mean, and then to the API of the original model, a Llama model or something, and then the vector databases, and then the reward models and the entity classification sub-models, so all these are part of the ecosystem. And inside this, there are multiple serverless components and multiple hits to internal APIs. And then this call gives a response back to the back end. The back end has access to the internal session management, the databases, all these things, and then it gives the response back to the front end and manages it. So all of this is part of the AWS cloud, and you have to do this in real time with MLOps. You have to handle versioning with Bitbucket, and model versioning in the cloud. So all these things are part and parcel.
In a multi-project environment, how would you ensure consistent performance of generative models across teams and datasets? Okay. So when you say consistent performance of generative models across teams and datasets, that means the context in which the generative model is used must not change from the one it was built for. You developed it for a particular reason, and if it is used for a different reason, then it will obviously not perform consistently. So it needs a consistent layer with a set of rules, what it does and what it does not do, and that is what you check when you measure the performance of the model across teams and different datasets. Yeah. So one thing is, if you are working with different teams, they have to prompt-engineer the model according to their need and use case. The model is already on a cloud server, so they can access it. Every team has a different use case, so every team has to write proper prompt-engineering steps accordingly and hit the server. Secondly, different datasets. Obviously, every team has its own datasets. They can fine-tune the model and register it as a version. With custom access, they can create instruction datasets or something similar. And, thirdly, if they're not doing training, they can host their data in vector databases or semantic DBs. The data is put in their collections, the relevant context is retrieved and sent in the prompt, and then the model gives a good response. So you have a vector database. You have a generative AI model. You get the real context from the vector database, mix it with the question, send it to the generative model, and then you get the response. The model does not even need to change. Just the interaction is happening with multiple teams.
So it's all centered around the generative model, and the teams and datasets access it according to their respective needs. Here, the database is separate for everyone, or the collection is separate for everyone. And the use case is different for everyone, so they have to write their own steps of instructions. So it's all a combination of instructions, databases, and then the model.
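The shared-model, per-team-collection flow described above can be sketched with a toy in-memory collection. Bag-of-words cosine similarity stands in for real embeddings, and `echo_model` is a hypothetical stand-in for the call to the hosted generative model; none of these names come from a specific library.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class TeamCollection:
    """One collection per team, playing the role of a vector database collection."""
    def __init__(self):
        self.docs = []
    def add(self, text):
        self.docs.append((text, embed(text)))
    def search(self, question, k=1):
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def answer(question, collection, instructions, generate):
    """Retrieve context, mix it with the team's instructions and the question, call the model."""
    context = "\n".join(collection.search(question))
    prompt = f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

def echo_model(prompt):
    return prompt  # hypothetical stand-in for the shared generative model

support = TeamCollection()
support.add("refunds are issued within 14 days of purchase")
support.add("shipping takes 3 to 5 business days")
reply = answer("what is the refund window", support, "Answer briefly.", echo_model)
```

Each team keeps its own `TeamCollection` and its own `instructions`, while `generate` points at the same model for everyone, which is what keeps the model itself unchanged across teams.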
Pick a pre-trained model for a chatbot project and justify your choice. Okay. For a chatbot project, a very good Hugging Face model. Obviously, I would say there are many. We can use Mistral. We can use Llama 2. We can even use BERT. Why not? Many times, if the chatbot use case covers a very small domain or a small set of tasks, we can go for smaller models, because smaller models can be fine-tuned very easily and they're fast at inference. So small models are good for small, specific tasks. You don't need to wait, whereas when you hit a larger model, it takes time to infer. For a generic task, you need a bigger model. For a small task, small models are good. And sometimes, if we need specific rule-based understanding in the chatbot, you can use three to four models: a small intent classifier, a small entity classifier, and then a small dialogue generation model. So you can use three models and then combine them. If the use case is broader, you can use a small task-specific generation model like Llama 7B, Mistral 7B, or a 13B model, or even BERT or BigBird if fine-tuned. So that's a good choice. If it is a really broad use case, then we have bigger models like Llama 2 70B, a 70-billion-parameter model or something. Right? A model that big is very good at understanding and replying accordingly. There are also a lot of small distilled models, so we can use even those. They're very fast and small, with close to the same accuracy. Secondly, you can also apply quantization, for example 16-bit or 8-bit instead of 32-bit, which is faster. Because chatbots have to be fast. We can't wait for a response. It can't be sitting there generating; it has to be giving the response.
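To make the quantization point concrete, here is a minimal sketch of symmetric 8-bit quantization of a weight vector: each 32-bit float is replaced by an integer in [-127, 127] plus one shared scale. This illustrates the idea only; real LLM quantization schemes are more elaborate, and the function names and sample weights are assumptions.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy traded for a smaller, faster model.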
Propose an approach to fine-tune a GPT-2 model specifically for a client's domain-specific language. Okay, so it is quite straightforward. First, you need to have the domain knowledge, then you create the dataset for that particular use case, and then you use a GPT-2 model, where you have your input and output, or two inputs and one output, or whatever the design is. So it's input and output. You are using a decoder-only model. Okay. So you have to use the same embeddings. First, whatever input you have, tokenize and embed it with the GPT-2 tokenizer and embeddings, then give your input and output, and it trains on them one by one. Then, based on the responses or the loss, you can understand the performance of the model, and then you can retrain if needed. Right? So the fine-tuning approach is straightforward. Get the dataset, embed it, give the input and output in the right, embedded formats, and then train. A few things to make sure of: the data should be very good, and you have to choose the inputs carefully. You can even have five inputs. Right? The input and the output have to be related to each other. So, that is one step. And lastly, once the model is built, GPT-2 is basically next-word prediction, continuing the text one word at a time. It's causal language modeling. Right? So at every step, for every word, it is asked to produce the next word. So, that is the end-to-end approach. You can also do further pretraining if required, not just fine-tuning.
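The "next-word prediction" objective can be illustrated by how a single training example is laid out for a decoder-only model: the prompt and the expected output are concatenated, and the labels are the same sequence shifted by one position. The whitespace tokenizer, the tiny vocabulary, and the sample text below are assumptions for illustration; a real setup would use the GPT-2 tokenizer and its training utilities.

```python
def build_vocab(texts):
    """Toy vocabulary: one id per whitespace-separated token, plus an end marker."""
    vocab = {"<eos>": 0}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def causal_lm_example(prompt, completion, vocab):
    """Return (inputs, labels) where labels[i] is the token that follows inputs[i]."""
    tokens = prompt.split() + completion.split() + ["<eos>"]
    ids = [vocab[t] for t in tokens]
    return ids[:-1], ids[1:]

prompt = "Q: reset password A:"
completion = "open settings"
vocab = build_vocab([prompt, completion])
inputs, labels = causal_lm_example(prompt, completion, vocab)
```

At every position the model sees the tokens so far and is trained to predict the next one, so domain-specific prompt and completion pairs teach it the client's vocabulary and phrasing.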