Vetted Talent

Midhilesh Momidi

I am Midhilesh, a seasoned Data Scientist with over 5 years of experience crafting data-driven solutions, specializing in Regression Models and Data Mining Algorithms. Proficient in Python, MySQL, NLP, TensorFlow, and PySpark, I have also worked in Software Development, contributing to web applications for machine learning using Streamlit and Django. Personally, my curiosity extends to exploring diverse data types, staying updated on AI advancements, and actively participating in coding competitions on platforms like HackerEarth, CodeChef, and Kaggle. Excited about the prospect of contributing to your team and company, I am confident that my skills and passion align with your objectives.

  • Role

    Senior ML Engineer

  • Years of Experience

    10 years

  • Professional Portfolio

    View here

Skillsets

  • LangGraph
  • PySpark
  • NLP
  • PyTorch
  • MLOps
  • LLMs
  • Airflow
  • LangChain
  • Kubernetes
  • AWS
  • CICD
  • Fine Tuning
  • Flask
  • GPT
  • Hive
  • Python
  • LLAMA
  • Machine Learning
  • model optimization
  • MongoDB
  • NoSQL
  • Quantization
  • Recommender systems
  • Cassandra
  • Azure AI Search
  • Azure Document Intelligence
  • Azure Foundry
  • Microsoft Agent Framework
  • Agentic AI/RAG
  • Docker
  • OOP
  • FastAPI - 6 Years
  • Python - 9 Years
  • Deep Learning - 5 Years
  • Deep Learning - 7 Years
  • NLP - 5 Years
  • PySpark - 5 Years
  • AWS - 6 Years
  • Airflow - 5 Years
  • Hadoop - 5 Years
  • Kubernetes - 5 Years
  • MLOps - 5 Years
  • Transformers
  • MLFlow - 7 Years
  • PyTorch - 7 Years
  • Sagemaker - 5 Years
  • TensorFlow - 6 Years
  • LLMs - 2 Years
  • Spark ML

Vetted For

15 Skills
  • Machine Learning Engineer, AI/ML, Content (Remote) AI Screening
  • 68%
  • Skills assessed: Collaboration, Communication, Automated Testing, Data preprocessing, Deep Learning, Model evaluation, Natural Language Processing, PyTorch, Reinforcement Learning, TensorFlow, Good Team Player, NLP, Problem Solving Attitude, Python, Strong Attention to Detail
  • Score: 61/90

Professional Summary

10 Years
  • Sep, 2025 - Present 9 months

    Senior ML Engineer

    Microsoft
  • Nov, 2024 - Sep, 2025 10 months

    Senior ML Engineer

    Walmart
  • Oct, 2021 - Nov, 2024 3 yr 1 month

    ML Engineer II

    Dell Technologies
  • Mar, 2016 - Feb, 2021 4 yr 11 months

    Data Scientist

    TCS
  • Mar, 2021 - Sep, 2021 6 months

    Sr. Data Scientist

    Ernst & Young

Applications & Tools Known

  • MySQL
  • Git
  • Python
  • MongoDB
  • Visual Studio Code
  • Apache
  • PostgreSQL
  • REST API
  • JavaScript
  • AWS (Amazon Web Services)
  • Azure Machine Learning Studio
  • AWS Athena
  • Docker
  • Kubernetes
  • Apache Airflow
  • MLflow
  • Multithreading
  • Multiprocessing
  • OOP
  • Natural Language Processing
  • NLP
  • LLMs
  • Transformers
  • Neural Networks
  • Machine Learning
  • PySpark
  • Django
  • Recommender Systems
  • Hive
  • Hadoop
  • CICD
  • Flask
  • FastAPI
  • Redis
  • Apache Kafka
  • Celery
  • RabbitMQ
  • Pandas
  • Scikit-learn
  • Keras
  • Plotly
  • CloudWatch
  • Spark
  • Zephyr
  • AWS Glue
  • Beautiful Soup
  • Jenkins
  • Grafana
  • Splunk
  • Apache Cassandra
  • Camelot
  • S3
  • Azure Cognitive Services

Work History

10 Years

Senior ML Engineer

Microsoft
Sep, 2025 - Present 9 months
    Built a multi-agentic SRE system on Microsoft Agent Framework with fine-tuned, task-specific agents that autonomously triage incidents end to end: correlating Kusto telemetry and logs, executing remediation runbooks, raising ICMs with structured RCAs, and generating TSGs and bug-analysis dashboards for recurring fault codes, reducing manual intervention by 60%. Built Coolville Copilot, an AI assistant enabling firmware failure investigation and automated remediation across 12,500+ nodes daily, with Text-to-KQL translation for large-scale telemetry querying and agent-driven ICM creation for incident tracking. Developed a multimodal RAG system over PDFs, PPTs, flowcharts, ADO tasks, and knowledge articles using Azure AI Search with hybrid retrieval to map firmware fault codes to step-by-step resolution workflows for firmware engineers. Integrated evaluation frameworks within the CoreIQ AI platform to enable onboarding of AI agents and systematic benchmarking and deployment of Microsoft Foundry models.

Senior ML Engineer

Walmart
Nov, 2024 - Sep, 2025 10 months
    Developed a Multi-Agentic RAG architecture using LangGraph, with Redis for rapid agent-state retrieval and FastRTC for real-time voice-agent speech-to-text and text-to-speech, replacing the existing IVR call routing. Achieved 1-3 sec Time-to-First-Token (TTFT) for 90% of calls by caching FAQ responses, and eliminated redundant handoffs to human agents, delivering an estimated 200,000+ hours in annual operational savings. Developed a real-time monitoring and observability loop using Kafka to store telemetry data in BigQuery. Fine-tuned a Policy Recommender model using the Llama-2 13B model with Parameter-Efficient Fine-Tuning (LoRA), serving 2.3M associates, and optimized inference throughput using vLLM. Developed and deployed a scalable Translation App for Walmart store associates with 20,000+ daily active users, leveraging Azure Cognitive Services for real-time STT, TTS, and TTT translations and utilizing Kafka and WebSockets to achieve translation and transcription in 300 ms.
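The FAQ-response caching mentioned above can be pictured with a minimal sketch. This is not the production design: the profile names Redis for agent state, while this example uses a plain in-memory dictionary, and the class name `FaqCache`, its key normalization, and the `generate` callback are all hypothetical.

```python
class FaqCache:
    """In-memory stand-in for a response cache (assumption: the real system used Redis)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(question):
        # Normalize case and whitespace so trivially different phrasings share one entry.
        return " ".join(question.lower().split())

    def get_or_generate(self, question, generate):
        """Return a cached answer, calling `generate` only on a cache miss."""
        key = self._key(question)
        if key not in self._store:
            self._store[key] = generate(question)
        return self._store[key]
```

A cache hit skips the model call entirely, which is one plausible way a cached FAQ answer keeps time-to-first-token low.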

ML Engineer II

Dell Technologies
Oct, 2021 - Nov, 2024 3 yr 1 month
    Led a team of 4 engineers to build a production RAG pipeline serving 20,000 requests/day across stateless Kubernetes pods, with Redis-based conversation management and session persistence to BigQuery. Implemented hybrid retrieval using all-mpnet-base-v2 with Postgres pgvector (IVF + HNSW indexing) and metadata filtering for access control, ingesting 100K+ docs from Jira, Confluence, and ServiceNow. Implemented real-time evaluation using cosine threshold filtering and a DeBERTa NLI-based Reflection design pattern (20 ms) for iterative response validation against retrieved context; this replaced Ragas, saved 5x LLM calls for evaluation, cut evaluation latency, and returned responses within 2 sec. Designed an async evaluation pipeline using Ragas with Zephyr 7B on daily query samples from BigQuery, with a user feedback loop through Kafka powering Grafana dashboards for continuous retrieval and generation quality monitoring. Reduced latency from 3-4s to 1-2s through vLLM serving with continuous batching and PagedAttention, AWQ for Llama2-13B, and document-type-specific chunking. Developed an end-to-end machine learning model for personalized recommendations for 30 million users across US, UK, Canada, and APJ, recommending 5-8 relevant cross-sell and upsell items with under 400 ms latency at peak time and enhancing product sales by 16% in the UK. Developed embeddings for Order Codes and SKUs using a multi-stage retrieval strategy with a Two Tower Network architecture and a multi-task learning ranking model with cross-encoder re-ranking, leveraging multi-GPU training using DDP to accelerate training on large datasets. Developed data pipelines for retrieving data from Hadoop and Greenplum, using PySpark and Airflow to schedule daily data synchronization tasks and update the Feature Store (Feast) online and offline tables for the ranking model.
Leveraged Redis as an in-memory database for low-latency retrieval of real-time predictions, and designed fallback strategies (popularity-based, item-based) to handle cold-start scenarios for new users and products. Implemented sequence-based deep learning models using LSTM to predict the next best action for product brand categories, leveraging multi-GPU training using PyTorch to handle large-scale data efficiently. Utilized AWS Glue crawlers to manage and discover data stored in S3, enabling seamless integration with downstream machine learning workflows. Developed a multiclass classification model using XGBoost to discern user intent (whether visitors to the site came to shop, browse, or seek support), enhancing user experience and site engagement. Leveraged SageMaker endpoints to serve batch predictions, allowing for scalable and efficient deployment of machine learning models.
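The popularity-based cold-start fallback described above might look roughly like the following sketch. The function name `recommend`, the dictionary shapes, and the default `k` are illustrative assumptions, not the system's actual interfaces.

```python
from collections import Counter


def recommend(user_id, personalized, purchases, k=5):
    """Return personalized items when available, else the most popular items.

    personalized: dict mapping user_id -> ranked list of item ids (hypothetical shape)
    purchases:    dict mapping user_id -> list of purchased item ids (hypothetical shape)
    """
    recs = personalized.get(user_id)
    if recs:
        return recs[:k]
    # Cold start: the user has no personalized list, so fall back to global popularity.
    popularity = Counter(item for items in purchases.values() for item in items)
    return [item for item, _ in popularity.most_common(k)]
```

An item-based fallback would follow the same shape, swapping the popularity counter for a lookup of items similar to whatever the new user is currently viewing.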

Sr. Data Scientist

Ernst & Young
Mar, 2021 - Sep, 2021 6 months
    Developed an ETL pipeline using PySpark to process millions of rows from sources such as invoices (extracted with Camelot), Greenplum, and Hadoop, with Airflow scheduling the execution of Spark jobs. Developed a customizable analytics dashboard using Plotly and integrated it into a backend web application using Django, enhancing data visualization and user interaction. Created a data pipeline to extract invoice information from PDFs using Camelot, and subsequently developed a web application using Django to provide a user-friendly interface for accessing and managing the extracted data.

Data Scientist

TCS
Mar, 2016 - Feb, 2021 4 yr 11 months
    Developed a topic modeling algorithm using Latent Dirichlet Allocation (LDA) to categorize 900 million in-house IT tickets. Implemented with distributed ML library Spark MLlib, this system efficiently routes tickets to the appropriate agents, improving service level agreement (SLA) response times from 3 days to 1 day. Developed classification models to predict room types, hotel categories, discounts, and rewards for users in the hospitality industry, significantly enhancing guest experiences. Reduced the latency of I/O operations for reading from S3 buckets by implementing multithreading and caching strategies, streamlining data access and processing.
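The multithreading-plus-caching approach to I/O latency can be sketched as below. The `_slow_read` function is a hypothetical stand-in for an S3 read (no AWS client is used here), and the worker count and cache size are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def _slow_read(key):
    """Hypothetical stand-in for a blocking object-store read."""
    return f"contents of {key}".encode()


@lru_cache(maxsize=1024)
def cached_read(key):
    # Repeated requests for the same key are served from memory.
    return _slow_read(key)


def read_many(keys, max_workers=8):
    """Read several keys concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cached_read, keys))
```

Threads suit this workload because the time is spent waiting on the network rather than computing, so the reads overlap instead of queuing.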

Achievements

  • Implemented sequence based deep learning models to predict the product brand category, increasing sales by 16%
  • Developed end to end ML model for Cross Sell Recommendations for 30M users
  • Part of the Content Modernization team at Dell
  • Built RAG pipeline for Dell Chat application
  • Developed auto tagging with GPT-3.5 Turbo model
  • Developed customized LangChain agents
  • Applied Double Machine Learning techniques for product pricing
  • Implemented multiclass classification model for user intent prediction
  • Built data pipelines for daily data sync up
  • Built GitLab CI/CD pipeline achieving 90% Dell DevOps Maturity Score

Major Projects

3 Projects

Translation App for Walmart Associates

    Scalable app for real-time translations (speech/text) using Azure Cognitive Services, Kafka, and WebSocket.

Cross-Sell Recommendation System

    Developed personalized machine learning model recommending SKUs, improving user engagement and sales across regions.

RAG Pipeline for Dell Chat

    Enhanced content generation, embedding retrieval, and latency optimization for Dell's Chat system using Kubernetes, Postgres, and Kafka.

Education

  • B. TECH

    JNTUA (2015)
  • 12th

    Sri Chaitanya Jr. College (2011)
  • SSC

    Sai Vidyanikethan EM School

Certifications

  • Machine Learning

    Coursera (May, 2019)
    Credential ID: RN4XAC8JJN76
  • Python Problem Solving

    HackerRank (Sep, 2020)
    Credential ID: CE2A871B5406

Interests

  • Adventure Activity
  • Watching Movies
  • Long Rides
  • Driving
  • Bike Rides
  • Chess
  • Outdoor Sports

AI-interview Questions & Answers

    Yeah, I completed my graduation in 2015 and in 2016 I joined TCS, and I was part of TCS for about 5 years. There I worked on NLP topic modeling using LDA. So I used Spark MLlib for training and validating, and created endpoints using Lambda. I used SageMaker for everything, and for data transformations I used AWS Athena. Later I worked on a hospitality project, a predictive modeling project where I try to predict what kind of hotel, what kind of room, and basically the user behavior, whether he will take the discount coupons or not. So that's one project. Later I worked at EY for a very short amount of time, and there I worked on data engineering and a little bit of backend engineering, working on data pipelines and extracting data from the invoices which they provided. And currently I'm working at Dell Technologies, where I've been working on cross-sell and upsell recommendations, like when an item is added to the cart, it also says that people who bought this also bought this, that kind of scenario. And next best action using LSTM, Siamese Networks, and Transformers, like what kind of brand or category the user is actually going to buy next: if he has visited different webpages on the dell.com website, then what is the next best item he is actually going to buy. The other one was an intent model, a classification model, so here we used XGBoost classification to analyze what kind of user he is, whether he is coming to shop or whether he is coming to browse. Shopping is purchasing some items on the website; browsing is basically like he is looking for some services regarding his laptop. And from the past, I've done the entire end-to-end: I built data pipelines, model pipelines, inference pipelines, created roadmaps, ran illustrations to the stakeholders.
    So Docker, Kubernetes, MLflow for model versioning, Airflow for scheduling, all these things I generally use for the projects I've been working on here. And for the past 6-7 months, I have been working on LLMs, basically building the RAG pipelines for content ingestion. I've started using Apache Kafka recently, but the remaining parts are creating embeddings and chunks using LangChain recursive text splitters and other NLP preprocessing techniques, creating tags using GPT-3.5 Turbo, and using open-source LLMs which are already hosted on our on-prem servers, like Llama 2 and Zephyr models, which I am currently using, and I am about to test Llama 3 as well. So I created a pipeline for this: all the data is loaded into the DL, then chunking strategies are applied and embeddings are created. These vectors are loaded into a PostgreSQL DB, and from there, while retrieving, I use chain-of-thought style prompt engineering, testing it and making it better. And also I am part of the team that provides SDKs for the feature store, so this is my entire experience.

    Let's say we have a multilingual dataset. It definitely needs to be categorized by which languages are in it, so we split the dataset by language, because preprocessing built for one language will never work as-is for the others. In most languages the punctuation is largely the same, so I would go for standard text preprocessing techniques: removing punctuation, removing stop words using NLTK, phrase matching using spaCy. Maybe LLMs offer much better options for this, but I haven't exactly worked on this in the past, so let me take a scenario. Okay, what works for English I think also works for other languages, so let me give a step-by-step answer. For text preprocessing we can use text normalization: removing punctuation, lowercasing, standardization. Then we can tokenize, splitting the text into words and splitting huge paragraphs into sentences. Sometimes this tokenization is language specific, so different languages will probably need different techniques, and I would check whether any LLMs handle this better. Part-of-speech tagging is different and named entity recognition is different here, so find out which language it is and its encoding, and maybe check some pre-trained models for these tasks. For that I can do some checks like tags and phrase matches. So basically, whatever we do in English we can try to emulate in other languages. Chunking, splitting, lemmatization, and stemming are mostly different in the details, but the overall idea is the same.
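The normalization and tokenization steps described in this answer can be sketched with the standard library alone. This is a minimal illustration, not the NLTK or spaCy pipeline the answer mentions; the function names are hypothetical, and whitespace splitting is an assumption that only holds for languages that separate words with spaces.

```python
import unicodedata


def normalize(text):
    """Unicode-normalize, case-fold, and strip punctuation from any script."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Unicode categories starting with "P" cover punctuation across languages.
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def tokenize(text):
    """Whitespace tokenization after normalization."""
    return normalize(text).split()
```

Languages written without spaces between words, such as Chinese or Japanese, need a dedicated segmenter, which is the language-specific tokenization concern raised in the answer.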

    Reinforcement learning I haven't used, I'm not very well aware of reinforcement learning in my experience. I haven't worked on that, so I cannot comment much on that part. But one place where we can use reinforcement learning is the Dell chat application we are trying to build. For that, we are creating embeddings and loading them into Postgres pgvector. Once this application has been taken to production, users might ask the chatbot, "My laptop is getting this kind of issue, so what should I do?" and it will provide some answers. Sometimes the user might be satisfied with the answers and sometimes not. So based on that, we can ask the user to provide feedback, provide a review, whether the answers are actually what they were looking for, did we solve the problem? If we put these kinds of questions, we can use that as a chain of thought and send some kind of input and review to LLMs. So, at that point, if we add a reinforcement learning structure to that problem, I think it can understand, just like how we use in chat, the user can say, "Your answer is wrong, your answer is not exactly what I'm looking for." If we add these kinds of signals to the problem, we can use reinforcement learning at this point. The Dell chat application belongs to our company's Content Modernization team. So ideally, at some point, reinforcement learning has to be added to that part. But as of now, I cannot answer this question more clearly. I know the idea of reinforcement learning can be used here, RLHF, so this can be done. But I cannot give more details on this part because I don't have experience with reinforcement learning. The idea behind it I can tell, but I cannot go into the contents.

    Yeah, knowledge graphs are actually very useful for when there are lots of summaries involved in travel websites like TripAdvisor or Booking.com or Expedia, these kinds of travel aggregators or maybe Trivago. So when users give lots of reviews, we can generate tags from that. So let's say you have a knowledge graph. In the knowledge graph, we can mention the tags. Each tag can be a relationship between one point to another point. Let's say if two users have given feedback, we can find how these two users are similar. So based on the keywords they have used, based on the content they have used, what is the polarization of the summary they have given the feedback. So based on all these factors, if we create a knowledge graph in an embedded space like a huge dimensional space. Once a new user comes to the website and asks about like the best hotels on the beach side view or something like that. I'm looking for something with this kind of facilities. So based on the feedbacks provided by the users, the points on the embedding space calculate. The backend calculation of knowledge graphs are probably DFS and BFS. So based on those calculations, where can this query be embedded into the knowledge graph? This creates a relationship between the points in the knowledge graph. I'm not exactly sure what the backend algorithm is that calculates this part, I haven't worked on that. I've been working on the vector embeddings more than the knowledge graph part. But ideally, I think this is how we can use the knowledge graph in AI development. Now, users keep commenting, keep asking the queries, keep asking answers, and posting their reviews. So based on those reviews, especially the tags, the metadata, what is the summary? What kind of words are in this summary? So based on all those points, we can find some content. Let's say two words are almost similar in word to vector embeddings. In the knowledge graph, that space is there. 
So how can we connect this summary to that summary? Which is the closest one? So whatever is the closest one, that review we will get, we will see the user can see. That's how I think TripAdvisor also shows the reviews if I ask about some point, it automatically gives summaries, feedbacks, and what other people give. And also in a concise manner without deleting the context of the user. TripAdvisor actually provides us the content for this as well, and knowledge graphs can be useful.
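The answer guesses that BFS or DFS sits behind knowledge-graph lookups. As a small illustration of that idea only, here is a breadth-first search that counts hops between two nodes in a tag graph. The adjacency-dictionary representation and the function name are assumptions, and real graph stores use more elaborate query engines.

```python
from collections import deque


def bfs_distance(graph, start, goal):
    """Return the number of edges on the shortest path, or None if unreachable.

    graph: dict mapping a node to an iterable of neighbouring nodes (hypothetical shape).
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

Fewer hops between a query tag and a review tag can then serve as a rough closeness signal alongside embedding similarity.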

    Transformer-based model for language translation. Language translation, got it, you give one set and get another set, so basically this is an encoder-decoder model as the starting step. The core idea behind the transformer is, of course, the self-attention mechanism, which allows inputs to interact with each other and weighs the significance of each input based on its position in the sequence. So first, I will probably go for data preparation, where each dataset contains lots of samples, and probably use libraries in PyTorch or maybe TensorFlow, it doesn't matter which framework. So basically, the model generally consists of several layers; if I want to use a transformer model, the first step is creating embedding layers, which convert inputs to tokens, then positional encoding, which adds positional information to the input embedding, since transformers do not have recurrent layers. Then the encoder will be there, which is composed of a stack of similar layers with sublayers of attention mechanisms and feed-forward neural networks. After that, the decoder; whatever is happening in the encoder, almost the same will happen in the decoder, but with an additional sublayer that actually performs self-attention or multi-head attention over the encoder's output. So the final layer is the decoder output, which decodes the output to the size of the vocabulary. So I think this is how we try to, on average, do anything on a high level, actually not on a high level. So the loss function can be cross-entropy loss, since it is a classification type, and the optimizer can be AdaGrad or Adam or RMSprop. Then do some evaluation and testing based on a separate test set, like taking different samples; if you are doing Spanish to English, take different Spanish texts that have not been seen and work on that.
    So a metric like the BLEU score or something, which actually assesses this.
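To make the self-attention step concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is deliberately simplified: queries, keys, and values are all the raw input vectors, with no learned projection matrices, no multiple heads, and no masking, all of which a real translation model would have.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """x: list of token vectors of equal length. Uses Q = K = V = x (simplifying assumption)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out
```

Each output row is a weighted mix of all input rows, which is how every token gets to "interact" with every other token in a single step.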

    Generally, overfitting can be addressed. So, generally, the overfitting can actually be addressed based on some kind of scenarios where if you have more data, try to reduce the data a little bit less. So, or sometimes the model might have been learning too much complexity and too many patterns in the data. So try to reduce some of the complexities, use dropout regularization parameters so that it cannot learn entire data, so that way we can prevent overfitting. I am saying this in general, not just in personalized recommender systems. So, maybe L1, L2 regularization, monitoring the validation loss, and doing some hyperparameter tuning are some of the techniques we generally use on a machine learning model. But specifically for recommending personalized travel itineraries, the best idea is actually to understand the metrics and understand what kind of predictions we are actually giving. So based on that, we can be able to once the model is deployed, get data from the product analytics team, get data from the consumption analytics team, and see what kind of recommendations we are actually giving. So are the users satisfied with that? So we can ask some user annotations, and users can give some kind of answers to us. So based on that, we can do these kinds of things for personalized travel itineraries, anything mostly based on recommendations. Why? Because I told to deploy the basic model first and start building from there, because user engagement will be much different than whatever we see. So based on that, if you want to train a model, whatever happens in the previous thing, previous years might not be the same at the current time. This is not some kind of a different ML model project, like describing patterns and all, but here it actually changes a lot. So, that's why we deploy the model, get some analytics, figure out what has gone wrong, whether the data drift is too much. That is one point. 
So that way we can reduce overfitting and use the normal overfitting techniques we use, like regularization, dropout layers, feature selection, feature engineering, reduce some of the complexities, if there is any multicollinearity between features, reduce these things.
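Monitoring the validation loss, one of the techniques listed above, is commonly paired with early stopping. The answer does not name early stopping, so treat the sketch below as one illustrative way to act on that monitoring; the function name, `patience`, and `min_delta` are assumptions.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. If that never happens, the
    last epoch index is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1
```

A validation loss that rises while the training loss keeps falling is the usual sign that the model has started memorizing the training data.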

    So, I am not exactly sure what "select k best" actually is, probably I am thinking of "select k best" which is based on the chi-square test for statistical relation. Sometimes this is not exactly good, but you cannot say that it will always have a negative impact on model evaluation, because it also sometimes provides positive results. Along with that, you actually have to check whether there is too much of a relationship between the other variables. For example, if there is too much collinearity between the variables, we can reduce some features, which we can do using the variance inflation factor for the regression variables, the continuous variables. If two variables have almost similar importance to the target variable, I think we can delete one of them. So, in that way we can also do it, and doing it in this manner probably works, but we cannot say exactly that this is going to give the wrong answer. So, the metric given to us is an accuracy score. I assume that for this project the accuracy score works, but ideally, this may not work. So, that is one step, and also for the feature selection part, we can use LASSO regression, which actually tries to reduce the number of features, so we can use that as well. As for a negative impact on the model, if this gives a negative impact, it is mostly because you lose a lot of information from the other features, which may be very helpful for the model. So, in that way also, this might give a negative impact, but in general, we cannot say exactly what kind of negative impact, and how much, it will have on the model evaluation part. And one more thing is that data leakage might be an issue here, because we are using interest splitting at the chart itself.
    So, do not validate in that case, do not validate on the seen data, in this next test. So, do your validation on the unseen data, which has not been seen by this kind of feature engineering. Only when we test against the outside world can we understand how much negative impact it has. As of now, we cannot completely assess how much impact it has, but there might be an issue, because we are not considering other features.
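The leakage concern raised here comes down to fitting the feature selector on training rows only and then applying the chosen columns to held-out rows. The sketch below shows that pattern with a simple stand-in score: it ranks features by absolute Pearson correlation rather than the chi-square test mentioned in the answer, and all function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_k_best(train_rows, train_target, k):
    """Pick the k column indices most correlated with the target, using training data only."""
    n_features = len(train_rows[0])
    scores = [
        abs(pearson([row[j] for row in train_rows], train_target))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def apply_selection(rows, columns):
    """Keep only the selected columns; use the same columns for train and test rows."""
    return [[row[j] for j in columns] for row in rows]
```

Because the test rows never influence which columns are kept, the evaluation on them stays an honest estimate of performance on unseen data.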

    Okay, the best one is MLflow, or some people may use Kubeflow. So MLflow, in my view, is mostly the best one, and it is open source. It can also be seamlessly integrated with different cloud environments, such as AWS, Azure, Databricks, and GCP. There are multiple data scientists, and I am the one who is actually building the platform for our team to use MLflow. So I'm writing code which basically works for any kind of project in MLflow: they have to use this code, and they have to use only these functions. You can use any other functions, but finally call your function in one of the MLflow operations functions which we are providing. So version control is the best part. Once your code has been pushed to the repo, the CI/CD pipeline will automatically run the code and find the best model based on the hyperparameters, which runs in the background. That model will be stored in MLflow as the production model, not the archived one. So whatever the user-defined metric is, let's say the best precision: if I'm running five test runs on my experiment, meaning five models, let's say I'm running linear regression with five hyperparameters and logistic regression with five hyperparameters, and again XGBoost with some five hyperparameters, or maybe deep learning. Based on all these, we pick whichever gives the highest accuracy or precision or recall or whatever user-defined metric we use, especially for recommendation kinds of projects, like click rate, conversion rate, click-to-purchase conversion rate, these kinds of things. So whichever is the best model will go to production. In this way, any other data scientist, or multiple data scientists in the future, can work on the models which have been stored in MLflow.
So we all should have one MLflow instance, almost all the data scientists who are using, even for different projects, they can pull the model, fetch the model, or they can log the model. They can use the previously loaded model somewhere as they specifically find, they can use that model and judge that model. And they can collaborate with different people, especially, I mean, like in other scenarios, MLflow should be the best option for version control for model part. But for code part, I think GitLab should be the best choice, as I'm working on that. There are different techniques like probably Jenkins and things also we use with GitHub, but currently, I'm using GitLab, which has everything inbuilt for CI, CD, no need to go for some other tool for CD, you can have everything for CI, CD in GitLab itself. So based on this, multiple data scientists will collaborate with different teams, sorry, projects and models, mission and project.
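The "best run goes to production" step can be sketched without any tracking library. The example below is plain Python and does not call the MLflow API; the run dictionaries, their keys, and the function name are hypothetical shapes chosen for illustration.

```python
def pick_best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value for `metric`.

    runs: list of dicts such as {"name": ..., "metrics": {"precision": 0.9}} (hypothetical shape).
    Runs that did not log the metric are ignored.
    """
    candidates = [run for run in runs if metric in run["metrics"]]
    if not candidates:
        raise ValueError(f"no run logged the metric {metric!r}")
    chooser = max if higher_is_better else min
    return chooser(candidates, key=lambda run: run["metrics"][metric])
```

In a CI/CD job, the selected run would then be the one promoted in the model registry, with the metric name supplied per project (for example a click-to-purchase conversion rate for recommendation work).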

    As I said, I have not been able to work on this type of project, but I have worked on reinforcement learning before. Since this is a chat-like, conversational AI approach, we can push our models to the repo. Sorry, I got confused there. So, the user will interact with the website and keep asking questions, getting responses, and saying this is wrong, this is right. Whatever the user marks as wrong or right, the back-end has to take in that feedback efficiently. Once the user starts a session, the responses I generate have to be kept in a cache, maybe Redis or something, because there is a token limit: 1 million tokens, 16,000, or 70,000, whatever the amount is, let's say 100,000 tokens. 100,000 tokens is a very large amount, because the back-end keeps summarizing this content with an LLM. We can build a reinforcement learning system there: when the user says a response is wrong, it immediately has to retrain and re-check, go back to the RAG pipeline built in the back-end, retrieve some more, and do a bit more work, like changing the temperature for the query. Increase top-k as well: instead of the top 3 documents, make it 4 or 5. More content will be retrieved. From that, we try to get the best response based on some kind of score. I do not remember exactly what kind of score it is, but based on that score you analyze each of the responses generated from what the RAG pipeline retrieved, and you give the user whichever one is best. If he again says it is wrong, go back and ask him what exactly he is looking for, if he keeps rejecting the answers. Based on the keywords he sends, take that additional information as new input. This input will also carry keywords, tags, summaries, and polarity (positive and negative), everything.
So based on that, I can now retrieve again, adding this to the previous responses. The previous responses should of course be in the cache, because I have to know that the previous response was a wrong one before I make the next choice. I have not actually worked on reinforcement learning, but I think this is the spot where we could use it.
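
A minimal sketch of the feedback loop described above, assuming a toy keyword-overlap score in place of a real retriever and an in-process object in place of Redis. It is not a reinforcement learning system; it only shows the "cache the rejected answer, widen top-k, re-rank" step. `DOCS`, `score`, `retrieve`, and `FeedbackSession` are hypothetical names.

```python
DOCS = [
    "reset your password from the account settings page",
    "account locked after too many failed password attempts",
    "request access to a new application through the portal",
    "vpn setup guide for remote workers",
    "expense report submission deadline",
]


def score(query, doc):
    """Fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def retrieve(query, top_k):
    """Return the top_k documents, best score first."""
    ranked = sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


class FeedbackSession:
    """Per-user state that a cache such as Redis would hold in practice."""

    def __init__(self, top_k=3, max_top_k=5):
        self.top_k = top_k
        self.max_top_k = max_top_k
        self.rejected = set()   # answers the user marked as wrong
        self.history = []       # (query, answer) pairs already served

    def answer(self, query):
        candidates = [d for d in retrieve(query, self.top_k) if d not in self.rejected]
        if not candidates:
            return None  # nothing left: ask the user a clarifying question
        best = candidates[0]
        self.history.append((query, best))
        return best

    def mark_wrong(self):
        """Remember the rejected answer and retrieve one more document next time."""
        _, last_answer = self.history[-1]
        self.rejected.add(last_answer)
        self.top_k = min(self.top_k + 1, self.max_top_k)


session = FeedbackSession()
first = session.answer("password account locked")
session.mark_wrong()
second = session.answer("password account locked")
```

Returning `None` when every candidate has been rejected is the point at which the system would go back and ask the user what exactly they are looking for.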

    I have done a lot of this, for example on one of the NLP projects I deployed at TCS. It uses topic modeling, and the idea is this: once the user raises a ticket, the system automatically creates tags for the ticket. These tags, which are called topics in LDA, are used to automatically route the ticket to the right agent. As a result, the business SLA has been reduced from 3 to 5 days for this project. The issue we faced was with tickets like "my Ultimatix has been blocked" versus "my Ultimatix access has been blocked." Ultimatix is a keyword for a TCS website. The system thinks this person's account got locked, so it automatically sends the ticket to the Ultimatix team, but actually he is asking about access: his credentials are fine, but his access is broken. To address this, I adjusted the machine learning model to use bi-grams, tri-grams, and phrase matches instead of 1-grams. This way, whether it is a positive or a negative scenario, the system can understand what exactly the user is asking about. Based on our first requirements, we pushed these changes. Initially we were getting correct results for only 33% of queries, and now we are getting good results for around 60-70%. There are many situations like this where we have to adapt to the business environment. For example, take the cross-service recommendations I have been working on. I initially set up the entire pipeline, and since the pipeline works well and is scalable and reliable, we were able to scale to Canada, the UK, EMEA and other European countries, and APJC as well. In Canada and the UK, we were able to deploy the models within 14 days, or two weeks. To accommodate changing business requirements, we should have a robust pipeline in place, so that once something new comes up we can easily deploy it. I believe that is the way to do it.
I have even modified the machine learning model to add blacklist items, applying filters to sort out the reviews and the feedback so that it finally gives the right predictions for the cross-service recommendations. I have also used rule-based techniques, for example in the final scoring part of the recommendation system. Different projects obviously have different business requirements, so I have used that approach there.
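
A minimal sketch of why bi-grams and tri-grams help with the ticket example above. It is plain rule-based phrase matching, not LDA, and the routing table is hypothetical; `portal` stands in for the product keyword from the tickets. Longer phrases are checked first, so "portal access" wins over the bare keyword "portal".

```python
def ngrams(tokens, n):
    """All contiguous n-word phrases in the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical routing rules: phrase -> queue name.
PHRASE_ROUTES = {
    "portal access": "access-management",
    "portal": "account-unlock",
}


def route(ticket, max_n=3, default="general"):
    """Route by the longest matching phrase: tri-grams, then bi-grams, then 1-grams."""
    tokens = ticket.lower().split()
    for n in range(max_n, 0, -1):
        for phrase in ngrams(tokens, n):
            if phrase in PHRASE_ROUTES:
                return PHRASE_ROUTES[phrase]
    return default


blocked_account = route("my portal has been blocked")
blocked_access = route("my portal access has been blocked")
```

With 1-grams only, both tickets would match `portal` and land in the same queue; checking the bi-gram first separates them.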

    Yeah, for this part we can have integration tests and unit tests built entirely into the code pipeline itself, using CI/CD tools. For every piece of code you create, try to write a unit test. That way, code coverage is taken care of. Every organization has DevOps maturity scores, and we have them too. Based on that, you have to report code coverage, and code coverage comes from the unit test cases. Unit tests have to run before the model is pushed to production, to MLflow, and so on. Check for problems in the data as well, such as whether data distortions are currently occurring. For this, we can use DVC for the data engineering and data pipelines, and for automated testing we can use integration tests. An integration test does not run on the entire data. It only checks whether the pipeline works from start to end: whether the unit test cases run properly, whether code coverage is there, and whether it properly creates the Docker image, creates the secrets in Kubernetes, and hosts the API. All of these can run as part of the integration tests. For a machine learning pipeline, we can check whether it trains, whether it loads the results to the DB, whether it loads the model to MLflow, and, if it is not a batch pipeline, whether inference is fast enough. We can test all aspects of the machine learning pipeline. The main idea is to use integration tests to run the entire pipeline and check that it works, and only then move to the next stage, which is deploying to production, to the Kubernetes cluster, or whatever stages your machine learning CI/CD pipeline has. This is how we can ensure data integrity as well as the health of the entire pipeline.
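
A minimal sketch of the kind of integration (smoke) test described above: it runs the whole pipeline on a small sample rather than the full data and checks that each stage produced its output. The pipeline itself is a toy (a mean predictor and a dict acting as the result store), and all function names are hypothetical.

```python
def validate(rows):
    """Data check: report rows with missing feature or target values."""
    problems = []
    for index, row in enumerate(rows):
        if row.get("x") is None or row.get("y") is None:
            problems.append(f"row {index}: missing value")
    return problems


def train(rows):
    """Toy model: predict the mean of the targets."""
    targets = [row["y"] for row in rows]
    return {"mean": sum(targets) / len(targets)}


def predict(model, row):
    return model["mean"]


def run_pipeline(rows, store):
    """Validate, train, and write the model and predictions to the store."""
    problems = validate(rows)
    if problems:
        raise ValueError(problems)
    model = train(rows)
    store["model"] = model
    store["predictions"] = [predict(model, row) for row in rows]
    return store


def smoke_test(sample):
    """Run the pipeline end to end on a small sample and check every stage's output."""
    store = {}
    run_pipeline(sample, store)
    assert "model" in store, "training did not store a model"
    assert len(store["predictions"]) == len(sample), "missing predictions"
    return True


SAMPLE = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]
smoke_ok = smoke_test(SAMPLE)
```

In a CI/CD pipeline, a check like `smoke_test` would gate the deployment stage, so a failure stops the model from moving on to production.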