Vetted Talent

Midhilesh Momidi

I am Midhilesh, a seasoned Data Scientist with over 5 years of experience crafting data-driven solutions, specializing in regression models and data mining algorithms. Proficient in Python, MySQL, NLP, TensorFlow, and PySpark, I have also worked in software development, contributing to machine learning web applications built with Streamlit and Django. Personally, my curiosity extends to exploring diverse data types, staying up to date on AI advancements, and actively participating in coding competitions on platforms like HackerEarth, CodeChef, and Kaggle. I am excited about the prospect of contributing to your team and company, and confident that my skills and passion align with your objectives.

Role
Senior ML Engineer
Years of Experience
9.6 years

Skillsets

Flask
Kubernetes
Python
PySpark
NLP
PyTorch
MLOps
LLMs
Airflow
LangChain
AWS
CICD
Fine Tuning
GPT
Hive
LangGraph
LLAMA
Machine Learning
model optimization
MongoDB
NoSQL
OOPs
Quantization
Recommender systems
Docker
Agentic AI/RAG
Cassandra
TensorFlow - 6 Years
Python - 9 Years
Deep Learning - 5 Years
Deep Learning - 7 Years
NLP - 5 Years
PySpark - 5 Years
AWS - 6 Years
Airflow - 5 Years
Hadoop - 5 Years
Kubernetes - 5 Years
MLOps - 5 Years
Transformers
MLFlow - 7 Years
PyTorch - 7 Years
Sagemaker - 5 Years
FastAPI - 6 Years
LLMs - 2 Years
Spark ML

Vetted For

15 Skills

Roles & Skills
Results
Details

Machine Learning Engineer, AI/ML, Content (Remote)
AI Screening
68%

Skills assessed: Collaboration, Communication, Automated Testing, Data preprocessing, Deep Learning, Model evaluation, Natural Language Processing, PyTorch, Reinforcement Learning, TensorFlow, Good Team Player, NLP, Problem Solving Attitude, Python, Strong Attention to Detail
Score: 61/90

Professional Summary

9.6 Years

Nov, 2024 - Sep, 2025 10 months
Senior ML Engineer
Walmart
Oct, 2021 - Nov, 2024 3 yr 1 month
Senior ML Engineer
Dell Technologies
Mar, 2021 - Sep, 2021 6 months
Sr. Data Scientist
Ernst & Young
Mar, 2016 - Feb, 2021 4 yr 11 months
Data Scientist
TCS

Applications & Tools Known

MySQL
Git
Python
MongoDB
Visual Studio Code
Apache
PostgreSQL
REST API
Javascript
AWS (Amazon Web Services)
Azure Machine Learning Studio
AWS Athena
Docker
Kubernetes
Apache Airflow
MLFlow
Multithreading
Multiprocessing
OOPS
Natural Language Processing
NLP
LLMs
Transformers
Neural Networks
Machine Learning
Pyspark
Django
Recommender Systems
Hive
Hadoop
AWS
Airflow
CICD
Flask
FastAPI
Redis
Apache Kafka
Celery
RabbitMQ
Postgres
Pandas
Scikit-learn
Keras
Plotly
CloudWatch
Spark
Zephyr
AWS Glue
Beautiful Soup
Jenkins
Grafana
Splunk
Apache Cassandra
Camelot
S3
Azure Cognitive Services

Work History

9.6 Years

Senior ML Engineer

Walmart

Nov, 2024 - Sep, 2025 10 months

Developed a multi-agent RAG architecture using LangGraph, with Redis for fast agent-state retrieval and FastRTC for real-time voice-agent speech-to-text and text-to-speech, replacing the existing IVR call routing. Achieved a 4-7 s time-to-first-token (TTFT) for 80% of calls by caching FAQ responses, and eliminated redundant handoffs to human agents, delivering an estimated 200,000+ hours in annual operational savings. Developed a real-time monitoring and observability loop using Kafka to store telemetry data in BigQuery. Fine-tuned a policy recommender model on Llama-2 13B with parameter-efficient fine-tuning (LoRA), serving 2.3M associates, and optimized inference throughput using vLLM. Developed and deployed a scalable translation app for Walmart store associates with 20,000+ daily active users, using Azure Cognitive Services for real-time text-to-speech (TTS), speech-to-text (STT), and text-to-text (TTT) translation, with Kafka and WebSockets.
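The entry credits FAQ response caching for the TTFT figure. The sketch below illustrates only the general idea: a minimal in-process time-to-live cache placed in front of a generation call, assuming questions can be matched after simple case and whitespace normalization. `FAQCache`, `get_or_generate`, and the `generate` callback are hypothetical names; a shared store such as Redis (which offers per-key expiry via `SET key value EX seconds`) would normally replace the in-process dictionary.

```python
import time
from typing import Callable


class FAQCache:
    """Hypothetical in-process TTL cache for FAQ answers (stand-in for a shared store like Redis)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock so expiry can be tested deterministically
        self._store: dict[str, tuple[float, str]] = {}  # key -> (expires_at, answer)

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share one key.
        return " ".join(question.lower().split())

    def get_or_generate(self, question: str, generate: Callable[[str], str]) -> tuple[str, bool]:
        """Return (answer, cache_hit); on a miss or an expired entry, call `generate` and store it."""
        key = self._key(question)
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1], True
        answer = generate(question)
        self._store[key] = (now + self.ttl, answer)
        return answer, False
```

Deciding which questions count as FAQs, and invalidating answers when the underlying policy changes, are separate design decisions this sketch does not cover.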

Senior ML Engineer

Dell Technologies

Oct, 2021 - Nov, 2024 3 yr 1 month

Led a team of 4 engineers to take a Retrieval-Augmented Generation (RAG) pipeline to production, serving 2,000 requests per day, with an end-to-end software architecture ensuring model reliability across nodes. Used a Postgres vector DB with IVF and HNSW indexes for efficient storage and retrieval of embeddings, improving the relevance of content generated by LLMs such as Llama-2 13B, served with vLLM for faster inference. Used Ragas to evaluate retrieval and generation quality, monitored the metrics through Grafana, and built a feedback loop using Kafka to store responses and metrics in BigQuery for analytics. Incorporated a Reflection design pattern in which the model iteratively evaluates and revises its own outputs, reducing hallucinations and improving factual accuracy. Used Celery for distributed task queue management, with RabbitMQ as the message broker and Postgres as the result backend, to streamline preprocessing operations such as chunking, document and HTML parsing, and embedding generation, making embedding uploads to the Postgres vector DB more efficient. Reduced latency from 7-8 s to 3-4 s through model optimization, quantization, and document tags in the vector DB. Developed an end-to-end personalized recommendation model for 30 million users across the US, UK, Canada, and APJ, returning 5-8 relevant cross-sell and upsell recommendations with under 400 ms latency at peak time and increasing UK laptop sales by 16% through targeted brand recommendations. Developed embeddings for order codes and SKUs (peripherals) for a multi-stage retrieval strategy, using a Two-Tower network built with TensorFlow and Transformer architectures, with multi-GPU training to accelerate training on large datasets.
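The chunking step in the preprocessing pipeline can be illustrated with a fixed-size sliding window over words. This is a minimal sketch with hypothetical defaults (200-word chunks, 50-word overlap); it is a simpler, swapped-in technique, not the recursive text splitter or the exact chunking strategy used in production. In the setup described above, a function like this would run inside a Celery task before embedding generation.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of storing some words twice.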
Developed data pipelines to retrieve data from Hadoop and Greenplum using PySpark, with Airflow scheduling daily data synchronization tasks that update both the online and offline tables of the feature store (Feast) used by the ranking model. Used Redis as an in-memory database for low-latency retrieval of real-time predictions, and designed fallback strategies (popularity-based, item-based) to handle cold-start scenarios for new users and products. Built and deployed a FastAPI service integrated with Redis and MongoDB for scalable recommendation retrieval, orchestrated on Kubernetes for fault tolerance, efficient scaling, and high availability. Implemented sequence-based deep learning models using LSTMs to predict the next best action for product brand categories, with multi-GPU training in PyTorch to handle large-scale data efficiently. Used AWS Glue crawlers to catalog and discover data stored in S3, enabling seamless integration with downstream machine learning workflows. Applied Double Machine Learning (Double ML) techniques to predict baseline product prices and estimate the price elasticity of demand. Developed a multiclass classification model using XGBoost to discern user intent (whether visitors came to the site to shop, browse, or seek support), enhancing user experience and site engagement. Used SageMaker endpoints to serve batch predictions, allowing scalable and efficient deployment of machine learning models.
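The cold-start fallback chain (personalized results, then item-based, then popularity-based) can be sketched as follows. This is a minimal illustration, assuming co-purchase counts as the item-based signal and raw order counts as popularity; `build_cooccurrence`, `recommend`, and the SKU names are hypothetical and do not reflect the production ranking model or its Feast and Redis integration.

```python
from collections import Counter
from itertools import combinations
from typing import Optional


def build_cooccurrence(orders: list[set[str]]) -> dict[str, Counter]:
    """Count how often each pair of SKUs appears in the same order ("bought together")."""
    co: dict[str, Counter] = {}
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(
    user_id: str,
    cart_item: Optional[str],
    personalized: dict[str, list[str]],
    co: dict[str, Counter],
    popularity: Counter,
    k: int = 5,
) -> list[str]:
    """Use model output for known users; otherwise fall back to item-based, then popularity."""
    if user_id in personalized:  # user already scored by the personalized model
        return personalized[user_id][:k]
    if cart_item is not None and cart_item in co:  # new user, but we know the cart item
        return [sku for sku, _ in co[cart_item].most_common(k)]
    return [sku for sku, _ in popularity.most_common(k)]  # nothing known: most popular SKUs
```

Each fallback needs strictly less information than the one before it, so a request always gets some answer within the latency budget.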

Sr. Data Scientist

Ernst & Young

Mar, 2021 - Sep, 2021 6 months

Developed an ETL pipeline using PySpark to process millions of rows from sources such as invoices (extracted with Camelot), Greenplum, and Hadoop, with Airflow scheduling the execution of the Spark jobs. Developed a customizable analytics dashboard using Plotly and integrated it into a Django web application, enhancing data visualization and user interaction. Created a data pipeline to extract invoice information from PDFs using Camelot, and developed a Django web application that provides a user-friendly interface for accessing and managing the extracted data.

Data Scientist

TCS

Mar, 2016 - Feb, 2021 4 yr 11 months

Developed a topic model using Latent Dirichlet Allocation (LDA) to categorize 900 million in-house IT tickets. Implemented with the distributed ML library Spark MLlib, the system routes tickets to the appropriate agents, improving service level agreement (SLA) response times from 3 days to 1 day. Engineered neural network and tree-based models to predict flight arrivals within a ±5-minute window, improving operational accuracy and traveler satisfaction. Built a data pipeline using Beautiful Soup to scrape real-time data from the FlightStats website, and integrated this data into a model pipeline on AWS SageMaker, with performance metrics monitored through CloudWatch for continuous model optimization. Developed classification models to predict room types, hotel categories, discounts, and rewards for users in the hospitality industry, enhancing guest experiences. Improved model performance from 65% to 75% by incorporating seasonality factors for different time slots, enabling more granular analysis for strategic decision-making. Reduced the latency of I/O operations for reading from S3 buckets by implementing multithreading and caching strategies, streamlining data access and processing.
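The multithreading-and-caching approach for S3 reads can be sketched with the standard library. This assumes a blocking `fetch(key) -> bytes` function, for example one wrapping boto3's `get_object`; here it is a hypothetical callback so the example runs without AWS. Threads help because each read mostly waits on the network, and `functools.lru_cache` makes repeated keys free after the first read.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Iterable


def make_cached_reader(fetch: Callable[[str], bytes], maxsize: int = 1024) -> Callable[[str], bytes]:
    """Wrap a blocking fetch(key) -> bytes function in an LRU cache so repeated keys are read once."""
    return lru_cache(maxsize=maxsize)(fetch)


def read_many(keys: Iterable[str], reader: Callable[[str], bytes], max_workers: int = 8) -> list[bytes]:
    """Read keys concurrently on a thread pool; results are returned in the same order as `keys`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reader, keys))
```

One caveat: `lru_cache` does not prevent two threads from fetching the same uncached key at the same moment, so duplicate keys within one batch may still be read twice.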

Achievements

Implemented sequence-based deep learning models to predict the product brand category, increasing sales by 16%
Developed an end-to-end ML model for cross-sell recommendations for 30M users
Part of the Content Modernization team at Dell
Built the RAG pipeline for the Dell Chat application
Developed auto-tagging with the GPT-3.5 Turbo model
Developed customized LangChain agents
Applied Double Machine Learning techniques for product pricing
Implemented a multiclass classification model for user intent prediction
Built data pipelines for daily data sync
Built a GitLab CI/CD pipeline achieving a 90% Dell DevOps Maturity Score

Major Projects

3 Projects

Translation App for Walmart Associates

Scalable app for real-time translations (speech/text) using Azure Cognitive Services, Kafka, and WebSocket.

Cross-Sell Recommendation System

Developed personalized machine learning model recommending SKUs, improving user engagement and sales across regions.

RAG Pipeline for Dell Chat

Enhanced content generation, embedding retrieval, and latency optimization for Dell's Chat system using Kubernetes, Postgres, and Kafka.

Education

B. TECH
JNTUA (2015)
12th
Sri Chaitanya Jr. College (2011)
SSC
Sai Vidyanikethan EM School

Certifications

Machine Learning
Coursera (May, 2019)
Credential ID : RN4XAC8JJN76
Python Problem Solving
Hackerrank (Sep, 2020)
Credential ID : CE2A871B5406

Interests

Adventure Activity

Watching Movies

Long Rides

Driving

Bike Rides

Chess

Outdoor Sports

AI-interview Questions & Answers

Yes, I completed my graduation in 2015 and joined TCS in 2016. I was part of TCS for about 5 years, where I worked on NLP topic modeling using LDA. I used Spark MLlib for training and validation and created endpoints using Lambda; I used SageMaker throughout, and AWS Athena for data transformations. Later I worked on a hospitality project, a predictive modeling project where I tried to predict what kind of hotel and what kind of room a user would choose, basically predicting user behavior, including whether the user would take the discount coupons offered or not. After that I worked at EY, only for a short time, on data engineering and a little backend engineering: building data pipelines and extracting data from the invoices they provided. Currently I am working at Dell Technologies, where I have been working on cross-sell and upsell recommendations: when an item is added to the cart, the site also shows "people who bought this also bought this", that kind of scenario. I also worked on next best action using LSTMs, Siamese networks, and Transformers, predicting what brand or category a user is going to buy next: if they have visited different webpages on the dell.com website, what is the next item they are actually going to buy. Another one was an intent model, a classification model where we used XGBoost classification to analyze what kind of user a visitor is, whether they are coming to shop or to browse. Shopping means purchasing items on the website; browsing basically means looking for services for their laptop, which is maybe 41 or something like that.
Here I have worked end to end: I build data pipelines, model pipelines, and inference pipelines, create roadmaps, and present to stakeholders. I generally use Docker, Kubernetes, MLflow for model versioning, and Airflow for scheduling across all the projects I have worked on here. For the past 6-7 months I have been working on LLMs, mainly building RAG pipelines for content ingestion. We are using Apache Kafka, which I started using very recently; for the remaining parts, such as creating embeddings and chunks, I use LangChain recursive text splitters and other NLP processing techniques. I also create tags using GPT-3.5 Turbo and open-source LLMs already hosted on our on-prem servers, currently Llama 2 and Zephyr models, and I am about to test Llama 3 as well. I created a pipeline for this: all the data loaded into the data lake is chunked with our chunking strategies and embedded, and these vectors are loaded into a PostgreSQL DB. At retrieval time we use chain-of-thought style prompt engineering, testing and improving it iteratively. I am also part of the team that provides SDKs for the feature store. That is my overall experience.
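Retrieval from the vector store comes down to nearest-neighbour search over embeddings. The sketch below shows exact cosine-similarity top-k in plain Python, assuming small in-memory vectors; `top_k` and the toy index are hypothetical. In PostgreSQL with the pgvector extension, the same ordering is expressed with the `<=>` cosine-distance operator, and IVFFlat or HNSW indexes approximate this exhaustive scan so it stays fast at scale.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest neighbours by cosine similarity, best match first."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```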

Let's say we have multilingual data. First we need to categorize the dataset by language, because preprocessing for one language will not necessarily work for other languages, even though punctuation is often the same. I would use standard text preprocessing techniques such as removing punctuation and stop words with NLTK and phrase matching with spaCy. LLMs probably offer much better options here, but I haven't worked with them for this in the past, so let me reason through it step by step. For text preprocessing, we can use text normalization: removing punctuation, lowercasing, and standardization. Then we can tokenize, splitting the text into words and large paragraphs into sentences. Tokenization can be language-specific, since different languages need different techniques, so I would check which LLMs or pre-trained models handle this well. Part-of-speech tagging and named entity recognition also differ by language, so we identify the language, use a suitable encoding, and check pre-trained models for these tasks, for example for tagging and phrase matching. Basically, whatever we do in English we can try to emulate in other languages: chunking, splitting, lemmatization, and stemming are implemented differently per language,
but the approach and the underlying idea are the same.
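The normalization steps described in this answer can be sketched with Python's `unicodedata` module, assuming the language of each text is already known (for example from a language-identification step). The stop-word sets are tiny hypothetical placeholders; a real pipeline would use curated lists such as NLTK's stopwords corpus and a language-aware tokenizer such as spaCy's.

```python
import unicodedata

# Hypothetical, deliberately tiny stop-word lists for illustration only.
STOP_WORDS = {
    "en": {"the", "a", "is", "my"},
    "es": {"el", "la", "es", "mi"},
}


def normalize(text: str) -> str:
    """Unicode-normalize (NFKC), case-fold, and replace punctuation in any script with spaces."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Unicode categories starting with "P" cover punctuation such as "!", "¡", "¿", "،".
    return "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)


def preprocess(text: str, lang: str) -> list[str]:
    """Normalize, split on whitespace, and drop stop words for the given language code."""
    stops = STOP_WORDS.get(lang, set())
    return [tok for tok in normalize(text).split() if tok not in stops]
```

Whitespace splitting does not work for languages such as Chinese or Japanese that do not separate words with spaces, which is exactly where language-specific tokenizers become necessary.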

I haven't used reinforcement learning; I'm not very familiar with it from my experience, so I can't comment much. But one place where we could use reinforcement learning is the Dell chat application we are building. For that, we create embeddings, load them into pgvector, and retrieve from them. Once the application goes to production, users can ask the chatbot something like "my laptop has this issue, what should I do?", and it will provide answers. Sometimes the user will be satisfied and sometimes not, so we can ask the user for feedback or a review: were the answers what they were looking for, did we solve the problem? We can use those answers as a kind of chain of thought and send them to the LLM as input and review. If we add a reinforcement learning structure at that point, the model can learn from it, the same way users say in chat "your answer is wrong" or "this is not what I am looking for". The Dell chat application belongs to what our company calls the Content Modernization team, and ideally reinforcement learning will be added to it at some point. As of now I can't answer this question much more clearly. I know reinforcement learning, specifically RLHF, could be used here, but I can't give more details because I don't have experience with reinforcement learning.
I can explain the idea behind it, but I can't go into the details.

Knowledge graphs are very useful when lots of reviews are involved, for example on travel aggregators like TripAdvisor, Booking.com, Expedia, or Trivago. When users write many reviews, we can generate tags from them and put those tags in a knowledge graph, where each tag defines a relationship from one node to another. Say one user has given some feedback and another user has given some feedback: how similar are these two users? That depends on the keywords they used, the content, and the polarity (sentiment) of their feedback. Based on all these factors, we can build a knowledge graph over a high-dimensional embedding space. When a new user comes to the website and asks about the best hotels with a beach view, or says "I am looking for something with these facilities", the system uses the feedback other users provided to locate the query among those points. I believe the backend traversal in knowledge graphs is typically DFS or BFS, which determines where this query fits in the knowledge graph (not the vector space) and creates relationships. I am not exactly sure which backend algorithm is used, because I have worked more on vector embeddings than on knowledge graphs, but ideally this is how we can use a knowledge graph in AI development. Users keep commenting, asking questions, answering, and posting reviews.
From those reviews, especially their tags and metadata, we know what each summary contains and which words it uses. If two words are nearly similar in word2vec embeddings, they sit close together in the graph, so we can connect one summary to another and find the closest one; whichever reviews are closest are the ones the user sees. I think that is how TripAdvisor shows reviews: if I ask about something, it automatically shows the feedback other people gave, in a concise manner without losing the user's context. So knowledge graphs can be useful for this kind of content.
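The BFS traversal mentioned in this answer can be sketched over a small review-tag graph. This is a hypothetical illustration, assuming reviews and tags are nodes in an undirected adjacency map and that review node ids start with `review:`; it shows only the traversal, not embedding similarity or ranking of the reviews it reaches.

```python
from collections import deque


def reviews_within_hops(graph: dict[str, set[str]], start_tags: list[str], max_hops: int) -> set[str]:
    """Breadth-first search from the query's tags; return review nodes reachable within max_hops edges."""
    seen = set(start_tags)
    queue = deque((tag, 0) for tag in start_tags)
    found = set()
    while queue:
        node, depth = queue.popleft()
        if node.startswith("review:"):
            found.add(node)
        if depth == max_hops:  # do not expand beyond the hop limit
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found
```

With a one-hop limit, a "beach" query reaches only reviews tagged "beach"; raising the limit lets it reach reviews connected through shared tags such as "pool".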

A Transformer-based model for language translation: given a sequence in one language, produce it in another. It is an encoder-decoder model, and the core idea behind the Transformer is the self-attention mechanism, which lets the inputs interact with each other and weighs the significance of each input relative to the others in the sequence. First I would do data preparation, where the data contains many parallel samples, using libraries in PyTorch or TensorFlow; it does not matter which framework. The model consists of several layers. The first is an embedding layer that converts input tokens to embeddings. Then positional encoding adds positional information to the input embeddings, which is needed because Transformers have no recurrent layers. Then comes the encoder, a stack of nearly identical layers, each with an attention sub-layer and a feed-forward neural network sub-layer. The decoder does almost the same as the encoder, but with an additional sub-layer that performs multi-head attention over the encoder's output. The final layer projects the decoder output to the size of the vocabulary. That is how it works at a high level.
The loss function can be cross-entropy loss, since predicting each token is a classification problem, and the optimizer can be AdaGrad, Adam, or RMSprop. Then do evaluation and testing on a separate test set with different samples: if you are translating Spanish to English, take Spanish text the model has not seen. For metrics, I think the BLEU score is what assesses this.
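The self-attention mechanism and positional encoding described above can be sketched in plain Python: single-head scaled dot-product attention, softmax(QKᵀ/√d_k)V, without masking, and the sinusoidal encoding PE(pos, 2i) = sin(pos/10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos/10000^(2i/d_model)) from the original Transformer paper. This is for illustration only; a translation model would use a framework implementation such as PyTorch's `torch.nn.MultiheadAttention`.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: list[list[float]], k: list[list[float]], v: list[list[float]]) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors (single head, no mask)."""
    d_k = len(k[0])
    outputs = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)  # one attention weight per key, summing to 1
        outputs.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return outputs


def sinusoidal_positional_encoding(pos: int, d_model: int) -> list[float]:
    """Even dimensions use sin, odd dimensions use cos, sharing the frequency of the preceding even index."""
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]
```

Dividing by √d_k keeps the dot products from growing with the embedding size, which would otherwise push the softmax toward one-hot weights.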

Overfitting can be addressed in several ways. Sometimes the model learns too much complexity and too many patterns from the data, so we reduce the model's complexity and use dropout and regularization so it does not memorize the entire training set; getting more data also helps. I am saying this in general, not just for personalized recommender systems: L1 and L2 regularization, monitoring the validation loss, and hyperparameter tuning, all the techniques we generally use on a machine learning model. Specifically for recommending personalized travel itineraries, the best idea is to understand the metrics and what kind of predictions we are giving. Once the model is deployed, we get feedback from the product analytics or consumption analytics team about what recommendations we are giving and whether users are satisfied with them. We can also ask for user annotations, where users give us answers directly. These steps apply to personalized travel itineraries and to most recommendation problems. That is why I suggest deploying a basic model first and building from there, because real user engagement will be very different from what we see offline, and what happened in previous years might not hold now. This is not like an ML project that describes stable patterns; here behavior changes a lot.
So deploy the model, get analytics, and figure out what has gone wrong, including whether there is too much data drift. That is one way to reduce overfitting, along with the standard techniques: regularization, dropout layers, feature selection, feature engineering, reducing model complexity, and removing multicollinearity between features.
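Monitoring validation loss, one of the general techniques listed, is often implemented as early stopping. A minimal sketch, assuming per-epoch validation losses are already computed; the `patience` and `min_delta` parameters mirror those exposed by common framework callbacks (for example Keras's `EarlyStopping`), but this function is a standalone illustration.

```python
def early_stopping_epoch(val_losses: list[float], patience: int = 3, min_delta: float = 0.0) -> int:
    """Return the epoch index with the best validation loss seen before training would stop.

    Training stops once the loss fails to improve by more than `min_delta`
    for `patience` consecutive epochs; later epochs are never examined.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0  # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch
```

A larger `patience` tolerates noisy validation curves at the cost of more wasted epochs; the weights kept are those from the returned epoch, not the last one trained.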

I am not exactly sure, but I think SelectKBest here is based on the chi-square statistic. You cannot say it will always hurt model evaluation; sometimes it gives positive results. But along with it, you have to check whether there are strong relationships among the other variables. For example, if there is high collinearity between variables, we can reduce some features using the variance inflation factor (VIF): if two continuous regression variables have almost the same importance with respect to the target, I think we can delete one of them. Doing it this way will probably work, but we cannot say for sure whether it will give a wrong answer. The metric given is accuracy score; I assume accuracy works for this project, but ideally accuracy may not be the right metric. That is one step. For feature selection we can also use Lasso regression, which penalizes features that are less important to the model and shrinks their coefficients to zero, reducing the number of features. As for a negative impact on the model: it mostly happens if you lose a lot of information from other features that may be very helpful to the model.
So it might have a negative impact, but in general we cannot say exactly how much or what kind of negative impact it will have on model evaluation. One more thing: there may be data leakage here, because the feature selection is applied before the train/test split. In that case, do not validate on data the selection has already seen; do your validation on unseen data that was not used for this feature engineering. Only by testing against the outside world can we understand how much negative impact it has. As of now we cannot fully assess the impact, but there might be an issue because we are not considering the other features.
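The multicollinearity check and the leakage point can be combined in one small sketch: fit a correlation filter on the training split only, then reuse the resulting column list for validation and test data. This is a hypothetical illustration using pairwise Pearson correlation with an assumed threshold of 0.9; it is not SelectKBest, Lasso, or a full VIF computation. With exactly two features, VIF = 1/(1 - r²), so |r| = 0.9 corresponds to a VIF of about 5.3.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant columns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_collinear(train_columns: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Keep a feature only if |r| with every already-kept feature is below `threshold`.

    Fit this on the training split only, then select the same columns in
    validation and test data so the selection never sees held-out rows.
    """
    kept: list[str] = []
    for name, values in train_columns.items():
        if all(abs(pearson(values, train_columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Constant columns should be removed before this step, since their correlation is undefined.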

The best option is MLflow, although some people use Kubeflow. In my view MLflow is the best one: it is open source and integrates seamlessly with cloud environments like AWS, Azure, Databricks, and GCP. For multiple data scientists: I am the one building the platform for our team to use MLflow, so I am writing code that works for any kind of project. Team members can write any functions they like, but they finally call them through the MLflow operation functions we provide. Version control is the key part. Once code is pushed to the repo, the CI/CD pipeline automatically runs it and finds the best model across the hyperparameter runs executed in the background, and that model is stored in MLflow as the production model rather than archived. Selection uses the user-defined metric: say I run five experiments, such as linear regression with five hyperparameter settings, logistic regression with five, XGBoost with five, or a deep learning model. Whichever gives the highest accuracy, precision, recall, or other user-defined metric, especially for recommendation projects such as click-through rate and click-to-purchase conversion rate, goes to production.
This way, multiple data scientists, now or in the future, can work on the models stored in MLflow. We should all share one MLflow instance that all the data scientists use, even across different projects; they can pull or fetch a model, log a model, or take a previously logged model and evaluate it. That is how they collaborate with other people. In this scenario MLflow is the best option for model version control, but for code I think GitLab is the best choice, since that is what I work with. Other tools such as Jenkins are also used with GitHub, but currently I use GitLab, which has CI/CD built in, so there is no need for a separate CD tool. Based on this, multiple data scientists can collaborate across teams, projects, and models.
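The "best run goes to production" rule described above can be sketched as a selection over logged runs. `RunRecord` and `select_production_run` are hypothetical stand-ins so the example runs without an MLflow server; with MLflow itself, a similar query can be made with `mlflow.search_runs` and its `order_by` argument (for example `order_by=["metrics.precision DESC"]`), after which the chosen model would be registered or promoted through the Model Registry.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical stand-in for a logged MLflow run: id, model family, params, and metrics."""
    run_id: str
    model: str
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def select_production_run(runs: list[RunRecord], metric: str, higher_is_better: bool = True) -> RunRecord:
    """Pick the run with the best value of a user-defined metric; runs without that metric are ignored."""
    candidates = [run for run in runs if metric in run.metrics]
    if not candidates:
        raise ValueError(f"no run logged the metric {metric!r}")

    def score(run: RunRecord) -> float:
        return run.metrics[metric]

    return max(candidates, key=score) if higher_is_better else min(candidates, key=score)
```

The `higher_is_better` flag matters because the same rule has to work for metrics like precision or conversion rate and for losses such as RMSE.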

As I said, I have not been able to, I have worked on reinforcement learning before, but as I said, since if it is a chat-like conversational AI-like approach, so what we can do is we can push our models to the repo, sorry, not models, I am sorry, I am confused. So user will interact with the website, so key travelers will keep asking questions, so they are getting responses and they will say this is wrong, this is right. So whatever the user is mentioning in the wrong word, right, or whatever, efficiently it has to take this feedback into the back-end, right. So once the user started something, right, so the responses which I can keep on generating, so this has to be in cache, maybe Redis or something, because let us say my token limit is 1 million tokens or 16,000 tokens or 70,000, whichever the amount of tokens, 100,000 tokens let us say. So 100,000 tokens is a very big amount of tokens because back-end keeps summarizing this content by any LLM, it is actually done. So we can build some reinforcement learning system there, like it says wrong, now immediately it has to redefine and check and go to the, if it is a RAG pipeline built in the back-end, go to the pipeline and send some more and do little bit more amount of calculation like for example, change the temperature of the query. So just do like top k instead of top k, documents is like 3, make it 4 or make it 5. So more amount of content will be, we will get it. From that now, tries to get the best response out of this, like based on some kind of score, I exactly did not remember what kind of score is that, but based on that score, you analyze each of those responses generated by the RAG, retrieved from the RAG, now you generate this answer to the user, like based on whichever is the best one. So again he says something wrong, now go back, so he said like, ask him like what exactly you are looking for if he keeps sending those answers. 
Whatever the user sends back is treated as new input, and it also gets keywords, tags, a summary, and a polarity (positive or negative). Using that together with the previous responses, I can run retrieval again. The previous responses have to be in the cache, of course, because the system needs to know that the previous response was wrong before it makes its next choice. This is the point where a reinforcement learning system could be used. I have not actually worked on reinforcement learning, but I think this is where it fits.
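The feedback loop described above can be sketched as plain control flow, with no reinforcement learning yet; an RL policy could later replace the fixed "widen top-k" rule. This is a minimal sketch under several assumptions: a Python dict stands in for Redis, `DOCS` is a hypothetical toy document store, and keyword overlap stands in for the relevance score, since the actual score was not specified. On "wrong" feedback it remembers the rejected answer, widens top-k from 3 toward 5, and folds the user's clarifying keywords into the query.

```python
from __future__ import annotations

import re

# Hypothetical toy document store; in a real RAG pipeline this is a vector index.
DOCS = [
    "Baggage allowance is 23 kg for economy tickets.",
    "Cabin bags must fit under the seat in front of you.",
    "Refunds for cancelled flights are issued within 7 days.",
    "Extra baggage can be purchased online up to 24 hours before departure.",
    "Pets may travel in the cabin on selected routes.",
]

# Stand-in for a Redis cache: session id -> per-session state.
SESSION_CACHE: dict[str, dict] = {}


def keywords(text: str) -> set[str]:
    """Crude keyword extraction: the set of lower-cased words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, doc: str) -> int:
    """Placeholder relevance score: number of keywords shared with the query."""
    return len(keywords(query) & keywords(doc))


def respond(session_id: str, query: str, feedback: str | None = None,
            clarification: str = "") -> str:
    state = SESSION_CACHE.setdefault(session_id, {"top_k": 3, "rejected": [], "last": None})
    if feedback == "wrong" and state["last"] is not None:
        state["rejected"].append(state["last"])      # remember the answer the user rejected
        state["top_k"] = min(state["top_k"] + 1, 5)  # widen retrieval: 3 -> 4 -> 5
    full_query = f"{query} {clarification}".strip()  # fold in the user's extra keywords
    ranked = sorted(DOCS, key=lambda d: score(full_query, d), reverse=True)
    candidates = [d for d in ranked[: state["top_k"]] if d not in state["rejected"]]
    # Fall back to asking the user what they need when nothing new is left.
    best = candidates[0] if candidates else "Could you tell me what exactly you are looking for?"
    state["last"] = best
    return best


first = respond("s1", "How much baggage can I bring")
second = respond("s1", "How much baggage can I bring",
                 feedback="wrong", clarification="allowance economy")
```

Here `first` is the extra-baggage document; after the "wrong" feedback and the clarification, `second` skips it and returns the allowance document.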

I have done this many times. For example, take one of the NLP projects I deployed at TCS, which used topic modeling. The goal was that when a user raises a ticket, tags are created for it automatically, and based on those tags (called topics in LDA) the ticket is routed to the right agent. This reduced the business SLA by 3 to 5 days. The issue we faced was this: a user raises a ticket like "my Ultimatix access has been blocked". Ultimatix is a TCS website, so the model might read this as the person's account being locked, and the ticket automatically goes to the Ultimatix team. But actually the user's credentials are fine; what is broken is their access. The adjustment I made to the machine learning model was to use bigrams, trigrams, and phrase matches instead of unigrams, and to capture whether the phrasing is positive or negative, that is, what exactly the user is trying to do. We changed these scenarios based on the business requirements and pushed the change. Initially only 33% of results were correct: out of 100 tickets, only 33 were routed correctly. After these changes, around 60-70% were correct. That is one example. There are many other situations where the business environment changes. For example, for the cross-sell recommendations, I initially set up the entire pipeline, and since it was working well and was scalable and reliable, we were able to scale it to Canada, the UK, EMEA (other European countries), and APJC. Canada and the UK were done within about 2 weeks, at most one sprint.
So within 14 days we were able to deploy the models for Canada and the UK. To accommodate business requirements that change from time to time, we need a robust pipeline in place, so that when something new comes up we can deploy it easily. I believe that is the way. I have also modified machine learning models, for example adding blacklisted items to the model and applying filters to sort out reviews and feedback, before finally giving the right predictions for cross-sell recommendations. I have also used some rule-based techniques, for example in the final scoring step of the recommendation system. Across different projects there will always be different business requirements, and I have handled them in these ways.
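The n-gram change in the ticket-routing example can be illustrated with a small phrase matcher. This is a hedged sketch, not the LDA model itself: `PHRASE_RULES` and the queue names are hypothetical. The point is only that with unigrams the ticket matches the keyword `ultimatix`, while with n-grams up to length 4 the longer phrase `access has been blocked` is found and takes priority.

```python
from __future__ import annotations

import re

# Hypothetical routing rules (phrase -> support queue); not the production model.
PHRASE_RULES = {
    "access has been blocked": "access_management",
    "account locked": "identity_support",
    "ultimatix": "ultimatix_portal",
}


def ngrams(text: str, max_n: int) -> set[str]:
    """All word n-grams of length 1..max_n from the lower-cased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def route(ticket: str, max_n: int = 4, default: str = "general") -> str:
    """Route to the queue of the longest matching phrase (longer phrases win)."""
    grams = ngrams(ticket, max_n)
    for phrase in sorted(PHRASE_RULES, key=lambda p: len(p.split()), reverse=True):
        if phrase in grams:
            return PHRASE_RULES[phrase]
    return default


ticket = "My Ultimatix access has been blocked"
print(route(ticket, max_n=1))  # ultimatix_portal (unigrams only)
print(route(ticket))           # access_management
```

Checking longer phrases first is a simple way to let a specific multi-word intent override a bare product keyword.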

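The blacklist, review filter, and rule-based final scoring for cross-sell recommendations can be sketched as a re-ranking step applied after the model. The field names, the 1-5 rating scale, the review thresholds, and the 0.7/0.3 blend are all illustrative assumptions, not the production rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    model_score: float  # assumed model output in [0, 1]
    avg_rating: float   # assumed average review rating on a 1-5 scale
    n_reviews: int


def rerank(candidates: list[Candidate], blacklist: set[str], min_reviews: int = 5,
           min_rating: float = 3.0, model_weight: float = 0.7, top_n: int = 3) -> list[str]:
    """Filter blacklisted and poorly reviewed items, then apply a rule-based blend."""
    kept = [
        c for c in candidates
        if c.item_id not in blacklist
        # Only judge ratings once an item has enough reviews to be meaningful.
        and (c.n_reviews < min_reviews or c.avg_rating >= min_rating)
    ]

    def final_score(c: Candidate) -> float:
        rating_norm = (c.avg_rating - 1.0) / 4.0  # map 1-5 stars onto 0-1
        return model_weight * c.model_score + (1.0 - model_weight) * rating_norm

    return [c.item_id for c in sorted(kept, key=final_score, reverse=True)[:top_n]]


items = [
    Candidate("A", 0.90, 4.5, 100),
    Candidate("B", 0.95, 2.0, 50),   # enough reviews to judge, but rated poorly
    Candidate("C", 0.60, 5.0, 10),
    Candidate("D", 0.80, 4.0, 2),    # too few reviews to filter on rating
    Candidate("E", 0.99, 4.8, 30),   # blacklisted
]
print(rerank(items, blacklist={"E"}))  # ['A', 'D', 'C']
```

Keeping these rules outside the model means the business can change the blacklist or thresholds per market without retraining.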
For this we can have unit tests and integration tests, built into the code pipeline itself, that is, the CI/CD tooling. For every piece of code you write, try to write a unit test, so that the code is covered. Every organization has DevOps maturity scores, and we have them too, so you have to report code coverage, which comes from the unit test cases you write. Unit tests have to run before the model is pushed to production or to MLflow. Also check the data for problems, such as whether its distribution has drifted. For this we can use DVC for the data engineering and data pipelines, and for automated testing we can use integration tests. Integration tests do not run on the entire data; they only check whether the pipeline works from start to end: whether the unit tests run and pass, whether code coverage is in place, whether the Docker image is built properly, whether the secrets are created in Kubernetes, and whether the API can be hosted. All of these can run as part of the integration tests. If we are only looking at a machine learning pipeline, we check whether it trains, whether it loads the results into the database, whether it logs the model to MLflow, and, if it is not a batch pipeline, whether inference is fast enough. This way we can test all aspects of the machine learning pipeline.
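One way to automate the "has the data distribution shifted?" check is the Population Stability Index (PSI), computed between a reference sample and the incoming data before training or promotion: PSI = Σ (a_i − e_i) · ln(a_i / e_i) over bins, where e_i and a_i are the reference and new bin shares. This is a swapped-in technique for illustration (the answer does not name a specific method), and the 0.2 threshold is a commonly quoted rule of thumb, not a universal standard.

```python
from __future__ import annotations

import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a_i - e_i) * ln(a_i / e_i).

    Equal-width bin edges come from the reference (expected) sample's range;
    out-of-range values are clamped into the edge bins, and eps avoids log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [c / len(values) + eps for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def check_drift(reference: list[float], current: list[float], threshold: float = 0.2) -> float:
    """Raise (failing the CI job) if PSI exceeds the threshold; otherwise return it."""
    value = psi(reference, current)
    if value > threshold:
        raise ValueError(f"data drift detected: PSI={value:.3f} > {threshold}")
    return value


reference = [float(i % 100) for i in range(1000)]
shifted = [v + 50.0 for v in reference]
print(check_drift(reference, reference))  # 0.0, identical samples pass
```

Raising an exception is deliberate: in a CI job an uncaught error gives a non-zero exit status, which stops the pipeline before training or deployment.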
The main idea is to use integration tests to run the entire pipeline and check that it works, and only then move to the next stage, such as deploying to production, to a Kubernetes cluster, or whatever stages your machine learning CI/CD pipeline has. This is how we can ensure data integrity as well as check that the entire pipeline works.
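The "only move to the next stage when the previous one passes" rule can be sketched as a tiny stage runner. The stage names, the checks, and the 80% coverage target are hypothetical. In GitLab CI/CD this ordering is normally expressed with `stages` in `.gitlab-ci.yml`, where by default a failing job prevents jobs in later stages from running.

```python
from __future__ import annotations

from typing import Callable


class StageFailed(RuntimeError):
    """Raised when a stage's check fails, so later stages never run."""


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run (name, check) stages in order and stop at the first failing check."""
    completed: list[str] = []
    for name, check in stages:
        if not check():
            raise StageFailed(f"stage '{name}' failed; not running later stages")
        completed.append(name)
    return completed


coverage_pct = 85.0  # hypothetical figure reported by the coverage tool
stages = [
    ("unit_tests", lambda: True),                # e.g. the test runner's exit status
    ("coverage", lambda: coverage_pct >= 80.0),  # hypothetical 80% coverage gate
    ("build_docker_image", lambda: True),
    ("integration_tests", lambda: True),         # end-to-end run on a small sample
    ("deploy_to_kubernetes", lambda: True),
]
print(run_pipeline(stages))
```

If `integration_tests` returned `False`, the runner would raise before `deploy_to_kubernetes` is ever called.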