Vetted Talent

Rajesh Somasundaram

I’m a Senior Software Engineer in Machine Learning with 8+ years of experience building scalable backend systems and end-to-end ML platforms. Currently at Toast, I design and optimize ML infrastructure on AWS, accelerating deployment speed and improving model reliability for large-scale production systems.

My expertise spans MLOps, distributed systems, cloud architecture, and real-time model serving using tools like MLflow, TensorFlow Serving, Kubernetes, and Kafka. I’ve led initiatives that improved deployment efficiency, enhanced predictive accuracy, and strengthened system availability to 99.9% at scale.

I’m passionate about building robust ML platforms, enabling data teams, and transforming complex machine learning workflows into reliable, production-ready systems.

Role
ML Ops & Backend Engineer
Years of Experience
10.17 years
Professional Portfolio
View here

Skillsets

MySQL
React.js
PL/SQL
MuleSoft
Mockito
Machine Learning
JUnit
Jira
GitLab
Boto3
Azure
SAML
Oracle SQL
TensorFlow
MLflow - 3 Years
Python - 4 Years
Airflow - 2 Years
Docker - 2 Years
Grafana - 1 Year
Prometheus - 1 Year
Kubernetes - 1 Year
AWS - 3 Years
Scala - 1 Year
Java - 4 Years

Vetted For

16 Skills

Roles & Skills
Results
Details

Senior Machine Learning Engineer (Remote) AI Screening
72%

Skills assessed: ML frameworks like TensorFlow and PyTorch, experience with online businesses, streaming architecture (e.g. Kinesis, Apache Kafka), automated testing, CI/CD, machine learning, MLflow, model tracking, software engineering, AWS, Docker, Java, Jenkins, Kubernetes, Python, SQL
Score: 65/90

Professional Summary

10.17 Years

Jan, 2026 - Present 4 months
ML Ops Engineer III
CrowdStrike
Jul, 2023 - Jan, 2026 2 yr 6 months
Senior Software Engineer
Toast
Apr, 2023 - Jun, 2023 2 months
Machine Learning Operations Engineer III
Ninjacart
Jan, 2022 - Mar, 2023 1 yr 2 months
Machine Learning Ops Engineer II
Ninjacart
Dec, 2019 - Dec, 2021 2 yr
Software Engineer II
HERE Technologies
Sep, 2019 - Nov, 2019 2 months
Software Engineer
TEKsystems
Apr, 2018 - Aug, 2019 1 yr 4 months
Backend Engineer
Infosys
Sep, 2015 - Apr, 2018 2 yr 7 months
Backend Engineer
Wipro Limited

Applications & Tools Known

MySQL
MLflow
WhyLabs
evidently.ai
Docker
Prometheus
Grafana
Kubernetes
Vue.js
SAML
Airflow
React.js
GitLab CI/CD
TFX

Work History

10.17 Years

ML Ops Engineer III

CrowdStrike

Jan, 2026 - Present 4 months

Senior Software Engineer

Toast

Jul, 2023 - Jan, 2026 2 yr 6 months

Engineered a robust ML pipeline infrastructure on AWS, accelerating deployment speed by 70% and enhancing model accessibility for over 100 users. Designed a React.js-based internal developer tool, centralizing data and ML models, which boosted team efficiency by 25% and reduced project turnaround time. Developed a self-service platform concept, simplifying product features similar to Databricks, resulting in a 15% reduction in onboarding time for new users. Executed A/B testing for ML models using Replace and StepWise elevation strategies, leading to a 40% increase in actionable performance insights. Revamped the MLflow architecture within three days, resolving critical model availability issues and enhancing deployment reliability by 50%. Collaborated with the team to prototype a forecasting solution using multi-agent frameworks like Autogen and LangChain, achieving a 15% boost in predictive accuracy.

Machine Learning Operations Engineer III

Ninjacart

Apr, 2023 - Jun, 2023 2 months

Machine Learning Ops Engineer II

Ninjacart

Jan, 2022 - Mar, 2023 1 yr 2 months

Engineered an end-to-end machine learning and data engineering platform at Ninjacart, enhancing operational efficiency and deployment speed. Developed an Android-based TensorFlow Lite edge model for real-time object detection of fruits and vegetables, improving accuracy by 30%. Implemented image upload functionality to Azure Blob Storage, streamlining data management and accessibility for machine learning applications. Created and deployed projects using MLflow, TensorFlow Serving, and Triton Inference Server, optimizing model serving and performance. Facilitated collaboration between data scientists and Android developers to build planogram models, resulting in a 25% reduction in development time. Established an in-house MLflow tracking server to monitor model experimentation, improving metrics tracking and validation processes. Revamped the data engineering pipeline by migrating legacy CRON jobs to Airflow, increasing scheduling reliability and reducing maintenance overhead.

Software Engineer II

HERE Technologies

Dec, 2019 - Dec, 2021 2 yr

Software Engineer

TEKsystems

Sep, 2019 - Nov, 2019 2 months

Developed and deployed end-to-end machine learning solutions, leveraging a comprehensive ML stack to accelerate project delivery timelines by 25%. Implemented and automated the deployment of a Road Hazard ML model using TensorFlow Serving API, Docker, and AWS Boto3, enhancing deployment efficiency by 30%. Managed model versioning and auto-scaling for ML models, optimizing resource allocation to maintain 99.9% system availability during peak loads. Automated GitLab CI/CD processes and integrated JIRA, reducing deployment time by 40% and improving project tracking efficiency across teams. Generated comprehensive metrics reports with Prometheus and Grafana, facilitating data-driven decision-making and improving performance monitoring by 50%. Streamlined the deployment of Kubernetes pods, services, and config-maps, boosting system reliability and scalability by 35%.

Backend Engineer

Infosys

Apr, 2018 - Aug, 2019 1 yr 4 months

Worked on a banking application using MuleSoft ESB integration technology to develop an interface layer between various products, following an Agile/Scrum development cycle. Took a major part in performance testing, code reviews, analysis, and debugging. Handled the complete transaction layer between the back office and the payments engine. Implemented SAML application security. Set up keytool-based certificate trust stores and key stores, and encryption and decryption of web messages. Completed the handling and documentation of the security implementation for the program. Built a low-latency application that processed 1.6 million records in an hour.

Backend Engineer

Wipro Limited

Sep, 2015 - Apr, 2018 2 yr 7 months

Worked on software development, UAT testing, and production issue analysis. Most of the role was development using Core Java, Java 8, Oracle SQL, and PL/SQL. Successfully developed and merged changes without any build issues. Wrote JUnit test cases using the Mockito framework to find bugs in the Java code. Contributed to a banking application that acts as the heart of a ledger book, converting feeds from upstream systems to generate reports for reporting systems such as SAP BO and AXIOM. Generated daily and monthly extract feeds at intra and batch level, based on regions and adjustments. Took part in performance testing of the application to identify code bugs, and reduced code redundancy by 20%.

Achievements

Redesigned the entire ETL pipeline using Apache Airflow at Ninjacart.
Designed and implemented an automated one-click deployment strategy using MLflow.
Led the team on the Android-based development design for an AI-based project.
Successfully developed and merged changes without any build issues at Wipro Limited.
Reduced code redundancy by 20% during performance testing at Wipro Limited.
Built a low-latency application that processed 1.6 million records in an hour at Infosys.
Automated deployment of pods, services, and config-maps using Kubernetes at HERE Technologies.
Migrated the entire data engineering pipeline from legacy scheduled CRON jobs to Airflow-based schedules at Ninjacart.
Contributed to the MLflow open-source code, for a log x-axis change in a newer version.
Collaborated with India-based startups as a product reviewer and helped them build features accordingly.

Major Projects

1 Project

Scalable E-Commerce Platform on AWS Cloud

Designed and deployed a highly available e-commerce platform on AWS Cloud, leveraging various services and adhering to best practices. Implemented auto-scaling and load balancing for optimal performance. Used AWS EC2, S3, RDS, DynamoDB, Lambda, API Gateway, CloudFront, Route 53, SNS, SQS, CloudWatch, VPC, Elastic Beanstalk, AWS CLI, and AWS SDK.

Education

Specialized in Software Development & Problem Solving
Scaler
BE/B.Tech/BS
Anna University (2015)

Certifications

Specialized in Software Development & Problem Solving - Scaler, 2022

AI-interview Questions & Answers

Hey, hi. This is Rajesh. I have around 8-plus years of experience in the IT industry right now. For the last 4 years, I would say, I've mostly been working on machine learning operations, and previous to that I was more of a back-end engineer. I started my career at Wipro, where I worked on Java as the programming language, and Oracle SQL was the back-end DB that we used. And then I moved to Infosys, where I worked as a MuleSoft developer. It's basically for integration testing. And then I was looking for various opportunities related to machine learning and other AI-related fields. Then I found machine learning engineering, where software development associated with ML was really catching my eye a lot. And then I moved to a different company called HERE Maps. Over there, I started learning the Python language, and we also built ML solutions. For example, we worked closely with data scientists, where we helped them deploy the models and then focused more on scaling and scale-out related functionalities, built CloudWatch dashboards, and I also worked on alerting systems using Grafana dashboards and Prometheus. These are the major areas that I worked on. Also, during the COVID time, there was a little shuffling that happened with respect to the teams, so I got a chance to work on a streaming application as well, which is called passive link. So what we do is, we get the real-time updates of the location data, we process it, and then we send the data back to the back-end system, where it would be updated in the navigational map. Or, in fact, people would also use it for building 3D maps. So that's where my journey into machine learning started. And then, post COVID, I was studying at Scaler, and I was trying to upskill myself in a bunch of other things.
And then a specific word caught my eye, which is machine learning operations, where you build a platform and help the data scientists accelerate the speed of model deployment by building the model tracking servers, creating automated pipelines, and creating certain strategies for performing the deployment and all the other stuff. And then I moved on to the current company, where I'm mostly involved in building the ML platform. Here, I built a strategy called one-click deployment, which has better UI design as well as back-end design for the model to get deployed quickly. I also had to implement a lot of elevation strategies. Right now I've just started with a schedule-based elevation strategy, but I'm kind of focusing on model out based elevations, where the trigger should be happening. And, yeah, I'm also working on some LLM-related applications. I'm just starting to pick it up. My major use case right now is mimicking the ChatGPT kind of response: instead of showing all the generated tokens in the front end at once, we have to stream the tokens to the front end as they come. So these are the works that I have done. Thank you.

Oh, okay. So I remember I've worked on some use cases where we wanted to understand, for a container deployed using Docker with a Python program, whether the memory being consumed is really in the right proportion. For example, we configure the CPUs and memory, but we do not have an infrastructure to monitor whether the memory that is being allocated is the actual memory that is needed for these EC2 instances. So what I did was, I was given a test drive where I had to build this monitoring solution for memory and performance. So one thing we thought was, okay, as a sidecar, let's deploy cAdvisor, which is Container Advisor, to help us understand the memory configuration, the RAM configuration associated with the specific container that is deployed. And from the Python point of view, diagnosing performance bottlenecks: I mostly fixed certain issues related to the back-end database, where the indexing is not right, and then you fix that code and rewrite the query based on the back-end server. And then, with respect to Python, do not store the cache within the application container itself; rather, put it in an external service. That will help us avoid a lot of issues in the system itself. That is one thing. The other thing that I used is cAdvisor, the Container Advisor developed by Google. So we attached cAdvisor to the Docker container and made it work. And apart from that, another application performance bottleneck was related to the machine learning model. Let's say when we deploy a model onto a GPU machine, the first inference mostly always takes time. So what we do is, we have FastAPI and we spin up the instance.
I do not let the instance become healthy until the model has been loaded and has done its first default inference, so that the subsequent requests that happen on a GPU machine or CPU machine will have very low latency. So that is something I found out very recently: this specific use case is applicable for one of the GPU-based models, and it is also applicable for one of the CPU-based models, which is on LightGBM, which also took 1 second or something like that. So these are the various issues. And also, sometimes the external call to the server takes a lot of time in latency. So we try to understand what actually happens: is it the code that is causing a problem, or is it the external system that we are interacting with before the model inferencing that is causing the problem? So these are the different use cases that we try to handle. And, yeah, mostly these are the different performance bottlenecks that I try to identify. Can you?
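A minimal sketch of the warm-up gate described in this answer: the service reports unhealthy until the model is loaded and one default inference has run. It uses only the Python standard library; the `WarmupGate` class, the `load_dummy_model` stand-in, and the 200/503 status codes are illustrative assumptions, not the candidate's FastAPI implementation.

```python
import time
from typing import Callable, List, Optional


class WarmupGate:
    """Reports healthy only after the model is loaded and one inference has run."""

    def __init__(self, load_model: Callable[[], Callable[[List[float]], List[float]]],
                 sample_input: List[float]) -> None:
        self._load_model = load_model
        self._sample_input = sample_input
        self._model: Optional[Callable[[List[float]], List[float]]] = None
        self.ready = False
        self.warmup_seconds: Optional[float] = None

    def start(self) -> None:
        """Load the model and pay the cost of the first (slow) inference up front."""
        started = time.perf_counter()
        self._model = self._load_model()
        self._model(self._sample_input)  # default warm-up inference
        self.warmup_seconds = time.perf_counter() - started
        self.ready = True

    def health_status(self) -> int:
        """Status code a health-check endpoint would return (assumed 200/503)."""
        return 200 if self.ready else 503

    def predict(self, features: List[float]) -> List[float]:
        if not self.ready or self._model is None:
            raise RuntimeError("model is still warming up")
        return self._model(features)


def load_dummy_model() -> Callable[[List[float]], List[float]]:
    # Hypothetical stand-in for loading a real GPU or LightGBM model.
    return lambda xs: [x * 2 for x in xs]


gate = WarmupGate(load_dummy_model, sample_input=[0.0])
status_before = gate.health_status()  # load balancer would not route traffic yet
gate.start()
status_after = gate.health_status()   # instance can now receive requests
```

In a web framework, `health_status` would back the health-check route so that the load balancer only sends traffic once the warm-up inference has completed.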

How could you implement the CI/CD? To be honest, I have not worked entirely on Jenkins. Maybe I can take an example of CI/CD from the GitHub and the GitLab perspective. So whenever code is committed to a specific branch, let's say a non-main branch, I would like to call it a feature branch, you just push it and it triggers certain test cases. For example, we can have unit test cases with pytest. pytest would run all the Python scripts under the test folder. And on top of it, we can have a SonarQube integration, which will check the code coverage and will fail the CI/CD pipeline when the code coverage is less than the expected one. This is one of the automated tests that we can do; I would say it's like unit testing. On top of it, we can have certain integration tests as well, where you can have your own fixtures uploaded and, based on that, perform certain inferencing. Once the inferencing is done, verify the results, and if the results are up to the mark, go ahead. That's one thing. And the third one is about the deployment of the machine learning model. There are 2 things with respect to a machine learning model: one is the code version that changes, the other is the model version that changes. When the code version changes, you build a Docker image pointing to the existing model version. Let's say the model registry is an MLflow server. Or else it could be some other artifact store as well, but let's consider the MLflow model registry as the standard. So whenever we make a commit, we can point to a specific experiment and the model version that is registered in MLflow. And you build your Docker image, download the model, package it together in a Docker container, and then deploy. This is one way.
Or else, you download the model when the actual container instance is spinning up, and start the deployment in real time. Let's say when the Docker container actually starts, at that time, based on the model version parameter, like an ENV variable that you set up, it will download the model from MLflow and then start inferencing. So these are the different ways that we can have, but I've really not worked on Jenkins. I worked mostly on GitHub- and GitLab-based configurations, like the .gitlab-ci.yml file.
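The second option in this answer, picking the model version at container start from an environment variable, can be sketched as follows. The variable names `MODEL_NAME` and `MODEL_VERSION` and the default model name are assumptions made for illustration; the `models:/<name>/<version>` form is the MLflow Model Registry URI scheme.

```python
import os
from typing import Mapping


def resolve_model_uri(env: Mapping[str, str], default_name: str = "demo-model") -> str:
    """Build an MLflow registry URI from container environment variables.

    MODEL_NAME and MODEL_VERSION are hypothetical variable names chosen
    for this sketch; a real deployment would use whatever the team defines.
    """
    name = env.get("MODEL_NAME", default_name)
    version = env.get("MODEL_VERSION")
    if not version:
        raise ValueError("MODEL_VERSION must be set before the container starts")
    return f"models:/{name}/{version}"


# At container start-up the real environment would be passed in:
#   uri = resolve_model_uri(os.environ)
# and the URI handed to a loader such as mlflow.pyfunc.load_model(uri).
example_uri = resolve_model_uri({"MODEL_NAME": "road-hazard", "MODEL_VERSION": "7"})
```

With this pattern a model-only change needs no image rebuild: redeploying the same image with a new `MODEL_VERSION` is enough, whereas the first option bakes the model into the image at build time.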

Okay. One thing that I can see is that we can set up a specific instance where the Jenkins pipeline is running, and we can tag a specific task pointing to a specific Docker image. From that Docker image, we can start executing the Docker scripts that we have. For example, take Docker for setting up a repeatable build environment for machine learning projects. Let's say we have a project that uses PyTorch as a base image. On this base image we can have the pip packages installed, but this needs to have an NVIDIA base image with the CUDA driver enabled and certain prerequisites, right, when I try to install on a GPU machine. So in that case, choose the right base image for your application, and also install the libraries that are actually necessary, like the CUDA version and any other libraries or drivers associated with NVIDIA. And then have the basic installation or the Poetry installation. We can construct our project in a way that it is packaged and the dependencies are managed using Poetry. And then, for inference only, there are specific dependencies that are required. You can install them there and then execute the entry point of your Docker image, which would take the shell script to run. The project that you set up could be an online inferencing model, which serves through HTTP APIs, or else it can be an offline job, which can be used in Airflow or other orchestration tools that execute the job and complete it. So at that time, you can use that image and build it. But using Jenkins, I am not really sure how to set up the repeatable build environment, but we can always tag a specific image on an instance. It's like Docker inside Docker, where you can run some of your Docker commands in a Docker image that supports the build-environment-specific features.
And for every environment that we want to tag, let's say AWS, you specify that key, import it into your build step, and then push your Docker images to that specific environment, based on the build ENV for which the docker build and the docker push are configured. Yeah. That's my answer.

Do you provision and scale machine learning? Okay. This is what I understand: how do you want to provision in AWS, right? Okay. Let's assume the models are deployed in AWS ECS clusters. Every ECS cluster is tagged with ASGs, Auto Scaling groups. With this Auto Scaling group, we can also apply certain scaling policies for the specific service that is being created. For example, here, there are a lot of metrics that we can use. I personally have used the CPU utilization metric. Let's say, for a given 5-minute period, if the CPU metric is more than 20%, or let's say 40% or something, then trigger the auto scaling, the scale-out. That is something we can do. Or else, we can do scale-in options. So these are the 2 items that I have worked on. I personally have not tried GPU-based scaling, wherein, by default, every service that we create in a cluster does not show the GPU utilization and the rewards margin as well. That is something we have to enable. And provisioning? Yeah. Let's say to handle high-load predictions, right? So my understanding is, when we say high load, it is a huge amount of incoming traffic to the model. So based on the model's needs, we have to set up the minimum and the maximum instances, and we have to first do the performance testing. From the outcome of the performance testing, we can set up the minimum number of instances that should always be available for a given service. As soon as the requests start increasing or decreasing, based on that, we can update the scaling policy, and the instances will spin up accordingly. This is one area. And another thing we can do is have an EventBridge scheduler, which can invoke Lambda functions, which will again update the services. Let's say, for example, from 12 to 6, there is no load.
It's something like that, and we know it already. Then, for that specific time, we can reduce the number of instances, and after that, we can increase the minimum number of instances again. This is one of the techniques that we can follow as a cost-saving procedure. And sometimes the GPU machines take a lot of time to load and spin up, so we can maintain some warm-pool, hot instances in the cluster. Then all it has to do is spin up the Docker image directly rather than waiting for the GPU machine to spin up. For example, very recently, I was trying to upgrade from a g5 to a g6 instance, which is quite cost efficient. When I tried to do it, I noticed the machine, from the pending state to the running state, took almost more than 10 minutes or something like that to spin up. So, yeah, these are certain things that we can consider to handle high-load predictions.
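The two scaling ideas in this answer, a CPU-threshold policy bounded by minimum and maximum instance counts and a scheduled lower minimum during known quiet hours, can be sketched as plain decision functions. This is an illustrative model only: it does not call any AWS API, and the thresholds, the quiet window (read here as midnight to 6 a.m.), and the instance counts are assumed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    min_instances: int
    max_instances: int
    scale_out_cpu: float = 40.0  # assumed: average CPU % over 5 minutes that triggers scale-out
    scale_in_cpu: float = 20.0   # assumed: average CPU % below which one instance is removed


def desired_count(current: int, avg_cpu: float, policy: ScalingPolicy) -> int:
    """Return the instance count after applying the CPU thresholds and bounds."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Never go below the minimum found in performance testing or above the cap.
    return max(policy.min_instances, min(policy.max_instances, target))


def scheduled_minimum(hour: int, quiet_start: int = 0, quiet_end: int = 6,
                      quiet_min: int = 1, busy_min: int = 3) -> int:
    """Minimum instance count a scheduled job would set for a given hour (0-23)."""
    return quiet_min if quiet_start <= hour < quiet_end else busy_min


policy = ScalingPolicy(min_instances=1, max_instances=4)
after_spike = desired_count(current=2, avg_cpu=55.0, policy=policy)
night_minimum = scheduled_minimum(hour=3)
```

In the setup described above, the schedule-driven part would be a scheduled rule invoking a function that updates the service's minimum count, while the threshold part maps to a scaling policy on the service.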

Okay. AWS security features for model protection. Okay. When you say AWS services that are part of the ML pipeline, I assume here we are talking about, maybe, a SageMaker instance, and S3 buckets where we save the models, and deploying to an ECS cluster, or offline jobs like the training jobs that are going to run. Also, we try to get the training data from other services. So I would put all of this into a single VPC, which will have its own security groups, where I would be allowing only these specific IP addresses rather than allowing everything. So one thing you can do is set your inbound rules. The other thing is, you can set your outbound rules as well. In general, outbound rules could be just allow-all, but inbound rules are something that would be controlled. And when you put all your services into a single VPC and a single region, that would be better. I have not personally worked on multi-region AWS features, and I have not handled that. But, yeah, security groups, which means adding inbound and outbound rules, and then I will be checking on the VPC. And also, I'll check what its CIDR blocks are. Mostly, these are the basic security features that I would look for. Most probably, I might even look at the AWS gateway services as well, and also have certain security configurations for load balancers. And I will have rules for the specific URL parameters for which I'll be accepting the request to be routed, else it is rejected. So, yeah, these are the major areas I would be thinking of in terms of security, and mostly it deals with security groups and the VPC. Yeah. Pretty much. That's it.
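The inbound-rule idea in this answer, allowing only specific CIDR blocks on specific ports and rejecting everything else, can be modelled with the standard `ipaddress` module. This is a conceptual sketch, not an AWS API call; the `InboundRule` type, the CIDR block, and the port are made-up values.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InboundRule:
    cidr: str  # e.g. the VPC's own CIDR block
    port: int


def is_allowed(rules: Iterable[InboundRule], source_ip: str, port: int) -> bool:
    """Default deny: a request passes only if some rule matches both CIDR and port."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.port == port and address in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


# Hypothetical rule set: only traffic from inside the VPC may reach the model port.
rules = [InboundRule(cidr="10.0.0.0/16", port=8080)]
inside_vpc = is_allowed(rules, "10.0.3.7", 8080)
outside_vpc = is_allowed(rules, "192.168.1.5", 8080)
```

Outbound traffic is left unrestricted in this sketch, matching the answer's point that outbound rules are often allow-all while inbound rules are the controlled side.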

Your container. Sorry. I don't think I can Google it out, but what I can see is that here we have mentioned a container port, but this container port cannot be accessed from the actual machine port. For example, when we run our Docker container with docker run, you have a specific image, and then you specify a parameter called -p, which is the port mapping. You give a machine port number and then the container port number. So you have to add these 2. Basically, the machine's port is not actually mapped to the Docker bridge through which container port 80 can be accessed; that mapping is not proper, I believe. So that is the issue here. The Docker container might run, but the port is not actually open for the machine itself to access it. So we need to have an exact mapping of the instance port and the Docker container port as well. That is something that is missing here. Other than that, I don't see any other problem here that could lead to the model service... and how might you correct it? Oh, okay. How will I correct it? Adding the port linking. That should solve the problem. Yeah. Pretty much that's the thing from my end.

How do you explain this? What could be a potential fix? Would you explain the bug in this pattern? I think we fit the model, but we actually did not transform it. I believe the transform function is the one which actually trains the model. It is actually kind of preprocessing and having the data fit into the model, but we need to call fit and transform for the training. And for predictions, yeah, x test is fine. Dot count, dot count. I need to check the dot count function, but previous to it, I feel we have to check... we are just splitting the training data and then starting to predict on x test. But we are not actually checking the metrics of the data that we've fit into the model and verifying if the recall and precision are right or not. That is one thing. Syntax-wise, everything looks good to me. And, yeah, I think transform is something that is being missed in the code. And, yeah, we have to train the model on the train data and then start predicting for the x test. So for the x test also, I think we need to do the fit first and then start the prediction rather than directly predicting. I think that is something that should be done. Yeah.

Do you design an optimal high throughput system for real time machine? Right? Oh, okay. There are many ways to do it. One is improving the model inference, right? For example, when we train a model on a GPU machine, we come out with the model weights as an output, but we have to deploy on either CPU machines or GPU machines. In general, let's say the inferencing time is too high on a CPU machine; we can opt for a GPU machine for the model deployment. This is one thing. Or else, we have certain techniques. I do not have the exact naming, but converting the floating point numbers from float32 to float16 will actually reduce some performance of the model, but it might increase the model inferencing speed. This is one way. And I'm not sure if there is anything like model pruning that can be done. Or else we can do certain optimizations like converting the model through the ONNX converter, which would give you the model with certain changes in the model inferencing graph, so your inferencing could be faster. So whenever we create a model, we can convert it using ONNX and then get the model weights. So these are 3 different ways to check for real-time machine learning model inferencing. And if it's a high-throughput system, of course, I need auto scaling. And, I mean, these are the very basic stuff that I remember doing, and I'm working on the scaling part. Also, understand the CPU utilization. Also, let's say you're performing real-time inference, right? For example, say the model is inferencing through HTTP: make sure the network latency is not high.
And, based on that, deploy the models at the specific place or environment that is very close to the customer calls. These are different things that we can think through in terms of model inferencing. Or else, choose the right lightweight models, like LightGBM models or some other models, to train. If the network is too huge, maybe serve the requests using batching, continuous batching; let's say a Ray cluster uses a technique called continuous batching. When the model is enabled for a batching solution, instead of inferencing one by one, it creates a micro batch and performs the inferencing all together at once and then sends the results to the caller. So these are the various items that we can think of for model inferencing techniques. Yeah.
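The micro-batching idea at the end of this answer, grouping pending requests and running one model call per group instead of one call per request, can be sketched in plain Python. Note that this is simple fixed-size micro-batching, not the continuous batching of a serving framework; the function names and the dummy model are illustrative assumptions.

```python
from typing import Callable, Iterator, List, Sequence


def micro_batches(requests: Sequence[float], max_batch_size: int) -> Iterator[List[float]]:
    """Split pending requests into consecutive batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(requests), max_batch_size):
        yield list(requests[start:start + max_batch_size])


def serve_in_batches(requests: Sequence[float],
                     predict_batch: Callable[[List[float]], List[float]],
                     max_batch_size: int = 4) -> List[float]:
    """Run one model call per micro-batch and return results in request order."""
    results: List[float] = []
    for batch in micro_batches(requests, max_batch_size):
        results.extend(predict_batch(batch))
    return results


batch_sizes_seen: List[int] = []


def dummy_predict_batch(batch: List[float]) -> List[float]:
    # Hypothetical stand-in for a vectorised model call; records each batch size.
    batch_sizes_seen.append(len(batch))
    return [x * 10 for x in batch]


outputs = serve_in_batches([1, 2, 3, 4, 5], dummy_predict_batch, max_batch_size=2)
```

Five requests are served with three model calls here; a production server would also bound how long a request may wait for its batch to fill.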

What would be your... I personally have not worked on recommendation systems. What would be your strategy to architect our recommendations? Okay. So, I mean, let's think of a use case first, and maybe based on that, answer this. I have no idea how I should answer this otherwise. For example, the Hotstar streaming system, right? Okay. That's a very big use case. So when the viewership starts increasing a lot, we can have the recommendation results for a specific location and specific types of users already saved, and then show them in real time, rather than, when the user logs in, you go and fetch the ad, and the ad results are not displayed, and we don't know what to show to the user. Right? So it would be better to have pre-computed recommendation outputs based on the user's location or the user's behavior: say this specific user comes into a certain pool or category, and these are the ads that we want to show to them. Based on that, we can do it. Also, make sure that we don't show repeated ads to the user. So we always have to be tracking all this information while building the microservices kind of approach in AWS. Yeah. Pretty much these are my thought processes on this. Maybe there could be a lot more to it, but please help me clarify this question, and I would like to discuss more on this. But I would like to put it this way: I have not worked on recommendation systems personally, so I'm not able to think it through in a very short time. Yeah.

How would you integrate, let's say, Kafka with MLflow to improve the model tracking for real-time predictions? How do you integrate Apache Kafka with MLflow for the model tracking? My question is, why do we have to integrate Apache Kafka with MLflow? Kafka is a consumer. Okay, I mean, it publishes messages to the consumers; it consumes as well as publishes. Okay. With MLflow, to improve the model tracking for a real-time prediction service. Maybe, for example, we can try out something like this: with a new model trained, if the accuracy is really right, we don't have to integrate MLflow with Kafka, but we can send the new model's predictions to the Kafka consumer, which can be read by a process, let's say a Lambda function or something. If the Lambda function sees that the actual performance of the new model is pretty good for a given experiment compared to the previous one, then it can actually go and trigger a new deployment. This is something we can do, but for tracking a real-time prediction service... Yeah. But from the deployment perspective, I can't think of anything else; what is this improving the model tracking for a real-time prediction service? What model tracking are we trying to do? Is it a new model version that is enabled? Or else, we can emit an event saying, hey, there's a new model that has come. Once the new model is available, the event is available in Kafka. Based on that, create an event and trigger it, maybe via AWS EventBridge or something, to trigger certain functions: hey, I got a new message now, take this and do something with it. Provide this specific MLflow model version to AWS services like the ECS clusters, and restart them so they will pick up the new model version, which is the new model tag, and spin up. This is one way. Or else, we need to have the A/B testing kind of approach, where you create a different service with a new version altogether. And how do I put this? Yeah.
That can be done so that you can have an A/B testing kind of thing, where you send 50% of the traffic to model A and 50% to model B and then start the inferencing. So that is something we can do. This is what I am thinking of, but I'm not really sure if this is what is expected from this question. Maybe I need to clarify this question a lot more rather than just answering it straight away. Yeah. Thank you.
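The 50/50 traffic split mentioned at the end of this answer can be sketched as a deterministic, hash-based router. Hashing the request (or user) ID is one common way to keep assignments stable across calls; it is an assumption of this sketch, as are the function name and the variant labels.

```python
import hashlib


def assign_variant(request_id: str, share_a: float = 0.5) -> str:
    """Route a request to model A or model B based on a stable hash of its ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "model_a" if bucket < share_a else "model_b"


# The same ID always lands on the same variant, so a caller sees consistent results.
first_call = assign_variant("user-123")
second_call = assign_variant("user-123")

# Over many IDs the split approaches the configured share.
count_a = sum(assign_variant(f"user-{i}") == "model_a" for i in range(10_000))
```

Changing `share_a` shifts traffic gradually, for example 0.1 to send only a tenth of requests to model A while a new version is being evaluated.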
