Vetted Talent

Shashank Khandelwal

Seeking a leadership role where I can apply my extensive machine learning and team management expertise to drive innovation and excellence in deploying advanced ML systems and optimizing technical processes.

Role
AI Lead Engineer
Years of Experience
12.58 years

Skillsets

Elasticsearch
MLOps
Airflow
Generative AI
MLflow
Datadog
AWS
Azure
Linux
ArgoCD
AWS SageMaker
AzureML
CentOS
Checkmarx
Confluence
Python
Git
Helm Charts
HP-QC
Jira
macOS
ML algorithms
OpenAI
RAG
Rally
SonarQube
Vector databases
Windows
Req-pro
Kubernetes - 2 Years
Python - 8 Years
SQL - 8 Years
Spark
PyTorch - 3 Years
TensorFlow - 3 Years
NumPy
pandas
Unix
Kafka
LangChain
Hugging Face
Docker - 5 Years
Java
MLflow - 4 Years
Snowflake - 4 Years
Django
FastAPI
Flask
PySpark

Vetted For

9 Skills

Roles & Skills
Results
Details

Senior Software Engineer - MLAI Screening
73%

Skills assessed: Kubeflow, Seldon, Spark, AWS, Docker, Kubernetes, Machine Learning, Problem Solving Attitude, Python
Score: 66/90

Professional Summary

12.58 Years

Jul, 2024 - Present 1 yr 10 months
Lead Engineer, AI
Logility
Sep, 2023 - Jun, 2024 9 months
Technical Lead ML
Encora Inc.
Aug, 2020 - Nov, 2023 3 yr 3 months
Senior Engineer
Acquia
Aug, 2013 - Oct, 2016 3 yr 2 months
Software Developer
Tech Mahindra
Oct, 2016 - Oct, 2017 1 yr
Software Engineer
TIBCO
Nov, 2017 - Jul, 2020 2 yr 8 months
Senior Consultant
Capgemini

Applications & Tools Known

Airflow
MLflow
Kubernetes
Snowflake
GitHub Actions
SonarQube
Checkmarx
Git
Confluence
Rally
Docker
ArgoCD
Helm Charts
Flask
Django
FastAPI
Elasticsearch
AWS
Azure
Datadog

Work History

12.58 Years

Lead Engineer, AI

Logility

Jul, 2024 - Present 1 yr 10 months

Led development of demand forecasting models using Prophet and Mixed HMMs (Poisson, Variational Gaussian) for intermittent datasets.
Built Order-to-Cash projection models using regression and HMMs, improving financial planning accuracy.
Co-developed LEA, a GenAI-powered explainability agent using LangGraph and LLMs, reducing root cause analysis time from weeks to minutes.
Built an enterprise conversational AI platform with LLMs (GPT-4/4o), RAG pipelines, and multi-turn dialogue using semantic search and graph-based knowledge retrieval.
Designed multi-agent orchestration with LangGraph/LangChain across 18+ specialized agents (SQL, Cypher, Trend Detection, Root Cause Analysis) with intelligent query routing and stateful workflows.
Developed async Python microservices using FastAPI, Pydantic, and production patterns (Factory, Strategy, State Machine) for text-to-SQL and graph query pipelines.
Deployed on Azure using Azure OpenAI, Key Vault, App Configuration, and Data Lake for secure multi-tenant AI infrastructure.
Implemented knowledge retrieval with the ChromaDB vector database, SentenceTransformer embeddings, and Neo4j Graph RAG for dynamic few-shot example selection.
Applied LLM optimization and prompt engineering at scale with YAML-driven templates, structured outputs, few-shot learning, and Arize Phoenix.
Drove stakeholder discussions, translating business needs into scalable AI solutions; presented insights to cross-functional teams.
Mentored junior engineers and initiated bi-weekly learning sessions on GenAI, fostering a culture of continuous innovation.

Technical Lead ML

Encora Inc.

Sep, 2023 - Jun, 2024 9 months

Boosted team productivity through effective leadership and mentorship of junior ML Engineers, MLOps Engineers, and Software Engineers, fostering a collaborative and high-performing team environment.
Improved tax category prediction accuracy by implementing FastText, DistilBERT, simple neural networks, and SVM models on AWS SageMaker and Azure ML.
Reduced model deployment time and enhanced scalability by orchestrating deployment of Airflow and MLflow on AWS EKS using Helm charts.
Improved model performance in data representation tasks by developing FastText-based utilities for generating embeddings.
Improved predictive accuracy and model robustness in dynamic environments by implementing stacking and meta-learner models.
Enabled efficient insight queries and improved document summarization accuracy through successful POCs using Retrieval-Augmented Generation (RAG).
Sped up data preprocessing and validation by designing and deploying multi-tenant microservices using Flask, Docker, Gunicorn workers, and Kubernetes CronJobs.
Improved model efficiency and decision transparency through thorough efficiency and explainability analyses.
Eliminated deployment errors and enhanced code quality and security by implementing CI/CD best practices using GitHub Actions, SonarQube, and Checkmarx.

Senior Engineer

Acquia

Aug, 2020 - Nov, 2023 3 yr 3 months

Minimized data processing time and increased model performance by migrating models from Parquet to Snowflake SQL, leveraging PySpark, distributed computing, and CUDA acceleration.
Increased customer engagement and sales by developing deep learning models using PyTorch, TensorFlow, and Keras, integrating Large Language Models (LLMs) like BERT and GPT-2 for natural language processing (NLP).
Reduced model deployment time and increased deployment frequency by designing a CLI tool using Python, Flask, and Docker, with orchestration on Kubernetes.
Delivered scalable solutions to customers and cut development time by implementing AWS SageMaker and AutoML, with automated hyperparameter tuning and model optimization.
Sustained code coverage and reduced bugs by prioritizing unit tests using pytest (markers, parametrization, fixtures), test-driven development (TDD), and CI/CD pipelines.
Streamlined data management, cut data management time, and improved data accuracy by using a feature store in Snowflake, integrating data warehousing, governance, quality, and lineage.

Senior Consultant

Capgemini

Nov, 2017 - Jul, 2020 2 yr 8 months

Improved prediction of optimal trading times by developing predictive models using machine learning algorithms (LSTM, ARIMA, XGBoost) and advanced data analytics.
Reduced execution cost by applying deep learning techniques (Convolutional Neural Networks, Recurrent Neural Networks) to improve pattern recognition in trade data.
Strengthened trade execution efficiency by implementing machine learning models (Random Forest, Gradient Boosting Machines) and data mining techniques to optimize execution timing.
Upheld compliance and accuracy by developing and updating governance rules using SQL in Snowflake and managing databases and views for efficient data handling.
Improved code quality and reliability by adding comprehensive unit test coverage using pytest, with markers, parametrization, and fixtures.
Decreased deployment time by establishing a CI/CD pipeline using Jenkins, automating deployment with containerization (Docker) and orchestration (Kubernetes).

Software Engineer

TIBCO

Oct, 2016 - Oct, 2017 1 yr

Improved data processing efficiency, reduced latency, and improved data quality by designing and implementing a Big Data PySpark framework that ingested data from flat files into Kafka, leveraging Scala and Apache Spark SQL.
Saved manual testing time, increased test coverage, and improved deployment frequency by building an automation framework using advanced OOP concepts, Agile methodologies, and CI/CD pipelines.
Improved data quality and achieved a 20% improvement in data governance by converting legacy shell scripts to Python, optimizing customer data delivery through data pipelines and data transformation techniques.
Increased system throughput, reduced response time, and improved system reliability by performance testing with JMeter, identifying and resolving bottlenecks with APM tools and infrastructure optimization.

Software Developer

Tech Mahindra

Aug, 2013 - Oct, 2016 3 yr 2 months

Improved customer notification efficiency by integrating batch processing for email, postal mail, SMS, and automated outbound communications within AT&T's internal applications, enabling error handling and auto-response functionality.
Removed fraudulent entries, increased secure user transactions, and improved overall system security and compliance by designing and implementing authentication using OAuth, SAML, and JSON Web Tokens.
Reduced defect density and improved code quality and unit test coverage by developing a testing strategy based on Test-Driven Development (TDD), Continuous Integration (CI), and Continuous Deployment (CD).
Increased sales pipeline value and accelerated conversion rates, with enhanced data visualization and business insights, by building and presenting the business pipeline using AVOS, Salesforce, and Tableau.
Improved testing efficiency and test data management, achieving automation coverage using pytest and Selenium.

Achievements

Boosted team productivity, optimized tax category prediction accuracy
Reduced model deployment time
Developing deep learning models
Successful POCs to innovate document summarization
Maintaining high-quality artifacts and enhancing client satisfaction
Recognized for consistently delivering high-quality results and taking ownership of project deliverables
Awarded for implementing qTest within the testing framework
Received multiple recognitions for leading ML and MLOps independently
Awarded for successful project implementations within short timelines
Praised by clients for adherence to best practices
Appreciated by clients for effective defects triaging
Acknowledged for leadership in team grooming

Major Projects

7 Projects

Demand Sensing/Forecasting

Jun, 2024 - Present 1 yr 11 months

Developed models to improve financial planning and business efficiency.

Taxonomy Recommendation

Sep, 2023 - Jun, 2024 9 months

Revamped tax category prediction accuracy and deployed models on cloud platforms.

CDP ML

Aug, 2020 - Sep, 2023 3 yr 1 month

Migrated models to Snowflake improving data management and processing.

Morgan Stanley

Nov, 2017 - Jul, 2020 2 yr 8 months

Developed trading time prediction models revolutionizing efficiency.

Tibco Mashery

Oct, 2016 - Nov, 2017 1 yr 1 month

Designed an efficient Big Data PySpark framework reducing data latency.

IRU Unlock AT&T

May, 2015 - Oct, 2016 1 yr 5 months

Built batch processing systems improving customer notification efficiency.

Customer Response and Notifications Management AT&T

Aug, 2013 - May, 2015 1 yr 9 months

Designed pipelines increasing sales pipeline value.

Education

Bachelor of Engineering (Electronics)
Rashtrasant Tukadoji Maharaj Nagpur University (2013)

Certifications

CloudBees Certified Jenkins Engineer
AWS Cloud Practitioner

AI-interview Questions & Answers

I am Shashank Khandelwal. I have 11 years of experience in the IT industry, including 8 years of experience in Python and 5 years of experience in machine learning and machine learning operations (MLOps). I have worked on classical ML algorithms, deep learning, and other areas. Now I'm focusing more on GenAI: LLMs, prompt engineering, LangChain, AI, and all these things. So this was a brief overview about me. Thank you.

How can you optimize resource allocation in a Kubernetes cluster running heavy Python-based machine learning workloads without overprovisioning? So, the first thing we can do is right-size resource requests and limits: set appropriate CPU requests and CPU limits, and likewise memory requests and memory limits. The second thing is node affinity. Use node affinity to schedule ML workloads on nodes with specific characteristics, for example GPU nodes or high-memory nodes. Then there are taints and tolerations. Use taints and tolerations to control which pods can be scheduled on certain nodes, which helps isolate ML workloads from other, less critical workloads. Then you can use autoscaling. The Horizontal Pod Autoscaler (HPA) automatically scales the number of replicas based on CPU or memory usage, so the application can handle varying loads without manual intervention. Then we can use the Cluster Autoscaler to automatically adjust the size of the Kubernetes cluster based on resource requests. This ensures that the cluster can scale up to accommodate increased workloads and scale down to save costs when demand decreases. Then we have resource quotas. We can set resource quotas at the namespace level to control the aggregate resource consumption of all pods within a namespace. This prevents resource starvation and ensures fair resource distribution. Then we have efficient resource utilization. We can use spot instances for noncritical or batch ML workloads, and for GPUs we can use Jobs to utilize the GPU resources efficiently. Apart from this, we can have a monitoring and logging system so that we can continuously monitor and raise alerts if we see any hiccups.
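As a rough illustration of the requests/limits, node affinity, and toleration settings mentioned in this answer, the sketch below builds a Pod manifest as a plain Python dict. The image name, the `workload-type` node label, the `dedicated=ml` taint, and all resource numbers are hypothetical values chosen for the example, not settings taken from the interview.

```python
import json


def build_ml_pod_spec(image, cpu_request="2", cpu_limit="4",
                      memory_request="8Gi", memory_limit="16Gi", gpus=1):
    """Return a Pod manifest as a plain dict (all values are illustrative)."""
    resources = {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {"cpu": cpu_limit, "memory": memory_limit},
    }
    if gpus:
        # Extended resources such as GPUs are declared under limits.
        resources["limits"]["nvidia.com/gpu"] = gpus

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "ml-inference"},
        "spec": {
            "containers": [
                {"name": "model", "image": image, "resources": resources}
            ],
            # Node affinity: only schedule on nodes carrying the
            # (hypothetical) label workload-type=gpu.
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {
                                        "key": "workload-type",
                                        "operator": "In",
                                        "values": ["gpu"],
                                    }
                                ]
                            }
                        ]
                    }
                }
            },
            # Toleration: allow scheduling onto nodes tainted with the
            # (hypothetical) taint dedicated=ml:NoSchedule.
            "tolerations": [
                {
                    "key": "dedicated",
                    "operator": "Equal",
                    "value": "ml",
                    "effect": "NoSchedule",
                }
            ],
        },
    }


manifest = build_ml_pod_spec("registry.example.com/ml-inference:1.0")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, since Kubernetes accepts JSON manifests as well as YAML.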

We have different log levels like debug, info, warning, error, and critical, so we have to define these log levels and use them wisely. Then we should also use a logging configuration. With Python's built-in logging module, we can configure the log levels, formats, and handlers. Then, structured logging can be used to make logs more readable and easier to parse. Libraries like python-json-logger can be used to format logs in JSON. Next, we should use a centralized logging solution. We can write our logs to a storage volume and create a Persistent Volume Claim (PVC) to ensure that logs persist even if the cluster scales up or down. Alternatively, we can ship logs to a monitoring tool like Datadog. This will allow us to view logs without needing an ELK stack or advanced knowledge.
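A minimal sketch of the structured-logging idea, using only Python's built-in logging module with a small hand-written JSON formatter instead of a third-party library. The `JsonFormatter` and `build_logger` names and the chosen fields are assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream, level=logging.INFO):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger("inference", buffer)
log.debug("dropped, because the logger level is INFO")
log.info("model loaded in %d ms", 120)
print(buffer.getvalue(), end="")
```

In a container, the stream would typically be `sys.stdout`, so that whichever log collector runs on the cluster can pick the lines up.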

Let's take the steps to containerize a Python-based machine learning inference service using Docker. First, you should have four things: your source code folder, which will include your model file; your unit test suite folder; a requirements.txt file; and a Dockerfile. Why should we have a test suite? Because before every deployment you should run the unit tests to check that the code runs correctly. This will reduce unnecessary deployment cycles. Then we create the Dockerfile. In the Dockerfile, we start with a base image, like Python 3.10, depending on the requirements we have. Then we set a working directory and copy all our code there. Then we install the requirements with pip. Whatever port we want to expose for our app, we expose, and if we want some environment variables, we set them. At the end, we add the command that runs the application in CMD, with brackets and each word in quotes. Then we build the Docker image with docker build, with -t and whatever repository name and tag we want to give. Then we run the container from that build locally and test the inference service locally. So the first check is the unit tests, and the second is this local test. Once everything is working, we can push the Docker image to whichever registry we use. And then we can have Helm charts and deploy them. In the Helm chart, we will have to mention the registry URL and the tag of the image. These are the steps.
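The Dockerfile steps described in this answer can be sketched as a small Python helper that renders the file as text. The base image tag, port, and `app.py` entry point are hypothetical defaults, and copying requirements.txt before the rest of the code is a design choice of this sketch (it keeps the pip layer cacheable) rather than something stated in the answer.

```python
def render_dockerfile(base_image="python:3.10-slim", port=8000,
                      command=("python", "app.py")):
    """Render a Dockerfile for a Python inference service as a string."""
    # Exec-form CMD: brackets, with each word in double quotes.
    cmd = ", ".join(f'"{part}"' for part in command)
    lines = [
        f"FROM {base_image}",
        "WORKDIR /app",
        # Copy only the dependency list first so this layer and the
        # pip install layer are reused when only source code changes.
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        f"EXPOSE {port}",
        "ENV PYTHONUNBUFFERED=1",
        f"CMD [{cmd}]",
    ]
    return "\n".join(lines) + "\n"


print(render_dockerfile(), end="")
```

With the output saved as `Dockerfile`, the image would be built with `docker build -t <name>:<tag> .` and run locally with `docker run -p 8000:8000 <name>:<tag>` before it is pushed to a registry.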

What approach will you take to troubleshoot performance bottlenecks in a Python-based machine learning API running on Kubernetes? That's a very interesting question, and I'll try to answer it from experience. First, we have to collect metrics and locate the bottleneck. We cannot directly go and fix things, because we have to find the bottleneck first. For that purpose, we will set up monitoring. We can use Grafana or Datadog. We can also use the Metrics Server, which will give us resource usage metrics. Then, what should we monitor? We should monitor pod metrics, like CPU and memory usage, the number of restarts, and resource requests and limits. Then come node metrics: the overall resource usage across our cluster. Then come custom metrics, such as model inference time, number of concurrent requests, and request time. Then, for performance analysis, we should analyze the resource utilization per node. We should investigate the logs as well. We should send the logs to Datadog or have an ELK setup where we can aggregate and analyze them. We should try to find patterns in the logs and in the metrics, and build a story out of them. One thing we should also do is profile the application. Once all this is set up, we should run load testing. If the load testing surfaces an issue, we should try to recreate it, and if we are able to recreate it, find the bottleneck around it using everything I described earlier. Based on that, we'll optimize the code or the configuration, whichever is required. We will also right-size the resources: if we are underutilizing or overutilizing them, we will adjust accordingly. We will also tune the network and storage configuration.
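One of the custom metrics mentioned here, model inference time, can be captured with a small timing decorator. The sketch below is a stdlib-only illustration; `LatencyRecorder` and `fake_predict` are hypothetical names, and a real service would export these samples to its monitoring tool rather than keep them in a list.

```python
import math
import time
from functools import wraps


class LatencyRecorder:
    """Collect per-call latencies (in milliseconds) for wrapped functions."""

    def __init__(self):
        self.samples_ms = []

    def timed(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the latency even when the call raises.
                elapsed = time.perf_counter() - start
                self.samples_ms.append(elapsed * 1000.0)
        return wrapper

    def percentile(self, pct):
        """Nearest-rank percentile of the recorded samples."""
        if not self.samples_ms:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples_ms)
        rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
        return ordered[rank - 1]


recorder = LatencyRecorder()


@recorder.timed
def fake_predict(features):
    # Stand-in for a real model call.
    return sum(features) / len(features)


for _ in range(100):
    fake_predict([0.1, 0.2, 0.3])

print("p50 ms:", recorder.percentile(50))
print("p95 ms:", recorder.percentile(95))
```

Tail percentiles such as p95 are usually more informative than the mean here, because a bottleneck often shows up only on the slowest requests.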

Outline the process for converting stateless machine learning APIs in Python to stateful services in Kubernetes for complex processing needs. Okay, so first, we'll have to define the state requirements: determine what state information is to be maintained across requests, for example user sessions, intermediate features passed between APIs, or cached models. Then, we'll have to implement state management. We'll have to decide how and where the state will be stored: in a database, locally, or in memory. Accordingly, we can modify the API to handle the state. We'll integrate a state storage mechanism, such as MongoDB, to store and retrieve this state information. Then, we have to update the app: update the Docker configuration to add any necessary dependencies and implement the stateful logic in the application. And then we'll deploy this stateful service into Kubernetes.
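To make the "decide where the state lives, then modify the API to use it" step concrete, here is a minimal sketch with a pluggable state store. The `StateStore` interface, the in-memory backend, and the running-mean handler are all hypothetical; a MongoDB-backed class, as suggested in the answer, would implement the same two methods.

```python
from abc import ABC, abstractmethod


class StateStore(ABC):
    """Interface for wherever per-session state is kept."""

    @abstractmethod
    def load(self, session_id):
        """Return the stored state dict for a session (empty if none)."""

    @abstractmethod
    def save(self, session_id, state):
        """Persist the state dict for a session."""


class InMemoryStateStore(StateStore):
    """Simplest backend; state is lost when the process restarts."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


def handle_request(store, session_id, value):
    """Stateful handler: keeps a per-session history across requests."""
    state = store.load(session_id)
    history = state.get("history", []) + [value]
    state["history"] = history
    state["running_mean"] = sum(history) / len(history)
    store.save(session_id, state)
    return state["running_mean"]


store = InMemoryStateStore()
print(handle_request(store, "user-1", 2.0))  # first request for user-1
print(handle_request(store, "user-1", 4.0))  # second request sees the first
```

Keeping the handler dependent only on the interface means the storage decision (database, local disk, or memory) can change without touching the API logic.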

Identify the issues in the Dockerfile that could potentially break the build when leveraging the cache. Okay. The first issue I can point to is the order of the instructions. We are first initializing from the base image, and I don't see an issue there. But the RUN pip install cannot be the first thing after that. What we have to do is first initialize from the base image, then create a work directory, which is app. Once the app work directory is there, we can copy the requirements and other files to app. Also, I don't think we should use ADD; we should use the COPY command instead. These are the issues.
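The ordering problems called out in this answer can be expressed as a small checker. The Dockerfile under review is not part of this transcript, so the `bad` sample below is a hypothetical reconstruction, and the three rules are a sketch of the points raised rather than a complete linter.

```python
def lint_dockerfile(text):
    """Return a list of ordering problems; an empty list means none found."""
    steps = [line.strip() for line in text.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    names = [step.split()[0].upper() for step in steps]
    problems = []

    # Only ARG is allowed to appear before FROM.
    non_arg = [name for name in names if name != "ARG"]
    if not non_arg or non_arg[0] != "FROM":
        problems.append("FROM should come first (only ARG may precede it)")

    def first_index(predicate):
        return next((i for i, step in enumerate(steps) if predicate(step)), None)

    copy_at = first_index(lambda s: s.split()[0].upper() in ("COPY", "ADD"))
    workdir_at = first_index(lambda s: s.split()[0].upper() == "WORKDIR")
    pip_at = first_index(
        lambda s: s.split()[0].upper() == "RUN"
        and "pip install" in s and "-r" in s.split()
    )

    if pip_at is not None and (copy_at is None or pip_at < copy_at):
        problems.append("pip install -r runs before requirements are copied")
    if copy_at is not None and (workdir_at is None or workdir_at > copy_at):
        problems.append("set WORKDIR before copying files")
    if "ADD" in names:
        problems.append("prefer COPY over ADD for local files")
    return problems


# Hypothetical Dockerfile showing the issues described in the answer.
bad = """\
FROM python:3.10
RUN pip install -r requirements.txt
ADD . /app
WORKDIR /app
"""
for problem in lint_dockerfile(bad):
    print(problem)
```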

So for this, we'll have to ensure that the environment is properly set up with the necessary resources and tools. We have to define the caching mechanism. First, we'll have to define what to cache and which part to cache. Then we will choose the caching method. We can use either Redis or Memcached. If we need an in-memory data structure store, then we should use Redis, which is widely used for that. But if we want distributed memory object caching, then we will go with Memcached. Because it's Kubernetes, there is a tendency to go with Memcached, but for Airflow we use Redis, and we can deploy Airflow on the cluster. It depends on the actual use case. You cannot make a blanket statement about which one to use.
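Whichever backend is chosen, the "what to cache" decision usually ends up as a get-or-compute wrapper around the expensive call. The sketch below uses a tiny in-process TTL cache as a stand-in for Redis or Memcached; `TTLCache`, `cached_predict`, and the 30-second TTL are assumptions made for the example.

```python
import time


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def cached_predict(cache, model_fn, key):
    """Return a cached result when present, otherwise compute and store it."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = model_fn(key)
    cache.set(key, result)
    return result


cache = TTLCache(ttl_seconds=30)
print(cached_predict(cache, str.upper, "invoice text"))  # computed
print(cached_predict(cache, str.upper, "invoice text"))  # served from cache
```

Swapping in a shared cache later only requires another class with the same `get` and `set` methods, which is what makes the cache usable across multiple pods.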

We should use a multi-zone cluster to ensure high availability, so if one availability zone goes down, the others can continue to serve. We'll use a cluster autoscaler to automatically adjust the size of the cluster based on resource usage and workload demands. Second, we should deploy the service with multiple replicas to ensure redundancy, and use the Horizontal Pod Autoscaler (HPA) to scale the replicas based on resource usage. For storage and data management, we can use Persistent Volumes (PVs) to manage storage for stateful components like databases or artifacts. We can also use distributed storage solutions like Amazon EFS, Google Cloud Filestore, or Azure Files for high availability. We should also think about monitoring and logging. We should create backups of critical data regularly, including artifacts, configuration files, and databases, and set up a scheduled job to take care of this on a regular basis. For load balancing and traffic management, we should use an ingress controller like NGINX to manage external access to services and provide load balancing. Finally, we should have security and compliance in place. We should have role-based access control (RBAC) and network policies defined to control access to resources and ensure secure connections between pods.
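For intuition on how the HPA scales replicas, the Kubernetes documentation gives the rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The function below is a simplified sketch of that rule; the min/max bounds and the sample numbers are illustrative, and the real controller also accounts for things like pod readiness and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU with a 60% target.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

Setting the minimum replica count to at least two is what gives redundancy; the autoscaler then only decides how far above that floor to go.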

Prioritize key considerations when selecting AWS cloud services for deploying a scalable Python machine learning application. So, when selecting AWS cloud services for deploying a scalable machine learning application, we should consider the following things. First, we should consider low latency. Second, we should consider scalability. For scalability, we can use Amazon EC2 Auto Scaling. We can also use AWS Lambda, which gives a serverless, event-driven architecture, or Amazon EKS, where the cluster and pods scale up and down. Third, we should set up high availability and fault tolerance. If we use a database, we can use Amazon RDS Multi-AZ, or we can use Amazon S3. We can also use Amazon Route 53, so that we have a highly reliable, scalable DNS service for routing traffic. Then come ELBs, which distribute incoming traffic across multiple targets. Then we should think about performance optimization. For that, we should choose EC2 instance types according to our workload, whether it is compute-optimized or memory-optimized. If we need a high-performance file system for high-speed processing of large datasets, we can use Amazon FSx. If we have to improve the global availability and performance of the application, we introduce AWS Global Accelerator. Then we come to data management and storage. Next comes security and compliance, the same thing I talked about in the previous answer. In security and compliance, we have to define IAM roles and KMS keys. We should also use AWS Shield and AWS WAF. Then, for monitoring all of this, I'll introduce AWS CloudTrail, which records the API calls so that we can do auditing around them. And the important thing is cost efficiency: whatever we are doing, are we underutilizing or overutilizing it? If we have any batch workloads, we can use spot instances.

Elaborate on the role of MLflow in simplifying the management of Python machine learning model life cycles on a Kubernetes based platform. So, MLflow is used for tracking our models. When we train a model, we track the run using MLflow and upload all the artifacts around the model, including the model itself. Once we're done with multiple experiments, we can compare these models, including the graphs and metrics stored in MLflow, across all the runs. We can then choose a model, register it, host it on a Kubernetes-based platform, and create an endpoint. We can use that endpoint to generate statistics for monitoring purposes, send those statistics to a monitoring tool like Datadog, and compare the results. If there's any data drift or concept drift, we should go back to the data and repeat the training process. So, the whole life cycle can be managed using MLflow.
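The answer does not say how data drift would be detected from the endpoint statistics. One common option, shown here purely as an assumption, is the Population Stability Index (PSI) between a training-time feature sample and a live sample; the function name, the bin count, and the 0.2 rule-of-thumb threshold in the comment are not from the interview.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a live sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [float(i) for i in range(100)]
live = [v + 50.0 for v in train]  # simulated shift in the live data

psi = population_stability_index(train, live)
# A PSI above roughly 0.2 is often treated as a sign of meaningful drift.
print("PSI:", psi)
```

A score like this could be logged alongside the other run metrics so that a retraining decision is traceable to the drift measurement that triggered it.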