
Worked on and developed microservice architectures, orchestrated containerized applications on Google Kubernetes Engine, and managed resources across the AWS, GCP, and Azure cloud platforms. Worked with various DevOps tools to optimize development and operations processes across multiple teams.
Developed Python scripts to automate the image creation process using Packer. Maintained and monitored cloud infrastructure using Infrastructure as Code (Terraform and CloudFormation templates).
Automated application-specific tasks and cloud resource updates using PowerShell and shell scripting.
Senior DevOps Engineer - L2, Nashtech
DevOps Engineer, Byjus
DevOps Engineer, Virtusa
DevOps Engineer, Bridgelabz
ADO

GitHub Actions

CloudWatch

DataDog

Liquibase

Sysdig

TeamCity
Docker

Kubernetes

Python

Bash

Terraform

Ansible

CloudFormation

Prometheus
Grafana

SonarQube
Start by giving a brief introduction of yourself. Hello, I'm Sachin. I'm a Senior DevOps Engineer at Nashtech. I have multi-cloud expertise in implementing microservice architectures and infrastructure orchestration for hosting applications and services. I also have expertise in implementing DevOps CI/CD pipelines and end-to-end automated solutions for application build, release, and testing. On the scripting side, I work mainly with Linux shell and Python, as well as PowerShell, to automate the required workloads and tasks. That's about me.
How do you minimize downtime during a rolling update of an AWS EC2-based application? Let's assume the application is hosted directly on EC2 instances and exposed through a load balancer. I would bake a parallel AMI with the new version of the application and create a second target group running that version, registered to the same load balancer. When it comes to the rolling update, I switch the load balancer listener from the target group that is currently serving the application to the new one. That way we minimize downtime, and because we keep AMIs of both the current and the previous version of the application, we also have a ready rollback path.
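A minimal sketch of the traffic switch described above, assuming an existing ALB listener and two target groups; the ARNs are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical ARNs - replace with the real listener and target group ARNs.
LISTENER_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/abc/def"
NEW_TG_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/app-v2/0123456789abcdef"

# Wait until the new target group reports healthy, in-service targets.
aws elbv2 wait target-in-service --target-group-arn "$NEW_TG_ARN"

# Point the listener's default action at the new target group;
# the old target group (and its AMI) stays available for rollback.
aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --default-actions Type=forward,TargetGroupArn="$NEW_TG_ARN"
```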
What efficient method would you use to deploy Lambda function update using CACD pipeline in AWS? Uh, would you use to deploy Lambda function update using so let's assume that I have a Lambda function deployed on the Python 3.19 runner. Uh, I would like, since it has a a limitation that if the application function, uh, the zip files are more than 10 MB, you cannot directly upload. So usually, what I would do is I would be taking the application uh, function. So either lambda function into s three bucket. And in my CICD scripting, uh, we can use any platform. But as you mentioned, in this one, I can go with Jenkins or I can go with GitHub Actions or I can go with AWS code pipeline itself that has the AWS CLI to, um, deploying, uh, the Lambda function zip, or, like, updating the Lambda function zip and all. So what I would be doing is I would be referring to the specific s three ui path that would be pointing to the zip file of the function that I need to update in the specified, uh, Lambda function. So that way, uh, it would be easier to manage the entire application deployment through a CICD. And as well as managing the Lambda version of the function update as well, uh, the release version.
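A minimal sketch of the pipeline step described above; the bucket, key, and function names are placeholder assumptions:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names - adjust bucket, key, and function name to your setup.
BUCKET="my-lambda-artifacts"
KEY="releases/my-function-${GIT_COMMIT:-dev}.zip"
FUNCTION="my-function"

# Package the function code and stage it in S3.
zip -r function.zip app/
aws s3 cp function.zip "s3://${BUCKET}/${KEY}"

# Point the Lambda function at the new package and publish a version,
# so each release is addressable for rollback.
aws lambda update-function-code \
  --function-name "$FUNCTION" \
  --s3-bucket "$BUCKET" \
  --s3-key "$KEY" \
  --publish
```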
Walk through the process of converting a monolithic application architecture into microservices on Kubernetes. Assume a monolithic three-tier application: a front-end proxy server forwarding all incoming requests to the main application server, which is connected to a database. The first thing I would split out is the database, either running it directly on Kubernetes or on an RDS instance. Assuming it stays on the Kubernetes layer, I would deploy the database as a StatefulSet in the back end. The application itself I would deploy as a Deployment, with its own manifest for configuring it, so it connects to the back-end services through Kubernetes services. To expose the application to end users, I would preferably use an ingress controller; there are plenty of them, for example the ALB ingress controller, Traefik, and the NGINX ingress controller. I would deploy one of those alongside an Ingress resource, which defines how requests are forwarded to the application and how requests are segregated to select the right service. That is how the microservice architecture can be implemented.
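A minimal sketch of the application tier split described above, applied with kubectl; the names, image, and host are placeholder assumptions, and an ingress controller is assumed to be installed:

```bash
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-server              # the former monolith's application tier
spec:
  replicas: 3
  selector:
    matchLabels: { app: app-server }
  template:
    metadata:
      labels: { app: app-server }
    spec:
      containers:
        - name: app
          image: registry.example.com/app-server:1.0   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: app-server
spec:
  selector: { app: app-server }
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
spec:
  ingressClassName: nginx       # assumes the NGINX ingress controller
  rules:
    - host: app.example.com     # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-server
                port: { number: 80 }
EOF
```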
Share your approach for automating security patching of a Linux-based Kubernetes cluster. I would look at this on the Kubernetes layer first. The API versions are what drive the creation of any object in the cluster, so when it comes to a cluster update I would check which resource API versions are changed or deprecated by the new release and capture that in the upgrade scripts. For the security side, I would use the tools responsible for managing the security posture of the cluster; there are plenty in the CNCF landscape, and Datadog is one I have experience with as well. I would scan the cluster's vulnerability state, and if a finding affects the application, I would apply the changes in a lower environment first and test there before updating production. For the patches themselves, I would maintain a script that captures the vulnerabilities relevant to the cluster's context and then apply the changes either through those scripts or directly against the cluster with kubectl.
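One common way to script the node-level patch rollout (not spelled out above, but consistent with automating the patches) is to patch worker nodes one at a time behind cordon/drain. A hedged sketch, assuming SSH access and Ubuntu-based nodes with unattended-upgrades installed:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Patch worker nodes one at a time so workloads keep running elsewhere.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "Patching ${node}"
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m

  # Apply pending security updates on the host (assumes Ubuntu nodes).
  ssh "ubuntu@${node}" 'sudo apt-get update -qq && sudo unattended-upgrade'

  # Wait for the node to report Ready again before moving on.
  kubectl wait --for=condition=Ready "node/${node}" --timeout=15m
  kubectl uncordon "$node"
done
```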
Outline your strategy for managing state in Terraform across multiple environments. For managing Terraform state, first of all I would use a remote backend. In terms of how the environments are laid out, I'm assuming each environment is isolated in its own account; for example, in AWS, the dev environment has a dedicated account, the stage environment has its own account, and production has its own account. I would configure the remote state resources in each environment by creating an S3 bucket and a DynamoDB table (for state locking) in the specific account. Then, whenever I work against dev, stage, or prod, I select the Terraform backend configuration specific to that environment. That way the state of each environment is isolated to the account and to the resources provisioned in that account.
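A minimal sketch of selecting a per-environment backend, assuming an empty `backend "s3" {}` block in the Terraform configuration; the bucket and lock table names are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

# One backend config file per environment, each pointing at the S3 bucket
# and DynamoDB lock table that live in that environment's account.
ENV="${1:-dev}"   # dev | stage | prod

cat > "backend-${ENV}.hcl" <<EOF
bucket         = "tfstate-${ENV}-example"     # placeholder bucket name
key            = "platform/terraform.tfstate"
region         = "us-east-1"
dynamodb_table = "tfstate-lock-${ENV}"        # placeholder lock table
encrypt        = true
EOF

# Re-initialize against the selected environment's backend, then plan.
terraform init -reconfigure -backend-config="backend-${ENV}.hcl"
terraform plan -var-file="${ENV}.tfvars"
```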
Assume you are reviewing a Terraform module for deploying an EC2 instance and notice the following block: an aws_instance resource with a hardcoded AMI, a t2.micro instance type, a deployment key, a security group list, and tags, plus a variable deployment_key (the SSH key for the instance, type string, with a default). Can you point out a potential risk here? If I treat this as a Terraform module, the values for the AMI, the instance type, the security groups, and the tags are hardcoded. It cannot be reused: if I want to pass a custom value, for example change the instance type from t2 to t3, that is a restriction. It is tied to one specific configuration that can be deployed, so it cannot really act as a module. If you ask me for the changes, I would parameterize all the fields in the aws_instance resource block, for example by creating variables for the AMI ID, the instance type, the security group IDs, and the tags. With everything variablized and parameterized, the only thing I need to do is source this module in my root Terraform module and declare the variable values. That way the risk is minimized.
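A hedged sketch of the parameterized module, written out as the files it might contain; all names and IDs are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical parameterized module: every previously hardcoded field
# becomes an input variable so the module can be reused per environment.
mkdir -p modules/ec2-instance

cat > modules/ec2-instance/main.tf <<'EOF'
variable "ami_id"             { type = string }
variable "instance_type"      { type = string }
variable "key_name"           { type = string }
variable "security_group_ids" { type = list(string) }
variable "tags"               { type = map(string) }

resource "aws_instance" "this" {
  ami                    = var.ami_id
  instance_type          = var.instance_type
  key_name               = var.key_name
  vpc_security_group_ids = var.security_group_ids
  tags                   = var.tags
}
EOF

# The root module then sources it and passes concrete values.
cat >> main.tf <<'EOF'
module "app_server" {
  source             = "./modules/ec2-instance"
  ami_id             = "ami-0123456789abcdef0"   # placeholder AMI
  instance_type      = "t3.micro"
  key_name           = "deployment-key"
  security_group_ids = ["sg-0123456789abcdef0"]  # placeholder SG
  tags               = { Name = "app-server" }
}
EOF
```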
Examine this partial Kubernetes deployment YAML where an ML model service is defined, and identify the crucial details that are missing for a distributed ML inference setup. Assuming it's a Kubernetes Deployment with 3 replicas, selectors, a template, and the containers defined with the ML model image: first, it has no node affinity or tolerations declaring which nodes the pods should be scheduled on; by default they will land on whichever nodes the taints and tolerations happen to match. Second, there is no definition of the command or arguments the containers need to be initialized with. Third, the imagePullPolicy is not set explicitly, so it falls back to the default, which I believe is IfNotPresent. I would also say secrets management and volume mounts are not declared; overall this looks like a bare-minimum Deployment manifest. Given more details, for example which interface is needed, whether an EBS volume or an EFS share is supposed to be mounted, and whether any security keys have to be injected into the system, I could provide the complete deployment manifest.
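An illustrative sketch of the same Deployment with the missing pieces filled in (scheduling constraints, an explicit command, imagePullPolicy, secrets, and a volume mount); all names, labels, and images are placeholder assumptions:

```bash
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-service            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels: { app: ml-model-service }
  template:
    metadata:
      labels: { app: ml-model-service }
    spec:
      nodeSelector:
        workload: ml                # schedule onto ML-labelled nodes
      tolerations:
        - key: "workload"
          operator: "Equal"
          value: "ml"
          effect: "NoSchedule"
      containers:
        - name: model
          image: registry.example.com/ml-model:1.0   # placeholder image
          imagePullPolicy: Always
          command: ["python", "serve.py"]            # explicit init command
          envFrom:
            - secretRef:
                name: model-service-secrets          # placeholder secret
          volumeMounts:
            - name: model-store
              mountPath: /models
      volumes:
        - name: model-store
          persistentVolumeClaim:
            claimName: model-store-pvc               # e.g. EBS/EFS-backed PVC
EOF
```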
How would you design a system to auto-scale containerized machine learning workloads in a hybrid cloud setup using Kubernetes? Assume I have an ML-related application running on the Kubernetes layer. I would declare an HPA (Horizontal Pod Autoscaler) for the workload. The HPA monitors resource consumption, for example CPU, against the requests and limits defined for the pods, and its declaration states the minimum and desired number of replicas once the CPU threshold is crossed. Alongside that, I would install Karpenter into the Kubernetes cluster, because Karpenter is well suited to managing dynamic workloads, especially provisioning the right node type and configuration to host the pods. Assuming the load increases, or parallel jobs for the ML workload come in, the threshold is crossed and a pod scaling event happens; Karpenter picks up the pending pods and provisions a node suited to schedule them, all dynamically in nature. That covers the scale-up, and with the scale-down policy also defined, if the metrics (for example memory and CPU) fall back below the threshold, the pods scale down automatically.
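A minimal sketch of the HPA side of this, assuming a Deployment named ml-model-service with CPU requests set; Karpenter, installed separately with its own node pool configuration, would then provision nodes for any pods the HPA creates that cannot be scheduled:

```bash
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-service-hpa        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-service          # the ML workload's Deployment
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out when average CPU crosses 70%
EOF
```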
Explain your approach to optimizing a Kubernetes cluster for deploying computer vision models developed in Python. I'll be honest on this one: I have never worked on computer vision models specifically, but based on similar applications I have worked on, I would start with the minimum set of tools the Kubernetes cluster needs to orchestrate the application hosting. Assuming I'm on an EKS cluster, I would use Karpenter to dynamically manage the worker nodes that host the applications. In parallel, I would use Argo CD to deploy and manage the application configurations from GitOps events. I would also put an ingress in place; if the service has to be exposed to external users, I would take the ALB ingress controller by default (there are others as well, but I would go with ALB). With those pieces in place, I would declare node taints and tolerations on the Karpenter node pools so that specific workloads run only on specific nodes. That way we know exactly what kind of instance types are being provisioned, which ultimately feeds into cost optimization as well.
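A minimal sketch of the Argo CD piece of this setup, assuming Argo CD is installed in the argocd namespace; the repository URL, path, and names are placeholders:

```bash
kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cv-model-service            # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/cv-model-manifests.git  # placeholder repo
    targetRevision: main
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: cv-model
  syncPolicy:
    automated:
      prune: true                   # keep the cluster in sync with Git
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
```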
Discuss your experience with setting up distributed ML inference on a platform like AWS SageMaker or a Kubernetes-based solution. I have not worked on AWS SageMaker for ML inference.