
Have worked on and developed microservice architectures, orchestrated containerized applications on Google Kubernetes Engine, and managed resources on the AWS, GCP, and Azure cloud platforms. Worked with various DevOps tools to optimize development and operations processes across teams.
Developed Python scripts to automate the image creation process using Packer. Maintained and monitored cloud infrastructure using Infrastructure as Code (Terraform and CloudFormation templates).
Automated application-specific tasks and cloud resource updates using PowerShell and shell scripting.
Senior DevOps engineer, Velotio Technologies
Senior Software Consultant L2, NashTech
Software Consultant, NashTech
DevOps Engineer, Virtusa
DevOps Engineer | Cloud, BYJU'S
Software Consultant, NashTech
DevOps Engineer, Bridgelabz
ADO

GitHub Actions

CloudWatch

DataDog

Liquibase

Sysdig

TeamCity
Docker

Kubernetes

Python

Bash

Terraform

Ansible

CloudFormation

Prometheus
Grafana

SonarQube
Hello, I'm Sachin. I'm a senior DevOps engineer at Nasdaq. I have multi-cloud expertise in implementing microservice architectures and infrastructure orchestration for hosting applications and services. I also have expertise in implementing DevOps CI/CD pipelines and automating end-to-end solutions for application build, testing, and release. I have scripting skills, mainly in Linux shell and Python, and experience managing workloads to automate tasks. That's about me.
How do you minimize downtime during a rolling update on an AWS EC2-based application? Let's assume the application is hosted directly on EC2 instances and exposed through a load balancer. I would create a new AMI in parallel for the new version and register the instances running it in a second target group on the same load balancer. For the rollout, I would point the listener at the new target group and then remove the old target group from the listener. That way, you can minimize a lot of downtime. You also keep the AMIs of both the current and the previous version of the application as a backup.
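The target group switch described above could be scripted roughly as follows. This is a minimal sketch, assuming an Application Load Balancer and a boto3 `elbv2` client passed in by the caller; the ARNs, the step weights, and the function names are hypothetical, and a real rollout would check target health between steps.

```python
def forward_action(old_tg_arn, new_tg_arn, new_weight):
    """Build a listener 'forward' action splitting traffic between two target groups."""
    if not 0 <= new_weight <= 100:
        raise ValueError("new_weight must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": old_tg_arn, "Weight": 100 - new_weight},
                {"TargetGroupArn": new_tg_arn, "Weight": new_weight},
            ]
        },
    }]


def shift_traffic(elbv2_client, listener_arn, old_tg_arn, new_tg_arn,
                  steps=(10, 50, 100)):
    """Move traffic to the new target group in stages (step values are an assumption)."""
    for weight in steps:
        elbv2_client.modify_listener(
            ListenerArn=listener_arn,
            DefaultActions=forward_action(old_tg_arn, new_tg_arn, weight),
        )
        # A real pipeline would wait here and verify target health before continuing.
```

With boto3 installed, the client would come from `boto3.client("elbv2")`. Rolling back is the same call with the weights reversed, which is why keeping the old target group registered until the new version is verified is useful.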
What efficient method would you use to deploy a Lambda function update using a CI/CD pipeline in AWS? Let's assume I have a Lambda function deployed on the Python 3.9 runtime. Because direct upload becomes impractical once the function's zip file is larger than about 10 MB, I would usually put the function package in an S3 bucket. For the CI/CD scripting, we can use any platform: Jenkins, GitHub Actions, or AWS CodePipeline itself, any of which can use the AWS CLI to update the Lambda function code. The pipeline would reference the specific S3 URI of the zip file when updating the specified Lambda function. That way, it is easier to manage the entire application deployment through CI/CD, as well as the Lambda function versions and the release version.
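As a rough illustration of the S3-based update step, a pipeline job might run something like the sketch below. It assumes a boto3 Lambda client is passed in; the function name, bucket, and key in the usage note are hypothetical placeholders.

```python
def deploy_lambda_from_s3(lambda_client, function_name, bucket, key, publish=True):
    """Point an existing Lambda function at a zip package stored in S3.

    With publish=True, Lambda also publishes a new numbered version,
    which the pipeline can record alongside the release.
    """
    response = lambda_client.update_function_code(
        FunctionName=function_name,
        S3Bucket=bucket,
        S3Key=key,
        Publish=publish,
    )
    return response["Version"]
```

In a pipeline with boto3 available, this could be called as `deploy_lambda_from_s3(boto3.client("lambda"), "my-function", "my-artifacts-bucket", "builds/my-function.zip")`, after an earlier stage has uploaded the zip to that key.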
Walk through the process to convert a monolithic application architecture into microservices on Kubernetes. Assume that I have a monolithic application with a front-end proxy server that proxies all incoming requests to the main application server. The application server is connected to the database, so I'm assuming it's a three-tier architecture. First of all, I would set up the database either directly on Kubernetes or on an RDS instance. Assuming it's on Kubernetes, I would deploy the database as a StatefulSet in the backend. The application I would deploy as a Deployment that connects to the backend services, with a separate manifest file for configuring the application itself. To expose the application to end users, I would preferably use an ingress controller. There are plenty of ingress controllers out there, for example the ALB ingress controller, the Traefik ingress controller, and the NGINX ingress controller. I would deploy one of these alongside an Ingress resource, which defines how to route requests to the application, including the path context used to segregate requests and select the service to forward to. That way, a microservice architecture can be implemented.
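The Ingress resource mentioned above can be sketched as a generated manifest. This is only an illustration: the service names, ports, paths, and ingress class are hypothetical, and the output is JSON, which `kubectl apply -f` accepts in the same way as YAML.

```python
import json


def ingress_manifest(name, ingress_class, routes):
    """Build a networking.k8s.io/v1 Ingress that routes path prefixes to services.

    routes maps a path prefix to a (service_name, port) pair.
    """
    paths = [
        {
            "path": path,
            "pathType": "Prefix",
            "backend": {"service": {"name": service, "port": {"number": port}}},
        }
        for path, (service, port) in routes.items()
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            "ingressClassName": ingress_class,
            "rules": [{"http": {"paths": paths}}],
        },
    }


# Hypothetical split of the former monolith into an API and a front end.
manifest = ingress_manifest(
    "shop", "nginx",
    {"/api": ("api-service", 8080), "/": ("frontend-service", 80)},
)
print(json.dumps(manifest, indent=2))
```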
Automating security patches for a Linux-based Kubernetes cluster involves several steps. On the Kubernetes application layer, I consider the API versions used to deploy or create objects in the cluster. When it comes to Kubernetes and cluster updates, I consider the major changes and the application version. I take into account which resource versions are deprecated and use tools that manage the security context of the Kubernetes cluster. There are several such tools available from the CNCF and Datadog, and I'm experienced with using them. I check the cluster's vulnerability state and first consider the changes to the application, if it's affected. I test the changes in a lower environment before applying the security patches in production. I update the scripts that capture the vulnerabilities addressed by the security patches and the context of the cluster. Then I apply the changes through security scripts or directly on the cluster controller, using tools such as cURL.
Outline a strategy for managing state in Terraform across multiple environments. For managing the Terraform state, first of all, I would use a remote backend. I'm assuming each environment lives in a different account. For example, in AWS, let's say the dev environment is dedicated to one AWS account, the stage environment to another, and the production environment to a third. I would configure the remote state resources in each environment by creating a DynamoDB table and an S3 bucket in that specific account. I would also use a Terraform backend configuration specific to each of the dev, stage, and prod environments. That way, we isolate the state of each environment, which is provisioned in its own account with resources inside that account.
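One way to picture the per-environment backend configuration is a small generator that writes one partial backend file per environment. This is a sketch under assumptions: the bucket and table naming scheme, the state key, and the region are hypothetical; only the S3 backend argument names (`bucket`, `key`, `region`, `dynamodb_table`, `encrypt`) come from Terraform itself.

```python
ENVIRONMENTS = ("dev", "stage", "prod")


def render_value(value):
    """Render a Python value as an HCL literal (booleans unquoted, strings quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return f'"{value}"'


def backend_config(env, org="example-org", region="us-east-1"):
    """Return the contents of a partial S3 backend config file for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env}")
    settings = {
        "bucket": f"{org}-{env}-tfstate",          # state bucket in that env's account
        "key": "infrastructure/terraform.tfstate",
        "region": region,
        "dynamodb_table": f"{org}-{env}-tf-locks",  # lock table in that env's account
        "encrypt": True,
    }
    return "\n".join(f"{k} = {render_value(v)}" for k, v in settings.items())


for env in ENVIRONMENTS:
    print(f"# {env}.s3.tfbackend")
    print(backend_config(env))
    print()
```

Each file would then be selected at init time, for example `terraform init -backend-config=dev.s3.tfbackend`, run with credentials for the matching account.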
Assume you are reviewing a Terraform module for deploying an EC2 instance and notice the following block. Can you point out any potential risk here? The block has an aws_instance resource with an AMI, a t2.micro instance type, a deployment key, a security group list, and tags, plus a variable for the deployment key (the SSH key used for the instance) of type string with a default. Okay, so if I treat this as a Terraform module, the values for the AMI, the instance type, the security group, and the tags are hard-coded. It cannot be reused if I want to pass custom values, for example a different AMI, or to change the instance type from t2 to t3. That is a restriction. It is specific to one set of configuration that can be deployed, so it cannot act as a module. If you ask me for the changes, I would parameterize all the fields in the aws_instance resource block. For example, I would create variables for the AMI ID, the instance type, the security group ID, and the tags as well, so that everything is variable and parameterized. Then I only need to source this module from my main Terraform configuration and declare the variable values. That way, we can minimize this risk.
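The parameterized version could look roughly like the sketch below. Since the original block is not shown, everything here is an assumption: the variable names, the resource name `this`, and the default instance type are hypothetical. The sketch emits Terraform's JSON syntax (a `main.tf.json` file) rather than HCL so that it can be generated and checked from Python.

```python
import json


def ec2_module():
    """Build a Terraform JSON module where every aws_instance field is a variable."""
    variables = {
        "ami_id": {"type": "string", "description": "AMI to launch"},
        "instance_type": {"type": "string", "default": "t2.micro"},
        "security_group_ids": {"type": "list(string)"},
        "key_name": {"type": "string", "description": "SSH key pair name"},
        "tags": {"type": "map(string)", "default": {}},
    }
    instance = {
        "ami": "${var.ami_id}",
        "instance_type": "${var.instance_type}",
        "vpc_security_group_ids": "${var.security_group_ids}",
        "key_name": "${var.key_name}",
        "tags": "${var.tags}",
    }
    return {"variable": variables, "resource": {"aws_instance": {"this": instance}}}


print(json.dumps(ec2_module(), indent=2))
```

A calling configuration would then source the module and set `ami_id`, `instance_type`, and the other variables per environment instead of editing the module itself.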
Examine this partial Kubernetes deployment YAML where an ML model service is defined. Identify what crucial details are missing that are necessary for a distributed ML inference setup. Assuming it's a Kubernetes Deployment with 3 replicas, and the selectors, template, and containers are defined, one crucial detail missing is the pod tolerations, which declare which nodes can be picked. By default, the pods are scheduled onto whichever nodes their taints and tolerations allow. Another crucial detail missing is the command and arguments used to start the containers. Additionally, the image pull policy is not set, so it takes the default. For a distributed ML model service, secrets management and volume mounts are not declared either. I would say it's a bare-minimum deployment manifest. However, given more details, such as what interface and what volumes are to be added, whether an EBS volume is needed, whether file systems are supposed to be mounted, and any security keys that need to be added to the system, I can provide the complete deployment manifest.
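To make the missing pieces concrete, here is a sketch of a Deployment with those fields filled in. The original YAML is not shown, so every name below (image, labels, secret, claim, taint key, command) is a hypothetical placeholder; only the field names come from the Kubernetes API.

```python
def ml_inference_deployment(image, replicas=3):
    """Build an apps/v1 Deployment including the fields the answer lists as missing."""
    labels = {"app": "ml-model-service"}
    container = {
        "name": "model-server",
        "image": image,
        "imagePullPolicy": "IfNotPresent",            # explicit instead of defaulted
        "command": ["python", "serve.py"],            # container start command
        "args": ["--model-dir", "/models"],
        "env": [{
            "name": "API_KEY",                        # secret injected as an env var
            "valueFrom": {"secretKeyRef": {"name": "model-secrets", "key": "api-key"}},
        }],
        "volumeMounts": [{"name": "model-store", "mountPath": "/models"}],
    }
    pod_spec = {
        # Allows scheduling onto nodes tainted for ML workloads.
        "tolerations": [{"key": "workload", "operator": "Equal",
                         "value": "ml", "effect": "NoSchedule"}],
        "containers": [container],
        "volumes": [{"name": "model-store",
                     "persistentVolumeClaim": {"claimName": "model-store-pvc"}}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "ml-model-service"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }
```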
How would you design a system to auto scale containerized machine learning workloads in a hybrid cloud setup using Kubernetes? Okay, to do this, I would install a custom events manager on the Kubernetes layer. I would declare a Horizontal Pod Autoscaler (HPA), which would monitor resource consumption, such as CPU, and trigger horizontal pod scaling when a threshold is crossed. For example, if CPU usage exceeds a certain limit, the HPA would determine the desired number of instances, within the minimum and maximum that I declare. In this context, I would install Karpenter into the Kubernetes cluster, as it is well suited to dynamic workload management, especially provisioning the desired node type and node configuration for hosting the workload. Assuming the load increases or parallel jobs need to be managed for the ML application, a pod scaling event would occur, which Karpenter would pick up. It would then provision a node specifically to schedule those pods, which is a dynamic process. With this approach, I could achieve auto scaling. Additionally, I would define scale-up and scale-down policies based on metrics such as memory and CPU usage. If usage drops below the threshold, the pods scale down automatically.
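The scaling decision the HPA makes can be illustrated with the formula from the Kubernetes documentation, desired = ceil(current replicas × current metric / target metric), clamped to the declared minimum and maximum. The sketch below is a simplified model of that rule only; the function name and the example numbers are illustrative, and the real controller also handles pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale by the ratio of observed to target metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 90% CPU against a 60% target scale up to 5.
print(desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10))
# 4 pods averaging 30% CPU against a 60% target scale down to 2.
print(desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10))
```

When the resulting pods cannot be placed on existing nodes, they stay pending, and that is the signal a node provisioner such as Karpenter reacts to.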
I will explain my approach to optimizing a Kubernetes cluster for deploying computer vision models developed in Python. Based on my experience with similar applications, I would start with the minimum set of tools the Kubernetes cluster needs to orchestrate application hosting. The tools I would recommend include the Cluster Autoscaler, assuming I'm on an EKS cluster. I would use the Cluster Autoscaler to dynamically manage the worker nodes that host the applications. In parallel, I would use Argo CD to deploy and manage my application deployments from GitHub flow events. Alongside that, I would set up ingress. If the service needs to be exposed to external users, I would use the ALB ingress controller by default. There are others as well, but I would go with the ALB ingress controller. With this approach, I would keep things in place. I would declare node tolerations and node taints on the Kubernetes node group layer, specifically to pin specific workloads to specific nodes. This way, we know exactly what kind of instance type has been provisioned, which ultimately helps with cost optimization as well.
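How taints and tolerations pin a workload to a node group can be modelled in a few lines. This is a simplified sketch of the matching rules from the Kubernetes documentation, not scheduler code; the taint key `workload=vision` and the function names are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False  # an empty effect on the toleration matches any effect
    key = toleration.get("key", "")
    if toleration.get("operator", "Equal") == "Exists":
        return key == "" or key == taint["key"]  # empty key + Exists matches all
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations, node_taints):
    """A pod fits if every hard taint on the node is tolerated."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations) for taint in hard)


gpu_node = [{"key": "workload", "value": "vision", "effect": "NoSchedule"}]
vision_pod = [{"key": "workload", "operator": "Equal",
               "value": "vision", "effect": "NoSchedule"}]

print(can_schedule(vision_pod, gpu_node))  # the vision pod tolerates the taint
print(can_schedule([], gpu_node))          # an ordinary pod is kept off the node
```

Taints only keep other pods off the node group; to also force the vision pods onto it, the pod spec would typically add a node selector or node affinity for that group.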
Discuss your experience with setting up distributed ML inference on a platform like AWS SageMaker or a Kubernetes-based solution. I have not worked with AWS SageMaker for ML inference.