A DevOps/Site Reliability Engineer with hands-on experience automating, optimizing, and supporting mission-critical deployments, leveraging automation, IaC, and Kubernetes, and implementing effective cloud solutions using CI/CD and DevOps processes.
Site Reliability Engineer — InfraCloud Technologies
DevOps/Technical Consultant — OneTrust
Associate DevOps/Cloud Support Engineer — Accenture
AWS
Azure
GCP
Azure DevOps
ArgoCD
Docker
K8s
Terraform
Prometheus
Grafana
Bash
Kyverno
Kustomize
Hi, all. I'm Kostu, and I've been working as a DevOps and Site Reliability Engineer for almost five to six years now. I've been part of three companies, working on both the services side and the product side of things. My main experience is in DevOps, and I have predominantly worked on Kubernetes and the managed Kubernetes stacks on different clouds: AKS on Azure, EKS on AWS, and GKE on GCP. Along with that, I've used various cloud provider services and integrated them with those managed Kubernetes services. I have also worked on infrastructure as code across different clouds with Terraform, which I've used for the last four to five years. I've worked more with shell scripting than Python; I haven't had much of a chance to use Python. Apart from that, I've worked with Kustomize, Helm, and different Kubernetes tooling, like Argo CD for GitOps. I've also used CI/CD — GitLab CI/CD and Azure DevOps — and I've integrated Azure services with Azure DevOps and used Terraform alongside that in different stacks. I have also worked on the customer side of things: at one organization I worked with customers from different geographical locations, helping them set up our product, which is deployed on Kubernetes, on their infrastructure. So that's the professional side. I'm from Karnataka, India, and right now I'm working remotely for an organization as a Site Reliability Engineer. My hobbies include playing different musical instruments, working out, going for a jog, going for a bike ride, and so on.
Okay, so when it comes to the AWS development kit — AWS CDK — I haven't had a chance to use it for infrastructure as code. I have used Terraform and a little bit of CloudFormation as well. Let me explain how infrastructure as code works with a use case like network provisioning. Basically, you set up your account, and then with Terraform you first configure the backend storage where you want to store your state file for any resources you create, such as network resources. The first and foremost thing is to create a VPC, and then you create the different subnets you require. Once you write the Terraform code for the VPC and subnets, you can place resources like RDS in private subnets, and put VMs that need to be externally accessible in public subnets. Then you write modules for the VPC, RDS, and subnets, or you can use reusable modules that somebody else has written. Then you run the commands: terraform init, terraform validate, terraform plan, and then terraform apply if everything goes well. Using Terraform you can also do more advanced network provisioning: you can create different VPCs, keep your resources in separate VPCs, and connect them using VPC peering. And if there are a lot of VPCs involved in your account, you can use a transit gateway as well. But no, I haven't had a chance to use AWS CDK.
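As a rough sketch, the flow described above might look like this in Terraform. This is illustrative only: the bucket name, region, and CIDR ranges are assumptions, not values from any real project.

```hcl
# Remote backend: store the state file in S3 (bucket name is a placeholder)
terraform {
  backend "s3" {
    bucket = "example-tf-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}

# VPC with one public and one private subnet (CIDRs are examples)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  # VMs placed here can be made externally accessible
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  # RDS and other internal resources would live here
  cidr_block = "10.0.2.0/24"
}
```

The usual workflow is then `terraform init`, `terraform validate`, `terraform plan`, and finally `terraform apply`.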
Yeah. So if you use Python as your programming language, the first thing to do is write a Dockerfile. You keep the Dockerfile in one place and then use CI/CD to build it into an image and push it to Docker Hub or a Docker registry — any kind of registry. When you're on AWS, you can push it into ECR; you can also use another one called Harbor, which we use in our project here. So you write your code in Python, then write a Dockerfile: you use a base image of your choice, copy the code in, set the CMD, and expose any ports you need. You can also write a multi-stage Dockerfile. Then you write a CI/CD pipeline using, say, AWS CodePipeline, with different stages: you build the image, then run a Snyk scan, which scans the image for vulnerabilities, and then do a docker push, which pushes the image into ECR. You can keep the ECR-related variables in CodePipeline or in whatever other pipeline you use, such as GitLab. Separately, you create an EKS cluster where you want to run this Docker image as pods in Kubernetes — a public-facing cluster, or for production use cases a private cluster. Once that cluster is created, you can write Kustomize configurations that can be used for the different development and production environments.
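A minimal multi-stage Dockerfile for a Python app, along the lines described above, might look like this. The port, entrypoint file, and requirements file are assumptions for the sketch.

```dockerfile
# Stage 1: install dependencies into a virtualenv
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /venv && /venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: slim runtime image containing only the app and its installed deps
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /venv /venv
COPY . .
# Assumed application port and entrypoint
EXPOSE 8000
CMD ["/venv/bin/python", "app.py"]
```

The multi-stage split keeps build tooling out of the final image, which shrinks it and reduces the surface a Snyk scan has to flag.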
Again, when you're creating different clusters using Terraform for development and production, you can use different variable files, changing just variables like the names of the clusters you're using for development and production. And in order to deploy to the different clusters, you can use Argo CD here — you can use Kustomize together with Argo CD as well. What happens when you're using Kustomize is that there's a concept of base and overlays. The base holds the manifest files that are going to be common across environments, and the overlays hold the environment-specific changes; the code you put in an overlay is layered on top of the common base, so you get separate configurations for development and for production. So yeah, I think that's how I'd explain it.
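The base/overlays idea can be sketched with a layout like `app/base/` (shared `deployment.yaml` plus its `kustomization.yaml`) and `app/overlays/dev/`, `app/overlays/prod/`. A production overlay might look like this — the Deployment name and replica count are illustrative assumptions:

```yaml
# app/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  # Bump replicas for production only; the base keeps the shared defaults
  - patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
    target:
      kind: Deployment
      name: my-app        # assumed name of the base Deployment
```

You can render it with `kubectl kustomize app/overlays/prod` (or `kubectl apply -k`), or point an Argo CD Application at the overlay directory so each cluster tracks its own overlay.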
This one, I'm not very sure about automating the security patching itself. But basically, when you're doing any kind of upgrade to your Kubernetes cluster, the best way to do it is a blue-green style rollout where only part of the node pool is touched at a time. Let's say there are two Linux nodes running in your cluster. What you do is first create another node, then cordon and drain one of the existing nodes; when you drain the pods running on that node, they get scheduled onto the new node. Then you do the security patching on that first drained node. Once that's done, you go to the second node, cordon and drain it the same way — its pods get scheduled onto the other nodes that are running — and patch it too. You repeat this node by node, and at the end you can delete the extra node you added. This is how you can do it without any downtime on your Kubernetes cluster. Also, when you're using a managed cluster, I think security patching of the managed side can be done by AWS itself. But otherwise, for upgrades or security patching, you can use blue-green or canary style rollouts like the example I described.
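The per-node steps above map onto kubectl commands roughly like this; the node name is a placeholder.

```shell
# Mark the node unschedulable so no new pods land on it
kubectl cordon node-1

# Evict its pods; they get rescheduled onto the remaining schedulable nodes
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# ... apply OS security patches on node-1, rebooting if required ...

# Make the patched node schedulable again, then repeat for the next node
kubectl uncordon node-1
```

`--ignore-daemonsets` is needed because DaemonSet pods are recreated on the node anyway, and `--delete-emptydir-data` acknowledges that emptyDir volumes are lost during eviction.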
What are you including? A variable file. Yeah, a variable file where you can use the same names across all the different multi-cloud infrastructures, or a locals file, or any prefix or suffix that you want to add to your resource names. Basically, any kind of variable file will help you here. Beyond that, I'd say a variable file with the names of the clusters you want to create, or the counts you want for the different components you're running on the different clouds. I think that should help.
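One way to sketch this is a shared `variables.tf` with one `.tfvars` file per cloud; all names and values below are illustrative assumptions.

```hcl
# variables.tf — definitions shared across clouds
variable "cluster_name" {
  type = string
}

variable "name_prefix" {
  type    = string
  default = ""
}

variable "node_count" {
  type    = number
  default = 3
}

# aws.tfvars (example per-cloud values):
#   cluster_name = "platform-aws"
#   node_count   = 3
#
# azure.tfvars:
#   cluster_name = "platform-azure"
#   node_count   = 2
```

Each cloud's run then selects its file, e.g. `terraform apply -var-file=aws.tfvars`, so the code stays identical and only the variable file changes.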
Sorry, I've never heard about this one. What is it — Terraform configuration, how does it impact the capability of the... sorry, I haven't heard about this.
Yeah. So here, you're using the key directly; you're not using any Terraform resource block to create the key. Ideally, SSH keys and the like should be created using a Terraform resource block, or you can create them on your command line and then reference them here. But here you are directly hard-coding the key value that can be used to log in to your AWS instance, and that's a security risk — the variable's default value is the literal key, hard-coded in the code. It's also stored in your state file, so if the state file is breached, everything can be read directly and used to breach your EC2 instance. So yeah, we need to use a Terraform resource block, or create the SSH keys on your machine, and the value should be encrypted rather than sitting in the code. Hard-coding it like this is not the way.
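A sketch of the resource-block approach mentioned above, using the Terraform `tls` and AWS providers; the AMI, key name, and instance type are placeholders.

```hcl
# Generate the key pair in Terraform instead of hard-coding a key value
resource "tls_private_key" "ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# Register only the public half with AWS
resource "aws_key_pair" "deployer" {
  key_name   = "deployer-key"
  public_key = tls_private_key.ssh.public_key_openssh
}

resource "aws_instance" "web" {
  ami           = "ami-12345678"   # placeholder AMI ID
  instance_type = "t3.micro"
  key_name      = aws_key_pair.deployer.key_name
}
```

Note the caveat from the answer still applies: the generated private key lands in the state file, so the state backend must be encrypted and access-controlled — or generate the key pair outside Terraform and pass in only the public key.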
Here's a snippet from the build-docker-image step: it's tagging the image as latest. You shouldn't use latest as the tag here — it's a mutable tag, so you can't tell which build is actually deployed or roll back reliably — and that's not a good practice. An immutable tag, like the commit SHA or a version number, should be used instead.
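The fix could look something like this in the build step; the ECR registry path is a made-up placeholder.

```shell
# Tag the image with the commit SHA instead of "latest"
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app"   # placeholder registry
TAG="$(git rev-parse --short HEAD)"

docker build -t "$IMAGE:$TAG" .
docker push "$IMAGE:$TAG"
```

Because the tag is derived from the commit, every push is uniquely identifiable and a rollback is just a redeploy of an earlier tag.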
So, yeah, basically you can have three tiers: one for your database, one for your application tier, and one for the front-end tier — front end, back end, and database. To ensure high availability, we can deploy these across multiple availability zones in both AWS and Azure, routing between them with something like Azure Traffic Manager, and we can also use a content delivery network to keep latency low.
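One small piece of this, the multi-AZ spread on the AWS side, could be sketched in Terraform like so — assuming a VPC resource named `aws_vpc.main` exists elsewhere; the zones and CIDRs are examples:

```hcl
# One application-tier subnet per availability zone
resource "aws_subnet" "app" {
  for_each = {
    "us-east-1a" = "10.0.10.0/24"
    "us-east-1b" = "10.0.11.0/24"
    "us-east-1c" = "10.0.12.0/24"
  }
  vpc_id            = aws_vpc.main.id   # assumed VPC defined elsewhere
  availability_zone = each.key
  cidr_block        = each.value
}
```

A load balancer in front of these subnets then keeps the tier serving traffic even if one zone goes down.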
I'm not very sure about this.