Vetted Talent

Abhimanyu Prajapati

Experienced DevOps Engineer with 6 years of expertise in architecting, automating, and optimizing large-scale, mission-critical deployments. Proficient in driving end-to-end DevOps processes, including advanced configuration management and CI/CD pipelines, to enhance system reliability, scalability, and performance.

  • Role

    Senior DevOps Engineer

  • Years of Experience

    7 years

Skillsets

  • Version Control
  • CI/CD
  • Kubernetes
  • Linux
  • AWS
  • Monitoring
  • Deployment
  • Google Cloud
  • MLOps
  • Terraform
  • Shell Scripting
  • Python
  • Automation
  • Security Compliance
  • Scripting
  • Logging
  • Infrastructure as Code
  • Databases
  • Container Orchestration
  • Configuration Management
  • Collaboration Tools
  • Cloud Platforms
  • Build Tools

Vetted For

15 Skills
  • Senior Software Engineer, DevOps (AI Screening)
  • 77%
  • Skills assessed: Infrastructure as Code, Terraform, AWS, Azure, Docker, Kubernetes, Embedded Linux, Python, AWS (SageMaker), GCP Vertex, Google Cloud, Kubeflow, ML architectures and lifecycle, Pulumi, Seldon
  • Score: 69/90

Professional Summary

7 Years
  • Feb, 2025 - Present 1 yr 3 months

    Senior Software Engineer Platform

    Sanas
  • Dec, 2023 - Feb, 2025 1 yr 2 months

    Senior DevOps Engineer

    Zoomcar
  • Mar, 2023 - Dec, 2023 9 months

    Senior DevOps Engineer

    Observe.AI
  • Jan, 2022 - Mar, 2023 1 yr 2 months

    DevOps Engineer

    Observe.AI
  • Jul, 2021 - Dec, 2021 5 months

    DevOps Engineer

    SailPoint
  • Feb, 2019 - Jun, 2021 2 yr 4 months

    DevOps Engineer

    TO THE NEW

Applications & Tools Known

  • Harness
  • AWS
  • SageMaker
  • Terraform
  • Jenkins
  • Okta
  • GitHub
  • Bitbucket
  • ArgoCD
  • Chef
  • Ansible
  • Docker
  • Kubernetes
  • AWS ECS
  • DynamoDB
  • Elasticsearch
  • Logstash
  • Kibana
  • Loggly
  • Grafana
  • Prometheus
  • CloudWatch
  • ELK Stack
  • Amazon EKS
  • GKE
  • New Relic
  • Fluentd
  • MySQL
  • MongoDB
  • Bash
  • Python
  • Slack
  • Jira
  • GitOps
  • Terragrunt
  • Helm
  • Kustomize
  • GCP
  • Azure

Work History

7 Years

Senior Software Engineer Platform

Sanas
Feb, 2025 - Present 1 yr 3 months
    Set up high-availability RKE2 cluster, migrated ML training to on-prem infrastructure, integrated scalable ML pipelines, implemented CI/CD pipelines, and enabled secure rollouts via Spinnaker.

Senior DevOps Engineer

Zoomcar
Dec, 2023 - Feb, 2025 1 yr 2 months
    Optimized application monitoring, reduced compute costs using AWS Graviton, migrated EKS clusters, and supported tool adoption.

Senior DevOps Engineer

Observe.AI
Mar, 2023 - Dec, 2023 9 months

DevOps Engineer

Observe.AI
Jan, 2022 - Mar, 2023 1 yr 2 months
    Automated deployment lifecycle, achieved compliance standards, configured SSO for AWS users, and implemented autoscaling solutions in EKS.

DevOps Engineer

SailPoint
Jul, 2021 - Dec, 2021 5 months
    Developed Terraform modules, managed Kubernetes infrastructure, and led CI/CD processes using Jenkins and ArgoCD.

DevOps Engineer

TO THE NEW
Feb, 2019 - Jun, 2021 2 yr 4 months
    Automated infrastructure provisioning, led cross-region cloud migration, implemented disaster recovery strategies, and managed cost optimization techniques.

Achievements

  • Implemented multi-container deployment for ML-based models
  • Created an automated pipeline in Harness
  • Automation of SageMaker pipeline with MLOps
  • Infrastructure hardening for compliance certificates
  • Set up SSO for AWS users
  • Implemented SigNoz for APM
  • Event-based auto-scaling using KEDA in EKS
  • Graviton instances setup for cost reduction
  • Karpenter setup for node autoscaling
  • Led a cost-saving initiative
  • Spearheaded the implementation of SigNoz for APM, delivering detailed performance insights and strategic improvements, resulting in a 30% reduction in New Relic costs
  • Strategically configured Graviton Instances in AWS, achieving a 30% reduction in compute costs through decisive planning
  • Directed a cost-saving initiative that slashed operational expenses by 50%, ensuring high-quality outcomes from inception to execution
  • Led a comprehensive AWS EKS migration from version 1.21 to 1.28, orchestrating a seamless transition and enhancing cluster performance significantly, successfully eliminating AWS extended support costs while introducing enhanced new features
  • Evaluated Builder.ai, Cast.ai, and Redis Enterprise, playing a pivotal role in the selection and integration of these tools
  • Acted as a key SME for the cloud team, providing support and contributing to critical initiatives
  • Managed Helm charts and utilized Kustomize for scalable provisioning of EKS clusters, ensuring consistency and performance
  • Automated infrastructure provisioning and configuration management using Chef, streamlining operations and enhancing reliability
  • Led the migration of production infrastructure to a cross-region cloud platform, improving redundancy and efficiency, and implemented Disaster Recovery strategies to reduce Recovery Time Objective (RTO) and Recovery Point Objective (RPO), ensuring business continuity
  • Orchestrated the automation of the multi-cluster deployment lifecycle through Harness, enhancing efficiency and reliability, and cutting deployment time by 50%
  • Achieved SOC-2, PCI-DSS, and ISO compliance by implementing access controls, encryption, continuous monitoring, and anomaly detection, reducing vulnerabilities by 60%
  • Engineered the configuration of SSO for AWS users via Okta, resulting in a 70% reduction in support tickets and a 50% improvement in user management efficiency
  • Implemented KEDA and Karpenter for autoscaling in EKS, optimizing resource usage and reducing costs by 30%
  • Automated SageMaker pipelines, boosting operational efficiency by 50% and decreasing processing time by 40% using MLOps best practices
  • Spearheaded the upgrade of the AWS EKS cluster from version 1.23 to 1.28 and adopted Istio for blue-green deployments, reducing deployment time by 80%.
  • Orchestrated a migration from AWS to GCP, cutting operational costs by 30%.
  • Implemented SigNoz for APM, slashing New Relic costs by 50%.
  • Transitioned to microservices architecture and Graviton processors, cutting operational expenses by 50%.
  • Led the implementation of KEDA and Karpenter, reducing operational costs by 30%.
  • Automated SageMaker pipelines, decreasing processing time by 40% and boosting operational efficiency by 50%.
  • Achieved compliance, reducing vulnerabilities by 60%.
  • Optimized SSO configuration for AWS users via Okta, reducing support tickets by 70%.

Major Projects

3 Projects

Enhanced Performance Monitoring

Dec, 2023 - Present 2 yr 5 months
    Spearheaded the implementation of SigNoz for APM, delivering detailed performance insights and strategic improvements, resulting in a 30% reduction in New Relic costs.

Cost Optimization with Graviton Instances

Dec, 2023 - Present 2 yr 5 months
    Strategically configured Graviton Instances in AWS, achieving a 30% reduction in compute costs through decisive planning.

AWS EKS Migration

Dec, 2023 - Present 2 yr 5 months
    Led a comprehensive AWS EKS migration from version 1.21 to 1.28, orchestrating a seamless transition and enhancing cluster performance significantly.

Education

  • BTech in Computer Science with specialisation in Cloud Computing and Virtualization Technology

    University of Petroleum and Energy Studies, Dehradun (2019)

Interests

  • Travelling
  • Cricket
  • Biking

AI-interview Questions & Answers

    Okay, so my name is Abhimanyu, and I'm working as a senior DevOps engineer at Zoomcar, and I have 6 years of experience. I started my journey at TO THE NEW, where I worked on a project named Nykaa. I was working on AWS, Terraform, and the ELK stack. We did the DR exercise as well. For monitoring, we used Grafana, InfluxDB, and Telegraf. I have also worked on Chef. Mostly in that project, I was handling the e-commerce website. I was handling the infrastructure part, and all our services were on ECS. For CI/CD, we were using Jenkins. So the CI/CD part, the infrastructure part, monitoring and logging, and configuration management using Chef: these are the tools I worked on in my initial years. Later on, I moved to SailPoint, where I was working as an infrastructure engineer, and I majorly worked on Kubernetes and Terraform. From there, I moved on to Observe.AI. At Observe.AI, I worked on Loggly, which was a new tool for me. I created multi-cluster, multi-account automated pipelines that promote from dev to QA and then to production with all the steps automated, so we didn't have to manually go and change anything. I also worked extensively on Kubernetes and on scaling: I implemented Karpenter and KEDA. Then I moved on to Zoomcar. At Zoomcar, I'm currently working on the AWS to GCP migration and on the EKS upgrade. I have upgraded EKS from 1.23 to 1.29, and I have worked on cost-saving exercises as well. I implemented SigNoz for our APM metrics and distributed tracing. New Relic was there, but it was very cost-ineffective, so we implemented SigNoz. SigNoz uses OpenTelemetry to fetch those APM metrics and give us the distributed tracing.
It's open source, and we implemented it in our prod and non-prod environments, which saved us a lot of money compared with New Relic. So I have done a lot of cost-saving exercises, and I have done DR exercises, as I mentioned. I have majorly worked on AWS and GCP, and I've done a few projects on Azure as well, with Azure DevOps and AKS, mostly around Kubernetes. Overall, this is my experience.

    With an EC2-based application, there is always a challenge when you are trying to do a deployment. The best case would be to do a rolling update. For a rolling update, the basic architecture is: you have a load balancer, then a target group, and below the target group you have EC2 instances running in your auto scaling group. That auto scaling group is managed through a launch template. So, basically, if you're going to deploy your new code, you need to update the launch template. There is a tool called Packer. You run the Packer script, let's say, on your Jenkins instance or some other instance you use for CI/CD. Packer runs all the commands on an EC2 instance and creates an AMI from it. You then put that AMI in your launch template as a new version. Once that version is updated, the auto scaling group will automatically get bigger, and you can do a rolling update. In the rolling update, let's say 3 instances are running. You don't terminate the old 3 instances until the 3 new instances are up and running. It's not like one is coming up and another one is going down. You can manage it in such a way that, if 3 are running, 3 new ones should come up, only then is the traffic moved to the new 3, and then the old 3 are terminated. In this case, if some issues or bugs come in, we can directly move back to the old version: termination will not have taken place yet, so we can directly switch to the previous instances.
See, rollback is a problem when we are dealing with EC2 instances. But overall, I think this strategy is optimal for an EC2 upgrade, and for a rolling update this is the best way. We could also use a blue-green deployment, but that is not asked in the question. So to sum it up, we'll have a launch template, an auto scaling group, and a target group with all the target instances. We'll deploy by updating the launch template. Once the launch template is updated, that will be reflected in the target group, and new instances will come up in a rolling update fashion. Once that is done, we have our deployment with minimum downtime.
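
    The launch-before-terminate ordering described in this answer can be sketched as a small simulation. This is only an illustration, not an AWS API call: the instance names and the `rolling_replace` helper are hypothetical, and health checks are assumed to pass immediately.

```python
from typing import List, Tuple


def rolling_replace(old: List[str], new_version: str,
                    batch_size: int = 1) -> List[Tuple[str, List[str]]]:
    """Simulate replacing instances by launching new ones before terminating old ones.

    Returns a log of (action, in-service instances) steps. The in-service
    count never drops below the size of the original fleet.
    """
    in_service = list(old)
    remaining_old = list(old)
    log: List[Tuple[str, List[str]]] = []
    counter = 0
    while remaining_old:
        batch = remaining_old[:batch_size]
        # Launch replacements first so capacity is never reduced.
        for _ in batch:
            counter += 1
            in_service.append(f"{new_version}-{counter}")
        log.append(("launch", list(in_service)))
        # Terminate old instances only after the new ones are in service.
        for name in batch:
            in_service.remove(name)
            remaining_old.remove(name)
        log.append(("terminate", list(in_service)))
    return log


if __name__ == "__main__":
    for action, fleet in rolling_replace(["v1-a", "v1-b", "v1-c"], "v2", batch_size=3):
        print(action, fleet)
```

    With `batch_size=3` the old fleet stays in service until all three replacements exist, which is the variant the answer prefers because the old instances are still available for a quick switch back.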

    So, okay. Terraform versus CDK. AWS CDK is native to AWS, while Terraform is cloud agnostic. We can have different Terraform configurations, and we can use Terraform in different clouds, such as GCP. In terms of network provisioning, I think because AWS CDK is tightly coupled with the AWS environment, you can directly integrate it with your VPC. Wherever in your VPC you have EC2 instances running, you can directly access those instances. In Terraform, you have to manage a state. So, if you have created a network, such as a VPC, the state will be managed in your S3 bucket or in a local location, wherever you are keeping it. You can then use that VPC ID, subnet ID, and security groups wherever you require. Let's say you want to use that VPC ID while you are creating a load balancer or EC2 instances: you can use it directly. You can dynamically look up the data and use it. In Terraform, you can also have modules, and Terraform can use those modules to spin up many EC2 instances, networks, and VPCs. Whereas CDK, the Cloud Development Kit, is more for beginner-level use: if you are just starting with AWS, AWS CDK will make it easier to deploy your environment. You will have an AWS VPC present, you can create your security groups, and you can create your environment there. If you want to create a big or very complex environment, Terraform helps more than CDK.

    So, the workflow using Docker, Python, and AWS services to create a consistent, repeatable environment for both development and production is as follows. The Python service is the main service; you have to run it in Docker containers using AWS services. The first option is to run Lambda with Docker containers, which was recently introduced by AWS: you can use Docker images to run Lambda. With GitHub Actions or some other workflow tool like Jenkins, you can create a pipeline. You have code or a script in GitHub, and you build a Docker image from a Dockerfile. The Dockerfile will have all the commands to build and start the image. You then push that image to ECR, and AWS Lambda can use the image that is present in ECR. So you can deploy the code directly in Lambda. If it's a script or something that does a particular task, that will work. Otherwise, the second option is to create a workflow that connects to an instance over SSH and, using that same image, deploys a new container and stops the old one. This will be the same in production and non-production; only the branch will change. You will have different branches in GitHub: a development branch and a production branch. If you want to deploy to production, you use the production branch. If you want to deploy to development, you use the development branch. The third option is to use AWS services like ECS. You'll have an ECS service and a task definition. You update the task definition with the image, and you update the service to create the Docker containers in ECS. The ECS clusters are there, and you deploy the service there. So you can directly manage non-production and production.
You create an image, the image is pushed to ECR, from there you update the task definition, and from there you deploy it to the ECS cluster.
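
    The "same image, only the branch changes" idea from this answer can be sketched as a branch-to-environment mapping that produces one ECR image reference per build. The branch names, account ID, region, and repository name below are placeholders chosen for illustration; only the general ECR URI layout (`account.dkr.ecr.region.amazonaws.com/repo:tag`) is taken as given.

```python
# Hypothetical mapping from Git branch to deployment environment.
BRANCH_TO_ENV = {"production": "prod", "development": "dev"}


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Build an ECR image reference in the standard registry/repository:tag form."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"


def deploy_target(branch: str, commit_sha: str) -> dict:
    """Pick the environment from the branch and tag the image with the commit."""
    env = BRANCH_TO_ENV.get(branch)
    if env is None:
        raise ValueError(f"branch {branch!r} is not mapped to an environment")
    image = ecr_image_uri(
        account_id="123456789012",   # placeholder account ID
        region="us-east-1",          # placeholder region
        repo="python-service",       # placeholder repository name
        tag=commit_sha[:12],
    )
    return {"environment": env, "image": image}


if __name__ == "__main__":
    print(deploy_target("development", "0123456789abcdef0123"))
```

    Tagging by commit rather than `latest` is a design choice in this sketch: it lets the Lambda, SSH, and ECS paths all point at exactly the same build.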

    State in Terraform can be managed in multiple ways. By default, we manage state locally. However, local state is not very secure, as it can be corrupted or overwritten. So what we do is manage it in S3 or similar storage. We store our state file in S3, and versioning can be enabled on the bucket, so we have multiple versions of the same state file. If, in the future, the latest state file gets corrupted, we can move back to the old one. We can have DynamoDB locking as well. With locking, no two people can write to the same state file. Let's say I am creating an EC2 instance and my teammate is creating an EC2 instance: if we are both using and updating the same state file, the write operation can be locked. That way, no two people will be able to update the same file at once. On the S3 side, we can enable replication and have policies around the bucket. For example, we can run Terraform from a particular box, or instance, and allow only that instance's role to access the S3 bucket so that nobody else can access it. Or, if you are running it from your local machine, we can manage it using the locking. Across multiple environments, let's say we have prod, development, and QA, we can use workspaces. Different workspaces will have different state files. If you are working on QA, you switch to the QA workspace; working on development, you switch to the development workspace. Each workspace will have a different path for the state file. And more segregation is beneficial.
For example, if you are creating a VPC, EC2 instances, a load balancer, and a target group in one state file, there are many chances that this file will get corrupted, because it's a very big state file. So what we can do is divide the state into multiple files by resource: VPC, EC2, load balancers. That way, each target area gets smaller, and if there are some changes in the VPC, they are easily visible. If you have a very big state file and the state changes in some places, it's very easy to overlook a mistake. Something is always connected to something else. Let's say a load balancer is connected to a target group, and you're changing some path, and because of that, some other thing is being changed. You may not catch it, but if you're managing them individually, it is easier to catch.
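
    The two ideas in this answer, one state file per environment and component plus a lock so only one writer updates a state file at a time, can be modelled in a few lines. This is an in-memory sketch of the concept, not Terraform's actual backend code; the `StateLockTable` class and the `env/component/terraform.tfstate` key layout are assumptions made for illustration.

```python
class LockHeld(Exception):
    """Raised when another writer already holds the lock for a state file."""


class StateLockTable:
    """In-memory stand-in for a lock table keyed by state file path."""

    def __init__(self) -> None:
        self._locks: dict = {}

    def acquire(self, state_key: str, owner: str) -> None:
        # Behaves like a conditional put: succeeds only if no lock item exists.
        if state_key in self._locks:
            raise LockHeld(f"{state_key} is locked by {self._locks[state_key]}")
        self._locks[state_key] = owner

    def release(self, state_key: str, owner: str) -> None:
        # Only the current holder may release the lock.
        if self._locks.get(state_key) == owner:
            del self._locks[state_key]


def state_key(env: str, component: str) -> str:
    """Hypothetical layout: one small state file per environment and component."""
    return f"{env}/{component}/terraform.tfstate"


if __name__ == "__main__":
    table = StateLockTable()
    table.acquire(state_key("qa", "vpc"), "alice")
    try:
        table.acquire(state_key("qa", "vpc"), "bob")
    except LockHeld as err:
        print("blocked:", err)
    # A different component has its own state file, so it is not blocked.
    table.acquire(state_key("qa", "ec2"), "bob")
```

    Splitting state by component also narrows what each lock protects, so two people working on the VPC and on EC2 do not block each other.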

    Okay. So, implement a zero-downtime deployment strategy for Kubernetes. Let's take an example: we have a Python service, and for this example I'll be using Jenkins. In GitHub, we have one code repository and one YAML repository. We'll have a lot of YAML files there: the scaling configuration, the deployments, the service file, your secrets, config maps, and Helm charts. So we are managing Helm charts for that service, and we have the code. The first step in our Jenkins pipeline is to fetch the code. We do a checkout of the code and create a Docker image. We have a Dockerfile in the code itself, which pulls in all the required code, and we get a Docker image. The Docker image is pushed to ECR. Let's say we have a Jenkins server, and the Jenkins server has access to the Kubernetes cluster. Now, for the deployment part, since we are using Helm charts, we can manage it with Helm. There will be a Helm deploy command. There's a different repository for the Helm chart, so we fetch that and do a checkout of that repository. Then we update the Docker image in the Helm command itself. This will be a Helm upgrade command: helm upgrade with install, and then we pass the parameter. Let's say the Docker image is there, so we update the Docker image tag directly from the command. So how will it do a rolling deployment? Let's say you have 10 pods running, or let's say four.
From the Kubernetes side, it will create a deployment, it will create pods, and it will create a service. The service will have an ingress, and the ingress load balancer will be there: an ingress controller will create the load balancer. The ingress will have an entry with the endpoint of the service, let's say xyz.com, which will be mapped to the service. When we are doing the deployment, that service will not be touched. Only the deployment part will be touched, where we are updating the image. Once that image is updated, Helm will deploy it in a rolling update fashion. In a rolling update, let's say four pods are there. It's not that one goes down first. One new pod will come up, and once it has come up and passed all the health checks (let's say I have probes there, such as startup and readiness probes), only then will an old one go down, and the new one will start receiving traffic. The same goes for the other three. So this proceeds in a rolling update fashion. It won't take much time, and the deployment will have zero downtime.
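
    As a rough sketch of the pipeline step described here, the snippet below builds the `helm upgrade --install` argument list a Jenkins job might run and shows a Deployment strategy that brings one new pod up before taking an old one down. The release name, chart path, and the `image.tag` value key are assumptions (they depend on how the chart is written); `--install`, `--namespace`, `--set`, `--wait`, and `--timeout` are standard Helm flags.

```python
from typing import List

# Rolling update settings matching "one pod comes up, then one goes down":
# allow one extra pod during the rollout and never drop below the desired count.
ROLLING_UPDATE_STRATEGY = {
    "type": "RollingUpdate",
    "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
}


def helm_upgrade_command(release: str, chart: str, image_tag: str,
                         namespace: str = "default") -> List[str]:
    """Build the argument list for a Helm upgrade that sets a new image tag."""
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        # Assumes the chart reads the tag from a value named image.tag.
        "--set", f"image.tag={image_tag}",
        # Block until the new pods report ready, or fail after the timeout.
        "--wait", "--timeout", "5m",
    ]


if __name__ == "__main__":
    # Hypothetical release and chart path.
    print(" ".join(helm_upgrade_command("python-service", "./charts/python-service", "1.4.2")))
```

    Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without a shell.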

    So, I think this subprocess.run is the issue. Try, then the subprocess call, except the process error, "Docker build failed", raise; Docker image latest. So in subprocess.run, I think this comma-separated thing is not correct, and that is causing the issue. I mean, the syntax is not correct: the command needs to be comma-separated. Otherwise, the current syntax should be okay.
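
    The code under review is not part of this transcript, so the following is only a generic sketch of the pattern the answer refers to: passing the command to `subprocess.run` as a list of separate arguments and turning a non-zero exit into an error. The `run_checked` helper and the commented Docker arguments are hypothetical.

```python
import subprocess
import sys
from typing import List


def run_checked(args: List[str]) -> str:
    """Run a command given as a list of arguments and return its stdout.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, which is re-raised here with the captured stderr attached.
    """
    try:
        result = subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        raise RuntimeError(
            f"command failed with exit code {err.returncode}: {err.stderr.strip()}"
        ) from err
    return result.stdout


if __name__ == "__main__":
    # Each argument is its own list element; no shell string splitting is involved.
    print(run_checked([sys.executable, "-c", "print('ok')"]))
    # A Docker build would follow the same shape (hypothetical image name):
    # run_checked(["docker", "build", "-t", "model:latest", "."])
```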

    So we have the model service with 3 replicas and match labels. The selector labels are there, and the corresponding pod label is also there. In the spec, the container name, the model image, and the container port are there. The container port is 80. So what we need is a service as well: this is only the deployment part. What crucial detail is missing? Requests and limits are missing. Looking at what else I can add to this deployment, we can have probes as well. For this deployment, we'll also require a service to expose the ML model, and that service will define the node port and the container port. So here, I think resource requests and limits are missing and probes are missing. We can add mount points as well: the volume mount is missing.
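
    The review in this answer can be expressed as a small check over a container spec. The manifest being discussed is not shown in the transcript, so the container below is a stand-in with a hypothetical image name, and the choice of readiness and liveness probes is an assumption about which probes the reviewer would want.

```python
from typing import List


def missing_container_fields(container: dict) -> List[str]:
    """List the settings the review flags as absent from a container spec."""
    missing = []
    resources = container.get("resources", {})
    if "requests" not in resources:
        missing.append("resources.requests")
    if "limits" not in resources:
        missing.append("resources.limits")
    for probe in ("readinessProbe", "livenessProbe"):
        if probe not in container:
            missing.append(probe)
    return missing


if __name__ == "__main__":
    # Stand-in for the container in the Deployment under review.
    container = {
        "name": "model",
        "image": "model:latest",  # hypothetical image
        "ports": [{"containerPort": 80}],
    }
    print(missing_container_fields(container))
```

    Without requests and limits, the scheduler and any autoscaler have no sizing information for the pod, which is why the answer calls that the crucial gap.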

    How would you design a system to auto scale containerized machine learning workloads in a hybrid cloud setup? Okay. So, to auto scale machine learning workloads, let's say the workloads are running on Kubernetes in a hybrid cloud setup. All our workloads are running on Kubernetes. There are two types of auto scaling in a Kubernetes environment: one is node auto scaling, and the other is pod auto scaling. For auto scaling, we can use tools like Karpenter and KEDA. From a machine learning perspective, I think we can make use of KEDA for pods. KEDA has a lot of scalers, which scale on the basis of certain parameters. CPU and memory are the very basic ones, but we can also scale on the number of requests made to a particular API and similar parameters, because KEDA has different scalers for New Relic, Prometheus, and AWS metrics as well. So for pod scaling, KEDA will be beneficial. For example, in AWS, I have defined scaling on the number of requests on the load balancer of that machine learning service, the one our target group is behind. Let's say the load increases from 100 requests to 200 requests: my pods should get scaled. That way, I can do the scaling on a request basis, and based on the requests, the pods will automatically get scaled up. Other than that, I can have Karpenter for node auto scaling. I can use Karpenter and create different templates in it for the kind of machines we are using for machine learning, like G5, which our machine learning application will require. I'll create a template for that.
So whenever our pod count increases, pending pods increase, and there's a requirement for an extra node, Karpenter will automatically spin up that node, and the scaling will happen. I think this will work similarly in a hybrid setup as well, where you are managing your cluster control plane from your own setup or in your data centers, and your worker nodes are in AWS. But this will be like managing your own cluster. If you're managing it using EKS, it makes things easier. Either way, you can utilize these tools to scale based on your requirements.
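
    The request-based scaling example in this answer (100 requests growing to 200) comes down to simple arithmetic: divide the observed load by a per-pod target and round up, within configured bounds. The sketch below is a simplification of how average-value metrics drive the pod count; the function name, the per-pod target of 100, and the replica bounds are assumptions for illustration.

```python
import math


def desired_replicas(total_requests: float, target_per_pod: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Return the pod count needed so each pod handles at most target_per_pod."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(total_requests / target_per_pod)
    # Clamp to the configured bounds so the workload never scales to zero
    # or beyond what the cluster is allowed to run.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for load in (100, 200, 250, 5000):
        print(load, "requests ->", desired_replicas(load, target_per_pod=100), "pods")
```

    When the resulting pods do not fit on existing nodes they stay pending, which is the signal a node autoscaler acts on to add capacity.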

    I have experience with setting up distributed model inference on platforms like Kubernetes and with our Kubernetes-based solution. I have also worked on SageMaker for setting up ML models, so I will go through the whole process of setting that up. In SageMaker, we have a data source: let's say RDS or Redshift, and S3. We have SageMaker Studio, which is the SageMaker notebooks environment. We create a code artifact and store it in GitHub, and we manage the environment using ECR. We then store the model artifact in S3. Then we have a SageMaker pipeline that helps with preprocessing the jobs, and we run AWS Lambda and train the job using EMR. We were using the whole AWS pipeline. We also had EventBridge to trigger it: once the model artifact is in S3 and the SageMaker pipeline has processed everything, EventBridge will trigger. Then we have a model registry, and the model registry will have the model. Lambda will use the model registry, and then we do our deployment. For deployment, we have an API Gateway; the API Gateway talks to Lambda, and Lambda talks to the model registry. So the whole setup is: S3, then SageMaker Studio; in Studio we generate code artifacts from GitHub; the environment is managed through ECR; the model artifact is stored in S3; then a SageMaker pipeline processes the jobs; then the SageMaker model registry feeds Lambda; and from Lambda we have endpoints behind API Gateway. I have done it on Kubernetes as well. Kubernetes is simple, like how we use a service.
I have used a multi-pod, multi-deployment structure. Multiple pods are running that do different functions. I mean, multiple containers are running, and then a service is there.

    There are many methodologies for GDPR and SOC 2. For SOC 2 and GDPR, we have a separate account where we provide read access. Developer access is not provided, and very important data, such as databases or S3 buckets where customer data is present, is not accessed by anyone. Only the administrator has access to some data, and a few people, such as DBAs, have administrator passwords and manage that. The rest of the databases are accessed through services. First, IAM: that is managed through G Suite. You will have your email address there, and based on that, you can provide user access. You will have different access for different teams, such as developers, DevOps, the platform team, or other groups. Whichever team requires particular access, only that team will have access to that resource. Let's say they want to access their Kubernetes cluster to check the pod logs and pod metrics. They will have read access to the pods and logs, but not to environment variables or secrets. Secrets can be managed through a secrets manager. You will be managing access through groups, and you need to maintain logging as well. You need to retain six months of all the data, whereas CloudTrail will keep only three months. That data should be there so that you can do auditing. Regular auditing should be there, and users should have MFA enabled. In terms of applications, most of your applications should be private: databases and private endpoints should be kept in private subnets. Only your load balancers should be public, and all your traffic should come through a single point, such as a load balancer or a CDN. Only that traffic should be allowed. If you have a public S3 bucket, only traffic from the CDN or from the load balancer should be allowed.
If you're accessing it through a third party, you should only allow the third-party IPs for that S3 bucket. Your services should have separately created service accounts. You should not use personal users for accessing databases. Your EC2 instances that deal with personal data should not be public; they should be kept private. And for whichever services are there, you should have a way to integrate tools like SonarQube so that you can check vulnerabilities in the code, and you should also check for vulnerabilities in your Docker images.