
Senior Cloud DevOps engineer with 6+ years of experience in cloud (Azure, AWS), DevOps, configuration management, infrastructure automation, and Continuous Integration and Delivery (CI/CD). Able to implement effective strategies for application development in both cloud and on-premises environments. Experienced in Unix/Linux and Windows server administration. Expertise in architecting and implementing Azure service offerings such as Azure Cloud Services, Azure Storage (including Blob Storage), IIS, Azure Active Directory (AD), Azure Resource Manager (ARM), Azure Synapse Analytics, Azure VMs, Azure SQL Database, Azure Functions, Azure Service Fabric, Azure Monitor, Azure Service Bus, and Cosmos DB. Hands-on experience with backup and restore of Azure services and with designing and configuring Azure Virtual Networks (VNets), subnets, Azure network settings, DHCP address blocks, DNS settings, security policies, and routing.
Cloud DevOps Engineer - Anlage Infotech (India) P Ltd
Senior Infrastructure Associate - Publicis Sapient
DevOps Engineer - XT Global Azure
Senior Process Associate - Cognizant
Cloud/DevOps Engineer - AFF Soft IT Solutions Pvt Limited
Azure Cloud Services

Azure Active Directory

Azure Resource Manager

Azure Synapse Analytics

Azure DevOps

Git

Terraform

Kubernetes

OpenShift
Docker

Prometheus
Grafana

Ansible Tower

Azure Data Factory

Azure Databricks

Azure Stream Analytics

Azure Logic Apps

Azure Key Vault
Hi team, this is Ravi. I have about seven years of total experience. I started my career as an infrastructure associate and later moved into Azure DevOps. As an infrastructure associate, my responsibilities included granting team members access to the Azure resources I had created, as well as to Active Directory and other resources. After that I moved fully into Azure DevOps, where I have about six years of experience. In those six years I gained good exposure to CI/CD pipelines, Kubernetes, Terraform, and Ansible, and I also worked with monitoring tools such as New Relic, Prometheus, and Grafana, which I integrated with Azure pipelines by creating service principals. I also have good experience in AWS, so I have worked in a multi-cloud setup: in AWS I created storage, pipelines, and other services as required, and worked with EC2, storage, and database services. I have good exposure to Jenkins as well, where I created pipelines with different jobs and deployed them across environments. Within the CI/CD process I have a good understanding of building pipelines and deploying to an AKS cluster, as well as deploying apps to app services and as web services in AWS. I also have solid Terraform experience, having written Terraform code for both AWS and Azure environments. That's it from me.
On designing a monitoring solution with CloudWatch and Datadog that flags deviations in performance benchmarks for an AWS-hosted application: the first step is to set up CloudWatch metrics. I would use CloudWatch to collect metrics from AWS services such as EC2 instances, Lambda functions, and RDS databases, and define custom metrics if needed to monitor specific aspects of application performance. Next I would configure CloudWatch alarms to trigger when performance metrics deviate from defined thresholds, for example alarms on CPU utilization, memory utilization, latency, and error rates. After that I would integrate with Datadog: I would connect Datadog to my AWS account to collect additional metrics and logs beyond what CloudWatch provides, which gives a more comprehensive view of application performance. I would then build custom dashboards in Datadog to visualize metrics from both CloudWatch and Datadog, customized to display the relevant performance benchmarks and key indicators for the application. I would set up alerts in Datadog so that a notification is sent whenever a deviation in performance is detected, and I would use Datadog's anomaly detection feature to automatically identify unusual patterns. Finally, I would define automated remediation actions that can be triggered when performance deviations are detected, for example automatically scaling up resources, restarting instances, or deploying code changes to address the issue. By combining CloudWatch and Datadog we can build a robust monitoring solution that provides real-time visibility into the performance of AWS-hosted applications and helps us react quickly. That's it, thank you.
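A minimal sketch of the kind of CloudWatch alarm described above, using boto3; the instance ID and SNS topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm when average CPU utilization of a (placeholder) EC2 instance stays
# above 80% for two consecutive 5-minute periods, notifying an SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-utilization",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder instance
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111111111111:perf-alerts"],  # placeholder SNS topic
)
```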
When troubleshooting a Linux server with high load using CloudWatch and Datadog, I would follow a few steps. First, check the CloudWatch metrics: examine the server's CPU utilization, memory usage, disk I/O, and network traffic, and look for spikes or unusually high values. Next, review the Datadog dashboards for a more detailed view of the server's performance, including any custom metrics for the application or environment, and look at correlations between metrics to identify anomalies. Then identify resource bottlenecks by analyzing the CloudWatch and Datadog metrics to pinpoint CPU-bound processes, memory leaks, disk saturation, or network congestion. Review the system logs for error messages, warnings, or other relevant information. Use Datadog's distributed tracing capabilities to check API calls, database queries, and external service dependencies. Check process activity with tools like top, htop, or atop to monitor CPU and memory usage in real time, and analyze disk usage patterns with tools like df and iostat to spot long queue lengths. Review server configuration settings such as kernel parameters and network settings. Based on that analysis, take appropriate remediation actions to address the underlying cause of the high load, which may involve scaling up resources, and finally monitor the impact of the changes. That is the process.
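As a sketch of the first step, the CloudWatch metrics can be pulled programmatically with boto3 to look for CPU spikes; the instance ID and the 90% spike threshold are placeholder assumptions:

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Pull the last hour of average CPU utilization for a (placeholder) instance
# and print any data points above 90% as candidate spikes.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    if point["Average"] > 90:
        print(f"{point['Timestamp']}: CPU {point['Average']:.1f}% (spike)")
```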
On the method for enabling tracing of configuration changes applied through Terraform: this involves setting up a systematic approach to track those changes. First, use version control with Git to manage the Terraform configurations. Each developer should work on a feature branch and submit a pull request for review before merging into the main branch, and we should enforce descriptive commit messages that clearly explain the purpose of each change, including which resources are being modified and what the impact is. Second, use centralized Terraform state management, for example with Amazon S3 or HashiCorp Consul; this ensures consistency and allows collaboration among multiple developers, and Terraform Cloud or Terraform Enterprise can also be used for state management and collaboration features. Third, enable Terraform debugging by setting the TF_LOG environment variable to DEBUG to capture detailed logs during Terraform operations, which helps diagnose issues and trace the execution flow of configuration changes. We should also implement an infrastructure change management process to review and approve changes before applying them to production, validating them in a lower environment first. In addition, implement monitoring and alerting with tools like CloudWatch or Datadog to track infrastructure changes and their impact on system performance and availability, with alerts that notify stakeholders of any unexpected behavior. Finally, implement automated testing and maintain comprehensive documentation for the Terraform configurations, including architecture diagrams. That's it from my end, thank you.
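A minimal sketch of the Terraform debugging point, assuming the configuration lives in a placeholder infra/ directory: the plan is run from Python with TF_LOG set to DEBUG so the execution flow is captured to a log file.

```python
import os
import subprocess

# Run `terraform plan` with debug logging enabled so the execution flow of a
# configuration change can be traced afterwards. TF_LOG and TF_LOG_PATH are
# standard Terraform environment variables; the working directory is a placeholder.
env = dict(os.environ, TF_LOG="DEBUG", TF_LOG_PATH="terraform-debug.log")
result = subprocess.run(
    ["terraform", "plan", "-out=tfplan"],
    cwd="infra/",  # placeholder path to the Terraform configuration
    env=env,
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    print(result.stderr)
```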
To fine-tune a Horizontal Pod Autoscaler (HPA) for cost-efficient scaling in Kubernetes, we can follow a few practices. First, set appropriate target resource utilization levels for the HPA: analyze the application's workload patterns and choose utilization targets that balance performance and cost-effectiveness, adjusting the thresholds as needed. Second, define conservative scale-in and scale-out policies so that unnecessary scaling actions, which inflate cost, are avoided; consider factors such as average utilization over a longer window rather than short-term spikes. Third, choose relevant utilization metrics for the scaling decisions that align with cost optimization, for example prioritizing scaling on CPU utilization rather than memory for CPU-bound workloads, since CPU tends to be the more cost-sensitive resource. We can also define custom metrics and external metrics where appropriate, and evaluate using the Vertical Pod Autoscaler in conjunction with the HPA to dynamically adjust resource requests and limits for individual pods based on observed usage patterns. Finally, look into predictive scaling by studying usage capacity and expected spikes. With these steps, horizontal pod scaling can be made cost-efficient. That's it, thank you.
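A minimal sketch of an autoscaling/v2 HPA manifest along these lines, built as a Python dict and rendered with PyYAML (assumed available); the deployment name, utilization target, and scale-down window are placeholder choices:

```python
import yaml  # PyYAML

# A CPU utilization target plus a conservative scale-down stabilization window
# so short spikes do not cause scaling churn; numbers are illustrative only.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web-hpa", "namespace": "staging"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        "minReplicas": 2,
        "maxReplicas": 6,
        "metrics": [{
            "type": "Resource",
            "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}},
        }],
        "behavior": {
            "scaleDown": {
                "stabilizationWindowSeconds": 300,  # wait 5 minutes before scaling in
                "policies": [{"type": "Percent", "value": 50, "periodSeconds": 60}],
            },
        },
    },
}
print(yaml.safe_dump(hpa, sort_keys=False))
```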
To deploy to AKS using Jenkins and Terraform, we need to configure a CI/CD pipeline. The steps I would follow: first, install Jenkins on a server or cloud instance and install the necessary plugins, including the Terraform plugin since we are using Terraform integration. Next, set up version control: create a Git repository to store the application code and the Terraform configuration, and make sure Jenkins has access to that repository. Then define the Jenkins pipeline: create a Jenkinsfile in the repository that defines the CI/CD stages for building, testing, and deploying the application, and use Jenkins shared libraries for reusable pipeline code if needed. Install Terraform on the Jenkins server, or use a Docker image in the pipeline, so that Terraform is available on the Jenkins environment path. Configure the Terraform backend and set up authentication and access to the backend service. After that, write the Terraform code that defines the infrastructure for deploying the application, organized into modules so that it is reusable and easy to maintain. For the pipeline stages: a checkout stage pulls the latest code from the repository; a build stage compiles the application code and runs unit tests; terraform init initializes Terraform and configures the backend; terraform plan generates an execution plan to preview the changes; terraform apply creates or updates the infrastructure; then the application is deployed, followed by integration tests and any cleanup stage that is needed. We also need to configure pipeline triggers and credential management, for example with API tokens or SSH keys, perform testing and validation, and implement monitoring and logging for the CI/CD pipeline's performance metrics. That's it from my end.
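A rough sketch of the Terraform stages such a pipeline could call from a script step, assuming a placeholder infra/aks directory and a backend that is already configured; a later stage would fetch AKS credentials and deploy the application:

```python
import subprocess

# Minimal wrapper for the init -> plan -> apply stages described above:
# init with the configured backend, plan to a saved file, apply that plan.
def run(cmd):
    subprocess.run(cmd, cwd="infra/aks", check=True)  # placeholder working directory

run(["terraform", "init", "-input=false"])
run(["terraform", "plan", "-input=false", "-out=tfplan"])
run(["terraform", "apply", "-input=false", "tfplan"])

# A deploy stage could then run, for example:
#   az aks get-credentials --resource-group <rg> --name <cluster>
#   kubectl apply -f k8s/
```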
In the Python code for the AWS Lambda function, the mistakes that prevent it from running are an incorrect import statement and a syntax error. The code attempts to import a module named "botos", which is a typo; the correct module for interacting with AWS from Python is boto3, so the statement should be `import boto3`. The syntax error is in the function definition line, where backslashes are used instead of parentheses to define the function parameters; the backslashes are unnecessary and should be replaced with parentheses. With those two corrections the Lambda function will work correctly and list the objects in the S3 bucket when it is triggered.
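A corrected handler consistent with that description, assuming the intent is to list objects in an S3 bucket; the bucket name is a placeholder since the original snippet is not shown in full:

```python
import boto3  # correct module name (the original imported a misspelled "botos")

s3 = boto3.client("s3")

# Parentheses, not backslashes, define the handler parameters.
def lambda_handler(event, context):
    # Placeholder bucket name; the original intent was to list the bucket's objects.
    response = s3.list_objects_v2(Bucket="my-example-bucket")
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    return {"statusCode": 200, "body": keys}
```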
In the given Terraform snippet, if I modify the instance_type attribute and apply the changes, Terraform will detect the modification and plan to recreate the AWS EC2 instance with the new instance type. The principle reflected here is immutable infrastructure. Immutable infrastructure is an approach where infrastructure components such as servers or virtual machines are never modified after they are created; instead, changes are made by replacing the entire component with a new instance that incorporates the desired changes. When we modify the instance_type in Terraform, we are effectively changing the configuration of the EC2 instance. Terraform first detects the change in configuration, then creates an execution plan that includes destroying the existing EC2 instance and creating a new one with the changed configuration. When we apply the changes, Terraform carries out that plan, terminating the existing instance and provisioning a new one with the new instance type. Once the new instance is successfully provisioned, Terraform updates its state to reflect the changes. By following this immutable infrastructure principle, Terraform ensures that infrastructure changes are predictable and consistent. That is the main thing that happens here.
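One way to confirm how Terraform intends to handle such a change is to inspect the machine-readable plan; a minimal sketch, assuming a saved plan file named tfplan:

```python
import json
import subprocess

# Render the saved plan as JSON and flag any resources Terraform plans to
# replace (delete + create), which is how the recreation described above
# would show up in the plan output.
plan_json = subprocess.run(
    ["terraform", "show", "-json", "tfplan"],
    capture_output=True, text=True, check=True,
).stdout

plan = json.loads(plan_json)
for change in plan.get("resource_changes", []):
    actions = change["change"]["actions"]
    if "delete" in actions and "create" in actions:
        print(f"{change['address']} will be replaced: {actions}")
```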
To securely manage secrets and sensitive configuration across an AWS environment, we can follow a few steps. Use AWS Secrets Manager or AWS Systems Manager Parameter Store to store sensitive configuration securely; these services provide features such as encryption, rotation policies, and access controls, and they integrate with the rest of the setup. Grant least privilege: implement IAM roles and policies for Jenkins that give only the permissions necessary to retrieve the secrets, following the principle of least privilege so access is restricted to specific secrets. Configure AWS credentials for Jenkins by creating IAM roles or users with appropriate permissions, preferably using IAM roles or AWS Security Token Service rather than long-lived access keys, and store any Jenkins credentials securely. Integrate Jenkins with the AWS services using plugins or integrations so Jenkins can interact with them. Encrypt secrets both in transit and at rest, using TLS/SSL for secure communication. Finally, implement secret rotation, audit and monitoring, and continuous improvement and compliance.
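A minimal sketch of retrieving such a secret at runtime with boto3, assuming a placeholder secret name and an IAM role that allows secretsmanager:GetSecretValue on it:

```python
import json
import boto3

# Fetch credentials from Secrets Manager at pipeline runtime instead of
# hard-coding them in Jenkins; the secret name and region are placeholders.
secrets = boto3.client("secretsmanager", region_name="us-east-1")
resp = secrets.get_secret_value(SecretId="prod/app/db-credentials")
credentials = json.loads(resp["SecretString"])

db_user = credentials["username"]
db_password = credentials["password"]
```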
For optimizing log management for a microservices application using AWS-native tools, we can follow a few methods. Use centralized logging with Amazon CloudWatch Logs: aggregate logs from all the microservices into CloudWatch for centralized storage and analysis, using the CloudWatch Logs agent or the AWS SDKs to push logs from containers, Lambda functions, and so on. Organize logs into logical log groups based on microservice boundaries and environments, with proper naming conventions and tagging so log groups are easy to identify and filter. Implement retention and storage management: configure retention policies within CloudWatch Logs to keep logs for an appropriate duration based on compliance requirements and operational needs, and use lifecycle policies to automatically archive or delete old logs. Set up real-time monitoring and alerting: create CloudWatch alarms to monitor log metrics such as error rates, latency, and exceptions in real time, and configure alerts to trigger notifications, for example via Amazon SNS. Use log analytics and insights for enhanced log visualization and build dashboards, and integrate with AWS X-Ray for distributed tracing. Additional monitoring solutions can also be used, such as Datadog, Splunk, or the ELK Stack (Elasticsearch, Logstash, Kibana). That's it from me.
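A minimal sketch of the log group, retention, and metric-filter pieces with boto3; the group name, retention period, and metric names are placeholders, and the log group is assumed not to exist yet:

```python
import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Per-microservice log group with a retention policy so old logs expire automatically.
group = "/microservices/orders-service"  # placeholder group name
logs.create_log_group(logGroupName=group)
logs.put_retention_policy(logGroupName=group, retentionInDays=30)

# Metric filter that counts ERROR lines, which a CloudWatch alarm can then watch.
logs.put_metric_filter(
    logGroupName=group,
    filterName="error-count",
    filterPattern="ERROR",
    metricTransformations=[{
        "metricName": "OrdersServiceErrors",
        "metricNamespace": "Microservices",
        "metricValue": "1",
    }],
)
```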
On leveraging Ansible for automated configuration of newly launched AWS EC2 instances: we would develop Ansible playbooks that define the desired state for the EC2 instance configurations. The playbooks can include tasks for installing packages, configuring services, managing users, and applying security settings. For dynamic inventory management, we can use Ansible's dynamic inventory plugins or scripts to discover and manage the AWS instances dynamically: Ansible can query the AWS APIs to generate the inventory based on EC2 instance attributes such as tags, regions, instance types, and other metadata. We can also leverage the AWS modules that integrate with Ansible to automate this further. Configuration management should be idempotent, which ensures consistency and predictability, and parameterization and templating allow flexible configuration per environment. We can also integrate with cloud-init so that Ansible playbooks or shell scripts are executed when an instance launches, automating the orchestration process. Finally, implement error handling, reporting, and logging mechanisms within the Ansible playbooks to capture any errors or failures during the configuration tasks. These are some of the steps we can follow to automate configuration of newly launched AWS EC2 instances.
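A minimal sketch of the dynamic-inventory idea with boto3, assuming a placeholder Role=web tag; it prints an Ansible-style JSON inventory grouping the matching instances:

```python
import json
import boto3

# Query EC2 for running instances carrying a (placeholder) Role=web tag and
# emit a simple dynamic-inventory structure that groups them under "web".
ec2 = boto3.client("ec2", region_name="us-east-1")
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Role", "Values": ["web"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

hosts = [
    inst["PrivateIpAddress"]
    for res in reservations
    for inst in res["Instances"]
    if "PrivateIpAddress" in inst
]

inventory = {"web": {"hosts": hosts}, "_meta": {"hostvars": {}}}
print(json.dumps(inventory, indent=2))
```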