Vetted Talent

SURAJ KUMAR DAS

Vetted Talent

To leverage my extensive experience in DevOps and Cloud Infrastructure, aiming to contribute to an organization's innovation and efficiency through expert orchestration of AWS, Kubernetes, and CI/CD pipelines, while growing into more strategic roles and d

Role
DevOps Consultant / Architect
Years of Experience
13.5 years

Skillsets

Splunk
cloud transformation
NO SQL
DevOps
CI/CD - 9 Years
AWS Cloud - 8 Years
Security - 5 Years
IAC - 6 Years
AWS Services - 8 Years
Apica
NO SQL
Cloudformation
AWS - 8 Years
Grafana
Docker - 6 Years
EKS
Terraform - 5 Years
Ansible
Kubernetes - 03 Years
Git
Jenkins
SQL
Python - 08 Years

Vetted For

8Skills

Roles & Skills
Results
Details

AWS Solutions Architect(Remote)AI Screening
66%

Skills assessed :.NET, CI/CD, AWS Services, IAC, Networking, Docker, Kubernetes, Security
Score: 59/90

Professional Summary

13.5Years

Nov, 2024 - Present1 yr 5 months
Senior Technical Lead
Coforge
Dec, 2022 - Aug, 20241 yr 8 months
Senior Associate
JPMorganChase
Oct, 2020 - Dec, 20222 yr 2 months
Assistant Consultant
Tata Consultancy Services
May, 2017 - Feb, 2018 9 months
Development Support Professional
Kofax/Hyland
Apr, 2018 - Aug, 20191 yr 4 months
Senior Software Development Engineer
Euclid Innovations
Aug, 2019 - Jul, 2020 11 months
Senior Implementation Engineer
Thoughtworks
Jan, 2015 - May, 20172 yr 4 months
Associate Consultant
Virtusa (Polaris Consulting & Services Limited)
Jan, 2012 - Dec, 20142 yr 11 months
Software Developer
Tech Mahindra

Applications & Tools Known

AWS
Jenkins
Ansible
Python
Bash
Git
Docker
ECS
ECR
Kubernetes
EKS
SQL
NO SQL
Linux
Splunk
Grafana
Terraform
CloudFormation
AWS
SQL
NoSQL
Linux
DataDog
Terraform

Work History

13.5Years

Senior Technical Lead

Coforge

Nov, 2024 - Present1 yr 5 months

Senior Associate

JPMorganChase

Dec, 2022 - Aug, 20241 yr 8 months

Managed EC2 Instance Refresh for Unbound (Onyx): Directed application refresh activities, including the replacement of EC2 instances, ensuring seamless operation and system stability for Client Unbound (Onyx). Automated SSL Renewals & Workflows (Python): Developed Python scripts to automate SSL certificate renewals and streamline team workflows through proof-of-concept automation, enhancing security, compliance, and efficiency. Resolved Customer Data Flow Issues: Addressed critical customer issues related to data flow and errors, providing high-level escalation support to maintain operational continuity. Streamlined Operations & Client Solutions: Developed and maintained efficient Ansible playbooks that streamline complex operational workflows, significantly improve deployment efficiency and system reliability, while also tailoring and maintaining Ansible jobs to meet specific client requirements, optimizing automation processes and ensuring client alignment. Provided On-Call Support: Delivered on-call support, ensuring prompt and effective resolution of issues outside regular business hours, maintaining high levels of service availability. AWS EC2 Instance Repaving: Directed the comprehensive repaving of AWS EC2 instances, crucially supporting the stability and performance of the Onyx blockchains mission-critical systems. Enhanced Security with Encryption: Instituted robust encryption protocols, safeguarding sensitive customer information and reinforcing the security framework against potential threats. Enhanced Monitoring & Performance (Grafana & APICA): Developed and maintained comprehensive Grafana dashboards for real-time system health insights and APICA checks to ensure application dependability and consistent performance, facilitating immediate response to potential issues, safeguarding operational integrity, and guaranteeing uninterrupted service delivery. Developed RCA Documents: Created detailed RCA documents that provided actionable insights, driving process enhancements and preventing future issues, contributing to ongoing operational excellence. Debugged APIs for Reliability: Showcased advanced skills in diagnosing and resolving API issues, ensuring smooth and reliable API functionality that underpinned critical business applications.

Assistant Consultant

Tata Consultancy Services

Oct, 2020 - Dec, 20222 yr 2 months

Designed Multi-environment Infrastructures: Designed, maintained, and scaled infrastructures across production, QA, and other environments, ensuring robust performance and scalability. Streamlined Deployments & Scalability (Kubernetes & CI/CD): Implemented Kubernetes and CI/CD tools to automate deployments, accelerate application delivery, and enhance microservice scalability and system reliability through optimized resource allocation and performance. Automated AWS Operations for Ericsson: Developed automation scripts using Jenkins to streamline the management of EC2 instances, S3 operations, load balancers, and database installations, significantly improving operational efficiency for Ericsson. Led Automated Builds and Deployments: Spearheaded initiatives to automate builds, deployments, and validations for client servers, enhancing deployment speed and reliability. Enhanced Security and Automation for Ericsson: Improved company codes and automated security systems to mitigate risks for Ericsson, ensuring robust protection of critical systems and data. Optimized Cloud Infrastructure with AWS and Jenkins: Played a key role in optimizing and monitoring AWS and Jenkins cloud infrastructure, enhancing operational efficiency and ensuring high availability of services. Facilitated Knowledge Transfer: Led knowledge transfer sessions for new team members, ensuring smooth onboarding. Managed Escalation for Job Failures: Acted as the primary point of contact for managing escalations related to job failures, swiftly resolving issues to maintain uninterrupted service delivery. Contributed to Agile Development: Played an integral role in an agile development environment, collaborating closely with cross-functional teams to deliver high-quality solutions and meet sprint goals.

Senior Implementation Engineer

Thoughtworks

Aug, 2019 - Jul, 2020 11 months

Led COVID-19 Hospital System POC for Odisha Govt: Spearheaded Proof of Concept (POC) initiatives for COVID-19 hospital system in collaboration with the Odisha Government, using technology to support public health initiatives. Orchestrated AWS, Jenkins, and Python CI/CD Pipelines: Led the development team in orchestrating automated CI/CD pipelines using AWS services, Jenkins & Python, ensuring streamlined software delivery and enhanced efficiency. AWS Infrastructure: Managed and optimized AWS infrastructure, including EC2, S3, and RDS, by implementing efficient backups, patches, and scaling strategies, resulting in a monthly cost savings of $3,000. Architected Bahmni Implementations: Designed scalable architectures for Bahmni implementations in hospital management. Containerized Monolithic App with Docker: Transformed a monolithic application into microservices architecture using Docker, significantly improving scalability and operational speed for Bahmni. Automated Build/Deployment with Jenkins: Reduced errors & accelerated workflows, enhanced development efficiency and deployment reliability. Custom Solutions for Bahmni: Developed tailored solutions to meet specific customer requirements for the Bahmni product. Managed AWS Deployments for Bahmni: Supported the Bahmni product by handling customer onboarding and AWS deployments, optimizing infrastructure for efficient operations. Optimized AWS Billing for PSI Zimbabwe: Implemented cost-saving measures and optimized AWS billing for PSI Zimbabwe.

Senior Software Development Engineer

Euclid Innovations

Apr, 2018 - Aug, 20191 yr 4 months

Development Support Professional

Kofax/Hyland

May, 2017 - Feb, 2018 9 months

Designed AWS architecture for migrations; guided client setup; developed Python solutions; optimized costs by removing unnecessary servers and databases.

Associate Consultant

Virtusa (Polaris Consulting & Services Limited)

Jan, 2015 - May, 20172 yr 4 months

P+ rating for outstanding service; Spot Excellency Award for resolving critical issues in the CITI Bank SMART II project.

Software Developer

Tech Mahindra

Jan, 2012 - Dec, 20142 yr 11 months

Boosted Oracle ELP-999 training performance by 90%; Star Performer and recognized for impactful contributions to training.

Achievements

Significantly improved the stability and reliability of a vendor's AWS environment by conducting a thorough audit and optimizing the use of EC2, S3, Route53, DynamoDB, and RDS services. Analyzed usage patterns, implemented targeted improvements, and established robust monitoring systems. Resultantly, minimized disruptions, enhanced performance and cost-efficiency, and ensured that project timelines were met.
Successfully implemented an automated certificate renewal process using a Python script and CKMS/vendor APIs, replacing manual renewals. The automation ensured timely updates, reduced manual effort, and maintained security compliance, significantly improving system reliability and uptime. Integrating the solution into the CI/CD pipelines streamlined operations and eliminated the risk of service disruptions due to expired certificates.
Significantly improved the stability and reliability of a vendor's AWS environment by conducting a thorough audit and optimizing the use of EC2, S3, Route53, DynamoDB, and RDS services.
Successfully implemented an automated certificate renewal process using a Python script and CKMS/vendor APIs.
Led migration of critical workloads to AWS, achieving significant infrastructure cost reductions.

Major Projects

3Projects

Migration of Critical Workloads to AWS

Led migration of critical workloads to AWS, achieving significant infrastructure cost reductions.

AWS Environment Optimization

Jan, 2023 - Dec, 2023 11 months

Significantly improved the stability and reliability of a vendor's AWS environment by conducting a thorough audit and optimizing the use of EC2, S3, Route53, DynamoDB, and RDS services. Analyzed usage patterns, implemented targeted improvements, and established robust monitoring systems. Resultantly, minimized disruptions, enhanced performance and cost-efficiency, and ensured that project timelines were met.

Automated Certificate Renewal Process

Jan, 2022 - Dec, 2022 11 months

Successfully implemented an automated certificate renewal process using a Python script and CKMS/vendor APIs, replacing manual renewals. The automation ensured timely updates, reduced manual effort, and maintained security compliance, significantly improving system reliability and uptime. Integrating the solution into the CI/CD pipelines streamlined operations and eliminated the risk of service disruptions due to expired certificates.

Education

B. Tech - Computer Science and Engineering
JITM, Biju Patnaik University of Technology, Odisha (2011)

Certifications

Cka: certified kubernetes administrator | the linux foundation | jun 2024
Aws certified developer - associate | amazon web services (aws) | apr 2024
Aws certified solutions architect - associate | amazon web services (aws) | feb 2023
Aws certified solutions architect - associate | aws | feb 2023
Aws certified developer - associate | aws | apr 2024

Interests

Cricket

Badminton

AI-interview Questions & Answers

Hello. I in total, I have 13 years of experience into IT, and, I have been working in AWS and, infrastructure record ISE into last 8 years. So, initially, I was into Linux environment and working for the client like Tech Mahindra and Polaris. After that, I moved to the COFAX and, you know, Thoughtworks and, maintaining the infrastructures and implementing the solutions for the customers who are, utilizing the product for them, utilizing the product, and, we are providing as, the service to the customers. Whenever any like, any new instances, new VM, new servers required, we manage the infrastructure and then provide the details to them. And, anything like any application, CSUs, or, any databases in RDS or DynamoDB or any accessing related to the applications which is hosted in the AWS. We are the first point of contact, and we're, working on resolving those issues. And besides that, any decom request come into the picture, like any, old servers or any project getting decom. We are cleaning up systems and, everything is happening in the agile development process. We, work on the sprint wise. And besides that, we do, write, new Terraform scripts, as well as cloudformers and templates, in order to automate the infrastructures, whatever the client is requiring. So in one of the in my last organizations, I do, write one info, Terraform scripts for the ONIX client, which is a major client in terms of Bitcoin, mining and everything, all that. And, we deploy the instances, through JUULES. JUULES is nothing but the Jenkins actually. So yeah.

So in AWS, when data is in transit, we can use SSLTLS, and the certification 5509, in order to, encrypt the data in transit. And the data enriched, there are 2 way, whether it's a customer managed or the AWS managed. So the customer managed, the customer will fully have control on the, encryption and decryption and only use the AWS services for data storage and everything. And, when it comes to the KMS, it's managed by the AWS and AWS will take care of the auto rotation of the, game SKUs every once in a while. But we can configure that. So, at rest, for example, in s 3, if we are storing the data Hello? We can enable. K. By default, it is the s 3 encryption is enabled, but we can enable by using, KMS, which is be managed by the, AWS. So and another thing, SSCC, where, the customer has to be request the customer will, manage the encryption part and, AWS has nothing to do with it.

BPC pairing is nothing but, It's a communication between 2 VPCs. Suppose VPC a and VPC b, we are we can establish connection among these 2 VPC by doing the VPC pairing. So, and, the securities, can be, like, there are 2 way we can provide securities. Like, one is using security groups, which will only which is stateful and, which will only allow traffic to it. And in top of that, in order to provide additional security, we can provide security, to in the subnet level by using Netl network access list. So that is stateless. So allow and deny, everything is different on us. Whatever traffic we want as an inbound and output, everything can be defined, check that. So in VPC peering, communication between 2 different VPCs can be, done by using some providing security groups and, NACL level, by providing NACL in the subnet level.

So in code pipeline, we can use code deploy. The code deploy is, like, where we can have these 2 instances, deploy to it. And, the there are, number of way we can have, without any having downtime. If we are, if we are using, like, For example, the deployment type like Kennedy deployment or, in order to, roll rolling in batches or rolling. So in that case, we, you know, we won't have any deploy, deployment downtimes. So when we have that, like, for example, the deployment, in that case, we can have a another issue to instance where we can redirect our traffic to a part of the traffic can be redirected to that. And once everything is sorted out, the full traffic can be redirected to the newly created instances. And the older one can be, terminated. And another one is, rolling ways. So what happens in rolling in means another set of instances can be created, and once that is, have the latest code and everything, we can, terminate the existing one and all the traffic can be done. All at once, if we implement, then they will there will be have some downtime. So we won't be doing that.

So what does CloudTrail does is it stores information related to the, user accessing the APIs and, it is it is best sure for, figuring out what user has access for the API and, for trailing the information, is there any, person who is trying to access who's who is not supposed to be. So, that thing can be done using the Clyde CloudTrail. And in the AWS config, so right now I don't recall what AWS Config does.

The version control, can be, created in the git or Bitbucket, anywhere we want it use. And the usable modules means we don't need to, suppose one resource is being created, like, EC 2 instance by using CloudFormation Terraform. So that resource, we don't need to, rewrite it every time we wanted to have one EC 2 instance. In that case, we can have 1 module 1 small module for that, which what it does is it gets the right set of it starts with the right permission, and that can be reused. That's where the reusable module is. And, same thing, in cloud formation.

But, in any ISE, the database, especially when stateful set should not be part of an ISE. It should be handled, externally, and the endpoint, the connection endpoint should be provided to the ISE, regardless of the cloud formation or the, Terraform. And the username, password, it should not be hardcoded, in the template itself. there is a concept called parameter where while providing you the template and the user, the username and password can be provided as a parameter. So if that is the best case, then the password also can be stored in the secret manager, and it can be accessed through the, secret manager or SSM parameter. Anywhere we can store it, and, we can while showing it, we make sure that, it is KMS encrypted and, we can fix that. So username password should not be hardcoded in this case.

Effect allow action s 3. Resource here in my bucket. Condition string not report. My Slush. Well, the resource section is here. given my bucket, wildcard. It means everything which is inside the my bucket should be accessible. And then again in the condition section, if the string is not, home slash AWS username. So AWS colon username is, given in specific to the username who is trying to access so that he can only access what he's storing. He cannot see what other steps to those details. So effect is a low. This was condition string not equal to. So it should not be home, but, it should be instead, my bucket then AWS username. Or in the resource section, we can have, home slash.

Well, the monolithic applications, the approach would be like, first, we'll have, the connectivity, like, whether we wanted to use SSO login or SAML 2 dot o. And, we need to figure that out first. And once we have that, the monolithic applications, first of all, it requires some measures and just like, we have to have the microservices instead of, you know, putting everything in a single issue to instances and having a numerous issue to instances, coming to the load balancer and, AZ coming to the picture. So what we can do is we can utilize, Kubernetes, Docker containerization. All that stuff can be used, which will be definitely required a major development changes. So if we want to directly use the monolithic application, what we can do is we can, have the, suppose let's assume for the static content if that application is processing. So what we can use is we can redirect the user to the CloudFront. And from that, directly, we can send the request to the s 3. If not, then the API gateway, then the load balancer and, underneath all the EC two instances. And each EC two instances will have, an individual a stand alone component. And so yeah. So in order to, have a microservice service approaches, we need to, do the development changes and all individual component can be deployed into a port and port means in EKS. So yeah.

Well, when I'm unsure about how many issue 2 instances will be created, like, in ACS, that the task is task definition based on the task definition, the ec two instances are getting created. Suppose if we don't want to manage the underneath resources and, we don't know the load, how much how many containers will be getting created. In that case, it's better to give it to the AWS now. AWS will take care of, all the under net resources. we don't need to worry about the how many issue two instances we want, how many, resources we want for our purpose. So AWS will take care of the auto scaling and, auto scaling part, like, underground resources. If it required more, then the forget will have more and, based on the request is not that much, then again, it will scale down. So everything will be happening in the back end. We won't be managing the resources. The AWS will take care of it.

Well, Lambda, we can send the, Lambda log into the CloudWatch, and the CloudWatch is the best part to see, the log informations, unless if we are, the there is another way, like, we can use Kinesis Data Stream to, by using, fire host to send the data to the 3rd party, observability tools like Splunk, Datadog, and all that stuff. That is one way to figure it out. And we need to see, what lambda function, is whether it's a synchronous or an asynchronous. If synchronous, we know right away that, we are not getting 200 or, the success message. In asynchronous, what it happens like if, something is sending to the s three bucket and we don't wait for the response. We do get a response right away, but underneath it still, works, putting data into the s three or SQS sending queue. So whenever any, issue happens, it, sends the data to the dead letter queue and process subsequently with exponential backup. So, if those things are still not working, everything is failed, we need to see, the CloudWatch and CloudWatch logs. And, we need to, do a testing on the Lambda, like, we can have another lambda function and try to do a test on if, everything working fine or not. So, the error loops lambda usually used for the short interval of time. We don't, write a big complex program into that. So most of the, complaint which is which can be handled within 15 minute, those things would be, mentioned within those things should be handled by using Lambda, and test everything can be in a linear application itself. So yeah.