Vetted Talent

Subodh Dubey


He has more than 6 years of experience with Python, AWS services (S3, EC2, Lambda, Kinesis, DynamoDB, RDS, CloudFormation, SQS, SNS, CloudWatch, IAM, API Gateway, AWS Config, etc.), deployment services, and security practices. He also has strong experience with Salesforce Marketing Cloud (SFMC) and its API, and with DevOps tools such as Jenkins, Docker, Git, and Terraform.

He is a problem solver who can design algorithms to make use of technology to enhance human life.

  • Role

    Member of Technical Staff

  • Years of Experience

    8.25 years

Skillsets

  • Python - 6 Years
  • SQL - 6 Years
  • Algorithms
  • Bash
  • Cloud architecture
  • Container orchestration
  • Data Engineering
  • Data Pipelines
  • Data Structures
  • ETL workflows
  • GraphQL
  • Infrastructure as code
  • Legacy transformation
  • Serverless computing
  • System Design
  • Unit Testing

Vetted For

9 Skills
  • Python Cloud ETL Engineer (Remote), AI Screening
  • 73%
  • Skills assessed: SFMC, Streamlit, API, AWS, ETL, JavaScript, Python, React JS, SQL
  • Score: 66/90

Professional Summary

8.25 Years
  • Mar, 2025 - Present (1 yr 2 months)

    Member of Technical Staff - System Design Engineer

    AMD
  • Jul, 2023 - Mar, 2025 (1 yr 8 months)

    Sr. Cloud Solutions Architect

    Cloudtech
  • Apr, 2022 - Jul, 2023 (1 yr 3 months)

    Cloud Consultant

    Amazon Web Services (AWS)
  • Oct, 2017 - May, 2018 (7 months)

    Software Developer Intern

    WiseL Keycard Technologies Pvt. Ltd.
  • Jun, 2018 - Sep, 2018 (3 months)

    Software Engineering Trainee

    VANTAGE SYSTECH PRIVATE LIMITED
  • Nov, 2018 - Apr, 2022 (3 yr 5 months)

    Senior Software Engineer

    Consultadd Inc.

Applications & Tools Known

  • Git
  • Python
  • PostgreSQL
  • AWS (Amazon Web Services)

Work History

8.25 Years

Member of Technical Staff - System Design Engineer

AMD
Mar, 2025 - Present (1 yr 2 months)
    Designed and deployed a production-grade, highly available Kubernetes control plane using RKE2 across on-prem nodes, with Kube-VIP providing virtual IP failover and API HA. Installed Rancher in HA mode for centralized multi-cluster Kubernetes management, using a custom TLS certificate with DNS and private CA integration. Enabled full observability by deploying Rancher-native Monitoring (Prometheus + Grafana) and Logging (Loki + Promtail) stacks with persistent volumes via Longhorn. Designed secure ingress configuration and enforced HTTPS with proper certificate chaining and validation to prevent browser TLS warnings. Automated storage provisioning using Longhorn and implemented monitoring retention and log indexing with persistent storage for resilience. Documented architecture, implementation, and certificate management workflows for internal teams using Confluence and K9s-based tooling.

Sr. Cloud Solutions Architect

Cloudtech
Jul, 2023 - Mar, 2025 (1 yr 8 months)
    Designed and implemented a similarity search system for patient medical data to enhance diagnostic accuracy for a client in the healthcare sector. Integrated AWS HealthImaging to extract and manage metadata from medical images, enabling efficient retrieval of relevant information. Architected a semantic search solution using Amazon OpenSearch to perform similarity searches across patient data, improving diagnosis by identifying related cases based on medical records and imaging data. Designed a scalable, event-driven architecture utilizing AWS Lambda, SQS, SNS, and Step Functions for efficient event filtering and data validation across services. Implemented automated data loading and storage workflows using AWS S3, ensuring secure and cost-effective data management. Set up observability using CloudWatch metrics and dashboards. Designed and implemented serverless APIs through Lambda integrations and API Gateway (including proxy integration). Architected scalable solutions for a quiz application, including a WebSocket API in API Gateway for real-time leaderboard updates. Utilized SAM for efficient application deployment into AWS.

Cloud Consultant

Amazon Web Services (AWS)
Apr, 2022 - Jul, 2023 (1 yr 3 months)
    Developed serverless APIs using Lambda functions and API Gateway for asset management and sharing functionality. Implemented Step Functions to streamline and orchestrate Lambda workflows. Established S3 data lake and lifecycle policies for optimized asset storage and backup. Designed automated creation of multiple AWS accounts for Virtual Labs. Utilized AWS Organizations and Service Control Policies (SCPs) for robust access management. Developed a cost monitoring solution for individual users, integrated with SNS notifications.

Senior Software Engineer

Consultadd Inc.
Nov, 2018 - Apr, 2022 (3 yr 5 months)
    Architected a data migration pipeline (Snowflake to a third-party API) using AWS Lambda, API Gateway, and Step Functions. Led design approval and documentation collaboration with VPs and Managers. Deployed Lambda functions using Docker images. Employed Terraform for architecture deployment (Dev, NonProd, Prod). Conducted unit testing with the Python unittest package. Utilized DynamoDB (data backup) and S3 (Snowflake snapshots). Designed programmatic workflows with Airflow DAGs for API authentication, calls, ETL, data storage, and validation (BigQuery). Employed Pandas for JSON, SQL, and CSV data manipulation, normalization, and transformation.

Software Engineering Trainee

VANTAGE SYSTECH PRIVATE LIMITED
Jun, 2018 - Sep, 2018 (3 months)

Software Developer Intern

WiseL Keycard Technologies Pvt. Ltd.
Oct, 2017 - May, 2018 (7 months)
    Developed a PyQt-based desktop application for face detection and recognition. Implemented advanced face detection and recognition algorithms (OpenCV, Dlib). Created Python-based RESTful APIs using Flask for seamless integration. Utilized MongoDB for efficient data and image storage. Collaborated on ER diagram design and desktop application UI/UX.

Testimonial

ConsultAdd

https://www.linkedin.com/in/subodh-dubey/details/recommendations/

Major Projects

5 Projects

High Availability RKE2 Kubernetes Platform with Full Observability

Mar, 2025 - Present (1 yr 2 months)
    Designed and deployed a production-grade, highly available Kubernetes control plane using RKE2 across on-prem nodes, with Kube-VIP providing virtual IP failover and API HA. Installed Rancher in HA mode for centralized multi-cluster Kubernetes management, using a custom TLS certificate with DNS and private CA integration. Enabled full observability by deploying Rancher-native Monitoring (Prometheus + Grafana) and Logging (Loki + Promtail) stacks with persistent volumes via Longhorn. Designed secure ingress configuration and enforced HTTPS with proper certificate chaining and validation to prevent browser TLS warnings. Automated storage provisioning using Longhorn and implemented monitoring retention and log indexing with persistent storage for resilience. Documented architecture, implementation, and certificate management workflows for internal teams using Confluence and K9s-based tooling.

Similar Patient Cases Search using Amazon OpenSearch

Jul, 2023 - Mar, 2025 (1 yr 8 months)
    Designed and implemented a similarity search system for patient medical data to enhance diagnostic accuracy for a client in the healthcare sector. Integrated AWS HealthImaging to extract and manage metadata from medical images, enabling efficient retrieval of relevant information. Architected a semantic search solution using Amazon OpenSearch to perform similarity searches across patient data, improving diagnosis by identifying related cases based on medical records and imaging data.

Serverless Control Plane

Jul, 2023 - Mar, 2025 (1 yr 8 months)
    Designed a scalable, event-driven architecture utilizing AWS Lambda, SQS, SNS, and Step Functions for efficient event filtering and data validation across services. Implemented automated data loading and storage workflows using AWS S3, ensuring secure and cost-effective data management. Set up observability using CloudWatch metrics and dashboards. Established robust data validation mechanisms to maintain data integrity during transit between services, preventing invalid data from entering downstream systems. Managed infrastructure deployment with Terraform, enabling reproducible and scalable Infrastructure as Code (IaC) practices and optimizing resource management.

Serverless Web Application

Jul, 2023 - Mar, 2025 (1 yr 8 months)
    Designed and implemented serverless APIs through Lambda integrations and API Gateway (including proxy integration). Architected scalable solutions for a quiz application, including a WebSocket API in API Gateway for real-time leaderboard updates. Utilized SAM for efficient application deployment into AWS.

Apache Solr to OpenSearch Migration Tool

Jul, 2023 - Mar, 2025 (1 yr 8 months)
    Architected and built a migration tool for seamless schema and data transition from Apache Solr to Amazon OpenSearch. Facilitated XML to JSON schema conversion and Solr to OpenSearch mapping.

Education

  • B.Tech in Information Technology

    G.H. Raisoni College of Engineering, Nagpur

Certifications

  • AWS

    Amazon Web Services Training and Certification (Sep, 2023)


Interests

  • Cricket
  • Singing

AI-interview Questions & Answers

    I have a little over 5 years of experience working as a Python cloud engineer. During this time, I have mainly worked with Python frameworks such as Flask and Django, and with both SQL and NoSQL databases. On the SQL side, I have mostly worked with Oracle SQL and MySQL; on the NoSQL side, mostly with DynamoDB, and with MongoDB as well. I have experience with both clouds, AWS and GCP. I hold 5 AWS certifications: Solutions Architect Professional, Solutions Architect Associate, Developer Associate, Security Specialty, and Database Specialty. For containerization, I have worked a lot with Docker and Docker Compose, and in AWS with ECS and with ECR for storing container images. For DevOps, I have worked a lot with Git, pipelines, and CI/CD tools like Jenkins, GitHub Actions, and GitHub CI. So I have experience developing an application from scratch through to its deployment to production. Recently, I have been working mostly on design in AWS, and I have gained a lot of experience with cloud-agnostic solutions in AWS using a serverless stack. I have also worked a lot with CDK, CloudFormation, and Terraform for deploying the whole infrastructure to the cloud environment.

    To ensure atomic transactions in a Python script executing SQL operations, we first need to make sure the script runs exactly once. It should not be duplicated, or else it may affect the operations in the background, so we can add some conditional checks before executing anything. Atomicity itself comes from the capabilities of the database management system and the Python libraries used to interact with it, such as SQLAlchemy, an ORM that works with SQLite, SQL Server, and MySQL. A database transaction lets us group multiple SQL operations into a single unit that is applied as a whole: we begin a transaction, run our SQL statements, and then commit, and if any statement fails we roll back so that no partial changes remain. This is how we can ensure atomic transactions in a Python script.
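A minimal sketch of the begin/commit/rollback pattern described above, using the standard-library `sqlite3` module rather than SQLAlchemy so that it runs without extra dependencies. The `accounts` table and the `transfer` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one atomic unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement inside it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes both UPDATE statements.
            raise ValueError("insufficient funds")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)  # fails and is rolled back as a whole
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed second transfer, neither account reflects a partial update, which is the property the answer is describing.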

    There are multiple ways to use Python to automate the deployment of AWS infrastructure for ETL purposes. One good way is to use CDK directly, because CDK creates CloudFormation templates in the background. The process is to write CDK files, which act as infrastructure as code for us. Let's say I want to develop a serverless stack that includes a Lambda function. I import the Lambda construct in a Python file and say, "Create a Lambda function with Python as the language, 128 MB of memory, and a time limit of 20 seconds." I can write these things inside a Python file, along with all the other infrastructure needed for the ETL process, and then deploy it with one command, cdk deploy. Another way is to use Terraform or CloudFormation directly, where we write the Terraform configuration or the CloudFormation YAML templates and then deploy them. Those files should describe the ETL infrastructure, such as S3 buckets and IAM roles. If we are using an EC2 instance for transformation, that needs to be included too, and for compute services, if we are using a Lambda function, we need to include that as well as the Step Functions state machine. All of these have to be defined in a Terraform file, a CloudFormation template, or a CDK app. A further option is to use the AWS SDK for Python, Boto3, to create resources such as Step Functions state machines and DynamoDB tables directly. All of these options can be used to automate the deployment of AWS infrastructure for ETL purposes.
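As a rough illustration of the infrastructure-as-code idea, the sketch below builds a small CloudFormation template as a Python dictionary and serializes it to JSON. It does not use CDK or Boto3, so it runs with the standard library only. The logical IDs, the artifact bucket and key, the handler name, and the runtime string are all assumptions made for the example; 128 MB is Lambda's minimum memory setting.

```python
import json


def build_etl_template(memory_mb=128, timeout_s=20):
    """Return a CloudFormation template for a tiny ETL stack (illustrative only)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch of an ETL stack: bucket, role, transform function",
        "Resources": {
            # Landing bucket for raw data.
            "RawDataBucket": {"Type": "AWS::S3::Bucket"},
            # Execution role that the Lambda service is allowed to assume.
            "TransformRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Principal": {"Service": "lambda.amazonaws.com"},
                                "Action": "sts:AssumeRole",
                            }
                        ],
                    }
                },
            },
            # Transform step; the code location below is a placeholder.
            "TransformFunction": {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Runtime": "python3.12",
                    "Handler": "handler.main",
                    "MemorySize": memory_mb,
                    "Timeout": timeout_s,
                    "Role": {"Fn::GetAtt": ["TransformRole", "Arn"]},
                    "Code": {
                        "S3Bucket": "example-artifacts-bucket",
                        "S3Key": "etl/transform.zip",
                    },
                },
            },
        },
    }


template_json = json.dumps(build_etl_template(), indent=2)
```

The resulting JSON could then be handed to CloudFormation for deployment; that step is left out here because it needs AWS credentials.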

    To automate the orchestration of multiple ETL jobs in AWS while ensuring data consistency, there are multiple options. The first is to use AWS Glue directly, which is a built-in ETL service. Another option is to develop custom pipelines. The third is managed Apache Airflow on AWS, which is also a good ETL tool: there we define directed acyclic graphs (DAGs) made up of tasks that perform the ETL operations. In every case, we start by defining the ETL tasks and their sequence. With custom pipelines, this can be done with Step Functions, where each task is a Lambda function. For intermediate data, such as backups or archived data, we can use S3 buckets, and these can also be steps inside the state machine. With AWS Glue, which provides extraction, transformation, and loading out of the box, we define a Glue job for each ETL task, specifying the source data, how it should be transformed, and the target destination. Airflow follows the same steps, with each task defined as a node in the graph. It also gives us a way to replay data, so that if a step fails we can look at the logs, troubleshoot it, and rerun that step instead of rejecting it outright. This is how it can be achieved.
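The DAG idea can be sketched without Airflow or Step Functions. The example below is not the API of either tool; it is a small stand-in that uses the standard-library `graphlib` module to order tasks by their dependencies and retries a failed task before giving up. The `run_pipeline` helper and the three task functions are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_attempts=2):
    """Run callables in dependency order, retrying a failed task before giving up."""
    order = list(TopologicalSorter(dependencies).static_order())
    log = []
    for name in order:
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                log.append((name, attempt, "ok"))
                break
            except Exception as exc:
                log.append((name, attempt, f"failed: {exc}"))
        else:
            # Every attempt failed, so stop before downstream tasks run.
            raise RuntimeError(f"task {name!r} failed after {max_attempts} attempts")
    return log


store = {}
calls = {"transform": 0}


def extract():
    store["raw"] = [1, 2, 3]


def transform():
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise ConnectionError("transient failure")  # simulated flaky step
    store["clean"] = [x * 10 for x in store["raw"]]


def load():
    store["loaded"] = sum(store["clean"])


log = run_pipeline(
    {"extract": extract, "transform": transform, "load": load},
    # Each key depends on the tasks in its set.
    {"transform": {"extract"}, "load": {"transform"}},
)
```

The log records the failed first attempt of the transform step and its successful retry, which mirrors the "inspect, troubleshoot, rerun" workflow described in the answer.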

    To troubleshoot an unsuccessful API data integration in a Python ETL process, the first step is to check the logs and see what error we are getting. If the error comes from our own code, we will already have specified error codes, such as 401 for failed authentication and 500 for a server error, so we can review the logs and see where the problem actually comes from. If we have integrated CloudFormation, I'll look at those logs first and check the specific error code descriptions and the stack status to see what went wrong. Second, I would read the API documentation to see how the API is defined: whether I'm calling the correct endpoints, whether I'm passing the correct query string parameters, and, for a POST request, whether the body is in the expected format. Third, I would check API access. If I'm calling an API that I don't have specific access to, such as read access, I can check whether I'm authenticated and authorized for it from the logs or the error codes. I can also check where the API is defined and which authentication method it uses: an API key, an access token, or credentials. I can then check API connectivity with a tool like Postman and verify whether the requests succeed, which also lets us test in isolation before going to the production environment. Finally, I'll check the response data and how it is arriving. For example, if the API sends JSON and I try to read it as plain text, that can also create a problem. These are the things I would check to find out what is causing an unsuccessful API call.
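The checks listed above can be collected into a small triage helper. This is a sketch under assumptions: `diagnose_response` is a hypothetical function, it takes the status code, content type, and body as plain values so that it works with any HTTP client, and the hints are only a starting point for investigation.

```python
import json


def diagnose_response(status, content_type, body):
    """Return a short hint about why an API call may have failed."""
    if status == 400:
        return "bad request: check query parameters and the request body format"
    if status == 401:
        return "not authenticated: check the API key, token, or credentials"
    if status == 403:
        return "authenticated but not authorized: check access permissions"
    if status == 404:
        return "not found: check the endpoint path against the API documentation"
    if status >= 500:
        return "server error: inspect the service logs and retry later"
    if "application/json" not in content_type:
        return f"unexpected content type {content_type!r}: response is not JSON"
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return "body is not valid JSON despite the JSON content type"
    return "ok"


hints = [
    diagnose_response(401, "application/json", "{}"),
    diagnose_response(200, "text/plain", "hello"),
    diagnose_response(200, "application/json", '{"rows": 3}'),
]
```

Logging the returned hint next to the raw status code keeps the first pass over the logs short.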

    To handle exceptions in a Python script for data transformation, I would start by writing a try and except block, because that lets me check for the specific errors that might come from the script. I would write the try block first, and inside it put the code that might raise errors. Then I would catch the specific error in the except block and handle it accordingly. I'd also look for potential failure points, meaning the specific areas in the data transformation code where exceptions might occur: I/O operations, data parsing, database interaction, API calls, or custom data processing logic. I'll wrap each of these in a try-except block and, as I said, catch only the specific exceptions. For example, if I'm doing an arithmetic operation while transforming data, I would catch the arithmetic error in the except block. This is how we check for specific errors and handle exceptions properly in Python. I'll also log those errors properly and have a proper fallback mechanism that defines what the next step should be if the error occurs.
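A minimal sketch of this approach: catch only the specific exceptions expected at each failure point, log them, and fall back to a default. The `safe_ratio` helper and the record fields `total` and `count` are hypothetical.

```python
import logging

logger = logging.getLogger("etl.transform")


def safe_ratio(record, default=None):
    """Compute record['total'] / record['count'], falling back to `default`."""
    try:
        return float(record["total"]) / float(record["count"])
    except KeyError as exc:
        logger.warning("missing field %s in %r", exc, record)
    except (TypeError, ValueError) as exc:
        logger.warning("unparseable value in %r: %s", record, exc)
    except ArithmeticError as exc:  # includes ZeroDivisionError
        logger.warning("arithmetic problem in %r: %s", record, exc)
    # Fallback path: the caller decides what a missing ratio means.
    return default


rows = [
    {"total": "10", "count": "4"},
    {"total": "7"},                # missing field
    {"total": "x", "count": "2"},  # bad value
    {"total": "5", "count": "0"},  # division by zero
]
ratios = [safe_ratio(row) for row in rows]
```

Keeping the except clauses narrow means an unexpected error type still surfaces instead of being silently swallowed.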

    Here, the caller uses pandas to read CSV data from large dataset dot CSV and then passes it to the optimized query function. Inside the function, he's attempting to optimize the query: he filters the data frame for rows where column a is greater than 100 and column b is smaller than 200, and then copies the result into the result variable. I can see some improper column access here: the optimized query function attempts to access columns of the data frame using incorrect syntax. So let's say, data frame a instead of data frame 1 is being used here. The other issue is that the data frame is copied unnecessarily. The copy call is a performance bottleneck: it copies all the matching rows, which is costly and introduces overhead when we are dealing with a large dataset, and depending on the size of the data frame it can also cause memory overhead. Applying the query condition directly might also create an intermediate Boolean mask that consumes additional memory. A better way is to filter the data frame directly without the copy operation: data frame column a greater than 100 and data frame column b smaller than 200, with no call to the copy function. This way we avoid the unnecessary copy operation and remove the performance bottleneck.
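The snippet under discussion is not reproduced in the transcript, so the following is only a hypothetical reconstruction of the filter being described, assuming columns named `a` and `b`. It shows the bracket-style column access and the combined Boolean mask without a trailing copy call. It requires pandas.

```python
import pandas as pd


def filter_rows(df):
    # Each comparison needs its own parentheses because `&` binds
    # more tightly than `>` and `<`.
    mask = (df["a"] > 100) & (df["b"] < 200)
    # Boolean indexing already returns a new DataFrame, so an extra
    # .copy() would duplicate the selected rows a second time.
    return df.loc[mask]


df = pd.DataFrame({"a": [50, 150, 300], "b": [10, 150, 250]})
result = filter_rows(df)
```

If the filtered frame is going to be modified afterwards, an explicit copy can still be worthwhile to avoid chained-assignment warnings; for read-only use it is usually unnecessary.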

    To ensure efficient data manipulation of large data frames in Python, the first thing to consider is that, with this much data, we should filter data and not copy it unnecessarily. The operations we run against the database server should also be minimal, because doing something for every row is costly on a large dataset. We can use pandas chunking here: we define a chunk size while reading a file, say a CSV file or a SQL query result, and then process those chunks one at a time. We can also select only the columns we need. If we want specific columns from the database, we can specify them in the query itself. The alternative is to pull out the whole dataset and filter it afterwards inside pandas, but this can be avoided by fetching only the data that is required, so that we don't need to filter it later. This also saves a lot of memory when we fetch data from the database. We can then optimize column types, such as integers, floats, and categories, to reduce memory usage. For numeric data, we can convert a whole column at once with pd.to_numeric instead of converting row by row. We can also use the sparse data structures that pandas provides, and categorical data through pd.Categorical. While performing operations, we can use parallel processing, sending the calls to the database concurrently, which fetches the data in less time than doing it sequentially. We can use proper group by and pivot table functions, which pandas data frames provide. We can also avoid row-by-row iteration, as I said, by using parallel processing, and optimize memory while we are transforming the data.
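A small sketch of three of the techniques mentioned: chunked reading, selecting only the needed columns, and compact column types. The CSV content and the `total_by_region` helper are made up for the example, and an in-memory buffer stands in for a large file. It requires pandas.

```python
import io

import pandas as pd

CSV_TEXT = (
    "id,region,amount,notes\n"
    "1,east,10.5,x\n"
    "2,west,20.0,y\n"
    "3,east,4.5,z\n"
    "4,west,1.0,w\n"
    "5,east,2.0,v\n"
)


def total_by_region(source, chunksize=2):
    """Sum `amount` per region while holding only one chunk in memory."""
    totals = {}
    reader = pd.read_csv(
        source,
        usecols=["region", "amount"],  # skip columns we never use
        dtype={"region": "category", "amount": "float32"},  # compact types
        chunksize=chunksize,  # yields DataFrames of at most this many rows
    )
    for chunk in reader:
        sums = chunk.groupby("region", observed=True)["amount"].sum()
        for region, value in sums.items():
            totals[region] = totals.get(region, 0.0) + float(value)
    return totals


totals = total_by_region(io.StringIO(CSV_TEXT))
```

The per-chunk aggregation is vectorized inside pandas, so the only Python-level loops are over chunks and over the handful of group results.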

    Architecting a cloud-based ETL solution that is resilient to data schema changes over time is a complex thing to do because, since the schema keeps changing, we cannot rely on anything static. First, we need a modular architecture that separates each stage of the pipeline (extraction, transformation, and loading) into discrete components. This allows for easier maintenance and updates when the data schema changes. Second, I would add schema evolution handling: a mechanism that handles schema changes gracefully, using techniques such as schema inference, schema-on-read, and schema versioning to adapt to changes in data structures without requiring immediate modifications to the ETL pipeline. Third, I would do metadata management. Depending on the incoming data, we need to maintain the data schema and update it dynamically, so the mapping between the changing schemas has to be stored and managed properly. For example, if a column was previously an integer and is now a float, that mapping needs to be recorded somewhere. I would also add data validation and quality checks, because the data is changing frequently and must be validated before it reaches the production environment. The data transformation also needs to be flexible rather than fixed, because the schema itself is flexible. For this, we can use tools such as Apache Spark and AWS Lake Formation, with dynamic schema resolution during transformation. With custom pipelines, we have to handle this ourselves, using Lambda functions in between that deal with the schema changes, but with built-in ETL tools like Apache Spark and Glue, dynamic schema resolution can handle it. We can also use versioned data stores to reduce the overhead of handling schema changes.
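For the custom-pipeline case, schema versioning and the stored mapping between schemas can be sketched in plain Python. Everything here is hypothetical: the version numbers, the field names, the `RENAMES` mapping, and the `normalize` helper. The integer-to-float change mirrors the example given in the answer.

```python
# Mapping from each source schema version to the current field names.
RENAMES = {
    1: {},                 # version 1 already uses the current names
    2: {"amt": "amount"},  # version 2 renamed the column upstream
}

# Target types for the current schema; `amount` moved from int to float.
TARGET_TYPES = {"id": int, "amount": float, "currency": str}

# Defaults for fields that older versions did not carry.
DEFAULTS = {"currency": "USD"}


def normalize(record, version):
    """Convert a record written under `version` to the current schema."""
    renamed = {RENAMES[version].get(key, key): value for key, value in record.items()}
    out = {}
    for field, caster in TARGET_TYPES.items():
        if field in renamed:
            out[field] = caster(renamed[field])
        elif field in DEFAULTS:
            out[field] = DEFAULTS[field]
        else:
            # Validation step: refuse records that cannot be completed.
            raise ValueError(f"record is missing required field {field!r}")
    return out


old_style = normalize({"id": "7", "amount": 12}, version=1)
new_style = normalize({"id": 8, "amt": "3.5", "currency": "EUR"}, version=2)
```

Keeping the mappings as data rather than code means a new schema version only needs a new entry, not a change to the transformation logic.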

    I would use AWS Cognito to secure a Python ETL system. When I am dealing with APIs for ETL operations, those APIs can be authenticated using Cognito, and I can define who has access to them. Cognito has multiple options when it comes to authentication: it supports SAML and OAuth, and it can also handle authorization. The first thing I'll do is authenticate access to the data source. If the ETL system needs data access that requires authentication, we can use built-in AWS services like Cognito to authenticate the ETL system before it accesses the data source. The second use is authentication for API access. If the ETL system exposes APIs for data ingestion and extraction, we can use Cognito to authorize access to these APIs, which ensures that only authenticated users can reach them. We can also define users and data consumers in our Cognito environment and manage their access with group access policies. Policies can be attached directly to specific groups of users or to specific endpoints if we are using API Gateway. This integrates easily with other AWS services like Lambda, S3, and DynamoDB. We can also use Cognito tools on-site. So the steps I would take are: create a Cognito user pool in the AWS Management Console and define the user attributes and password policies there, integrate authentication through SAML or OAuth as I said, and then attach Cognito to either the API or the data source that we are using.

    To create data visualizations in a cloud environment, we can use Streamlit. It is a platform that can analyze data and create specific graphs or visualizations based on the incoming data. First, we need to make sure our cloud environment can host the Streamlit app. We can deploy it in containers with Docker, or on a serverless container platform like Fargate. Alternatively, we can install Streamlit in another environment and visualize data pulled from the cloud. Installation goes through the Python package manager, pip, by running pip install streamlit. The easiest way is to use an EC2 instance and run Streamlit there. Another option is to install it in a Lambda layer, which can act as a package container for all Lambda functions, and use it from there through an API call: if we send some data to that API, we get the visualization back in the response. We can also develop interactive visualizations using Streamlit. The main thing to decide is whether to deploy on virtual machines, in a containerized environment, or on a serverless stack.