Vetted Talent

Rajat Agrawal

Dynamic Python Back-end Developer with a proven track record in designing and implementing complex software solutions across diverse industries. Demonstrated expertise in deploying applications on Azure AKS Clusters within internal cloud infrastructures and hands-on experience in AI and Gen AI projects, particularly with LLM models and training data. Proficient in AWS services such as EC2, S3, RDS, and Lambda, along with strong database management skills using SQL, MySQL, and PostgreSQL. Familiar with Elasticsearch for search functionalities and adept at utilizing Pandas and NumPy for data analysis. Experienced in CI/CD practices with tools like Jenkins, and skilled in task automation and workflow management. Knowledgeable in Redis and MongoDB for data caching and storage solutions, and proficient in executing ETL processes. Capable of developing web applications with Django and the AWS SDK, while exhibiting strong debugging skills in Python. Excellent communicator with the ability to collaborate effectively within teams and deliver high-quality results under tight deadlines.

Role
Technical Lead
Years of Experience
7 years
Professional Portfolio
View here

Skillsets

Kubernetes
PySpark
MongoDB
Elasticsearch
Data Engineering
Data Analysis
AWS
SQL
Python
Jenkins - 4 Years
System Design
Redis
Python - 6 Years
ETL - 5 Years
Debugging
AWS - 6 Years
SQL - 6 Years
Kubernetes - 3 Years
Django - 3 Years
NumPy - 2 Years
CI/CD - 6 Years
pandas - 2 Years

Vetted For

9 Skills

Roles & Skills
Results
Details

Python Cloud ETL Engineer (Remote) - AI Screening
64%

Skills assessed: SFMC, Streamlit, API, AWS, ETL, JavaScript, Python, React Js, SQL
Score: 58/90

Professional Summary

7 Years

Jun, 2023 - Present (2 yr 11 months)
Technical Lead
Apptad
Sep, 2021 - Apr, 2023 (1 yr 7 months)
Software Engineer
Gartner
Jan, 2021 - Sep, 2021 (8 months)
Business Technology Solutions Associate
ZS
Jan, 2019 - Dec, 2020 (1 yr 11 months)
Junior Associate
Daffodil Software

Applications & Tools Known

Splunk
Kubernetes
Git
Jenkins
Linux
Django ORM
ELK Stack
ETS
API Gateway
CloudWatch
WAF
MySQL
PostgreSQL
Elasticsearch
Pandas
NumPy
Redis
MongoDB
ETL
Django
AWS SDK
AKS
Azure
Snowflake
NFS
Falcon
RabbitMQ

Work History

7 Years

Technical Lead

Apptad

Jun, 2023 - Present (2 yr 11 months)

Deployed an ETL pipeline to load data from Right Data servers into Snowflake databases, reducing data processing delays by 25%. Deployed SaaS microservices on Kubernetes, improving system reliability by 30%. Built and maintained pipelines with Azure services, achieving 99.9% uptime. Automated ETL processes, saving 10 hours per week on manual tasks. Deployed data-intensive APIs in a Flask microservice, leveraging Snowflake for large datasets.

Software Engineer

Gartner

Sep, 2021 - Apr, 2023 (1 yr 7 months)

Developed Python-based engineering solutions, enhancing data pipeline throughput by 20%. Leveraged AWS tools like Lambda, API Gateway, and DynamoDB for scalable applications. Integrated API Gateway with third-party systems, reducing integration failures by 15%. Delivered clean, aggregated data pipelines for business use, cutting errors by 40%.

Business Technology Solutions Associate

ZS

Jan, 2021 - Sep, 2021 (8 months)

Designed data pipelines using Elasticsearch, ensuring 99% reliability for downstream consumption. Collaborated on cross-functional projects, increasing delivery speed by 15%. Improved efficiency and performance of event-driven systems, reducing latency by 10%.

Junior Associate

Daffodil Software

Jan, 2019 - Dec, 2020 (1 yr 11 months)

Developed scalable Python backends using Django and Falcon, improving API response times by 25%. Implemented database integrations with SQL and NoSQL systems, ensuring data integrity. Optimized backend solutions, reducing system downtime by 30%.

Achievements

HackerRank Problem Solving Certificate (Sep 20 - Present)
HackerRank Python 5 Star Achieved (Jul 20 - Present)
Machine Learning and Python by Ducat India (Mar 18 - Present): the workshop focused on the basics of Python and its libraries, along with a brief introduction to machine learning

Major Projects

2 Projects

Data Catalog and Quality Processing System

Improved microservice architecture, increasing processing efficiency by 20%.

Data Workstream Processing

Integrated Python and Java-based microservices, streamlining operations across teams.

Education

PGDBM in IT
NMIMS (2022)
B.Tech in Computer Science Engineering
ABES Engineering College (2019)
Intermediate
Wisdom Public School (2015)

Certifications

HackerRank Problem Solving Certificate (Sep 20 - Present)
HackerRank Python 5 Star Achieved (Jul 20 - Present)
Machine Learning and Python by Ducat India (Mar 18 - Present)
Blood donor and volunteer certificate

AI-interview Questions & Answers

So my background is mostly in Python. I'm currently working as a Python developer, and I have worked with AWS and Azure as cloud technologies and with SQL and MySQL as databases, on both on-premises DB servers and cloud services. I've worked on REST APIs also, and on Python frameworks. This is a little bit about me. I have around 6 years of experience and have worked with multiple companies, clients, and projects. I have hands-on experience with Pandas, NumPy, and other tools for data transformation.

So how would you monitor and log a Python ETL process that interfaces with AWS services and a SQL DB to ensure reliability? To ensure the reliability of a Python ETL process that interfaces with AWS services and a SQL DB, we'll use a monitoring service like AWS CloudWatch, together with CloudWatch Events, to monitor the incoming logs of the scripts. Then we'll create CloudWatch rules to apply logic to those logs. For instance, if there's any debug or error message, that would create an alarm or a notification, so that whatever needs to be done after the issue gets done. There's also CloudTrail, but I don't think it's necessary here, because it covers activity at the level of the AWS service, not the internal logs of the application that's running. I think CloudWatch and CloudWatch Events should be the way forward, where we create the rules and set the alerts based on the logs coming from the application.
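
A minimal sketch of the application-side logging this answer relies on, assuming the script's log output is shipped to CloudWatch Logs by the runtime. The `JsonFormatter` class, the `build_etl_logger` helper, and the field names are illustrative and not from the interview. Writing one JSON object per line is one way to make filters and alarms on the `level` field simpler to define.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback so a failure can be diagnosed from the log alone.
            payload["traceback"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_etl_logger(name="etl"):
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

An alarm could then be keyed on records whose `level` is `ERROR`, which is the kind of rule the answer describes.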

I think, first, this should be done with incremental data loads based on timestamps, where you provide date and time ranges and load your data in increments accordingly. And to accommodate growing data volumes, make sure your storage is flexible and can scale. Let's say it's a data lake in the form of S3, then you can scale it to your needs, setting limits within which it can auto-scale. Second, if it's a DB, then for ETL, Redshift is a solution for storing terabytes or petabytes of data. So that's the infrastructure in terms of databases, where you can use cloud solutions. In terms of scripting and code, I'd say that initially, when the data volume is limited, you can use AWS Lambda. But if the incoming data grows and Lambda's limits are exceeded, then I think you should use AWS Glue to run ETL jobs and batches. AWS Glue is a widely used service for running ETL jobs, tracking data, and applying transformations to it. Thank you.

What would you use in Python to ensure that a sequence of SQL operations adheres to ACID properties? There is one library, I think, wherein you implement the changes in Python in the form of transactions. What happens is that you perform a sequence of code, or a sequence of functions, in an atomic manner, wherein either they are applied completely or they are rolled back. Let's say you have 10 different SQL queries to run, doing inserts, updates, and so on, and five of them run correctly and then the sixth one gives an error. Your DB would then be in an inconsistent state. To avoid that, we have a library in Python which helps you run these 10 steps either completely or with a complete rollback. So there is an atomic transaction library which ensures that your SQL operations are either applied completely or rolled back completely if they hit a failure at any point, so that your DB is always in a consistent state. One more thing we can add is that we can set up our DB connection in such a way that, when we are executing a sequence of SQL commands, even if one of them gives an error, we abort the whole transaction, so that our DB is always in a consistent state.
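
The all-or-nothing behaviour described here can be sketched with the standard-library `sqlite3` module, which is a stand-in chosen for illustration since the answer does not name a specific library. The `accounts` table and the `run_atomically` helper are hypothetical. A `sqlite3` connection used as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def run_atomically(conn, statements):
    """Execute all (sql, params) pairs in one transaction, or none of them."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back if it raises.
        with conn:
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

ok = run_atomically(conn, [
    ("UPDATE accounts SET balance = balance - ? WHERE id = ?", (30, 1)),
    # Violates the NOT NULL constraint, so the first update is rolled back too.
    ("UPDATE accounts SET balance = ? WHERE id = ?", (None, 2)),
])
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
```

After the failed second statement, `balances` still holds the original values, which is the consistent state the answer is after.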

So we can propose a method for doing incremental data loads through a Python API to minimize resource usage. The first approach applies if this is time series data. If the data has a date and time, then we can compare and update only the delta, the change in the data. Let's say we are doing an incremental data load and we already have data till yesterday. What we want to do is compare the existing data with the new data, and then upload only the data from yesterday to today. If this is time series data, we can write a query that filters out the records which are already there and inserts only the new data. That way we improve resource consumption, because there won't be duplicate data and we insert only the newer data that we need. That is one way to implement incremental data loads. To do the comparison, we connect to our DB from the Python script, run the SQL command, get the data, and compare the source and target DBs, where the source has the newer data and the target has the data only till yesterday. We get the data out of those two systems and then upload only the delta.
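
A minimal sketch of the timestamp-based delta load described above, using two in-memory `sqlite3` databases as stand-ins for the source and target systems. The `events` table, its columns, and the `incremental_load` helper are assumptions for illustration. Timestamps are stored as ISO-8601 strings, which sort correctly as text.

```python
import sqlite3


def incremental_load(source, target):
    """Copy only rows newer than the latest timestamp already in the target."""
    # The high-water mark is the newest timestamp the target has seen so far.
    (watermark,) = target.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM events"
    ).fetchone()
    new_rows = source.execute(
        "SELECT id, updated_at, payload FROM events "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with target:  # insert the delta in a single transaction
        target.executemany("INSERT INTO events VALUES (?, ?, ?)", new_rows)
    return len(new_rows)


schema = "CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT, payload TEXT)"
source, target = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(schema)
source.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, "2024-01-01T00:00:00", "a"),
    (2, "2024-01-02T00:00:00", "b"),
])
source.commit()
first = incremental_load(source, target)   # target is empty, so both rows are copied
source.execute("INSERT INTO events VALUES (3, '2024-01-03T00:00:00', 'c')")
source.commit()
second = incremental_load(source, target)  # only the new row is copied
```

One limitation of this sketch: a row that arrives late with a timestamp equal to or older than the watermark would be skipped, so a real pipeline may need an overlap window or a change-tracking column.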

In Python, how would you handle a transaction rollback if the ETL process fails at any point? Yeah, I think the SQL libraries for the DB support a functionality that lets us execute our Python code in an atomic manner. What it does is take the piece of code which is going to execute in sequence and put those parts in a transaction.atomic block, so that anything executed within that block is either applied completely or not applied at all. What that means is that either all 10 or 15 SQL statements run successfully or, if any one of them fails, the others are rolled back. So this is how Python supports it. You write your piece of code in that transaction.atomic block and write your queries inside it. If the ETL process fails in between, then all of the previous SQL statements will also be rolled back. So I think that is completely straightforward.

From sales, where the revenue is greater than the previous one. Right. And how would you debug it? So, from sales, the revenue has to be greater than the revenue of the previous one, and the order by month is clear. I think in cases where we need to compare each revenue with the previous one as we step forward, the way to do it is to use an inner query or a nested query instead of using LAG directly. And since we are doing an order by, one point of failure would be that no ascending or descending direction is given, so it won't know how it should compare against the previous month only. So when you do an order by month, it should be by month in a descending manner, wherein the last one or second-last one will be the default one. And I think an inner query should be one way to tackle this problem. From your dataset, when you do a select star, you first order the whole data by month, and only then compare the previous one to the current one. There is one case where it might fail. Let's say we have only the month and not the year. The month of January might be there for 2022, 2023, and 2024, so ordering only by month is one point where it could fail, and the result might be inconsistent. So I think it would be better to order by both year and month. That's all, I think.
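
The query under discussion is not shown in the transcript, so the following is only a sketch under assumptions: a hypothetical `sales(year, month, revenue)` table, run through `sqlite3` (window functions need SQLite 3.25 or later). It combines the two points from the answer: `LAG` is computed in an inner query, because a window function cannot appear directly in a `WHERE` clause, and the window is ordered by year and then month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, month INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2022, 12, 100), (2023, 1, 120), (2023, 2, 90), (2023, 3, 150),
])

# LAG is computed in the inner query, ordered by year and then month, so
# January 2023 is compared with December 2022. The outer query then filters
# on the computed column.
growth_months = conn.execute("""
    SELECT year, month, revenue
    FROM (
        SELECT year, month, revenue,
               LAG(revenue) OVER (ORDER BY year, month) AS prev_revenue
        FROM sales
    )
    WHERE revenue > prev_revenue
    ORDER BY year, month
""").fetchall()
```

The first row has no previous month, so its `prev_revenue` is NULL and the comparison excludes it.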

This code was supposed to extract JSON data from an API, transform it, and load it. Identify the issue. What is it? Okay. So with requests, we do a get, and the data is coming from .json(). Then transform data: item for item in the API response. The items are a JSON array, I guess. What we are doing is creating a dict for each row, that is, for each item, so we've created a list of dictionaries. Then we are calling pd.DataFrame.from_dict on the transformed data. Honestly, at first look it seems okay, but I think the thing to check is the quantity and price. So the first thing is that we are passing a list of dicts, and calling the from_dict function on a list of dicts should be one point of failure. The second thing is that for any attribute whose value is not present in the item JSON, it might give you NaN or inconsistent values. That could be a second issue, because when we take the product of quantity and price and one of them is None, it is going to give you an error. One more thing I'm observing: I have seen in pandas that we can pass a list directly to the pd.DataFrame function. We don't have to call from_dict specifically on a list. So what we can do is call pd.DataFrame and pass this list directly. It would also be better to pass a list of column headers, so that the columns come out in the exact sequence we need and the data is returned in that specific format. So these are the few observations I have on this code. Thank you.
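
The code being reviewed is not included in the transcript. As a sketch of the missing-value concern raised above, the hypothetical `transform` function below builds one flat row per item and only multiplies `quantity` by `price` when both are present. The field names are assumptions. The transform step itself needs no third-party packages.

```python
def transform(api_items):
    """Build one flat row per item, tolerating a missing quantity or price."""
    rows = []
    for item in api_items:
        quantity = item.get("quantity")
        price = item.get("price")
        # Only multiply when both values are present; None * number raises TypeError.
        if quantity is not None and price is not None:
            total = quantity * price
        else:
            total = None
        rows.append({"id": item.get("id"), "quantity": quantity,
                     "price": price, "total": total})
    return rows


rows = transform([
    {"id": 1, "quantity": 2, "price": 5.0},
    {"id": 2, "quantity": None, "price": 3.0},  # null in the API response
    {"id": 3, "price": 4.0},                    # key missing entirely
])
# With pandas installed, the list of dicts can be passed straight to the
# constructor with an explicit column order:
#   pd.DataFrame(rows, columns=["id", "quantity", "price", "total"])
```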

What would be a secure method to manage sensitive information, such as API keys? Yeah. So I have two or three approaches here. The first is to encrypt it with a particular algorithm if you're hard-coding it into your config files. So if you make it part of the code, make it encrypted with a particular algorithm and then put that encrypted value into the code. The second is to use a separate config file, place that config file on another server, and read the configuration from your code on that server. The third is to store your credentials directly in a vault. Secrets Manager is there on AWS, and a third-party service called DCPS is there, which stores your passwords, secrets, API keys, and things like that. It is always recommended to keep these things out of the code. And there are definitely pre-built functions provided by the libraries of those services to just call them and get your exact values. That makes it more secure and robust, with less chance of exposing sensitive data. So, these two or three things: either make it encrypted, or store the config file somewhere outside your current code, or store it in a secret vault. That last one would be the most recommended approach.
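
A sketch of the vault option, assuming `boto3` is installed and the secret is stored in AWS Secrets Manager as a JSON string. The secret name `prod/etl/db`, the `load_credentials` helper, and the stub client are hypothetical. The stub exists only so the function can be exercised locally without AWS access.

```python
import json


def load_credentials(secret_id, client=None):
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Stand-in for the boto3 client, for local runs without AWS access."""

    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps({"user": "etl", "password": "example-only"})}


creds = load_credentials("prod/etl/db", client=StubSecretsClient())
```

In a deployed job the `client` argument would be left out, so the credentials never appear in the code or its config files.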

I have no hands-on experience with the Salesforce app as such, but I think one good approach would be to implement try/except in Python around all the possible causes. The second thing is to capture a traceback of the error, so that we return, print, or log everything about the scenario in which it is failing. The third is to create alert and notification mechanisms that report what exactly is happening. And the fourth, in order to troubleshoot, is to create a subset of the data. Let's say the data coming into the pipeline is incorrect. Then we should also have a log of the input data for each execution, so that we can see exactly which data came from the source on which run, and for which particular subset of data the execution failed. That way we save time and can quickly debug the issue on the data side. Also, don't forget to write try/except statements wherever failures are more likely, so that you don't have to go through the whole code to identify the cause of the issue. One more thing: if you can set up your code locally, it would be good to run a debugger on your local machine on that particular dataset, so that you can step through the code and identify on which step, on which condition, or in which piece of code it is failing.
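
The combination of try/except, traceback logging, and keeping the failing inputs can be sketched as follows. This is a generic illustration, not tied to Salesforce. The `process_records` helper and the sample records are hypothetical.

```python
import logging
import traceback

logger = logging.getLogger("etl.debug")


def process_records(records, transform):
    """Apply transform to each record; collect failures instead of aborting."""
    succeeded, failed = [], []
    for index, record in enumerate(records):
        try:
            succeeded.append(transform(record))
        except Exception as exc:
            # Keep the offending input next to the traceback so the failing
            # subset can be replayed locally under a debugger.
            logger.error("record %d failed: %r", index, record, exc_info=True)
            failed.append({
                "index": index,
                "record": record,
                "error": repr(exc),
                "traceback": traceback.format_exc(),
            })
    return succeeded, failed


good, bad = process_records(
    [{"amount": "10"}, {"amount": "n/a"}, {}],
    lambda r: int(r["amount"]),
)
```

Here `bad` holds the two inputs that failed, which gives the small reproducible dataset the answer recommends debugging against.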

Is it the same question? What are some key considerations when integrating the Salesforce Marketing Cloud API with a Python ETL? First, go through the Salesforce docs and follow their recommended security approaches and standards. Second, while connecting to Salesforce, don't leave your network connection open. Always close it once your task is completed. Third, go through their documentation to see what libraries and functions they have, so you don't have to rewrite them, and use their prebuilt functions in your code as much as possible. Fourth, apply the same third-party logging, like AWS CloudWatch logs or Splunk logs, to store everything that's happening in your application, so you have a track of things. Fifth, log the intermediate files. Let's say you're performing a three-step process. You should keep the intermediate files that show how the input file changes into the step 1 output, and so on. You should have those datasets so that, if anything fails or any output is not correct, you know up to which point your input or intermediate output was correct, at which exact step the output went wrong, and where exactly you need to correct it. So, I think that should be the way to go forward. Thank you.
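
The fifth point, keeping each step's intermediate output, can be sketched generically. The step names, sample rows, and the `run_pipeline` helper are hypothetical, and nothing here calls the Salesforce API. Each step's output is written to its own JSON file so a wrong result can be traced to the step that produced it.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(data, steps, out_dir):
    """Run named steps in order, writing each step's output to its own file."""
    out_dir = Path(out_dir)
    written = []
    for number, (name, func) in enumerate(steps, start=1):
        data = func(data)
        path = out_dir / f"step{number}_{name}.json"
        path.write_text(json.dumps(data))  # snapshot for later inspection
        written.append(path)
    return data, written


steps = [
    ("clean", lambda rows: [r for r in rows if r.get("email")]),
    ("normalize", lambda rows: [{**r, "email": r["email"].lower()} for r in rows]),
    ("select", lambda rows: [r["email"] for r in rows]),
]
with tempfile.TemporaryDirectory() as tmp:
    result, files = run_pipeline(
        [{"email": "A@Example.com"}, {"email": None}], steps, tmp
    )
    step1_snapshot = json.loads(files[0].read_text())
```

A temporary directory is used only to keep the sketch self-contained. A real job would write the snapshots somewhere durable.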
