Dynamic Python Back-end Developer with a proven track record of designing and implementing complex software solutions across diverse industries. Demonstrated expertise in deploying applications on Azure AKS clusters within internal cloud infrastructure, and hands-on experience with AI and Gen AI projects, particularly LLMs and training data. Proficient in AWS services such as EC2, S3, RDS, and Lambda, with strong database management skills in SQL, MySQL, and PostgreSQL. Familiar with Elasticsearch for search functionality and adept at using Pandas and NumPy for data analysis. Experienced in CI/CD practices with tools like Jenkins, and skilled in task automation and workflow management. Knowledgeable in Redis and MongoDB for caching and storage, and proficient in executing ETL processes. Capable of developing web applications with Django and the AWS SDK, with strong Python debugging skills. Excellent communicator who collaborates effectively within teams and delivers high-quality results under tight deadlines.
Senior Software Engineer - ServiceNow [via Apptad Inc.]
Software Engineer - Gartner
Business Technology Solutions Associate - ZS Associates
Junior Associate Software Engineer - Daffodil Software
Splunk
Kubernetes
Git
Jenkins
Linux
Django ORM
ELK Stack
ETS
API Gateway
CloudWatch
WAF
MySQL
PostgreSQL
Elasticsearch
Pandas
NumPy
Redis
MongoDB
ETL
Django
AWS SDK
AKS
Azure
Snowflake
NFS
Falcon
RabbitMQ
Project Details -
Tech Stack - Python, Flask, Azure, Snowflake, Kubernetes, fine-tuning, NLP, ML.
My background: I'm currently working as a Python developer. I have worked with AWS and Azure on the cloud side, SQL and MySQL on the database side, with both on-premises DB servers and cloud-hosted services, and I've built REST APIs. I've also worked with Python frameworks. I have around 5 to 6 years of experience across multiple companies, clients, and projects, and I'm hands-on with Pandas, NumPy, and similar libraries for data transformation. That's a little bit about me. Thanks.
How would you monitor and log a Python ETL process that interfaces with AWS services and a SQL database to ensure reliability? I would use a service like AWS CloudWatch, with CloudWatch Events, to monitor the logs coming from the scripts, and then create CloudWatch Events rules that apply logic to the incoming logs. For example, if a DEBUG or ERROR entry shows up, that raises an alarm or notification so we can take whatever action is needed after the issue. There is also CloudTrail, but I don't think it is necessary here, because CloudTrail covers the activity of the service itself, not the internal logs of the application that is running. So CloudWatch plus CloudWatch Events is the way to go: create the rules and set up the alerts based on the log output coming from the application box.
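A minimal sketch of the monitoring idea above, assuming boto3 credentials are configured; the namespace, metric, and step names are placeholders, and the log lines themselves are expected to reach CloudWatch Logs via whatever agent or runtime the pipeline uses:

```python
import logging
import boto3

# Structured logging; in production these lines would be shipped to CloudWatch
# Logs (e.g. by the CloudWatch agent or a Lambda's built-in log stream).
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s %(message)s")
logger = logging.getLogger("etl")

cloudwatch = boto3.client("cloudwatch")

def run_step(step_name, fn):
    """Run one ETL step, log the outcome, and push a failure metric on error."""
    try:
        fn()
        logger.info("step %s succeeded", step_name)
    except Exception:
        logger.exception("step %s failed", step_name)
        # An alarm on this custom metric (or a metric filter on ERROR log
        # lines) can then notify the team, as described in the answer above.
        cloudwatch.put_metric_data(
            Namespace="ETL/Pipeline",          # hypothetical namespace
            MetricData=[{"MetricName": "StepFailure",
                         "Dimensions": [{"Name": "Step", "Value": step_name}],
                         "Value": 1}],
        )
        raise
```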
How would you design a Python ETL solution that can scale to accommodate growing data volumes? First, do incremental data loads based on timestamps: you provide date and time ranges and load only the increment. To accommodate growing data volumes, make sure your storage is flexible and can scale; if it's a data lake on S3, for example, you can set limits and let it auto-scale to your needs. If the target is a database, then for an ETL-style workload Redshift is the one to go with, since it can store terabytes or petabytes of data. That covers the infrastructure side using cloud services. On the scripting and code side, while the data volume is limited we can use AWS Lambda, but once the data grows and Lambda's limits are exceeded, we should use AWS Glue to run the ETL jobs in batches. Glue is the standard, industry-wide solution for implementing ETL and performing the transformations on the data. Thank you.
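A rough sketch of the Glue-based batch idea, assuming boto3 is configured and the Glue job already exists; the job name and argument key are hypothetical:

```python
import boto3

glue = boto3.client("glue")

def start_incremental_load(last_watermark: str) -> str:
    """Kick off a Glue ETL job, passing the high-water mark so the job only
    processes rows newer than the previous run (incremental load)."""
    response = glue.start_job_run(
        JobName="sales-etl-job",                      # hypothetical job name
        Arguments={"--last_watermark": last_watermark},
    )
    return response["JobRunId"]

# Example: only pick up data added since the previous run's timestamp.
run_id = start_incremental_load("2024-05-01T00:00:00")
print("Started Glue job run:", run_id)
```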
What would you use in Python to ensure that a sequence of SQL operations adheres to ACID properties? There is support for implementing the changes as transactions in Python: you execute a sequence of statements or functions in an atomic manner, so either everything completes or everything rolls back. Say you have 10 different SQL queries doing inserts, updates, and so on; if 5 of them succeed and the 6th one errors, your database is left in an inconsistent state. To avoid that, Python libraries (an atomic transaction block, something like transaction.atomic) ensure those 10 steps are either implemented completely or rolled back completely if they fail at any point, so the database always stays consistent. One more thing we can add: set up the DB connection so that when we execute a sequence of SQL operations, if any one of them errors, we abort the whole transaction, again keeping the database in a consistent state. Thank you.
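A small DB-API sketch of that all-or-nothing behaviour, using sqlite3 for illustration; the accounts and transfers tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE transfers (src INTEGER, dst INTEGER, amount REAL);
    INSERT INTO accounts VALUES (1, 500.0), (2, 100.0);
""")

try:
    # The connection's context manager commits if the block succeeds and rolls
    # back if any statement raises, so the database never ends up half-updated.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
        # If this insert failed, both updates above would be rolled back too.
        conn.execute("INSERT INTO transfers (src, dst, amount) VALUES (1, 2, 100)")
except sqlite3.Error as exc:
    print("Transaction rolled back:", exc)
finally:
    conn.close()
```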
Can you propose a method for doing incremental data loads in a Python ETL to minimize resource usage? For incremental loads, the first approach, if this is time-series data with a date and time column, is to compare and load only the delta, the change in the data. Say we already have data up to yesterday: we compare the existing data with the new data and upload only the change, the rows from yesterday to today. For time-series data we can write a query that filters out the records which are already there and inserts only the new data. That improves resource consumption in two ways: no duplicate data, and only the remaining, newer rows get inserted. To do the comparison, we connect to the database from the Python script and run the SQL against both the source and target systems, where the source has the newer data and the target has data only up to yesterday, and we upload only the differential, or delta, data. Thank you.
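A minimal watermark-based sketch of that delta load, using sqlite3 connections to stand in for the source and target systems; the sales table and updated_at column are assumptions:

```python
import sqlite3  # stands in for any DB-API connection (source and target)

def incremental_load(source_conn: sqlite3.Connection,
                     target_conn: sqlite3.Connection) -> int:
    """Copy only rows newer than the target's current high-water mark."""
    # 1. Find how far the target has already been loaded.
    row = target_conn.execute("SELECT MAX(updated_at) FROM sales").fetchone()
    watermark = row[0] or "1970-01-01 00:00:00"

    # 2. Pull only the delta from the source (no full-table re-read).
    delta = source_conn.execute(
        "SELECT id, amount, updated_at FROM sales WHERE updated_at > ?",
        (watermark,),
    ).fetchall()

    # 3. Insert just the new rows into the target, as one transaction.
    with target_conn:   # commit or roll back the whole batch
        target_conn.executemany(
            "INSERT INTO sales (id, amount, updated_at) VALUES (?, ?, ?)", delta
        )
    return len(delta)
```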
In Python, how would you handle transaction rollback if the ETL process fails at any point? The SQL and ORM libraries support executing your Python code in an atomic manner. You take the piece of code that has to run as a sequence and wrap it in a transaction.atomic block, so anything executed within that block either completes fully or not at all: either the whole set of 10 or 15 SQL statements succeeds, or, if any one of them fails, all the others are rolled back. So you write your queries inside the transaction.atomic block, and if the ETL process fails in between, all the previous SQL statements are rolled back as well. That is fairly straightforward.
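A short Django-style sketch of that transaction.atomic block, assuming a Django project; the Order and AuditLog models are hypothetical:

```python
from django.db import transaction

from myapp.models import Order, AuditLog   # hypothetical models

def apply_batch(rows):
    """All writes inside the atomic block commit together; if any statement
    raises, every change made inside the block is rolled back."""
    with transaction.atomic():
        for row in rows:
            Order.objects.create(ref=row["ref"], total=row["total"])
        # If this fails (e.g. a constraint violation), the Order inserts above
        # are rolled back as well, leaving the database consistent.
        AuditLog.objects.create(action="batch_load", count=len(rows))
```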
Look at this SQL query, which is meant to select all records from sales where the revenue is greater than the previous month's, using LAG(revenue) OVER (ORDER BY month). How would you debug it? One way forward is to use an inner, nested query instead of relying on the LAG as written. Also, since we are doing an ORDER BY with no ascending or descending specified, a point of failure is that it isn't clear how it compares against the previous month; the ordering direction should be explicit. There is also a case where it will definitely fail: if we only have the month and not the year, the month of January might be there for 2022, 2023, and 2024, so ordering only over the month gives inconsistent results. It would be better to order by both the year and the month.
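A sketch of the corrected query under an assumed sales(year, month, revenue) schema, run from Python since that is the rest of the stack here; sqlite3 (3.25+) is used only to execute it:

```python
import sqlite3

# Hypothetical schema: sales(year INTEGER, month INTEGER, revenue REAL).
# Ordering the window by year AND month (not month alone) keeps January 2022
# from being compared with January 2023, the failure case described above.
QUERY = """
SELECT year, month, revenue
FROM (
    SELECT year,
           month,
           revenue,
           LAG(revenue) OVER (ORDER BY year, month) AS prev_revenue
    FROM sales
) t
WHERE prev_revenue IS NOT NULL
  AND revenue > prev_revenue
ORDER BY year, month;
"""

conn = sqlite3.connect("sales.db")   # placeholder database file
for year, month, revenue in conn.execute(QUERY):
    print(year, month, revenue)
conn.close()
```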
This code snippet was supposed to extract JSON data from an API, transform it, and load it into a DataFrame. Identify the problem. Reading it: the API response comes from requests.get(...).json(), and transformed_data is a list comprehension, one dict per item in the API response, which is a JSON array, I guess; so we have created a list of dicts. Then we call pd.DataFrame.from_dict(transformed_data). Honestly, at first look it seems okay, but a few things stand out around quantity and price. First, we are passing a list of dicts, and calling from_dict on a list is one point of failure. Second, for any attribute whose value is not present in the item JSON, you may get NaN or inconsistent values, and when we take the product of quantity and price, if one of them is None, that multiplication raises an error; that is a second issue which might occur. One more observation: with a list like this, Pandas lets you pass the list directly to pd.DataFrame; we don't have to call from_dict specifically. It would also be better to pass an explicit column list, so the DataFrame comes back with the headers in the exact order we need. Those are the few observations I have on this code. Thank you.
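A small sketch of the transform with those fixes applied; the endpoint URL and the quantity/price field names are assumptions:

```python
import pandas as pd
import requests

# Hypothetical endpoint returning a JSON array of {"quantity": ..., "price": ...}.
api_response = requests.get("https://example.com/api/sales", timeout=30).json()

transformed = [
    {
        "quantity": item.get("quantity", 0),          # default instead of KeyError
        "price": item.get("price", 0.0),
        # Guard against None so the multiplication never raises.
        "total": (item.get("quantity") or 0) * (item.get("price") or 0.0),
    }
    for item in api_response
]

# A list of dicts can be passed straight to the DataFrame constructor;
# an explicit column list fixes the column order.
df = pd.DataFrame(transformed, columns=["quantity", "price", "total"])
print(df.head())
```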
What would be a secure method to manage sensitive information such as API keys and secrets? I have two or three approaches here. First, if you are hard-coding values into your config files, i.e., making them part of the code, then encrypt them with a particular algorithm and pass the ciphertext into the code rather than the plaintext. Second, use a separate config file, place it on another server, and read the configuration from there. Third, store the credentials directly in a vault: Secrets Manager on AWS, for example, and there are third-party vault services that store passwords, secrets, and API keys. It is always strongly recommended to keep these things out of the code, and the service libraries provide prebuilt functions you can call to fetch the exact values. That makes things more secure and robust, with less chance of exposing sensitive data. So: either encrypt it, store the config file somewhere outside your code, or, the most recommended approach, store it in a secret vault.
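A minimal boto3 sketch of the vault option; the secret name and the shape of the stored JSON are assumptions:

```python
import json
import boto3

def get_db_credentials(secret_name: str = "prod/etl/db") -> dict:
    """Fetch credentials from AWS Secrets Manager instead of hard-coding them."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

creds = get_db_credentials()
# creds might look like {"username": "...", "password": "..."} depending on how
# the secret was stored; nothing sensitive lives in the code or the repository.
```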
I have no hands-on experience with the Salesforce app as such, but one good approach would be to implement try/except in Python around all the possible failure points. Second, log the traceback of the error, so we return, print, or log exactly which scenario is failing. Third, create alert and notification mechanisms that report exactly what error is happening. Fourth, to troubleshoot, capture a subset of the data: keep a log of the input data for each run, so we can see exactly which data came from the source and which particular subset of data made the execution fail; that saves time and lets us quickly debug issues on the data side. Also, don't forget to put try/except statements where failures are most likely, so you don't have to go through the whole code to identify the root cause. One more thing: if you can set the code up locally, run a debugger on that particular dataset so you can step through the code and identify on which step, on which condition, or on which piece of code it is failing.
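A small sketch of the try/except plus traceback logging idea; transform() and the snapshot file name are hypothetical stand-ins:

```python
import json
import logging
import traceback

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def transform(record: dict) -> dict:
    """Stand-in for the real transformation step; raises on bad input."""
    return {"id": record["id"], "amount": float(record["amount"])}

def process_batch(records: list[dict]) -> list[dict]:
    """Wrap the risky step, log the full traceback, and persist the offending
    input subset so the failure can be reproduced and debugged quickly."""
    try:
        return [transform(r) for r in records]
    except Exception:
        logger.error("batch failed:\n%s", traceback.format_exc())
        with open("failed_batch.json", "w") as fh:   # snapshot of the bad input
            json.dump(records, fh, default=str)
        raise
```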
Is it the same question? What are some key considerations when integrating the Salesforce Marketing Cloud API with a Python ETL? First, go through the Salesforce docs, so you follow their recommended security practices and standards. Second, when connecting to Salesforce, don't leave your network connection open; always close it once your task is completed. Third, again from their documentation, see which libraries and functions they already provide, so you don't rewrite them and instead use their prebuilt functions in your code. Fourth, ship application logs to a third-party log store like AWS CloudWatch Logs or Splunk, so you have a record of everything happening in your application. Fifth, log the intermediate files: if you are running a three-step process, keep the output of each step, so that if anything fails or an output is not correct, you know up to which point the input or intermediate output was correct, and at exactly which step the output went wrong and needs correcting. That should be the way to go forward. Thank you.
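A sketch of the connection-hygiene point, closing the HTTP session deterministically; the base URL, endpoint path, and token handling are placeholders, not the actual Marketing Cloud API surface:

```python
import requests

API_BASE = "https://example.rest.marketingcloudapis.com"   # placeholder endpoint

def fetch_contacts(token: str) -> dict:
    """Use the session as a context manager so the connection is always closed,
    and raise on HTTP errors instead of silently continuing."""
    with requests.Session() as session:                     # closed on exit
        session.headers.update({"Authorization": f"Bearer {token}"})
        resp = session.get(f"{API_BASE}/contacts/v1/contacts", timeout=30)
        resp.raise_for_status()
        return resp.json()
```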