
In his dynamic technical journey, Binoy has cultivated remarkable expertise in web development (with a strong focus on Django) and in Machine Learning (ML) and Artificial Intelligence (AI), and he applies these skills to tackle intricate challenges and drive progress. With more than five years of experience, he has demonstrated proficiency in developing robust web applications with the Django framework, implementing cutting-edge ML/AI algorithms, and architecting intelligent solutions across diverse domains. His expertise extends to other web frameworks such as Flask and FastAPI. Binoy also has a deep understanding of ML/AI concepts and techniques and of libraries such as PyTorch, TensorFlow, and Pandas. Known for his technical problem-solving and innovative mindset, Binoy continues to make significant contributions to both the web development and ML/AI communities.
Software Engineer II
Abnormal Security
Senior Software Engineer
Urban Piper
Senior Software Engineer
Crest Data
GIC Associate
Intern (Python developer)
BoTree Technologies
Python Developer
BoTree Technologies
Django

Django REST framework

Python
FastAPI
Flask
Docker
Jira

GitHub

GitLab
Terraform

Postman

PostgreSQL

AWS Cloud
AWS (Amazon Web Services)

AWS CloudWatch

AWS Secrets Manager

Amazon CloudFront

GCP Services

Kibana

Confluence

Bamboo

Kubernetes

GoLang

MongoDB

MySQL
REST API

AWS Lambda

Apache Kafka

RabbitMQ

Redis Stack

Atlassian

Git

Google Cloud Platform

AWS
Grafana

Kafka

Pandas

Apache Airflow

PyTorch

TensorFlow
As a senior software developer with 5.5 years of experience, you have worked in multiple domains such as education, e-commerce, cybersecurity, fintech, and email security products. You have expertise in technologies including Python, the Django framework, FastAPI, Flask, Go, Java, Node.js, and React. You have experience with AWS services such as CloudFront, EC2, and EKS, as well as Kubernetes, Docker Hub, GCP services, and Secret Manager; with databases including PostgreSQL, MySQL, and MongoDB; and with queue-based mechanisms using Celery, RabbitMQ, and Redis. Additionally, you have experience leading teams, managing team members, developing product architecture, and building product designs.
So, in Python, we usually set a rate limit through a rate-limiting function or decorator that takes a limit, for example a number of requests allowed per minute. The fallback procedure covers what happens once that limit is exceeded: instead of failing silently, we return a proper response, typically HTTP 429 Too Many Requests, with a message so that the user knows the limit has been reached. The other part of the question is how to handle rate limits when we integrate with a third-party REST API. In that case we don't control the provider's limit, so we have to control how many requests we make and when. Ideally, we establish a structure on our side that stays below the provider's limit: if the API allows 100 requests per minute, we cap ourselves at 90 requests per minute, and we set up a trigger that warns us when we are approaching that threshold so that we can pace our calls accordingly.
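A minimal sketch of both ideas, assuming a rolling 60-second window and a 10% safety margin below the provider's limit. The names SlidingWindowLimiter, safety_budget, and handle_request are hypothetical and not from any particular library; in a multi-process deployment the call timestamps would need to live in shared storage such as Redis rather than in process memory.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(self, max_calls: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Forget timestamps that have left the rolling window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def usage(self) -> float:
        """Fraction of the budget used in the current window (for the warning trigger)."""
        self._evict(self.clock())
        return len(self.calls) / self.max_calls


def safety_budget(provider_limit: int, headroom_pct: int = 10) -> int:
    """Stay below a third-party limit, e.g. 100/min -> 90/min with 10% headroom."""
    return provider_limit * (100 - headroom_pct) // 100


def handle_request(limiter: SlidingWindowLimiter) -> Tuple[int, Dict[str, str]]:
    """Server side: reject with 429 and an explanatory message once over the limit."""
    if not limiter.try_acquire():
        return 429, {"detail": "Rate limit exceeded; please retry after a minute."}
    return 200, {"detail": "ok"}
```

For outgoing calls to the third-party API, the limiter would be created as SlidingWindowLimiter(safety_budget(100)), and checking usage() against a chosen threshold (for example 0.8) is one way to fire the warning before the budget runs out.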
So, Terraform works on configuration files. The config files define the desired state of the infrastructure the application runs on, for example EC2 instances, S3 buckets, and so on, and they are written as .tf files. Ideally, we keep the shared configuration in an S3 bucket and reference it by the bucket URL from each environment, so every environment refers to the same Terraform configuration instead of its own copy. For local development, we can use it locally without committing environment-specific changes while pushing features and tasks to environments. We can provide it either as an environment file or from an S3 bucket with restricted access, reachable only through its S3 URL. This avoids conflicts between environments. The S3 approach works well because the bucket is common to all environments: whoever needs it can access it locally with restricted permissions, or from an environment through its granted access.
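One common way to share per-environment settings through S3 is Terraform's S3 backend with partial configuration: the bucket, key, and region are passed to terraform init through -backend-config, and environment-specific values come from a -var-file. Note that this shares Terraform's state through S3, which is related to but not the same as storing the configuration itself there. The Python helper below only builds those command lines; the bucket name, the envs/<env>.tfvars layout, and the <env>/terraform.tfstate key are assumptions, and it presumes the .tf files declare an empty backend "s3" {} block.

```python
import subprocess
from typing import List


def terraform_commands(env: str, state_bucket: str, region: str) -> List[List[str]]:
    """Build `terraform init` and `terraform plan` argument lists for one environment.

    Assumes the configuration contains an empty `backend "s3" {}` block, so the
    backend settings can be supplied at init time (partial configuration).
    """
    init = [
        "terraform", "init", "-reconfigure",
        f"-backend-config=bucket={state_bucket}",
        # One state object per environment keeps staging and production apart.
        f"-backend-config=key={env}/terraform.tfstate",
        f"-backend-config=region={region}",
    ]
    # Environment-specific variable values live in a separate file per environment.
    plan = ["terraform", "plan", f"-var-file=envs/{env}.tfvars"]
    return [init, plan]


def run(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

With the Terraform CLI installed, run(terraform_commands("staging", "example-tf-state", "us-east-1")) would initialise and plan the staging environment; the bucket name there is a placeholder.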
So the question is how we would build a Python application and implement unit tests to make sure its API integration points are reliable. With API integrations, the responses are what matter most, so the tests have to cover each status code the API can return. A successful call returns 200, and a create endpoint returns 201. If there's a problem with the request body, the API returns 400 Bad Request. If there's an authentication problem, it returns 401 Unauthorized (or 403 Forbidden when the caller is authenticated but not allowed to perform the action). When building an integration, we need to make sure our code handles every one of these statuses. To verify that, we write unit tests with the unittest.mock module: we mock the HTTP call so that no real request is sent, check the URL and request body it receives, and set the response the mocked call returns. Then we can assert that, when the API answers 200 for a specific body, our function or view returns a particular result. In each test we prepare an input body, pass it to the function, let it execute the mocked request, and compare the outcome with the expected output, such as the expected status code and response body. Repeating this for the different status codes shows that the integration is reliable.
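A sketch of that testing approach with unittest and unittest.mock. The create_order function, the ORDERS_URL endpoint, and the ApiError class are hypothetical; the session argument is assumed to behave like a requests-style client with a post(url, json=...) method, which lets the test replace it with a Mock so no network call happens.

```python
import unittest
from typing import Any, Dict, Optional
from unittest import mock

ORDERS_URL = "https://api.example.com/orders"  # hypothetical third-party endpoint


class ApiError(Exception):
    """Raised when the third-party API answers with a non-success status."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def create_order(session: Any, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Call the API through a requests-style session and map status codes to outcomes."""
    response = session.post(ORDERS_URL, json=payload)
    if response.status_code in (200, 201):
        return response.json()
    if response.status_code == 400:
        raise ApiError(400, "the request body was rejected")
    if response.status_code in (401, 403):
        raise ApiError(response.status_code, "authentication or permission problem")
    raise ApiError(response.status_code, "unexpected status")


class CreateOrderTests(unittest.TestCase):
    def session_returning(self, status_code: int,
                          body: Optional[Dict[str, Any]] = None) -> mock.Mock:
        # The mock stands in for the HTTP client, so no real request is sent.
        session = mock.Mock()
        session.post.return_value.status_code = status_code
        session.post.return_value.json.return_value = body or {}
        return session

    def test_created_returns_body_and_sends_payload(self) -> None:
        session = self.session_returning(201, {"id": 7})
        self.assertEqual(create_order(session, {"sku": "A1"}), {"id": 7})
        session.post.assert_called_once_with(ORDERS_URL, json={"sku": "A1"})

    def test_error_statuses_raise(self) -> None:
        for status in (400, 401, 403, 500):
            with self.subTest(status=status):
                with self.assertRaises(ApiError) as ctx:
                    create_order(self.session_returning(status), {})
                self.assertEqual(ctx.exception.status_code, status)
```

Saved in a test module, these cases run with python -m unittest; subTest keeps one failing status code from hiding the others.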
So, when we are dealing with high-concurrency operations, we rely on the ACID properties: atomicity, consistency, isolation, and durability. Each one guarantees something different. Take atomicity with a bank transfer as an example: we have a transactions table for bank operations, and we are moving money between two savings accounts, A and B. If the money is deducted from account A and the process is interrupted before it is credited to account B, the data becomes inconsistent and money is effectively lost. Atomicity prevents this: the transaction either succeeds completely or fails and is rolled back. That's what "all or nothing" means; a transaction must never complete partially. Consistency means every transaction moves the database from one valid state to another, so rules such as "a balance can't go negative" always hold. Isolation means concurrent transactions don't see each other's intermediate changes, which is what matters most under high load. Durability means that once a transaction is committed, it stays committed even if the system crashes afterwards. In Django, we get atomicity with the transaction.atomic context manager: we put the database statements we want to execute, such as creating or updating records, inside a "with transaction.atomic():" block. If any statement in that block fails, every statement already executed in the block is rolled back. This applies to relational databases. Under high concurrency, we also protect the ACID guarantees by establishing a lock on the rows being updated, so that concurrent transactions can't interleave their changes.
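The all-or-nothing behaviour can be shown with the standard-library sqlite3 module, used here as a stand-in so the sketch runs without a Django project. In Django the same shape would be a "with transaction.atomic():" block around the two updates, with select_for_update() to lock the account rows. The accounts table, the account names, and the fail_midway flag are hypothetical.

```python
import sqlite3
from typing import Dict


def make_db() -> sqlite3.Connection:
    """In-memory accounts table; the CHECK constraint encodes a consistency rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    """Move `amount` from `src` to `dst` as one all-or-nothing transaction."""
    # Used as a context manager, the connection commits when the block succeeds
    # and rolls back every statement in it when an exception escapes.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

After a successful transfer of 30, A holds 70 and B holds 80; a transfer interrupted between the debit and the credit leaves both balances exactly as they were, because the debit is rolled back with the rest of the block.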
So, to set up an automated CI/CD pipeline for a Python application, AWS itself provides a deployment service, AWS CodePipeline, where we can set up the pipeline. CI/CD stands for continuous integration and continuous deployment. It applies whenever an application is versioned and feature development and feature additions are ongoing. When those tasks are ready to go to production, then instead of the manual effort of deploying to production, restarting the ECS services or instances that run the application, and checking and monitoring everything, the CI/CD pipeline makes the process automatic and continuous. For example, with a GitHub repository, we can set up GitHub Actions that trigger the AWS deployment pipeline. In the GitHub Actions workflow, we define several steps and checks to make sure that code being merged into the production branch passes all of them. The checks can include a Python linter, whether the unit tests pass, whether the versioning is correct, and whether the code works as expected. If all the checks pass, we trigger AWS CodePipeline, which deploys the new image first to the primary instance and then automatically rolls it out to the remaining instances. That's how an automated CI/CD pipeline can be set up for a Python application.
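The gating step can be sketched in Python as a script a CI job might run: execute each check, stop at the first failure, and call the deploy step only when everything passes. The check commands (flake8, pytest) are examples that assume those tools are installed, and deploy is any callable, so the sketch itself has no AWS dependency.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Example checks; assumes flake8 and pytest are installed in the CI environment.
CHECKS: List[Sequence[str]] = [
    [sys.executable, "-m", "flake8", "."],
    [sys.executable, "-m", "pytest", "-q"],
]

Runner = Callable[..., subprocess.CompletedProcess]


def run_checks(checks: List[Sequence[str]], runner: Runner = subprocess.run) -> bool:
    """Run each check in order and stop at the first non-zero exit code."""
    for cmd in checks:
        result = runner(list(cmd))
        if result.returncode != 0:
            print(f"check failed: {' '.join(cmd)}")
            return False
    return True


def gate_and_deploy(checks: List[Sequence[str]], deploy: Callable[[], None],
                    runner: Runner = subprocess.run) -> bool:
    """Trigger `deploy` only when every check passes; return whether it ran."""
    if not run_checks(checks, runner):
        return False
    deploy()
    return True
```

In a real job, deploy could call boto3's start_pipeline_execution(name=...) on a "codepipeline" client; the pipeline name would be project-specific.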
So the question says: examine the Python code for automating finance tasks. Is there a logical error related to the handling of transaction dates? How would you debug or fix it? Here I can see that datetime is imported, and there's a process_transaction function that accepts a transaction. It could be an object, but here it's a dictionary. The function gets datetime.date.today(), and if today is greater than the transaction's date, it sets the status to processed and continues with further processing steps. The problem is the type of the transaction date. datetime.date.today() returns a date object, but the transaction's date most likely arrives as a string, so the code is comparing a date with a string. That isn't a valid comparison: in Python 3 it raises a TypeError, and if both sides were strings it would become a lexicographic string comparison that is only correct when both dates happen to be in the same ISO YYYY-MM-DD format. A date should be compared with a date. So we should take today as a date object, parse the transaction's date from its known format into a date object as well, and then compare them; Python's built-in comparison (or subtraction, which gives a timedelta) then handles it correctly. The formats can also differ, because we don't know what format the transaction date will arrive in compared with today's value. Note also that with a strict greater-than, a transaction dated today is not processed until the next day, which may or may not be intended.

So comparing values in different formats is not reliable. Ideally, both should be in a single format: today as a date object, and the transaction date parsed into a date object, so the two can be compared directly.
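A minimal sketch of the fix. The original snippet isn't reproduced here, so the "date" key, the assumed YYYY-MM-DD string format, and the to_date helper are assumptions; the optional today parameter only exists to make the function testable.

```python
import datetime
from typing import Any, Dict, Optional

DATE_FORMAT = "%Y-%m-%d"  # assumed format of string dates coming from the source


def to_date(value: Any) -> datetime.date:
    """Normalise a date, datetime, or formatted string to a datetime.date."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(value, datetime.datetime):
        return value.date()
    if isinstance(value, datetime.date):
        return value
    if isinstance(value, str):
        return datetime.datetime.strptime(value.strip(), DATE_FORMAT).date()
    raise TypeError(f"unsupported transaction date: {value!r}")


def process_transaction(transaction: Dict[str, Any],
                        today: Optional[datetime.date] = None) -> Dict[str, Any]:
    """Mark the transaction processed once its date is strictly in the past."""
    today = today or datetime.date.today()
    transaction_date = to_date(transaction["date"])
    # Both sides are datetime.date now, so the comparison is chronological.
    # Use >= instead if same-day transactions should be processed too.
    if today > transaction_date:
        transaction["status"] = "processed"
        # ... further processing steps ...
    return transaction
```

Comparing datetime.date(2024, 1, 11) directly with the string "2024-01-10" raises TypeError in Python 3, which is usually the quickest way to confirm the bug while debugging.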
In the Python code snippet for Elasticsearch data matching, please explain the usage of the should clause in the query and how it affects the search results. OK, so the snippet imports Elasticsearch from the elasticsearch package and creates an Elasticsearch client. It builds a bool query whose should clause contains two match queries, title matching "data" and description matching "automation", with minimum_should_match set to 1. The response comes from es.search on the documents index, with that query as the body. So the query has two match clauses: one checks the title for "data", and the other checks the description for "automation". Note that match is a full-text query, so it looks for those terms in the analyzed field rather than requiring an exact value. The should clause does not require both matches. With minimum_should_match set to 1, a document is returned if at least one of the two clauses matches. For example, if a record's description is "manual" but its title is "data", it passes and is included. But if the title is not "data" and the description is "manual", neither clause matches, so the record is excluded. Besides filtering, should clauses also affect scoring: a document that matches both clauses gets a higher relevance score and ranks above documents that match only one. So the clause ensures that at least one of the conditions, on either the title or the description, matches.
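To make the should and minimum_should_match semantics concrete, here is the query body as described, plus a deliberately simplified in-memory model of how it filters and ranks documents. The model is an assumption for illustration only: it lowercases and splits on whitespace instead of running a real analyzer, and it scores by the number of matching clauses rather than by Elasticsearch's relevance scoring.

```python
from typing import Any, Dict, List, Tuple

# The query body as described in the snippet (searched against the "documents" index).
QUERY: Dict[str, Any] = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "data"}},
                {"match": {"description": "automation"}},
            ],
            "minimum_should_match": 1,
        }
    }
}


def clause_matches(doc: Dict[str, str], clause: Dict[str, Any]) -> bool:
    """Simplified full-text match: any query term appears in the field's terms."""
    field, text = next(iter(clause["match"].items()))
    doc_terms = set(str(doc.get(field, "")).lower().split())
    return any(term in doc_terms for term in text.lower().split())


def simulate_bool_should(docs: List[Dict[str, str]],
                         query: Dict[str, Any]) -> List[Tuple[int, Dict[str, str]]]:
    """Keep docs matching at least `minimum_should_match` clauses; rank by matches."""
    bool_q = query["query"]["bool"]
    clauses = bool_q["should"]
    # With only should clauses (no must/filter), Elasticsearch's default minimum is 1.
    minimum = bool_q.get("minimum_should_match", 1)
    hits = []
    for doc in docs:
        matched = sum(clause_matches(doc, c) for c in clauses)
        if matched >= minimum:
            hits.append((matched, doc))
    # More matching clauses -> higher stand-in score; sort is stable for ties.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

With the official client, the same body would be sent via es.search(index="documents", query=QUERY["query"]) in the 8.x client, while older clients accept body=QUERY.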
So, what approach would I take to build a resilient data pipeline that integrates multiple third-party data sources into a single database schema? When dealing with multiple third-party data sources, there's a concept called ETL, which stands for extract, transform, and load. The idea is that the data comes from multiple sources with different shapes, while our database has a single schema. The first step is to extract data from all the third-party sources. Let's take an example: we get data from APIs through three third-party integrations, integration A, integration B, and integration C, each with a different schema. We build an extraction layer that fetches data from all of these sources. Then we have a transformation layer, which is responsible for converting the extracted data from each source into the single target schema we have on our side. Finally, the load layer loads the transformed data into the database. As another example, say the data comes from three sources: a GCP bucket, an S3 bucket, and MongoDB. They are all different kinds of storage: the two buckets are object stores, while MongoDB is a non-relational document database. The transformation layer receives the data from each of them and converts it using a JSON-based mapping we have prepared, which maps each source's fields onto the single database schema. This transformation happens for every source, integration A, integration B, and integration C, and the converted data is pushed to the database. This can run as a single sequential process, or we can use a queue-based mechanism: one queue receives the raw data, a second queue triggers the transformation tasks, and a third queue loads the data.

In this way, we can establish a process that brings multiple data sources into one database schema.
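A small sketch of the transform and load steps, assuming the extraction layer has already produced a dict of raw records per source. The MAPPINGS dict stands in for the JSON-based mapping described above; its field names, the customers table, and SQLite as the target database are all hypothetical. For resilience, records that cannot be mapped go to a dead-letter list instead of failing the whole run, and loading uses INSERT OR REPLACE on a (source, external_id) key so that a rerun does not create duplicates.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical JSON-style mappings: source field name -> target column name.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "integration_a": {"id": "external_id", "full_name": "name", "mail": "email"},
    "integration_b": {"uid": "external_id", "name": "name", "email_address": "email"},
    "integration_c": {"customerId": "external_id", "displayName": "name", "contact": "email"},
}

Row = Tuple[str, str, str, str]
Reject = Tuple[str, Dict[str, Any], str]


def create_schema() -> sqlite3.Connection:
    """The single target schema every source is mapped onto."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (source TEXT, external_id TEXT, name TEXT, email TEXT,"
        " PRIMARY KEY (source, external_id))"
    )
    return conn


def transform(source: str, record: Dict[str, Any]) -> Row:
    """Rename fields using the source's mapping; raises KeyError if anything is missing."""
    row = {target: record[field] for field, target in MAPPINGS[source].items()}
    return (source, str(row["external_id"]), row["name"], row["email"])


def load(conn: sqlite3.Connection, rows: List[Row]) -> None:
    # INSERT OR REPLACE on the primary key makes re-running a load idempotent.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?, ?)", rows)


def run_etl(conn: sqlite3.Connection,
            extracted: Dict[str, List[Dict[str, Any]]]) -> List[Reject]:
    """Transform every extracted record, load the good ones, and return the rejects."""
    rows: List[Row] = []
    dead_letters: List[Reject] = []
    for source, records in extracted.items():
        for record in records:
            try:
                rows.append(transform(source, record))
            except KeyError as exc:
                # Park bad records for inspection instead of failing the whole batch.
                dead_letters.append((source, record, f"cannot map record: missing key {exc}"))
    load(conn, rows)
    return dead_letters
```

In the queue-based variant, extraction, run_etl's transform loop, and load would each become a consumer on its own queue, with the dead-letter list replaced by a dead-letter queue.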
Okay, so how would you automate data extraction from invoices using NLP in Python while ensuring high accuracy across diverse formats? The main part can be achieved with a queue mechanism. Within the queue, we can do pagination-based extraction: when getting the data from the data sources, we feed the queue in batches. Let me explain what I mean by batches. With batches, we don't pull all the data at once; we pull it batch by batch, using an offset. Offsets and batches support accuracy because no single request has to fetch everything: each request stays small and fast, and a failed batch can be retried without losing records. We make multiple batch requests and collect each batch's data, which lets us handle the diverse formats one batch at a time. This is essentially the extraction and transformation part of an ETL process, where the transformation layer converts the diverse formats into the particular format we require.
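The offset-based batching can be sketched as a generator. Here fetch_page is a hypothetical callable (for example, a wrapper around a source API or a database query) that returns up to limit records starting at offset; the sketch covers only the batching, not the NLP extraction itself.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(fetch_page: Callable[[int, int], List[T]],
                 batch_size: int = 100) -> Iterator[List[T]]:
    """Yield successive batches by advancing an offset until the source is exhausted."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    offset = 0
    while True:
        batch = fetch_page(offset, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short page means there is nothing left
        offset += len(batch)
```

With a queue, each batch (or just its offset) would be published as its own task, so a failure only requires retrying that batch rather than the whole extraction.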
So, how do we diagnose and fix slow queries in a Python application? This is a common issue when using AWS RDS for Postgres: in the Python application, the queries become slow. One reason is that we are often using a framework that provides object-relational mapping (ORM). ORM queries are high-level queries that are converted into SQL internally. An ORM query can look simple but produce a very complex SQL query, for example when it joins data from several tables, uses subqueries, applies many filter clauses, or groups the results. All of this happens during query execution, so a query can take a long time when there is a lot of data. For example, say a transactions table has millions of records and we want to filter it by customer name. If the query runs without an index on the customer name column, the database has to scan every record to find the matching ones, and in the worst case the query takes a long time. So the first and foremost fix is indexing: we create an index on the customer name column of the transactions table. Postgres builds a tree structure (a B-tree by default) for that column, so the matching rows are found very quickly, which improves the performance of queries on that table. The other thing we can do is set up monitoring and logging on the RDS instance.
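The effect of the index described above can be checked before touching production. The sketch uses the standard-library sqlite3 module as a stand-in for Postgres (on Postgres the equivalent check is EXPLAIN or EXPLAIN ANALYZE, and Django exposes QuerySet.explain()); the table, column, and index names are hypothetical. Without the index the plan is a full table scan; with it, the plan becomes a search using the index.

```python
import sqlite3
from typing import List, Sequence


def query_plan(conn: sqlite3.Connection, sql: str, params: Sequence = ()) -> List[str]:
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer_name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (customer_name, amount) VALUES (?, ?)",
    [(f"customer-{i % 1000}", float(i)) for i in range(10_000)],
)

SQL = "SELECT id, amount FROM transactions WHERE customer_name = ?"

before = query_plan(conn, SQL, ("customer-42",))
# B-tree index on the filtered column, analogous to CREATE INDEX in Postgres.
conn.execute("CREATE INDEX idx_transactions_customer_name ON transactions (customer_name)")
after = query_plan(conn, SQL, ("customer-42",))

print(before)  # a full table scan (exact wording varies by SQLite version)
print(after)   # a SEARCH step using idx_transactions_customer_name
```

In a Django model, the same index would typically be declared with db_index=True on the field or a models.Index entry in Meta.indexes, so that migrations create it.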
On RDS, when we set up monitoring and CloudWatch logs, we can define metrics and alerts that show how long each query takes. Based on that, we know which queries are slow and can improve them, either by adding indexes or by splitting a query into separate queries and executing them differently. There are several options, but the most important point is to monitor the RDS instance: which queries are being executed against the Postgres instance, and how long each one takes. We should track that continuously and set an alert threshold, for example raising an alert whenever a query takes more than 10 seconds, so that slow queries are caught and optimized.
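On RDS for PostgreSQL, the database-side version of this is setting log_min_duration_statement in the DB parameter group and exporting the Postgres log to CloudWatch Logs, where a metric filter and alarm can fire. As an application-side complement, a small timing context manager can flag any query slower than the 10-second threshold mentioned above. The names timed_query and log_slow_query are hypothetical, and the clock parameter exists only to make the sketch testable.

```python
import logging
import time
from contextlib import contextmanager
from typing import Callable, Iterator

logger = logging.getLogger("slow_queries")

SLOW_QUERY_SECONDS = 10.0  # alert threshold from the discussion above


def log_slow_query(label: str, elapsed: float) -> None:
    """Default alert hook: emit a warning that log shipping or alarms can pick up."""
    logger.warning("slow query %s took %.2fs", label, elapsed)


@contextmanager
def timed_query(label: str,
                threshold: float = SLOW_QUERY_SECONDS,
                on_slow: Callable[[str, float], None] = log_slow_query,
                clock: Callable[[], float] = time.perf_counter) -> Iterator[None]:
    """Time the wrapped block and call `on_slow` if it exceeds `threshold` seconds."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the query raised, so failing queries are timed as well.
        elapsed = clock() - start
        if elapsed > threshold:
            on_slow(label, elapsed)
```

Typical use wraps a single database call, for example with timed_query("transactions by customer"): cursor.execute(sql, params).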