Vetted Talent

Binoy Oza

Over his technical career, Binoy has built expertise in web development, with a strong focus on Django, and in Machine Learning (ML) and Artificial Intelligence (AI). With more than five years of experience, he has developed robust web applications using the Django framework, implemented ML/AI algorithms, and architected solutions across diverse domains. He also works with other web frameworks such as Flask and FastAPI, and with ML and data libraries including PyTorch, TensorFlow, and Pandas. Known for technical problem-solving and an innovative mindset, Binoy continues to contribute to both the web development and ML/AI communities.

  • Role

    Software Engineer II

  • Years of Experience

    8.42 years

  • Professional Portfolio

    View here

Skillsets

  • Redis
  • Rest APIs - 5.5 Years
  • Git
  • REST
  • Algorithm analysis
  • AWS ECS
  • CLI
  • Docker
  • GCP services
  • Java
  • ML/AI
  • Node Js
  • NO SQL
  • Postgre SQL
  • Protobuf
  • Kafka - 2 Years
  • Spring Framework
  • SQL
  • Tastypie
  • Python
  • Django
  • Jira
  • FastAPI
  • Google Cloud
  • AWS
  • Node.js
  • Data Structures
  • Golang
  • PostgreSQL
  • PyTorch - 1 Year
  • Python - 5.5 Years
  • Django - 5.5 Years
  • JavaScript - 3 Years
  • Docker - 3 Years
  • AWS - 5 Years
  • Celery - 4 Years
  • FastAPI - 3 Years
  • Lambda - 2 Years
  • GCP services - 3 Years
  • Postgre SQL - 4 Years
  • MySQL - 4 Years
  • REST API - 5 Years
  • Fast API - 3 Years
  • NLP - 1 Year
  • Backend - 5.5 Years
  • Kubernetes - 2 Years
  • Mongo DB - 2 Years
  • API - 5 Years
  • Relational Database - 5 Years
  • Flask - 4 Years
  • Terraform - 2 Years
  • GraphQL - 3 Years
  • Django Rest Framework - 5 Years
  • Jira - 5 Years
  • Postman - 4 Years
  • Go Lang - 1 Year

Vetted For

11 Skills
  • Senior Backend Developer - 3rd party APIs Integration (Python) - REMOTE (AI Screening)
  • Result: 80%
  • Skills assessed: 3rd party API integrations, API, automations, Docker/Terraform, Integration Testing, Relational Database, AWS, Python, Celery, Postgres, Django
  • Score: 72/90

Professional Summary

8.42 Years
  • Aug, 2024 - Present (1 yr 9 months)

    Software Engineer II

    Abnormal Security
  • Dec, 2022 - Aug, 2024 (1 yr 8 months)

    Senior Software Engineer

    Urban Piper
  • Dec, 2020 - Dec, 2022 (2 yr)

    Senior Software Engineer

    Crest Data
  • Aug, 2017 - Nov, 2018 (1 yr 3 months)

    GIC Associate

  • Feb, 2019 - May, 2019 (3 months)

    Intern (Python developer)

    BoTree Technologies
  • May, 2019 - Dec, 2020 (1 yr 7 months)

    Python Developer

    BoTree Technologies

Applications & Tools Known

  • Django
  • Django REST framework
  • Python
  • FastAPI
  • Flask
  • Docker
  • Jira
  • GitHub
  • GitLab
  • Terraform
  • Postman
  • PostgreSQL
  • AWS Cloud
  • AWS (Amazon Web Services)
  • AWS CloudWatch
  • AWS Secrets Manager
  • Amazon CloudFront
  • GCP Services
  • Kibana
  • Confluence
  • Bamboo
  • Kubernetes
  • GoLang
  • MongoDB
  • MySQL
  • REST API
  • AWS Lambda
  • Apache Kafka
  • RabbitMQ
  • Redis Stack
  • Atlassian
  • Git
  • Google Cloud Platform
  • AWS
  • Grafana
  • Kafka
  • Pandas
  • Apache Airflow
  • PyTorch
  • TensorFlow

Work History

8.42 Years

Software Engineer II

Abnormal Security
Aug, 2024 - Present (1 yr 9 months)
    • Analyzed and defined the migration strategy to transition infrastructure from a Terraform-based setup to a Kubernetes (StratOS) architecture. Delivered a comprehensive strategy detailing the pros and cons of migrating the Redis cache, DB tables, and Kafka queues, optimizing scalability and reducing maintenance overhead.
    • Built a GoLang API for bulk record creation, replacing a previously manual process where 1,000 records required 1,000 individual requests. This optimization reduced API calls by 99.9%, enhancing system performance and reducing manual effort.
    • Actively participated in on-call rotations, resolving critical production issues within SLAs, minimizing downtime, and ensuring uninterrupted service.

Senior Software Engineer

Urban Piper
Dec, 2022 - Aug, 2024 (1 yr 8 months)
    • Led a team to architect and implement a dedicated integration service, transitioning from a monolithic to a microservices architecture.
    • Established a robust ETL process to unify schema structures across diverse integrations, enabling seamless data processing.
    • Managed schema design for new integrations, standardizing inconsistent data structures and improving data ingestion efficiency by 40%.
    • Achieved 10,000 QPS throughput by using Kafka for asynchronous communication and Redis for caching frequently accessed data, ensuring scalability and low-latency processing.
    • Introduced webhooks for real-time order status updates from integrations, reducing polling overhead by 60%. Webhook requests were processed via AWS Lambda and tasks were queued in a broker for execution, improving update latency by 30%.
    • Designed and implemented a circuit breaker mechanism for order services, reducing downtime caused by downstream outages. Resilience measures included automatically rerouting requests to a dead-letter queue after 30 failed attempts, and CloudWatch and Slack-integrated alerts for outages exceeding 30 minutes, ensuring proactive resolution. Reduced queue overflow incidents by 40% and improved operational stability.
    • Optimized database performance by adding indexes on the required columns, refactoring queries, and implementing pagination, improving response time for large datasets by 35%. Introduced monitoring alerts for storage thresholds to proactively address data growth.
    • Reduced AWS infrastructure costs by 20% through clustering ECS services and optimizing resource utilization.
    • Resolved critical on-call production issues, achieving 99.9% uptime and ensuring uninterrupted revenue generation.
    • Conceptualized, planned, and executed feature implementations, collaborating with product, QA, and operations teams to deliver high-impact functionality.

Senior Software Engineer

Crest Data
Dec, 2020 - Dec, 2022 (2 yr)
    • Led a team to develop the Google Chronicle CLI, including a parser that accepts various parameters and generates YAML files, which backend services process to create logs.
    • Streamlined operations with the Feed, Parser, Forwarder, and BigQuery APIs, reducing log analysis time by 20%.
    • The CLI tool automated the generation of phishing/malware attack scenarios for sales demos, reducing manual effort by 40% and improving demo preparation time by 30%.
    • Optimized log query performance with BigQuery and pagination, cutting response latency by 35% and enhancing portal responsiveness for large datasets.

Python Developer

BoTree Technologies
May, 2019 - Dec, 2020 (1 yr 7 months)

Intern (Python developer)

BoTree Technologies
Feb, 2019 - May, 2019 (3 months)
    • Developed and deployed scalable web applications by gathering client requirements, adhering to technical standards, and delivering demos.
    • Learning Platform: Designed and implemented a platform offering 50+ courses to 5,000+ active users/month. Optimized AWS S3 with 70% edge caching via CloudFront, reducing data transfer costs from $900/month to $595/month, saving over $300/month (34% reduction).
    • E-Commerce Website: Developed an online shopping platform for 10,000+ users, enhancing scalability by 20% with Celery task queues and increasing customer retention by 15% through Intercom integration. Boosted transaction success rates by 10% with secure payment gateway implementation.

GIC Associate

Aug, 2017 - Nov, 2018 (1 yr 3 months)

Achievements

  • Appreciated by CTO for the project work.
  • CAP Award by Crest Data Systems
  • Appreciation, August 2021, Crest Data Systems: honored with the CAP Award for exceptional performance.

Major Projects

5 Projects

Google Chronicle CLI

Dec, 2020 - Dec, 2022 (2 yr)
    • Chronicle CLI allows customers to manage the various operations that can be performed on Chronicle. It provides a command-line tool to interact with the Feed, Parser, Forwarder, and BigQuery APIs.
    • Worked on this project as a contract-based developer for Google, representing Crest Data.
    • Tools: Python, GCP Services, Celery(Python), Docker for deployment, Protobuf.

TimeSketch Integration

May, 2022 - Dec, 2022 (7 months)
    • Integrated TimeSketch, which performs graph generation, aggregation, and analysis operations on the logs uploaded to it. Built a middleware tool using Cloud Functions that fetches data from the BigQuery database and submits it to TimeSketch; it also keeps the credentials as secrets using GCP services. I worked as project lead.
    • Tools: Python, GCP services (Cloud Functions, Secret Manager, Bucket), Terraform(For deployment).

Mobile and web app for School Learning

Sep, 2020 - Dec, 2020 (3 months)
    • A project that provides educational courses for students in several languages, where students can learn through the various standards and materials available online. I worked on the back end of this project, i.e. managing the APIs and handling the web application.
    • Tools: Django Rest Framework, Postgres, AWS Services for deployment(CloudFront, RDS, Load balancer, EC2 Instances)

Mobile app for teaching courses

Jan, 2020 - Sep, 2020 (8 months)
    • A project that provides educational courses for students in several languages, with mentors guiding mentees through a predefined process. I worked on the back end of this project, i.e. managing the APIs.
    • Tools: Django Rest Framework, Postgres RDS database (AWS), Cron jobs

E-Commerce Online shopping website

Feb, 2019 - Jan, 2020 (11 months)
    • A project to develop a complete e-commerce website for selling products online. It covers product, customer, order, and subscription management and payment integration, along with user relationship management via Intercom integration. It is built entirely in the Django framework.
    • Tools: Django Framework, Postgres, Oscar Framework(E-Commerce library), Celery, AWS deployment

Education

  • B.E. (Computer)

    Government Engineering College, Gandhinagar (2019)
  • Bachelor of Engineering Degree, Faculty of Computer Engineering

    Gujarat Technological University (2019)

Interests

  • Reading
  • Badminton
  • Technology Research
  • Learning
  • Internet Surfing

AI-interview Questions & Answers

    I am a senior software developer with 5.5 years of experience, and I have worked in multiple domains such as education, e-commerce, cyber security, fintech, and email security products. I have expertise in various technologies including Python, the Django framework, FastAPI, Flask, Go, Java, Node.js, and React, and experience working with AWS services such as CloudFront, EC2, and EKS, as well as Kubernetes, Docker Hub, GCP services, Secret Manager, multiple databases including Postgres, MySQL, and MongoDB, and queue-based mechanisms with Celery, RabbitMQ, and Redis. Additionally, I have experience in leading teams, managing team members, developing product architecture, and building product design.

    So, in Python, we usually set a rate limit through a rate-limit function with an attribute called limit, which we set per minute, meaning we allow this many requests per minute. As a backup procedure, once a rate limit is set up and the number of requests is exceeded, we handle it with a proper response: when the limit has been reached, we send a response with a message so that the user knows what happened. The other part of the question is how we handle this when we are integrating with a third-party REST API. With a third-party API, we usually don't control how many requests we are allowed to make. So ideally we should establish a structure where we set our own limit below theirs: if the API has a rate limit of 100 requests per minute, we set ours at 90 requests per minute and add a trigger that fires when we are approaching the threshold, so that we pace our calls to the API accordingly.
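    The 90-out-of-100 idea above can be sketched as a small client-side limiter. This is a minimal illustration, not the setup used in any of the projects listed here: the class name, its methods, and the sliding-window approach are all assumptions chosen for the example.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter allowing at most `max_calls` per `period` seconds.

    Hypothetical helper: set `max_calls` below the provider's limit
    (for example 90 when the provider allows 100 per minute).
    """

    def __init__(self, max_calls=90, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so tests do not need to sleep
        self.calls = deque()  # timestamps of calls inside the current window

    def try_acquire(self):
        """Record the call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def seconds_until_free(self):
        """How long the caller should wait before the next slot opens."""
        if len(self.calls) < self.max_calls:
            return 0.0
        return max(0.0, self.period - (self.clock() - self.calls[0]))
```

    A caller would check try_acquire() before each outgoing request and, when it returns False, either wait for seconds_until_free() or return a "limit reached" message to the user.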

    So, when we use Terraform, basically Terraform works on config files. The config files are responsible for defining the state of the application: the Terraform config file defines what infrastructure the application will have. For example, we set it up for EC2 instances, S3 buckets, and so on. The Terraform config files are created as .tf files. Ideally, we can keep the file in an S3 bucket and use the bucket URL to call it in our environment. In the environment, we set up and refer to the Terraform config file. For multiple environments, we copy it and keep it in a common place. For local development, we can use it locally and avoid committing changes while pushing features and tasks to environments. We can set it up as an environment file or as an S3 bucket with restricted access, accessible only via the S3 URL. This can avoid conflicts between environments. The S3 option is a good way because it's common between all of them, and whoever wants to access it can access it locally with restricted permissions, or in environments via granted access.

    So the question is how we will build a Python application and implement unit tests to ensure the API integration points are reliable. In an API integration, the API responses are what matter most. A successful API call returns a status code of 200. A create API returns a 201. If there's a problem with the request body, the API returns a 400 Bad Request. If there's an authorization or authentication issue, it returns a 401. So when building an API integration, we need to make sure we've covered each of these status codes. To verify that what we've built is correct, we should write unit tests with the unittest module. With the unittest.mock module, we can mock the request body and the URL, and set the response of the API call. We can then say that when we get a 200 for a specific kind of body, our function or view returns a particular response. In the unit test, we prepare an input body, pass it to the call, execute the request, and assert that the response status code and response body we get are equal to the expected output. We repeat this for the different status codes to ensure our integration is reliable.
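    The testing approach described above might look like the following sketch. The create_order function, the fake_session helper, and the example URL are hypothetical names invented for illustration; only unittest.mock.Mock comes from the standard library. The session is passed in as a parameter so the test can replace it without patching any real HTTP library.

```python
from unittest.mock import Mock


def create_order(session, base_url, payload):
    """Hypothetical integration function: POST an order and normalise the result."""
    response = session.post(f"{base_url}/orders", json=payload, timeout=10)
    if response.status_code == 201:
        return {"ok": True, "order": response.json()}
    if response.status_code in (400, 401):
        return {"ok": False, "error": response.status_code}
    raise RuntimeError(f"unexpected status {response.status_code}")


def fake_session(status_code, body=None):
    """Build a mock session whose post() returns a canned response."""
    response = Mock(status_code=status_code)
    response.json.return_value = body or {}
    session = Mock()
    session.post.return_value = response
    return session


def test_created():
    session = fake_session(201, {"id": 7})
    result = create_order(session, "https://api.example.test", {"sku": "A1"})
    assert result == {"ok": True, "order": {"id": 7}}
    # Also check that the outgoing request had the expected URL and body.
    session.post.assert_called_once_with(
        "https://api.example.test/orders", json={"sku": "A1"}, timeout=10
    )


def test_client_errors():
    for status in (400, 401):
        session = fake_session(status)
        result = create_order(session, "https://api.example.test", {})
        assert result == {"ok": False, "error": status}
```

    Each test fixes one status code and asserts on both the returned value and the outgoing call, which is the comparison of expected and actual output described in the answer.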

    So, when we are dealing with high-concurrency operations, ACID stands for atomicity, consistency, isolation, and durability, and each of them is responsible for its own property. Take atomicity with an example: we have a transaction table for bank-related operations, with a transfer between two savings accounts, A and B. If the money is deducted from savings account A and our process is interrupted in the middle, so the money is never transferred to savings account B, then there is a loss of data, or we can say an inconsistency of data. Hence atomicity should be established, which states that the transaction either succeeds or fails as a whole, and on failure there is a rollback. That is what is meant by all or nothing: the transaction should complete entirely and never partially. The transaction should also be durable, so it should not happen that we receive a high load and the transaction is left disrupted in the middle. To ensure that, we should use a with block, which is a context manager, together with transaction.atomic. In Python with Django, inside the with transaction.atomic() block we put whatever database statements we want to execute. This assures that if any failure occurs in that block, the rollback goes back to the first statement and every statement already executed is rolled back. This is what we can have on a relational database. Under concurrency, we can also make sure the ACID properties are maintained by establishing a lock. Together, this preserves atomicity, consistency, and durability without hampering the transaction.
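    The all-or-nothing transfer from the bank example can be demonstrated with the standard library alone. This sketch swaps in sqlite3 for Django: the sqlite3 connection used as a context manager commits when the block succeeds and rolls back when it raises, which is the same behaviour the answer describes for transaction.atomic. The table layout and function names are assumptions made for the example.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (source,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit that was already executed above.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )


def make_demo_db():
    """Create an in-memory database with account A holding 100 and B holding 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn
```

    Calling transfer(conn, "A", "B", 500) on the demo database raises ValueError and leaves both balances unchanged, because the debit from A is rolled back along with everything else in the block.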

    So, for setting up an automated CI/CD pipeline for a Python application, AWS itself provides a deployment pipeline service where we can set up CI/CD. CI/CD means continuous integration and continuous deployment. It applies whenever we have a versioned application where feature development and feature addition tasks are happening. When those tasks are ready to go to production, instead of making a manual effort, like deploying to production, restarting the ECS services or the instances that run the application, and checking and monitoring everything, the CI/CD pipeline comes into the picture and makes this automatic. We can set up actions: for example, we have a GitHub repo, and we establish some actions that trigger the AWS deployment pipeline. How it works is that in GitHub Actions we set up several steps and several checks to ensure that when we merge code into the production branch, it passes all of them. The checks can be, for example, a Python linter, whether the unit test cases pass, whether the versioning is proper, and whether the code works fine. If all the checks pass, we trigger AWS CodePipeline, which deploys first on the master and then automatically deploys that image to the slave instances. That's how a CI/CD pipeline can be set up for a Python application to keep its integration and deployment automated.

    So the question says: examine the Python code for automating finance tasks. Is there a logical error related to the handling of transaction dates? How do you debug or fix it? So here I can see that there's an import of datetime. Okay. And there's a process_transaction function which accepts a transaction. It could be an object, but I can see that it's a dictionary. We're getting today from datetime.date.today, and the if statement checks whether today is greater than the transaction's date; if so, the status is set to processed, followed by further processing steps. I can see a problem here: the transaction date is coming in as received, so first of all we need to convert it to a date object. datetime.date.today gives us a date object, and we're comparing it with greater-than against whatever the transaction holds. That's not an ideal way to do it, because a datetime object should be compared with a datetime object. So the comparison in the if statement itself is wrong; it is like comparing against a string. The ideal way is to have both sides as Python date or datetime objects, and then the built-in comparison covers it, and we can also find the difference between them. As written, the check will not behave reliably unless both values really are in the same form, and string comparison is not suitable for this.
    So instead, today should be taken as a datetime object, and from the transaction data we should take the date, convert it into a datetime object, and then compare. The formats can also differ, because we don't know what format we are going to receive in the transaction's date and what format today will be in. Comparing two different formats is not an ideal way; we should bring both into a single format. So we take today as a datetime object and compare it with the transaction date, which should be parsed into a datetime object in a known format that today can be compared against.
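    The fix described above can be sketched as follows. The original snippet from the interview is not shown on this page, so the function name is_due, the "date" key, and the default ISO-style format are assumptions. One relevant fact: in Python 3, comparing a date object to a str with > does not silently fall back to string comparison, it raises TypeError, so normalising the value first is needed either way.

```python
from datetime import date, datetime


def is_due(transaction, today=None, fmt="%Y-%m-%d"):
    """Return True when the transaction date is strictly before `today`.

    `transaction["date"]` may be a string in `fmt`, a date, or a datetime
    (assumed shapes, since the original snippet is not available).
    """
    today = today or date.today()
    raw = transaction["date"]
    # Normalise the incoming value to a date object before comparing.
    # datetime is checked first because it is a subclass of date.
    if isinstance(raw, datetime):
        tx_date = raw.date()
    elif isinstance(raw, date):
        tx_date = raw
    else:
        tx_date = datetime.strptime(raw, fmt).date()
    return today > tx_date
```

    Passing today explicitly keeps the function easy to test; in production code the default of date.today() would be used.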

    In the Python code snippet for Elasticsearch data matching, please explain the usage of the should clause in the query and how it affects the search results. OK, so we have from elasticsearch import Elasticsearch, and we have an Elasticsearch object for which a query has been built. It's a bool query. Under should, there is a match on title with data and a match on description with automation, and minimum_should_match is 1. The response is the result of the search call on the documents index, with body equal to the query. So the query has two match clauses: one says the title should match data, the other says the description should match automation. The should clause says these are the conditions to filter on when we fetch the data: we look at the title field and the description field and match them accordingly. But minimum_should_match is 1, which means at least one of the two has to match. For example, if a record's description is manual and its title is data, it passes and we include that record. But if the title is not data and the description is manual, we don't include it, because neither of them matches. So the clause ensures that at least one of them matches, either the title or the description.
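    The query discussed above, reconstructed from the spoken description, would have roughly the shape below. The original snippet is not reproduced on this page, so treat the dictionary as an approximation. The matches_locally function is a toy stand-in written for this example; it only imitates the "at least N clauses match" rule with whole-word comparison and does not reproduce Elasticsearch analysis or scoring.

```python
# Query body as described in the answer: two `should` clauses, at least one required.
query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "data"}},
                {"match": {"description": "automation"}},
            ],
            "minimum_should_match": 1,
        }
    }
}


def matches_locally(doc, bool_query):
    """Toy check: count `should` clauses whose term appears as a word in the field."""
    hits = 0
    for clause in bool_query["should"]:
        field, term = next(iter(clause["match"].items()))
        words = str(doc.get(field, "")).lower().split()
        if term.lower() in words:
            hits += 1
    return hits >= bool_query["minimum_should_match"]
```

    In real Elasticsearch, a document that satisfies both should clauses still matches and will generally score higher than one that satisfies only one of them.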

    So, what approach would I take to build a resilient data pipeline that integrates multiple third-party data sources into a single database schema? When dealing with multiple third-party data sources, there is a concept called ETL, which stands for extract, transform, and load. The idea is that we have data coming from multiple sources, while in our own database we have a single schema. To achieve that, the first step is to extract data from the multiple third-party data sources. Let's take an example. We have data from A, B, and C, three third-party integrations: integration A, integration B, and integration C, all with different schemas. We have to build an extraction layer that fetches data from all the sources. Then we have a transformation layer, which is responsible for converting the extracted data into the single database schema that we have on our side. The load layer then loads the data into that schema. Let's take another example. My data is coming from three sources: a GCP bucket, an S3 bucket, and a MongoDB. These are different kinds of storage: two are object stores and one is a non-relational document DB. The transformation layer receives the data and converts it, using a JSON-based mapping we have prepared that maps the received source data into the single database schema. The transformation happens for all the sources of data: integration A, integration B, and integration C. We convert the data and push it to the database. This can be a streamlined process, or we can have a queue mechanism where one queue receives the data, another queue calls the task for processing the data, which is the transformation, and a third queue loads the data.
    In this way, we can establish a process to handle multiple data sources in a single database schema.
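    The mapping-driven transformation layer described above can be sketched in a few lines. The source names, field names, and target columns below are invented for illustration, and the mapping is written as a Python dict where the answer mentions a JSON-based mapping. The load step is passed in as a callable so the sketch stays independent of any particular database.

```python
# Hypothetical per-source mappings: target column -> source field.
MAPPINGS = {
    "integration_a": {"order_id": "id", "total": "amount", "customer": "buyer_name"},
    "integration_b": {"order_id": "orderRef", "total": "grand_total", "customer": "client"},
}


def transform(source, record):
    """Convert one source record into the unified target schema."""
    mapping = MAPPINGS[source]
    missing = [src for src in mapping.values() if src not in record]
    if missing:
        raise KeyError(f"{source} record is missing fields: {missing}")
    return {target: record[src] for target, src in mapping.items()}


def run_pipeline(extracted, load):
    """Transform every extracted (source, record) pair and hand it to `load`.

    Records that fail transformation are collected and returned
    instead of stopping the whole run.
    """
    rejected = []
    for source, record in extracted:
        try:
            load(transform(source, record))
        except KeyError as exc:
            rejected.append((source, record, str(exc)))
    return rejected
```

    Collecting rejected records rather than raising is one way to keep the pipeline resilient: a malformed record from one integration does not block the others, and the rejects can be inspected or retried later.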

    Okay, so how would you automate data extraction from invoices using NLP in Python while ensuring high accuracy in diverse formats? The main thing can be achieved via a queue mechanism. With a queue, we can have pagination-based extraction. When getting the data, we should make sure we're pulling it from the data sources through the queue mechanism in batches. Batches is the word I'll explain. If we're dealing with batches, we're not pulling all the data together; we're pulling it batch by batch with the use of an offset. Using offsets and batches can help with accuracy across diverse formats, because we don't spend a long time in a single request to get the data. We make multiple batch requests and process each batch of data, which lets us handle the diverse formats. This is basically the extraction and transformation part of the ETL process, which converts the data from the diverse formats into the particular format required.
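    The offset-and-batch extraction mentioned above can be sketched as a small generator. The fetch_page callable is a placeholder for whatever actually reads from the data source (an API call, a database query, a bucket listing); its name and its (offset, limit) signature are assumptions for this example.

```python
def iter_batches(fetch_page, batch_size=100):
    """Yield lists of records by calling fetch_page(offset, limit) repeatedly.

    Stops when a page comes back empty or shorter than `batch_size`.
    """
    offset = 0
    while True:
        batch = fetch_page(offset, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return
        offset += len(batch)
```

    Each yielded batch can then be handed to a queue or processed directly, so no single request has to carry the whole dataset.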

    So, the method to diagnose and fix queries that have become too slow in a Python application. This is a common issue when we are using AWS RDS for Postgres. Why do the queries become slow? Sometimes we are using a framework that is responsible for object-relational mapping, or ORM queries. ORM queries are high-level queries which are converted to SQL queries internally. An ORM query can look simple but result in a very complex SQL query. For example, joins are added when we try to combine data; there can be subqueries; with filters we may be taking data from multiple tables or applying many clauses; we might be grouping as well. All of those things happen during query execution, so a query can take a long time when we have huge data. For example, say a table has a million records and we are running queries on it, such as a million employee records, or transaction data where we have millions of transactions and want to filter by customer name. If a normal query is executed without an index on the customer name, it is going to search each and every record until it finds its record, and in the worst case the query is going to take a long time. So the first and foremost thing to use is indexing. We can establish an index on the customer name column, which builds a tree structure for the transaction table so that the data is picked up very quickly. This will improve the performance of the query being executed on the database. The other thing we can do is establish monitoring and alerting on the RDS instance.
    When we establish monitoring and CloudWatch logs on the RDS instance, we can set up metrics. Those metrics and alerts will show which query is taking how much time, and based on that we will know that a particular query is taking too long and can be improved. We can optimize the query either by using indexing or by splitting it into separate queries and executing them differently. So there are multiple options, but the most important point is to monitor the queries being executed on the RDS instance for Postgres. We should keep monitoring that and track how much time each query is taking. Based on that, we can set a threshold: for example, if a query takes more than 10 seconds, we raise an alert, and accordingly we can identify and improve the slow queries.
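    The effect of the index described above can be observed by asking the database for its query plan before and after the index is created. This sketch uses sqlite3 from the standard library as a stand-in so it runs without a Postgres server; on Postgres the comparable tools are EXPLAIN and EXPLAIN ANALYZE. The table, column, and index names are assumptions for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer_name TEXT, amount REAL)"
)
lookup = "SELECT * FROM transactions WHERE customer_name = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, ("Asha",))

conn.execute("CREATE INDEX idx_transactions_customer ON transactions (customer_name)")

# With the index the planner can search by customer_name directly.
plan_after = query_plan(conn, lookup, ("Asha",))
```

    Printing plan_before and plan_after shows the switch from a full table scan to an index search, which is the change the answer relies on to speed up the customer-name filter.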