Vetted Talent

Prakash Premkumar


I am a seasoned backend web app, search engine, and distributed systems developer with 9+ years of professional software development experience. Proficient in Java, Golang, C, C++, and Python, I specialize in high-level design, low-level design, scalability, data structures, and algorithms. Throughout my career, I've delivered production-grade, high-quality software for backend services used by thousands of users. As a Software Engineer 3, I served as the technical owner for the Single Sign-On feature at Workspan Inc, demonstrating leadership, bug resolution, and code refactoring expertise. My extensive skill set includes cloud technologies, web services, REST APIs, MySQL, Golang, Kubernetes, and more.

  • Role

    Software & Compiler Engineer

  • Years of Experience

    10.9 years


Skillsets

  • Kubernetes - 5 Years
  • Spring Boot
  • Raptor
  • PostgreSQL
  • NoSQL
  • Microservices
  • Golang
  • document databases
  • Database Sharding
  • Cassandra
  • Cadence
  • C++
  • C#
  • Amazon Web Services
  • Java - 4 Years
  • Kafka - 3 Years
  • Redis - 4 Years
  • Python - 1 Year
  • MySQL - 8 Years
  • Microsoft Azure - 2 Years
  • C - 9 Years
  • Docker - 4 Years
  • RDBMS - 10 Years
  • Elasticsearch - 1 Year
  • Google Cloud Platform - 4 Years
  • Java - 10 Years

Vetted For

7 Skills
  • Software Development Engineer (SDE) - IIAI Screening
  • Skills assessed: Amazon Web Services (AWS), Microservices, Spring Boot, Good Team Player, Java, PostgreSQL, Problem Solving Attitude
  • Score: 62/100

Professional Summary

10.9 Years
  • Dec, 2024 - Aug, 2025 8 months

    Member of Technical Staff 1, Software Engineer

    PayPal
  • Dec, 2023 - Nov, 2024 11 months

    Professional development

  • Dec, 2022 - Nov, 2023 11 months

    Software Engineer 3

    WorkSpan
  • Apr, 2014 - May, 2017 3 yr 1 month

    Member Technical Staff

    Zoho Corp
  • Jun, 2017 - May, 2019 1 yr 11 months

    Software Engineer 2

    Freshworks
  • Jul, 2019 - Dec, 2022 3 yr 5 months

    Software Engineer 2

    Striim

Applications & Tools Known

  • Google Cloud Platform
  • Microsoft Azure
  • AWS (Amazon Web Services)
  • Java
  • Golang
  • C
  • C++
  • Kubernetes
  • Bitbucket
  • MySQL
  • Apache Cassandra
  • PostgreSQL
  • Redis
  • SQLite
  • Python

Work History

10.9 Years

Member of Technical Staff 1, Software Engineer

PayPal
Dec, 2024 - Aug, 2025 8 months
    Currently working in the Global Solutions Engineering teams, Mexico Rune microwallet sub-team. Built Fulfillment Flow for Add Funds feature from Debit Card to Rune Microwallet, enhancing code fulfillment flow to recognize the new microwallet as a funding instrument and coordinating downstream calls from fulfillment micro-service to downstream services like auth-capture service and settlements. Introduced Type 6 System of Records (SoR) to record the fund transfers in ledger. Worked on a feature to support new BINs (Bank Identification Numbers) from MAESTRO in Mexico, enabling capability to transact with these BINs for Mexico merchants.

Professional development

Dec, 2023 - Nov, 2024 11 months

Software Engineer 3

WorkSpan
Dec, 2022 - Nov, 2023 11 months
    Engineering lead of the SSO (Single Sign-On) feature. Fixed bugs that required manual intervention and refactored the login code flow for readability and efficiency. Worked on the Cisco/Zift Integration feature. Mentored juniors on features like SSO, user provisioning, and ad-hoc features. The login flow supports email-id/password and SSO from Google, Azure, and customer-defined SSO servers. Security settings are configurable and stored in GCP's Datastore NoSQL database. SSO server credentials are stored in the Identity Platform service provided by GCP. The SSO flow is triggered in the front end using Firebase.

Software Engineer 2

Striim
Jul, 2019 - Dec, 2022 3 yr 5 months
    Developed a managed Kafka service from the ground up for Change Data Capture, as engineering lead of the feature. Developed auto scaling for Kafka cluster persistent volumes, a credit microservice for customer billing, a pdfgen microservice for generating invoices, and various REST APIs for the backend, including JWT authentication, password resets, user invitation, and the admin portal. The managed Kafka service was built on Kubernetes in Google Cloud Platform using the uber/cadence workflow orchestration engine. Implemented a cluster pool algorithm for faster cluster provisioning. The credit microservice handled billing and usage display. The pdfgen microservice created PDFs from HTML/CSS templates for invoices.

Software Engineer 2

Freshworks
Jun, 2017 - May, 2019 1 yr 11 months
    Developed a horizontally scalable backend system to poll and fetch emails via IMAP Protocol in Java. Designed system to poll high number of mailboxes asynchronously and convert emails to support tickets. Developed microservices including Polling System, Fetching System, Poller Fetcher System, Mailbox Error Notification System, Failed UID Handling System, and Mailbox Configuration System. Implemented AWS SQS based email retrieval and Poller retrieval systems. Developed failure handling systems for email retrieval and connectivity errors. Successfully deployed to production in AWS, used by thousands of customers across Freshdesk, Freshservice, Freshsales, and Freshteam.

Member Technical Staff

Zoho Corp
Apr, 2014 - May, 2017 3 yr 1 month
    Developed the code generator for a new programming language compiler, converting high level language code to C code. Engineering lead of the feature. Converted constructs like classes, functions, database queries, loops, etc. to C code. Prototyped a NoSQL backend using Cassandra. Worked on code generation phase, converting language classes to C structs and statements to SQL queries. Developed code generator in Golang for functions, transactions, SQL, control flow, assignments, etc. Used reference counting and cycle detection with Boehm's garbage collector.

Achievements

  • ACM ICPC Indian Regional Finalist, 2011 and 2012
  • Aspirations 2020, Programming Contests held by Infosys, 2011 State Final Winner (Tamil Nadu) and National Finalist (4th Position)

Major Projects

2 Projects

Data Structures With Spatial and Temporal Locality and Lock-Free Concurrency for Building Fast Databases and Caches

    Independent research project on data structures with spatial and temporal locality and lock-free concurrency for building fast databases and caches. Technologies: C, concurrency, databases.

Ahead of Time Search Ranking Algorithm

    Developed an ahead of time search ranking algorithm to rank web search engine results based on the search query. Pre-computed ranking generates results for search queries. Technologies: algorithms, search engines.

Education

  • B.Tech in Computer Science & Engineering

    SASTRA University (2013)

Interests

  • Technology Research
  • Watching Football

AI-interview Questions & Answers

    I'm Prakash Premkumar. I'm a backend distributed systems, search engine, and web application developer. I have a B.Tech degree from SASTRA University, Thanjavur, where I graduated with a CGPA of 9.26, and I'm currently pursuing an MCA at the same university. I have close to 10 years of experience building high-scale web applications. I have experience in Java, Golang, C++, Amazon Web Services, Google Cloud Platform, and Microsoft Azure. On databases, I've worked with PostgreSQL, MySQL, and Cassandra. On caches, I've worked with Redis. I also have a good knowledge of Kubernetes.

    I'd like to call out some of my major projects. One was building a managed Kafka service at a company called Striim, where I previously worked. We used Kubernetes on GKE. On top of it, we had to deploy Kafka clusters that could be spun up much like MySQL Cloud SQL instances in Google Cloud Platform or RDS clusters in AWS. Another interesting project was an offline search ranking algorithm. Search is an area of great interest. I built an algorithm to rank search results offline based on the query the user is entering. Even before the query arrives, a predefined algorithm ranks results based on the number of words in the search query and presents the results based on that.

    Another important project was at Freshworks, where I built a system that uses the IMAP protocol to connect to many thousands of email servers. These servers have many username/password combinations along with different server names and port numbers. The system retrieves emails from the servers as soon as they arrive at the email ID and converts them into support tickets in Freshdesk. It is a major project in Freshdesk because everything relies on customer emails, which are converted to support tickets. That's one of the very cool projects I worked on with the platform team of Freshdesk. At my first company, Zoho, my first project was a code generator for the compiler of a new language Zoho was developing. That's a quick introduction about myself.

    One thing we can do is indexing. Indexing the columns that the query filters and sorts on can help us improve the performance of the query. Then, for a heavy-traffic page, we can paginate the query result. If the result has 1,000 rows, we will not display all 1,000 rows at once to the user. We may display the first 10 results, then the second 10, the third 10, and so on. So that is indexing and pagination.

    The third is materialized views, which are very useful in PostgreSQL. We can take queries that are used very frequently and store their results in materialized views. Whenever the same query comes in again, we can just look at the materialized view, which already has the data built in, and return the value to the user. In PostgreSQL, we have to refresh the materialized view ourselves. We can have a cron job, or track the last refresh time of the materialized view, and refresh it every 5 minutes or so depending on the specific use case.

    Then caching. Caching is the most prominent technique: we store the results of the query in a cache, and if the records that built the cached result haven't been updated, we can use the cache instead of going to the database. We can keep the cache and the database in sync with change data capture, where we read from the transaction logs of the database and propagate the changes to the cache server.

    Another thing is sharding. In a web application with heavy traffic, we need sharding. If all the users hit the same shard, the performance of the system is going to suffer, so we have to shard our data across multiple clusters. After sharding comes replication: we can have replicas of the same shard. Let's say we have 5 shards. Each shard will have its own read replicas, say 2 or 3. All the read queries can go to the read replicas, and only the writes go to the master. So we have discussed indexing, caching, pagination, sharding, replication, and materialized views. These are the techniques I would suggest to optimize a database.
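
    The pagination step described above can be sketched as follows. This is a minimal illustration only, using an in-memory SQLite table; the table name `results`, its columns, and the helper names are assumptions made for the example and do not come from the interview.

```python
import sqlite3


def make_demo_db(row_count):
    """Create a hypothetical in-memory table with row_count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO results (id, title) VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, row_count + 1)],
    )
    return conn


def fetch_page(conn, page_number, page_size=10):
    """Return one page of rows (page_number starts at 1) instead of the full result."""
    offset = (page_number - 1) * page_size
    cursor = conn.execute(
        "SELECT id, title FROM results ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cursor.fetchall()
```

    With 25 rows and a page size of 10, pages 1 and 2 hold 10 rows each and page 3 holds the remaining 5. For very deep pages, filtering on the last seen id instead of using OFFSET is a common alternative.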

    How can you create a change history of a table which is being written to by many systems, some of which are not even known? We can use the transaction logs of the database. In PostgreSQL we have the write-ahead log, and in MySQL we have the binlog. We can collect all the changes that have happened to the table by reading these transaction logs. We can then stream the log into a messaging queue like Kafka, and from there consume it from multiple receivers: we can display it in a dashboard, or sync it to another database. So by reading the transaction logs, like the PostgreSQL write-ahead log or the MySQL binlog, we can create a change history of a table.
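
    As a rough sketch of the consuming side, the snippet below groups change events into a per-row history. The event shape (`lsn`, `op`, `key`, `after`) is an assumption made for illustration; real log readers and CDC tools each define their own event format.

```python
from collections import defaultdict


def build_change_history(events):
    """Group change events by row key, ordered by log sequence number.

    Each event is a dict with hypothetical fields:
    lsn (log position), op (insert/update/delete), key, after (new row or None).
    """
    history = defaultdict(list)
    for event in events:
        history[event["key"]].append((event["lsn"], event["op"], event["after"]))
    for changes in history.values():
        # Events may arrive out of order from different consumers, so sort by lsn.
        changes.sort(key=lambda change: change[0])
    return dict(history)
```

    Because the history is derived from the log rather than from the writers, it captures changes made by every system, including the unknown ones.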

    How will you monitor an SQS queue, or any other queue, in production? In an SQS-like queue there will be multiple nodes or machines storing the data, and we have to monitor the health of each node independently. Each node will have its own replicas. For example, let's say three partitions are handling the data, and each has its own replicas. If the first node goes down, requests will go to its replica node. If the master as well as all of its replicas are down, then that partition is down, and we won't be able to read or write any data on it. We can use software like Prometheus and Grafana for the monitoring.

    The first thing is the node health check: every shard should be up, and every shard's replicas should be up. If a particular shard and all its replicas are gone, then the system is down. The second is persistent volume or persistent disk usage. In a queue like SQS, data gets written to disk, and if the disk is full we won't be able to write more data. If the disk is getting full, we have to automatically increase the disk size of that machine in the cluster. Then we can look at RAM usage. If RAM usage is going up too much, we have to think about whether we should add more nodes. If the 5 machines in the cluster are not able to handle the load, we have to scale horizontally so that the load gets distributed across more nodes.

    So: RAM usage, disk usage, node health checks, and we should always look at the logs in case there is a bug in the queue software. These are the various techniques by which we can monitor a queue in production.

    The difference between rank and row number in window functions. I think row number gives the ID; it is something related to the ID. And I think rank relates to the ORDER BY: once we do an ORDER BY, the position of the row in that sorted order is the rank. Row number, I think, is the ID of the row in the table, or the number of the row in it.
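
    For reference, the behaviour of the two functions can be checked with a small query. In standard SQL, ROW_NUMBER() assigns consecutive numbers within the window ordering, while RANK() gives tied rows the same number and then skips. The sketch below uses SQLite (which needs version 3.25 or later for window functions); the `scores` table and its data are made up for the example.

```python
import sqlite3


def rank_vs_row_number():
    """Return (name, score, row_number, rank) rows for a tiny hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("a", 90), ("b", 90), ("c", 80)],
    )
    query = """
        SELECT name,
               score,
               ROW_NUMBER() OVER (ORDER BY score DESC, name) AS rn,
               RANK()       OVER (ORDER BY score DESC)       AS rk
        FROM scores
        ORDER BY rn
    """
    return conn.execute(query).fetchall()
```

    Here the two rows tied at 90 get row numbers 1 and 2 but share rank 1, and the next row gets rank 3.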

    I think vacuum is used to compact the B-tree in Postgres. Vacuum is used to recover lost space. Let's say the disk has some fragmentation, because there were multiple inserts followed by multiple deletes. When the deletes happened, some of the space on the disk became empty, but it is not usable anymore. Let's say the layout is 50 GB of data, followed by 20 GB of free space, followed by 50 GB of data, followed by 20 GB of free space, so the entire size is about 140 GB. The 20 GB of free space sits in the middle, so it's not usable. In order to make it usable, we have to use the vacuum command. Vacuum will compact the usage on the hard disk, and we get more space at the end of the disk to write more data. So this is the classic problem of fragmentation.

    Which principle is at issue here, and how would you refactor this code? There's a system health check class with a list of clients. For each client, the code calls the client's health check, adds the service health under the client's service name, and returns the health. What is happening is that we have a list called clients, and we are trying to treat each element as an ExternalServiceClient, but the type casting is not happening here. So the Liskov substitution principle has been inverted here.

    In the first line, private final List clients, List is a generic class, and no type is mentioned there. If we declare it as List<ExternalServiceClient> clients, the for loop should work the way it is written. That is the first error I can find. The second is that in the constructor parameter we also have to specify the generic type for the list. If these two errors are fixed, the code should work fine.

    Another thing I can see is that the system health is not a list, and we don't know how that class is defined. Mapping a list to a list would be a good thing: we take a list of clients and return a list where each element corresponds, index-wise, to one input. That way the code will be more readable. Rather than adding everything to a single system health object, we can return a list of system health objects. That would be a better approach.

    I'll look at this query and explain why there will be performance issues when searching across different locations, and how to optimize it. It selects from rentals with a LIKE condition on the location and a condition on the price, which is a decimal. The location is the key factor, and the LIKE query could be the reason. We have to index both the location column and the price column. For the location column, we may have to use a GIN index, a generalized inverted index. For the price column, a regular B-tree index will do. If you create a B-tree index for the price column and a generalized inverted index for the location column, the search will be fast.

    The LIKE pattern here is not a prefix or a suffix match. The location pattern has a percentage sign at the beginning and at the end, which means that a string appearing anywhere in the middle of the value should also be selected. That's where the performance bottleneck will happen. If it were a prefix search, we could have used a B-tree index. Because it can match anywhere in the middle, a suffix tree would be a good data structure to solve this problem: we can create a compact trie of all suffixes. So a suffix-tree index for the location column would be suitable. The LIKE operator on location with a percentage sign on both sides is the reason for the slower performance.
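
    One way to see how an inverted index can serve a `%text%` match is a trigram index, which is the idea behind PostgreSQL's pg_trgm extension used with a GIN index. The sketch below is a simplified, hypothetical version in plain Python (it does not pad or normalize the way pg_trgm does): it indexes every 3-character substring, intersects the posting lists for the search term, and then verifies the candidates.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of 3-character substrings of text, lowercased."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(rows):
    """Build an inverted index: trigram -> set of row ids. rows maps id -> location."""
    index = defaultdict(set)
    for row_id, location in rows.items():
        for gram in trigrams(location):
            index[gram].add(row_id)
    return index


def search_substring(rows, index, needle):
    """Find row ids whose location contains needle anywhere (like '%needle%')."""
    grams = trigrams(needle)
    if not grams:
        # Needle shorter than 3 characters: the index cannot help, scan everything.
        candidates = set(rows)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Verify candidates, since sharing trigrams does not guarantee a real match.
    return sorted(r for r in candidates if needle.lower() in rows[r].lower())
```

    Only rows sharing all of the needle's trigrams are checked, so most rows are never scanned.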

    Database deadlocks in a highly concurrent microservice environment. Deadlocks can be thought of in terms of a resource allocation graph, where the first thread is looking for a resource held by the second thread, and the second thread is looking for a resource held by the first thread. In a resource allocation graph, if you have a cycle, it means that you have a deadlock. Let's assume a scenario like this: one microservice is trying to update row 1 and row 2, and another microservice is trying to update the same rows in the opposite order. Microservice 1 will first update row 1 and then row 2. Microservice 2 will update row 2 and then row 1. Now these two transactions happen at the same moment. Microservice 1 gets a lock on row 1, and microservice 2 gets a lock on row 2. Microservice 1 next tries to update row 2 while holding its lock on row 1. Microservice 2 tries to update row 1 while holding its lock on row 2. This is a scenario for a deadlock.

    When different microservices are trying to update the same table, we can get this problem. We can prevent it by separating the domains of the microservices. If microservice 1 owns tables 1, 2, and 3, microservice 2 should never update those tables. Instead, microservice 2 should make a REST API call to microservice 1 to update the rows in tables 1, 2, and 3. So clear domain-wise segregation of tables and microservices will help prevent the problem.
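
    The cycle check on the graph can be sketched as a depth-first search over a wait-for graph, where an edge from A to B means A is waiting for a lock that B holds. The graph representation and the service names are assumptions made for the example.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps a transaction (or service) name to the names it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for blocker in wait_for.get(node, ()):
            state = color.get(blocker, WHITE)
            if state == GRAY:
                return True  # back edge: the blocker is already on the path
            if state == WHITE and visit(blocker):
                return True
        color[node] = BLACK
        return False

    return any(
        color.get(node, WHITE) == WHITE and visit(node) for node in list(wait_for)
    )
```

    In the scenario above, microservice 1 waits on microservice 2 (for row 2) and microservice 2 waits on microservice 1 (for row 1), which forms a cycle.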

    A microservice is failing under high load. Which key metrics would you investigate, such as CPU usage and memory leaks, and what problem-solving approach would you take? If it's failing at high load, it could be that a lot of RAM is being used and CPU usage is also high. If the microservice is stateful, disk usage could be very high. Even if it's not stateful, if it's creating resources on disk, we have to look at disk usage and check whether unwanted files are being cleaned up after the computation. CPU usage should be checked. And memory leaks: even in garbage-collected languages there's a chance of slight memory leaks, so those should be checked too.

    For RAM usage, we can check whether the application is using a thread pool. If the application is creating too many threads, each thread has its own stack space in memory, and since each thread is alive, memory grows with the number of threads. If you set up a thread pool with a fixed number of threads, then irrespective of how many requests come to the microservice, the memory will not shoot up. Beyond that, we should scale the system horizontally by adding new nodes to serve the new requests that are coming in. So thread pools reduce the load on RAM, and we can monitor disk usage and CPU usage.

    Then we can look at the code. We can look at the scenario-specific application code and check whether it is efficient. Whether the code runs in order of n or order of log n time can be used as a benchmark, and we can investigate its space and time complexity. We can also check for resource leaks: if the code is opening a file or a socket and not closing it, we have to close the buffers, files, database connections, and sockets. So resource leaks and memory leaks are two more things to look at.

    We can't always perform these checks in production, so we can deploy the same code in a test, developer, or staging environment, simulate high load on the system, and see why it is failing. We can also check external connections: connection failures to the database, to the caches, or to the SQS queue the service is listening to. Are the external services failing, or are the internal microservices failing? These are some of the things we can consider.
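
    The fixed-size thread pool idea can be sketched with Python's standard `concurrent.futures` module. The handler and the pool size of 4 are placeholders for illustration; the point is only that the number of live worker threads stays bounded no matter how many requests arrive.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_all(requests, max_workers=4):
    """Process every request on a bounded pool; return results and threads used."""
    seen_threads = set()
    lock = threading.Lock()

    def handle(request):
        # Record which worker thread ran this request (for demonstration only).
        with lock:
            seen_threads.add(threading.get_ident())
        return request * 2  # stand-in for real request handling

    # The pool never creates more than max_workers threads; extra work queues up.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(handle, requests))
    return results, len(seen_threads)
```

    Calling `process_all(range(100))` handles 100 requests while using at most 4 threads, so stack memory does not grow with the request count.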

    Can you use auto scaling in AWS to efficiently manage sudden spikes in traffic and compute resources? We can definitely use auto scaling. Kubernetes itself has auto scaling. We can use the AWS and Kubernetes auto scaling techniques to detect spikes and scale our system to cater to the high traffic. It also works in the other direction. Take a website that not many people use on, say, a Monday morning at 10 AM, because everybody is rushing to the office at that time, so the number of requests is lower. We can rely on the auto scaling techniques in AWS and Kubernetes to reduce the number of nodes serving the requests. But we should always have a max scale and a min scale. For example, the min scale is that we should always have 10 nodes, and the max scale is that we should not have more than 100 nodes. These numbers, 10 and 100, have to be found based on the specific use case of the microservice. So we should have efficient auto scaling, but the node count should always stay between the min and the max.
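
    The min/max bound can be sketched as a small calculation. The proportional rule below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric)); the function name and the default bounds of 10 and 100 are just the example values from the answer, not recommendations.

```python
import math


def desired_nodes(current_nodes, current_load, target_load, min_nodes=10, max_nodes=100):
    """Scale proportionally to load, then clamp to the [min_nodes, max_nodes] range."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    raw = math.ceil(current_nodes * current_load / target_load)
    return max(min_nodes, min(max_nodes, raw))
```

    With 20 nodes at 90% load and a 60% target this asks for 30 nodes; a quiet period is floored at 10 nodes, and a large spike is capped at 100.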

    In a Spring Boot application, we can always use the CDC-based caching technique. There are multiple caching techniques, like write-through and write-behind caches, among others. But with the CDC-based caching technique, when you insert, update, or delete a record in the database, we read the transaction logs of the database and stream them to Kafka. From Kafka, we sync the changes to a cache. This way, the application doesn't have to worry about populating the cache at all; it only has to worry about reading from the cache, and no application will ever write to the cache directly. If a particular query is doing a complex SQL operation, we can also read its result and write it to the cache. But we can always use change data capture to keep the database and the cache in sync. Spring Boot can connect to the cache and read a particular record by ID if it's present in the cache. If it's not present, it can go to the database. So CDC-based caching can be employed in Spring Boot to improve the performance of the application.
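
    The read path and the CDC apply path described above can be sketched as follows. This is a language-neutral illustration in Python rather than Spring Boot code: plain dictionaries stand in for the database and the cache, and the event fields (`op`, `key`, `after`) are assumed for the example.

```python
class CdcCache:
    """Cache that is only written by change events, never by the application."""

    def __init__(self, database):
        self._database = database  # dict standing in for the real database
        self._cache = {}

    def apply_change(self, event):
        """Apply one change event consumed from the log stream (hypothetical shape)."""
        if event["op"] == "delete":
            self._cache.pop(event["key"], None)
        else:  # insert or update
            self._cache[event["key"]] = event["after"]

    def read(self, key):
        """Read from the cache first; fall back to the database on a miss."""
        if key in self._cache:
            return self._cache[key], "cache"
        return self._database.get(key), "database"
```

    The application only ever calls `read`; a separate consumer calls `apply_change` for each event, which keeps the write path to the cache in one place.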