Vetted Talent

Kislay Srivastava


I have over 8 years of experience in Python development. I have worked primarily with the Django and Flask frameworks to create scalable web applications and deploy them on the cloud.

I am confident in my ability to take on complex projects and provide innovative solutions that meet the needs of clients.

Role
Senior Software Engineer
Years of Experience
9 years
Professional Portfolio
View here

Skillsets

NoSQL
Websockets
REST
ReactJs
Microservices
HTML
ETL
Docker
dbt
CSS
Channels
BigQuery
Airflow - 4.0 Years
FastAPI
Django
Python
SQL - 5.0 Years
Snowflake
Python - 8.0 Years
Git
Django - 5.0 Years
AWS - 5.0 Years

Vetted For

13 Skills

Roles & Skills
Results
Details

Data Engineer || (Remote) AI Screening
76%

Skills assessed: Airflow, Data Governance, Machine Learning and Data Science, BigQuery, ETL processes, Hive, Relational DB, Snowflake, Hadoop, Java, PostgreSQL, Python, SQL
Score: 68/90

Professional Summary

9 Years

Apr, 2025 - Jun, 2025 2 months
Senior Software Engineer
Mphasis
Mar, 2025 - Apr, 2025 1 month
Technical Lead
NextGenInvent
Nov, 2022 - Sep, 2024 1 yr 10 months
Senior Software Engineer
Miratech
May, 2022 - Nov, 2022 6 months
Senior Backend Engineer
Apisero Integration
Apr, 2022 - May, 2022 1 month
Senior Software Engineer
IGT Solutions
Jul, 2021 - Mar, 2022 8 months
Senior Software Engineer
L&T Infotech
May, 2021 - Jul, 2021 2 months
Senior Software Engineer
Concentrix
Dec, 2015 - May, 2021 5 yr 5 months
Senior Software Engineer
Infosys

Applications & Tools Known

Python
PySpark
AWS (Amazon Web Services)
Apache Airflow
Snowflake
MySQL
Docker
Kubernetes
Django
Flask
Athena
Azure Databricks
Tableau
AWS S3
Azure Blob Storage
AWS Glue
TensorFlow
Pandas
AWS RDS
AWS Fargate
Kafka
Jenkins
AWS Elastic Beanstalk
Django REST Framework
CI/CD
Azure App Service
Databricks
AWS EMR
JDBC
React
HTML
CSS
JavaScript

Work History

9 Years

Senior Software Engineer

Mphasis

Apr, 2025 - Jun, 2025 2 months

My client was a prominent bank, and the project was based on analysis of organisational data. As lead of the data team, I recreated and enhanced the existing data pipelines based on client inputs. The ETL system was triggered through scheduled Airflow jobs and used GCP BigQuery as the data warehouse. Transformations were carried out through SQL models deployed on dbt.

Technical Lead

NextGenInvent

Mar, 2025 - Apr, 2025 1 month

My client was a prominent health-lab startup looking to improve its deployed solution. My responsibility was primarily to mentor and motivate junior developers to deliver to the client's expectations. The project involved a complete overhaul of the existing application, including optimizations on both the frontend and the backend.

Senior Software Engineer

Miratech

Nov, 2022 - Sep, 2024 1 yr 10 months

My client is the world's leading asset manager, and the project was based on financial and markets analysis. As part of the index data team, I was in charge of enhancing and redeveloping our data platform application. The data platform is part of a Django-based web application deployed on AWS EC2. It was fully cloud-native and auto-scalable, using Docker containers for scalability. The application followed a microservices architecture and used REST APIs for communication. I achieved 17% higher throughput for my workflows by using asynchronous processing and event-based triggers driven by AWS SNS events.

Senior Backend Engineer

Apisero Integration

May, 2022 - Nov, 2022 6 months

My client was a global supply chain platform, and I worked as the enabler of new integration pipelines. The application needed to enable real-time tracking and communication to streamline reporting. My team and I chose a WebSockets and Django Channels approach to enable real-time, asynchronous processing. I was responsible for recreating the system's REST APIs and making them more standardized. I also leveraged React, HTML and CSS to make the user interface more responsive.

Senior Software Engineer

IGT Solutions

Apr, 2022 - May, 2022 1 month

My designation was Senior Data Engineer, and I was in charge of creating data pipelines for some of our vendors. The data wrangling and manipulation scripts were written in Python and hosted on AWS ECS Fargate containers. After the quality of the incoming data was ascertained, the cleansed data was pushed to a separate curated bucket.

Senior Software Engineer

L&T Infotech

Jul, 2021 - Mar, 2022 8 months

My designation was Specialist Data Engineer, and I was in charge of creating pipelines specifically for the Asia Pacific region. The data wrangling and manipulation scripts were written in Python and hosted on AWS ECS Fargate containers. After the quality of the incoming data was ascertained, the cleansed data was pushed to a separate curated bucket. Validation was carried out using Great Expectations, and the data was then loaded into the Snowflake data warehouse using Python connectors.

Senior Software Engineer

Concentrix

May, 2021 - Jul, 2021 2 months

I created and enhanced UI and backend scripts for my client. The API scripts were written in Python and the UI was based on React frontend.

Senior Software Engineer

Infosys

Dec, 2015 - May, 2021 5 yr 5 months

Worked with various end clients, including clients in the financial domain. Created and deployed several Django-based web applications on the cloud, following SDLC best practices. Worked with HTML, CSS and React to optimize the UI of my applications. Used various AWS services such as SNS, SQS, Lambda and API Gateway to develop event-driven software. Recently worked in a microservices landscape and used FastAPI to build certain Python API endpoints. The main objective was to create web applications according to the clients' mandate and deploy them, mostly on cloud infrastructure.

Achievements

Developed fan-out mechanisms to share transformed data with the registered clients.
Successfully developed and tested end-to-end ETL pipelines for automated ingestion, storing the results in a cloud warehouse.
Created and deployed several Django-based web applications on the cloud.
Revised concepts of RDBMS, DW and Python development.
PySpark and Hadoop technologies played a key part in the final capstone project.
Studied SDLC concepts and pipelines in the data cloud.
Also worked on backend frameworks like Django and Flask, along with containerized services.
Learned to use MS Azure services for creating event-based data pipelines.
Created Big Data pipelines using GCP services.
Used the TensorFlow library for creating ML models.
Mostly worked with the Pandas and PySpark APIs of Python.
Project work and lab sessions used IBM DSX (Data Science Experience) as the service provider.
IBM Certified Backend Engineer
Infosys Python Associate
Infosys certified Python developer
Architecting Solutions on AWS

Major Projects

4 Projects

Enterprise Data Platform

BlackRock Pvt Ltd

Nov, 2022 - Present 3 yr 1 month

My team and I were tasked with identifying a viable alternative to the pre-existing Index data platform being used (mostly Perl and SAP Sybase database along with several in-house antiquated tools).

As part of our modernization, we moved from a traditional architecture to a more cloud-native approach.
The next priority was to become as vendor-agnostic as possible, which led to the choice of Snowflake with dbt as the ELT framework.
I was regularly involved in POCs for enhancing the existing Index Platform.
As a Python developer, I was in charge of understanding the legacy Perl code and migrating it to more efficient Python scripts.

Offline Verification of Digital Signatures using ANN models

Created a neural network classifier for offline signature verification, built with supervised learning algorithms.

Financial Services Guidance

Ameriprise Financial Services

Aug, 2019 - May, 2021 1 yr 9 months

Python/PySpark developer, AWS cloud (client: Ameriprise Financial Services, 2019 to 2021): I was in charge of a data migration project in which data was ingested through AWS Glue and processed downstream by PySpark programs running on EMR clusters. The data was then written to an S3 bucket and analyzed using Amazon Athena.

Spreadsheet comparator app

Morgan Stanley

May, 2016 - Aug, 2019 3 yr 3 months

Python Django Developer (client: Morgan Stanley, 2016 to 2019): My team and I were tasked with creating and maintaining a simple MVC app that performed minimal transformations on input files and wrote the transformed files to an AWS S3 bucket location.

Education

Master of Technology, Computer Science
IIT Dhanbad (2015)
Bachelor of Technology, Computer Science
SRM University, Chennai (2011)

Certifications

IBM Certified Data Engineer (07/2022 - present)
IBM Certified Data Science Professional (09/2019 - present)
Infosys Certified Python Associate
GCP Big Data and ML Engineer (01/2020 - present)
MS Azure for Data Engineering (08/2022 - present)
IBM Backend Developer (01/2023 - present)
IBM Certified Backend Engineer
Infosys Certified Python Developer
Meta Backend Developer
Architecting Solutions on AWS (01/2024 - present)

Interests

Travelling

Watching Movies

Exercise

Cricket

AI-interview Questions & Answers

Help us understand more about your background by giving a brief introduction of yourself. Actually, I'm Kislay. I have around 8.5 years of experience as a Python data engineer, and I've worked with backend as well as some frontend technologies. Basically, I'm in charge of creating data platforms from scratch; more specifically, cloud-native data pipelines. I've worked with various data warehouses like Snowflake and Amazon Redshift, and on certain occasions IBM's cloud data warehouse. Most of my work centers on creating, managing and enhancing data pipelines. In my current project, my day-to-day activity includes creating new data pipelines and enhancing, or suggesting enhancements to, the existing architecture. I'm constantly working with the enterprise architects to design a new, more robust pipeline system. My actual role is senior software engineer in the data department. So, yeah, that's a brief introduction of myself. I hope I'll get a chance to explain it further.

What would be your strategy for migrating an existing ETL process from a Hadoop cluster to BigQuery? Basically, if you want to migrate an existing ETL solution, I'll use Google Cloud Storage (GCS) as my data landing zone, and I'll use an orchestrator like Apache Airflow to extract data from the source repository, in our case the GCS bucket the data lands in. From then onwards, I'll write manual transformation scripts and run them on Google Dataproc; I think Dataproc would be the most suitable tool for this. So I would handle data processing with Dataproc, and then, as the final ETL step, move the data into my BigQuery data warehouse, again scheduled through Apache Airflow. Once the data lands in BigQuery, we can easily analyze it and create visualizations using Tableau, Looker or any other dashboard tool. I also have to consider load balancing, though that depends on the case. So, yeah, that would be my broad approach.
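Airflow models a pipeline like this as a DAG of tasks and starts each task only after its upstream tasks have finished. The sketch below does not use Airflow's API; it uses the standard library's `graphlib` to show how a dependency graph for the GCS, Dataproc and BigQuery steps above resolves into a run order. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names for the GCS -> Dataproc -> BigQuery migration described above.
# Each key maps a task to the set of upstream tasks that must finish before it runs.
MIGRATION_DAG: dict[str, set[str]] = {
    "land_files_in_gcs": set(),
    "transform_on_dataproc": {"land_files_in_gcs"},
    "load_into_bigquery": {"transform_on_dataproc"},
    "refresh_dashboards": {"load_into_bigquery"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return a run order in which every task comes after all of its upstream tasks."""
    return list(TopologicalSorter(dag).static_order())


def has_cycle(dag: dict[str, set[str]]) -> bool:
    """A DAG must be acyclic; graphlib raises CycleError when it is not."""
    try:
        execution_order(dag)
    except CycleError:
        return True
    return False


if __name__ == "__main__":
    print(execution_order(MIGRATION_DAG))
```

In a real deployment the same ordering would be declared between Airflow operators; the point here is only that the orchestrator derives execution order from declared dependencies and rejects cycles.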

How would you use Python to develop complex ETL workflows involving multiple data sources and targets? In my previous projects, we used the PySpark API of Python to integrate various sources of data. As you said, there is a variety of data operators supported by Airflow, which we used as our scheduler and orchestrator. Apache Airflow allows us to use Python code along with its built-in operators, and these make it very easy to integrate data. Along with that, we can run our Python or PySpark code on Dataproc clusters. Dataproc is similar to AWS EMR: it is a managed Hadoop service provided by Google. So I think multiple data sources can be handled very well using the PySpark API, running on Dataproc clusters.
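The core of integrating several sources is normalizing each one into a shared schema before joining. A minimal standard-library sketch, assuming two hypothetical sources (a CSV feed and a JSON feed that name the join key differently); in the setup described above, this logic would run as PySpark on Dataproc rather than in plain Python.

```python
import csv
import io
import json

# Two hypothetical sources with different formats and different names for the same key.
CSV_SOURCE = "order_id,amount\n1,250.0\n2,99.5\n3,10.0\n"
JSON_SOURCE = '[{"orderId": 1, "region": "APAC"}, {"orderId": 2, "region": "EMEA"}]'


def extract_csv(text: str) -> dict[int, float]:
    """Parse order amounts from the CSV source, keyed by order id."""
    return {int(row["order_id"]): float(row["amount"]) for row in csv.DictReader(io.StringIO(text))}


def extract_json(text: str) -> dict[int, str]:
    """Parse order regions from the JSON source, mapping 'orderId' onto the shared key."""
    return {int(item["orderId"]): item["region"] for item in json.loads(text)}


def integrate(amounts: dict[int, float], regions: dict[int, str]) -> list[dict]:
    """Inner-join both sources on order id into one target schema, ordered by id."""
    return [
        {"order_id": oid, "amount": amounts[oid], "region": regions[oid]}
        for oid in sorted(amounts.keys() & regions.keys())
    ]


if __name__ == "__main__":
    print(integrate(extract_csv(CSV_SOURCE), extract_json(JSON_SOURCE)))
```

Order 3 has no region and is dropped by the inner join; a real pipeline would usually route such records to a quarantine target instead.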

How would you design a scalable ETL pipeline using Hadoop and Hive? In a scalable ETL pipeline, our source system can be Hadoop cluster storage, basically HDFS or Hive tables. If it were a legacy system, we could process the data with MapReduce, but in modern cases we use Spark and the PySpark APIs for processing. PySpark processing is in-memory, so it is much faster than Hadoop's MapReduce paradigm. Processing large datasets can be achieved by writing PySpark programs: creating a Spark context and a Spark session object, and then performing calculations and transformations through the DataFrame and RDD APIs. Big data processing is actually very simple using PySpark; we can use the familiar DataFrame, Dataset or RDD structures to create transformation pipelines. A PySpark cluster would be more than enough to handle big Hadoop and Hive datasets. So I think a scalable ETL pipeline would involve PySpark processing running on Dataproc as a service. So, yeah, that's what I suppose the answer should be.
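Both Hadoop MapReduce and Spark's key-based aggregations rest on the same map, shuffle and reduce stages. The single-process sketch below only illustrates those stages on hypothetical `(region, amount)` records; it is not Spark code. In PySpark the same aggregation would typically be written as `rdd.reduceByKey(operator.add)` or `df.groupBy("region").sum("amount")`, with the stages distributed across the cluster.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Hypothetical input records: (region, sales_amount).
RECORDS = [("APAC", 10), ("EMEA", 5), ("APAC", 7), ("AMER", 3), ("EMEA", 1)]


def map_phase(rows):
    """Emit (key, value) pairs; each record already is a key/value pair here."""
    for region, amount in rows:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, as the framework does between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's values into a single total."""
    return {key: reduce(add, values) for key, values in groups.items()}


if __name__ == "__main__":
    print(reduce_phase(shuffle_phase(map_phase(RECORDS))))
```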

How would you optimize data storage in a relational database for a data-intensive application? The first thing I'll keep in mind is that, for data-intensive applications, it has generally been observed that a columnar, compressed approach works better than row-based storage. So I would optimize my data storage first by converting my data to a format like Parquet or ORC, and my second step would be to compress the data using Snappy or another compression algorithm. My third optimization step would be to use, as far as possible... actually, on second thought, I think the third option wouldn't work in most cases. So I think these two would be all.
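The two ideas in this answer, a columnar layout and per-column compression, can be illustrated with the standard library. A toy sketch under stated assumptions: the rows are hypothetical, `zlib` stands in for Snappy (which is not in the standard library), and JSON stands in for a real encoding; actual Parquet or ORC files would be written with a library such as pyarrow.

```python
import json
import zlib

# Hypothetical row-oriented records, as a row store would hold them.
ROWS = [
    {"id": 1, "country": "IN", "amount": 100},
    {"id": 2, "country": "IN", "amount": 250},
    {"id": 3, "country": "US", "amount": 100},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one list per column (columnar layout)."""
    columns: dict[str, list] = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def compress_columns(columns: dict[str, list]) -> dict[str, bytes]:
    """Compress each column separately; similar values sit together, which helps compression."""
    return {name: zlib.compress(json.dumps(values).encode()) for name, values in columns.items()}


def read_column(compressed: dict[str, bytes], name: str) -> list:
    """Decompress only the requested column, leaving the others untouched."""
    return json.loads(zlib.decompress(compressed[name]).decode())


if __name__ == "__main__":
    stored = compress_columns(to_columns(ROWS))
    print(read_column(stored, "amount"))
```

Reading one column without decoding the rest is the main reason columnar formats suit analytical queries that touch few columns.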

What approach would you use to implement real-time data processing in a BigQuery environment? For real-time streaming, the AWS service is AWS Kinesis, which is quite similar to Apache Kafka. I'm forgetting the name of the GCP counterpart to AWS Kinesis, but I think it is GCP streaming. It works similarly to Apache Kafka: it collects messages published to topics by several producers and relays them to the subscribers, so it is basically a cloud Pub/Sub model. We can simply use it to transfer real-time data between streaming apps within the GCP platform, stream our data record by record into the BigQuery environment, and query it in real time. So that would be my answer.
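The publish/subscribe model described here (producers publish to a topic, and every subscriber of that topic receives the message) can be shown with a toy in-process broker. This is an illustration only, not the Google Cloud Pub/Sub client library: delivery is synchronous, and there are no acknowledgements, retries or persistence. The list `warehouse_rows` is a hypothetical stand-in for streaming inserts into a warehouse table.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy pub/sub broker: a message published to a topic goes to every subscriber of that topic."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to all current subscribers and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Hypothetical sink standing in for record-by-record streaming inserts.
warehouse_rows: list[dict] = []

broker = InMemoryBroker()
broker.subscribe("orders", warehouse_rows.append)

if __name__ == "__main__":
    broker.publish("orders", {"order_id": 1, "amount": 250.0})
    print(warehouse_rows)
```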

Why might the code not function as expected? If we look closely at the stream data function, we are opening the cursor, but if the condition is not satisfied, for example if the row we retrieve is None, the function exits early and we never close the cursor object. I think the most obvious solution would be to use a context manager there, that is, a with statement around the cursor object. The second option would be to write the function with try, except and finally. The try block would contain all our execution steps, the except block would catch our exceptions, and the finally block executes regardless of whether we encountered an exception or not. So in our suggested solution, we put the cursor creation, the query and everything else in the try block, handle the exception in the except block, and call cursor.close() in the finally block, so that the cursor is closed no matter what happens while fetching the row. So, yeah, that would be the suggested approach, I think.
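Both fixes mentioned in this answer can be sketched with the standard-library `sqlite3` driver. The original snippet under review is not shown, so the function names and queries below are hypothetical. One detail: `sqlite3` cursors do not support the `with` statement themselves (some other DB-API drivers' cursors do), so the context-manager version wraps the cursor in `contextlib.closing`.

```python
import sqlite3
from contextlib import closing
from typing import Optional


def fetch_first_row(conn: sqlite3.Connection, query: str) -> Optional[tuple]:
    """Context-manager version: closing() calls cursor.close() on every exit path."""
    with closing(conn.cursor()) as cursor:
        cursor.execute(query)
        row = cursor.fetchone()
        if row is None:
            return None  # early return no longer leaks the cursor
        return row


def fetch_first_row_try_finally(conn: sqlite3.Connection, query: str) -> Optional[tuple]:
    """try/finally version: the finally block runs whether or not an exception occurred."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchone()  # None when the result set is empty
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (value INTEGER)")
    print(fetch_first_row(conn, "SELECT value FROM readings"))
    conn.execute("INSERT INTO readings VALUES (42)")
    print(fetch_first_row_try_finally(conn, "SELECT value FROM readings"))
```

An `except` clause can be added between `try` and `finally` for logging; cleanup belongs in `finally` either way.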

Coming to this one, I can see that we are raising an exception inside an except block. So it can potentially go into an infinite loop, if I'm not wrong. Well, not an infinite loop, but I don't see any utility in raising an exception inside an except block, so I think the whole concept of raising exceptions within the except block is flawed. I think that if we remove that raise, apart from the customization issue, the code looks okay. Yeah, then I'm pretty sure the code is okay. Thanks.
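For reference, in Python a `raise` inside an `except` block executes once and propagates to the caller; it does not loop. It is commonly used to translate a low-level error into a domain-specific one, and `raise ... from err` keeps the original error attached as `__cause__`. A minimal sketch with a hypothetical `DataLoadError` and `parse_amount` (the snippet discussed in the interview is not shown):

```python
class DataLoadError(Exception):
    """Hypothetical domain-specific error raised by a loading step."""


def parse_amount(raw: str) -> float:
    """Convert a raw field to float, re-raising parse failures as DataLoadError."""
    try:
        return float(raw)
    except ValueError as err:
        # Runs once and propagates; "from err" preserves the original ValueError as __cause__.
        raise DataLoadError(f"invalid amount: {raw!r}") from err


if __name__ == "__main__":
    print(parse_amount("12.5"))
```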

Describe a complex data model you've designed and how you optimized it for a large-scale data environment. In my current project, I'm in charge of creating data models. I'm working for a financial asset management company, actually the biggest asset manager in the world, and I'm in charge of creating data schemas, or data models, for various incoming indexes. By indexes, I mean entities that have their own sub-entities, for example NSE, BSE, Nasdaq, MSCI or any other index. The securities we receive are already mapped to a public identifier provided by the vendor. What my client told me is that they want a data model in which we internally map the public identifier given by the vendor, so I have to create a logical and very exhaustive mapping between the incoming public identifiers and the internal private identifiers we use, which we refer to as CUSIPs in our language. I'm in charge of creating those mappings, and also several data transformations. I think the data model involved here is very complex because we have many different moving parts and we have to manage each one of them. For example, for Brazil and the other Latin American countries we have an index called ANBIMA, which is very different from those of the Asian markets. So creating a data model that is uniform for all our client countries is exhaustive and very difficult to implement.
I would be happy to explain it further if given a chance.
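The mapping described in this answer, from vendor-supplied public identifiers to internal identifiers, can be sketched as a small crosswalk keyed by vendor and public identifier. Every name and identifier below is hypothetical. The one design choice it illustrates is refusing silent remapping: a given vendor identifier may resolve to only one internal identifier, while several vendors may point at the same internal one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VendorKey:
    """Hypothetical natural key for a security as delivered by an index vendor."""
    vendor: str
    public_id: str


class IdentifierMap:
    """Hypothetical crosswalk from vendor public identifiers to internal identifiers."""

    def __init__(self) -> None:
        self._to_internal: dict[VendorKey, str] = {}

    def register(self, vendor: str, public_id: str, internal_id: str) -> None:
        key = VendorKey(vendor, public_id)
        existing = self._to_internal.get(key)
        if existing is not None and existing != internal_id:
            # A conflicting remap usually signals bad vendor data; fail loudly instead of overwriting.
            raise ValueError(f"{key} is already mapped to {existing}")
        self._to_internal[key] = internal_id

    def resolve(self, vendor: str, public_id: str) -> Optional[str]:
        """Return the internal identifier, or None if the vendor identifier is unmapped."""
        return self._to_internal.get(VendorKey(vendor, public_id))


if __name__ == "__main__":
    ids = IdentifierMap()
    ids.register("VENDOR_A", "PUB-001", "INT-9001")
    ids.register("VENDOR_B", "XYZ-77", "INT-9001")  # a second vendor's code for the same security
    print(ids.resolve("VENDOR_B", "XYZ-77"))
```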

Given your expertise in Java, how would you use Java's concurrency features to build a robust data processing system? Java's concurrency, unlike Python's, is true multi-core concurrency that lets us use multiple processor cores at once. In Python we have the Global Interpreter Lock, which we don't have in Java, so in Java we can effectively run a program on multiple cores and use Java's concurrency features for real-time processing. Actually, I don't have much experience with Java; I have some theoretical background, but I've not used much Java in practice, so I don't think I would be able to give a very detailed answer to this question.

When integrating a Python-based ETL process with Airflow, how would you ensure reliability and scalability? If it were a cloud-based environment, first of all, to handle the reliability and scalability concerns, I would use application load balancers to ensure scalability, because I don't want any particular node to be overloaded with data processing. I'm sorry, there's some disturbance at my end. I have integrated Python-based ETL processes with Airflow in the past, and my main concern would be to use the appropriate Airflow operator to ensure reliability and performance, that is, an operator that lets us handle the data effectively. My task would also be to ensure that all the infrastructure is highly scalable and, if possible, serverless. By serverless, I mean that we are not concerned with infrastructure provisioning: the underlying cloud service provisions infrastructure for us as and when it is needed, so when the data flow reaches a particular threshold, we automatically get new infrastructure. That is widely available in GCP as well as AWS. So, yeah, I would use the application load balancer service extensively. Thanks.
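One reliability mechanism this answer leaves implicit is retrying transient failures. Airflow operators expose this through `BaseOperator` arguments such as `retries`, `retry_delay` and `retry_exponential_backoff`; the standard-library sketch below only illustrates the idea of retries with exponential backoff and is not Airflow code. The `sleep` parameter is injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task(); after each failure wait base_delay * 2**attempt seconds and try again.

    Once `retries` retries have failed, the last exception is re-raised unchanged.
    """
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # retry budget exhausted: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    raise AssertionError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_extract() -> str:
        """Hypothetical task that fails twice with a transient error, then succeeds."""
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("transient source outage")
        return "extracted"

    print(run_with_retries(flaky_extract, base_delay=0.1))
```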
