Vetted Talent

Kislay Srivastava

I have over 8 years of experience in Python development. I have worked primarily with the Django and Flask frameworks to create scalable web applications and deploy them on the cloud.

I am confident in my ability to take on complex projects and provide innovative solutions that meet the needs of clients.

  • Role

    Senior Python Developer - Data Science & Engineering

  • Years of Experience

    8.2 years

Skillsets

  • Hadoop
  • Warehousing
  • Terraform
  • Tableau
  • Redis
  • pandas
  • NumPy
  • MemCached
  • Express
  • Databricks
  • Data Visualization
  • Architecture
  • React
  • PySpark
  • Kafka
  • JavaScript
  • Airflow - 4.0 Years
  • Flask
  • Data Engineering
  • Azure
  • NoSQL
  • FastAPI
  • SQL - 5.0 Years
  • Snowflake
  • Python - 8.0 Years
  • Git
  • Django - 5.0 Years
  • AWS - 5.0 Years

Vetted For

13 Skills

  • Data Engineer (Remote), AI Screening: 76%
    Skills assessed: Airflow, Data Governance, Machine Learning and Data Science, BigQuery, ETL Processes, Hive, Relational DB, Snowflake, Hadoop, Java, PostgreSQL, Python, SQL
    Score: 68/90

Professional Summary

8.2 Years
  • Mar, 2026 - Present 3 months

    Senior Python Developer - Data Science & Engineering

    RDSolutions
  • Nov, 2022 - Oct, 2024 1 yr 11 months

    Senior Backend Engineer

    Miratech
  • Apr, 2022 - Nov, 2022 7 months

    Senior Lead Engineer

    Apisero Integration
  • Dec, 2015 - May, 2021 5 yr 5 months

    Senior System Engineer

    Infosys

Applications & Tools Known

  • Python
  • PySpark
  • AWS (Amazon Web Services)
  • Apache Airflow
  • Snowflake
  • MySQL
  • Docker
  • Kubernetes
  • Django
  • Flask
  • Athena
  • Azure Databricks
  • Tableau
  • AWS S3
  • Azure Blob Storage
  • AWS Glue
  • TensorFlow
  • Pandas
  • AWS RDS
  • AWS Fargate
  • Kafka
  • Jenkins
  • AWS Elastic Beanstalk
  • Django REST Framework
  • CI/CD
  • Azure App Service
  • Databricks
  • AWS EMR
  • JDBC
  • React
  • HTML
  • CSS
  • JavaScript

Work History

8.2 Years

Senior Python Developer - Data Science & Engineering

RDSolutions
Mar, 2026 - Present 3 months
    Modernized existing data pipelines on Azure using Python. Responsible for creating and maintaining the Azure Cloud UI and the agreed-upon ELT architecture. Involved in creating and deploying Terraform scripts to automate deployments.

Senior Backend Engineer

Miratech
Nov, 2022 - Oct, 2024 1 yr 11 months
    Designed new integration pipelines for a Global Supply Chain Platform. Improved the data application responsible for ETL functionality by applying RAG-based generative AI. Leveraged FastAPI with a React frontend, deployed on AWS Elastic Beanstalk. Developed fan-out mechanisms with Kafka and used AWS ECS to run the Kafka brokers. Achieved 17% higher throughput using the Airflow orchestrator and asynchronous processing with event-based triggers.

Senior Lead Engineer

Apisero Integration
Apr, 2022 - Nov, 2022 7 months
    Created new pipelines from scratch using MS Azure Blob Storage, with Snowflake as the backend data warehouse. Worked with the pandas and PySpark APIs of Python. Used Azure Databricks to ingest data and load it into the warehouse. Developed various machine learning models to predict sales revenue for new products. Downstream visualizations were built in Tableau.

Senior System Engineer

Infosys
Dec, 2015 - May, 2021 5 yr 5 months
    Created and deployed several Django-based web applications on the cloud. Developed and personalized web applications using Redis/Memcached distributed caching. Worked on the big data landscape using AWS EMR, pandas/NumPy and PySpark scripts. Created data pipelines for ingesting source files from various systems and pushing processed rows to AWS S3.

Achievements

  • Developed fan-out mechanisms to share transformed data with registered clients.
  • Successfully developed and tested end-to-end ETL pipelines for automated ingestion, storing the results in a cloud warehouse.
  • Created and deployed several Django-based web applications on the cloud.
  • Revised concepts of RDBMS, DW and Python development.
  • PySpark and Hadoop technologies played a key part in the final capstone project.
  • Studied SDLC concepts and pipelines in the data cloud.
  • Also worked on backend frameworks like Django and Flask, along with containerized services.
  • Learned to use MS Azure services to create event-based data pipelines.
  • Created big data pipelines using GCP services.
  • Used the TensorFlow library to create ML models.
  • Mostly worked with the pandas and PySpark APIs of Python.
  • Project work and lab sessions used IBM DSX (Data Studio) as the service provider.
  • IBM Certified Backend Engineer
  • Infosys Python Associate
  • Infosys certified Python developer
  • Architecting Solutions on AWS

Major Projects

4 Projects

Enterprise Data Platform

Blackrock Pvt ltd
Nov, 2022 - Present 3 yr 7 months

    My team and I were tasked with identifying a viable alternative to the existing index data platform (mostly Perl and an SAP Sybase database, along with several antiquated in-house tools).

    1. As part of our modernization, we moved from a traditional architecture to a more cloud-native approach.
    2. The next priority was to become as vendor-agnostic as possible, which led to the choice of Snowflake with dbt as the ELT framework.
    3. I was regularly involved in POCs for enhancing the existing index platform.
    4. As a Python developer, I was in charge of understanding the legacy Perl code and migrating it to more efficient Python scripts.

Offline Verification of Digital Signatures using ANN models

    Created a neural network classifier for offline signature verification, using supervised learning algorithms to train the classifier.

Financial Services Guidance

Ameriprise Financial Services
Aug, 2019 - May, 20211 yr 9 months
    1. Python/PySpark developer, AWS cloud (client: Ameriprise Financial Services, 2019 to 2021): I was in charge of a data migration project in which data was ingested through AWS Glue and processed downstream by PySpark programs running on EMR clusters. The data was then written to an S3 bucket and queried using Amazon Athena.

Spreadsheet comparator app

Morgan Stanley
May, 2016 - Aug, 20193 yr 3 months
    1. Python Django developer (client: Morgan Stanley, 2016 to 2019): My team and I were tasked with creating and maintaining a simple MVC app that performed minimal transformations on input files and wrote the transformed files to an AWS S3 bucket.

Education

  • MTech (Comp Science)

    IIT Dhanbad (2015)
  • BTech (Comp Science)

    SRM University, Chennai (2011)

Certifications

  • IBM Certified Data Engineer (07/2022 - present)

  • IBM Certified Data Science Professional (09/2019 - present)

  • Infosys Certified Python Associate

  • GCP Big Data and ML Engineer (01/2020 - present)

  • MS Azure for Data Engineering (08/2022 - present)

  • IBM Backend Developer (01/2023 - present)

  • IBM Certified Backend Engineer

  • Infosys Certified Python Developer

  • Meta Backend Developer

  • Architecting Solutions on AWS (01/2024 - present)

Interests

  • Travelling
  • Watching Movies
  • Exercise
  • Cricket
AI-interview Questions & Answers

    Actually, I'm Kislay. I have around 8.5 years of experience as a Python data engineer, and I've worked with both back-end and front-end technologies. Basically, I'm in charge of creating cloud-native data pipelines from scratch, and I've worked with various data warehouses like Snowflake, Amazon Redshift, and, on certain occasions, IBM Cloud Warehouse. Most of my work centers on creating, managing, and enhancing data pipelines. My day-to-day activities include creating new data pipelines and enhancing, or suggesting enhancements to, the existing architecture. So I'm constantly working with the enterprise architects to design a new, more robust pipeline system. My actual role is that of a senior software engineer in the data department. So, yeah, that's a brief description of myself. I hope I'll get a chance to explain it further.

    What's the strategy? My strategy for migrating an existing ETL process from a Hadoop cluster to BigQuery would be as follows. I'll use GCS storage as my data landing zone, and I'll use an orchestrator like Apache Airflow to extract data from the source repository, which is GCS storage. From there, I'll extract and transform the data: I'll write the transformation scripts and run them on Google Dataproc, which is the most suitable tool for this. Once the data is processed on Dataproc, I'll move it to my BigQuery data warehouse as the final ETL step, scheduled with Apache Airflow. Once the data lands in BigQuery, I can easily analyze it and create visualizations using Tableau, Looker, or any other dashboard tool. I have to consider load balancing as well, but that depends on the case. That's the broad approach.
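The migration path described above (land in GCS, transform on Dataproc, load into BigQuery, then visualize) is a chain of dependent tasks, which is what an orchestrator such as Airflow schedules. The sketch below models only that dependency ordering with the standard library's `graphlib`; the task names are hypothetical, and no GCP or Airflow API is called.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names for the migration stages described above.
# Each key maps a task to the set of tasks it depends on (its upstream tasks).
PIPELINE = {
    "land_in_gcs": set(),
    "transform_on_dataproc": {"land_in_gcs"},
    "load_into_bigquery": {"transform_on_dataproc"},
    "refresh_dashboard": {"load_into_bigquery"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its upstream tasks."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        # The second element of err.args holds the detected cycle.
        print("rejected cyclic DAG:", err.args[1])
```

A cyclic dependency makes `static_order()` raise `graphlib.CycleError`, which mirrors why orchestrators refuse to schedule DAGs that contain cycles.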

    Python can be used to develop complex, detailed workflows involving multiple data sources and targets. In my previous projects, we used the PySpark API of Python to integrate various sources of data, and Airflow supports a variety of data operators. Apache Airflow is our scheduler and orchestrator. It allows us to use Python code along with some built-in Airflow operators, which make it very easy to integrate data. Along with that, we can run our Python or PySpark code snippets on Dataproc clusters. Dataproc is a managed Hadoop service provided by Google, similar to AWS EMR. I think multiple data sources can be handled very well using the Python or PySpark APIs running on Dataproc clusters.
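Integrating several sources into one target, as described above, usually means normalizing each source into a common record shape before merging. A minimal standard-library sketch follows; the CSV and JSON inputs, the field names, and the `merge_sources` helper are hypothetical stand-ins for the PySpark readers used in the projects.

```python
import csv
import io
import json

# Hypothetical inputs: a CSV export and a JSON feed describing overlapping entities.
CSV_SOURCE = "id,name\n1,Alpha\n2,Beta\n"
JSON_SOURCE = '[{"id": 2, "price": 10.5}, {"id": 3, "price": 7.25}]'


def read_csv(text: str) -> list[dict]:
    # csv.DictReader yields strings, so cast the join key to int.
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "name": row["name"]} for row in reader]


def read_json(text: str) -> list[dict]:
    return [{"id": int(rec["id"]), "price": float(rec["price"])} for rec in json.loads(text)]


def merge_sources(*sources: list[dict]) -> list[dict]:
    """Full outer join of the record lists on the 'id' key, sorted by id."""
    merged: dict[int, dict] = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["id"], {}).update(rec)
    return [merged[key] for key in sorted(merged)]


if __name__ == "__main__":
    for row in merge_sources(read_csv(CSV_SOURCE), read_json(JSON_SOURCE)):
        print(row)
```

Records that appear in only one source keep just that source's fields, which is the behaviour of a full outer join.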

    Building a scalable ETL pipeline using Hadoop and Hive involves using Hadoop storage or Hive tables as the source system. We can process them using MapReduce, but in modern setups we use Spark and the PySpark APIs for processing. PySpark processing is in-memory, so it is much faster than Hadoop's MapReduce paradigm. Large datasets can be processed by writing PySpark programs: creating a Spark context and performing calculations and transformations with the DataFrame and RDD APIs. Big data processing is very simple with PySpark, since we can use familiar DataFrame or RDD structures to create transformation pipelines. It would be priced per cluster to handle large Hadoop and Hive datasets. I think a scalable ETL pipeline would involve running PySpark processes on Dataproc as a service.
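The MapReduce-style flow that Spark generalizes (map records to key-value pairs, group by key, reduce each group) can be illustrated without a cluster. The sketch below is plain Python, not PySpark; it only mirrors what an RDD `flatMap` / `map` / `reduceByKey` chain computes, in memory on a single machine.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def flat_map(records, fn):
    # Like RDD.flatMap: each record may produce zero or more outputs.
    for record in records:
        yield from fn(record)


def reduce_by_key(pairs, fn):
    # Like RDD.reduceByKey: group values per key, then fold each group with fn.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


def word_count(lines):
    words = flat_map(lines, lambda line: line.lower().split())
    pairs = ((word, 1) for word in words)  # the "map" step
    return reduce_by_key(pairs, add)


if __name__ == "__main__":
    print(word_count(["Spark is fast", "spark is in memory"]))
```

In PySpark the equivalent chain would be `sc.parallelize(lines).flatMap(...).map(...).reduceByKey(add)`, with the work distributed across the cluster instead of one process.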

    How would I optimize data storage in a relational database for a data-intensive application? The first thing I'll keep in mind is that, for data-intensive applications, it has generally been observed that a columnar, compressed approach works better than row-based storage. So I would first optimize my data storage by converting my data to formats like Parquet or ORC, and then compress it using Snappy or another compression algorithm. My second step would be to apply these two approaches as widely as possible. I was going to mention a third option, but on second thought, I don't think it would work in most cases, so these two would be all.
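To illustrate the row-based versus columnar point, the sketch below serializes the same hypothetical table both ways and compresses each with `zlib` from the standard library. `zlib` stands in for Snappy and JSON stands in for the Parquet/ORC encodings; real columnar formats add type-aware encodings on top, so the byte counts printed are only indicative.

```python
import json
import zlib

# Hypothetical table: repeated values within a column are common in analytic data.
ROWS = [
    {"order_id": i, "country": ["IN", "US", "BR"][i % 3], "status": "shipped"}
    for i in range(1000)
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row records into one list per column (the columnar layout)."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def from_columns(columns: dict[str, list]) -> list[dict]:
    """Rebuild row records from the columnar layout."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


def compressed_size(obj) -> int:
    # Serialize to JSON bytes, then compress; zlib is a stand-in for Snappy.
    return len(zlib.compress(json.dumps(obj).encode("utf-8")))


if __name__ == "__main__":
    print("row-oriented bytes:", compressed_size(ROWS))
    print("column-oriented bytes:", compressed_size(to_columns(ROWS)))
```

Grouping each column's values together places similar values next to each other, which is what gives compressors such as Snappy more to work with in columnar formats.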

    What would I use to implement real-time data processing in a BigQuery environment? For real-time streaming, the AWS service is AWS Kinesis, which is quite similar to Apache Kafka. I'm forgetting the name of the GCP counterpart to AWS Kinesis, but I think it is GCP streaming. Basically, it works like Apache Kafka: it collects messages on topics from several producers and relays them to the subscribers. So it is a cloud Pub/Sub model. We can simply use it to transfer real-time data with streaming apps within the GCP platform, stream our data one record at a time into the BigQuery environment, and query it in real time. So that would be my answer.
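The publish/subscribe model described above (producers publish to a topic, and the service relays each message to every subscriber of that topic) can be shown with a small in-memory toy. This is not the Google Cloud Pub/Sub, Kinesis, or Kafka client API; the `Broker` class and the topic names are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Message = dict
Handler = Callable[[Message], None]


class Broker:
    """Hypothetical in-memory broker: fans each published message out to all subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver message to every subscriber of topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    warehouse_rows: list[Message] = []  # stands in for a streaming insert target
    broker.subscribe("trades", warehouse_rows.append)
    broker.subscribe("trades", lambda m: print("alert check:", m))
    broker.publish("trades", {"symbol": "ABC", "qty": 100})
    print(warehouse_rows)
```

Producers never reference subscribers directly; adding a new consumer (for example, a warehouse loader) only requires a new `subscribe` call.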

    Why might the code not function as expected? If we look closely at the stream data function, we are opening the cursor. But if the condition is not satisfied and the row we are retrieving is actually None, the function exits early, and we are never able to close the cursor object. I think the most obvious solution would be to use a context manager with the cursor object. The second option would be to write the function with try, except, and finally. The try block would include all our execution steps, the except block would catch our exceptions, and the finally block would execute regardless of whether an exception was raised. So in our suggested solution, we put the cursor creation and the query in the try block, handle the exception in the except block, and call cursor.close() in the finally block so that the cursor object is always closed.
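The two fixes discussed (a context manager, or try/except/finally) can be sketched with the standard library's `sqlite3` module. The original function is not shown, so the table, query, and function names below are hypothetical. A `sqlite3` cursor is not itself a context manager, so `contextlib.closing` supplies the `with` behaviour.

```python
import logging
import sqlite3
from contextlib import closing


def fetch_price_with(conn: sqlite3.Connection, symbol: str):
    """Option 1: closing() guarantees cursor.close(), even on the early return."""
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT price FROM quotes WHERE symbol = ?", (symbol,))
        row = cursor.fetchone()
        if row is None:
            return None  # the cursor is still closed when the with-block exits
        return row[0]


def fetch_price_try(conn: sqlite3.Connection, symbol: str):
    """Option 2: try/except/finally; finally runs whether or not an error occurs."""
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT price FROM quotes WHERE symbol = ?", (symbol,))
        row = cursor.fetchone()
        return None if row is None else row[0]
    except sqlite3.Error:
        logging.exception("query failed for %s", symbol)
        raise  # let the caller decide how to handle the failure
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
    conn.execute("INSERT INTO quotes VALUES ('ABC', 12.5)")
    print(fetch_price_with(conn, "ABC"), fetch_price_try(conn, "XYZ"))
    conn.close()
```

Both versions close the cursor on every path: a found row, the early `None` return, and a raised database error.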

    Coming to this, I can see that we are raising an exception inside an except block. It could potentially go into an infinite loop, if I'm not wrong. Actually, not an infinite loop, but I don't see any utility in raising an exception inside an except block, so I think the whole concept of raising exceptions within the except block is flawed. If we remove that, then apart from the customization issue, I think the code looks okay. Yeah, I'm pretty sure the code is okay then. Thanks.
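On the question of raising inside an except block: Python does not loop in that case. The new exception propagates once to the caller, and the interpreter records the exception being handled as its `__context__` (or as `__cause__` when written `raise ... from ...`). Whether the raise in the reviewed code was useful depends on code not shown here; the sketch below, with hypothetical names, only demonstrates the language behaviour.

```python
class PipelineError(Exception):
    """Hypothetical domain-specific error for a pipeline step."""


def parse_quantity(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as exc:
        # Raising here does not re-enter this except block; it propagates to the caller.
        # "from exc" keeps the original error attached as __cause__ for debugging.
        raise PipelineError(f"bad quantity: {raw!r}") from exc


if __name__ == "__main__":
    try:
        parse_quantity("ten")
    except PipelineError as err:
        print(err, "| caused by:", repr(err.__cause__))
```

Wrapping a low-level error in a domain-specific one like this is one common reason to raise inside an except block; a bare `raise` simply re-raises the exception currently being handled.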

    Complex data models I've designed, and how to improve them in a large-scale data environment: in my current project, I'm in charge of creating data models. I'm currently working for a financial asset management company, the biggest asset manager in the world, and I'm in charge of creating data schemas, or data models, for various incoming indexes. By indexes, I mean entities that have sub-entities, for example NSE, BSE, Nasdaq, MSCI, or any other index. The securities we receive are already mapped to a public identifier provided by the vendor, and my client wants a data model in which we internally map that public identifier. I have to create a logical and very exhaustive mapping between the incoming public identifiers and our internal, private identifiers, which we refer to as QZIPs in our language. I'm also in charge of creating several data transformations. The data model involved here is very complex because there are many different moving parts, and we have to manage each one of them. For example, for Brazil and the other Latin American countries, we have an index called NBMA, which is very different from those of the Asian markets. So creating a data model that is uniform across all our client countries is very exhaustive and very difficult to implement. I would explain it further if given the chance.
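The core of the identifier-mapping model described above is a lookup from a vendor's public identifier to an internal one, plus a way to surface securities that have no mapping yet. A minimal sketch, assuming a simple in-memory dictionary keyed by vendor; the identifiers, the `SecurityMap` class, and its fields are hypothetical and are not the project's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityMap:
    """Hypothetical map from (vendor, public identifier) pairs to internal identifiers."""

    mapping: dict[tuple[str, str], str] = field(default_factory=dict)

    def add(self, vendor: str, public_id: str, internal_id: str) -> None:
        key = (vendor, public_id)
        existing = self.mapping.get(key)
        if existing is not None and existing != internal_id:
            # One public identifier must not point at two internal identifiers.
            raise ValueError(f"{key} already mapped to {existing}")
        self.mapping[key] = internal_id

    def resolve(self, vendor: str, public_ids: list[str]) -> tuple[dict[str, str], list[str]]:
        """Split incoming identifiers into resolved ones and ones still needing a mapping."""
        resolved: dict[str, str] = {}
        unmapped: list[str] = []
        for public_id in public_ids:
            internal = self.mapping.get((vendor, public_id))
            if internal is None:
                unmapped.append(public_id)
            else:
                resolved[public_id] = internal
        return resolved, unmapped


if __name__ == "__main__":
    sm = SecurityMap()
    sm.add("vendor_a", "PUB-001", "INT-42")
    print(sm.resolve("vendor_a", ["PUB-001", "PUB-999"]))
```

Keying on the vendor as well as the public identifier keeps per-market or per-vendor conventions from colliding, which is one way to keep a uniform model across very different indexes.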

    Java's concurrency, unlike Python's, is true parallelism that lets us use multiple processor cores at once. In Python (CPython), we have the global interpreter lock, which we don't have in Java. So in Java we can effectively run a program on multiple cores and thereby use Java's concurrency features for real-time workloads. Actually, I don't have much experience with Java, only some theoretical background. I haven't used much Java in practice, so I don't think I could give a very detailed answer to that question.
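To make the GIL point concrete on the Python side: in CPython, threads still help I/O-bound work, because the GIL is released while a thread waits (for example in `time.sleep` or a blocking network call), even though CPU-bound Python code in threads does not run on multiple cores at once. The sketch below simulates the I/O waits with `time.sleep`; the task and timings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(seconds: float) -> float:
    # time.sleep releases the GIL, as a real network or disk wait would.
    time.sleep(seconds)
    return seconds


def run_sequential(delays: list[float]) -> float:
    start = time.perf_counter()
    for d in delays:
        fake_io_call(d)
    return time.perf_counter() - start


def run_threaded(delays: list[float]) -> float:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        list(pool.map(fake_io_call, delays))  # waits overlap across threads
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 4
    print(f"sequential: {run_sequential(delays):.2f}s")
    print(f"threaded:   {run_threaded(delays):.2f}s")
```

For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` (separate processes, each with its own interpreter and GIL) is the usual standard-library route to multiple cores.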

    For ensuring reliability and scalability when given the task of integrating a Python-based ETL process with Airflow, I would first handle the scalability concern by using application load balancers, so that no particular node is overloaded with data processing. There's some disturbance at my end. Actually, I have integrated Python-based ETL processes with Airflow in the past, and my main concern would be using the appropriate Airflow operator to ensure reliable performance, one that lets us handle the data effectively. My task would be to ensure that all the infrastructure is highly scalable and, if possible, serverless. By serverless, I mean that we are not concerned with provisioning the infrastructure ourselves: the underlying cloud service provisions it for us as and when it is needed, so when the data flow reaches a particular threshold, we automatically get new infrastructure. That is widely available on GCP as well as AWS. So I would use the application load balancer service extensively. Yeah. Thanks.
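The reliability concern above is commonly handled in Airflow with task-level retries (the `retries` and `retry_delay` arguments that operators accept). The sketch below reproduces only that retry-with-delay idea in plain Python, without Airflow; `run_with_retries` and the flaky extract step are hypothetical, and the exponential backoff policy is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], retries: int = 3,
                     retry_delay: float = 0.1, backoff: float = 2.0) -> T:
    """Call task; on failure, wait and retry up to `retries` extra times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    delay = retry_delay
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)
            delay *= backoff  # exponential backoff between attempts
            attempt += 1


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_extract() -> str:
        # Hypothetical step that fails twice (a transient error), then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "extracted"

    print(run_with_retries(flaky_extract, retries=3, retry_delay=0.01))
```

Retries only help with transient failures; a task that fails deterministically still surfaces its error after the last attempt, which is the point at which an orchestrator would mark it failed.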