Kislay Srivastava

Vetted Talent

I have over 8 years of experience in Python development, working primarily with the Django and Flask frameworks to build scalable web applications and deploy them on the cloud.

I am confident in my ability to take on complex projects and provide innovative solutions that meet the needs of clients.

  • Role

    Senior Software Engineer

  • Years of Experience

    9 years

  • Professional Portfolio

    View here

Skillsets

  • Python - 8.0 Years
  • Django - 5.0 Years
  • Flask
  • SQL - 5.0 Years
  • AWS - 5.0 Years
  • Airflow - 4.0 Years
  • Data Engineering - 5.0 Years
  • Databricks - 3.0 Years
  • Snowflake
  • Machine Learning
  • Data Mining
  • Big Data
  • Data Structures
  • Algorithms
  • C
  • Git
  • Bash

Vetted For

13 Skills
  • Role: Data Engineer II (Remote) - AI Screening
  • Result: 76%
  • Skills assessed: Airflow, Data Governance, Machine Learning and Data Science, BigQuery, ETL processes, Hive, Relational DB, Snowflake, Hadoop, Java, PostgreSQL, Python, SQL
  • Score: 68/90

Professional Summary

9 Years
  • Nov 2022 - Present (2 yr 11 months)

    Senior Software Engineer

    Miratech Pvt Ltd
  • Apr 2022 - Nov 2022 (7 months)

    Senior Backend Engineer

    Apisero Integration Pvt Ltd
  • Dec 2015 - May 2021 (5 yr 5 months)

    Senior Software Engineer

    Infosys Ltd

Applications & Tools Known

  • Python
  • PySpark
  • Pandas
  • TensorFlow
  • Django
  • Django Rest Framework
  • Flask
  • React
  • HTML
  • CSS
  • JavaScript
  • MySQL
  • JDBC
  • Kafka
  • Snowflake
  • Databricks
  • Azure Databricks
  • Azure Blob Storage
  • Azure App Service
  • AWS (Amazon Web Services)
  • AWS S3
  • AWS Glue
  • AWS EMR
  • AWS RDS
  • AWS Fargate
  • AWS Elastic Beanstalk
  • Athena
  • Apache Airflow
  • Docker
  • Kubernetes
  • Jenkins
  • CI/CD
  • Tableau

Work History

9 Years

Senior Software Engineer

Miratech Pvt Ltd
Nov 2022 - Present (2 yr 11 months)
    Responsible for the pipelines that onboard new clients: integrating data from several sources using Python and PySpark, pushing it to downstream systems, and building cloud-native Django web applications deployed on AWS Elastic Beanstalk. Used MySQL on RDS, Kafka for fan-out, and AWS ElastiCache to improve user experience.

Senior Backend Engineer

Apisero Integration Pvt Ltd
Apr 2022 - Nov 2022 (7 months)
    Designed new integration pipelines, enhanced Django applications on Azure, worked with HTTP/WebSocket APIs, rebuilt standardized REST APIs, used PostgreSQL over a JDBC connection, and optimized the UI using React, HTML, and CSS.

Senior Software Engineer

Infosys Ltd
Dec 2015 - May 2021 (5 yr 5 months)
    Created and deployed Django web applications on the cloud, optimized the UI using HTML/CSS/JavaScript, worked on big data landscapes using AWS EMR, performed transformations with Pandas and PySpark, created Airflow DAGs, and designed data pipelines to process rows and ingest source files into AWS S3.

Achievements

  • Developed fan-out mechanisms for sharing transformed data with registered clients.
  • Developed and tested end-to-end ETL pipelines for automated ingestion, storing the results in a cloud warehouse.
  • Created and deployed several Django-based web applications on the cloud.
  • Revised concepts of RDBMS, DW, and Python development.
  • PySpark and Hadoop technologies played a key part in the final capstone project.
  • Studied SDLC concepts and pipelines in the data cloud.
  • Worked on backend frameworks such as Django and Flask, along with containerized services.
  • Learned to use MS Azure services for creating event-based data pipelines.
  • Created Big Data pipelines using GCP services.
  • Used the TensorFlow library for creating ML models.
  • Mostly worked with the Pandas and PySpark APIs of Python.
  • Project work and lab sessions used IBM DSX (Data Studio) as the service provider.
  • IBM Certified Backend Engineer
  • Infosys Python Associate
  • Infosys Certified Python Developer
  • Architecting Solutions on AWS

Major Projects

4 Projects

Enterprise Data Platform

Blackrock Pvt Ltd
Nov 2022 - Present (2 yr 11 months)

    My team and I were tasked with identifying a viable alternative to the pre-existing Index data platform, which was built mostly on Perl and a SAP Sybase database along with several antiquated in-house tools.

    1. As part of the modernization, we moved from a traditional architecture to a more cloud-native approach.
    2. The next priority was to become as vendor-agnostic as possible, which led to the choice of Snowflake with dbt as the ELT framework.
    3. I was regularly involved in POCs for enhancing the existing Index Platform.
    4. As a Python developer, I was in charge of understanding the legacy Perl code and migrating it to more efficient Python scripts.

Offline Verification of Digital Signatures using ANN models

    Created a neural-network classifier for offline signature verification using supervised learning algorithms.

Financial Services Guidance

Ameriprise Financial Services
Aug 2019 - May 2021 (1 yr 9 months)
    Python/PySpark developer on the AWS cloud (client: Ameriprise Financial Services, 2019 to 2021). I was in charge of a data migration project in which data was ingested through AWS Glue and processed downstream by PySpark programs running on EMR clusters. The data was then written to an S3 bucket and visualized using Amazon Athena.

Spreadsheet comparator app

Morgan Stanley
May 2016 - Aug 2019 (3 yr 3 months)
    Python Django developer (client: Morgan Stanley, 2016 to 2019). My team and I were tasked with creating and maintaining a simple MVC app that performed minimal transformations on input files and wrote the transformed files to an AWS S3 bucket.

Education

  • MASTER OF TECHNOLOGY (CSE)

    Indian Institute of Technology, Dhanbad (2015)
  • BACHELOR OF TECHNOLOGY (CSE)

    SRM University, Chennai (2011)

Certifications

  • IBM Certified Data Engineer (07/2022 - Present)

  • IBM Data Science Professional (09/2019 - Present)

  • IBM Certified Backend Engineer

  • IBM Backend Developer (01/2023 - Present)

  • Infosys Certified Python Associate

  • Infosys Certified Python Developer

  • GCP Big Data and ML Engineer (01/2020 - Present)

  • MS Azure for Data Engineering (08/2022 - Present)

  • Meta Backend Developer

  • Architecting Solutions on AWS (01/2024 - Present)

Interests

  • Travelling
  • Watching Movies
  • Exercise
  • Cricket

AI-interview Questions & Answers

    Can you introduce yourself and walk us through your experience?

    I'm Kislay, and I have around 8.5 years of experience as a Python data engineer. I've worked with backend and, to some extent, frontend technologies, and I'm responsible for building cloud-native data pipelines from scratch. I've worked with data warehouses such as Snowflake, Amazon Redshift and, on occasion, IBM's cloud warehouse. Most of my work centers on creating, managing, and enhancing data pipelines. In my current project, my day-to-day activity includes building new data pipelines and suggesting enhancements to the existing architecture, and I work constantly with the enterprise architects to design a new, more robust pipeline design. My formal role is Senior Software Engineer in the data department, and I'm happy to explain any of this further.

    What would be your strategy for migrating an existing ETL process from an in-house Hadoop cluster to BigQuery?

    My strategy would be to use Google Cloud Storage (GCS) as the data landing zone and an orchestrator such as Apache Airflow to extract data from the source, in this case the GCS bucket the data lands in. From there I would write the transformation scripts manually and run them on Google Dataproc, which I think is the most suitable tool for the processing step. As the final ETL step, I would move the data into the BigQuery data warehouse, again scheduled through Airflow. Once the data lands in BigQuery, it can be analyzed easily and visualized with Tableau, Looker, or any other dashboarding tool. Load balancing would also have to be considered case by case, but that would be my broad approach.
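
    A minimal Airflow DAG sketch of the flow described above (GCS landing zone, PySpark transformation on Dataproc, load into BigQuery). The project, bucket, cluster, and table names are hypothetical placeholders, and the operators assume the apache-airflow-providers-google package.

        from datetime import datetime

        from airflow import DAG
        from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
        from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

        PROJECT_ID = "my-project"          # hypothetical project
        PYSPARK_JOB = {
            "reference": {"project_id": PROJECT_ID},
            "placement": {"cluster_name": "etl-cluster"},   # hypothetical Dataproc cluster
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/transform.py"},
        }

        with DAG("gcs_to_bigquery_migration", start_date=datetime(2024, 1, 1),
                 schedule_interval="@daily", catchup=False):
            # Run the PySpark transformation on Dataproc against files in the GCS landing zone.
            transform = DataprocSubmitJobOperator(
                task_id="transform_on_dataproc",
                project_id=PROJECT_ID,
                region="us-central1",
                job=PYSPARK_JOB,
            )
            # Load the transformed output from GCS into the BigQuery warehouse.
            load = GCSToBigQueryOperator(
                task_id="load_to_bigquery",
                bucket="my-bucket",
                source_objects=["curated/part-*.parquet"],
                destination_project_dataset_table=f"{PROJECT_ID}.analytics.orders",
                source_format="PARQUET",
                write_disposition="WRITE_APPEND",
            )
            transform >> load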

    How do you use Python to develop complex ETL workflows involving multiple data sources and targets?

    In my previous projects we used the PySpark API of Python to integrate various sources of data. Apache Airflow, as our scheduler and orchestrator, supports a wide variety of data operators; it lets us combine our own Python code with its built-in operators, which makes integrating data straightforward. Alongside that, we can run Python or PySpark code snippets on top of Dataproc clusters; Dataproc is essentially Google's managed Hadoop service, comparable to AWS EMR. So multiple data sources and targets can be handled very well using Python and PySpark running on Dataproc clusters, as sketched below.
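
    As a hedged illustration of integrating multiple sources in one Python job, the sketch below joins a JDBC table with Parquet files using PySpark; the connection settings, paths, and column names are hypothetical.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("multi_source_integration").getOrCreate()

        # Source 1: a relational table pulled over JDBC (credentials are placeholders).
        customers = (spark.read.format("jdbc")
                     .option("url", "jdbc:postgresql://db-host:5432/sales")
                     .option("dbtable", "public.customers")
                     .option("user", "etl_user")
                     .option("password", "change-me")
                     .load())

        # Source 2: event files already landed in cloud object storage as Parquet.
        events = spark.read.parquet("gs://my-bucket/events/")

        # Integrate both sources and push the result to a downstream target location.
        enriched = events.join(customers, on="customer_id", how="left")
        enriched.write.mode("append").parquet("gs://my-bucket/curated/enriched_events/")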

    If you were required to build a scalable ETL pipeline using Hadoop and Hive, how would you approach it?

    In that case the source system would be Hadoop cluster storage or Hive tables. On a legacy system we could process them with MapReduce, but in modern setups we use Spark and the PySpark APIs, because Spark's in-memory processing is much faster than Hadoop's MapReduce paradigm. Processing large datasets then comes down to writing PySpark programs: creating a SparkSession, then applying transformations through the DataFrame and RDD APIs to build the transformation pipelines. A PySpark cluster is more than enough to handle large Hadoop and Hive datasets, so a scalable ETL pipeline here would be built around PySpark processing, for example on Dataproc as a managed service, as in the sketch below.
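
    A minimal PySpark sketch of the Hive-backed approach above: read a Hive table, aggregate with the DataFrame API, and write the result back out. The database, table, column, and path names are hypothetical.

        from pyspark.sql import SparkSession, functions as F

        # Hive support lets Spark read tables registered in the existing Hive metastore.
        spark = (SparkSession.builder
                 .appName("hive_etl")
                 .enableHiveSupport()
                 .getOrCreate())

        orders = spark.table("raw_db.orders")          # Hive table as the source

        # In-memory DataFrame transformation instead of a MapReduce job.
        daily_totals = (orders
                        .groupBy("order_date")
                        .agg(F.sum("amount").alias("total_amount")))

        daily_totals.write.mode("overwrite").parquet("hdfs:///curated/daily_order_totals")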

    How would you optimize data storage in a relational database for a data-intensive application?

    For a data-intensive application it has generally been observed that a columnar, compressed approach works better than row-based storage. So my first step would be to convert the data to a columnar format such as Parquet or ORC, and my second step would be to compress it using Snappy or another compression codec. I had a third optimization in mind, but on reflection it would not work in most cases, so those two would be my main steps; a small example follows.
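
    A small sketch of the columnar-plus-compression step described above, using pandas with a Parquet engine such as pyarrow installed; the file names are hypothetical.

        import pandas as pd

        # Re-store row-oriented CSV data as Snappy-compressed, columnar Parquet.
        df = pd.read_csv("transactions.csv")
        df.to_parquet("transactions.parquet", compression="snappy")

        # Downstream readers can then scan only the columns they need.
        amounts = pd.read_parquet("transactions.parquet", columns=["txn_id", "amount"])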

    What would you use to implement real-time data processing into a BigQuery environment?

    For real-time streaming, the AWS counterpart is Kinesis, which is quite similar to Apache Kafka. The GCP counterpart is Cloud Pub/Sub: like Kafka, it collects topics from several producers and relays them to subscribers, a cloud publish/subscribe model. We can use it to move real-time data through streaming applications within the GCP platform and push the streaming data into the BigQuery environment, where it can be queried in near real time. A sketch of that path follows.
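
    A hedged sketch of the Pub/Sub-to-BigQuery path described above, using the google-cloud-pubsub and google-cloud-bigquery client libraries; the project, subscription, and table IDs are hypothetical, and each message is assumed to be a JSON object matching the table schema.

        import json

        from google.cloud import bigquery, pubsub_v1

        PROJECT_ID = "my-project"                    # hypothetical
        SUBSCRIPTION_ID = "events-sub"               # hypothetical
        TABLE_ID = "my-project.analytics.events"     # hypothetical

        bq_client = bigquery.Client()
        subscriber = pubsub_v1.SubscriberClient()
        subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

        def handle_message(message):
            # Each Pub/Sub message carries one JSON row; stream it straight into BigQuery.
            row = json.loads(message.data.decode("utf-8"))
            errors = bq_client.insert_rows_json(TABLE_ID, [row])
            if not errors:
                message.ack()                        # only ack rows BigQuery accepted

        streaming_pull = subscriber.subscribe(subscription_path, callback=handle_message)
        streaming_pull.result()                      # block and keep pulling messages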

    Why might the given code not function as expected?

    Looking closely at the stream_data function, we open the cursor, but if the condition is not satisfied, for example if the row we retrieve is None, the function returns early and we never close the cursor object. The most obvious fix is to wrap the cursor in a context manager (a with block). The alternative is to restructure the function with try/except/finally: the try block holds the execution steps, the except block catches exceptions, and the finally block runs regardless of whether an exception occurred. In our case the cursor creation and the query would go in the try block, the exception handling in the except block, and cursor.close() in the finally block, so the cursor is always closed. A sketch of both fixes follows.
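
    Since the original snippet is not reproduced in this profile, the function below is a hypothetical rewrite of a stream_data-style routine showing both fixes mentioned above: a closing() context manager for the cursor plus try/except/finally, so the cursor and connection are released even when no row comes back or an error occurs mid-query.

        import sqlite3
        from contextlib import closing

        def stream_data(db_path, query):
            conn = sqlite3.connect(db_path)
            try:
                # closing() guarantees cursor.close() even if we return early.
                with closing(conn.cursor()) as cursor:
                    cursor.execute(query)
                    row = cursor.fetchone()
                    while row is not None:
                        yield row
                        row = cursor.fetchone()
            except sqlite3.Error as exc:
                # Surface database errors to the caller instead of swallowing them.
                raise RuntimeError(f"query failed: {query}") from exc
            finally:
                conn.close()          # always runs, exception or not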

    Coming to this snippet, I can see that we are raising an exception inside the except block. It might look like it could loop, but it won't; still, I don't see any utility in raising a new exception there without adding information, so the whole idea of creating exceptions within the except block, as written, seems flawed. If we remove that customization, the rest of the code looks fine.
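
    An illustrative, hypothetical contrast with the pattern flagged above: re-raising inside an except block is reasonable when it adds context and chains the original error, whereas raising a fresh exception that discards that information is what makes the reviewed snippet questionable.

        def load_config(path):
            try:
                with open(path) as fh:
                    return fh.read()
            except FileNotFoundError as exc:
                # Useful: wrap with context and keep the original traceback chained.
                raise RuntimeError(f"missing config file: {path}") from exc
            # Questionable (the pattern flagged above): raising a brand-new exception
            # inside the except block with no added context, e.g. `raise Exception()`,
            # hides what actually went wrong.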

    Describe a complex data model you've designed and how you improved it in a large-scale data environment.

    In my current project I'm in charge of creating data models for a financial asset manager, the largest asset management company in the world. I design the schemas for various incoming indexes, by which I mean entities that contain further sub-entities, for example NSE, BSE, Nasdaq, or MSCI indexes. The securities we receive are already mapped to a public identifier provided by the vendor, and the client wanted a data model in which we internally map that public identifier to our own private identifiers. That meant building a logical and very exhaustive mapping between the incoming public identifiers and the internal identifiers we use, plus several data transformations on top of it. The model is complex because there are many moving parts to manage: for Brazilian and other Latin American markets, for instance, there is an index called NBMA that differs substantially from the Asian-market indexes, so creating a data model that is uniform across all client countries is exhausting and difficult to implement. I'd be happy to explain it further if given the chance.

    Given your expertise in Java, how would you use its concurrency features to build robust applications?

    Java's concurrency, unlike Python's, offers true parallelism: it lets a program use multiple processor cores at once. Python has the Global Interpreter Lock, which Java does not, so in Java we can effectively run work across multiple cores and use its concurrency features for real-time workloads. That said, I don't have much hands-on experience with Java; I have some theoretical background but haven't used it much in practice, so I wouldn't be able to give a very detailed answer to this question.

    How would you ensure reliability and scalability when integrating a Python-based ETL process with Airflow?

    If I were given the task of integrating a Python-based ETL process with Airflow in a cloud environment, my first concern for scalability would be load balancing: I don't want any particular node to be overloaded with data processing, so I would put an application load balancer in front of the processing layer. I have integrated Python-based ETL processes with Airflow in the past, and for reliability my main concern is choosing the appropriate Airflow operator, one that lets us handle the data effectively. Beyond that, I would make the infrastructure highly scalable and, where possible, serverless, meaning we are not concerned with provisioning: the underlying cloud service provisions infrastructure for us as and when the data volume crosses a threshold. That capability is widely available on both GCP and AWS, so I would lean on managed, autoscaling services extensively. A small Airflow-side sketch follows.
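
    A small, hypothetical Airflow sketch of the reliability knobs implied above: per-task retries, an execution timeout, and a cap on concurrent runs, so transient failures are retried automatically and no single run overloads the workers. Names and schedules are placeholders.

        from datetime import datetime, timedelta

        from airflow import DAG
        from airflow.operators.python import PythonOperator

        default_args = {
            "retries": 3,                               # retry transient failures
            "retry_delay": timedelta(minutes=5),
            "execution_timeout": timedelta(hours=1),    # fail fast on hung tasks
        }

        def run_etl_step(**_context):
            # Hypothetical placeholder for the actual extract/transform/load logic.
            pass

        with DAG("reliable_python_etl", start_date=datetime(2024, 1, 1),
                 schedule_interval="@hourly", catchup=False,
                 default_args=default_args, max_active_runs=1):
            PythonOperator(task_id="etl_step", python_callable=run_etl_step)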