Vetted Talent

Rajat Gupta


Senior Data Engineer with over 10 years of experience in the banking, finance, and telecom sectors for Fortune 500 clients, with certifications in AWS and Databricks. I design, build, and implement scalable analytics solutions with data engineering workflows and complex data pipelines, both on-premises and on the AWS cloud. I work with cross-functional teams to deliver high-quality data products and services for the banking and finance sector, using data engineering and DevOps practices, Confluent Kafka, Spark, and ML technologies including OpenAI. Extensive professional experience in software architecture, design, development, and integration, including developing large-scale distributed applications using Hadoop (HDP & CDH), MapReduce, Hive, Spark, Spark Streaming, and Kafka; building an enterprise cloud data platform on AWS; creating enterprise Data Lake and modern Data Warehouse capabilities and patterns; and creating DevOps CI/CD pipelines using Git, Jenkins, Docker, and Kubernetes.

  • Role

    Data Engineer

  • Years of Experience

    10 years

Skillsets

  • PyCharm
  • Terraform
  • Teradata
  • SQS
  • SQL Server
  • SNS
  • Seaborn
  • SciPy
  • Scikit-learn
  • Sagemaker
  • REST API
  • RDS
  • Zookeeper
  • PostgreSQL
  • pandas
  • Oracle
  • NumPy
  • Maven
  • Matplotlib
  • Lambda
  • Kubernetes
  • Kinesis
  • Kafka
  • AWS EMR
  • StatsModels
  • Random
  • Math
  • Beautiful Soup
  • Pyplot
  • Plot.ly
  • AutoSys
  • IBM Rational Team Concert
  • Jupyter Notebook
  • IntelliJ IDEA
  • IBM RAD
  • Jira
  • Confluent Kafka
  • Cloudera
  • sbt
  • Dataproc
  • CloudSQL
  • BigTable
  • ADLS
  • DMS
  • Sqoop
  • MapReduce
  • SVM
  • Redshift - 5 Years
  • Spark - 8 Years
  • MySQL - 8 Years
  • Linear/Logistic Regression
  • Core Java
  • SQL - 8 Years
  • ANN
  • Time Series
  • Hierarchical clustering
  • PCA
  • Random Forests
  • ETL - 10 Years
  • DevOps
  • CI/CD
  • Hadoop
  • Python - 6 Years
  • Java
  • PySpark
  • AWS - 6 Years
  • Big Data - 8 Years
  • DynamoDB
  • Jenkins
  • Hive
  • HDFS
  • Gradle
  • Glue
  • Git
  • GCP
  • EMR
  • Eclipse
  • EC2
  • Docker
  • Databricks
  • Composer
  • Bitbucket
  • BigQuery
  • Azure
  • Airflow
  • ADF
  • S3 - 5 Years
  • Snowflake - 2 Years

Vetted For

11 Skills
  • Senior Data Engineer AI Screening: 50%
  • Skills assessed: BigQuery, AWS, Big Data Technology, ETL, NoSQL, PySpark, Snowflake, Embedded Linux, Problem Solving Attitude, Python, SQL
  • Score: 45/90

Professional Summary

10 Years
  • May, 2024 - Aug, 2024 3 months

    AWS Data Engineer

    Synechron Technologies
  • May, 2024 - Aug, 2024 3 months

    Lead Data Engineer

    Synechron Technologies
  • Jan, 2022 - Dec, 2023 1 yr 11 months

    Module Lead

    National Stock Exchange Information Technology (NSE-IT)
  • Jan, 2020 - Feb, 2021 1 yr 1 month

    Senior Big Data Developer

    Collabera Technologies
  • Mar, 2021 - Aug, 2021 5 months

    Software Engineer

    NatWest Group (Royal Bank of Scotland)
  • Mar, 2021 - Aug, 2021 5 months

    Lead AWS Data Engineer

    NatWest Group (Royal Bank of Scotland)
  • Mar, 2018 - Aug, 2019 1 yr 5 months

    Senior Data Engineer

    Wipro Technologies
  • Mar, 2018 - Aug, 2019 1 yr 5 months

    Specialist

    Wipro Technologies
  • Jul, 2014 - Sep, 2017 3 yr 2 months

    Senior Data Engineer

    Ericsson Global India Services Pvt. Ltd.
  • Mar, 2011 - Aug, 2012 1 yr 5 months

    Freelancer

    IIT Bombay
  • Mar, 2013 - Jul, 2014 1 yr 4 months

    Associate Software Engineer

    Gopisoft Pvt. Ltd.
  • Jul, 2014 - Sep, 2017 3 yr 2 months

    Assistant Engineer

    Ericsson Global India Services Pvt. Ltd.

Applications & Tools Known

  • HDFS
  • Sqoop
  • Hive
  • Impala
  • Oozie
  • Spark
  • Kafka
  • Airflow
  • AWS
  • Azure
  • GCP
  • Maven
  • Gradle
  • REST API
  • Bitbucket
  • Git
  • Jira
  • Oracle
  • MySQL
  • Teradata
  • SQL Server
  • PostgreSQL
  • Docker
  • Kubernetes
  • Terraform
  • Jenkins
  • Cloudera
  • Databricks
  • AWS EMR
  • Eclipse
  • IBM RAD
  • PyCharm
  • IntelliJ IDEA
  • Jupyter Notebook
  • Scikit-learn
  • NumPy
  • Pandas
  • SciPy
  • Pyplot
  • Beautiful Soup
  • Matplotlib
  • Seaborn

Work History

10 Years

AWS Data Engineer

Synechron Technologies
May, 2024 - Aug, 2024 3 months
    • Designed and developed robust data pipelines using Spark and Hive to support ETL processes, ensuring timely and accurate data flow across multiple sources and destinations.
    • Managed and optimized AWS infrastructure to support large-scale data processing, implementing best practices for cost efficiency, data security, and scalability.
    • Led data modeling efforts to enhance data accuracy and accessibility, transforming raw data into structured formats suitable for analytics and reporting.
    • Conducted performance tuning on Spark and Hive processes to minimize execution times, improve throughput, and maintain high performance standards.
    • Worked closely with American Express stakeholders to understand data requirements, align on project goals, and deliver solutions that support their business objectives.
    • Provided technical guidance to a team of data engineers, conducting code reviews, mentoring team members, and fostering a collaborative development environment.
    • Implemented data validation and quality checks throughout the pipeline to ensure data integrity, resolving issues proactively to maintain high data quality.

Lead Data Engineer

Synechron Technologies
May, 2024 - Aug, 2024 3 months

Module Lead

National Stock Exchange Information Technology (NSE-IT)
Jan, 2022 - Dec, 2023 1 yr 11 months
    Created a new data lake for processing financial data related to debt and recovery

Lead AWS Data Engineer

NatWest Group (Royal Bank of Scotland)
Mar, 2021 - Aug, 2021 5 months

    Project Description: New data lake in an AWS environment

    • Created data ingestion pipelines from various RDBMS sources to the AWS S3 layer using AWS DMS
    • Built ETL data pipelines using PySpark, Glue jobs, Lambda, DynamoDB, Athena, and S3 with Parquet files
    • Built new AWS infrastructure using Terraform, and a CI/CD pipeline using Bitbucket and TeamCity

    Tech Stack: Hive, Spark, Python, AWS (EC2, S3, RDS, Redshift, SNS, EMR, Glue, DMS), Airflow, Terraform

Software Engineer

NatWest Group (Royal Bank of Scotland)
Mar, 2021 - Aug, 2021 5 months
    New data lake in an AWS environment

Senior Big Data Developer

Collabera Technologies
Jan, 2020 - Feb, 2021 1 yr 1 month
    Integrated new data sources using Spark

Senior Data Engineer

Wipro Technologies
Mar, 2018 - Aug, 2019 1 yr 5 months

    Project 1: Customer Care IVR Automation

    Description: Automate the current IVR process to remove the dependency on customer care executives

    • Handled data ingestion via Kafka for real-time, large-scale data processing with Spark Streaming
    • Translated complex functional and technical requirements into detailed high-level and low-level designs
    • Integrated an HSM API with Kafka to provide hardware encryption along with software-level 256-bit encryption to secure transactions such as debit/credit card details, and integrated a REST API for encryption and decryption of messages
    • Played a key role in transforming structured and semi-structured data using Spark scripts
    • Analyzed data by implementing various machine learning algorithms using Spark ML
    • Improved the algorithms through hyperparameter tuning with GridSearchCV

    Tech Stack: Confluent Kafka, Spark, Python, Machine Learning, IBM RTC, Jenkins, REST API, Web Services, HSM

    Project 2: Data Warehouse Migration

    Description: To create a data lake that serves as the single, comprehensive source of information to improve decision-making

    • Set up an environment on AWS EMR for development purposes
    • Oversaw data extraction from Charging System nodes (Oracle Exadata) to HDFS using the data ingestion tool Sqoop
    • Analyzed structured data with Spark scripts to extract meaning and value

    Tech Stack: Sqoop, Hive, Spark, Python, AWS (EC2, EMR)

Specialist

Wipro Technologies
Mar, 2018 - Aug, 2019 1 yr 5 months
    Automate the current IVR process to remove the dependency on customer care executives

Senior Data Engineer

Ericsson Global India Services Pvt. Ltd.
Jul, 2014 - Sep, 2017 3 yr 2 months

    Project 1: Charging System Tariff Plans Regression | U Mobile, Malaysia

    Description: Constant tariff updates create a persistent need to track their impact on ROI, which determines end-user satisfaction.

    • Developed PySpark scripts to calculate traditional & ad-hoc KPIs from structured & semi-structured data
    • Managed data ingestion using Sqoop, and cleaned and manipulated data using Spark scripts

    Tech Stack: Sqoop, Hive, Spark, Python

    Project 2: New Data Lake for LTE Network

    Description: To create a new data lake that serves as the single, comprehensive source of information for critical KPIs

    • Performed ETL from the ENIQ Oracle database to HDFS using Sqoop, and processed the data with Hive scripts

    Tech Stack: Sqoop, Hive, Shell Scripting

    Project 3: Portal Development

    • Migrated the old dashboard, written in PHP, to a new dashboard built with Java

    Tech Stack: Core Java, JSP, Servlets, MySQL

Assistant Engineer

Ericsson Global India Services Pvt. Ltd.
Jul, 2014 - Sep, 2017 3 yr 2 months
    Constant tariff updates create a persistent need to track their impact on ROI, which determines end-user satisfaction

Associate Software Engineer

Gopisoft Pvt. Ltd.
Mar, 2013 - Jul, 2014 1 yr 4 months

Freelancer

IIT Bombay
Mar, 2011 - Aug, 2012 1 yr 5 months
    Handled open-source development projects, including the Scilab Textbook Companion funded by the Ministry of HRD

Achievements

  • Awarded Intel Edge AI Scholarship from Intel & Udacity in Dec 2019
  • Awarded 80% scholarship from the NGO Swades Foundation for PGD in Data Science in Sep 2018
  • Organized Linux Workshop in association with IIT Bombay at Amity Youth Fest '12
  • Google Scholar for the open-source development work done for Scilab Consortium
  • Open-Source Developer for Scilab Consortium (INRIA, France) with 2 documents published at scilab.in

Major Projects

1 Project

Scilab Textbook Companion: Internship Projects

Aug, 2011 - Mar, 2012 7 months

    Project 1: Scilab Textbook Companion | IIT Bombay | Under Ministry of HRD Project

    • Description: Port examples from standard textbooks to Scilab, making them easy for users to follow and improving the Scilab documentation
    • Published 2 documents: Electronic Communication Systems and Applications of GSM


    Project 2: Spoken Tutorials | IIT Bombay | Under Ministry of HRD Project

    • Description: An initiative of the "Talk to a Teacher" project under the National Mission on Education through ICT, MHRD, Govt. of India
    • Worked as Workshop Ambassador to promote and develop content for FOSSEE

Education

  • Post Graduate Diploma in Data Science

    IIIT Bangalore (2019)
  • Post Graduate Diploma in Advanced Computing

    CDAC Noida (2015)
  • B. Tech. Electronics & Telecommunication

    Amity University, Noida, IN (2012)
  • Post Graduate Diploma in Advanced Computing

    CDAC Noida, IN (2013)

Certifications

  • AWS Solutions Architect Associate, certified Aug 2023

  • Databricks Data Engineer Associate, certified Jan 2023 to Jan 2025

  • GCP Associate Cloud Engineer, certified Oct 2024 to Oct 2027