Suman Khatua

Data Engineer with 5 years of IT industry experience and technical skills across the full software development life cycle, including requirement gathering, architecture design, and project planning. Experienced in managing project execution from development to production release and in maintaining production releases.


Responsible for the design and development of multiple applications integrating data from ~10 operational data stores into the Enterprise Data Warehouse while applying business logic and requirements.


Good working knowledge of NoSQL databases, including MongoDB and HBase.


Good coding skills in Python, PySpark, dbt, and SQL. Good understanding of algorithms and their efficient implementation. Basic knowledge of AWS and Azure cloud services. Worked with Sqoop, Flume, Redshift, S3, EMR, etc.


Experienced in ingesting data from multiple data sources and deriving meaningful insights.


  • Role

    Data Engineer

  • Years of Experience

    5 years

Skillsets

  • AWS - 2 Years
  • Azure
  • dbt
  • ETL - 4 Years
  • Hadoop
  • HBase
  • MongoDB
  • MySQL
  • NoSQL - 2 Years
  • PySpark
  • Python - 5 Years
  • Spark
  • SQL - 5 Years
  • S3 - 5 Years
  • Redshift - 2 Years

Professional Summary

5 Years
  • Oct 2022 - Present (2 yr 9 months)

    Senior Data Engineer (Assistant Manager)

    Genpact

Applications & Tools Known

  • Jupyter
  • PyCharm
  • Jira
  • Bitbucket
  • Spark
  • Hive
  • Hadoop
  • Apache
  • Databricks
  • Data Lake
  • Microsoft Azure
  • MySQL
  • PostgreSQL
  • MongoDB
  • Redshift
  • RDS
  • CosmosDB
  • MS Excel

Work History

5 Years

Senior Data Engineer (Assistant Manager)

Genpact
Oct 2022 - Present (2 yr 9 months)
    Developed end-to-end pipelines on global transactional data to refine and enrich trend insights, built dashboards using Python, SQL, and shell scripts, and worked with various source connections.

Achievements

  • Secured 3.7/4 GPA
  • Secured 75%
  • Ranked among top 5% in ECE Batch
  • 5-star Gold badge in SQL on the HackerRank platform
  • 5-star Gold badge in Python on the HackerRank platform
  • 2-star Bronze badge in Problem Solving on the HackerRank platform

Major Projects

1 Project

ETL and Data Analysis

    Extracted transactional data from MySQL RDS to HDFS (on EC2), transformed it using PySpark, and loaded it into S3 and Redshift.

Education

  • Post Graduate Diploma in Data Engineering

    IIIT Bangalore (2023)
  • Bachelor of Technology in Electronics and Communication Engineering

    SOE, Cochin University of Science and Technology (2019)

Certifications

  • Databricks Certified Associate Data Engineer

  • AWS Certified Cloud Practitioner

  • 3x Microsoft Azure Certified

  • Advanced SQL for Data Science

  • Python Data Science Certification

  • Infosys Machine Learning Certified

  • dbt Fundamentals