
Ambuj Kumar

Senior Data Engineer with 9+ years of experience building data-intensive applications, tackling challenging architecture and scalability problems, and managing data repositories for efficient visualization across a wide range of products. Highly analytical team player with an aptitude for prioritizing needs and risks. Constantly works to streamline processes and experiments with optimizing and benchmarking solutions. Creative troubleshooter and problem-solver who loves challenges. Experienced in implementing ML algorithms and CI/CD in production using the distributed paradigms of Spark/Flink on Azure Databricks, AWS SageMaker, and MLflow. Experienced in shaping and implementing Big Data architectures for the medical devices, retail, banking, games, and transport logistics (IoT) domains.
  • Role

    Senior Data Engineer

  • Years of Experience

    9 years

Skillsets

  • Airflow
  • Akka
  • AWS
  • Azure
  • Azure Data Lake
  • Cassandra
  • ClickHouse
  • Cosmos DB
  • Databricks Delta
  • dbt
  • Docker
  • DVC
  • Flask
  • Flink
  • GCP
  • GitFlow
  • Hive
  • Java
  • Kafka Streams
  • Kubernetes
  • Looker
  • Luigi
  • MLflow
  • MLlib
  • MongoDB
  • MQTT
  • Neo4j
  • Oozie
  • pandas
  • Pig
  • PostgreSQL
  • Python
  • Redshift
  • S3
  • SageMaker
  • Scala
  • scikit-learn
  • Spark
  • SQL
  • Structured Streaming
  • Tableau
  • TensorFlow
  • Terraform

Professional Summary

9 Years
  • Aug 2022 - Present (3 yr 4 months)

    Senior Data Engineer

    British Petroleum
  • Mar 2021 - Jun 2022 (1 yr 3 months)

    Senior Software Engineer

    StrongArmTech
  • Feb 2019 - Dec 2020 (1 yr 10 months)

    Senior Data Engineer Advanced

    Jones Lang LaSalle Technologies
  • Oct 2017 - Dec 2018 (1 yr 2 months)

    Senior Data Engineer

    Robert Bosch Engineering Solutions
  • Jun 2014 - Oct 2017 (3 yr 4 months)

    Software Developer

    General Electric Corp

Applications & Tools Known

  • Spark
  • Flink
  • PostgreSQL
  • Cassandra
  • MongoDB
  • Redshift
  • ClickHouse
  • Snowflake
  • Airflow
  • Luigi
  • Looker
  • Tableau
  • Azure Data Lake
  • S3
  • AWS
  • Azure
  • GCP
  • Databricks
  • Docker
  • Kubernetes
  • Terraform
  • GitFlow
  • MLflow
  • DVC
  • SageMaker

Work History

9 Years

Senior Data Engineer

British Petroleum
Aug 2022 - Present (3 yr 4 months)

    • Worked on a real-time streaming and batch lambda-architecture pipeline ingesting blockchain events and populating KPIs/dashboards in Delta Lake.
    • Created batch and streaming analytics jobs for the lambda architecture as Airflow-managed periodic PySpark jobs writing to Delta Lake (see the sketch below).
    • Modeled a data warehouse for KPI tracking on Snowflake (OLAP) and Databricks Delta.
    • Created and managed dbt models for extensive data-quality enforcement on dbt Cloud.
    • Modeled and maintained update pipelines for a Neo4j knowledge graph backing an end-user data-relationship-management product.
    • Used GitHub Actions, Docker, Kubernetes, and Terraform for CI/CD operations.
    • Ensured a GDPR- and CCPA-compliant data platform.
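For illustration, a minimal sketch of one such Airflow-triggered PySpark batch job, rolling raw blockchain events up into a daily KPI table on Delta Lake. The paths, table names, and columns (raw_events, kpi_daily_tx, event_ts) are hypothetical placeholders, not the production code; the sketch assumes the delta-spark package is available.

```python
# Hypothetical KPI batch job: aggregate blockchain events into a daily
# transaction-count table on Delta Lake. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("kpi-batch")
    # Delta Lake extensions; assumes delta-spark is on the classpath.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read the raw event table and roll up transaction counts per day.
events = spark.read.format("delta").load("/mnt/delta/raw_events")
kpis = (
    events
    .withColumn("day", F.to_date("event_ts"))
    .groupBy("day")
    .agg(F.count("*").alias("tx_count"))
)

# Overwrite the KPI table; downstream dashboards read from this path.
kpis.write.format("delta").mode("overwrite").save("/mnt/delta/kpi_daily_tx")
```

In an Airflow-managed setup, a job like this would typically be submitted on a schedule (e.g. via a Databricks or spark-submit operator) rather than run directly.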

Senior Software Engineer

StrongArmTech
Mar 2021 - Jun 2022 (1 yr 3 months)

    • Created streaming pipelines to ingest sensor data and process it in real time, populating dashboards and the warehouse.
    • Built pipelines in which sensor data published to Kinesis (and to S3 for fail-safe reprocessing) was ingested by a Databricks job and written into Azure Delta tables and ClickHouse (GCP earlier).
    • Worked on Looker and SQL Analytics dashboards over ClickHouse/GCP.
    • Performed data-quality testing and improvement via periodic comparison jobs (sketched below).
    • Built pipelines as part of a SOLID-principled ML codebase, including ad hoc time-bound backruns, API CDC jobs for metadata entities, and production-optimized MLlib code, all in Python (including the pandas API).
    • Designed and integrated product entities using Databricks Delta (Parquet Delta Lake) and ClickHouse.
    • Used Terraform and GitHub Actions for DevOps, infrastructure, and CI/CD.
    • Ensured a GDPR-, CCPA-, and HIPAA-compliant data platform.
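A hedged sketch of the kind of periodic comparison job mentioned above: checking row-count agreement between a Delta table and its ClickHouse mirror. The table names, Delta path, and ClickHouse host are hypothetical, and the sketch assumes the clickhouse-driver package.

```python
# Hypothetical data-quality check: compare row counts between a Delta table
# and its ClickHouse copy. Names and hosts are placeholders.
from clickhouse_driver import Client
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dq-compare").getOrCreate()

delta_count = spark.read.format("delta").load("/mnt/delta/sensor_events").count()

ch = Client(host="clickhouse.internal")  # assumed internal hostname
(ch_count,) = ch.execute("SELECT count() FROM sensor_events")[0]

if delta_count != ch_count:
    # A real job would alert (Slack, PagerDuty, etc.) rather than raise.
    raise ValueError(f"Row-count drift: delta={delta_count} clickhouse={ch_count}")
```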

Senior Data Engineer Advanced

Jones Lang LaSalle Technologies
Feb 2019 - Dec 2020 (1 yr 10 months)

    • Worked on ingestion from multiple API sources, dump-schema creation, and entity modelling using Cosmos DB and Scala Azure Functions.
    • Worked on global multi-region sources and the associated rule-based, region-specific ETL pipelines driven by Spark notebooks on Azure Databricks.
    • Integrated entities in the property domain using Azure Cosmos Graph and Azure Databricks notebooks, exposed through Scala web-service APIs deployed on Azure HDInsight for fast search.
    • Worked on the streaming element of the pipeline, detecting data refreshes.
    • Designed per-table schema handling, ingestion, and implementation of a data warehouse for KPI tracking, including all components of a fully fledged reporting warehouse.
    • Created Airflow-scheduled Spark jobs to update the warehouses with daily data from Mongo, MySQL, Postgres, and folder dumps (see the DAG sketch below).
    • Managed scaled ingestion from public competitor APIs to track relevant parameters in an analytics warehouse on Redshift.
    • Wrote complex custom reporting logic in Spark that drove marketing strategy.
    • Benchmarked the real-time elements of the solution against Kafka Streams.
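A minimal Airflow DAG sketch for the daily warehouse refresh described above: one task per upstream source. It assumes Airflow 2.x; the DAG id, schedule, and the ingest callable are illustrative placeholders standing in for the real Spark ingestion jobs.

```python
# Hypothetical daily-refresh DAG: one ingestion task per source feeding the
# KPI warehouse. The ingest() body is a placeholder for the real Spark jobs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest(source: str) -> None:
    # Placeholder: each source (Mongo, MySQL, Postgres, folder dumps)
    # had its own Spark ingestion job in the real pipeline.
    print(f"ingesting {source}")


with DAG(
    dag_id="daily_warehouse_refresh",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    tasks = [
        PythonOperator(
            task_id=f"ingest_{source}",
            python_callable=ingest,
            op_kwargs={"source": source},
        )
        for source in ("mongo", "mysql", "postgres", "dumps")
    ]
```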

Senior Data Engineer

Robert Bosch Engineering Solutions
Oct 2017 - Dec 2018 (1 yr 2 months)

    • Created Spark batch jobs deriving outputs from the incoming data model via a productionised ML model with associated business logic.
    • Implemented the Flask API layer and a simulator for the application; tested the end-to-end pipeline and the DevOps log monitoring of each component.
    • Led overall design and development of the lambda architecture: an MQTT-based Kafka/Spark pipeline for data ingestion and alert detection (see the MQTT-to-Kafka sketch below); developed it as a cloud-agnostic framework.
    • Created a Scala Flink complex-event-processing pipeline detecting events from the incoming data model with business logic.
    • Implemented the API layer in Akka and a data simulator; tested the end-to-end pipeline and component log monitoring on AWS.
    • Designed and developed, around the data format, an MQTT-based Kafka/Flink pipeline with RDBMS and Cassandra sinks for data ingestion and event/milestone detection.
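A hedged sketch of the MQTT-to-Kafka ingestion edge of such a lambda architecture: device telemetry arrives over MQTT and is forwarded to a Kafka topic for the Spark/Flink consumers. It assumes paho-mqtt 1.x and kafka-python; broker addresses and topic names are hypothetical.

```python
# Hypothetical MQTT-to-Kafka bridge: subscribe to device telemetry topics
# and forward raw payloads to Kafka for downstream stream processors.
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka.internal:9092")


def on_message(client, userdata, msg):
    # Forward the raw payload; downstream jobs parse and validate it.
    producer.send("device-telemetry", key=msg.topic.encode(), value=msg.payload)


client = mqtt.Client()
client.on_message = on_message
client.connect("mqtt.internal", 1883)
client.subscribe("devices/+/telemetry")
client.loop_forever()
```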

Software Developer

General Electric Corp
Jun 2014 - Oct 2017 (3 yr 4 months)

    • GE Healthcare device-monitoring product: deployed and maintained the Azure-based cluster (DevOps), alongside pipeline design and data-handling constraints using a data-virtualization tool.
    • Implemented detection algorithms for various respiration and lung parameters, plus accumulation algorithms for case-end aggregation requirements.
    • Modeled data in Cassandra for real-time storage and case-end aggregation (a schema sketch follows), and modeled data for warehousing and UI-based consumption.
    • Company log-data analytics: wrote Pig scripts against the Hive database, staging data for processing before loading it into the final Hadoop tables.
    • Built Oozie workflows executing Java, Pig, and Hive actions based on decision nodes; scheduled Oozie workflow and coordinator jobs.
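As a sketch of the Cassandra time-series modeling pattern referenced above: partitioning by device and day keeps each case's readings in a single partition so case-end aggregation is one partition scan. The keyspace, table, and columns are hypothetical, and the sketch assumes the cassandra-driver package.

```python
# Hypothetical Cassandra schema for real-time device readings, partitioned
# by (device_id, day) so case-end aggregation scans one partition.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.internal"])  # assumed contact point
session = cluster.connect("monitoring")    # assumed keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS respiration_readings (
        device_id text,
        day date,
        reading_ts timestamp,
        value double,
        PRIMARY KEY ((device_id, day), reading_ts)
    ) WITH CLUSTERING ORDER BY (reading_ts DESC)
""")
```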

Education

  • Bachelor of Engineering

    Syb University (2014)

Certifications

  • ConsenSys Certified Blockchain Developer

  • Oracle Certified Associate, Java SE 7 Programmer

  • Oracle Certified, Oracle Database 11g Advanced SQL