Akhil Prasad

Skilled in practices around Data Warehousing, Data Marts, Data Models, ETL & ELT for large-scale data management.


Built batch & streaming pipelines using Spark and the Hadoop ecosystem, capable of handling terabytes of data daily.


Worked on cloud migration of thousands of pipelines (on-premises to GCP).


Part of the Data Management team, involved in Automation & Support, Platform Engineering, and Data Governance activities.


Built & managed Data Pipeline frameworks (using Python, Scala) used for data ingestion activities across teams/markets.


Expert in writing optimized SQL queries for OLTP & OLAP workloads using indexes, joins, aggregates, window functions, CTEs, and more.


Worked in Retail, Banking, Financial Services & Insurance domains.


Collaborated on quarterly roadmaps, optimizing resource allocation and ensuring clear stakeholder communication.


Led sprint planning, prioritizing backlog, managing team capacity, and providing technical guidance for timely & high-quality delivery.

  • Role

    Senior Data Engineer

  • Years of Experience

    6 years

Skillsets

  • Indexes
  • SQL - 6 Years
  • Python - 6 Years
  • NoSQL - 2 Years
  • Linux - 6 Years
  • AWS - 1 Year
  • Snowflake - 1 Year
  • PySpark - 6 Years
  • Big Data Technology - 6 Years
  • CTEs
  • Window functions
  • Aggregate
  • Join
  • Data Warehousing
  • Data models
  • Data Cleaning
  • Data Governance
  • OLTP
  • OLAP
  • CI/CD
  • ELT
  • ETL - 6 Years
  • REST API
  • Multi-Threading
  • Multi-Processing

Professional Summary

6 Years
  • Oct, 2023 - Present (1 yr 6 months)

    Senior Data Engineer

    Walmart Global Tech India
  • Feb, 2022 - Sep, 2023 (1 yr 7 months)

    Data Engineer - III

    Walmart Global Tech India
  • Sep, 2018 - Jan, 2022 (3 yr 4 months)

    Data Engineer - Associate

    Infosys Limited

Applications & Tools Known

  • Spark
  • Hadoop
  • GCP
  • SQL
  • Python
  • Scala
  • Java
  • C
  • C++
  • Bash scripting
  • Hive
  • HDFS
  • Sqoop
  • Kafka
  • Kafka Connect
  • Oracle
  • MSSQL
  • MySQL
  • Dremio
  • PostgreSQL
  • Snowflake
  • Dataproc
  • Batch
  • IAM
  • BigQuery
  • Airflow
  • Looker
  • Power BI
  • Tableau
  • Apache Superset
  • Docker
  • Jenkins
  • Git
  • Maven
  • Excel

Work History

6 Years

Senior Data Engineer

Walmart Global Tech India
Oct, 2023 - Present (1 yr 6 months)

    Led a team of 6 engineers in maintaining & enhancing in-house data pipeline frameworks (Scala & Python) for streaming and batch data ingestion across teams and domains (used by 6000+ pipelines), achieving a 70% reduction in pipeline creation effort.


    Led a team of 25 data lake support personnel (L1), empowering them to conduct independent initial failure analysis and due diligence. This enabled self-sufficiency in monitoring and supporting 6000+ big data pipelines across markets.


    Partnered with stakeholders to define and implement technical solutions, at the framework or platform level, for various data needs.


    Worked on Airflow migration (Kubernetes to Celery), version upgrades, and optimization to support a growing pipeline count.


    Established and managed CI/CD components using Git, Maven, Jenkins, and Docker to support a growing number of pipelines and framework changes.


    Established data lake best practices and conducted Proof-of-Concepts (POCs) for new tools & technologies, driving adoption and improving data lake functionality.


    Mentored junior engineers, fostered collaboration, and ensured knowledge transfer for successful project execution.

Data Engineer - III

Walmart Global Tech India
Feb, 2022 - Sep, 2023 (1 yr 7 months)

    Built a scalable Scala Spark application for e-commerce fraud detection, processing diverse data streams (sales, refunds, cancellations, etc.) to identify various fraudulent activities, including employee/associate/partner collusion, frequent returns, and linked accounts.


    Collaborated with the UI/UX team on UI integration & UAT, ensuring data integrity & accuracy on dashboards.


    Collaborated with the Data Science team, providing data for model creation and integrating models into pipelines for fraud detection.


    Used Apache FreeMarker to generate dynamic SQL queries based on dashboard interaction, providing interactive dashboards to analysts.


    Migrated a Scala-based on-premises application to Google Cloud Platform (GCP) and optimized it (caching, profiling, tuning Spark parameters, etc.) to attain cost savings of $4500 per month and a 250% improvement in execution time.


    Migrated pipelines from Automic to Apache Airflow, applying optimizations that minimized task queuing.


    Automated various data management tasks, such as bucket creation and BigQuery table refresh with updated metadata from GCS buckets.


    Leveraged Erwin Data Modeler to streamline data modeling and transformation processes.

Data Engineer - Associate

Infosys Limited
Sep, 2018 - Jan, 2022 (3 yr 4 months)

    Used Sqoop & JDBC to ingest data into the Data Lake from various RDBMS systems and from Cosmos and Cassandra sources.


    Performed data cleaning and loaded data into the catalog zone using Hive SQL, Spark SQL, and PySpark.


    Optimized pipelines, resulting in 40% to 350% improvement in performance (using partitioning and tuning Spark parameters).


    Used Autosys for workflow management and orchestrating Spark jobs.


    Created the technical design, data model, and documentation for the solution.


    Consumed data from REST API endpoints using Python libraries such as requests, pycurl, and urllib3.


    Used Python multi-threading and multi-processing to increase performance by several orders of magnitude.


    Worked with different data file formats such as XML, YAML, JSON, HOCON, CSV, TXT, ORC, Avro, and Parquet.

Achievements

  • Achieved a 70% reduction in pipeline creation effort.
  • Empowered L1 support personnel for independent initial failure analysis.

Education

  • B.C.A

    GGSIPU (Delhi) (2018)
  • 12th

    KV JNU (Delhi) (2015)
  • 10th

    KV Vasant Kunj (Delhi) (2013)

Certifications

  • AZ-900 - Microsoft Certified: Azure Fundamentals

  • DP-900 - Microsoft Certified: Azure Data Fundamentals