
Sachin Chaudhary

Data and Neo4j engineer with 3.1 years of experience at Accenture building data ingestion and data lineage frameworks. Experienced in PySpark-based ingestion of terabyte-scale datasets on Amazon EMR, Amazon S3, and Hadoop, and in capturing Apache Spark job lineage with Spline for graphical visualization in Neo4j.

  • Role

    Data & Neo4j Engineer

  • Years of Experience

    3.1 years

Skillsets

  • Notepad++
  • Git
  • GitHub
  • GitLab
  • Hadoop
  • IntelliJ IDEA
  • Java
  • Maven
  • Neo4j
  • Eclipse MAT
  • Object-Oriented Programming
  • Oracle
  • pandas
  • PostgreSQL
  • PySpark
  • Python
  • SQL
  • VS Code
  • Bitbucket
  • Amazon Athena
  • Amazon EMR
  • Amazon S3
  • Apache Airflow
  • Apache Hive
  • Apache Spark
  • Amazon EMR Serverless
  • AWS Glue
  • Agile
  • CQL
  • Data Management
  • Data Modeling
  • Data Structures and Algorithms
  • Data Warehousing
  • Database Management Systems
  • Distributed Computing

Professional Summary

3.1 years
  • Dec 2022 - Present (3 yr 3 months)

    Data Engineer

    Accenture
  • Sep 2021 - Dec 2022 (1 yr 3 months)

    Associate Data Engineer

    Accenture

Work History


Data Engineer

Accenture
Dec 2022 - Present (3 yr 3 months)
    Contributed as a key developer in the design and development of a custom Data Lineage Framework that tracks lineage across data ingestion and ETL pipelines. Integrated the open-source Spline framework to capture metadata from Apache Spark jobs running on Amazon EMR, Amazon EMR Serverless, and AWS Glue. Developed custom parsing logic to extract and transform the captured lineage data and load it into Neo4j through APIs for graphical visualization of job dependencies. Engineered a configuration-driven parsing framework from scratch to capture and visualize lineage for jobs that Spline does not support. Collaborated with cross-functional teams to integrate lineage capabilities into the broader data pipeline ecosystem and align them with governance and compliance frameworks.

Associate Data Engineer

Accenture
Sep 2021 - Dec 2022 (1 yr 3 months)
    Designed and developed a scalable PySpark-based Data Ingestion Framework on Apache Spark, Amazon S3, Amazon EMR, and Hadoop, capable of efficiently ingesting terabyte-scale datasets. Built and integrated pre-ingestion data validation checks using PySpark DataFrames to ensure data quality and consistency. Optimized Spark job execution, partitioning strategy, and I/O operations, reducing processing time by 30%. Integrated the framework with Apache Airflow for orchestration and with AWS Glue for metadata and catalog management, enabling seamless pipeline scheduling and governance. Ensured adherence to data governance, scalability, and security best practices in a distributed data processing environment.

Education

  • Bachelor of Technology in Computer Science Engineering

    Aligarh College of Engineering and Technology (2021)
  • Intermediate Education (PCM: Physics, Chemistry, Mathematics)

    Radiant Stars English School (2017)