Sachin Chaudhary

Data engineer at Accenture since 2021, building PySpark-based data ingestion and data lineage frameworks on AWS (EMR, EMR Serverless, Glue, and S3), with lineage visualized in Neo4j.

Role
Data & Neo4j Engineer
Years of Experience
3.1 years

Skillsets

Notepad++
Git
GitHub
GitLab
Hadoop
IntelliJ IDEA
Java
Maven
Neo4j
Eclipse MAT
Object-Oriented Programming
Oracle
pandas
PostgreSQL
PySpark
Python
SQL
VS Code
Bitbucket
Amazon Athena
Amazon EMR
Amazon S3
Apache Airflow
Apache Hive
Apache Spark
AWS EMR Serverless
AWS Glue
Agile
CQL
Data Management
Data Modeling
Data Structures and Algorithms
Data Warehousing
Database Management Systems
Distributed Computing

Professional Summary

3.1 Years

Dec, 2022 - Present (3 yr 3 months)
Data Engineer
Accenture
Sep, 2021 - Dec, 2022 (1 yr 3 months)
Associate Data Engineer
Accenture

Work History

3.1 Years

Data Engineer

Accenture

Dec, 2022 - Present (3 yr 3 months)

Key developer in the design and development of a custom Data Lineage Framework that provides end-to-end lineage tracking for data ingestion and ETL pipelines. Integrated the open-source Spline framework to capture metadata from Apache Spark jobs running on AWS EMR, AWS EMR Serverless, and AWS Glue. Developed custom parsing logic to extract and transform the captured lineage data so that job dependencies can be visualized graphically in Neo4j through APIs. Engineered a configuration-driven parsing framework from scratch to capture and graphically represent lineage for jobs that Spline does not support. Collaborated with cross-functional teams to integrate lineage capabilities into broader data pipeline ecosystems and align them with governance and compliance frameworks.

Associate Data Engineer

Accenture

Sep, 2021 - Dec, 2022 (1 yr 3 months)

Designed and developed a scalable PySpark-based Data Ingestion Framework on Apache Spark, AWS S3, AWS EMR, and Hadoop, capable of ingesting terabyte-scale datasets. Built and integrated pre-ingestion data validation checks using PySpark DataFrames to ensure data quality and consistency. Reduced processing time by 30% by optimizing Spark job execution, partitioning strategy, and I/O operations. Integrated the framework with Apache Airflow for orchestration and with AWS Glue for metadata and catalog management, enabling pipeline scheduling and governance. Followed data governance, scalability, and security best practices in a distributed data processing environment.

Education

Bachelor of Technology in Computer Science Engineering
Aligarh College of Engineering and Technology (2021)
Intermediate Education, PCM
Radiant Stars English School (2017)
