Senior Data Engineer
FARFETCH | Feb 2022 - Present (3 yr 11 months)
Led end-to-end data integration solutions using Azure Data Factory (ADF) and Azure Databricks to orchestrate data movement and transformation across sources including Azure Blob Storage, Azure SQL Database, Azure Data Lake Storage, and on-premises systems.
Implemented error handling and logging mechanisms within Azure Data Factory, reducing data processing errors.
Developed Azure Databricks notebooks in PySpark to handle large data volumes and execute complex data transformations.
Optimized Azure Databricks workloads through partitioning, broadcast joins, caching strategies, and Spark cluster configuration tuning, improving Spark job performance by 30%.
Collaborated with stakeholders and product owners to analyze requirements and design solutions, following Agile/Scrum methodology.
Analyzed PySpark Directed Acyclic Graph (DAG) execution plans to identify and resolve bottlenecks in Spark SQL queries, improving PySpark job execution efficiency by 40% and reducing unnecessary data shuffling.
Implemented multiple pipelines to process influencer data from a third-party API.
Contributed to business insights by providing information on marketing investments and ROI.