Analyst - Data Engineering
LatentView Analytics, Aug 2021 - Oct 2024 (3 yr 2 months)
Optimized Data Pipelines: Designed and developed scalable, production-ready ETL pipelines for data ingestion and transformation, integrated with BI applications. Automated recurring data extraction, cleaning, and transformation processes using SQL, PySpark, and Shell scripting, improving reporting timelines. Migrated Hive scripts to PySpark for better performance and efficiency, and automated data copying between the prod and stage Hadoop clusters. Migrated existing Cron jobs to Tidal Scheduler for reliable, timely execution of data processing tasks. Analyzed job failures, implemented preventive actions, and introduced monitoring solutions to keep data workflows stable. Earned the Databricks Certified Data Engineer Associate certification and contributed to cloud migration projects.

Built and maintained ETL pipelines in Alteryx to ingest, transform, and load data into AWS Redshift. Automated data ingestion from S3, Airtable, and Adobe Dashboards using APIs and Alteryx, integrating the data into Redshift for reporting. Implemented data quality checks and optimized SQL and Python data-cleaning scripts, improving data accuracy and usability.

Designed and implemented end-to-end BI solutions using MicroStrategy and Power BI, including advanced DAX queries and interactive dashboards. Created MicroStrategy Dossiers and Documents and Power BI dashboards (e.g., Decomposition Trees) to support detailed analysis and data-driven decision-making.

Developed multiple regression forecasting models and implemented machine learning models (Isolation Forest, Prophet) for anomaly detection, achieving high prediction accuracy.

Led client discussions, mentored junior team members, and ensured smooth project delivery.