Senior Data Engineer
CommerceIQ.AI
Feb 2023 - Present (2 yr 6 months)
- Constructed efficient data pipelines using PySpark, DBT, BigQuery, and Apache Airflow, resulting in improved data processing speed and reliability.
- Developed Enterprise-Level Data Pipeline Template Repository:
  - Architected a comprehensive template repository integrating Apache Airflow with AWS Batch and Databricks for scalable data processing, enabling standardized deployment of both batch and data-intensive workflows.
  - Implemented a robust monitoring system with PagerDuty and Slack integrations for real-time alerts, reducing incident response time and improving system reliability.
  - Established development best practices by incorporating pre-commit hooks with Black and Flake8, ensuring consistent code quality and maintainability across teams.
  - Containerized the solution using Docker and automated deployment to Amazon ECR, enabling seamless CI/CD pipeline integration and standardized environment management.
  - Engineered infrastructure-as-code using Terraform to provision and manage AWS resources (ECR, Batch, IAM roles, networking), ensuring consistent and reproducible infrastructure deployment across environments.
- Spearheaded the development and productization of a data lake solution, providing critical data access that enhanced decision-making for customers.