
With 7 years of experience in SQL, Python, PySpark, AWS, ETL processes, BI
tools, and data visualization, I have delivered impactful solutions, including
driving $3.7B+ in GMS value and reducing manual workloads by 6.5 man-hours through automation.
My expertise extends to people and program management, enabling efficient collaboration across teams. I am eager to bring this blend of technical and leadership skills to optimize your data infrastructure and drive measurable outcomes.
Subject Matter Expert (Analytics, Intelligence and Engineering)
AMAZON
Lead Tech Operations Associate (Analytics, Intelligence and Engineering)
AMAZON
Tech Operations Associate 1 (Analytics, Intelligence and Engineering)
AMAZON
Python
PySpark
AWS (Amazon Web Services)
Spark SQL
Amazon Redshift
Apache Spark
Microsoft Power BI
Amazon QuickSight
GitHub
MySQL Workbench
Microsoft Excel
Docker
Jenkins
Responsibilities:
1. Identified the various input sources and centralized them into an
Amazon S3 data lake.
2. Transformed raw data using PySpark on AWS Glue to meet the
requirements of downstream analytics systems.
3. Loaded processed data into Amazon Redshift for fast querying
and analytics.
4. Automated the pipeline using AWS Step Functions to maintain a
scalable and reliable ETL process.
5. Implemented data quality checks using AWS Glue DataBrew and
monitored the pipeline with Amazon CloudWatch, collecting logs and
metrics and defining alarms to detect and respond to errors.
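The orchestration in steps 4 and 5 could look roughly like the following Step Functions state machine (a minimal sketch; the job names, topic ARN, and state names are hypothetical placeholders, not the actual pipeline definition):

```json
{
  "Comment": "Sketch: run Glue transform, then quality checks; alert on failure",
  "StartAt": "RunGlueTransform",
  "States": {
    "RunGlueTransform": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "transform-raw-to-curated" },
      "Catch": [ { "ErrorEquals": ["States.ALL"], "Next": "NotifyFailure" } ],
      "Next": "RunQualityChecks"
    },
    "RunQualityChecks": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": { "JobName": "data-quality-checks" },
      "Catch": [ { "ErrorEquals": ["States.ALL"], "Next": "NotifyFailure" } ],
      "Next": "PipelineSucceeded"
    },
    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:REGION:ACCOUNT_ID:etl-alerts",
        "Message": "ETL pipeline failed"
      },
      "Next": "PipelineFailed"
    },
    "PipelineSucceeded": { "Type": "Succeed" },
    "PipelineFailed": { "Type": "Fail" }
  }
}
```

The `.sync` service integrations make each Glue job run to completion before the next state starts, which is what keeps the pipeline reliable without custom polling code.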
1. Created a star schema consisting of a fact table and
multiple dimension tables.
2. Optimized Redshift performance through query tuning and schema
optimization.
3. Utilized AWS Glue ETL jobs with PySpark to clean, transform, and load data into Redshift.
4. Automated the ETL pipeline using Amazon MWAA (Managed Workflows
for Apache Airflow).
5. Validated data consistency between source and destination tables.
6. Built dashboards in Amazon QuickSight to showcase insights from the data mart.
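The consistency validation in step 5 can be sketched as a reconciliation of row counts and an order-independent checksum (a pure-Python illustration; in practice `source_rows` and `dest_rows` would come from queries against the source system and Redshift):

```python
from hashlib import md5

def table_fingerprint(rows):
    """Row count plus an order-independent checksum of all rows."""
    checksum = 0
    for row in rows:
        # Hash each row, then XOR the digests: the result does not
        # depend on row order, only on the set of rows.
        digest = md5("|".join(map(str, row)).encode()).hexdigest()
        checksum ^= int(digest, 16)
    return len(rows), checksum

def validate_consistency(source_rows, dest_rows):
    """True when source and destination tables hold the same rows."""
    return table_fingerprint(source_rows) == table_fingerprint(dest_rows)

# Example: destination matches the source even when row order differs.
source = [(1, "item-a", 10.0), (2, "item-b", 12.5)]
dest = [(2, "item-b", 12.5), (1, "item-a", 10.0)]
print(validate_consistency(source, dest))       # True
print(validate_consistency(source, dest[:1]))   # False (missing a row)
```

Comparing count-plus-checksum fingerprints avoids pulling both tables into memory side by side, which matters when the destination is a large Redshift table.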