- Developed data pipelines in the UK_SME department to generate historical and incremental data from multiple bank sources, consumed by the modelling team to calculate PD, LGD, EAD and RWA for credit risk measurement.
- Interacted daily with Business Analysts to gather functional requirements, translating business logic into SQL queries implemented as PySpark transformations on AWS infrastructure (see the transformation sketch below).
- Migrated pipelines from a shared EMR cluster to a container-based approach, cutting the run time for generating 18 years of historical data by 60%.
- Sourced data into AWS S3 from SAS and an on-prem cluster so that workflows could run directly on those tables.
- Built the ETL solution in Airflow and used it to manage complex workflow dependencies (see the DAG sketch below).
- Automated deployment from Bitbucket/GitLab to AWS EMR with a TeamCity CI/CD pipeline.
- Developed a PySpark data-quality (DQ) check utility, run on EMR, to validate Datamart tables with hundreds or thousands of columns (see the DQ sketch below).
- Tech stack: PySpark, Python, AWS (S3, EMR), Airflow, TeamCity.
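
A minimal sketch of the SQL-to-PySpark translation pattern described above, assuming a Parquet landing zone in S3. The table names (accounts, balances), columns and S3 paths are hypothetical placeholders, not the production definitions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("uk_sme_historic_load").getOrCreate()

# Source tables landed in S3 (paths are placeholders).
accounts = spark.read.parquet("s3://bucket/landing/accounts/")
balances = spark.read.parquet("s3://bucket/landing/balances/")

# A business rule originally expressed as SQL, implemented as a DataFrame
# transform: latest month-end balance per account, joined back to accounts.
latest_balance = (
    balances
    .groupBy("account_id")
    .agg(F.max("as_of_date").alias("as_of_date"))
    .join(balances, ["account_id", "as_of_date"])
)

datamart = accounts.join(latest_balance, "account_id", "left")
datamart.write.mode("overwrite").partitionBy("as_of_date").parquet(
    "s3://bucket/datamart/account_monthly/"
)
```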
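A minimal Airflow DAG sketch showing how the sourcing, Datamart build and DQ steps could be chained. It assumes Airflow 2.4+ (where `schedule` replaces `schedule_interval`); the dag_id, task ids and spark-submit commands are illustrative, not the production workflow:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="uk_sme_datamart",       # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    source_to_s3 = BashOperator(
        task_id="source_to_s3",
        bash_command="spark-submit jobs/source_to_s3.py",
    )
    build_datamart = BashOperator(
        task_id="build_datamart",
        bash_command="spark-submit jobs/build_datamart.py",
    )
    dq_checks = BashOperator(
        task_id="dq_checks",
        bash_command="spark-submit jobs/dq_checks.py",
    )

    # Run sourcing, then the transform, then the quality checks.
    source_to_s3 >> build_datamart >> dq_checks
```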
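A sketch of the wide-table DQ idea: profile every column of a Datamart table in a single aggregation pass rather than one scan per column, which matters when tables have hundreds or thousands of columns. The input path and the 5% null-rate threshold are placeholder assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
df = spark.read.parquet("s3://bucket/datamart/account_monthly/")  # placeholder

total = df.count()

# One aggregation producing a null count for every column at once.
null_counts = df.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
).first().asDict()

# Flag columns whose null rate exceeds a placeholder threshold of 5%.
failures = {
    col: n for col, n in null_counts.items()
    if total and n / total > 0.05
}
if failures:
    raise ValueError(f"DQ check failed for columns: {sorted(failures)}")
```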