
Pruthvi Raja Reddy

Highly skilled Data Engineer with over 6 years of experience designing, building, and maintaining scalable data solutions. Proficient in big data technologies, data warehousing, and ETL processes, with extensive expertise in AWS and GCP. Proven track record of leading cross-functional teams, optimizing data pipelines, and improving data quality. Experienced in implementing data governance and compliance practices to ensure data security. Strong problem-solving skills and a collaborative approach to achieving project goals.

  • Role

    AWS Cloud Engineer

  • Years of Experience

    6.8 years

Skillsets

  • PySpark
  • Kubernetes - 5.0 Years
  • Docker - 5.0 Years
  • Data Profiling
  • Real-time Data Ingestion
  • AWS EMR
  • Cloud Dataflow
  • Teradata
  • TensorFlow
  • SQL
  • Spark
  • Snowflake
  • Scala
  • Redshift
  • RDBMS
  • Python - 4.0 Years
  • Big Data
  • Kafka
  • Hive
  • HDFS
  • Hadoop
  • Google Cloud Platform
  • ETL
  • Elasticsearch
  • Dataflow
  • Data Warehousing
  • Data Lake
  • Data Engineering
  • Data Encryption
  • Cassandra
  • BigQuery

Professional Summary

6.8 years
  • Jun, 2023 - Present (2 yr 2 months)

    Sr Data Engineer

    Wipro Ltd | AON
  • Oct, 2021 - May, 2023 (1 yr 7 months)

    Data Engineer

    Wipro Ltd/Honeywell
  • Dec, 2019 - Sep, 2021 (1 yr 9 months)

    Data Engineer

    Wipro Ltd/Thames Water
  • Jun, 2018 - Nov, 2019 (1 yr 5 months)

    IT Service Management

    Wipro Ltd/United Health Group

Applications & Tools Known

  • MongoDB
  • Google BigQuery
  • Amazon Redshift
  • Snowflake
  • Amazon S3
  • Terraform
  • Docker
  • Apache Spark
  • IntelliJ
  • Visual Studio
  • Grafana
  • Gitlab CI/CD
  • AWS
  • GCP
  • Power BI
  • Confluence
  • MS Office
  • ServiceNow

Work History

6.8 years

Sr Data Engineer

Wipro Ltd | AON
Jun, 2023 - Present (2 yr 2 months)
  • Wrote ETL jobs using Spark data pipelines to process data from multiple sources and transform it for multiple targets.
  • Designed and implemented data processing pipelines using GCP services such as Cloud Dataflow and Apache Beam to ingest, transform, and analyze large volumes of data.
  • Developed and optimized ETL processes to extract data from databases, APIs, and streaming platforms and load it into BigQuery for analysis.
  • Implemented real-time data streaming solutions using GCP Pub/Sub and Dataflow for continuous data ingestion and near-instantaneous processing.
  • Collaborated with data scientists to deploy machine learning models on GCP using TensorFlow and integrated them into data pipelines for predictive analytics.
  • Designed and maintained data warehouses on GCP, optimizing performance and scalability for analytical queries; implemented BigQuery partitioning and clustering for enhanced query performance.
  • Created a test automation framework in Python.
  • Gained hands-on cloud data engineering experience across the big data Hadoop ecosystem (HDFS, Hive, Spark, BigQuery, Databricks, Kafka, YARN) on AWS cloud services and cloud relational databases.
  • Created streams using Spark, processed real-time data into RDDs and DataFrames, and built analytics with Spark SQL.
  • Created an ETL framework using Spark on AWS EMR in Scala/Python.
  • Designed a Redshift-based data delivery layer enabling business intelligence tools to operate directly on AWS S3.
  • Implemented Kinesis data streams to read real-time data and load it into S3 for downstream processing.
  • Wrote Spark applications in Scala and Python.
  • Developed and optimized ETL processes to load and transform data from various sources into Teradata, ensuring data quality and consistency.
  • Collaborated with business stakeholders to identify analytics requirements and delivered actionable insights using Teradata's advanced analytics capabilities.
  • Created interactive dashboards and reports using Google Data Studio to visualize insights and facilitate data-driven decision-making.
  • Collaborated with business analysts and stakeholders to gather requirements and translate them into technical specifications.
  • Created frameworks for data profiling and data encryption.
  • Designed 'Data Services' to mediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Prepared high-level design documentation for approval.
  • Provided 24x7 on-call support.
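The data profiling framework mentioned above can be illustrated with a minimal sketch. This is not the production code, only a pure-Python outline of the idea: scan each column, count rows, nulls, and distinct values so data-quality issues surface before data lands in the warehouse. Function and field names are hypothetical.

```python
# Minimal column-profiling sketch (illustrative, not the original framework).
def profile(rows):
    """rows: list of dicts sharing the same keys.
    Returns per-column stats: total count, null count, distinct count."""
    stats = {}
    for row in rows:
        for col, val in row.items():
            s = stats.setdefault(col, {"count": 0, "nulls": 0, "distinct": set()})
            s["count"] += 1
            if val is None:
                s["nulls"] += 1
            else:
                s["distinct"].add(val)
    # Collapse the distinct-value sets to counts for the final report.
    return {c: {"count": s["count"], "nulls": s["nulls"],
                "distinct": len(s["distinct"])} for c, s in stats.items()}

rows = [{"id": 1, "city": "Leeds"}, {"id": 2, "city": None}, {"id": 3, "city": "Leeds"}]
# profile(rows)["city"] == {"count": 3, "nulls": 1, "distinct": 1}
```

In a real pipeline the same pass would run distributed (e.g. as PySpark aggregations) rather than row by row in memory.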

Data Engineer

Wipro Ltd/Honeywell
Oct, 2021 - May, 2023 (1 yr 7 months)
  • Operated in a fast-paced, agile environment to quickly analyze, develop, and test potential business use cases.
  • Utilized Spark Streaming APIs to perform real-time data transformations, building a common learner data model from Kafka sources, and persisted the results to Cassandra.
  • Developed Kafka consumer APIs in Scala to consume data from Kafka topics; consumed XML messages from Kafka and used Spark Streaming to process and capture user interface updates.
  • Created preprocessing jobs to flatten JSON documents into a flat file format.
  • Loaded data from D-Streams into Spark RDDs to perform in-memory computation for generating output responses.
  • Developed live real-time processing jobs using Spark Streaming with Kafka as the data pipeline system.
  • Imported and exported data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
  • Optimized Spark jobs to run on a Kubernetes cluster for faster data processing.
  • Implemented Elasticsearch on the Hive data warehouse platform to facilitate complex search operations.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Java, and Scala.
  • Used the Spark DataStax Cassandra Connector to load data to and from Cassandra, and created data models to analyze client data sets.
  • Leveraged the Cassandra Query Language (CQL) for quick data searching, sorting, and grouping; used the cassandra-stress tool to measure and improve read/write performance on the cluster.
  • Used HiveQL to analyze partitioned and bucketed data, executing Hive queries on Parquet tables to meet business requirements.
  • Implemented Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for analysis.
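The JSON-flattening preprocessing step described above can be sketched in a few lines. This is an illustrative outline, not the original job: nested documents are collapsed into dot-separated keys so each record becomes one row of a flat (e.g. CSV) file. The sample document and field names are hypothetical.

```python
# Illustrative sketch of flattening nested JSON into a single-level record.
import json

def flatten(record, prefix=""):
    """Recursively flatten a nested dict into a flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

doc = json.loads('{"user": {"id": 7, "geo": {"city": "Pune"}}, "event": "click"}')
row = flatten(doc)
# row == {"user.id": 7, "user.geo.city": "Pune", "event": "click"}
```

At scale the equivalent logic runs over Spark DataFrames (e.g. exploding and selecting nested columns) rather than one dict at a time.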

Data Engineer

Wipro Ltd/Thames Water
Dec, 2019 - Sep, 2021 (1 yr 9 months)
  • Worked in a fast-paced agile development environment to quickly analyze, develop, and test potential business use cases.
  • Developed real-time data processing applications in Scala and Python, implementing Apache Spark Streaming from sources such as Kafka and JMS.
  • Converted data into different formats per user requirements via streaming data pipelines from sources such as Snowflake and unstructured data.
  • Designed and developed scalable data pipelines using Google Cloud Platform services such as BigQuery, Dataflow, and Cloud Storage; led the design and implementation of end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
  • Collaborated with cross-functional teams to implement data governance policies and ensure compliance with security standards.
  • Implemented ETL processes to ingest and transform data from various sources into a centralized data warehouse.
  • Collaborated with data analysts and data scientists to ensure data quality and availability for business intelligence and analytics.
  • Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation in PySpark.
  • Developed a framework for converting existing PowerCenter mappings to PySpark jobs.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions, building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
  • Migrated an on-premise database structure to a Redshift data warehouse.
  • Created Databricks notebooks using SQL and Python, and automated them using jobs.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Used Amazon S3 to persist transformed Spark DataFrames, with S3 serving as a data lake for pipelines running on Spark and MapReduce.
  • Loaded datasets into Hive and Cassandra from source CSV files using Spark/PySpark.
  • Developed Kafka consumer APIs in Scala to consume data from Kafka topics; consumed XML messages from Kafka and processed them with Spark Streaming to capture UI updates.
  • Developed preprocessing jobs using Spark DataFrames to flatten JSON documents into flat files.
  • Loaded D-Stream data into Spark RDDs and performed in-memory computation to generate output responses.
  • Wrote live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
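The Spark Streaming work above follows the micro-batch pattern: incoming events are bucketed into fixed time windows and aggregated per window. Below is a pure-Python sketch of that shape (no Spark dependency), with window size and event fields as illustrative assumptions, not the original job.

```python
# Pure-Python sketch of fixed-window micro-batch aggregation, the shape a
# Spark Streaming or Dataflow job produces at scale.
from collections import defaultdict

def window_counts(events, window_secs=60):
    """events: iterable of (epoch_seconds, key) pairs.
    Returns {window_start: {key: count}} over fixed, non-overlapping windows."""
    out = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % window_secs)  # align to window boundary
        out[window_start][key] += 1
    return {w: dict(counts) for w, counts in out.items()}

events = [(100, "click"), (130, "view"), (185, "click")]
# window_counts(events) == {60: {"click": 1}, 120: {"view": 1}, 180: {"click": 1}}
```

In the real pipelines, the same grouping is expressed with Spark's window operations over a Kafka D-Stream rather than an in-memory dict.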

IT Service Management

Wipro Ltd/United Health Group
Jun, 2018 - Nov, 2019 (1 yr 5 months)
  • Evaluated access management systems to demonstrate continuous improvement of provisioning processes and operations.
  • Handled ID creation/deletion and raised requests for other application access.
  • Interfaced with business and IT service continuity management on the dependencies of business units and their processes with the supporting IT services in the business service catalogue.
  • Documented RCAs and workarounds for issues in application management.
  • Produced and maintained a service catalogue and its contents in conjunction with the service portfolio.
  • Added problem tasks and assigned them to the appropriate teams.
  • Worked directly with end users to resolve support issues within ServiceNow.
  • Created multiple dashboards and presented them in Weekly Business Reviews (WBRs) and Monthly Business Reviews (MBRs).
  • Generated documentation for known errors, issues, and solutions; handled knowledge management, service level management, and request fulfillment.
  • Engaged with the incident response team, documenting event details, creating incident response letters, obtaining approvals, and distributing final client-facing documents.
  • Ensured the incident management process was followed and that incident and problem records accurately reflected actions taken to restore service.
  • Ensured team performance met defined targets and KPIs.
  • Managed bridge calls with support teams, on-call application teams, and management.
  • Monitored capacity breaches/incidents daily and reported them to application owners.
  • Reviewed utilization trends and helped forecast future demand based on present utilization patterns.

Education

  • B.Tech

    JNTU Hyderabad (2012)