Vetted Talent

Harsha vardhan Reddy

Dynamic and experienced professional with a solid background in Snowflake, AWS S3, and dbt, seeking a challenging role where I can leverage my expertise to drive data-driven solutions and contribute to the organization's success.
  • Role

    Software Engineer

  • Years of Experience

    5.2 years

Skillsets

  • Git
  • Unix
  • Snowflake
  • dbt
  • Ab Initio
  • AWS S3
  • EDM
  • Jira
  • Oracle
  • PuTTY
  • SQL
  • SQL Developer
  • WinSCP
  • Deer

Vetted For

9 Skills
    Senior Data Engineer With Snowflake (Remote), AI Screening
  • 77%
  • Skills assessed: Azure Synapse, Communication Skills, DevOps, CI/CD, ELT, Snowflake, Snowflake SQL, Azure Data Factory, Data Modelling
  • Score: 69/90

Professional Summary

5.2 Years
  • Jun 2022 - Present (3 yr 11 months)

    Software Engineer

    Tata Consultancy Services
  • Dec 2018 - Jun 2022 (3 yr 6 months)

    Software Engineer

    Cognizant Technology Solutions
  • ETL Developer

    Cognizant Technology Solutions

Applications & Tools Known

  • SQL
  • AWS S3
  • PuTTY
  • Jira
  • MS-Excel

Work History

5.2 Years

Software Engineer

Tata Consultancy Services
Jun 2022 - Present (3 yr 11 months)
    Worked as a Software Engineer. Developed a generic data-copy model from upstream (S3 bucket) to downstream (Snowflake database). Designed, developed, and implemented data warehouse solutions for liquidity analysis. Developed and maintained dbt models using different materialization strategies (table, view, and incremental) to optimize performance and data availability. Created and executed dbt tests to ensure data quality and integrity across the data warehouse. Performed data modelling in Snowflake based on inputs from the source systems. Created sequence objects to generate surrogate keys that establish relationships between tables. Used the COPY INTO command to bulk load data from S3 into Snowflake. Implemented SQL code in dbt and scheduled jobs for incremental and full loads in the DEV, QUA, and PRD environments. Deployed code to the QUA and PRD environments through Git version control. Created views in Snowflake based on business requirements and shared them with the distribution layer for consumers (Power BI). Unloaded data from Snowflake to an S3 stage as CSV files. Performed simple and complex data transformations in Snowflake based on business requirements. Fixed development views in Snowflake as per consumer requests.

Software Engineer

Cognizant Technology Solutions
Dec 2018 - Jun 2022 (3 yr 6 months)
    Worked as a Software Engineer. Implemented the data load process using Snowflake. Used the internal EDM tool to migrate data from Oracle to Snowflake and to load incremental data. Used an S3 bucket as the file source for data imports. Created and used Snowflake databases, schemas, and table structures. Used Snowflake cloning to clone external stage objects. Wrote advanced SQL scripts to transform data. Used the Time Travel, data sharing, and cloning features. Fixed data loading issues in production. Used the Snowflake Web UI for data analysis, and used Snowpipe. Used Time Travel and Fail-safe for data recovery.

ETL Developer

Cognizant Technology Solutions
    Worked as an ETL Developer. Developed psets to perform ETL functions through generic graphs. Enhanced code at the component level as per business requirements. Worked on DML creation, MDH validation, and XFRs. Performed unit testing and data validation against the corresponding test-case scenarios. Delivered on time.

Major Projects

5 Projects

Discover Financial Services LLC

    The Discover Financial Services Maps project is a program to modernize the company's general ledger by deploying Oracle Cloud Financials with Accounting Hub and migrating GL data into Oracle Cloud. The scope includes end-to-end ownership of the transformation of GL data feeds and ARCS, the ability to scale quickly and deliver within the timelines of the GL transformation program, and deployment of Account Reconciliation Cloud Service as part of the enterprise management cloud offering.

American Tire Distributors (EDW-Oracle to Snowflake migration)

    EDW Snowflake migration is a program designed to lift and shift the enterprise data warehouse from an on-premises Oracle database to a Snowflake database in the cloud, because the current EDW hardware is going out of support. The program includes retrofitting Informatica components, Business Objects universes and reports, and Tableau dashboards to make them Snowflake-compatible. A total of 145 TB of data was migrated.

Kaiser Permanente (CDW-Oracle to Snowflake migration)

    A leading health care provider based in California, United States, initiated the development of a Claims Data Warehouse (CDW) program to provide a central, consistent view of finalized claims across all regions for reporting, analytical, and operational functions, helping end users generate reports from a central repository.

Education

  • B. Tech in Electronics & Communication Engineering

    Jawaharlal Nehru Technological University, Anantapur

Certifications

  • Data warehousing

  • Data lake warehouse

  • Data engineering workshop certifications from Snowflake

  • dbt Fundamentals certification, by dbt Labs

AI-interview Questions & Answers

Hi, this is Harsha. I have 5.2 years of total IT experience. Early in my career I worked as an ETL developer. Then, when a migration project was assigned to our team, we started training in Snowflake. I have 3 years of Snowflake experience. I worked on two migration projects, one for American Tire Distributors and one in the healthcare industry, and on a third project for Discover Financial Services. For the past year I've been working with dbt, an ELT tool in which we create models and perform transformations. In Snowflake, we normally follow best practices and use features such as Time Travel, data sharing, table creation, materialized views, data loading and unloading, and zero-copy cloning. I've also worked on creating streams for change data capture and tasks for scheduling. In dbt, I've created models, and seeds for loading data: a seed loads a small amount of data from a CSV file. I've implemented Type 2 slowly changing dimensions (SCD Type 2) using snapshots, and I've used the different materializations: table, view, and incremental. I've also used dbt's test features, including unique, not-null, column, and schema tests. I've created dbt documentation, we version our dbt project in Git, and we deploy code through dbt itself. That summarizes my experience.
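
The introduction mentions implementing Type 2 slowly changing dimensions with dbt snapshots. The row-versioning logic a snapshot applies can be sketched in plain Python as follows. This is a hypothetical in-memory illustration, not dbt's implementation: the `Version` record and `apply_snapshot` function are invented for the example, and `valid_from`/`valid_to` stand in for the `dbt_valid_from`/`dbt_valid_to` columns a dbt snapshot adds.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    key: int                        # business key of the source row
    attrs: Tuple[str, ...]          # tracked attribute values
    valid_from: str                 # date this version became current
    valid_to: Optional[str] = None  # None marks the current version


def apply_snapshot(history: List[Version], source: Dict[int, Tuple[str, ...]],
                   as_of: str) -> List[Version]:
    """Apply one SCD Type 2 snapshot run: close changed rows, append new versions."""
    result = [v for v in history if v.valid_to is not None]  # closed history is kept as-is
    current = {v.key: v for v in history if v.valid_to is None}
    for key, version in current.items():
        new_attrs = source.get(key)
        if new_attrs is None or new_attrs == version.attrs:
            # Unchanged, or missing from the source: this sketch leaves the row open.
            result.append(version)
        else:
            result.append(replace(version, valid_to=as_of))  # close the old version
            result.append(Version(key, new_attrs, as_of))    # open the new version
    for key, attrs in source.items():
        if key not in current:
            result.append(Version(key, attrs, as_of))        # brand-new key
    return result


history = apply_snapshot([], {1: ("Austin",), 2: ("Denver",)}, "2024-01-01")
history = apply_snapshot(history, {1: ("Dallas",), 2: ("Denver",)}, "2024-02-01")
for row in history:
    print(row)
```

After the second run, customer 1 has a closed "Austin" version and a current "Dallas" version, while customer 2 keeps its single open version.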

What are the fundamental differences between using dbt and traditional ETL tools in a Snowflake environment? Compared with traditional ETL tools, dbt (data build tool) introduces several fundamental differences in approach, workflow, and capabilities in a Snowflake environment. The key differences are as follows.

SQL-centric transformation versus GUI-based ETL: dbt focuses on SQL-based transformation, in which analysts and data engineers use SQL to transform and model data directly in the data warehouse. Transformations are defined as SQL-based models and executed as SQL commands, and the workflow involves writing and managing SQL code in a version-controlled environment such as Git, which provides transparency and control over the transformation logic. Traditional ETL tools often use a graphical user interface to design transformation workflows visually, including designing data flows, scheduling jobs, and managing transformations within the tool's own environment.

In-database processing versus external processing: dbt uses Snowflake's processing capabilities, executing SQL transformations directly within the data warehouse. Because transformations run close to the data, this reduces data movement and processing time and takes advantage of Snowflake's scalability and performance. Traditional tools often extract data from the source system, transform it externally, and then load it into the target warehouse or lakehouse. These external transformations may require additional infrastructure and resources outside the data warehouse, adding complexity and potential performance overhead.

Quality control, code reusability, and version control: dbt promotes reusable SQL code kept under version control. Traditional ETL tools offer some versioning support and code reuse through components and templates, but the granularity of control over code, versioning, and changes to transformations may vary.

DevOps and DataOps practices: dbt aligns with DevOps and DataOps practices, integrating with tools such as Git and enabling continuous integration and continuous deployment (CI/CD). Traditional ETL tools may require separate practices and tools for version control, CI/CD, and testing, and integrating them into broader DevOps and DataOps workflows can require additional effort and customization.

What methods can you use to optimize Snowflake storage cost while ensuring data availability for querying? Optimizing Snowflake storage cost while keeping data available for querying involves several techniques. First, data compression: Snowflake automatically compresses data before storing it, which reduces the storage footprint, and it applies compression techniques such as run-length encoding and dictionary encoding. Second, columnar storage: Snowflake stores data column-wise rather than row-wise, which improves compression ratios and reduces storage requirements, especially for analytic workloads. Third, data retention policies, which we implement through data archiving and data purging. Fourth, storage and query optimization: on large, frequently queried tables we rely on micro-partition pruning (data skipping) and clustering so that queries scan less data, and we optimize SQL by using selective predicates and avoiding unnecessary joins and aggregations. Snowflake's Query Profile helps identify inefficient queries. We also use materialized views to precompute and store aggregations or frequently used datasets so they do not need to be recomputed. Finally, cost monitoring and governance: we monitor usage through Snowflake's billing and usage reports and follow governance policies and best practices for data storage, access control, usage monitoring, and data security, together with data lifecycle management such as data tiering and automated workflows.
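
Run-length and dictionary encoding, the two compression techniques the answer names, can be illustrated on a single column of values. This is a generic Python sketch of the encodings themselves; it is not a model of Snowflake's internal micro-partition format, and the sample column is hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Tuple


def run_length_encode(values: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def dictionary_encode(values: List[Hashable]) -> Tuple[list, List[int]]:
    """Replace each value with an integer code pointing into a list of distinct values."""
    dictionary = {}
    codes = []
    for value in values:
        # setdefault assigns the next free code the first time a value is seen.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return list(dictionary), codes


column = ["US", "US", "US", "CA", "CA", "US"]
print(run_length_encode(column))          # [('US', 3), ('CA', 2), ('US', 1)]
print(dictionary_encode(column))          # (['US', 'CA'], [0, 0, 0, 1, 1, 0])
print(run_length_encode(sorted(column)))  # [('CA', 2), ('US', 4)]
```

Sorting the column first turns three runs into two, which is why run-length encoding benefits from sorted or clustered data.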

What are the critical considerations when converting an ETL process to ELT within the Snowflake environment? Converting an ETL process to ELT within Snowflake involves several critical considerations that must be addressed to ensure a successful migration and optimal performance. First, consider data volume and complexity, including storage and compute resources. For storage, evaluate the requirements for loading raw data into Snowflake, taking into account data retention policies, compression, and data archiving strategies. For compute, determine the resources needed to execute transformations within Snowflake, considering virtual warehouse options and concurrency. Second, define the data loading and ingestion strategy: we used the COPY INTO command for high-performance bulk loading, and Snowpipe for continuous, near-real-time ingestion. Third, plan the transformation strategy, covering SQL-based transformations and parallel processing. We based our transformations on CTEs and SQL, used temporary tables within stored procedures, and applied various functions, and we relied on Snowflake's parallel processing capabilities to execute transformations concurrently across workflows. Finally, address data governance and security, performance monitoring and optimization, resource scaling, and change management and testing. These are the points to consider when converting an ETL process to ELT in Snowflake.
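
The load-then-transform order at the heart of ELT can be sketched with Python's built-in `sqlite3` module standing in for Snowflake: raw records land unchanged in a staging table, and the business logic then runs as SQL with a CTE inside the database. The table names, columns, and sample rows are hypothetical; in Snowflake the loading step would be `COPY INTO` from a stage rather than `executemany`.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in Snowflake this would be staged files loaded with COPY INTO.
RAW_CSV = """order_id,customer_id,amount,status
1,101,120.00,COMPLETE
2,102,80.50,CANCELLED
3,101,45.25,COMPLETE
"""

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data unchanged (all text, no business logic yet).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer_id TEXT, amount TEXT, status TEXT)")
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
conn.executemany(
    "INSERT INTO raw_orders VALUES (:order_id, :customer_id, :amount, :status)", rows
)

# Transform: cast, filter, and aggregate inside the database using a CTE.
conn.execute("""
CREATE TABLE customer_revenue AS
WITH completed AS (
    SELECT CAST(customer_id AS INTEGER) AS customer_id,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE status = 'COMPLETE'
)
SELECT customer_id, SUM(amount) AS revenue
FROM completed
GROUP BY customer_id
""")
print(conn.execute("SELECT * FROM customer_revenue").fetchall())
```

Keeping the raw table untouched means a transformation bug can be fixed and rerun without reloading the source.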

Schema evolution management in Snowflake, when source data structures change frequently, requires a flexible, iterative approach to handling schema changes. First, we apply schema versioning: we put the database schema under version control using schema migration scripts and store the schema definition changes in a version control repository. A consistent naming convention, such as timestamp-based or semantic versioning, documents each schema change and its impact on downstream processes. Second, we automate schema management with scripting and continuous integration and deployment, and we follow schema evolution patterns such as forward compatibility. Third, we have a data transformation and migration process that includes data mapping and incremental data loading, along with data profiling, schema testing, and data quality validation. Fourth, we set up monitoring and alerting for schema changes and for performance. Finally, we maintain documentation and communicate clearly with the downstream teams affected by the changes.
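
Timestamp-named migration scripts with a version table, as described above, can be sketched in Python using `sqlite3` in place of Snowflake. The migration names, the `customers` table, and the `schema_version` tracking table are hypothetical; the point is that each script runs exactly once and in name order, so rerunning a deployment is safe.

```python
import sqlite3
from typing import Dict, List

# Hypothetical migrations; the timestamp prefix makes name order match creation order.
MIGRATIONS: Dict[str, str] = {
    "20240101_0900_create_customers":
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "20240215_1400_add_email":
        "ALTER TABLE customers ADD COLUMN email TEXT",
}


def migrate(conn: sqlite3.Connection) -> List[str]:
    """Apply migrations not yet recorded in schema_version; return the ones applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already deployed to this environment
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # ['20240101_0900_create_customers', '20240215_1400_add_email']
print(migrate(conn))  # []
print([col[1] for col in conn.execute("PRAGMA table_info(customers)")])  # ['id', 'name', 'email']
```

The second call applies nothing because both versions are already recorded. Adding a nullable column is an additive change, so existing queries against `customers` keep working.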

When implementing a CI/CD pipeline for data processing, how would you ensure that data integrity is not compromised during deployment? Ensuring data integrity during deployment of a CI/CD pipeline for data processing involves robust testing, validation, and rollback mechanisms. We achieve this with automated testing, including unit, integration, and regression tests. We also perform data validation, data profiling, and data quality checks to ensure the accuracy and consistency of the data, and we validate the schema to prevent inconsistencies. For rollback, we use automated deployments with rollback scripts, and version control in Git lets us track changes and revert to a previous version if needed. We also isolate data by segregating it and masking sensitive information. Finally, deployment monitoring and data quality monitoring with alerting ensure that any issues are detected and addressed promptly. These measures are an integral part of a CI/CD pipeline for data processing.
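
The data quality checks mentioned above can be run as a deployment gate. The sketch below uses Python's `sqlite3` and mirrors the logic of dbt's built-in `not_null` and `unique` tests, which pass when their query returns no failing rows; the function names and the `orders_staging` table are hypothetical. Table and column names are interpolated into SQL, so only trusted, hard-coded identifiers should be passed.

```python
import sqlite3
from typing import Dict


def count_nulls(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Failing rows for a not-null check (zero means the check passes)."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def count_duplicates(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Values that occur more than once (zero means the check passes)."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT {column} FROM {table}
            WHERE {column} IS NOT NULL
            GROUP BY {column}
            HAVING COUNT(*) > 1
        )"""
    return conn.execute(sql).fetchone()[0]


def validate_key(conn: sqlite3.Connection, table: str, column: str) -> Dict[str, int]:
    """Return only the failing checks; an empty dict means deployment may proceed."""
    results = {
        "not_null": count_nulls(conn, table, column),
        "unique": count_duplicates(conn, table, column),
    }
    return {check: n for check, n in results.items() if n}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_staging (order_id INTEGER)")
conn.executemany("INSERT INTO orders_staging VALUES (?)", [(1,), (2,), (2,), (None,)])
failures = validate_key(conn, "orders_staging", "order_id")
if failures:
    print("Blocking deployment:", failures)  # Blocking deployment: {'not_null': 1, 'unique': 1}
```

In a pipeline, a non-empty result would stop the deploy step before bad data reaches production tables.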

In the following dbt model code snippet, we expect to create a transformed table with calculated revenue. However, the transformation produces incorrect results. What is wrong with the logic, and how can it be corrected? There are two problems. First, the order ID and total amount column names contain spaces, which causes syntax errors; column names with spaces must be enclosed in double quotes or brackets, depending on the SQL dialect. Second, the revenue calculation is incorrect: revenue should be calculated by multiplying the total amount by 0.1. The fix is to enclose the order ID and total amount column names in double quotes and to correct the revenue calculation so that it multiplies the total amount by 0.1.
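
Because the original model snippet is not reproduced here, the fix can only be illustrated with a hypothetical table. The sketch uses Python's `sqlite3`, which, like Snowflake, accepts double-quoted identifiers (in Snowflake a double-quoted identifier is also case-sensitive). The `orders` table and sample values are invented; the 0.1 factor is the one given in the answer.

```python
import sqlite3

# Hypothetical source table whose column names contain spaces.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders ("order id" INTEGER, "total amount" REAL)')
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 200.0), (2, 50.0)])

# Double quotes let the query reference the columns with spaces;
# aliases give the output clean, space-free names.
query = '''
SELECT "order id"           AS order_id,
       "total amount"       AS total_amount,
       "total amount" * 0.1 AS revenue
FROM orders
ORDER BY order_id
'''
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Aliasing to space-free names in the model also spares every downstream query from repeating the quotes.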

The SQL query in Snowflake is intended to return the number of unique customer IDs from the sales table. However, it is not executing correctly. The query fails because it uses the UNIQUE keyword, which is not a valid keyword in Snowflake SQL for selecting distinct rows; the correct keyword is DISTINCT. The query is written as SELECT UNIQUE, so to fix it we replace UNIQUE with DISTINCT, and the corrected query is SELECT DISTINCT customer_id FROM sales.
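
A quick check with Python's `sqlite3` standing in for Snowflake (the `sales` rows are hypothetical) shows the difference between listing and counting: `SELECT DISTINCT` returns one row per customer ID, while `COUNT(DISTINCT customer_id)` returns the number of unique IDs that the question asks for. `COUNT(DISTINCT ...)` also ignores NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 101), (2, 102), (3, 101), (4, 103), (5, 102)],
)

# One row per unique customer ID.
ids = [r[0] for r in conn.execute(
    "SELECT DISTINCT customer_id FROM sales ORDER BY customer_id")]

# The number of unique customer IDs.
(count,) = conn.execute("SELECT COUNT(DISTINCT customer_id) FROM sales").fetchone()

print(ids, count)  # [101, 102, 103] 3
```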

When designing Snowflake data warehouses, how do you approach balancing cost and performance across different warehouse sizes? Balancing cost and performance across Snowflake warehouse sizes involves weighing workload requirements, concurrency, query complexity, and budget constraints. First, we analyze the workload to understand its characteristics and profile the queries. Next, we right-size the warehouse based on the performance requirements and choose a scaling strategy. We then optimize cost through cost modeling and resource-utilization analysis, and we validate the choice with performance testing such as benchmarking and load testing. After that, we continuously monitor and optimize both performance and cost, using an iterative process with a feedback loop for continuous improvement. These are the steps we follow when designing a Snowflake data warehouse.
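
The trade-off can be made concrete with a small credit calculation. It assumes standard-warehouse rates of 1, 2, 4, 8, and 16 credits per hour for X-Small through X-Large, per-second billing with a 60-second minimum, and treats each job as running on a freshly resumed warehouse. `speedup_per_step` is a hypothetical model of how much faster a job runs on the next size up; real queries rarely scale perfectly, so measured runtimes from benchmarking should replace it.

```python
SIZES = ["XS", "S", "M", "L", "XL"]
# Assumed standard-warehouse rates: each size step doubles credits per hour.
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}


def billed_credits(size: str, runtime_seconds: float) -> float:
    """Credits for one run: per-second billing with a 60-second minimum."""
    billed_seconds = max(runtime_seconds, 60.0)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600.0


def scaled_runtime(xs_runtime_seconds: float, size: str,
                   speedup_per_step: float = 2.0) -> float:
    """Hypothetical model: each size step divides the X-Small runtime by speedup_per_step."""
    return xs_runtime_seconds / speedup_per_step ** SIZES.index(size)


for size in SIZES:
    # A 1-hour X-Small job under perfect scaling costs the same on every size, just faster.
    long_job = billed_credits(size, scaled_runtime(3600, size))
    # A 30-second job hits the 60-second minimum, so larger sizes only add cost.
    short_job = billed_credits(size, scaled_runtime(30, size))
    print(f"{size}: long job {long_job:.3f} credits, short job {short_job:.4f} credits")
```

Under these assumptions, scaling up pays off for long, parallelizable jobs, while short queries are cheapest on a small warehouse.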

How would you automate the deployment of data models in Snowflake using dbt in conjunction with a CI/CD framework? To automate deployment of data models in Snowflake with dbt and a CI/CD pipeline, we first set up and initialize the dbt project, then define the Snowflake connection in the profiles.yml file. Next, we implement the CI/CD workflow: the code is kept under version control in Git, and the pipeline is configured in Jenkins through a pipeline configuration file. The pipeline runs step by step: check out the code, install dependencies, compile the dbt models, run the tests, and deploy to Snowflake. We handle environment-specific configuration through environment variables, parameterize the deployment, and handle errors. Finally, we define a rollback strategy, and we monitor and report on the pipeline through pipeline monitoring and a deployment report. These are the steps for using dbt in conjunction with CI/CD.
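
The pipeline steps above can be sketched as a small Python driver that a CI job (Jenkins, for example) might call after checking out the repository. It assumes the dbt CLI is installed and that `ci` and `prod` are hypothetical target names defined in `profiles.yml`. `dbt deps`, `dbt compile`, and `dbt build` with `--target` are standard dbt commands, and `dbt build` runs seeds, models, snapshots, and tests in dependency order. With `dry_run=True` the script only lists the commands.

```python
import shlex
import subprocess
from typing import List


def pipeline_commands(ci_target: str, prod_target: str) -> List[List[str]]:
    """Ordered dbt CLI steps; both targets must exist as outputs in profiles.yml."""
    return [
        ["dbt", "deps"],                            # install packages the project declares
        ["dbt", "compile", "--target", ci_target],  # fail fast on Jinja/SQL compile errors
        ["dbt", "build", "--target", ci_target],    # build and test everything in the CI target
        ["dbt", "build", "--target", prod_target],  # deploy to production only if CI passed
    ]


def run_pipeline(ci_target: str, prod_target: str, dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run=True, just list) each step; stop at the first failure."""
    executed = []
    for cmd in pipeline_commands(ci_target, prod_target):
        executed.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return executed


for step in run_pipeline("ci", "prod"):
    print(step)
```

Credentials are usually supplied through environment variables read by dbt's `env_var()` function in `profiles.yml`, which keeps secrets out of the repository.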

In what ways can dbt be integrated with Snowflake to streamline and enhance data transformation processes? dbt can be integrated with Snowflake in several ways. One is native integration: dbt has a Snowflake adapter that we connect through, and we can use Snowflake-specific features such as Time Travel, Fail-safe, and automatic clustering to optimize queries and workflows and to ensure data quality. Another is incremental loading: dbt supports incremental models, so we can define incremental loading strategies in Snowflake and use the incremental materialization to update datasets without rebuilding them from scratch. We also get data warehouse automation, version control and collaboration through Git integration, and code reusability in dbt. dbt supports testing and documentation: we run automated tests and generate documentation that helps us find and fix errors and issues. For performance optimization, we use query profiling, materializations, and caching. Finally, for monitoring and alerting, we log and monitor errors and send alerts and notifications through various channels. Through these capabilities, dbt integrates with Snowflake to streamline and enhance the transformation process.
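
The incremental-model pattern mentioned above can be simulated with Python's `sqlite3`. In a real dbt incremental model the filter sits inside an `{% if is_incremental() %}` block and compares against `{{ this }}`; here the `raw_events` and `events_model` tables and their timestamps are hypothetical stand-ins for a Snowflake source and target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (event_id INTEGER, loaded_at TEXT);
CREATE TABLE events_model (event_id INTEGER PRIMARY KEY, loaded_at TEXT);
""")


def incremental_run(conn: sqlite3.Connection) -> int:
    """Insert only source rows newer than the newest row already in the target.

    COALESCE covers the first run, when the target is empty. The PRIMARY KEY makes
    an accidental reload fail loudly instead of silently duplicating rows.
    """
    cur = conn.execute("""
        INSERT INTO events_model (event_id, loaded_at)
        SELECT event_id, loaded_at
        FROM raw_events
        WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), '') FROM events_model)
    """)
    conn.commit()
    return cur.rowcount


conn.executemany("INSERT INTO raw_events VALUES (?, ?)",
                 [(1, "2024-03-01T10:00"), (2, "2024-03-01T11:00")])
first = incremental_run(conn)   # initial run loads both rows
conn.executemany("INSERT INTO raw_events VALUES (?, ?)",
                 [(3, "2024-03-02T09:00"), (4, "2024-03-02T10:00")])
second = incremental_run(conn)  # only the two new rows are processed
print(first, second)  # 2 2
```

ISO-8601 timestamps stored as text compare correctly as strings, which is what lets the `MAX(loaded_at)` watermark work here.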