Abhishek Srivastava

Data Engineer with 2.5+ years of experience in the tech industry. Proven ability to use cloud computing platforms to store, process, and analyze data. Expertise in data migration and data warehousing. Strong problem-solving and analytical skills.

Role
Data Engineer
Years of Experience
3 years

Skillsets

Python
SQL
NumPy
pandas
ETL
AWS EC2
AWS Glue
AWS Redshift
Excel
Machine Learning
Snowflake

Professional Summary

3 Years

Apr, 2022 - Present (4 yr 3 months)
Data Engineer
Lagozon Technologies
Jun, 2021 - Mar, 2022 (9 months)
Data Analyst
Intrics Solution

Applications & Tools Known

Snowflake
SQL Server
AWS CloudWatch
AWS Glue
AWS EC2
Excel
S3
SSMS
MySQL

Work History

3 Years

Data Engineer

Lagozon Technologies

Apr, 2022 - Present (4 yr 3 months)

Successfully migrated data from SQL Server to Snowflake using a variety of techniques. Reduced data loss in reports by 80% by developing task alerts in AWS CloudWatch for multiple scenarios. Transformed JSON data into a structured database format using data flattening techniques. Designed and deployed optimized stored procedures at the pipeline level to streamline data processing. Managed Snowflake tasks, streams, and Snowpipes for seamless data ingestion and processing. Implemented robust alerting and monitoring to keep data ingestion running smoothly, resulting in significant time and cost savings. Executed Python-based AWS Glue jobs to automate and trigger data ingestion. Developed complex queries to extract essential business data, ensuring accurate and comprehensive information for business reporting.

Data Analyst

Intrics Solution

Jun, 2021 - Mar, 2022 (9 months)

Implemented code solutions to answer analytical questions and to test and assess new methods. Performed data validation using SQL, Excel, and MongoDB. Extracted and interpreted data patterns and translated the findings into actionable outcomes.

Achievements

Successfully migrated data from SQL Server to Snowflake using a variety of techniques.
Reduced data loss by 80% in reports by developing task alerts using AWS CloudWatch for multiple scenarios.
Accomplished data transformation from JSON to a structured database format using advanced data flattening techniques.
Designed and deployed optimized stored procedures at the pipeline level, driving streamlined and efficient data processing.
Proficiently managed tasks, streams, and Snowpipes to achieve seamless data ingestion and processing in Snowflake.
Implemented robust alert and monitoring systems ensuring seamless data ingestion, resulting in significant time and cost savings.
Expertly executed Python-based Glue jobs to automate and trigger data ingestion processes.
Developed intricate queries to extract essential business data, ensuring accurate and comprehensive information for impactful business reporting.

Major Projects

3 Projects

Loyalty Audit Report Generation

Dec, 2023 - Present (2 yr 7 months)

Analyzed business needs and the data landscape to define a data strategy. Developed AWS Glue ETL jobs in Python to extract data from SQL Server (via SSMS) and MySQL databases and load it into Snowflake. Performed data cleansing and deduplication using SQL, and developed a multi-level SQL reporting backend for transaction-, item-, voucher-, and customer-level data.
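The deduplication step can be illustrated with a small plain-Python sketch. It mirrors the common SQL pattern of keeping one row per business key (for example, filtering on `ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY updated_at DESC) = 1`). The field names `txn_id`, `amount`, and `updated_at` are hypothetical and are not taken from the project.

```python
from typing import Iterable


def dedupe_latest(rows: Iterable[dict], key: str, order_by: str) -> list[dict]:
    """Keep only the newest row per key, like ROW_NUMBER() ... = 1 in SQL."""
    latest: dict = {}
    for row in rows:
        k = row[key]
        # Replace the kept row only when this one is strictly newer;
        # exact duplicates keep the first copy seen.
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    # Return rows sorted by key so the output order is repeatable.
    return [latest[k] for k in sorted(latest)]


if __name__ == "__main__":
    raw = [
        {"txn_id": 1, "amount": 100, "updated_at": "2023-12-01T10:00"},
        {"txn_id": 1, "amount": 120, "updated_at": "2023-12-02T09:00"},  # newer copy
        {"txn_id": 2, "amount": 50, "updated_at": "2023-12-01T11:00"},
        {"txn_id": 2, "amount": 50, "updated_at": "2023-12-01T11:00"},  # exact duplicate
    ]
    for row in dedupe_latest(raw, key="txn_id", order_by="updated_at"):
        print(row)
```

The timestamps are compared as ISO-8601 strings, which only sorts correctly when every value uses the same format; real pipelines would usually compare proper timestamp values.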

Online Retail Data Ingestion

Mar, 2023 - Jun, 2023 (3 months)

Led the data migration planning process, identifying source systems and defining the migration strategy. Developed an AWS Glue ETL job in Python to ingest data from S3 into Snowflake and archive processed data in S3 Glacier. Implemented automated data pipelines in Snowflake using SQL stored procedures and tasks to streamline data movement and transformation.
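One common way to archive processed files to S3 Glacier is an S3 lifecycle rule, rather than copying objects by hand. The sketch below only builds the lifecycle configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The `processed/` prefix, the one-day delay, and the rule ID are assumptions, and the profile does not say which archiving method the project actually used.

```python
import json


def glacier_archive_rule(prefix: str, days: int = 1, rule_id: str = "archive-processed") -> dict:
    """Build an S3 lifecycle configuration that moves objects under `prefix` to Glacier."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this prefix are archived
                "Status": "Enabled",
                "Transitions": [
                    # Move matching objects to the Glacier storage class after `days` days.
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = glacier_archive_rule("processed/")
    print(json.dumps(config, indent=2))
    # Applying it requires AWS credentials; the bucket name is hypothetical:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-retail-bucket", LifecycleConfiguration=config)
```

Keeping the rule as data makes it easy to review and version alongside the Glue job, while S3 itself performs the archiving on its own schedule.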

Online Shoppers Purchasing Intention Business

Nov, 2020 - Feb, 2021 (3 months)

Predicted the purchasing intention of shoppers on an e-commerce site. Explored revenue-generating factors and performed statistical analysis, feature selection, model building, and inference using pandas, NumPy, Seaborn, and classification models.

Education

Post Graduate Programme In Data Science And Engineering
Great Lakes Institute of Management (2021)
Bachelor of Technology
IEC College of Engineering And Technology (2017)

Certifications

AWS Certified Cloud Practitioner (10/2022 - 10/2025)

AI-interview Questions & Answers

Hi, I'm Abhishek. I have 3 years of experience in data science and data engineering, with a strong background in Python and SQL. I have worked on 38 projects on AWS, using services including AWS Glue and Redshift. I also have relevant cloud experience with Azure, which I have used for multiple in-house projects, and I have used Snowflake extensively with AWS for the last 2 years. Before that, I worked at Intrics Solution Private Limited as an associate data analyst, working with data using MongoDB, SQL, Python, and Excel. From 2020 to 2021, I completed a PGP in data science and engineering, where I learned data engineering techniques, worked on data science projects, and used pandas, SQL, Excel, and other tools to model business data and deliver business value.

For transaction control, I would monitor the development and test environments to make sure that transactions in concurrent data processes are applied one by one, and I would test every transaction through automated testing. I would also use streams on the tables. A stream records every DML change made to a table (inserts, updates, and deletes), so each change to the Snowflake table is captured and can be used whenever the data changes. We use streams for change data capture (CDC): they show which rows have been inserted, updated, or deleted since the stream was last consumed, which helps us maintain data consistency.
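To make the stream behavior concrete, here is a toy in-memory model in plain Python; it is not the Snowflake API. It follows the documented semantics of a standard stream: the stream holds an offset, reports the net change between the table as of that offset and the table now, represents an update as a DELETE/INSERT pair flagged with `METADATA$ISUPDATE`, and advances its offset only when the changes are consumed. The class, method, and column names are hypothetical.

```python
from copy import deepcopy


class ToyStream:
    """Toy model of a Snowflake standard stream over a dict-backed table."""

    def __init__(self, table: dict):
        self.table = table                # primary key -> row (the live table)
        self._snapshot = deepcopy(table)  # table state at the current offset

    def changes(self) -> list[dict]:
        """Net changes since the offset, shaped like stream rows."""
        out = []
        for key, old in self._snapshot.items():
            new = self.table.get(key)
            if new is None:
                out.append({**old, "METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False})
            elif new != old:
                # An update shows up as a DELETE of the old image plus an INSERT of the new one.
                out.append({**old, "METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True})
                out.append({**new, "METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True})
        for key, new in self.table.items():
            if key not in self._snapshot:
                # A row inserted (and possibly updated) since the offset is a single INSERT.
                out.append({**new, "METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False})
        return out

    def consume(self) -> list[dict]:
        """Return the changes and advance the offset, as a DML statement reading the stream would."""
        rows = self.changes()
        self._snapshot = deepcopy(self.table)
        return rows


if __name__ == "__main__":
    orders = {1: {"id": 1, "qty": 2}}
    stream = ToyStream(orders)
    orders[1] = {"id": 1, "qty": 5}  # update
    orders[2] = {"id": 2, "qty": 1}  # insert
    for row in stream.consume():
        print(row)
    print(stream.changes())  # empty: the offset has moved past these changes
```

Because the model compares two table states instead of logging every statement, a row that is inserted and then deleted before the stream is consumed produces no output, which matches how a standard stream reports net changes.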

I would use three environments: development, testing, and production. Once the ETL pipeline is developed, we test it with automated tests that catch errors in pipelines handling high volumes of data, and then we move the pipeline to production. In production, we follow a layered architecture: raw data is first inserted into the staging layer, transformed in Snowflake, and then moved to the ODS layer. This is the architecture we use for high-volume data processing so that no data is lost and no redundant data is inserted.
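The staging-to-ODS step is what keeps data from being lost or duplicated. In Snowflake this is usually a `MERGE` from the staging table into the ODS table. The plain-Python sketch below imitates that upsert so the idempotency is easy to see; the table and column names are hypothetical, and the whitespace trimming stands in for whatever transformation the real pipeline applies.

```python
# Roughly equivalent Snowflake SQL (illustrative names):
#   MERGE INTO ods.orders t USING stg.orders s ON t.order_id = s.order_id
#   WHEN MATCHED THEN UPDATE SET t.status = s.status
#   WHEN NOT MATCHED THEN INSERT (order_id, status) VALUES (s.order_id, s.status);


def merge_into_ods(ods: dict, staged_rows: list[dict], key: str) -> dict:
    """Upsert staged rows into the ODS table (a dict keyed by `key`), like SQL MERGE.

    Matched keys are updated and unmatched keys are inserted, so re-running the
    same batch leaves the ODS table unchanged and adds no duplicate rows.
    """
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in staged_rows:
        # Placeholder transformation: trim whitespace in string columns.
        clean = {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}
        existing = ods.get(clean[key])
        if existing is None:
            ods[clean[key]] = clean
            counts["inserted"] += 1
        elif existing != clean:
            ods[clean[key]] = clean
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
    return counts


if __name__ == "__main__":
    ods_orders: dict = {}
    batch = [{"order_id": 1, "status": " new "}, {"order_id": 2, "status": "shipped"}]
    print(merge_into_ods(ods_orders, batch, "order_id"))  # first load
    print(merge_into_ods(ods_orders, batch, "order_id"))  # retry of the same batch
```

If one staged batch can contain several versions of the same key, it is worth deduplicating it first; Snowflake's `MERGE` can raise an error when multiple source rows match the same target row for an update.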

We can implement idempotency in Snowflake data ingestion by using Snowpipe. Once new data lands in the source location, Snowpipe automatically loads it into the target table, and because Snowpipe keeps load metadata for the files it has already loaded, the same file is not loaded twice.
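Snowpipe's idempotency comes from its file load metadata: it remembers which staged files a pipe has already loaded and skips them if they are seen again. The toy Python model below imitates that behavior with a set of loaded file names. It is an illustration only; the class name and file names are hypothetical.

```python
class ToyPipe:
    """Toy model of Snowpipe-style ingestion that loads each staged file at most once."""

    def __init__(self):
        self.loaded_files: set[str] = set()  # stands in for the pipe's load metadata
        self.target: list[dict] = []         # stands in for the target table

    def ingest(self, file_name: str, rows: list[dict]) -> bool:
        """Load `rows` from `file_name` unless that file was already loaded."""
        if file_name in self.loaded_files:
            return False  # duplicate notification or retry: skip, so no duplicate rows
        self.target.extend(rows)
        self.loaded_files.add(file_name)
        return True


if __name__ == "__main__":
    pipe = ToyPipe()
    print(pipe.ingest("orders_001.json", [{"id": 1}]))  # loaded
    print(pipe.ingest("orders_001.json", [{"id": 1}]))  # skipped on redelivery
    print(len(pipe.target))
```

Tracking files rather than rows is what makes retries safe at the ingestion layer; row-level duplicates that arrive in different files still need handling downstream, for example with a MERGE into the ODS layer.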

For CDC in Azure Data Factory, we can use a column that tracks modifications: whenever a row is modified, that column is updated with the modification date and time, so each run can capture only the rows changed since the previous run and load the data incrementally.
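The last-modified-column approach is usually called a high-watermark load: each run copies only the rows modified after the watermark saved by the previous run, then stores the new maximum. The sketch below shows that logic in plain Python. In Azure Data Factory the same idea is typically built from Lookup activities that read the old and new watermark, a Copy activity with a filtered source query, and a Stored Procedure activity that saves the new watermark. The column name `last_modified` is a hypothetical example.

```python
from datetime import datetime


def incremental_extract(rows: list[dict], watermark: datetime, column: str = "last_modified"):
    """Return the rows changed after `watermark`, plus the watermark for the next run."""
    changed = [r for r in rows if r[column] > watermark]
    # Move the watermark forward only when something new was actually read.
    new_watermark = max((r[column] for r in changed), default=watermark)
    return changed, new_watermark


if __name__ == "__main__":
    source = [
        {"id": 1, "last_modified": datetime(2024, 1, 1, 9)},
        {"id": 2, "last_modified": datetime(2024, 1, 2, 9)},
        {"id": 3, "last_modified": datetime(2024, 1, 3, 9)},
    ]
    batch, wm = incremental_extract(source, datetime(2024, 1, 1, 9))
    print([r["id"] for r in batch], wm)
```

The comparison is strict, so the boundary row from the previous run is not reloaded; the approach relies on the source reliably updating the modification column on every change.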

For semi-structured data such as JSON or XML in an ELT process, we can load the data directly into a Snowflake table by defining a column with the VARIANT data type. From there, we can run a stored procedure that extracts fields from the JSON and loads them into another table incrementally. To do that, we create a standard stream on the staging table: every time a row is inserted or updated in the staging table, the stream captures it, and the procedure processes only that stream data. This way, only the incremental data is loaded into the final table, which improves the performance of ETL processes on semi-structured data in Snowflake. With the LATERAL FLATTEN technique, we can also flatten nested data smoothly in the Snowflake environment.
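The VARIANT-plus-FLATTEN pattern can be sketched as follows. In Snowflake, a query such as `SELECT v:order_id::NUMBER AS order_id, f.value:sku::STRING AS sku FROM raw_orders, LATERAL FLATTEN(input => v:items) f` returns one row per element of the `items` array. The Python function below reproduces that row explosion on a parsed JSON document so the output shape is easy to inspect. The table `raw_orders`, the VARIANT column `v`, and the fields `order_id`, `customer`, `items`, `sku`, and `qty` are hypothetical.

```python
import json


def flatten_items(doc: str, array_field: str = "items") -> list[dict]:
    """Explode one JSON document into one row per element of `array_field`.

    Parent-level scalar fields are repeated on every row, as when parent columns
    are selected alongside LATERAL FLATTEN. Like FLATTEN without OUTER => TRUE,
    a document with an empty or missing array yields no rows.
    """
    record = json.loads(doc)
    parent = {k: v for k, v in record.items() if not isinstance(v, (list, dict))}
    rows = []
    for index, element in enumerate(record.get(array_field, [])):
        # FLATTEN also exposes INDEX and VALUE columns; the index is kept here.
        rows.append({**parent, "index": index, **element})
    return rows


if __name__ == "__main__":
    raw = '{"order_id": 7, "customer": "C1", "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}'
    for row in flatten_items(raw):
        print(row)
```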

It's a matter that affects the model.

Our CI/CD pipeline has two stages: build and deploy. In the build stage, we develop the pipeline and test it; in the deploy stage, we deploy the tested pipeline to production. Because the pipeline is built and tested before it is deployed, the release to production goes smoothly.
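The build-then-deploy flow boils down to one rule: nothing reaches production unless the build and the tests pass. A minimal Python sketch of that gating is below. The stage functions are hypothetical placeholders for whatever the real CI/CD tool runs (for example, deploying SQL scripts to a Snowflake environment).

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed, so a failed test stage
    means the deploy stage never runs.
    """
    completed = []
    for name, step in stages:
        if not step():
            print(f"stage '{name}' failed; stopping before later stages")
            break
        completed.append(name)
    return completed


def build_step() -> bool:
    return True  # placeholder: package the pipeline code and SQL scripts


def run_tests() -> bool:
    return True  # placeholder: run automated tests against a test environment


def deploy_step() -> bool:
    return True  # placeholder: release the tested pipeline to production


if __name__ == "__main__":
    print(run_pipeline([("build", build_step), ("test", run_tests), ("deploy", deploy_step)]))
```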

For machine learning pipelines, we need data that is accurate and available as soon as it is updated. For our machine learning pipeline, we built a Snowflake pipeline from the source system. Once data is uploaded or updated in the source, it flows into the staging layer. A task then runs a stored procedure that loads the new records into the ODS layer, so the ODS layer is updated as soon as the source is. To keep the process fast and optimized, we use streams to pick up only the incremental data. On the ODS layer table, we built a machine learning model that is updated as soon as the source data changes, so new data becomes available to the model right away. In the task definition, we added a WHEN clause with SYSTEM$STREAM_HAS_DATA and the stream name, so the task loads data into the final table only when the stream has data. This keeps the pipeline smooth for our machine learning data.
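The task-plus-stream pattern can be written out as Snowflake DDL. The Python helper below only generates the SQL text and does not connect to Snowflake; the task, warehouse, schedule, stream, and procedure names are hypothetical. `SYSTEM$STREAM_HAS_DATA` lets the scheduled task skip runs while the stream is empty, and a newly created task stays suspended until it is resumed.

```python
def stream_task_ddl(task: str, warehouse: str, schedule: str, stream: str, procedure: str) -> str:
    """Return Snowflake SQL for a task that calls `procedure` only when `stream` has data."""
    return (
        f"CREATE OR REPLACE TASK {task}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = '{schedule}'\n"
        f"  WHEN SYSTEM$STREAM_HAS_DATA('{stream}')\n"
        "AS\n"
        f"  CALL {procedure}();\n"
        "\n"
        "-- Tasks are created suspended; resume the task to start the schedule.\n"
        f"ALTER TASK {task} RESUME;"
    )


if __name__ == "__main__":
    print(stream_task_ddl(
        task="load_ods_orders_task",
        warehouse="etl_wh",
        schedule="5 MINUTE",
        stream="stg_orders_stream",
        procedure="load_ods_orders",
    ))
```

For the stream's offset to advance, the called procedure has to read the stream inside a DML statement (for example, an `INSERT ... SELECT` or a `MERGE`); a plain `SELECT` from the stream leaves the changes in place.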

Let's design a CI/CD pipeline with three stages: development, testing, and production. In the development stage, we build the pipeline for Snowflake and use automated testing to catch errors. Then we move it to the testing stage, where data services should see minimal interruption. So, for that, we can
