Vetted Talent

Ponithapunitha girish

Seeking a Data Engineer position at a company where I can apply my skills in data analysis and software development to deliver impactful technology solutions.

Role
Senior Data Engineer
Years of Experience
14 years

Skillsets

MySQL
AWS
Hadoop
Kubernetes
PySpark
Core Java
Big Data
BI tools
Control-m scheduling
Docker
HDFS
Hive
Hue
Jenkins
YARN

Vetted For

19 Skills

Roles & Skills: Big Data Engineer with Streaming Experience (Remote), AI Screening
Results: 50%
Details:
Skills assessed: Spark, CI/CD, Data Architect, Data Visualization, EAI, ETL, Hive, PowerBI, PySpark, Talend, AWS, Hadoop, JavaScript, Embedded Linux, PHP, Problem Solving Attitude, Shell Scripting, SQL, Tableau
Score: 45/90

Professional Summary

14 Years

Jun, 2022 - Present (4 yr)
Senior Data Engineer
Oracle Cerner
Apr, 2016 - Jun, 2022 (6 yr 2 months)
Data Analyst / Data Engineer
TCS, Deutsche Bank
Jan, 2015 - Mar, 2016 (1 yr 2 months)
Software Engineer
TCS, Visa Europe
Apr, 2012 - Dec, 2014 (2 yr 8 months)
Junior Software Engineer
TCS, Credit Suisse
Dec, 2009 - Mar, 2012 (2 yr 3 months)
Software Engineer Trainee
TCS, Deutsche Bank

Applications & Tools Known

Hadoop
HDFS
HIVE
MySQL
Docker
Jenkins
AWS
Hue
Kubernetes

Work History

14 Years

Senior Data Engineer

Oracle Cerner

Jun, 2022 - Present (4 yr)

Worked on healthcare service products for managing registries and viewing patient data and scorecard information. Fixed defects, monitored ETL workflows, and deployed component services on Kubernetes. Supported applications, resolved client-reported issues, and unblocked engineers on technical challenges.

Data Analyst / Data Engineer

TCS, Deutsche Bank

Apr, 2016 - Jun, 2022 (6 yr 2 months)

Developed data processing pipelines and implemented scalable solutions for analyzing large datasets and real-time data. Collaborated with cross-functional teams, contributing to a 30% increase in project delivery efficiency.

Software Engineer

TCS, Visa Europe

Jan, 2015 - Mar, 2016 (1 yr 2 months)

Worked on a payment processing system that clears and settles card transactions for 5,000 European banks.

Junior Software Engineer

TCS, Credit Suisse

Apr, 2012 - Dec, 2014 (2 yr 8 months)

Worked on MYRIAD, a single storage point for Credit Risk and Market Risk data. Validated incoming data and stored it in standardized formats regardless of the source system.

Software Engineer Trainee

TCS, Deutsche Bank

Dec, 2009 - Mar, 2012 (2 yr 3 months)

Worked on online banking trading systems offering financial products and services to corporate and private clients.

Achievements

Experienced Senior Data Engineer with a strong understanding of PySpark, Hadoop, Core Java, and Big Data
Led a developer team, achieving a 30% increase in project delivery efficiency
Received On Spot Awards for improving data accuracy and efficiency
Received client appreciation from the Chief Country Officer in Dublin, Ireland

Major Projects

1 Project

Real-time Data Processing Pipeline

Developed and worked on a real-time data processing pipeline, enhancing data accuracy and efficiency.

Education

Bachelor of Engineering in Telecommunication
BMS Institute of Technology (2009)

Certifications

Oracle Certified Java Programmer (April 2013)

AI-interview Questions & Answers

Hi, I'm Punita. I have 14 years of experience overall, including around 6 to 7 years in data engineering. In my past projects, I have worked on various data pipelines: onboarding data from sources, massaging it, and uploading the transformed data to the Cloudera cloud system. I'm also a multitasker and a quick learner. That's pretty much about myself.

Yeah. So in a Python-based ETL (extract, transform, load) model, the incremental load would depend on the data we receive. If it is batch processing, we could do it with Hadoop MapReduce. But since it is Python-based, it is better to build a Spark tech stack and use it along with Python. So we would use Spark for our incremental data loads.
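
The incremental-load idea in this answer can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline described in the interview: it assumes each source row carries an `updated_at` timestamp and that the last processed timestamp (a high-water mark) is stored between runs. The names `incremental_extract`, `updated_at`, and the sample rows are hypothetical.

```python
from datetime import datetime


def incremental_extract(rows, last_watermark):
    """Return the rows changed after last_watermark and the new watermark."""
    # Keep only rows that changed since the previous run.
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    # Advance the watermark to the newest row seen; keep the old one if nothing changed.
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark


# Hypothetical source data for illustration only.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

batch, watermark = incremental_extract(source, datetime(2024, 1, 3))
```

Running the function again with the returned watermark yields an empty batch, which is what makes the load incremental. In Spark, the same filter would typically be applied to a DataFrame rather than a list.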

Yeah, so to ensure zero downtime: since we would be using Spark, which computes in memory, we would expect zero downtime during the ETL pipeline deployments. We would also ensure that the code gets deployed to multiple regions and has a backup with a replication factor of 3, so that downtime is reduced during future pipeline deployments.

For validating the correctness of any ETL process in any of the BI tools, we first determine whether the specific input location is providing input data that is structured enough. If it is not structured, we would proceed with a data cleansing process, which includes removing or adding delimiters, removing extra spaces, and modifying or updating the columns if necessary. That is one of the correctness approaches we would be following.
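
The cleansing steps mentioned here (fixing delimiters, removing extra spaces, checking the columns) could look roughly like the sketch below. It is an assumption-laden example: the function name `cleanse_line`, the choice of comma as the target delimiter, and the set of alternative delimiters are all hypothetical and not taken from the interview.

```python
import re


def cleanse_line(line, expected_columns, delimiter=","):
    """Normalize one delimited record; return its fields, or None if malformed."""
    # Assumption for this sketch: semicolons, pipes, and tabs all mean "delimiter".
    normalized = re.sub(r"[;|\t]", delimiter, line.strip())
    # Collapse runs of whitespace inside each field and trim the edges.
    fields = [re.sub(r"\s+", " ", f).strip() for f in normalized.split(delimiter)]
    # Reject records that do not have the expected number of columns.
    return fields if len(fields) == expected_columns else None


clean = cleanse_line("  1 ; Alice   Smith |  Dublin ", expected_columns=3)
rejected = cleanse_line("1,2", expected_columns=3)
```

Returning `None` for malformed records keeps the decision about what to do with them (drop, log, or quarantine) outside the cleansing function.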

So data integrity would be maintained from the transactional SQL database to S3 by ensuring that all the data has been uploaded properly, that it is partitioned well using partitioning techniques, and that it is optimized enough. This is one of the data integrity approaches, and we would also make sure that the volume and veracity of the data have been considered. For security purposes, we would use controls such as access control lists (ACLs), so that the data's integrity, privacy, and security are also maintained.
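
One way to check that "all the data has been uploaded properly" is to compare a row count and a checksum between the source and the target. The sketch below assumes both sides can be read back as tuples of values; the function name `table_fingerprint` and the sample rows are hypothetical, and the interview does not say this specific method was used.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) where the digest ignores row order."""
    # Hash each row on its own, then sort so that ordering differences do not matter.
    digests = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


# Hypothetical rows read from the SQL source and from the uploaded copy.
source_rows = [(1, "a"), (2, "b"), (3, "c")]
target_rows = [(3, "c"), (1, "a"), (2, "b")]

upload_is_complete = table_fingerprint(source_rows) == table_fingerprint(target_rows)
```

A mismatch in the count points to missing or duplicated rows, while a matching count with a different digest points to changed values.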

Yeah, so Spark's partitioning and caching are used so that the job performs much better. In terms of partitioning, we would consider coalesce the better option because it involves less shuffling. In terms of caching, we would consider persist, so that we can define the storage levels. These are the ways we can improve the performance of the job.
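
The point about coalesce causing less shuffling can be illustrated with a toy model in plain Python. This is not Spark code and does not reproduce Spark's actual placement logic; it only assumes that partitions are lists of records, that a coalesce-style merge moves whole partitions, and that a repartition-style shuffle moves every record individually. The functions `coalesce` and `repartition` here are hypothetical stand-ins written for this sketch.

```python
def coalesce(partitions, n):
    """Merge whole partitions into at most n groups without splitting any of them."""
    if n >= len(partitions):
        # A merge cannot increase the number of partitions.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for index, part in enumerate(partitions):
        merged[index % n].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every record individually, as a full shuffle would."""
    records = [r for part in partitions for r in part]
    return [records[i::n] for i in range(n)]


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
merged = coalesce(parts, 2)
shuffled = repartition(parts, 2)
```

In the merged result every original partition stays together inside one output partition, while the shuffled result splits each original partition across outputs, which is the extra data movement the answer refers to.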

So in this function, I noticed that it is not a static method, and it doesn't return anything. It is just a void method that says: if the report type is HTML, do this; or else, if it is PDF, generate the PDF report. I also see that it is enclosed with a backslash, which is actually not needed. That is what I notice at first glance. Yeah. Okay.

Okay, so in terms of batch data processing, if both of them are executed: tMap 1 takes row 1 and writes to tFileOutputDelimited. But if you execute tMap 1 again, the output will get overwritten with row 2 and tFileOutputDelimited 2.
