
NANDESH REDDY

Vetted Talent
Experienced data engineer adept at designing and implementing scalable data solutions. Proficient in crafting efficient data pipelines and optimizing data processing workflows. Skilled across a diverse big data tech stack, including PySpark, Hive, Sqoop, MySQL, HDFS, and Hadoop, with fundamental knowledge of Azure services. Self-motivated and eager to learn new technologies, committed to driving data-driven decision-making by ensuring the availability, accuracy, and usability of data.
  • Role

    Senior Tech Consultant

  • Years of Experience

    7 years

Skillsets

  • Jira
  • SQL
  • Snowflake
  • dbt
  • Apache Spark
  • Apache NiFi
  • SVN
  • Sqoop
  • MySQL
  • Python - 5 Years
  • Hive
  • Hadoop
  • Git
  • Databricks
  • Confluence
  • Azure
  • PySpark - 5 Years

Vetted For

13 Skills
  • Data Engineer II (Remote) - AI Screening
  • 49%
  • Skills assessed: Airflow, Data Governance, Machine Learning and Data Science, BigQuery, ETL processes, Hive, Relational DB, Snowflake, Hadoop, Java, PostgreSQL, Python, SQL
  • Score: 44/90

Professional Summary

7 Years
  • Apr, 2024 - Present (1 yr 5 months)

    Senior Tech Consultant

    EY
  • Sep, 2022 - Apr, 2024 (1 yr 7 months)

    Big Data Engineer

    Tata Consultancy Services
  • Aug, 2019 - Aug, 2022 (3 yr)

    Big Data Engineer

    Metric Stream
  • Dec, 2018 - Aug, 2019 (8 months)

    IT Developer-1

    DXC Technology

Applications & Tools Known

  • Hive
  • Hadoop
  • Sqoop
  • Git
  • SVN
  • Jira
  • Confluence
  • Azure

Work History

7 Years

Senior Tech Consultant

EY
Apr, 2024 - Present (1 yr 5 months)
    Worked with one of the global leaders in food manufacturing to build end-to-end reporting and KPIs such as Primary Sales and Secondary Sales. Developed and migrated data pipelines from on-prem Cloudera (Hive) to Snowflake using dbt, ensuring consistency and performance optimization for retail data. Utilized Apache NiFi to automate supply chain workflows, integrating systems and triggering shell, Python, and Hive scripts for data processing.

Big Data Engineer

Tata Consultancy Services
Sep, 2022 - Apr, 2024 (1 yr 7 months)
    Designed and built scalable data pipelines using PySpark, ensuring a smooth and reliable flow of data between the various stages. Developed ETL workflows using Apache Spark for large-scale data processing and transformation.
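
    As a rough illustration of this kind of pipeline, the sketch below shows a minimal PySpark extract-transform-load flow; the table and column names (raw_db.transactions, txn_id, amount, and so on) are assumptions for illustration, not details from the resume.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("etl-pipeline")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Extract: read raw transactions from a Hive table (illustrative name)
    raw = spark.table("raw_db.transactions")

    # Transform: de-duplicate, drop bad rows, derive a date column
    clean = (
        raw.dropDuplicates(["txn_id"])
           .filter(F.col("amount") > 0)
           .withColumn("txn_date", F.to_date("txn_ts"))
    )

    # Aggregate daily spend per customer
    daily_spend = (
        clean.groupBy("customer_id", "txn_date")
             .agg(F.sum("amount").alias("daily_spend"))
    )

    # Load: write the curated output back to Hive
    daily_spend.write.mode("overwrite").saveAsTable("curated_db.daily_spend")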

Big Data Engineer

Metric Stream
Aug, 2019 - Aug, 2022 (3 yr)
    Developed expertise in data processing frameworks like Apache Spark and gained proficiency in ETL pipeline development. Designed and implemented scalable data solutions, ensuring efficient data ingestion, transformation, and storage. Analyzed large data sets by writing Hive queries. Used Sqoop to import large volumes of data from traditional RDBMS systems into HDFS, including incremental Sqoop imports.

IT Developer-1

DXC Technology
Dec, 2018 - Aug, 2019 (8 months)
    Collaborated with senior developers to understand project requirements and contribute to the design and development of software applications. Developed and maintained Python scripts to automate repetitive tasks, improve system efficiency, and enhance application functionality.

Achievements

  • Spot Award winner at MetricStream

Major Projects

1 Project

Credit Card Personalization

TCS
Nov, 2022 - Present (2 yr 10 months)

    Working on credit card offer personalization based on users' spending behaviour. The main goal of the project is to target customers across three channels, email, mobile, and web app notifications, with offers based on their spending patterns.

    Using a big data tech stack of PySpark, Python, SQL, and Hive with Azure Databricks.
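
    As a rough sketch of the personalization idea, the snippet below ranks each customer's spend by category in PySpark and keeps the top category that downstream offer channels could use; all table, column, and database names are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = (
        SparkSession.builder
        .appName("card-personalization")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Illustrative source table of card transactions
    txns = spark.table("cards_db.credit_card_transactions")

    # Total spend per customer per category
    spend_by_category = (
        txns.groupBy("customer_id", "category")
            .agg(F.sum("amount").alias("total_spend"))
    )

    # Keep each customer's highest-spend category to drive offer selection
    w = Window.partitionBy("customer_id").orderBy(F.desc("total_spend"))
    top_category = (
        spend_by_category.withColumn("rn", F.row_number().over(w))
                         .filter(F.col("rn") == 1)
                         .drop("rn")
    )

    # Downstream jobs would map the top category to offers for the email,
    # mobile, and web app notification channels
    top_category.write.mode("overwrite").saveAsTable("cards_db.customer_top_category")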

Education

  • B.E - CSE

    SSIET (2018)

Certifications

  • Big Data Engineer certification, Trendy Tech Institute

Interests

  • Books
  • Bike Rides
  • Watching Movies
  • Long Rides

AI-Interview Questions & Answers

    Hi Tim, I am working as a data engineer with around 4+ years of experience. The tech stack I have been working with is PySpark, Python, SQL, and Hive, and my background is the BFSI (banking) domain, where I mostly work on credit card fraud detection and credit card personalization. Coming to personalization, the way we do it is: if customers are mostly spending in a particular category using their credit cards, say sports or entertainment, then based on that spending behaviour we target customized offers to each customer across three channels, email, mobile notifications, and website notifications. That is the major thing we are working on. Coming to my day-to-day roles and responsibilities, we have a Hive warehouse where data is ingested by different teams. On top of that we use Spark to read the raw data from Hive, perform the transformations, apply resource management and optimization techniques in Spark, and then dump the data back into the Hive warehouse. For some use cases we also use HBase, and going forward we have started implementing Azure cloud, so we are getting training on that as well.
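
    A minimal sketch of that day-to-day flow, assuming illustrative Hive table names and tuning values: read raw data from Hive with Spark, prune early, join with a broadcast hint as one example optimization, and write the result back to a partitioned Hive table.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder
        .appName("hive-batch-transform")
        .config("spark.sql.shuffle.partitions", "200")  # example tuning value
        .enableHiveSupport()
        .getOrCreate()
    )

    raw = spark.table("warehouse.raw_card_events")   # raw data landed by other teams
    dim = spark.table("warehouse.customer_dim")      # small dimension table

    transformed = (
        raw.filter(F.col("event_type") == "purchase")   # prune early
           .join(broadcast(dim), "customer_id")         # broadcast the small side
           .withColumn("event_date", F.to_date("event_ts"))
    )

    # Write the curated result back to the Hive warehouse, partitioned by date
    (
        transformed.repartition("event_date")
                   .write.mode("overwrite")
                   .partitionBy("event_date")
                   .saveAsTable("warehouse.curated_card_events")
    )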

    What scaling method would you use for the query environment? Well, I have not worked in BigQuery as of now. I have worked on the Spark streaming aspect, like the window functions and the watermark aspect; we use those. I am not exactly into real-time streaming applications.

    How can you handle consistency across different data stores? Well, coming to consistency management, it will vary, because PostgreSQL is a relational database while Hive is an OLAP system, so the same level of consistency cannot be achieved in both. We can achieve consistency in the relational database, but in Hive it is not that feasible, because it is mostly used for analytics purposes and will always hold historical data. Between PostgreSQL and Hive, consistency can mainly be achieved using incremental loads. Coming to Snowflake, as of now I have not worked with Snowflake technologies.
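
    A minimal sketch of that incremental-load idea, assuming a PostgreSQL source reachable over JDBC and an updated_at watermark column; the connection details and table names are assumptions for illustration, and the PostgreSQL JDBC driver is assumed to be on the Spark classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("incremental-load")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Highest watermark already loaded into the Hive target
    # (assumes the target table is non-empty and has an updated_at column)
    last_loaded = (
        spark.table("warehouse.orders")
             .agg(F.max("updated_at").alias("wm"))
             .collect()[0]["wm"]
    )

    # Pull only rows changed after that watermark from PostgreSQL over JDBC
    incremental = (
        spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://db-host:5432/sales")
             .option("dbtable", "public.orders")
             .option("user", "etl_user")
             .option("password", "*****")
             .load()
             .filter(F.col("updated_at") > F.lit(last_loaded))
    )

    # Append only the delta to the Hive table
    incremental.write.mode("append").saveAsTable("warehouse.orders")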

    And how can you utilize Hadoop for real-time insights? Well, to work on streaming data in real time we can use the Kafka environment, Kafka plus Spark Streaming applications. With the data coming through Kafka from the producer and consumer side, we can achieve real-time streaming. The data gets processed in memory because it is real-time streaming, but internally it is still a kind of micro-batch processing: the data is segmented into buckets of, say, 5 or 10 minutes. On top of that, Kafka and Spark Streaming window functions, tumbling windows, and watermark techniques are used to process the data further.
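
    A sketch of that Kafka plus Spark Structured Streaming pattern, with a 5-minute tumbling window and a 10-minute watermark for late events; the broker address, topic name, and event schema are assumptions for illustration.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("card-stream").getOrCreate()

    schema = StructType([
        StructField("card_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read the Kafka topic as a micro-batched stream
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "card-transactions")
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
             .select("e.*")
    )

    # 5-minute tumbling windows, tolerating events up to 10 minutes late
    windowed = (
        events.withWatermark("event_time", "10 minutes")
              .groupBy(F.window("event_time", "5 minutes"), "card_id")
              .agg(F.sum("amount").alias("window_spend"))
    )

    query = (
        windowed.writeStream
                .outputMode("update")
                .format("console")
                .option("truncate", "false")
                .start()
    )
    query.awaitTermination()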

    What strategy would you use when working with BigQuery? Well, as of now I have not worked in BigQuery, so I don't have enough understanding to answer this question.

    How would you go about optimizing queries for a relational database? Well, coming to optimization, there are several factors that need to be checked. One is indexing: if indexing is managed properly, performance gets a good improvement. Coming to other aspects, say we are doing some join operations; before doing the joins, we should filter out the unnecessary data using a WHERE filter, and once that is done we can do some further optimization around the joins. Optimizations like partitioning and bucketing are things we can do in an OLAP system, but coming to PostgreSQL, a relational DB, I am not sure we can apply partitioning and bucketing there. For a relational database, indexing is the main thing to follow, along with applying the WHERE filter condition. Those are the most prominent ways to do it.

    Regarding the Python code block and what concerns it might create: potentially, it is the 'except Exception as e'. I think the exception should have been handled in a different way. Given the details mentioned in that 'except Exception as e' block, we should have used some custom exception value there, to my understanding.
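
    A small sketch of that point, using a hypothetical DataValidationError instead of a blanket 'except Exception as e'; the exception, function, and record names are illustrative.

    class DataValidationError(Exception):
        """Raised when an input record fails validation."""

    def validate_record(record: dict) -> None:
        if record.get("amount") is None:
            raise DataValidationError(f"missing amount in record {record.get('id')}")

    record = {"id": 42, "amount": None}
    try:
        validate_record(record)
    except DataValidationError as exc:
        # Handle the specific, expected failure (log it, quarantine the record, ...)
        print(f"validation failed: {exc}")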

    SQL question: there is a subtle issue in the query that might cause some performance degradation; what is the problem, and how can you improve the performance? I think the WHERE filter should have been applied before the join operation, because in the query the join is done first and is then followed by the WHERE filter, so we are unnecessarily joining all the data and then doing the WHERE operation. First we should filter out the data, then go ahead with the join operation to improve the performance.
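
    The same filter-before-join point expressed in PySpark terms, with DataFrame and column names as assumptions; Spark's optimizer can often push the filter down on its own, but filtering explicitly keeps the intent clear and helps when pushdown is not possible.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("filter-before-join").enableHiveSupport().getOrCreate()
    orders = spark.table("warehouse.orders")
    customers = spark.table("warehouse.customers")

    # Less efficient pattern: join everything first, filter afterwards
    joined_then_filtered = (
        orders.join(customers, "customer_id")
              .filter(F.col("order_date") >= "2024-01-01")
    )

    # Preferred pattern: filter first, then join the reduced dataset
    recent_orders = orders.filter(F.col("order_date") >= "2024-01-01")
    filtered_then_joined = recent_orders.join(customers, "customer_id")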

    So, do you use Python to programmatically enforce ACID properties? I am not that sure how we can programmatically enforce ACID properties from Python. At the SQL level I know what we can do and which ACID properties we can achieve, but I would need to look into this question.

    Tasked with integrating a Python-based process with Airflow for scheduling: well, in Airflow we can create DAGs for whatever jobs and functions we have created in our Python script, and the DAG will basically take care of the dependencies between them. A DAG is directed and acyclic: it won't reverse the flow, it only moves in the forward direction. So we should ensure that the ingestion task runs first, then the transformation function is called in the Airflow DAG, and then the loading operation is called. If we ensure this ordering is properly managed, we can achieve reliability and scalability in Airflow.
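
    A minimal Airflow DAG sketch of that ingestion, transformation, and load ordering; the task functions and DAG id are placeholders for illustration.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def ingest():
        print("ingest raw data")       # placeholder for the real ingestion logic

    def transform():
        print("transform data")        # placeholder for the real transformation

    def load():
        print("load curated data")     # placeholder for the real load step

    with DAG(
        dag_id="ingest_transform_load",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Edges only go forward: ingestion first, then transform, then load
        ingest_task >> transform_task >> load_task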

    Well, coming to data governance, I am not that sure; it depends on the teams and the security level they have implemented. Coming to a data quality framework, I will ensure that the filtration operations we have applied are correct and that we follow SCD Type 2 operations; in an OLAP system we should follow the SCD type based on the requirement, but mostly we follow SCD Type 2. In the data quality framework, nulls should be taken care of, and the performance aspect should be taken care of as well. Coming to data quality, duplicates and nulls are the aspects we should mostly be aware of.
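
    A small PySpark sketch of the null and duplicate checks mentioned here; the table name, business key, and critical column are illustrative assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-checks").enableHiveSupport().getOrCreate()
    df = spark.table("warehouse.curated_card_events")

    # Null counts per column
    null_counts = df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in df.columns]
    )
    null_counts.show()

    # De-duplicate on the business key before publishing
    deduped = df.dropDuplicates(["event_id"])

    # Fail fast if a critical column contains any nulls
    critical_nulls = df.filter(F.col("customer_id").isNull()).count()
    if critical_nulls > 0:
        raise ValueError(f"customer_id has {critical_nulls} null values")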