Vetted Talent

Janani Sivasubramanian

To seek a challenging role to apply advanced analytics techniques, collaborate with cross-functional teams, and contribute to organizational growth by leveraging data-driven strategies and solutions.

Role
Snowflake Data Developer
Years of Experience
4 years

Skillsets

ETL processes
Data Modeling
Performance Tuning
SQL Development
Data Loading

Vetted For

9 Skills

Roles & Skills
Results
Details

Senior Data Engineer With Snowflake (Remote), AI Screening
53%

Skills assessed: Azure Synapse, Communication Skills, DevOps, CI/CD, ELT, Snowflake, Snowflake SQL, Azure Data Factory, Data Modelling
Score: 48/90

Professional Summary

4 Years

Apr, 2021 - Present, 5 yr 2 months
Snowflake Data Developer
Tata Consultancy Services Ltd.
Backend Developer
CMS
Backend Developer
CCAP

Applications & Tools Known

Snowflake
Microsoft Power BI
Java
Apache POI
Git
MySQL
Microsoft Excel

Work History

4 Years

Snowflake Data Developer

Tata Consultancy Services Ltd.

Apr, 2021 - Present, 5 yr 2 months

Design and implement efficient data models, develop and optimize SQL queries, ETL processes, and manage performance tuning of Snowflake data warehouses.

Backend Developer

CMS

Designed, built, and delivered management reports using Microsoft Power BI. Used Power BI features such as basic DAX calculations, maps, and KPI dashboards.

Backend Developer

CCAP

Worked with Java and SQL for report generation, performed unit testing, and developed API gateways and Lambda functions.

Education

B.E (EIE)
R.M.D Engineering College (2020)

Certifications

LeetCode: solved 50 SQL questions
Microsoft Power BI Data Analyst Associate (PL-300)
Snowflake Decoded Master Fundamentals (Udemy)
HackerRank: 5-star badge in SQL

AI-interview Questions & Answers

I'm a Snowflake developer, currently working at Tata Consultancy Services. My role is development in Snowflake: writing SQL queries, tuning and optimizing them, and data modeling. I get the required inputs from the data analyst through mapping sheets, and then I write the queries for the reports they want.

In Snowflake development there are different environments, and in each environment we select the appropriate role and warehouse before working on the attributes given by the data analyst. Before starting on the code, we first analyze the tables: their load frequency, whether they have been loaded correctly, and the nature of each table. After that analysis, we start coding the attributes required for the particular report. Not all attributes are present in a single table, so the queries use many tables and have to be fine-tuned. As of now, we create a UDF with the required parameters and give the invoking query to the API team, who process it. On the Snowflake side, the query takes only milliseconds to run; it does not even reach a second. As a team we have delivered many reports, and as an individual contributor I have done 2 to 3 big reports.

I have done well in my part. I have received client appreciation, my team lead has personally motivated and encouraged me, and my work was praised after the reports were completed. That's it for now. Thank you.

How would you leverage Snowflake's time travel and zero copy cloning features to enhance data recovery and testing?

Cloning itself is not a new concept. Snowflake's zero-copy cloning can clone a database seamlessly with a simple command. It is very similar to a pointer, which acts as a reference to another variable instead of actually being that variable. When a command is given to clone a database, a new database is created along with all the objects underneath it. One could expect the data to be held as a replica, but it is not: when a user clones an object such as a table, the cloud services layer simply references the data in the actual source, so the clone is as up-to-date as the source at the time of cloning. Snowflake does this with metadata, and both time travel and zero-copy cloning are handled mostly in the services layer. The advantages of these features for data recovery and testing are almost no additional storage cost, no waiting time, and no administrative effort. Cloning can also be used to instantly promote corrected, fixed data to production.
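The cloning and time travel commands mentioned in this answer can be sketched as follows. This is a minimal illustration, not the candidate's own code: the helper name `clone_statement` and the object names `prod_db`, `test_db`, `orders`, and `orders_restored` are hypothetical. It only builds the SQL text; running it would need a Snowflake session.

```python
from typing import Optional


def clone_statement(object_type: str, source: str, target: str,
                    offset_seconds: Optional[int] = None) -> str:
    """Build a zero-copy CLONE statement, optionally as of a Time Travel offset.

    offset_seconds is how far back in time to read the source (hypothetical
    helper parameter); it becomes a negative OFFSET in the AT clause.
    """
    stmt = f"CREATE {object_type.upper()} {target} CLONE {source}"
    if offset_seconds is not None:
        if offset_seconds <= 0:
            raise ValueError("offset_seconds must be a positive number of seconds")
        stmt += f" AT (OFFSET => -{offset_seconds})"
    return stmt + ";"


# Testing: clone production into an isolated test database (object names are made up).
test_env = clone_statement("database", "prod_db", "test_db")

# Recovery: recreate a table as it looked one hour ago, then inspect or swap it in.
restored = clone_statement("table", "orders", "orders_restored", offset_seconds=3600)
```

The first statement covers the testing case (an isolated copy that shares storage with the source), and the second covers recovery, since the clone is taken from the table's state before the bad change.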

How would you design a resilient data pipeline in Azure Data Factory to handle intermittent source data availability?

In Azure, we can ingest data from both on-premises and cloud data sources, transform or process it using existing compute services such as Hadoop, and publish the results on-premises or in the cloud for business intelligence. The service that does this is Azure Data Factory. It is a cloud data integration service for orchestrating and automating data movement and data transformation, and it does not store any data itself. It lets you create data-driven workflows that move data between supported data stores and then process it using compute services in other regions or in an on-premises environment. It also lets you monitor and manage workflows through both programmatic and UI mechanisms. It supports data integration, data migration, and integrating data from different ERP systems and loading it into Azure Synapse for reporting.

First we connect to the sources and collect the data. Once the data is in a centralized data store in the cloud, it is transformed and enriched using compute services such as Azure Data Lake Analytics. The transformed data is then delivered from the cloud to on-premises stores like SQL Server, or kept in cloud storage for consumption by BI tools such as Power BI, which we can use to get insights. Data Factory copies data from a source data store to a sink data store; supported source and sink stores include Azure Blob Storage and Azure Cosmos DB. Data Factory also supports transformation activities such as MapReduce, which can run in pipelines either individually or chained with other activities. So moving data from a source to a sink can be done using these tools.

What steps would you take to migrate an existing data model to Snowflake ensuring minimal downtime?

Okay. Upload.

Can you implement an auditing system in Snowflake to track historical changes of critical datasets?

If we need to trace a particular operation performed by me or by another user, we can do so by querying the query history view in the Snowflake account usage schema, which records the history of the queries executed in the account. We can design our audit logs from the data available in this view. For example, using the account admin role, we can filter on the start time, say greater than or equal to 4th April and less than 30th April, and we can also filter on the warehouse. If we order by the start time, we can then apply different filters on the query text column to search for the different commands that users have executed. The retention of the data in this view is per month; in this example, it covers 4th April to 30th April.

On auditing and logging more generally, Snowflake maintains detailed audit logs for all API activities, including data access and modification operations. These logs record who performed the actions, what actions were taken, and when. Audit logs can be used for compliance, security, and troubleshooting purposes. They cover DDL operations performed through APIs, such as creating or altering tables, views, and other data structures and schema evolution, and DML operations such as insert, update, and delete. Audit policies are helpful for users and applications here: Snowflake allows administrators to define audit policies that specify which types of API activities should be audited, and these policies can be configured to capture specific actions, users, and object access.
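The query-history lookup described in this answer could look roughly like the sketch below. It is an illustration under stated assumptions: the function name `build_audit_query`, the year 2024, and the table name `orders` are hypothetical, and the `%s` placeholders assume a client that accepts format-style bind parameters. The view and column names are those of `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY`.

```python
from datetime import date
from typing import List, Optional, Tuple


def build_audit_query(start: date, end: date,
                      text_filter: Optional[str] = None,
                      warehouse: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (sql, params) listing queries run in [start, end), oldest first."""
    sql = (
        "SELECT query_id, user_name, role_name, warehouse_name,\n"
        "       start_time, query_type, query_text\n"
        "FROM snowflake.account_usage.query_history\n"
        "WHERE start_time >= %s\n"
        "  AND start_time < %s"
    )
    params = [start.isoformat(), end.isoformat()]
    if warehouse is not None:
        # Restrict to one warehouse, as the answer suggests.
        sql += "\n  AND warehouse_name = %s"
        params.append(warehouse)
    if text_filter is not None:
        # Case-insensitive search of the statement text, e.g. for a table name.
        sql += "\n  AND query_text ILIKE %s"
        params.append(f"%{text_filter}%")
    sql += "\nORDER BY start_time"
    return sql, params


# Example: statements touching a hypothetical "orders" table in April 2024.
audit_sql, audit_params = build_audit_query(date(2024, 4, 4), date(2024, 4, 30),
                                            text_filter="orders")
```

Passing the dates and search text as bind parameters rather than formatting them into the string keeps the audit query safe to reuse with user-supplied filters.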

Describe how you would configure Azure Data Factory to handle dynamic scaling based on workload demands.

Based on the provided Snowflake SQL snippet, can you explain what the issue is with the current stored procedure and how it might affect data processing?

The snippet is a create or replace procedure, update order status. Inside the begin block it updates the orders, sets the status to dispatched with a where clause on the status, and there is a note about the missing commit statement. A commit is executed after operations like insert, delete, and update, so the issue with this stored procedure is that it is missing the commit statement. For each transaction on the database side, after the update with its set and where clauses, there should be a commit for the transaction. Because that is missing, the procedure might not run because of a syntax error, or the update might be rolled back.

Stages: build, with a job whose steps echo "building and testing" under the display name "build and test"; then deploy, with a job deploy to prod, the condition succeeded, and a step that echoes "deploying to production" under the display name "deploy to production". If you come across this section within a CI/CD pipeline configuration using Azure Data Factory or Azure Synapse, what is the possible oversight?

In this configuration, the build job echoes building and testing, and its display name is build and test. After the build stage comes the deploy stage, with the job deploy to prod. It moves to the production environment when the condition succeeded is met, so at that point it is good to go to production; the script echoes deploying to production, with the display name deploy to production. The oversight is that no explicit environment is specified for the deployment job. What I should do is specify the production environment explicitly in the deployment job configuration to avoid any potential issues during deployment.

How would you build a machine learning data pipeline in Snowflake and ensure it is updatable as new data becomes available?

Yes, I would build a machine learning data pipeline in Snowflake. I would use the Kafka connector to integrate with external data feeds, enabling real-time data streaming. I would create streams to capture policy updates, claim submissions, and customer interactions, ensuring continuous tracking and storage of changes. After that, I would apply transformations and enrichment operations to the captured data.

As for machine learning itself, first consider the Bellman equation. According to it, the value of a state equals the maximum, over the actions available in that state, of the reward for that state and action plus a discount factor gamma times the value of the next state. The Bellman equation comes from how the machine learns in reinforcement learning: it is not preprogrammed, it keeps updating and learns from its own mistakes.

Coming back to the pipeline in Snowflake, it might have many source tables, customer interactions, or external data feeds. The Kafka connector for Snowflake facilitates seamless integration between Kafka and Snowflake and enables real-time data streaming, which has a very large impact. Snowflake streams capture changes from the incoming data. For example, streams can be created to capture policy updates, claim submissions, and customer interactions; these streams continuously track and store the changes, ensuring real-time data synchronization. After that, transformation and enrichment happen, where users apply operations to the captured data. For instance, you can enrich customer interactions with demographic data, or claims data with geolocation. Data processing with tasks automates the processing activities based on predefined schedules or conditions, and real-time data loading with Snowpipe continuously and automatically loads the transformed data from the streams into target tables within the database.

For analytics and visualization, we would use Power BI; in Snowflake itself we can also view results as a chart. This stage covers applying machine learning algorithms, building predictive models, and generating visualizations using Tableau or Power BI. Dashboards are monitored and alerts are raised. Continuous monitoring and alerting are crucial to ensure the performance and reliability of the data pipeline.
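The Bellman equation described in words in this answer can be written compactly. The form below is the standard deterministic version; apart from gamma, the symbols are conventional notation rather than something stated in the interview: V is the value function, s the current state, a an action, R(s, a) the reward, and s' the state reached from s by taking a.

```latex
V(s) = \max_{a} \left[ R(s, a) + \gamma \, V(s') \right]
```

With a discount factor between 0 and 1, rewards further in the future contribute less to the value of the current state.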

Applied DevOps practice to improve collaboration and reduce the lead time in Snowflake data operations.

The challenges start with creating isolated environments and handling schema changes. New features often need schema changes, which can be expensive or operationally challenging because developers have to coordinate code changes with the updated database schema. Using test data to validate feature changes is far from ideal, even if the schema is the same between the production and pre-production stages. Because copying production data is so costly and time-consuming, it can result in schedule delays or compromised product quality. Pre-production environments that try to replicate production have also been smaller in scale, because of the costs of procuring, standing up, and managing production-scale environments.

Snowflake helps by instantly creating any number of isolated environments, reducing schema changes with the VARIANT data type, and rapidly seeding the pre-production environment with production data. There are two ways to seed an environment with production data: secure data sharing is used when the environments are on separate Snowflake accounts, and zero-copy cloning is used when the environments are on the same account. Environments can also be scaled instantly to run jobs quickly and cost-effectively. Scaling issues are easily overcome with Snowflake's per-second pricing structure: customers pay only for the time needed to run the job, regardless of cluster size, so development teams can scale an environment up to run big processes in a fraction of the time and scale it down when the process ends.

Rolling back is easy with Snowflake's time travel. Errors in the CI/CD process can be handled with rollbacks: time-travel-enabled objects, like tables or schemas, that have been deleted can easily be restored, and they can be accessed programmatically at a point in time within the 90-day window. Beyond this feature, Snowflake's Data Cloud supports DevOps through simple languages like Python, SQL, and Node.js, near-zero maintenance because it is a cloud-based, fully software-as-a-service offering, and real-time integration with external services using custom external functions that are stored and executed outside of Snowflake.

Optimize the data retrieval time in Snowflake while dealing with the large semi-structured JSON datasets using Snowflake SQL.

We can start with the COPY INTO operation. A very large JSON dataset is first loaded into a bulk storage location, and from there COPY INTO loads it into a Snowflake table created with a CREATE TABLE command that specifies the VARIANT column in which the JSON data resides. If the path to an element from the root of the JSON is known, the data can be extracted and cast to the appropriate type. A view can be created manually over these expressions, which essentially hides this complexity from the end user. To automate the creation of the view, we need two key pieces of information: the path to each element and the data type of each element. It turns out we can use a lateral join to FLATTEN in a subquery to get this information about the individual elements in the JSON document. So we build the query, run it, loop through the returned elements, construct the view DDL, and then run the DDL to create the view. Snowflake's stored procedures are perfect for this use.
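The path-and-type discovery behind the automated view can be illustrated with a small sketch. Note the substitution: the answer describes doing this inside Snowflake with FLATTEN and a stored procedure, whereas this sketch walks one sample document client-side in Python to show the same idea. The names `element_paths`, `build_view_ddl`, `orders_v`, `raw_orders`, and the column `v` are hypothetical, and keys with special characters would need quoting that is not handled here.

```python
import json
from typing import Any, Dict


def element_paths(node: Any, prefix: str = "") -> Dict[str, str]:
    """Map each leaf path under a JSON object to a Snowflake type name."""
    paths: Dict[str, str] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else key
            paths.update(element_paths(value, child))
    elif isinstance(node, list):
        paths[prefix] = "ARRAY"      # arrays are kept whole, not exploded
    elif isinstance(node, bool):     # check bool before int: bool is an int subclass
        paths[prefix] = "BOOLEAN"
    elif isinstance(node, int):
        paths[prefix] = "NUMBER"
    elif isinstance(node, float):
        paths[prefix] = "FLOAT"
    else:
        paths[prefix] = "STRING"     # strings and nulls fall back to STRING
    return paths


def build_view_ddl(view: str, table: str, column: str, paths: Dict[str, str]) -> str:
    """Construct a view that casts each JSON path to a typed, named column."""
    cols = [f"{column}:{path}::{sql_type} AS {path.replace('.', '_')}"
            for path, sql_type in paths.items()]
    return (f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  "
            + ",\n  ".join(cols) + f"\nFROM {table};")


sample = json.loads('{"id": 7, "customer": {"name": "Ann", "active": true}, "tags": ["a"]}')
paths = element_paths(sample)
ddl = build_view_ddl("orders_v", "raw_orders", "v", paths)
```

Queries against the resulting view read typed columns such as `customer_name` instead of repeating the path and cast expressions each time.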
