Data Analyst with over 5 years of hands-on experience leveraging Python, SQL, Power BI, and Excel for comprehensive data analysis. Proficient in extracting insights from complex datasets through data manipulation, exploration, and visualization. Adept at conducting statistical analysis to derive actionable conclusions and support informed decision-making. Demonstrated ability to collaborate cross-functionally to understand business requirements and deliver tailored analytical solutions. Committed to continuous learning and to staying current with emerging trends in data analytics to enhance organizational effectiveness.
Data Analyst, Teamlease Services
Data Analyst, SR Intelligent Technologies
Finance Executive, Alten India
MySQL
Python
Microsoft Power BI
PostgreSQL
BigQuery
Jupyter Notebook
Pandas
NumPy
Seaborn
Matplotlib
Microsoft Excel
My name is Naveen Sequera, and I've been working as a data analyst for the past five to six years, across my previous three companies. My most recent experience was with a company called Flipkart; Flipkart was actually the client, and my payroll company was Teamlease Services. At Flipkart I worked closely with the sales and business management teams, and my day-to-day responsibilities were tracking KPIs, building reports and dashboards, and sharing a lot of sales data with the stakeholders. On the skills side, I'm good with programming languages like Python and SQL, and I'm also very comfortable with Excel; I would say my Excel knowledge is at an advanced level. At Flipkart I used to extract data from SQL Server or the data warehouse, perform transformations and aggregations on it, and share the findings with the stakeholders, along with a lot of ad hoc reporting. One important achievement there was automating our daily report with a Python script. The report had to go out at a fixed time every day, and the entire process of extracting the data, doing the necessary transformation and aggregation steps, and sending the email to the stakeholders was automated, which saved a considerable amount of time every day for me and the team. That was one mini project I took up at Flipkart; apart from that it was mostly reporting, communicating KPIs, and ad hoc tasks. That's a brief background about myself. Thank you.
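A minimal sketch of that kind of extract-aggregate-email automation, assuming a pandas/SQLAlchemy stack; the connection string, table, and addresses are placeholders, not the actual Flipkart setup:

```python
import smtplib
from email.message import EmailMessage

import pandas as pd
from sqlalchemy import create_engine

# Placeholder warehouse connection and query.
engine = create_engine("mssql+pyodbc://user:password@sales_dw_dsn")

# Extract: pull the granular sales rows for the day.
raw = pd.read_sql("SELECT order_date, region, sales_amount FROM daily_sales", engine)

# Transform: aggregate to the level the stakeholders expect.
summary = raw.groupby(["order_date", "region"], as_index=False)["sales_amount"].sum()
summary.to_csv("daily_sales_summary.csv", index=False)

# Notify: email the summary file to the stakeholders.
msg = EmailMessage()
msg["Subject"] = "Daily Sales Report"
msg["From"] = "reports@example.com"
msg["To"] = "stakeholders@example.com"
msg.set_content("Please find today's sales summary attached.")
with open("daily_sales_summary.csv", "rb") as f:
    msg.add_attachment(f.read(), maintype="text", subtype="csv",
                       filename="daily_sales_summary.csv")

with smtplib.SMTP("smtp.example.com") as server:
    server.send_message(msg)
```

Scheduled once a day (for example via cron or Task Scheduler), a script like this covers the whole extract-transform-send cycle described above.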
In SQL we can use techniques like grouping and aggregation to handle high-velocity, voluminous data. SQL is a very versatile language and can be used to handle pretty much every situation. My approach would be that, instead of loading the data at a granular level, I would use grouping and aggregation to get the data into the right shape, so that not all the data is loaded and only the required amount is brought in for analysis, as shown in the sketch below. There may be other techniques as well, but mostly I use grouping and aggregation to analyze the data. SQL also has window functions, and those can be used for analysis too. So, as I said, I would mainly use grouping and aggregation in SQL to handle and analyze data from high-velocity, high-volume sources without loading more data than necessary.
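A hedged sketch of that approach, with illustrative table and column names:

```sql
-- Aggregate at the source so only summarized rows are pulled into
-- the analysis layer (names are illustrative; the date-interval
-- syntax varies by SQL dialect).
SELECT
    order_date,
    region,
    COUNT(*)          AS order_count,
    SUM(sales_amount) AS total_sales,
    AVG(sales_amount) AS avg_sale
FROM sales_events
WHERE order_date >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY order_date, region;
```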
The advantage of NoSQL is that you can store different categories of data in a NoSQL database, such as audio files, video files, and so on. In short, a NoSQL database can hold structured as well as unstructured data, while a traditional SQL database handles only structured data. So the biggest advantage of a NoSQL database is that it supports structured and unstructured data together. That can be very useful for things like machine learning projects and AI-related work, because before the advent of machine learning these kinds of data were not used so much; the need for audio and video files was not that high. But given what you can now do with this type of data since machine learning took off, I think a NoSQL database is highly recommended, and many organizations run a mixture of both SQL and NoSQL databases.
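As a minimal sketch of that schema flexibility, assuming a local MongoDB instance and illustrative collection and field names, documents with completely different shapes can live side by side in one collection:

```python
from pymongo import MongoClient

# Connection URI and names are illustrative.
client = MongoClient("mongodb://localhost:27017")
products = client["catalog"]["products"]

# One record carries plain structured fields; another embeds
# metadata for an attached media file. No shared schema required.
products.insert_one({"sku": "A-100", "name": "Keyboard", "price": 29.99})
products.insert_one({
    "sku": "B-200",
    "name": "Product demo video",
    "media": {"type": "video/mp4", "duration_sec": 42,
              "storage_url": "s3://bucket/demo.mp4"},
})
```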
Tableau is a very useful visualization and business intelligence tool. The first step would be connecting to the real-time data source from Tableau. Next, once the data is in Tableau, I would check the data quality: use only the columns that are required, and verify the data types. Data quality is essential for an interactive dashboard, so that would be my second step. Then, once only the required data is selected, I would build different visualizations to present the data to the audience, using charts and graphs like a bar chart or a line chart to show trends, trying to communicate the data as visually as possible. Once all the visualizations are created, I would do an overall check again to see that the numbers are showing correctly and run all the necessary quality checks before finalizing the dashboard. So my steps for a Tableau dashboard would be: first, extracting the data; then checking the data quality; then building the necessary visualizations; and finally, verifying the numbers in the visualizations and finalizing the dashboard.
Power BI can handle time series data quite well, but the first step would definitely be data cleaning, because the dates have to be in a correct, consistent format. If possible, I would try to get the formats corrected and made consistent at the source level. If that's not possible, I would use Power Query to correct the date formats. Once that is done in Power Query, I would load the data back into the Power BI model and do the next steps: creating a matrix visual, a trend line chart, or whatever is required to support decision making. If neither the source level nor the Power Query level is an option, there are a few places at the Power BI data model level where you can clean the data before building the visualizations. So those steps would go in that order: first at the source level if possible, then at the Power Query level, and finally at the data model level inside Power BI. That is how I would handle time series analysis in Power BI when dealing with inconsistent date formats.
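As an illustration of the same normalization idea outside Power Query, a pandas sketch (the column name and date strings are made up; format="mixed" needs pandas 2.x):

```python
import pandas as pd

# Illustrative column mixing three inconsistent date formats.
df = pd.DataFrame({"order_date": ["2024-01-05", "05-Jan-2024", "January 5, 2024"]})

# Parse everything to one datetime dtype; unparseable values become
# NaT so they can be inspected instead of silently corrupting the trend.
df["order_date"] = pd.to_datetime(df["order_date"], format="mixed", errors="coerce")
print(df.dtypes)
```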
In machine learning there are many methods for backtesting. At my workplace I haven't encountered the backtesting process that much, but from what I have learned, there are a few techniques available in the scikit-learn module that can be used for backtesting. To answer honestly, I may not have the required depth of knowledge on this at the present moment. What I do know is the basic machine learning workflow: loading the dataset, data cleaning, feature engineering if required, and finally fitting the model and checking the results. I don't have that much hands-on knowledge of backtesting itself yet, but I can definitely say it is a necessary process to make sure a machine learning model gives the right output, because getting wrong results is worse than getting no results at all. Python is the preferred language and the industry standard for machine learning, and the libraries built for machine learning in Python, like scikit-learn and the deep learning frameworks, have features that can be used for backtesting.
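A minimal sketch of the scikit-learn facility closest to backtesting, TimeSeriesSplit, which evaluates a model on forward-rolling train/test windows (the toy data and linear model are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Toy upward-trending series; in practice X would hold lagged features.
rng = np.random.default_rng(0)
X = np.arange(100).reshape(-1, 1)
y = np.arange(100) + rng.normal(scale=3.0, size=100)

# Each split trains only on the past and tests on the block that
# follows it, which is the core idea behind backtesting a model.
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    print(f"fold {fold}: MAE = {mean_absolute_error(y[test_idx], preds):.2f}")
```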
Let me review this SQL query intended to retrieve the average score per module. It looks like in the SELECT statement we have selected the module ID, but we are grouping by a different column that is not used in the SELECT statement. My first instinct was to replace module ID with student ID, but on review we do need the module ID in the SELECT: I would add the student ID before the module ID, and then group by both the student ID and the module ID. So the fix is to include the student ID in the SELECT statement and group by both columns, which lets us retrieve the average score per module for each student.
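Since the original query isn't shown, here is a hedged reconstruction of the corrected version, assuming a hypothetical module_scores table with student_id, module_id, and score columns:

```sql
-- The fix: student_id joins module_id in both the SELECT list and
-- the GROUP BY, so every non-aggregated column is grouped.
SELECT
    student_id,
    module_id,
    AVG(score) AS avg_score
FROM module_scores
GROUP BY student_id, module_id;
```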
It looks like the error is in the KMeans call. The KMeans function takes a parameter called n_clusters, which here is hardcoded as the string "auto", but it actually accepts a number. So either you don't pass the parameter at all, in which case it takes the built-in default, or, if you are passing it, it should be an integer, such as 5 or 50, not a string. Passing "auto" as a string is the main error in this code. I would replace "auto" with a positive integer appropriate for the data, and that would fix the function and give the right output.
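A corrected sketch of that call, using toy data from make_blobs and an assumed cluster count of 3:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with three natural clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_clusters must be a positive integer, not the string "auto";
# omitting it falls back to scikit-learn's built-in default.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])
```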
I know that Power BI has an option to integrate Python within its environment. It can be used to build visualizations, like a heat map or any visualization that isn't available by default inside Power BI, and it can also be used for time series analysis, I believe, though I haven't had the chance to use it in my analysis at the workplace. It is definitely possible within Power BI, which has support for both Python and R scripting, and you can use your imagination and creativity to integrate Python inside Power BI and surface very advanced insights and information. So yes, it is definitely possible to integrate a custom Python algorithm into a Power BI dashboard for predictive analysis. It can be implemented within the Power BI Desktop environment and used to show advanced predictive or informative insights that support decision making.
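A minimal sketch of what such a script can look like; Power BI passes the selected fields into a Python visual as a pandas DataFrame named dataset, while the column names and the simple linear-trend "prediction" here are assumptions for illustration:

```python
# Script body for a Power BI Python visual. Power BI supplies the
# selected fields as a DataFrame named `dataset` and renders the
# matplotlib figure the script draws. Column names are assumed.
import matplotlib.pyplot as plt
import numpy as np

df = dataset.sort_values("month_index")

# Hypothetical predictive step: a linear trend fitted to history
# and projected six periods ahead.
coeffs = np.polyfit(df["month_index"], df["sales"], deg=1)
future = np.arange(df["month_index"].max() + 1, df["month_index"].max() + 7)

plt.plot(df["month_index"], df["sales"], label="actual")
plt.plot(future, np.polyval(coeffs, future), linestyle="--", label="trend forecast")
plt.legend()
plt.show()
```

A real custom algorithm would replace the polyfit step with whatever model is needed; the surrounding dataset-in, figure-out contract stays the same.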
Yes, I already mentioned this in a previous question: SQL has some very powerful functionality called window functions. Normal aggregations like SUM and AVG collapse many rows into a single result, whereas a window function works in a row context and returns a value on every row. Some of the popular window functions are LAG and LEAD, and the ranking functions like RANK and DENSE_RANK; you can also use the normal aggregation functions like SUM and AVG as window functions. For example, if you have a month column and want the value of the previous month on each row, you could use the LAG window function, which would create another column giving you the previous month's value. Another example: if your data is at a monthly or daily level but you still want a column showing the total sales for the entire year, you could use the SUM function with the OVER clause. The syntax is SUM of the sales amount, followed by the OVER clause with brackets, and inside the OVER clause you can use PARTITION BY or ORDER BY to get the type of result you are looking for. So SQL window functions are very convenient and very useful, and can be used for complex data aggregations in analytical tasks.
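A sketch of both examples on a hypothetical monthly_sales table (names are illustrative; EXTRACT syntax varies slightly by dialect):

```sql
-- Illustrative table: (sales_month DATE, sales_amount NUMERIC).
SELECT
    sales_month,
    sales_amount,
    -- previous month's value, surfaced on the current row
    LAG(sales_amount) OVER (ORDER BY sales_month) AS prev_month_sales,
    -- yearly total repeated on every monthly row
    SUM(sales_amount) OVER (
        PARTITION BY EXTRACT(YEAR FROM sales_month)
    ) AS yearly_total
FROM monthly_sales
ORDER BY sales_month;
```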
I think Python is one of the most versatile and popular programming languages in use worldwide at this particular moment. The advantage of Python is that the syntax reads almost as if it were written in English, so it is very readable, and I think it is quite a beginner-friendly language. Though it is beginner friendly, it has a lot of capability, and a lot of development has happened within Python in recent years because it is open source and anybody can contribute to the Python open-source world. Apart from the standard library, there are a lot of very good packages, which definitely gives it an advantage over SPSS for statistical analysis. SPSS was good at one point in time, some years ago, but I think Python has now overtaken SPSS in all aspects. Python also has the advantage that it can be used for machine learning, artificial intelligence, and so many other areas. So Python would definitely be my choice for any kind of data or statistical analysis. Apart from that, since Python is open source, it is free to use and very easy to learn as well, so anybody can pick up Python and use it at their workplace to create meaningful work. That is a big advantage for organizations and can help them grow rapidly in this age of data, because data is the new oil, as they say. Using Python to get the potential out of this data is very beneficial, and I would definitely use Python over SPSS.
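As a small illustration of the statistics point, a classic two-sample t-test, the kind of analysis often run in SPSS, takes only a few lines with scipy (synthetic data):

```python
import numpy as np
from scipy import stats

# Two synthetic samples with slightly different means.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=100, scale=10, size=50)
group_b = rng.normal(loc=105, scale=10, size=50)

# Independent two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```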