
Data professional with a track record in analytics, operations, and data science. Proficient in Python, SQL, Excel, data visualization tools, and cloud platforms. Experienced in machine learning, A/B testing, and databases. Contributed to 20% annual revenue growth and delivered a 14% improvement in accuracy rates through data quality enhancements.
Data Analyst
Freelance
Senior Engineer-Data Operations
AI Touch LLP (IT Company)
Quality Engineer
Rinox Railings (Manufacturing)
Python

MySQL

PostgreSQL

Microsoft Power BI
Microsoft Excel

Tableau Prep

BigQuery

SQL

Excel

Power BI

Tableau

Google Analytics

Pandas

Cloud

AWS

SQL

Tableau

Excel

AWS

GitHub
PySpark

BigQuery

Logistic Regression

Decision Tree

Random Forest

XGBoost

SVM
"Krishna is a passionate data scientist with an on-demand skillset in performing machine learning using Python. His ability in problem-solving using data science techniques is definitely a beneficial contribution to the industry or stakeholders. "
Hi. My name is Krishna Verma, and I'm a data professional with 6 years of experience. I've been working with the startup AI Touch LLP since January 2018, as a senior engineer in data operations. Apart from that, I have been working as a freelance data analyst for almost 3 years, and I've learned analytics tools and skills like Python, SQL, Tableau, machine learning, and statistics. I've also had experience in stakeholder management, leadership, and project management. That is my current profile for this particular role. Thank you.
It depends entirely on the data. You asked how I can leverage SPSS for predictive modeling; I prefer machine learning algorithms for that. I use Python libraries, for example scikit-learn, which offers a large number of machine learning algorithms. From there, I pick different algorithms: linear regression, logistic regression, decision tree, random forest, and many more. These algorithms help me build a predictive model from historical data, which I split into different parts, for example training, validation, and testing. So that's how I do it. As for SPSS, I could use it as well, and it would not be too hard to pick up. But at the moment, I rely more on the machine learning algorithms available in Python.
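The three-way split mentioned in this answer can be sketched as follows. This is a minimal standard-library illustration, not the scikit-learn workflow the speaker uses; the function name `split_dataset`, the 70/15/15 fractions, and the seed are assumptions chosen for the example.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test parts."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Whatever is left after train and validation becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


rows = list(range(100))  # stand-in for historical records
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))
```

The model is fitted on the training part, tuned on the validation part, and evaluated once on the test part, so the test rows never influence model choices.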
This is one of the things I do quite regularly because most of the time, the stakeholder is a nontechnical person. So I try to present the dashboard in a simpler form. The report I prepare should be simpler, and there shouldn't be too much complexity in the dashboard if I try to present it. I don't want to show them certain kinds of visualizations that are hard to understand. So, first, I ensure that the visualization is easier to understand. Then I prepare a presentation alongside the dashboard where I can explain something about each of these visualizations individually. This makes it easier for the stakeholder to understand what I'm trying to say. Apart from that, I use the storytelling method, which helps them understand what I went through with the data, how I gained the insights, and what my recommendation would be based on those results. That's what I do.
There are several ways to automate this kind of process. For example, we can create an ETL pipeline from the data source to the location where we want our data. We can use Python scripts or SQL scripts for data extraction. Mostly, I use Python for extraction and for cleansing as well. So we write the Python script and build the pipeline around it. All the processing happens in the middle of the pipeline, and then the data is loaded into the data warehouse. Later, we can do analysis and create reports and dashboards. Alternatively, we can use SQL directly, because sometimes the data sits in local databases. In that case we extract the data with SQL, for example by connecting the SQL database to Tableau, and we do the screening and transformation steps before loading the data into Tableau or Power BI. So, yes, that's how we can do it.
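A minimal sketch of the extract, transform, load flow described in this answer, using only the Python standard library. The inline CSV text, the `orders` table, and the column names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with stray spaces, a missing amount, and a duplicate row.
RAW_CSV = """order_id,customer,amount
1, Alice ,120.50
2,bob,
3,Carol,75
3,Carol,75
"""


def extract(text):
    """Read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an amount, remove duplicate IDs, and tidy the names."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = int(row["order_id"])
        amount = row["amount"].strip()
        if not amount or order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append((order_id, row["customer"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Write the cleaned rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
cleaned = transform(extract(RAW_CSV))
load(cleaned, conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(cleaned, total)
```

Keeping the three stages as separate functions means each one can be scheduled, tested, or replaced on its own, for example by pointing `extract` at a real source.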
That's a really great question, because I have worked with NoSQL myself in a few situations. The reason we sometimes use NoSQL over SQL is that we don't need a strict schema. In a relational database, we define a schema at the starting point, and then we store the data according to that. But sometimes, we don't know much about the data coming from a particular source or location, and there can be variations. Sometimes the schema varies, so we want a database with flexibility. Without that flexibility, we cannot store the data as it arrives. We have to define the schema again from the beginning, and then store it. So, to avoid this situation, we use NoSQL. However, relational databases still have their uses, so we don't use NoSQL in every case. It depends entirely on the use case we are working on, and that decides whether we use NoSQL or SQL.
If you are asking about a high-velocity, high-volume source, that means big data. In that kind of situation, I don't use SQL alone. I use SQL together with Spark. That helps me handle such large data for analysis, cleaning, transformation, whatever the purpose, because the data arrives continuously and in high volume. So using SQL alone is not a good idea. Spark helps in that process because we can use the Hadoop Distributed File System and a cluster of machines. We can split the data into smaller partitions across the cluster, which makes processing faster and more efficient. So that's what I do to handle such a large volume of data.
Okay, so in this code, the number of clusters is set to auto. That means it will try to pick a number automatically and divide the data into that many clusters. But we usually don't consider that an efficient approach, because we don't know how many clusters the data actually requires. That's why we use the elbow method. There, we try a range of cluster counts and plot the results in an elbow visualization. Then we pick the elbow point and take that as the initial number of clusters. To confirm the number of clusters, we run a particular test for k-means. I don't remember the name right now, but there's a test that compares the candidates. For example, if we chose 5 as the maximum number of clusters, it will check the performance of every cluster count from 1 to 5, and the best-performing number is selected as the final cluster count. That would be a better way than specifying auto, because auto will not necessarily give a correct answer.
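The elbow method mentioned here can be illustrated with a toy example. This is a simplified one-dimensional k-means written from scratch for illustration, not the code under discussion and not scikit-learn's `KMeans`; the function name `kmeans_inertia`, the sample points, and the deterministic starting centers are all assumptions.

```python
def kmeans_inertia(points, k, iterations=20):
    """Run a basic 1-D k-means and return the final inertia (sum of squared distances)."""
    ordered = sorted(points)
    n = len(ordered)
    # Deterministic start: spread the initial centers evenly across the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Hypothetical data with three visible groups around 1, 5, and 9.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this data the inertia falls steeply up to k = 3 and barely changes afterwards, so the bend (the elbow) sits at 3 clusters.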
In this query, the error is in the GROUP BY clause. We said we only want the average score per module for student ID 12345. The filter already restricts the query to one student, so grouping by student ID is incorrect. We have to group by module ID instead of student ID. That makes the query correct and more efficient.
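The original query is not shown in the transcript, so the following is only a sketch of the corrected grouping. The `scores` table and its `student_id`, `module_id`, and `score` columns are hypothetical names, and SQLite is used so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student_id INTEGER, module_id TEXT, score REAL)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (12345, "M1", 80.0),
        (12345, "M1", 90.0),
        (12345, "M2", 70.0),
        (99999, "M1", 50.0),  # another student, excluded by the filter
    ],
)

# Filter to the one student, then group by module to get one average per module.
query = """
SELECT module_id, AVG(score) AS avg_score
FROM scores
WHERE student_id = ?
GROUP BY module_id
ORDER BY module_id
"""
result = conn.execute(query, (12345,)).fetchall()
print(result)
```

Grouping by `student_id` here would collapse everything into a single row for student 12345, which loses the per-module breakdown the question asks for.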
Power BI provides a section where we can use an R script or a Python script for creating dashboards or predictive insights, for operations where Python or R is required. We can open that section, import our script there, and connect it to the data. So that's how we can do that. Power BI also provides some analysis features of its own. It can produce predictive insights, trend lines, etcetera, which we could also do in Python. So if we just want a predictive insight, we can get it directly from Power BI. But if the process is quite complex, we need Python or R, and for that we can connect to a Python script from there.
Python data structures are really important. Sometimes, as analysts, we think that data structures are something for developers, but every data structure has its own purpose and its own efficiency benefits. So we should know which data structure we need for a particular analysis and where to store the data. For example, if we want to avoid duplicate values, we go with a set, because a set never allows duplicates. Similarly, if we want a fixed data structure that cannot be edited in the middle of the process, we can use a tuple. And if we want a data structure that holds values of different data types and stays flexible to edit, we can use a list instead of an array. As data analysts, we should have this kind of knowledge because it improves the performance of our analysis as well.
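The three choices described in this answer can be shown in a few lines. The sample values and variable names are made up for illustration.

```python
# A set drops duplicate values automatically.
readings = [3, 1, 3, 2, 1]
unique_readings = set(readings)

# A tuple is fixed: trying to change an element raises TypeError.
point = (12.5, 7.0)
try:
    point[0] = 0.0
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# A list can mix data types and be edited in place.
record = ["sensor-a", 42, 3.14]
record.insert(1, "celsius")

print(unique_readings, tuple_was_modified, record)
```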
Why might you choose to use Python over other services? That's a really great question. I personally prefer Python to any other service or programming language because Python provides a huge variety of libraries for handling different kinds of data and doing any kind of analysis: data transformation, visualization, machine learning, and statistical analysis as well, for example with SciPy or other libraries. So we don't need to build everything ourselves. We can import what we need directly. For example, if we want to do a t-test, a z-test, ANOVA, or chi-square, or any other statistical test, we can just import the library and run all these tests in a fast and efficient way. So that's why I always use Python over SPSS for statistical analysis.