Data professional with a track record in analytics, operations, and data science. Proficient in Python, SQL, Excel, data visualization tools, and cloud platforms. Experienced in machine learning, A/B testing, and databases. Contributed to 20% annual revenue growth and delivered a 14% improvement in accuracy through data quality enhancements.
Data Analyst, Freelance
Senior Engineer - Data Operations, AI Touch LLP (IT Company)
Quality Engineer, Rinox Railings (Manufacturing)
Python
MySQL
PostgreSQL
Microsoft Power BI
Microsoft Excel
Tableau Prep
BigQuery
SQL
Excel
Power BI
Tableau
Google Analytics
Pandas
Cloud
AWS
SQL
Tableau
Excel
AWS
GitHub
PySpark
BigQuery
Logistic Regression
Decision Tree
Random Forest
XGBoost
SVM
"Krishna is a passionate data scientist with an on-demand skillset in performing machine learning using Python. His ability in problem-solving using data science techniques is definitely a beneficial contribution to the industry or stakeholders. "
Hi, my name is Krishna Verma, and I'm a data professional with six years of experience. I've been working with a startup, AI Touch LLP, since January 2018 as a Senior Engineer in Data Operations. Apart from that, I've been working as a freelance data analyst for almost three years, and along the way I've learned analytics tools like Python, SQL, Tableau, machine learning, and statistics. I also have experience in stakeholder management, leadership, and project management. So that's my current profile for this particular role. Thank you.
It completely depends on the data. For example, you asked how I can leverage SPSS for predictive modeling. I basically prefer machine learning algorithms for predictive modeling, using Python libraries such as scikit-learn, which provides a huge range of algorithms, for example linear regression, logistic regression, decision trees, random forest, and many more. These algorithms help build a predictive model from historical data, which I split into different parts, for example training, testing, and validation sets. As for SPSS, I would say I could leverage it as well, and that would not be too hard, but at the moment I rely more on the machine learning algorithms provided by Python.
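A minimal sketch of the scikit-learn workflow described above. The file name (customer_history.csv), the target column (churned), and the assumption that all feature columns are numeric are illustrative, not from the original answer.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical historical data; file and column names are assumptions,
    # and feature columns are assumed to be numeric.
    df = pd.read_csv("customer_history.csv")
    X = df.drop(columns=["churned"])
    y = df["churned"]

    # Split into training and held-out test sets, as mentioned in the answer.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Try a couple of the algorithms named above and compare accuracy.
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=200)):
        model.fit(X_train, y_train)
        print(type(model).__name__, accuracy_score(y_test, model.predict(X_test)))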
This is something I do quite regularly, because most of the time the stakeholders are non-technical people. So when I present a dashboard, the report I prepare should be in a simpler form; there shouldn't be too much complexity in the dashboard, and I don't want to show them visualizations that are hard to understand. First, I make sure the visualizations are easy to read. Then I prepare a presentation alongside the dashboard, where I explain each visualization individually, so it's easier for the stakeholders to follow what I'm trying to say. Apart from that, I use a storytelling approach, which helps them understand how I went through the data, what insights I found, and what my recommendations are based on those results. That's what I do.
There are certain ways to automate this kind of work. For example, we can create an ETL pipeline from the data source to the location where we want the data. We can use Python or SQL scripts for data extraction; mostly I use Python for extraction, and Python can handle cleansing as well. So we build the pipeline with Python scripts, all of these steps happen in the middle of the pipeline, and then the data is loaded into the data warehouse, where we can later do analysis and create reports and dashboards. Alternatively, we can use SQL directly, because sometimes the data sits in local databases; we can extract it from there, for example by connecting SQL with Tableau or Power BI, and do the cleaning and transformation steps before loading the data into the visualization tool. So yes, that's how we can do it.
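A minimal sketch of the Python extract-clean-load pattern described above, using pandas and a local SQLite file as a stand-in for the warehouse. The source file, column names, and table name are hypothetical.

    import sqlite3
    import pandas as pd

    # Extract: read raw data from a hypothetical source export.
    raw = pd.read_csv("sales_export.csv")

    # Transform/clean: drop duplicates, remove rows without an ID, fix types.
    clean = (
        raw.drop_duplicates()
           .dropna(subset=["order_id"])
           .assign(order_date=lambda d: pd.to_datetime(d["order_date"]))
    )

    # Load: write into a warehouse table (SQLite here as a simple stand-in).
    with sqlite3.connect("warehouse.db") as conn:
        clean.to_sql("sales_clean", conn, if_exists="replace", index=False)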
That's a really great question, because I've worked with NoSQL myself in a few situations. The reason we sometimes use NoSQL over a SQL database is that we don't always need a strict schema, where we define the schema at the starting point and then store the data according to it. Sometimes we don't know much about the data coming from a particular source or location, and there can be variation; the schema may change over time, so we want a database with that kind of flexibility. Without it, we would have to redefine the schema from the initial point and then store the data again. To avoid that situation, we use NoSQL. But there are still plenty of cases where a relational database is the right choice, so it completely depends on the use case, and accordingly we use NoSQL or SQL.
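The answer doesn't name a specific NoSQL store; as one example of the schema flexibility it describes, here is a minimal sketch using MongoDB via pymongo, assuming a MongoDB instance running locally. The database, collection, and field names are illustrative.

    from pymongo import MongoClient

    # Assumes a local MongoDB instance; MongoDB is used here only as one
    # example of a schema-flexible (NoSQL) document store.
    client = MongoClient("mongodb://localhost:27017")
    events = client["analytics"]["events"]

    # Documents with different shapes can be stored side by side, without
    # redefining a schema when a new field appears in the incoming data.
    events.insert_many([
        {"user_id": 1, "event": "click", "page": "/home"},
        {"user_id": 2, "event": "purchase", "amount": 49.99, "currency": "USD"},
        {"user_id": 3, "event": "click", "device": {"os": "Android", "app_version": "2.3"}},
    ])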
If you're asking about high-velocity, high-volume sources, that essentially means big data. In that kind of situation I don't use SQL alone; I use SQL together with Spark. That helps me handle such large data for analysis, cleaning, transformation, or whatever the purpose is, because the data is continuously arriving and in high volume. Using SQL alone there is not a good idea. Spark always helps in that process because we can use RDDs and clusters, so we can divide the data into smaller partitions, which makes the processing faster, quicker, and more efficient. That's how I handle such a large volume of data.
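A minimal sketch of the Spark-plus-SQL combination described above, using PySpark's SQL interface. The source path and column names are hypothetical.

    from pyspark.sql import SparkSession

    # Spark distributes the work across partitions and executors, which is
    # what makes high-volume data practical to process.
    spark = SparkSession.builder.appName("high_volume_demo").getOrCreate()

    # Hypothetical high-volume source; Spark can also read streams, Parquet, etc.
    events = spark.read.json("s3://example-bucket/events/*.json")

    # Register the DataFrame so plain SQL can be run on top of Spark.
    events.createOrReplaceTempView("events")
    daily = spark.sql("""
        SELECT event_date, COUNT(*) AS event_count
        FROM events
        GROUP BY event_date
    """)
    daily.show()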
Okay, so in this code the number of clusters is set to auto, which means it will automatically try to pick a number and divide the data into that many clusters. We usually don't consider that an efficient approach, because we don't know how many clusters the data actually requires. That's why we use the elbow method: we try a range of cluster counts, look for the elbow in the visualization, and pick that point as our initial number of clusters. Then, to confirm the number of clusters, we run a particular evaluation for k-means. I don't remember the name right now, but there is a test that compares the candidates; for example, if we chose 5 as the total number of clusters, it would check the performance for 1 through 5, and the best-performing number would be selected as the final cluster count. That would be a better approach than leaving it on auto, because auto will not give a correct answer.
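The answer doesn't name the confirmation test; one common choice for comparing candidate cluster counts is the silhouette score. A minimal sketch of the elbow method plus a silhouette comparison with scikit-learn, on synthetic data, purely for illustration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    # Synthetic data just for illustration.
    X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

    inertias, silhouettes = {}, {}
    for k in range(2, 8):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        inertias[k] = km.inertia_                       # used for the elbow plot
        silhouettes[k] = silhouette_score(X, km.labels_)

    # Pick the k with the best silhouette score as the confirmed cluster count.
    best_k = max(silhouettes, key=silhouettes.get)
    print("inertia by k:", inertias)
    print("best k by silhouette:", best_k)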
In this query, the error is in the GROUP BY clause. We've said we only want the average score per module for student ID 12345, so the filter already restricts the result to a single student, and grouping by student_id is not the proper way; it's incorrect. We have to use module_id instead of student_id. That's how we can make this query correct and more efficient.
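A minimal runnable sketch of the corrected query, using SQLite through Python since the original query isn't shown here; the table and column names (student_scores, student_id, module_id, score) are assumptions.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student_scores (student_id INT, module_id INT, score REAL);
        INSERT INTO student_scores VALUES
            (12345, 1, 80), (12345, 1, 90),
            (12345, 2, 70), (99999, 1, 50);
    """)

    # Corrected query: filter to the one student, then group by module_id,
    # not by student_id.
    rows = conn.execute("""
        SELECT module_id, AVG(score) AS avg_score
        FROM student_scores
        WHERE student_id = 12345
        GROUP BY module_id
    """).fetchall()
    print(rows)   # [(1, 85.0), (2, 70.0)]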
Power BI provides a particular section where we can use an R script or a Python script for creating dashboards or predictive insights, for the kinds of operations where we need Python or R. We can click on that option, import our script there, and connect to it. That's how we can do it. Power BI also provides some built-in analytics features, so it can produce predictive insights, trend lines, and so on, which we would otherwise use Python for. So if we just want a simple predictive insight, we can use that directly in Power BI. But if the process is quite complex, then we need Python or R, and for that we can connect a Python or R script from there.
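A minimal sketch of the kind of script that could go into Power BI's Python visual, where the selected fields arrive as a pandas DataFrame named dataset; the column names (date, revenue) and the sample fallback data are assumptions so the script can also be tried standalone.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Inside Power BI's Python visual, `dataset` is provided automatically.
    # Outside Power BI, fall back to made-up sample data for testing.
    try:
        df = dataset.copy()
    except NameError:
        df = pd.DataFrame({
            "date": pd.date_range("2024-01-01", periods=12, freq="MS"),
            "revenue": [100, 110, 95, 120, 130, 125, 140, 150, 145, 160, 170, 175],
        })

    df = df.sort_values("date")
    x = np.arange(len(df))
    y = df["revenue"].to_numpy()

    # Simple linear trend line as a lightweight "predictive insight".
    slope, intercept = np.polyfit(x, y, 1)

    plt.plot(df["date"], y, label="revenue")
    plt.plot(df["date"], slope * x + intercept, label="trend")
    plt.legend()
    plt.show()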
Python data structures are really important. Sometimes as analysts we think data structures are something for developers, but every data structure has its own purpose and its own efficiency benefits, so we should know which data structure we need for a particular analysis and where to store the data. For example, if we don't want duplicate values, we go with a set, because a set never allows duplicates. Similarly, if we want a data structure we won't edit in the middle of the process, a fixed data structure, we can use a tuple. And if we want a data structure that can hold different data types and stay flexible to edit, we can use a list instead of an array. As data analysts we should have this kind of knowledge, because it helps in our data analysis work as well.
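A minimal sketch of the three choices mentioned above: a set to drop duplicates, a tuple for values that should not change, and a list for a flexible mix of types. The sample values are made up.

    # Set: duplicates are dropped automatically.
    customer_ids = {101, 102, 102, 103}
    print(customer_ids)                 # {101, 102, 103}

    # Tuple: fixed once created; editing an element raises an error.
    report_period = ("2024-01-01", "2024-03-31")
    # report_period[0] = "2024-02-01"   # would raise TypeError

    # List: mixed types, editable in place.
    row = ["INV-001", 249.99, True]
    row.append("shipped")
    print(row)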
Why might I choose Python over SPSS? That's a really great question. I personally like Python a lot more than any other service or programming language, because Python provides a huge variety of libraries that we can use to handle different kinds of data and do any kind of analysis: data transformation, visualization, machine learning algorithms, and statistical analysis as well, for example with SciPy or other libraries. So we don't need to build everything ourselves; we can directly import those tools. For example, if we want to do a t-test, z-test, ANOVA, chi-square, or any other statistical test, we can just import the library and run all of these tests quickly and efficiently. That's why, in my view, I'd always use Python over SPSS for statistical analysis.
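A minimal sketch of the statistical tests mentioned above using SciPy; the groups and contingency table are synthetic, purely for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    group_a = rng.normal(loc=50, scale=5, size=100)
    group_b = rng.normal(loc=52, scale=5, size=100)
    group_c = rng.normal(loc=55, scale=5, size=100)

    # Two-sample t-test: do groups A and B have different means?
    print("t-test:", stats.ttest_ind(group_a, group_b))

    # One-way ANOVA across all three groups.
    print("ANOVA:", stats.f_oneway(group_a, group_b, group_c))

    # Chi-square test of independence on a small contingency table.
    table = np.array([[30, 20], [25, 35]])
    chi2, p, dof, _ = stats.chi2_contingency(table)
    print("chi-square:", chi2, "p-value:", p)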