Results-oriented Data Scientist with around 2 years of experience in Data Science. Proficient in leveraging AI/ML algorithms to collect, analyze, and transform complex datasets into actionable insights. Skilled in SQL and programming languages such as Python for developing machine learning and deep learning models. A self-motivated quick learner, dedicated to optimizing business performance through innovative AI/ML solutions and fostering a culture of inclusion.
Generative AI Data Scientist Associate
PwC
Jr. Data Scientist
Mirafra Technologies
Machine Learning Engineer
Datafoundry
Git
Docker
Yeah, sure. So I'm currently working as a Junior Data Scientist at Mirafra Technologies. I joined Mirafra in 2022, and I have around two-plus years of experience overall. At Mirafra, most of my job responsibilities are developer-type work. One project I worked on was human fall detection using CNN models: I detected the human pose structure, extracted features from it, and classified from video whether a person had fallen or not. I also worked on an automatic resume parser, which works on NLP techniques, mainly n-grams and hash-map lookups, so that I can easily extract content from a resume. Before this, I worked at Datafoundry as a machine learning engineer intern. There, I worked on a project called legal entity detection: the client was a law firm with its own legal application, where they had uploaded many scanned legal documents collected over a long time. My part was to extract data from those scanned images; specifically, to identify, for every page of a scanned PDF or document, whether it was good enough for doing OCR. Later on, we also did the OCR part. Before that, I did my master's in Data Science at Amrita University; my thesis was on emotion analysis, specifically sentiment analysis of memes. And I did my bachelor's in computer science at R Institute of Technology, Vijayawada.
Yes. So for different datasets, we can use different tools. If there is numerical data, pandas is one of the most used tools to understand the distribution of the data, because we often want to find the correlation between features, or to find the relevant features in the dataset. And to visualize the data, for example how two correlated features relate to each other, we can use heat maps and similar plots, using pandas together with libraries like matplotlib and seaborn. So these are the different tools we can use.
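A minimal sketch of the workflow described above, using a small hypothetical numerical dataset (all column names and values are illustrative):

```python
import pandas as pd
import numpy as np

# Hypothetical numerical dataset
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "feature_a": x,
    "feature_b": 2 * x + rng.normal(scale=0.1, size=200),  # strongly tied to feature_a
    "feature_c": rng.normal(size=200),                     # roughly independent
})

# Understand the distribution of each feature
print(df.describe())

# Pairwise correlations between features
corr = df.corr()
print(corr.round(2))

# A heat map of `corr` can then be drawn, e.g. with seaborn:
# import seaborn as sns; sns.heatmap(corr, annot=True)
```

Here `df.describe()` summarizes each feature's distribution and `df.corr()` gives the correlation matrix that a heat map would visualize.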
Yes. So as I mentioned earlier, I worked on the meme emotion analysis project; that was my M.Tech thesis. At that time there was no proper resource for the work, meaning no proper GPU support, so I did this project on my own system with minimal GPU support. With the huge data, it became very difficult even to train a model. So first, if there was a high-resolution image, I needed to convert it to a smaller resolution, while making sure the conversion didn't remove the important features. That's one approach. I also generated additional data at the same reduced resolution. And I used a repeated training approach: for example, instead of training on the whole dataset at once, I trained on it in smaller chunks, one subset at a time. Because of this, we can optimize the memory. So these are the approaches I followed.
Yes. So to implement machine learning classifiers in Python, there is a library called scikit-learn. It is one of the most popular libraries in Python, where we can use many machine learning classifiers and models. It is very useful and also very easy to use. So that's one of the main libraries most people use, scikit-learn.
Yes. So one of the projects I worked on was human fall detection, where I had to detect whether a person in the frame had fallen or not. The use case is homes where elderly people or infants stay alone: it's very difficult to maintain a caretaker 24/7, but if we install a CCTV camera, we can automatically detect whether the person has fallen. There are many approaches to tracking the human body, but in this use case I had to detect a fall specifically, and there is one tricky scenario: if the person is in a sleeping position, that may be misclassified as a fall. So what we can do is this: if a real fall happens, the person shifts from a standing, upright position toward the ground. If you consider the vertical distance along the y-axis between the head and the hip, it definitely reduces; the hip angle changes, the knee angle changes, these changes happen. So once we detect the pose structure of the body, we can easily extract the features, meaning these angles, the distances, and the rate of change of these angles. These are very important features, and using them the fall-detection purpose is solved. It's the same even in NLP: I worked on a meme text analysis project. If you consider text on the internet, it is not always proper, grammar-based sentences; people write different words, half words, abbreviated words. To train on these types of words, I used a word-embedding model, fastText, which was trained on subword features and so can handle half words.
So based on this, for different scenarios we can use different customized, Python-based machine learning algorithms.
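The pose features described above, joint angles and the head-hip vertical distance, can be computed from keypoints like this (the keypoint coordinates below are hypothetical, with y increasing downward as in image coordinates):

```python
import math

def angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c, e.g. a hip or knee angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

# Hypothetical pose keypoints (x, y) for two frames
standing = {"head": (50, 10), "hip": (50, 60), "knee": (50, 85), "ankle": (50, 100)}
fallen   = {"head": (90, 95), "hip": (50, 95), "knee": (30, 98), "ankle": (10, 100)}

for name, pose in (("standing", standing), ("fallen", fallen)):
    head_hip_dy = abs(pose["hip"][1] - pose["head"][1])  # shrinks sharply in a fall
    knee = angle(pose["hip"], pose["knee"], pose["ankle"])
    print(name, head_hip_dy, round(knee, 1))
```

Tracking the rate of change of these values across frames is what separates a sudden fall from a person who is already lying down.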
Yes. So I worked on analysis-type projects too. For example, I worked on a cricket analysis project. If you consider cricket data, there are several features about one player: name, age, debut date, last match played, batting average, strike rate, and so on, lots of features. Of these, we should consider only the specific features for a specific task. For example, suppose in a particular match a spin bowler is bowling. To judge the batter against that spin bowler from all his previous performances, we should select the statistics against spinners only, and further split by whether the spinner bowls off spin or leg spin. Within that, we should look at the average against right-arm and left-arm spinners separately, and also at features like batting first versus batting second. So different things have to be considered, and not only for this project: in many others, for example text analysis, I should consider what kind of text features to use for different tasks. These are all things we should consider.
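The task-specific feature selection described above maps naturally to pandas filtering and grouping. A minimal sketch, with made-up ball-by-ball records for one batter:

```python
import pandas as pd

# Hypothetical ball-by-ball records (illustrative values only)
balls = pd.DataFrame({
    "bowler_type": ["off_spin", "leg_spin", "pace", "off_spin", "leg_spin", "pace"],
    "bowler_arm":  ["right", "right", "left", "right", "left", "right"],
    "runs":        [4, 1, 0, 6, 2, 1],
})

# Keep only the rows relevant to the task: performance against spin
spin = balls[balls["bowler_type"].isin(["off_spin", "leg_spin"])]

# Split further by spin type and bowling arm, as described above
summary = spin.groupby(["bowler_type", "bowler_arm"])["runs"].mean()
print(summary)
```

The same pattern, filter to the relevant subset, then group by the distinguishing features, applies whether the split is spinner versus pacer or batting first versus second.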
Okay, yes. So in this Python code, in the data-normalization function, they are calculating a mean with np.mean, but they are not using that mean anywhere. So there is definitely a wasted-work issue: the mean variable is never used, which is an issue in itself, and it takes time; with a huge dataset, that line alone takes a lot of time. If you remove that line, the code works just as well. Alternatively, we can use the mean in an efficient way, because in most cases people normalize using the mean: first we calculate the mean of the whole data, and then for each sample we subtract the mean and divide, the basic normalization. Or, if we normalize with the mean instead of the standard deviation, then we can remove the standard-deviation computation. So either way we can do it.
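The original interview code is not shown, so the functions below are a hypothetical reconstruction of the issue being described: an unused `np.mean` call, and the fix that actually uses the mean for standard z-score normalization:

```python
import numpy as np

def normalize_data_buggy(data):
    mean = np.mean(data)          # computed but never used: wasted time on large data
    return data / np.std(data)

def normalize_data_fixed(data):
    # z-score normalization: the mean is actually used
    mean = np.mean(data)
    std = np.std(data)
    return (data - mean) / std

data = np.array([1.0, 2.0, 3.0, 4.0])
z = normalize_data_fixed(data)
print(z.mean(), z.std())
```

After the fix, the normalized data has mean approximately 0 and standard deviation approximately 1, and no computation is thrown away.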
Yeah, loading the model. Yes. So most machine learning pre-trained models are saved with extensions like .pkl (pickle) files. So we should choose the proper load function for the particular type of pre-trained model file. Otherwise, a given loader may work in one case and not in another, so we should choose the correct loading method for the model.
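A minimal sketch of saving and loading a .pkl model file with the standard library's pickle module (the "model" here is a stand-in dictionary; scikit-learn models are commonly persisted the same way, or with joblib, which handles large arrays better):

```python
import os
import pickle
import tempfile

# Hypothetical pre-trained "model" object
model = {"weights": [0.1, 0.2], "bias": 0.5}

path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:   # pickle files are binary: "wb" to write
    pickle.dump(model, f)

with open(path, "rb") as f:   # and "rb" to read them back
    loaded = pickle.load(f)

print(loaded == model)
```

Note that pickle only round-trips Python objects; a model saved in another format (e.g. an HDF5 or ONNX file) needs that format's own load function, which is the point made above.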
Yeah. So scikit-learn is one of the most powerful and useful packages in Python: we can access many different machine learning models, and we can also do different tasks with the dataset, like splitting it into train and test sets. We can also build a reliable ensemble learning system using scikit-learn. It is a very good approach.
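The two points above, train/test splitting and ensemble learning, can be sketched directly with scikit-learn's API, assuming the bundled iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Ensemble: majority vote over two different classifiers
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
])
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```

`VotingClassifier` is one of several ensemble tools in scikit-learn; bagging, random forests, and boosting live in the same `sklearn.ensemble` module.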
Yeah. So in my real-world experience, I don't have much debugging experience, but in the experience I do have as a Python backend developer, I have debugged with pdb, the Python debugger. Using it, when we run the code, or the tool, or whatever the product is, if we put a pdb breakpoint at some point, execution stops there. Then we can see the value of each variable and decide how to go forward. I have done that, and there is also breakpoint(), which is one of the options.
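A minimal sketch of the workflow described above, on a made-up function (the debugger lines are commented out so the snippet runs non-interactively):

```python
def compute_total(prices, tax_rate):
    subtotal = sum(prices)
    # To inspect `subtotal` interactively, drop into the debugger here:
    # import pdb; pdb.set_trace()    # or, since Python 3.7, simply: breakpoint()
    # At the (Pdb) prompt: `p subtotal` prints the value,
    # `n` steps to the next line, `c` continues execution.
    total = subtotal * (1 + tax_rate)
    return total

print(compute_total([10.0, 20.0], 0.1))
```

`breakpoint()` is the modern spelling: it calls `pdb.set_trace()` by default and can be disabled globally via the `PYTHONBREAKPOINT` environment variable.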
Yes. So for a given dataset, what we can do first is choose different machine learning models and train each of them on the data, then evaluate them. If a model's score is not up to the mark, we drop it, and among the models we have, we choose the one that performs well.
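The selection procedure above, train several candidates and keep the best scorer, can be sketched with scikit-learn's cross-validation, again assuming the bundled iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation and keep the best
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```

Cross-validation scores each model on held-out folds, so the comparison is not biased by any single train/test split.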