Vetted Talent

P Aditya Krishna Rohit

Results-oriented Data Scientist with around 2 years of experience in Data Science. Proficient in leveraging AI/ML algorithms to collect, analyze, and transform complex datasets into actionable insights. Skilled in using SQL and programming languages such as Python to develop machine learning and deep learning models. A self-motivated quick learner, dedicated to optimizing business performance through innovative AI/ML solutions and to fostering a culture of inclusion.

Role
AI Data Scientist
Years of Experience
4 years

Skillsets

GPT
Transformers
spaCy
Service Bus
SciPy
Scikit-learn
RAG
PyTorch
PIL
pandas
OpenCV
NumPy
NLTK
MongoDB
Keras
Python - 5 Years
Git
Gensim
Gen AI
Function Apps
Azure ML Studio
Azure AI Foundry
Azure
Agentic AI
ACS
SQL - 1 Year
Docker - 1 Year
TensorFlow - 3 Years

Vetted For

10 Skills

Roles & Skills
Results
Details

Machine Learning Engineer (Remote): AI Screening
64%

Skills assessed: Algorithms, Artificial Intelligence, CNN, Generative AI, LLM, Mathematics, Data Science, Machine Learning, Python, Statistics
Score: 58/90

Professional Summary

4 Years

Jun 2024 - Present (1 yr 10 months)
Generative AI Data Scientist Associate
PwC
Jul 2022 - May 2024 (1 yr 10 months)
Jr. Data Scientist
Mirafra Technologies
Dec 2021 - Apr 2022 (4 months)
Machine Learning Engineer
Datafoundry

Applications & Tools Known

Git
Docker

Work History

4 Years

Generative AI Data Scientist Associate

PwC

Jun 2024 - Present (1 yr 10 months)

Agentic Commerce: Developed an Agentic Commerce web app using multi-agent GenAI orchestration with RAG on Azure Cognitive Search (vector plus semantic/hybrid retrieval) and Azure OpenAI embeddings (text-embedding-ada-002). The app ingests synthetic data to deliver policy-governed, explainable procurement recommendations with auditable decision logs, on a scalable architecture designed for real-time e-commerce API integration.

SOX Control Testing: Implemented the extraction of key information for SOX control testing, covering the Account Reconciliation, Wire Transfer, and Bank Reconciliation processes. Developed the Bank Reconciliation workflow to retrieve account summaries, balances, approvers, preparers, and dates, ensuring document accuracy. Used Azure Cognitive Search (ACS) and GPT-4o for efficient data extraction, with a retry mechanism that handles token limits by resuming from the truncation point, and consolidated the extracted data into Excel for streamlined analysis and reporting.

Workday Workbook Configuration: Configured the Workday Absence module by extracting and organizing country-specific leave policy data from Business Requirement Documents (BRDs). Used ACS integrated with GPT-4o for efficient data extraction, with the same retry-and-resume logic for token limits. Consolidated critical information, including policy names and related details, into Excel for seamless configuration and implementation.
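The retry-and-resume handling of token limits described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code. `call_model` is a hypothetical stand-in for a GPT-4o call that returns the generated text together with a finish reason. A finish reason of "length" means the output was cut off by the token limit, which is the convention the OpenAI chat completions API uses. The 200-character tail and the wording of the continuation prompt are arbitrary choices.

```python
from typing import Callable, Tuple

# Hypothetical model interface: prompt in, (generated_text, finish_reason) out.
ModelCall = Callable[[str], Tuple[str, str]]


def extract_with_resume(call_model: ModelCall, prompt: str, max_retries: int = 5) -> str:
    """Call the model; while the answer is cut off by the token limit,
    ask it to continue from the truncation point and stitch the pieces together."""
    text, finish_reason = call_model(prompt)
    parts = [text]
    retries = 0
    while finish_reason == "length" and retries < max_retries:
        retries += 1
        tail = "".join(parts)[-200:]  # last 200 characters of what we have so far
        continuation_prompt = (
            f"{prompt}\n\nYour previous answer was cut off. It ended with:\n{tail}\n"
            "Continue exactly from that point without repeating anything."
        )
        text, finish_reason = call_model(continuation_prompt)
        parts.append(text)
    return "".join(parts)


# Usage with a fake model (made-up data) that truncates twice before finishing.
fake_responses = iter([
    ("Account: 123, Bal", "length"),
    ("ance: 500, Appro", "length"),
    ("ver: J. Doe", "stop"),
])
result = extract_with_resume(lambda p: next(fake_responses), "Extract the bank reconciliation summary.")
print(result)  # Account: 123, Balance: 500, Approver: J. Doe
```

The sketch sends back only the tail of the partial answer, not the whole thing, which keeps the continuation prompt small. The trade-off is that the model may repeat a few characters at the seam, so a real pipeline would need to deduplicate them.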

Jr. Data Scientist

Mirafra Technologies

Jul 2022 - May 2024 (1 yr 10 months)

Developed a Human Fall Detection system using computer vision techniques such as MoveNet and CNNs. Extracted features from MoveNet's 17-keypoint pose estimation output, trained an XGBoost model that achieved 95% accuracy, and submitted the work to ICBAI 2023, where it was accepted for presentation at IISc Bangalore.

Implemented a Resume Parser to extract personal details, skills, and links from resumes. Provided weekly updates to management and deployed the parser on the internal server using Docker.

Worked on Cisco's Firewall Migration Tool (FMT) project: wrote unit test cases for APIs, troubleshot and fixed unit test failures and issues in new feature development, and contributed to new features related to migration reports.

Machine Learning Engineer

Datafoundry

Dec 2021 - Apr 2022 (4 months)

Contributed to the Search module of the client's application, focusing on legal documents. Implemented document segregation (good or bad) using Convolutional Neural Networks (CNNs). Generated annotations for the CNN model using Optical Character Recognition (OCR), a spell checker, and special-character ratio analysis. Trained a shallow CNN model on the generated annotations, achieving an accuracy of 94%.
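The special-character-ratio signal used for annotation could be computed roughly as below. This is a minimal sketch, not the project's implementation. The 0.3 threshold and the helper names are illustrative assumptions. The spell-checker signal mentioned above is left out for brevity.

```python
def special_char_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are neither letters nor digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # treat an empty OCR result as maximally noisy
    special = sum(1 for c in chars if not c.isalnum())
    return special / len(chars)


def label_page(ocr_text: str, threshold: float = 0.3) -> str:
    """Weak 'good'/'bad' label for a page based on how noisy its OCR output looks.
    The 0.3 threshold is an illustrative assumption."""
    return "good" if special_char_ratio(ocr_text) < threshold else "bad"


print(label_page("This deed is executed on 12 May 2001"))  # good
print(label_page("#@ ~~ || ]] .,"))  # bad
```

Labels produced this way are weak, because a clean-looking but wrongly recognized page still passes. That is one reason to combine the ratio with other signals before training the CNN on the labels.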

Achievements

Volunteered at the tech fest during BTech. Secured 3rd prize in coding on the occasion of Engineers' Day.

Major Projects

3 Projects

Hand Gesture Recognition

Developed a rock-paper-scissors game based on hand gesture recognition using OpenCV and TensorFlow. Generated images and annotations with OpenCV, trained a CNN model optimized for embedded constraints, and deployed it to an embedded device using TinyML techniques.

Image classification of Dogs and Cats

Used TensorFlow and OpenCV for image classification of dogs and cats. Preprocessed image data and trained a CNN model with data augmentation and transfer learning for accurate classification.

Sentiment Analysis of Twitter data

Performed sentiment analysis on the NLTK Twitter dataset using Logistic Regression.

Education

MTech in Data Science
Amrita Vishwa Vidyapeetham (2022)
BTech in Computer Science
NRI Institute of Technology (2020)

Certifications

NLP (Coursera) (2021 - 2022)
Deep Learning (Workera)
DBMS (NPTEL)

Interests

Puzzle

Cricket

Cooking

Badminton

Singing

Travelling

AI Interview Questions & Answers

Yeah, sure. I am currently working as a Junior Data Scientist at Mirafra Technologies. I joined Mirafra in 2022, and I have around 2+ years of experience overall. At Mirafra, most of my responsibilities are development work. One project I worked on was human fall detection using CNN models: I detected the human pose structure, extracted features from it, and classified whether a video shows a human fall or not. I also worked on an automatic resume parser based on NLP techniques, mainly n-grams and hash maps, so that I can easily extract the content from a resume. Before this, I worked at Datafoundry as a machine learning engineer intern. There, I worked on a project called legal entity detection. The client is a lawyer with their own legal application, where they upload legal documents going back a very long time that were scanned manually. My part was to take a scanned PDF or document and identify, for every page, whether it is good enough for OCR or not. Later on, we also did the OCR part. Before that, I did my master's in Data Science at Amrita University. My thesis was on emotion analysis, specifically sentiment analysis of memes. I did my bachelor's in Computer Science at NRI Institute of Technology, Vijayawada.
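The n-gram plus hash-map idea behind the resume parser can be sketched as follows. The skill vocabulary, the tokenizer regex, and the function names are illustrative assumptions, not the parser's actual implementation.

```python
import re
from typing import Dict, Iterable, List


def ngrams(tokens: List[str], n: int) -> Iterable[str]:
    """All contiguous n-token phrases, joined by spaces."""
    return (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_skills(text: str, skill_map: Dict[str, str], max_n: int = 3) -> List[str]:
    """Look up every 1..max_n-gram in a hash map of known skills (O(1) per lookup).
    Longer n-grams are checked first; overlapping matches are not suppressed."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    found: Dict[str, None] = {}  # dict keeps insertion order and removes duplicates
    for n in range(max_n, 0, -1):
        for gram in ngrams(tokens, n):
            if gram in skill_map:
                found[skill_map[gram]] = None
    return list(found)


# Hypothetical vocabulary mapping normalized phrases to display names.
skill_map = {"python": "Python", "machine learning": "Machine Learning", "c++": "C++", "sql": "SQL"}
print(extract_skills("Skills: Python, machine learning, C++ and SQL.", skill_map))
# ['Machine Learning', 'Python', 'C++', 'SQL']
```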

Yes. For different datasets, we can use different tools. If the data is numerical, we can use NumPy and pandas. These two are the most commonly used tools to understand the distribution of the data, for example to find the correlation between features or to identify the relevant features in the dataset. We can also visualize the data: if two features are correlated, we can see how they are correlated using heat maps, with pandas, Matplotlib, and seaborn. These are the different tools we can use.

Yes. As I mentioned earlier, I worked on a meme analysis project; this was my MTech thesis. At that time, there were no proper resources for the work, meaning no proper GPU support, so I did the project on my own system, which had minimal GPU support. With the huge amount of data, it became very difficult even to train a model. So first, if an image was high resolution, I converted it to a lower resolution, while making sure that this did not remove the important features. That's one approach. I also generated annotations at the same reduced resolution. And I used a repeated training approach: for example, with a dataset of 100 samples, I split it into smaller subsets and trained on them one after another. This way we can optimize memory usage. These are the approaches I followed.
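Training on a large dataset in smaller pieces to fit limited memory can be sketched with a lazy batching helper. `train_step` is a hypothetical placeholder. In practice, each chunk (for example, a set of already downscaled images) would go to an incremental update such as Keras `Model.train_on_batch` or a scikit-learn estimator's `partial_fit`. The chunk size of 25 is an arbitrary choice.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of `samples` lazily, so only one chunk is in memory at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(samples)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def train_in_chunks(samples: Iterable[T], batch_size: int,
                    train_step: Callable[[List[T]], None]) -> int:
    """Feed each chunk to a (hypothetical) incremental training step; return the number of steps."""
    steps = 0
    for chunk in batches(samples, batch_size):
        train_step(chunk)
        steps += 1
    return steps


seen: List[List[int]] = []
n_steps = train_in_chunks(range(100), 25, seen.append)
print(n_steps, [len(c) for c in seen])  # 4 [25, 25, 25, 25]
```

Python 3.12 adds `itertools.batched`, which does the same job but yields tuples.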

Yes. To implement this in Python, there is a library called scikit-learn. It is one of the most popular libraries in Python, and it provides many machine learning classifiers and models. It is very useful and also very easy to use, so it is one of the main libraries most people use.

Yes. One of the projects I worked on was human fall detection, where I had to detect whether the person in the frame has fallen or not. The use case is homes where elderly people or infants stay alone, since it is very difficult to keep a caretaker around 24/7. If we install a CCTV camera, we can automatically detect when the person falls. There are many approaches to tracking the human body. In this use case, one more scenario has to be handled: a person lying in a sleeping position may be mistaken for a fall. When a real fall happens, the person shifts from an upright standing position to dropping down, so the height of the head along the y-axis drops sharply. The hip angle and knee angle also change. Once we detect the pose structure of the body, we can easily extract features: these angles, distances, and the rate of change of these angles. These are the most important features, and using them, the fall detection problem is solved.

Similarly in NLP, I worked on meme text analysis. Text on the Internet is not always a grammatically correct sentence; people write different words, half words, and so on. To handle these types of words, I used the word embedding model FastText, which is trained on subword (half-word) features. So for different scenarios, we can use different customized Python-based machine learning algorithms.
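The angle and rate-of-change features described in this answer can be sketched as below. This is an illustrative reconstruction, not the project code. It assumes the pose estimator's keypoints are already available as (x, y) coordinates per frame, and the helper names are hypothetical. Features like these would then be fed to a classifier such as the XGBoost model mentioned in the work history.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) keypoint coordinates for one frame


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at joint b between segments b->a and b->c,
    e.g. the knee angle from hip (a), knee (b), and ankle (c)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0  # degenerate case: two keypoints coincide
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


def rate_of_change(values: List[float], fps: float) -> List[float]:
    """Per-second change between consecutive frames (e.g. of an angle or head height)."""
    return [(b - a) * fps for a, b in zip(values, values[1:])]


# Made-up knee angles (degrees) over four frames at 30 fps: a sudden bend.
knee = [175.0, 172.0, 120.0, 90.0]
print(rate_of_change(knee, fps=30))  # [-90.0, -1560.0, -900.0]
```

The rate of change is what separates a fall from lying down to sleep. Both end in similar poses, but a fall reaches that pose much faster.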

Yes. I have worked on projects related to analysis. For example, I worked on a cricket analysis project. In cricket data, there are many features about one player: name, age, debut date, last match played, average, strike rate, and so on. From these, we should consider only the specific parameters relevant to a specific task. For example, if a spin bowler is bowling in a particular match, then out of all the previous performances we should select the statistics against spin, and further split them by off-spin or leg-spin. Within that, we should look at the averages against right-arm and left-arm spinners separately, and also batting first versus batting second. There are many such things to consider. The same applies to other projects: for text analysis, for example, we should consider what kind of text to use for different tasks.

Okay. Yes. In this Python code, in the data normalization function, they are calculating the mean with np.mean, but the mean is not used anywhere. That is definitely an issue: computing an unused mean wastes memory and time, and with a huge dataset that computation alone takes a lot of time. If you remove that line, it works well. Alternatively, we can use the mean properly, because in most cases people normalize using the mean: first we calculate the mean of the whole data, and then we normalize each sample with it, which is basic normalization. Instead of the standard deviation, using the mean is also an approach, and then we can remove the standard deviation function. Either way works.
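The code under review is not shown, so here is a hedged sketch of what a fixed version could look like, assuming it was meant to standardize the data (z-score). `normalize_data` is a hypothetical name. The point is that every statistic computed is actually used: the mean is subtracted and the standard deviation divides, so no `np.mean` call is wasted. The answer's alternative, dropping the standard deviation and normalizing with the mean only, is a different scaling and is not shown here.

```python
import numpy as np


def normalize_data(data: np.ndarray) -> np.ndarray:
    """Standardize each column to zero mean and unit variance.
    Both computed statistics are used, so nothing is calculated for nothing."""
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return (data - mean) / std


x = np.array([[1.0, 5.0], [3.0, 5.0]])
print(normalize_data(x))  # first column becomes [-1, 1]; the constant column becomes 0
```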

Yeah, the pre-trained model. Yes. Most pre-trained machine learning models are saved in files with extensions such as .pkl (pickle). So we should choose the proper load function for each type of pre-trained model file. In this case, the code may or may not work, which is why we should choose the correct loading method for the model.
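Choosing the loader by file extension, as described above, can be sketched as follows. This minimal version handles only pickle files (`.pkl`). Other formats would need their own loaders, for example `joblib.load` for `.joblib` files, `tf.keras.models.load_model` for Keras `.keras`/`.h5` files, or `torch.load` for PyTorch `.pt` files. `load_model` is a hypothetical helper name.

```python
import pickle
from pathlib import Path
from typing import Any


def load_model(path: str) -> Any:
    """Pick a loader based on the file extension (this sketch only handles pickle)."""
    suffix = Path(path).suffix.lower()
    if suffix in {".pkl", ".pickle"}:
        # Unpickling can run arbitrary code: only load files from trusted sources.
        with open(path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"No loader registered for '{suffix}' files")
```

Failing loudly on an unknown extension is safer than guessing a loader, because guessing is exactly the "may or may not work" situation described in the answer.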

Yeah. scikit-learn is one of the most powerful and useful Python packages for working with different machine learning models. We can access many machine learning models, and we can also do various tasks with the dataset, such as train/test splitting. We can also build a reliable ensemble learning system using scikit-learn, which is a very good approach.

Yeah. In my real-world work, I don't have much experience in debugging. But in the experience I had as a Python backend developer, I debugged with pdb, the Python debugger. If we put a pdb breakpoint at some point in the code and run the code or tool, execution stops there. Then we can inspect the values of the variables and decide how to go forward. I have done this, and I have also used breakpoint(), which is another option.

Yes. For a given dataset, first we can choose different machine learning models and train them on the data. If the score is not up to the mark, we can also fine-tune them. Then we choose the model that performs best among the ones we have.
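The approach of training several models and keeping the best one can be sketched without any library, as below. This is a simplified stand-in. With scikit-learn, one would typically compare estimators using `cross_val_score` and tune them with `GridSearchCV`. The "trainer" protocol, the two toy models, and the data are all made up for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Predictor = Callable[[float], int]
# Hypothetical "trainer" protocol: fit on (xs, ys) and return a predict function.
Trainer = Callable[[Sequence[float], Sequence[int]], Predictor]


def accuracy(predict: Predictor, xs: Sequence[float], ys: Sequence[int]) -> float:
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def select_best(trainers: Dict[str, Trainer], train, valid) -> Tuple[str, Dict[str, float]]:
    """Train every candidate on the training split, score it on the validation split,
    and return the best model's name along with all scores."""
    scores = {name: accuracy(fit(*train), *valid) for name, fit in trainers.items()}
    return max(scores, key=scores.get), scores


# Two toy candidate models for a 1-D binary problem.
def majority_class(xs, ys):
    label = max(set(ys), key=list(ys).count)
    return lambda x: label


def mean_threshold(xs, ys):
    cut = sum(xs) / len(xs)
    return lambda x: int(x > cut)


train = ([1.0, 2.0, 3.0, 4.0, 8.0, 9.0], [0, 0, 0, 0, 1, 1])
valid = ([1.5, 8.5, 2.5], [0, 1, 0])
best, scores = select_best({"majority": majority_class, "threshold": mean_threshold}, train, valid)
print(best, scores)
```

Scoring on a held-out validation split, not on the training data, is what makes the comparison meaningful. Cross-validation extends the same idea to several splits.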