Vetted Talent

P Aditya Krishna Rohit

Results-oriented Data Scientist with around 2 years of experience in Data Science. Proficient in leveraging AI/ML algorithms to collect, analyze, and transform complex data sets into actionable insights. Skilled in using SQL and programming languages such as Python to develop machine learning and deep learning models. A self-motivated quick learner, dedicated to optimizing business performance through innovative AI/ML solutions and to fostering a culture of inclusion.

  • Role

    AI Data Scientist

  • Years of Experience

    4 years

Skillsets

  • GPT
  • Transformers
  • spaCy
  • Service Bus
  • SciPy
  • Scikit-learn
  • RAG
  • PyTorch
  • PIL
  • pandas
  • OpenCV
  • NumPy
  • NLTK
  • MongoDB
  • Keras
  • Python - 5 Years
  • Git
  • Gensim
  • Gen AI
  • Function Apps
  • Azure ML Studio
  • Azure AI Foundry
  • Azure
  • Agentic AI
  • ACS
  • SQL - 1 Years
  • Docker - 1 Years
  • TensorFlow - 3 Years

Vetted For

10 Skills
  • Roles & Skills
  • Results
  • Details
  • Machine Learning Engineer (Remote) AI Screening
  • 64%
  • Skills assessed: Algorithms, Artificial Intelligence, CNN, Generative AI, LLM, Mathematics, Data Science, Machine Learning, Python, Statistics
  • Score: 58/90

Professional Summary

4 Years
  • Jun, 2024 - Present 2 yr

    Generative AI Data Scientist Associate

    PwC
  • Jul, 2022 - May, 2024 1 yr 10 months

    Jr. Data Scientist

    Mirafra Technologies
  • Dec, 2021 - Apr, 2022 4 months

    Machine Learning Engineer

    Datafoundry

Applications & Tools Known

  • Git
  • Docker

Work History

4 Years

Generative AI Data Scientist Associate

PwC
Jun, 2024 - Present 2 yr
    Agentic Commerce: Developed an Agentic Commerce web app using multi-agent GenAI orchestration with RAG on Azure Cognitive Search (vector and semantic/hybrid retrieval) and Azure OpenAI embeddings (text-embedding-ada-002). The app ingests synthetic data to deliver policy-governed, explainable procurement recommendations with auditable decision logs, on a scalable architecture designed for real-time e-commerce API integration.

    SOX Control Testing: Implemented the extraction of key information for SOX control testing, covering the Account Reconciliation, Wire Transfer, and Bank Reconciliation processes. Developed the Bank Reconciliation workflow to retrieve account summaries, balances, approvers, preparers, and dates, ensuring document accuracy. Used ACS (AI-powered search) and GPT-4o for data extraction, with a retry mechanism that handles token limits by resuming from the point of truncation, and consolidated the extracted data into Excel for analysis and reporting.

    Workday Workbook Configuration: Configured the Workday Absence module by extracting and organizing country-specific leave policy data from Business Requirement Documents (BRDs). Used ACS (AI-powered search) integrated with GPT-4o for data extraction, with logic that handles token limits by retrying and resuming from the point of truncation. Consolidated critical information, including policy names and related details, into Excel for configuration and implementation.

Jr. Data Scientist

Mirafra Technologies
Jul, 2022 - May, 2024 1 yr 10 months
    Developed a Human Fall Detection system using Computer Vision techniques such as MoveNet and CNN. Extracted features from MoveNet's pose estimation model with 17 key points and trained an XGBoost model achieving 95% accuracy. The project was submitted to ICBAI 2023 and accepted for presentation at IISc Bangalore.

    Implemented a Resume Parser to extract personal details, skills, and links from resumes. Provided weekly updates to management and deployed the parser on the internal server using Docker.

    Worked on Cisco's Firewall Migration Tool (FMT) project: wrote unit test cases for APIs, troubleshot and fixed unit test failures and issues in newly developed features, and contributed to new features related to migration reports.

Machine Learning Engineer

Datafoundry
Dec, 2021 - Apr, 2022 4 months
    Contributed to the Search module development for the client's application, focusing on legal documents. Implemented document segregation (good or bad) using Convolutional Neural Networks (CNN). Generated annotations for the CNN model using Optical Character Recognition (OCR), a spell checker, and special character ratio analysis. Trained a shallow CNN model with the generated annotations, achieving an accuracy of 94%.

Achievements

  • Volunteered at the Tech Fest during BTech. Secured 3rd prize in coding on the occasion of Engineers' Day.

Major Projects

3 Projects

Hand Gesture Recognition

    Developed a Rock-paper-scissors game based on hand gesture recognition using OpenCV and TensorFlow. Generated images and annotations with OpenCV, trained a CNN model optimized for embedded constraints, and deployed it to an embedded device using TinyML techniques.

Image classification of Dogs and Cats

    Used TensorFlow and OpenCV for image classification of dogs and cats. Preprocessed image data and trained a CNN model with data augmentation and transfer learning for accurate classification.

Sentiment Analysis of Twitter data

    Performed sentiment analysis using Logistic Regression on Twitter data from NLTK.

Education

  • MTech in Data Science

    Amrita Vishwa Vidyapeetham (2022)
  • BTech in Computer Science

    NRI Institute of Technology (2020)

Certifications

  • NLP (Coursera) (2021 - 2022)

  • Deep Learning

    Workera
  • DBMS

    NPTEL

Interests

  • Puzzle
  • Cricket
  • Cooking
  • Badminton
  • Singing
  • Travelling

AI-interview Questions & Answers

    I am currently working as a junior data scientist at Mirafra Technologies. I joined Mirafra in 2022, and I have a little over two years of experience overall. At Mirafra, my main responsibilities are development work. One project I worked on was human fall detection using CNN models: I detected the human pose structure, extracted features from it, and classified each video as a fall or not a fall. I also worked on an automatic resume parser, which uses NLP techniques, mainly N-grams and HashMap functions, to extract content from the resume easily. Previously, I worked at Datafoundry as a machine learning engineer intern. There, I worked on a project called legal entity detection. The client was a lawyer with their own legal application, where they upload many legal documents, some going back a very long time and scanned manually. To extract data from those scanned images, my part was to examine the scanned PDF or document and identify whether each page was suitable for OCR or not. Later on, we also did the OCR part. Before that, I earned a master's in data science from Amrita University. My thesis was on emotion analysis, specifically sentiment analysis of memes. I also hold a bachelor's degree in computer science from NRI Institute of Technology, Vijayawada.

    Yes. For different datasets, we can use different tools. If there is numerical data, we can use pandas, which is one of the most used tools to understand the distribution of the data. If we want to find the correlation between features, or find the relevant features in the dataset, we can use pandas and also visualize the data. For example, if two features are correlated, we can visualize this using heat maps. We can also use networkx and Plotly. These are the different tools we can use.

    Yes. As I mentioned earlier, I worked on a meme emotion analysis project, which was my MTech thesis. At that time, there were no proper resources for it. There was no proper GPU support, so I did the project on my own system, which has minimal GPU support. With the huge data, it became very difficult to train a model. So first, I converted the high-resolution images to a smaller resolution, making sure I didn't lose the important features. That's one approach. I also generated documentation of the same resolution I had changed before. I also used a repeated training approach. For example, if there was a dataset with 100 samples, I did the training on one part of the data at a time and repeated it. Because of this, we can optimize the memory. These are the approaches I followed.

    Yes, so to implement a dictionary in Python, there is a library called scikit-learn. Scikit-learn is one of the most popular libraries in Python, where we can use many machine learning classifiers and models. It is very useful and also very easy to use. So that's the main modeling library most people use.

    Yes. One of the projects I worked on was human fall detection, where I had to detect a person in the frame and determine whether the person had fallen. The use case is elderly people or infants staying alone at home, where it is very difficult to keep a caretaker 24/7. There, we can install a CCTV camera and detect whether the person has fallen. There are many approaches to tracking the human body, but in this use case I need to detect the fall. There is one more scenario to handle: if the person is in a sleeping position, that may be wrongly considered a fall. When a fall happens, the person goes from a standing upright position to dropping down. If we consider the distance between the head and the top of the y-axis, that distance definitely reduces. The hip angle and knee angle also change. Once we detect the pose structure of the body, we can easily extract the features. These features are the angles, the distances, and the rate of change of these angles. Using these features, fall detection is solved. Similarly, in NLP, I worked on a project for meme text analysis. The text on the Internet is not always a proper grammar-based sentence. Some people write with different words, half words, and so on. To handle these kinds of words, I used the word embedding model called fastText, which copes with text that has no grammar and half words. Based on this, for different scenarios, we can use different customized models and Python-based algorithms.
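    The fall-detection features described above (head position, hip and knee angles, and their rate of change) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the project: the keypoint names, the normalized image coordinates with y growing downward, and the helper names `joint_angle`, `frame_features`, and `rate_of_change` are all hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def frame_features(kp):
    """kp maps a keypoint name to (x, y) in normalized image coordinates.

    Assumption: y grows downward, so head height is measured from the
    bottom of the frame as 1.0 - y.
    """
    return {
        "head_height": 1.0 - kp["nose"][1],
        "hip_angle": joint_angle(kp["shoulder"], kp["hip"], kp["knee"]),
        "knee_angle": joint_angle(kp["hip"], kp["knee"], kp["ankle"]),
    }

def rate_of_change(prev, curr, dt):
    """Per-feature change between two frames, divided by the elapsed time."""
    return {name: (curr[name] - prev[name]) / dt for name in curr}

# Made-up keypoints for an upright frame and a frame partway through a fall.
standing = {"nose": (0.5, 0.1), "shoulder": (0.5, 0.2), "hip": (0.5, 0.5),
            "knee": (0.5, 0.7), "ankle": (0.5, 0.9)}
falling = {"nose": (0.2, 0.55), "shoulder": (0.3, 0.6), "hip": (0.5, 0.75),
           "knee": (0.6, 0.6), "ankle": (0.7, 0.8)}

standing_f = frame_features(standing)
falling_f = frame_features(falling)
rates = rate_of_change(standing_f, falling_f, dt=0.5)  # dt in seconds, assumed
```

    In a pipeline like the one described, per-frame features and their rates would be stacked into a feature vector and passed to a classifier. A slow change in these features, as when someone lies down to sleep, is one way the sleeping case might be separated from a fall.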

    Yes. I worked on projects related to analysis. For example, I worked on a cricket analysis project. If you consider cricket data, there are several features about a cricketer, such as name, age, debut date, last played match, average, and strike rate. There are a lot of features. So we should consider only specific metrics for a specific task. For example, say a spin bowler is bowling in a particular match. If we consider all his previous performances, that is a bad step. We should select only the spinner statistics, and also whether the spinner bowls off spin or leg spin. We should select the particular feature. Within that, we should look at the average against right-hand spinners and left-hand spinners separately, and also batting first or batting second, these types of features. So there are different things we have to consider. This is not only for cricket. There are many other projects where I have to consider what kind of analysis and what kind of text I should use for different things. These are all things we should consider.

    Yes. There is an issue in this Python code. In the normalize data function, they are calculating the mean with np.mean. However, nowhere in the code are they using the mean. So there is definitely a memory issue. The mean variable is not used anywhere, and that's an issue. Computing it takes some time, and with a huge dataset it takes a lot of time. If you remove that line, then it works well. Alternatively, we can use the mean in an efficient way, because in most cases people normalize using the mean. First we calculate the mean of the whole data, and then for each sample we subtract the mean to normalize, which is basic normalization. Instead of using the standard deviation, using the mean is also an approach. Then we can remove the standard deviation function. So we can do it either way.
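    The code under discussion is not reproduced in this transcript, so the following is only a hypothetical reconstruction of the kind of function being described. It shows a common form of normalization in which the mean is actually used together with the standard deviation (z-score standardization). The function name `normalize` and the zero-variance handling are assumptions.

```python
import numpy as np

def normalize(data):
    """Standardize data to zero mean and unit variance (z-score)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean()
    std = data.std()
    if std == 0:
        # All values are identical: return zeros instead of dividing by zero.
        return np.zeros_like(data)
    # The mean is used here, so it is no longer a dead variable.
    return (data - mean) / std

z = normalize([2.0, 4.0, 6.0])
```

    Subtracting the mean alone only centers the data. Dividing by the standard deviation as well puts features on a comparable scale, which is why the two are often used together.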

    Yeah, filtering model, yes. So, no. Most pre-trained machine learning models are saved with extensions like .pkl files. So we should choose the proper load function for the type of pre-trained model. In this case, it may work or it may not work, so we should choose the correct one.
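    As a rough illustration of matching the load function to the file type, the sketch below saves and reloads a stand-in "model" with Python's standard `pickle` module. The helper names `save_model` and `load_model`, the extension check, and the dictionary used in place of a trained model are all assumptions made for the example. Other formats would need their own library's loader.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(model, path):
    """Serialize a Python object to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Load a pickled object, refusing files that do not look like pickles."""
    path = Path(path)
    if path.suffix not in {".pkl", ".pickle"}:
        raise ValueError(f"Unsupported model format: {path.suffix!r}")
    with open(path, "rb") as f:
        return pickle.load(f)

# A plain dict stands in for a trained model in this example.
model = {"weights": [0.4, -1.2, 0.7], "bias": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(model, model_path)
    restored = load_model(model_path)
```

    Unpickling can execute arbitrary code, so only files from a trusted source should be loaded this way.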

    Scikit-learn is one of the most powerful and useful packages in Python. We can access different machine learning models with it and also do different tasks with the dataset, such as splitting the dataset. So we can also build a reliable ensemble learning system using scikit-learn. That is a very good approach.

    Yeah, so in my real-world work, I don't have much experience in debugging. But in the experience I had as a Python backend developer, I debugged with PDB, the Python debugger. If we put the Python debugger at a point in the code and run it, execution stops there. Then we can see the values of the variables and decide how to go forward. So that is one approach I have used, along with breakpoints.

    Yes, so for a given dataset, first we can choose different machine learning models and train them with the data. We can also fine-tune them if the score is not up to the mark. Then we can see which models perform well and choose from the models that we have.