
Chaitanya Arora

Vetted Talent
Highly skilled Machine Learning Engineer with over 3 years of experience applying data science and machine learning algorithms to solve complex business challenges. Specialized in Natural Language Processing (NLP), Deep Learning, Generative AI, LLMs, and cloud computing technologies (AWS and GCP).
  • Role

    Associate Data Scientist

  • Years of Experience

    3 years

Skillsets

  • Troubleshooting
  • TensorFlow
  • REST APIs
  • Analytical
  • Keras
  • Web Services
  • APIs
  • Algorithms
  • ETL pipelines
  • Google Cloud Platform
  • OCR
  • Python
  • GCP
  • Vertex AI
  • Random Forest
  • Cloud
  • Git
  • Scikit-learn
  • NLTK
  • LSTM
  • Transformers
  • Lambda
  • PyTorch - 3 Years
  • Regression Analysis
  • Statistics
  • NLP
  • MongoDB
  • HuggingFace
  • Deep Learning
  • LLM
  • ML
  • Python - 3 Years
  • Docker
  • AWS
  • LLaMA
  • LLMs
  • SQL
  • ETL
  • AI
  • Flask

Vetted For

10 Skills
  • Role: AI Chatbot Developer (Remote) - AI Screening
  • Result: 71%
  • Skills assessed: CI/CD, AI chatbot, Natural Language Processing (NLP), AWS, Azure, Docker, Google Cloud Platform, Kubernetes, Machine Learning, TypeScript
  • Score: 64/90

Professional Summary

3 Years
  • Jun, 2023 - Present (2 yr 3 months)

    Machine Learning Engineer

    Liaison Medicare Pharma
  • May, 2022 - Jun, 2023 (1 yr 1 month)

    Analyst - I

    Aon
  • Sep, 2020 - Feb, 2022 (1 yr 5 months)

    Associate Data Scientist

    Syntax Edutek

Applications & Tools Known

  • AWS SQS
  • AWS SES
  • AWS Lambda
  • AWS S3
  • AWS SageMaker
  • Vertex AI
  • Keras
  • YOLO
  • OCR
  • Metabase

Work History

3 Years

Machine Learning Engineer

Liaison Medicare Pharma
Jun, 2023 - Present (2 yr 3 months)
    Implemented a salesperson pitch audit mechanism, an NLP system for data extraction, and a forecasting model using LSTM. Developed Flask-based REST APIs and a microservices architecture with AWS and Vertex AI.

Analyst - I

Aon
May, 2022 - Jun, 2023 (1 yr 1 month)
    Designed ML models to predict risk scores, maintained data configuration in ETL pipelines, and implemented a LangChain layer for LLM and AI use cases.

Associate Data Scientist

Syntax Edutek
Sep, 2020 - Feb, 2022 (1 yr 5 months)

Achievements

  • Implemented a salesperson pitch audit mechanism with 93% accuracy
  • Reduced LLM token costs by around 78%
  • Implemented 2+ scalable ML inference systems
  • Predicted customer risk scores with 89% accuracy
  • Built live monitoring of lecture classes, saving hundreds of man-hours
  • Increased coupon usage by 20%
  • Developed 8+ ML use cases with a variety of algorithms

Education

  • Bachelor of Technology (B.Tech.), Computer Science and Engineering

    Lovely Professional University (2021)

AI Interview Questions & Answers

I have 3.6 years of experience working in machine learning and computer vision. Recently, I have been working on a project related to large language models, where I work mostly on auditing-related use cases. I currently work as a Machine Learning Engineer at Liaison Medicare Pharma. Previously, I worked as an analyst at Aon, mostly on the risk assessment side of things. I also have model deployment experience using AWS SageMaker and GCP Vertex AI: I have deployed model inferences to REST API endpoints, and I maintain a microservice under my ownership that serves model inferences to the end user.

To ensure that the AI chatbot handles various data formats such as PDFs and Excel sheets without significant modification, we need to standardize the data into a textual format. To process the PDF files, we can utilize libraries like pdfplumber, and to process Excel sheets, we can use pandas. To extract tables and other layouts from PDFs, we can utilize a layout-aware language model such as LayoutLM, which handles text across differently laid-out PDFs and converts it into a text format.
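A minimal sketch of this normalization step, assuming the pdfplumber and pandas libraries (the file names are illustrative):

```python
# Minimal sketch: normalize PDF and Excel inputs into plain text.
import pdfplumber
import pandas as pd

def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in a PDF."""
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

def extract_excel_text(path: str) -> str:
    """Flatten an Excel sheet into a delimited text block."""
    df = pd.read_excel(path)
    return df.to_csv(sep="|", index=False)

# Illustrative file names; both sources end up in one textual format.
document_text = extract_pdf_text("report.pdf") + "\n" + extract_excel_text("data.xlsx")
```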

To implement real-time monitoring and logging to track the performance of the AI chatbot, we can continuously monitor key metrics such as generation accuracy and perplexity, and surface them on a live dashboard. For monitoring and logging, we can utilize cloud-provider solutions such as CloudWatch, which is on AWS, or similar services.
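A minimal sketch of pushing a custom chatbot metric to CloudWatch, assuming boto3 with AWS credentials configured (the namespace, metric name, and region are illustrative):

```python
# Minimal sketch: record a custom chatbot metric in AWS CloudWatch
# so it can feed a live dashboard or alarm.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def report_latency(ms: float) -> None:
    """Record one response-latency data point."""
    cloudwatch.put_metric_data(
        Namespace="Chatbot/Inference",      # illustrative namespace
        MetricData=[{
            "MetricName": "ResponseLatency",
            "Value": ms,
            "Unit": "Milliseconds",
        }],
    )

report_latency(412.0)
```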

To reduce false positives in user intent recognition in the chatbot, I'll go with using recall as a metric for the recognized intents. Apart from that, we'll utilize class weights so that the negative class carries more weight than the positive one. While training the intent recognition model, we can also oversample the negative intents in such scenarios.
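A minimal sketch of the class-weighting idea, assuming scikit-learn (the toy features, labels, and weight values are illustrative):

```python
# Minimal sketch: weight the negative class higher so that mistaking
# it for the target intent (a false positive) costs more in training.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

X = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]  # toy features
y = [1, 0, 1, 0]                                       # 1 = target intent

clf = LogisticRegression(class_weight={0: 3.0, 1: 1.0})  # illustrative weights
clf.fit(X, y)

preds = clf.predict(X)
print(precision_score(y, preds), recall_score(y, preds))
```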

To ensure that the data analysis component of the chatbot is optimized for performance and scales well, we can use infrastructure geared towards scaling, such as an elastic cluster and similar setups. Apart from that, we can also utilize parallel processing techniques and improve the data pipelines so that data is ingested as an incremental load, so we do not receive and re-process the entire dataset again for each analysis.
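A minimal sketch of an incremental load, assuming a SQLite store (the table, column names, and watermark value are illustrative):

```python
# Minimal sketch: process only rows newer than the last recorded
# watermark instead of re-reading the full dataset on every run.
import sqlite3

def load_new_rows(conn: sqlite3.Connection, watermark: str) -> list:
    """Fetch only events created after the previous run's watermark."""
    cur = conn.execute(
        "SELECT id, payload, created_at FROM events "
        "WHERE created_at > ? ORDER BY created_at",
        (watermark,),
    )
    return cur.fetchall()

conn = sqlite3.connect("chatbot.db")
rows = load_new_rows(conn, "2024-01-01T00:00:00")
new_watermark = rows[-1][2] if rows else None  # persist for the next run
```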

In a chatbot conversation, a model to predict user intents, if we are going the supervised way, is a classification problem, specifically a sequence classification problem, where we can utilize several transformer models. We can also utilize a zero-shot approach alongside the supervised one. Apart from that, we can utilize models such as cross-encoders to compare the similarity against intent examples.
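A minimal sketch of the zero-shot approach, assuming the HuggingFace transformers library (the model choice and intent labels are illustrative):

```python
# Minimal sketch: zero-shot intent prediction with a HuggingFace pipeline.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # illustrative model

result = classifier(
    "I was charged twice for my subscription",
    candidate_labels=["billing_issue", "technical_support", "cancellation"],
)
print(result["labels"][0], result["scores"][0])  # top predicted intent
```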

One issue I see with this is that the port value is not provided in the database connection, and usually the database is not on the default port, nor on localhost. The database name is also not provided, so the connection may not target the appropriate database; it might connect to a different database on the same server. So there is ambiguity regarding which database to connect to.
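A minimal sketch of making the connection explicit; since the original snippet is not shown, MongoDB via pymongo is an assumption here, and the host, port, and database name are illustrative:

```python
# Minimal sketch: name the host, port, and database explicitly
# instead of relying on driver defaults.
from pymongo import MongoClient

client = MongoClient(host="db.internal.example.com", port=27018)
db = client["chatbot_prod"]  # explicit database, not an implicit default
print(db.list_collection_names())
```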

In the Java snippet, to handle exceptions in this scenario, there can be different exceptions: the file may not be found, or the file may be in an improper format so the text cannot be extracted. Utilizing the custom exceptions of the PDF reader library will help. Whatever custom exceptions might be present, for example a file-not-found exception, we can make the handling more specific and order the catch blocks from specific to generic exceptions.
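The question concerned a Java snippet; here is a minimal Python sketch of the same specific-to-generic ordering principle (the file handling and exception mapping are illustrative):

```python
# Minimal sketch: catch the most specific exceptions first, the generic
# ones last, mirroring the Java catch-block ordering described above.
def read_text(path: str) -> str:
    try:
        with open(path, "rb") as f:
            return f.read().decode("utf-8")
    except FileNotFoundError as exc:       # most specific case first
        raise RuntimeError(f"{path} not found") from exc
    except UnicodeDecodeError as exc:      # improper file format
        raise RuntimeError(f"{path} is not valid text") from exc
    except OSError as exc:                 # broader I/O failures last
        raise RuntimeError(f"failed to read {path}") from exc
```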

When developing a scalable chatbot, there are some considerations like utilizing caching, utilizing indexes in the database, and choosing a database that supports ACID properties. Taking concurrency into account, the database should not be susceptible to consistency or integrity issues under concurrent execution, which is highly likely in a scalable chatbot because a lot of requests are being processed.
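A minimal cache-aside sketch, assuming a Redis instance via the redis-py library (the key naming, TTL, and stubbed database lookup are illustrative):

```python
# Minimal sketch: cache-aside lookup so repeated requests skip the database.
import json
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def fetch_profile_from_db(user_id: str) -> dict:
    # Stub standing in for the real (indexed) database query.
    return {"user_id": user_id, "plan": "standard"}

def get_user_profile(user_id: str) -> dict:
    """Serve from cache when possible; fall back to the database."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = fetch_profile_from_db(user_id)
    cache.setex(key, 300, json.dumps(profile))  # cache for 5 minutes
    return profile
```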

In the case of feedback, I'll take the feedback from the user itself. Whenever the model generates a response, similar to how OpenAI does in ChatGPT, we can collect thumbs-up and thumbs-down feedback. We can also present multiple generations so that users select the best of them, and a user can also indicate whether a single generation is helpful or not. This is how we can take feedback to improve the model's accuracy.
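A minimal sketch of recording per-generation feedback for later evaluation or fine-tuning (the schema is an illustrative assumption):

```python
# Minimal sketch: log thumbs-up/thumbs-down feedback per generated response.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Feedback:
    conversation_id: str
    response_id: str
    thumbs_up: bool
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

feedback_log: list[Feedback] = []  # in practice, a database table

def record_feedback(conversation_id: str, response_id: str, thumbs_up: bool):
    feedback_log.append(Feedback(conversation_id, response_id, thumbs_up))

record_feedback("conv-42", "resp-7", thumbs_up=True)
```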

For maintaining the chatbot solution over time, we apply a CI/CD pipeline, whereby we have an optimized deployment image. For example, in the case of AWS, we can have ECR repositories into which AWS CodeBuild builds a deployment image. We can utilize appropriate versioning of the builds and select the appropriate version for the production chatbot, with a dev build as well that runs in the development environment, where developers can integrate their code and test it out before production. We can have a pre-production environment as well where, before releasing into production, we can check the accuracy and efficacy over production-like data.