Vetted Talent

Chaitanya Arora

Highly skilled Machine Learning Engineer with over 3 years of experience applying data science and machine learning algorithms to solve complex business challenges. Specialized in Natural Language Processing (NLP), Deep Learning, Generative AI, LLMs, and cloud computing technologies (AWS and GCP).
  • Role

    Associate Data Scientist

  • Years of Experience

    3 years

Skillsets

  • Troubleshooting
  • TensorFlow
  • Rest APIs
  • Analytical
  • Keras
  • Web Services
  • APIs
  • Algorithms
  • ETL pipelines
  • Google Cloud Platform
  • OCR
  • Python
  • GCP
  • Vertex AI
  • Random Forest
  • Cloud
  • Git
  • Scikit-learn
  • NLTK
  • LSTM
  • Transformers
  • Lambda
  • PyTorch - 3 Years
  • Regression Analysis
  • Statistics
  • NLP
  • MongoDB
  • HuggingFace
  • Deep Learning
  • LLM
  • ML
  • Python - 3 Years
  • Docker
  • AWS
  • LLaMA
  • LLMs
  • SQL
  • ETL
  • AI
  • Flask

Vetted For

10 Skills
  • AI Chatbot Developer (Remote) - AI Screening
  • 71%
  • Skills assessed: CI/CD, AI chatbot, Natural Language Processing (NLP), AWS, Azure, Docker, Google Cloud Platform, Kubernetes, Machine Learning, TypeScript
  • Score: 64/90

Professional Summary

3 Years
  • Jun, 2023 - Present (2 yr 11 months)

    Machine Learning Engineer

    Liaison Medicare Pharma
  • May, 2022 - Jun, 2023 (1 yr 1 month)

    Analyst - I

    Aon
  • Sep, 2020 - Feb, 2022 (1 yr 5 months)

    Associate Data Scientist

    Syntax Edutek

Applications & Tools Known

  • AWS SQS
  • AWS SES
  • AWS Lambda
  • AWS S3
  • AWS SageMaker
  • Vertex AI
  • Keras
  • YOLO
  • OCR
  • Metabase

Work History

3 Years

Machine Learning Engineer

Liaison Medicare Pharma
Jun, 2023 - Present (2 yr 11 months)
    Implemented a salesperson pitch audit mechanism, an NLP system for data extraction, and a forecasting model using LSTM. Developed Flask-based REST APIs and a microservices architecture on AWS and Vertex AI.

Analyst - I

Aon
May, 2022 - Jun, 2023 (1 yr 1 month)
    Designed ML models to predict risk scores, maintained data configuration in ETL pipelines, and implemented a LangChain layer for LLM and AI use cases.

Associate Data Scientist

Syntax Edutek
Sep, 2020 - Feb, 2022 (1 yr 5 months)

Achievements

  • Implemented salesperson pitch audit mechanism with 93% accuracy
  • Reduced LLM token cost by around 78%
  • Implemented 2+ scalable ML inference systems
  • Predicted customer risk scores with 89% accuracy
  • Live monitoring of lecture classes, saving hundreds of man-hours
  • Increased coupon usage by 20%
  • Developed 8+ ML use cases with a variety of algorithms

Education

  • Bachelor of Technology (B.Tech.), Computer Science and Engineering

    Lovely Professional University (2021)

AI-interview Questions & Answers

I have 3.6 years of experience working in machine learning and computer vision. Recently, I have been working on a project involving large language models, focused mostly on auditing-related use cases. I currently work as a Machine Learning Engineer at Liaison Medicare Pharma, and before that I worked as an Analyst at Aon, mostly on the risk assessment side of things. I have experience with model deployment, having worked with AWS SageMaker and GCP Vertex AI. I have deployed models behind REST API endpoints, and I maintain a microservice under my ownership that serves model inferences to the end user.

To handle data in various formats in a chatbot, we need to standardize everything into a textual format. To process PDF files, we can use libraries like pdfplumber; to process Excel sheets, we can use pandas and similar libraries. To extract layouts and tables from PDFs, we can use a layout-aware model (such as LayoutLM) that works over OCR output and converts the different layouts into text.
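The normalization step described above can be sketched as a small dispatcher that routes each file type to an extractor. This is a minimal illustration, not the original implementation: only the plain-text and CSV branches are runnable here, and the PDF/Excel branches are stubs noting where pdfplumber and pandas would plug in.

```python
import csv
from pathlib import Path

def extract_text(path: str) -> str:
    """Normalize a file of a supported format into plain text."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    if suffix == ".csv":
        # Flatten tabular rows into one line of text per row.
        with open(path, newline="", encoding="utf-8") as f:
            return "\n".join(" | ".join(row) for row in csv.reader(f))
    if suffix == ".pdf":
        # In practice: pdfplumber.open(path) + page.extract_text(),
        # or a layout-aware model for complex tables.
        raise NotImplementedError("PDF extraction requires pdfplumber")
    if suffix in (".xls", ".xlsx"):
        # In practice: pandas.read_excel(path).to_string()
        raise NotImplementedError("Excel extraction requires pandas")
    raise ValueError(f"Unsupported format: {suffix}")
```

Downstream chatbot components then only ever see plain text, regardless of the upload format.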

To implement real-time monitoring and logging for the AI chatbot, we can track metrics such as generation accuracy and perplexity and surface them on a continuously updated dashboard. For infrastructure-level monitoring and logging, we can use cloud-provider solutions; CloudWatch, for example, is the one on AWS.
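A minimal sketch of the in-process side of such monitoring, assuming a rolling window over recent values. In a real deployment each recorded value would also be shipped to a backend such as CloudWatch (e.g. via boto3's `put_metric_data`), which this sketch only notes in the docstring.

```python
import logging
import time
from collections import defaultdict, deque

class ChatbotMetrics:
    """In-memory rolling metrics. In production, each record() would also
    push the value to a backend such as CloudWatch via boto3
    put_metric_data; here we only log and keep a rolling window."""

    def __init__(self, window: int = 100):
        self._series = defaultdict(lambda: deque(maxlen=window))
        self._log = logging.getLogger("chatbot.metrics")

    def record(self, name: str, value: float) -> None:
        self._series[name].append(value)
        self._log.info("metric=%s value=%.4f ts=%.0f", name, value, time.time())

    def rolling_mean(self, name: str) -> float:
        values = self._series[name]
        return sum(values) / len(values) if values else 0.0
```

A dashboard can then poll `rolling_mean("latency_ms")` or similar series to render live charts.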

To reduce false positives in user intent recognition in the chatbot, I would track precision and recall on recognized intents as metrics. Apart from that, we can use class weights so that the negative class carries more weight than the positive one, making the model more conservative about predicting an intent. Similar methodology applies when training the intent recognition model itself: we can also oversample the negative examples in such scenarios.
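The trade-off behind that answer can be illustrated with a toy confidence-threshold classifier: raising the acceptance threshold reduces false positives at the cost of recall, the same direction that heavier negative-class weights (e.g. scikit-learn's `class_weight` parameter) push a model during training. All numbers and names below are illustrative.

```python
def classify_intents(scores, threshold):
    """Accept an intent only when its confidence clears the threshold;
    a higher threshold means fewer false positives (and lower recall)."""
    return [s >= threshold for s in scores]

def precision(preds, labels):
    """Fraction of accepted intents that were actually correct."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    return tp / (tp + fp) if (tp + fp) else 1.0
```

On toy scores, moving the threshold from 0.5 to 0.7 drops a borderline false positive and lifts precision to 1.0.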

To ensure that the data analysis component of the chatbot is optimized for performance and scales well, we can use infrastructure geared towards scaling, such as an elastic cluster. Apart from that, we can apply parallel processing techniques and improve the data pipelines to ingest data in incremental loads, so that we do not reprocess the entire dataset for every analysis.
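The incremental-load idea can be sketched with a high-water-mark loader (hypothetical names; any timestamped source works): each run only pulls records newer than the last one seen.

```python
class IncrementalLoader:
    """Track a high-water mark so each analysis run only pulls new
    records instead of re-reading the full dataset."""

    def __init__(self, fetch_since):
        # fetch_since: callable taking a watermark, returning [(ts, row)]
        self._fetch_since = fetch_since
        self._watermark = 0

    def load(self):
        batch = self._fetch_since(self._watermark)
        if batch:
            # Advance the watermark past everything we just consumed.
            self._watermark = max(ts for ts, _ in batch)
        return [row for _, row in batch]
```

The first call returns the full backlog; subsequent calls return only rows that arrived since.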

In a chatbot conversation, predicting user intent is essentially a classification problem. If we go the supervised way, it becomes a sequence (text) classification task where we can apply standard text classification techniques, including transformer models. Both supervised and unsupervised approaches are possible. Apart from that, we can also use similarity-based models, such as cross-encoders, to compare an utterance against stored intent examples.
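As a toy stand-in for that similarity-based approach, the sketch below matches an utterance to labeled intent examples with bag-of-words cosine similarity; a real system would use sentence embeddings or a cross-encoder instead, and the intents and examples here are made up.

```python
import math
from collections import Counter

def _vec(text):
    # Bag-of-words term counts as a crude stand-in for an embedding.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_intent(utterance, examples):
    """examples: {intent: [example utterances]}.
    Returns the intent whose example is most similar to the utterance."""
    best, best_score = None, 0.0
    u = _vec(utterance)
    for intent, texts in examples.items():
        for t in texts:
            score = _cosine(u, _vec(t))
            if score > best_score:
                best, best_score = intent, score
    return best
```

Swapping `_vec`/`_cosine` for an embedding model turns this nearest-example scheme into the production version of the same idea.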

One issue I see with this is that the port is not provided in the database connection, and usually the database is neither on the default port nor on localhost. The database name is also not provided, so the connection does not target the appropriate database and might connect to a different database on the same server; there is ambiguity about which database to connect to.

In the Java snippet, to handle exceptions in this scenario, there can be different failure modes: the file may not be found, or the file may be in an improper format so the text cannot be extracted. Using the PDF reader library's custom exceptions helps here, for example a file-not-found exception or other library-specific exceptions. We can make the handling more specific and order the catch blocks from specific to generic exceptions.
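The Java snippet the answer refers to is not shown here, but the specific-to-generic ordering it describes looks like this in Python (hypothetical function and error strings, used only to illustrate handler ordering):

```python
def read_document(path):
    """Handlers are ordered from the most specific failure mode to the
    most generic, mirroring specific-to-generic catch blocks in Java."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return "error: file not found"
    except UnicodeDecodeError:
        return "error: improper format"
    except OSError as exc:  # generic fallback comes last
        return f"error: {exc}"
```

If the generic handler came first, the more informative specific handlers would never run; most languages with typed exception handlers share this rule.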

When developing a scalable chatbot, the considerations include using caching, adding indexes in the database, and choosing a database that supports ACID properties. Taking concurrency into account, the database should not be susceptible to consistency or integrity issues under concurrent execution, which is highly likely in a scalable chatbot because a lot of requests are being processed at once.
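The caching consideration can be sketched with an in-process LRU cache; in a multi-instance deployment a shared store such as Redis would take its place, and the model call here is just a placeholder.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def answer(question: str) -> str:
    """Cache responses to repeated questions so identical requests skip
    recomputation; the body stands in for an expensive model call."""
    return question.upper()
```

`answer.cache_info()` exposes hit/miss counts, which also feed naturally into the monitoring dashboard discussed earlier.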

In the case of feedback, I would take it from the user directly, much as OpenAI does in ChatGPT with thumbs-up and thumbs-down on each generation. We can also present multiple generations and let the user select the best of them, and users can likewise indicate whether a single generation was helpful or not. This is how we can collect feedback to improve the model's accuracy.
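A minimal sketch of collecting that thumbs-up/down signal per generated response (hypothetical class and IDs); low-scoring responses can then be surfaced for retraining.

```python
from collections import defaultdict

class FeedbackStore:
    """Aggregate thumbs-up/down votes per response so poorly rated
    generations can be identified and fed back into training."""

    def __init__(self):
        self._votes = defaultdict(lambda: {"up": 0, "down": 0})

    def vote(self, response_id: str, helpful: bool) -> None:
        self._votes[response_id]["up" if helpful else "down"] += 1

    def helpfulness(self, response_id: str) -> float:
        v = self._votes[response_id]
        total = v["up"] + v["down"]
        # Unrated responses default to a neutral 0.5.
        return v["up"] / total if total else 0.5
```

Ranking responses by `helpfulness` gives a simple priority queue of examples to review or relabel.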

For maintaining the chatbot solution over time, we apply a CI/CD pipeline: an optimized deployment image is built using AWS CodeBuild and stored in ECR repositories. We take appropriate versioning of the builds, selecting the latest stable version for the production chatbot and a dev build for the development environment, where developers can integrate their code and test it before releasing to production. We can have a pre-production environment as well, where, before releasing to production, we can evaluate accuracy and efficacy over production data.
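A hypothetical CodeBuild buildspec sketch of the image-build-and-push step described above; `ECR_REPO` is an assumed environment variable holding the full ECR repository URI, and `CODEBUILD_RESOLVED_SOURCE_VERSION` is CodeBuild's commit identifier, used here as the version tag.

```yaml
# buildspec.yml - illustrative sketch, not a drop-in pipeline.
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate Docker against the ECR registry.
      - aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REPO"
  build:
    commands:
      # Tag the image with the commit hash for traceable versioning.
      - docker build -t "$ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION" .
  post_build:
    commands:
      - docker push "$ECR_REPO:$CODEBUILD_RESOLVED_SOURCE_VERSION"
```

Dev, pre-production, and production environments would then each pin the image tag appropriate to their release stage.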