Machine Learning Engineer, Liaison Medicare Pharma
Analyst - I, Aon
Associate Data Scientist, Syntax Edutek
AWS SQS
AWS SES
AWS Lambda
AWS S3
AWS Sagemaker
Vertex AI
Keras
YOLO
OCR
Metabase
I have 3.6 years of experience working in machine learning and computer vision. Recently I have been working on a project related to large language models, where I work mostly on auditing-related use cases. I currently work as a Machine Learning Engineer at Liaison Medicare Pharma, and before that I worked as an analyst at Aon, mostly on the risk assessment side of things. I also have model deployment experience using AWS SageMaker and GCP Vertex AI: I have deployed model inferences to REST API endpoints, and I maintain a microservice under my ownership that serves those model inferences to the end user.
To ensure the AI chatbot can handle various data formats such as PDFs and Excel sheets without significant modifications, we need to standardize the data into a common textual format. To process PDF files we can use libraries like pdfplumber, and to process Excel sheets we can use pandas. To extract tables and other layout elements from PDFs, we can use a layout-aware language model such as LayoutLM together with OCR, which reads the text across different PDF layouts and converts it into text format.
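A minimal sketch of that standardization step, assuming pdfplumber for PDFs and pandas for spreadsheets; the helper name and the CSV-style flattening of sheets are illustrative choices, not a fixed design:

```python
import pdfplumber
import pandas as pd

def to_text(path: str) -> str:
    """Normalize a PDF or Excel file into plain text (hypothetical helper)."""
    lower = path.lower()
    if lower.endswith(".pdf"):
        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for image-only pages
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if lower.endswith((".xlsx", ".xls")):
        # sheet_name=None loads every sheet as a separate DataFrame
        sheets = pd.read_excel(path, sheet_name=None)
        return "\n\n".join(df.to_csv(index=False) for df in sheets.values())
    raise ValueError(f"Unsupported format: {path}")
```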
To implement real-time monitoring and logging for tracking the AI chatbot's performance, we can continuously monitor key metrics such as generation accuracy, perplexity, and similar measures, and push them to a live dashboard. For monitoring and logging, we can use cloud provider solutions such as CloudWatch, which is on AWS, or similar services.
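As a sketch of the metric side, assuming CloudWatch custom metrics via boto3; the namespace and metric names are placeholders, and the values would come from the chatbot's evaluation loop rather than being hard-coded:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
# Publish a few illustrative quality/latency metrics; these can then be graphed
# and alarmed on in a CloudWatch dashboard.
cloudwatch.put_metric_data(
    Namespace="Chatbot/Quality",
    MetricData=[
        {"MetricName": "ResponseLatencyMs", "Value": 412.0, "Unit": "Milliseconds"},
        {"MetricName": "Perplexity", "Value": 18.3, "Unit": "None"},
        {"MetricName": "ThumbsDownCount", "Value": 1.0, "Unit": "Count"},
    ],
)
```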
To reduce false positives in user intent recognition for the chatbot, I would track precision and recall as metrics for the recognized intents; precision in particular drops directly when false positives increase. Apart from that, we can use class weights so that the negative class carries more weight than the positive one, which penalizes the model for firing an intent that is not actually there. Similar methodologies apply across intent recognition in general, and while training the intent recognition model we can also oversample the negative intents.
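A minimal sketch of the class-weighting idea on a toy binary intent, assuming scikit-learn; the features, labels and the 2:1 weight ratio are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

rng = np.random.RandomState(0)
X = rng.randn(300, 8)                      # toy utterance features (illustrative)
y = (rng.rand(300) > 0.8).astype(int)      # 1 = target intent, 0 = everything else

# Weight the negative class more heavily so the model is penalised harder for
# firing the intent when it is not there, i.e. for false positives.
clf = LogisticRegression(class_weight={0: 2.0, 1: 1.0}).fit(X, y)

pred = clf.predict(X)
print("precision:", precision_score(y, pred, zero_division=0))
print("recall:   ", recall_score(y, pred, zero_division=0))
```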
To ensure that the data analysis component of the chatbot is optimized for performance and scales well, we can run it on infrastructure geared towards scaling, such as an elastic cluster. Apart from that, we can also use parallel processing techniques and improve the data pipelines so that data is taken in as incremental loads; that way we do not re-process the entire dataset for every analysis, only the data that has newly arrived.
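A minimal sketch of the incremental-load idea, assuming a pandas pipeline and a hypothetical `updated_at` column as the watermark; the column name and CSV source are assumptions:

```python
import pandas as pd

def load_increment(path: str, last_seen: pd.Timestamp) -> pd.DataFrame:
    """Return only the rows that arrived after the previous analysis run."""
    df = pd.read_csv(path, parse_dates=["updated_at"])
    return df[df["updated_at"] > last_seen]

# Each run remembers the newest timestamp it processed and passes it in next
# time, so the full dataset is never re-analysed from scratch.
```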
In a chatbot conversation, a model to predict user intents, if we go the supervised route, is a classification problem, specifically a sequence (text) classification problem. We can use transformer models for this and fine-tune them on labeled intents. We can also combine the supervised approach with a zero-shot approach. Apart from that, we can use models such as cross-encoders to compare the similarity of an utterance against example utterances for each intent.
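As a quick sketch of the zero-shot route, assuming the Hugging Face transformers pipeline with an NLI model; the candidate intent labels are made up, and a fine-tuned sequence classifier or a cross-encoder over example utterances would follow the same shape:

```python
from transformers import pipeline

# Zero-shot intent recognition with an off-the-shelf NLI model.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "I was charged twice for my last order",
    candidate_labels=["refund request", "order status", "technical support", "small talk"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```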
One issue I see with this is that the port is not provided in the database connection, and usually the database is not on the default port, nor on localhost. The database name is also not provided, so it is not necessarily taking the appropriate database; it might connect to a different database on the same server. So there is ambiguity about which database it connects to.
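The original snippet isn't shown, so as an assumed Python/PostgreSQL equivalent, a connection with the port and database name made explicit would look roughly like this; all values are placeholders:

```python
import psycopg2

# Making the host, port and database name explicit removes the ambiguity
# described above.
conn = psycopg2.connect(
    host="db.internal.example.com",
    port=5433,                 # explicit, since the server is not on the default port
    dbname="chatbot_prod",     # explicit database rather than the server default
    user="chatbot_app",
    password="***",
)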
In the Java snippet, to handle exceptions in this scenario, there are different failure cases: the file might not be found, or the file might be in an improper format so its text cannot be used. Using the custom exceptions of the PDF reader library helps here; for example, a file-not-found exception or the library's other custom exceptions. We can make the handling more specific and order the catch blocks from specific to generic exceptions.
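The Java snippet itself isn't shown here, so as a sketch of the same specific-to-generic ordering written in Python, with pdfplumber standing in for the PDF reader library:

```python
import logging
import pdfplumber
from pdfminer.pdfparser import PDFSyntaxError  # raised by pdfplumber's backend on malformed PDFs

log = logging.getLogger(__name__)

def read_pdf_text(path: str) -> str:
    try:
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    except FileNotFoundError:
        # Most specific case first: the file simply is not there.
        log.error("PDF not found: %s", path)
        raise
    except PDFSyntaxError as exc:
        # Library-specific case next: the file exists but is not a valid PDF.
        raise ValueError(f"{path} is not a readable PDF") from exc
    except Exception:
        # Generic fallback last, so unexpected failures are still surfaced.
        log.exception("Unexpected error while reading %s", path)
        raise
```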
When developing a scalable chatbot, there are some considerations such as using caching, using indexes in the database, and choosing a database that supports ACID properties. We also have to take concurrency into account: the database should not be susceptible to consistency or integrity issues under concurrent execution, which is highly likely in a scalable chatbot because a lot of requests are being processed at once.
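A minimal sketch of the caching side, assuming Redis as the cache and a hypothetical `generate` callable for the model call; connection details and the one-hour expiry are placeholders:

```python
import hashlib
import redis

cache = redis.Redis(host="localhost", port=6379)

def answer_with_cache(question: str, generate) -> str:
    """Serve repeated questions from the cache instead of re-running the model."""
    key = "answer:" + hashlib.sha256(question.encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()
    answer = generate(question)
    cache.setex(key, 3600, answer)  # expire after an hour so stale answers age out
    return answer
```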
For feedback, I would take it from the user directly. Whenever the model generates a response, just as OpenAI does in ChatGPT, the user can give thumbs-up or thumbs-down feedback. We can also show multiple generations so the user selects the best of them, and the user can also indicate whether a single generation is helpful or not. That is how we can collect feedback to improve the model's accuracy over time.
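A minimal sketch of what capturing that feedback could look like; the event fields and the JSONL sink are illustrative assumptions:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class FeedbackEvent:
    """One thumbs-up/down (or best-of-N choice) on a chatbot response."""
    conversation_id: str
    response_id: str
    rating: int             # +1 thumbs up, -1 thumbs down
    rejected_ids: list      # alternative generations the user did not pick, if any
    timestamp: str

def log_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    # Append as JSON lines so the events can later feed evaluation or fine-tuning.
    with open(path, "a") as fh:
        fh.write(json.dumps(asdict(event)) + "\n")

log_feedback(FeedbackEvent("conv-1", "resp-3", +1, ["resp-4"],
                           datetime.now(timezone.utc).isoformat()))
```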
For maintaining the chatbot solution over time, we apply a CI/CD pipeline where we have an optimized deployment image. For example, in the case of AWS, we can have ECR repositories into which AWS CodeBuild builds the deployment image. We use appropriate versioning of the builds: the latest stable version for the production chatbot, plus a dev build that runs in the development environment, where developers can integrate their code and test it out before production. We can also have a pre-production environment, similar to production, where before releasing into production we check the accuracy and efficacy over production data.
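As one small piece of that pipeline, a sketch of promoting a CodeBuild-produced image in ECR by re-tagging it with a release version; the repository and tag names are placeholders:

```python
import boto3

# Hypothetical promotion step: re-tag the image that passed pre-production
# checks with a stable version tag that the production deployment points at.
ecr = boto3.client("ecr")
repo, build_tag, release_tag = "chatbot-inference", "build-123", "v1.4.0"

manifest = ecr.batch_get_image(
    repositoryName=repo,
    imageIds=[{"imageTag": build_tag}],
)["images"][0]["imageManifest"]

ecr.put_image(repositoryName=repo, imageManifest=manifest, imageTag=release_tag)
```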