Vetted Talent

Sanjiv Gupta

Experienced in fine-tuning and prompt engineering of LLMs such as GPT-3.5, Llama-2, and Mistral, including RAG models. Proven expertise in Generative AI, LangChain, OpenAI models, LlamaIndex, RAG, Hugging Face, and LLM fine-tuning. My journey in AI started with a strong foundation in Electrical and Electronics Engineering from MIT College of Engineering, Pune, which has been instrumental in developing my analytical and problem-solving skills. With a focus on RAG and chatbot technologies, my team and I have built intelligent systems that have significantly improved client interaction and service delivery. My commitment to innovation and a collaborative approach has been key to delivering projects that meet and exceed client expectations, fostering a culture of excellence and continuous improvement within our organization.

  • Role

    AI/ML/LLM Engineer

  • Years of Experience

    6.10 years

  • Professional Portfolio

    View here

Skillsets

  • Deep Learning - 4 Years
  • Generative AI - 2 Years
  • Automation Frameworks
  • Data Engineering
  • Data Visualization
  • Database management
  • Exploratory data analysis
  • Image preprocessing
  • LLM Fine-tuning
  • Machine Learning
  • Natural Language Processing
  • Object detection
  • Prompt Engineering
  • Scripting
  • Vector databases

Vetted For

12 Skills
  • Machine Learning Engineer / Data Scientist (Remote), AI Screening
  • Result: 70%
  • Skills assessed: Scikit-learn, spaCy, Speech Recognition, Computer Vision, Natural Language Processing (NLP), NLTK, PyTorch, speech recognition APIs, TensorFlow, AWS, Machine Learning, Python
  • Score: 63/90

Professional Summary

6.10 Years
  • Aug, 2019 - Present 6 yr 9 months

    AI/ML/LLM Engineer

    Applied AI Consulting
  • Aug, 2018 - Jul, 2019 11 months

    Programming Analyst Trainee

    Cognizant
  • May, 2018 - Aug, 2018 3 months

    ML Engineer

    UshaiTechLabs Pvt Ltd

Applications & Tools Known

  • Python
  • MATLAB
  • Django
  • Docker
  • Kubernetes
  • Terraform
  • Elasticsearch
  • Kibana
  • Kafka
  • RabbitMQ
  • Datadog
  • Argo CD
  • Cucumber
  • Smartsheets
  • LangChain
  • PostgreSQL
  • AWS (Amazon Web Services)
  • FastAPI

Work History

6.10 Years

AI/ML/LLM Engineer

Applied AI Consulting
Aug, 2019 - Present 6 yr 9 months
    • Built a specialized AI chatbot tailored for the QA community using Generative AI, LangChain, RAG, PGVector, Postgres DB, and LLM models to facilitate knowledge sharing.
    • Designed and developed an AI/ML model testing platform with a synthetic data generation feature to test, analyze, and interpret ML models using Python, data augmentation, DynamoDB, and the OpenAI API.
    • Debugged and implemented fixes for a data analytics platform using Python, Django, Docker, Kubernetes, Facebook Graph API, Twitter API, Datadog, Runscope, Argo CD, Elasticsearch, Kibana, Kafka, and RabbitMQ, ensuring data integrity and optimizing performance.
    • Developed an IVR call automation system using Plivo and custom-trained AI models, reducing call handling time by 50%; integrated AWS Transcribe with 98% transcription accuracy and used OpenAI's language model for 90% accurate information extraction.
    • Developed the in-house SaaS product Marxeed from scratch using Python, the Serverless framework, Google Custom Search Engine, Sheets API, Slides API, LinkedIn API, Ritekit API, and AWS services, writing unit tests with 90% code coverage and implementing CI/CD pipelines for deployment to multiple environments.
    • Developed a CRM application using Strapi CMS, created Docker build images for different environments, and deployed an AWS ECS Fargate cluster.
    • Designed the architecture for an MS SQL Server to Postgres DB migration with 99% query conversion accuracy and wrote 200+ test cases with 95% coverage using Piggly.

Programming Analyst Trainee

Cognizant
Aug, 2018 - Jul, 2019 11 months
    • Engineered a Python equivalent of the CRAFT BDD testing framework, originally developed in Java, enhancing testing efficiency and flexibility.
    • Automated script generation in Cucumber, decreasing test case development time by 60% and ensuring consistency and accuracy in testing procedures.
    • Conducted training sessions for team members on using the new testing framework, ensuring smooth adoption and transition.
    • Integrated additional functionality into the testing framework using Nightwatch.js, such as parallel test execution and reporting, resulting in a 30% reduction in test execution time.

ML Engineer

UshaiTechLabs Pvt Ltd
May, 2018 - Aug, 2018 3 months
    • Provided mentorship and guidance to over 50 Bachelor's, Master's, and Ph.D. students in developing projects related to Machine Learning and Computer Vision using Python and MATLAB.
    • Applied TensorFlow, PyTorch, and Scikit-learn to develop over 30 projects encompassing a wide range of machine learning techniques, including CNNs, RNNs, LSTMs, classification, regression, and price prediction.
    • Led projects covering diverse areas such as NLP, speech recognition, object detection and segmentation, face recognition, and sentiment analysis, providing students with exposure to various domains within machine learning and computer vision.
    • Oversaw the entire project lifecycle, including exploratory data analysis (EDA), data preprocessing, feature selection, model training, and testing, ensuring students gained hands-on experience in all aspects of project development.

Achievements

  • Implemented changes according to Social Media platforms API changelog, maintaining compatibility with evolving APIs and ensuring seamless integration with social media platforms.
  • Developed and deployed an IVR call automation system for Promantra, reducing claim status retrieval time by 50%.
  • Developed and implemented a BDD Automation code generation system using OpenAI model, resulting in a 60% reduction in time spent on writing automation scripts.
  • Spearheaded the design and implementation of the synthetic data generation feature, resulting in a 30% increase in data synthesis efficiency and a 50% reduction in time required for model testing setup.
  • Architected and developed the QualityX Chatbot using LangChain, seamlessly integrating with existing QA tools, resulting in a 40% reduction in average query response time.
  • Spearheaded the creation and management of AWS infrastructure using Serverless architecture, resulting in a 30% reduction in infrastructure costs and a 50% improvement in scalability.
  • Automated Smartsheets data updates, reducing manual effort by 80% and ensuring data accuracy and consistency.

Major Projects

8 Projects

QualityX

Applied AI Consulting
Oct, 2023 - Dec, 2023 2 months
    • Architected and developed the QualityX Chatbot using LangChain, seamlessly integrating with existing QA tools, resulting in a 40% reduction in average query response time.
    • Developed a robust system for user contributions to the knowledge base, ensuring quality and relevance.
    • Implemented secure storage for user interaction history, facilitating learning from past queries.
    • Designed an intuitive onboarding process and created interactive tutorials to enhance user experience, leading to a 50% increase in user engagement within the first month of launch.
    • Monitored user feedback and analytics to enhance Chatbot performance and user satisfaction.

AiTest: AI Model Testing and Analysis

Applied AI Consulting
Aug, 2023 - Dec, 2023 4 months
    • Spearheaded the design and implementation of the synthetic data generation feature, resulting in a 30% increase in data synthesis efficiency and a 50% reduction in time required for model testing setup.
    • Integrated the platform with the ability to accept both model files and model endpoints, providing users with 70% more flexibility in testing models across various environments.
    • Implemented various testing modules, resulting in a 20% improvement in overall model testing coverage and a 40% increase in the detection of potential model vulnerabilities.
    • Developed data analysis tools to generate comprehensive graphs and visualizations for each test, providing users with actionable insights into their model's performance and behavior.

AiTest: Automation Copilot

Applied AI Consulting
May, 2023 - Aug, 2023 3 months
    • Developed and implemented a BDD Automation code generation system using OpenAI model, resulting in a 60% reduction in time spent on writing automation scripts.
    • Designed and implemented a solution that lets users upload Selenium IDE recordings and automatically generates Feature files, Step Definitions, and Page Objects, reducing manual script writing time by 70%.
    • Utilized OpenAI's model to automate the generation of code artifacts, improving efficiency by 50% and reducing manual effort in script development.

Promantra

Applied AI Consulting
Feb, 2023 - Jul, 2023 5 months
    • Developed and deployed an IVR call automation system for Promantra, reducing claim status retrieval time by 50%.
    • Integrated AWS Transcribe for call audio transcription, achieving an accuracy rate of 95% in converting audio to text.
    • Implemented a dynamic dialing mechanism for accurate data retrieval and collaborated on system design and integration.

Analec DB Migration

Applied AI Consulting
Aug, 2022 - Jan, 2023 5 months
    • Designed and implemented the architecture for migrating MS SQL Server to PostgreSQL databases, ensuring a smooth transition process and minimizing downtime.
    • Successfully converted over 500 MS SQL server queries to PostgreSQL queries, achieving a 100% conversion rate and maintaining data integrity.
    • Developed comprehensive test cases for migrated queries, covering 100% of query functionalities and ensuring accuracy and reliability.

Skyword-Trackmaven

Applied AI Consulting
Apr, 2020 - Jan, 2023 2 yr 9 months
    • Implemented changes according to Social Media platforms API changelog, maintaining compatibility with evolving APIs and ensuring seamless integration with social media platforms.
    • Monitored Datadog dashboard for system performance and discrepancies, ensuring 99.9% uptime and proactively identifying potential issues.
    • Automated Smartsheets data updates, reducing manual effort by 80% and ensuring data accuracy and consistency.

VedasLabs

Applied AI Consulting
Apr, 2020 - Jan, 2023 2 yr 9 months
    • Engineered a CRM application tailored for a social venture funding platform, enhancing user engagement and increasing platform adoption by 40%.
    • Developed Strapi deployment code, streamlining the deployment process and reducing deployment time by 50%.
    • Created Docker build images for different environments, improving development and testing efficiency by 30%.
    • Deployed AWS ECS Fargate cluster, achieving 99.9% uptime and ensuring high availability of the platform.

Marxeed

Applied AI Consulting
Oct, 2019 - Mar, 2020 5 months
    • Spearheaded the creation and management of AWS infrastructure using Serverless architecture, resulting in a 30% reduction in infrastructure costs and a 50% improvement in scalability.
    • Led the development of Python codebase for Marxeed, implementing robust and efficient algorithms to generate curated content for marketing campaigns, resulting in a 40% increase in content generation speed.
    • Orchestrated the implementation of CI/CD pipelines for seamless deployment across multiple environments, reducing deployment time by 60% and enhancing overall productivity.

Education

  • Bachelor of Engineering in Electronics and Telecommunications

    MIT College of Engineering Pune (2018)
  • Diploma in Electronics and Telecommunications

    Cusrow Wadia Institute of Technology Pune (2015)
  • SSC

    Indian Education Society School Pune (2012)

Certifications

  • AWS Certified Developer - Associate

    AWS
  • AWS Machine Learning - Specialty

  • AWS Machine Learning Foundations

  • Improving deep neural networks

  • Exploratory data analysis

  • Introduction to self-driving cars

  • Neural networks and deep learning

Interests

  • Badminton
  • Cricket
  • Trekking
  • Travelling

AI-interview Questions & Answers

    Okay. Let me go through my background. I'm Sanjiv, and I started working at Applied AI in August 2019. Initially, I worked on Python and AWS, and my first task was to turn a desktop app into a serverless app. I used AWS and the Serverless Framework so that the app exposed multiple APIs and ran on serverless AWS services such as S3, DynamoDB, Lambda, and SQS. So, yeah, that was a product. Later on, I continued working on serverless APIs and AWS, and I created a tool where the user can generate APIs for a microservice. I call it a serverless API generator, and it was developed completely by me. After that, I was placed on a customer project built around a data analytics platform, where the customer can track the progress of their own social media handles as well as their competitors'. We collected analytics from Facebook, Instagram, Twitter, and LinkedIn, and all of these were shown to the customer in a UI. Keeping that app up and running, fixing issues when they occurred, backfilling bad data, and all those things were handled by me. Then I was part of a project where I used the ChatGPT (OpenAI) APIs for a health analytics platform. The company was trying to build an IVR bot that calls an IVR number, listens to the call, presses the numbers automatically, and gets all the data required for a particular patient. There, AWS Transcribe was used for speech-to-text, and Comprehend was used to identify the entities. After the advent of ChatGPT, I mostly worked on internal products. I built a machine learning testing platform where users can come in with their ML models and upload sample data. From that data, synthetic data is generated in different categories: for testing the robustness of the model, for exploratory tests, and for stress tests. This synthetic data can then be used to test the ML model before it goes into production. The test runs all the generated data against the model and produces a report with the test-related metrics. I also built a chatbot for internal company FAQs. Previously, users had a long turnaround time for getting issues resolved, even when the answer could be found directly in the website data or other relevant materials. That material was integrated into a RAG app where the sources are defined; a source can be a website, a PDF, or YouTube videos. A chatbot was developed on top of that, and users can ask it their questions and get answers. Also, I designed an automated test case generation platform where the user can upload a Selenium recording, and from that recording we generate the automation tests. So, yeah, that's pretty much it.

    So, the next one is about implementing tests for a FastAPI service that interfaces with LLMs. Yeah, I have used FastAPI in one of my projects. It was recently, for one of the clients, where we had to design an Excel-to-DB service: there were too many Excel files, and they wanted that data to be part of a database so that it could be queried. I used FastAPI to design the APIs for that, so they can send different queries to the API, and the answers come from those Excel sheets, which have been converted to a database format, including designing the schema and all that. When we want that FastAPI service to interface with LLMs, FastAPI would basically expose some APIs, for example for a chatbot where a user can ask a question to that API, and that API would interact with the LLM. The integration of FastAPI with the LLM can use LangChain as a backend service, where FastAPI calls the LangChain methods and functions. LangChain is widely used nowadays for building chatbots with a RAG-based approach. It supports most of the top LLMs, whether it's OpenAI or any open-source LLM. So I think LangChain can do the integration for that.
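
    The testing side of that answer can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's code: `LLMClient`, `answer_question`, and `FakeLLM` are hypothetical names. The idea is to keep the route-handler logic in a plain function that receives the LLM client as an argument, so a test can pass in a fake instead of calling a real model.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


def answer_question(question: str, llm: LLMClient) -> dict:
    """Handler logic: validate the input, call the LLM, shape the response."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    prompt = f"Answer the user's question.\nQuestion: {question}"
    return {"question": question, "answer": llm.complete(prompt)}


class FakeLLM:
    """Test double that records prompts and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


# A test exercises the handler without any network call.
fake = FakeLLM("Paris")
result = answer_question("  What is the capital of France? ", fake)
```

    In a FastAPI app, a function like this would typically be called from the route, and the fake client could be supplied in tests through the app's dependency overrides; that wiring is left out here so the sketch stays dependency-free.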

    The question is about a testing strategy for evaluating the coherence of responses from a dialogue system. Okay. So what I understand is that the dialogue system can be assumed to be a chatbot powered by ChatGPT, where a conversation is being built: the user asks a question, gets an answer, then asks a follow-up question. Right? For evaluating the coherence of a response: any LLM giving out responses can hallucinate at times, so we need a testing strategy for the output so that it is tested against all of these things. It should not be biased; it should be fair. All these things need to be tested, and also whether the response is coherent or not. For this, there are multiple libraries. One is an evals library for LLMs, which tells you whether the response is biased, its fairness coefficient, and how coherent it is, by measuring the similarity between the question asked and the response.
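
    The similarity check mentioned at the end can be illustrated with a small sketch. It uses a bag-of-words cosine similarity as a crude stand-in for what an evaluation library or embedding model would compute; the function names and the 0.1 threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a real system would use embeddings instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = tokenize(a), tokenize(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flag_off_topic(turns, threshold=0.1):
    """Return indices of (question, response) turns scoring below threshold."""
    return [
        i for i, (question, response) in enumerate(turns)
        if cosine_similarity(question, response) < threshold
    ]


turns = [
    ("Where is the Eiffel Tower?", "The Eiffel Tower is in Paris, France."),
    ("How tall is the tower?", "Bananas are rich in potassium."),
]
off_topic = flag_off_topic(turns)
```

    Word overlap only catches responses that drift completely away from the question; bias, fairness, and hallucination checks need separate tests.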

    In what ways can multithreading be leveraged in a Python-based LLM application to improve performance? So, if there is a Python-based LLM application, multithreading is mostly useful when multiple users come in at the same time, so multiple questions are being asked to the LLM at around the same time. We can apply threading there, where multiple threads invoke the LLM in the backend at the same time, taking the questions from all users at once, so the answers to all those questions are fetched at the same time. That's the most important part for the user experience, because users would not have to wait too long, and it would be one of the primary places where we can use multithreading. Apart from that, there is the case where new data comes in and we want to reindex for the LLM: if it's a RAG-based pipeline with new resources coming in, or if we have to train the LLM again, the data comes from different sources, or from multiple clients if it's deployed for them. That can be done via multithreading, where the training or indexing part uses multiple threads to get the data from multiple sources at once. In that way, we save time compared with fetching or indexing the data sequentially before saving it into a vector database.
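
    A minimal sketch of the first idea, several user questions answered concurrently, could look like this. `call_llm` is a hypothetical stand-in that sleeps to simulate a blocking network call to a model; threads help here because the work is I/O-bound waiting rather than CPU-bound computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_llm(question: str) -> str:
    """Hypothetical stand-in for a blocking LLM API call."""
    time.sleep(0.2)  # simulates network latency
    return f"answer to: {question}"


def answer_all(questions: list[str], max_workers: int = 8) -> list[str]:
    # pool.map returns results in the same order as the input questions
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, questions))


questions = [f"question {i}" for i in range(8)]
start = time.perf_counter()
answers = answer_all(questions)
elapsed = time.perf_counter() - start  # well under the sequential 8 * 0.2 s
```

    The same pattern would apply to fetching documents from several sources before indexing them into a vector database.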

    What protocol would you implement to live-update a leaderboard of prompt performance metrics? So, for live updating the leaderboard of prompts, we can use any messaging queue kind of service or protocol, for example a RabbitMQ queue. If it's in AWS, we can have SQS, right? So SQS, or open-source messaging queues like Kafka or RabbitMQ, where the application continuously sends the prompt metrics to this queue, and this queue is integrated with the leaderboard. In one of my projects, I integrated a Kafka messaging queue with Datadog. Datadog was the dashboard where all the metrics were shown for the data analytics platform, covering Facebook and all the other social media channels. The metrics were continuously sent to that Datadog dashboard, so the dashboard was a central point where we kept an eye on things. If a metric dropped under a threshold, like 97 or 98%, the color of that box became red, and we also got an email, so we could then debug the issue with that retrieval or service. So, we can have a RabbitMQ or Kafka messaging queue integrated with the application or model, which continuously sends the prompt metrics to the Datadog dashboard, where they are updated live.
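
    The producer/queue/consumer shape described above can be sketched in-process. Here Python's `queue.Queue` stands in for RabbitMQ, Kafka, or SQS, and the `Leaderboard` class is a hypothetical stand-in for the dashboard; the event format (prompt id, score) is also an assumption.

```python
import queue
import threading
from collections import defaultdict

STOP = object()  # sentinel telling the consumer to shut down


class Leaderboard:
    """Keeps a running mean score per prompt, safe to update from a thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, prompt_id: str, score: float) -> None:
        with self._lock:
            self._totals[prompt_id] += score
            self._counts[prompt_id] += 1

    def ranking(self) -> list:
        with self._lock:
            means = {p: self._totals[p] / self._counts[p] for p in self._totals}
        return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def consume(metrics_queue: queue.Queue, board: Leaderboard) -> None:
    """Drain metric events from the queue until the sentinel arrives."""
    while True:
        item = metrics_queue.get()
        if item is STOP:
            break
        board.record(*item)


metrics_queue = queue.Queue()
board = Leaderboard()
worker = threading.Thread(target=consume, args=(metrics_queue, board))
worker.start()

# The application side: publish (prompt_id, score) events as they occur.
for event in [("prompt_a", 0.9), ("prompt_b", 0.6), ("prompt_a", 0.7), ("prompt_b", 0.8)]:
    metrics_queue.put(event)
metrics_queue.put(STOP)
worker.join()
ranking = board.ranking()
```

    With a real broker, the producer and consumer would run in separate processes, and the consumer would push to the dashboard instead of an in-memory object.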

    The question is about implementing caching mechanisms in Python for frequent LLM requests. Yeah. So the chatbot that I had built had different mechanisms apart from getting the answers from the vector DB. It also had an option whether to save the conversation in the history or not. When the history is saved, we keep the questions asked and the answers, so that the next time the user comes in, they have the history of what questions they asked, similar to ChatGPT. That is stored in the DB along with the user details, the question, and the response they got. And yes, on the server side, I mostly work in AWS. There, the application is mostly served using CloudFront. CloudFront is an AWS service with multiple edge locations, so that a user sitting in Mumbai gets lower latency even if the server is in North Virginia. The caching is implemented in CloudFront itself, so that the next time the user asks a question, it is served from the nearest edge location and not from the origin server. So that is one thing. And yeah, I think the options are saving the data as persistent state in a database, or using CloudFront for caching.
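
    Alongside the database history and CloudFront options in this answer, an application-level cache is another common approach. The sketch below is an assumption-laden illustration, not something described in the answer: `AnswerCache` is a hypothetical in-memory cache with a time-to-live, keyed on a normalized form of the question so trivially different spellings share one entry.

```python
import time


class AnswerCache:
    """In-memory cache for LLM answers with a time-to-live per entry."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so near-identical questions match.
        return " ".join(question.lower().split())

    def get_or_compute(self, question: str, compute) -> str:
        key = self._key(question)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached answer, no LLM call
        answer = compute(question)
        self._store[key] = (now, answer)
        return answer


calls = []


def slow_llm(question: str) -> str:
    """Hypothetical expensive call; records how often it is invoked."""
    calls.append(question)
    return f"answer #{len(calls)}"


cache = AnswerCache(ttl_seconds=60)
first = cache.get_or_compute("What is RAG?", slow_llm)
second = cache.get_or_compute("  what is   rag? ", slow_llm)  # served from cache
```

    An in-process dictionary only helps a single server instance; a shared store would be needed once the service runs on several instances.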

    The question is why we are receiving incomplete descriptions for some of the images. Okay. So, the pipeline has been defined, and the image path is passed to the function, and that function invokes the pipeline with the image path. But the max length is 50, and that parameter means the output description cannot be longer than 50 characters. That is the reason why, for some of the images, we receive incomplete descriptions: if the description is too long, it will be clipped or cut at around the 50-character length. That is what I found. To debug this, I would take a trial approach with the max length. We'll see what kind of images we want to test on, and based on the images or the complete dataset, we can define a max length that would suffice for most of the images, which surely is not 50. Fifty characters is far too short. So we can pick some standard length for a description, or a larger value if we want each description to be lengthy. In that way, we can define that. So, yeah.
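
    The "pick the limit from the dataset" step could be sketched like this. The original pipeline code is not shown, so everything here is hypothetical: the helper names, the whitespace tokenization, the 95th percentile, and the headroom of 5. One caveat worth checking against the actual code: in Hugging Face generation settings, `max_length` counts tokens rather than characters.

```python
import math


def token_count(text: str) -> int:
    # Crude whitespace tokenization; a model tokenizer would count differently.
    return len(text.split())


def suggest_max_length(reference_captions, percentile=0.95, headroom=5):
    """Pick a limit that covers most reference captions, plus some headroom."""
    counts = sorted(token_count(c) for c in reference_captions)
    if not counts:
        raise ValueError("need at least one reference caption")
    idx = min(len(counts) - 1, math.ceil(percentile * len(counts)) - 1)
    return counts[idx] + headroom


def is_probably_truncated(caption: str, max_length: int) -> bool:
    """A caption that reaches the limit was likely cut off."""
    return token_count(caption) >= max_length


captions = [
    "a dog running on the beach",
    "two children playing football in a park on a sunny afternoon",
    "a red car parked outside a tall glass office building at night in the rain",
]
limit = suggest_max_length(captions)
```

    Running `is_probably_truncated` over generated captions gives a quick way to confirm whether a chosen limit is still clipping output.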

    Generate prompts. So the function is generate prompt. A task is sent in along with the text. For this example, we are sending task as 2 and the text "Where is the Eiffel Tower?". Prompts is a list of prompts, but the length of prompts is 2, and we are indexing into it with 2. So prompts[2] will be out of range, because Python uses 0-based indexing. If you want the second task, summarizing, it should be 1. If you want English to French, it should be 0. But 2 is causing the error at prompts[task]. That's an index out of range exception, and that is what is causing the issue. I can see the question is "Where is the Eiffel Tower?", and I don't think there is much to summarize here. So if you want to translate English to French, it should be 0, and if you want to summarize, it should be 1.
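
    The bug described here can be reproduced and guarded against with a short sketch. The original function is not shown in the transcript, so the template wording and the names `PROMPTS` and `generate_prompt` are assumptions; only the two tasks (English-to-French at index 0, summarization at index 1) come from the answer.

```python
PROMPTS = [
    "Translate English to French: {text}",   # task 0
    "Summarize the following text: {text}",  # task 1
]


def generate_prompt(task: int, text: str) -> str:
    """Build a prompt, rejecting task indexes outside the list."""
    if not 0 <= task < len(PROMPTS):
        raise ValueError(
            f"task must be between 0 and {len(PROMPTS) - 1}, got {task}"
        )
    return PROMPTS[task].format(text=text)


prompt = generate_prompt(0, "Where is the Eiffel Tower?")

# Without the guard, PROMPTS[2] raises IndexError: list index out of range.
try:
    PROMPTS[2]
    unguarded_error = None
except IndexError as exc:
    unguarded_error = exc
```

    An explicit check turns the bare index error into a message that states the valid range, which is easier to act on for callers.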

    Okay. The question is about the procedure to transition a synchronous LLM API to an asynchronous one while maintaining API contract integrity. Assuming this is mostly in Python, we can use Python's built-in asynchronous support: the usual synchronous calls can be converted to asynchronous ones by using the async and await keywords. FastAPI also supports asynchronous calls, so that should not be an issue, and it can be done even if we are using Flask or Django. Because the question asks about API contract integrity, the API's structure and the tech stack should be retained, not changed. That's why I would not think about completely changing the tech stack for better asynchronous calls. Instead, we can use async and await from Python itself for the asynchronous architecture.
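
    One low-risk way to do this, sketched under assumptions: keep the existing blocking function untouched and add an `async` wrapper with the same parameters and return shape, so callers see the same contract. `complete_sync` and its response fields are hypothetical; `asyncio.to_thread` is standard library in Python 3.9 and later.

```python
import asyncio
import time


def complete_sync(prompt: str, max_tokens: int = 16) -> dict:
    """Hypothetical existing blocking call; its signature is the contract."""
    time.sleep(0.2)  # simulates waiting on the model
    return {"prompt": prompt, "text": f"echo: {prompt}", "max_tokens": max_tokens}


async def complete_async(prompt: str, max_tokens: int = 16) -> dict:
    # Same parameters and return shape; the blocking work runs in a thread
    # so the event loop stays free to serve other requests.
    return await asyncio.to_thread(complete_sync, prompt, max_tokens)


async def main() -> list:
    return await asyncio.gather(*(complete_async(p) for p in ["a", "b", "c"]))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start  # the three calls overlap
```

    Once the underlying client offers a native async call, the wrapper body can switch to it without changing the function's signature.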

    So I mostly go for serverless apps. Assuming I would deploy this on AWS, once the app is ready code-wise, the front end and the back end, the front-end code will be deployed. Suppose the code is residing somewhere, maybe as a package in an AWS Lambda function or some other way. The URL will be served using Route 53, which routes the URL to the actual app. From Route 53, the request goes to the CDN for content distribution, and for distribution we would use CloudFront. From CloudFront, it goes to the Application Load Balancer, so that the application isn't disrupted if a high number of users come in. From the ALB, we have the target group specified. The back-end code for this app will reside as a Docker image deployed on ECS, because ECS is also serverless, so we are not worried about scaling. The ALB points to ECS, so requests go from the ALB to ECS, and a large number of requests can be handled by the ALB and ECS together. On the UI side, as I said, the request goes to Route 53 and then CloudFront. CloudFront is basically responsible for distributing the UI, so that responses to incoming requests are cached at the user's nearest edge location, right? For the LLM inference, the model can be hosted as an inference endpoint, say, if the model is trained using SageMaker. SageMaker has endpoints that can auto-scale based on the number of incoming requests, so latency is not a problem there, and we can use serverless inference for this. As for edge computing, it implies small devices. For that, we would mostly use small LLMs, which are not big in size but do not give up too much quality, and which can be deployed on edge devices. If we want to train that model, we would use LoRA techniques and PEFT, parameter-efficient fine-tuning, so that the number of parameters is very small and the model can be deployed on an edge computing device.
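
    The parameter saving from LoRA can be made concrete with a little arithmetic. For a frozen weight matrix of shape d_out x d_in, LoRA trains two small matrices of shapes d_out x r and r x d_in, so it adds r * (d_out + d_in) trainable parameters. The layer size (4096) and rank (8) below are illustrative assumptions, not values from the answer.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the original dense weight matrix."""
    return d_out * d_in


d = 4096   # assumed hidden size of one square projection layer
rank = 8   # assumed LoRA rank

lora = lora_trainable_params(d, d, rank)
full = full_params(d, d)
ratio = lora / full  # fraction of the layer's weights that is trained
```

    Note that this reduces the number of parameters that are trained and stored per fine-tune; the base model keeps its original size, so fitting on an edge device still depends on choosing a small base model or compressing it separately.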

    Optimizing an existing code base to streamline interactions with DALL-E involves several steps. For an existing Python code base, one approach is to integrate the OpenAI APIs into the existing functionality while adding the necessary interactions for image generation. The existing code base already has its functionality defined and working, and we want to add interactions with DALL-E for image generation on top of that. For DALL-E, we have the OpenAI APIs, which can be integrated into the Python code base. When the user asks for an image, the Python code calls the OpenAI API with the description of the image and other parameters to generate it. Until we get the image back from DALL-E, we can show a loading bar or some other UI element to tell the user that the image is being generated.
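
    The "show a loading indicator until the image arrives" idea can be sketched without any UI framework. `generate_image` below is a hypothetical stand-in for the real image API call (it only sleeps and returns a placeholder string), and `on_tick` represents whatever the UI does to advance a spinner or progress bar.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_image(description: str) -> str:
    """Hypothetical stand-in for the image generation API call."""
    time.sleep(0.3)  # simulates the time the service takes
    return f"image for: {description}"


def generate_with_progress(description: str, on_tick, poll_seconds: float = 0.05) -> str:
    """Run the slow call in a worker thread and tick the UI while waiting."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(generate_image, description)
        while not future.done():
            on_tick()  # e.g. advance a spinner or loading bar
            time.sleep(poll_seconds)
        return future.result()


ticks = []
image = generate_with_progress("a lighthouse at sunset", lambda: ticks.append("."))
```

    Keeping the API call behind one function like `generate_image` also makes it straightforward to add retries or caching later without touching the rest of the code base.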