Vetted Talent

Vignan Malyala

Principal Data Scientist / Head of AI / Mentor with extensive experience in building AI use-cases from scratch and deploying them to production. Strong experience in GenAI, LLM fine-tuning, RAG, vector databases, NLP, machine learning, deep learning, transfer learning, working with LLMs, MLOps, and deployment using AWS, Airflow, Databricks, PySpark, pipelining, and containerization. Effective and proactive communicator with experience in leading teams and projects. Expertise in computer vision for OCR-related information extraction from images, PDF parsing, XML parsing, box detection, entity detection and recognition, data mining, data tagging, data analysis, feature selection and model selection, model building, model validation, model threshold validation, and log analysis.

  • Role

    Gen AI Engineer Manager (Delivery Lead)

  • Years of Experience

    11.2 years

Skillsets

  • Naïve Bayes
  • PyTorch
  • Python
  • PySpark
  • Pinecone
  • PCA
  • PaddleOCR
  • OpenCV
  • OCR
  • NLTK
  • NLP
  • Neo4j
  • RAG
  • MySQL
  • MS SQL
  • MongoDB
  • MLFlow
  • LSTM
  • LLM
  • Linux
  • LightGBM
  • LayoutLM
  • SQL
  • XGBoost
  • Word2Vec
  • WMD
  • Weaviate
  • Ubuntu
  • Transformer
  • TensorFlow
  • Tableau
  • T5
  • SVM
  • LangGraph
  • spaCy
  • SolrCloud
  • Solr
  • Scrapy
  • Scikit-learn
  • SageMaker
  • RNN
  • Regression
  • Random Forest
  • vector search
  • Data Visualization
  • Data Engineering
  • CRF
  • Crawlera
  • CNN
  • BERT
  • Azure
  • AWS
  • Athena
  • ANN
  • Airflow
  • Dependency parser
  • text generation
  • semantic search
  • Recommendation Systems
  • Multi-Agent Systems
  • Machine Learning
  • Information Extraction
  • Generative AI
  • Data Science
  • data augmentation
  • Prompt Engineering - 2 Years
  • GPT-3
  • LangChain
  • kNN
  • K-means
  • Keras
  • Jina
  • HuggingFace
  • Hierarchical clustering
  • Haystack
  • Gunicorn
  • GPT-4
  • Deep Learning - 7 Years
  • Gensim
  • Flask
  • fastText
  • FastAPI
  • ensemble learning
  • EMR
  • ELK
  • Document classification
  • Docker
  • Doc2Vec

Vetted For

18 Skills
  • Senior Generative AI Engineer (AI Screening)
  • Skills assessed: BERT, Collaboration, Data Engineering, Excellent Communication, GNN, GPT-2, Graphs, Large Language Models, Natural Language Processing, SageMaker, Deep Learning, Neural Network Architectures, PyTorch, TensorFlow, Machine Learning, Problem Solving Attitude, Python, Vertex AI
  • Score: 59/100

Professional Summary

11.2 Years
  • Sep, 2023 - Feb, 2024 5 months

    Principal AI Consultant & Advisory (Part Time)

    OrthoQuant
  • Feb, 2023 - Dec, 2023 10 months

    Lead Data Science (Contract)

    The Weather Channel
  • Aug, 2022 - Feb, 2023 6 months

    Senior Applied AI Engineer

    Work Fusion
  • Feb, 2018 - Jun, 2018 4 months

    NLP Scientist

    Senseforth AI Research
  • Jun, 2018 - Oct, 2018 4 months

    NLP Scientist

    Theatro Labs
  • Oct, 2018 - Aug, 2022 3 yr 10 months

    Principal Data Scientist / Head of Data Science

    Oorwin Labs
  • Jun, 2015 - Feb, 2018 2 yr 8 months

    Senior Software Engineer - Data Scientist

    Infosys
  • Co-founder & CTO

    Lumino AI
  • Consultant AI Lead (Contract)

    MezmerMedia

Applications & Tools Known

  • Python
  • AWS (Amazon Web Services)
  • ML
  • NLP
  • OCR
  • Deep Learning
  • Business Analytics
  • DevOps
  • Computer Vision
  • Artificial Intelligence
  • Data Visualization
  • Generative AI
  • Docker
  • Azure
  • Tableau
  • GraphQL
  • Flask
  • Gunicorn
  • Solr
  • Scrapy
  • GCP
  • Airflow
  • MLFlow
  • Haystack
  • LangChain
  • Weaviate
  • GPU

Work History

11.2 Years

Principal AI Consultant & Advisory (Part Time)

OrthoQuant
Sep, 2023 - Feb, 2024 5 months
    Led a team of 4 AI developers. Fine-tuned multiple LLMs at scale on multiple GPUs. Developed Gen AI based applications including custom LLM fine-tuning, RAG-based generation, extractive QA search, a semantic search application, and large data migration pipelines. Facilitated regular progress reviews, presenting structured insights and improvement suggestions based on qualitative feedback analysis from internal users.

Lead Data Science (Contract)

The Weather Channel
Feb, 2023 - Dec, 2023 10 months
    Identified first-party audiences on the platform based on different health predictions and classified users by social determinants of health (SDoH) across web, Android, and iOS using app usage and behavioural data. Worked on large-scale data modelling of around 400 features and 600M records. Built pipelines with PySpark, EMR, SageMaker, Python, AWS Athena, gin configs, etc. Predicted user segments for home owners, breast cancer, business travelers, psoriasis, and asthma based on app usage data. Led a team of 3 working on SDoH use-cases. Applied Generative AI to weather article checks with multiple data points. Built a vector search platform for stakeholders that ingests data, finds the right videos and articles based on search inputs, and maps them to the right tags.

Senior Applied AI Engineer

Work Fusion
Aug, 2022 - Feb, 2023 6 months
    Implemented table detection and table structure detection on banking documents using CascadeTabNet, Table Transformer, and LayoutLM. Implemented native PDF extraction using fitz and PaddleOCR. Built models with 94% accuracy on bank documents, equivalent to 3rd-party APIs. Exposure to active learning and MLOps.

Principal Data Scientist / Head of Data Science

Oorwin Labs
Oct, 2018 - Aug, 2022 3 yr 10 months
    Built a resume parser and JD parser by training on India-, US-, and Singapore-based resumes/JDs. Achieved 84% F1-score on combined parsed fields. Built a pre-screening chatbot, a job scraping engine, and analytics from over 65 job boards. Product development & project management (ATS staffing & recruitment analytics, HR analytics, CRM analytics). Team building and team training. Analyzed requirements by coordinating with clients and product managers. AI model and server deployment and maintenance. Dockerization of product services. Research: computer vision based video interviews via deepface. Data pipelines: client data migration automation, Airflow data pipeline for data processing, MLFlow for continuous training. Projects: Resume Parser, Job Description Parser, Email Parser, Document Parser, LCA Parser. Implemented keyword search and semantic search for the candidate ranking system. Built scraping engines for jobs, candidate profiles, and recruitment data/news.

NLP Scientist

Theatro Labs
Jun, 2018 - Oct, 2018 4 months
    Worked on audio codec analysis (G.711, Speex, Opus, AMR, FLAC) for the ASR engine. Worked on an NLP chatbot using Google Dialogflow & Rasa. Implemented streaming audio and integrated it with the product using a gRPC C++ / Python application. Text mining / analysis using Word2Vec, CRF, and LSTM-CNN models.

NLP Scientist

Senseforth AI Research
Feb, 2018 - Jun, 2018 4 months
    Built a framework and pipelines for training custom chatbots. Developed chatbots using NLP / NLU with GATE text engineering. Built modules for intent/entity extraction, scoring, synonyms, antonyms, time and date extraction, and phrase extraction, and integrated them all into the chatbot framework. Worked on Banking, Telecom, IT-Support, and Finance domains. Implemented an MIS chatbot for finance-related queries.

Senior Software Engineer - Data Scientist

Infosys
Jun, 2015 - Feb, 2018 2 yr 8 months
    Worked on candidate ranking and candidate comparison use cases using Random Forest. Created Tableau visualizations. Worked on clustering use-cases for telematic data, fraudulent claim predictions, client subscription analysis, and predictive analytics. Worked on telecom data, email sentiment analysis, and chiller plant maintenance data using linear regression analysis, exploratory analysis, and association analysis.

Co-founder & CTO

Lumino AI
    Handled multiple clients in Europe and the USA, delivering use-cases in the Gen AI space. Delivered multi-agent orchestration platforms for due diligence over massive data, with a copilot and dashboard. Built advanced RAG chatbots, automated report maker agents, document updater agents, and custom voice ASR/TTS and STT models for clients in different dialects. Designed a GenAI-powered audit copilot concept for logistics and field-service operations to analyze delivery, route, ETA, SLA, driver, and support-ticket data. Combined structured analytics, business rules, vector search, RAG, and LLM reasoning to generate grounded audit-style findings and recommendations.

Consultant AI Lead (Contract)

MezmerMedia
    Domain-based article generation for sports & betting companies based on brand style & language. Used Generative AI for custom fine-tuning on styles and sport scenarios. Built LLM chatbots, a recruiter chatbot, and software for hardware faucet stores with RAG. Built article generation software with LLMs for sports media companies. Built recursive candidate validation software for recruiter analytics involving GenAI and scraping. Built an automatic SRS document maker, refiner, troubleshooter and Launch first estimator with LLMs. Fine-tuned custom LLMs on custom data with PEFT for text-to-SQL queries, a code retriever, a recipe maker, and a cricket analyst. Implemented a customer support dialogue chatbot on custom data. Built ad recommendation pipelines and models on big data.

Achievements

  • Oorwin Awards: Awarded for building NLP parsers effectively in a short span with strong metrics, which helped the company reduce heavy costs
  • Going Above & Beyond award
  • Promoted to Head of Data Science for building the AI platform, research & strategy
  • National Talent Search Exam (NTSE): Achieved state-wide 3rd rank and qualified for Nationals
  • Data Science Mentor & Industry Tutor: Mentored and taught batches (9-month courses) on weekends at upGrad & Great Learning out of interest in teaching the subject well
  • AI advisory: Worked as an AI advisor for startups
  • Spiritualist: Have great interest in philosophy and service for a higher purpose

Major Projects

1 Project

Non-Invasive Infrared Blood Glucose Monitoring Project

    Awarded Best Project Recognition at SRM University for developing a non-invasive infrared blood glucose monitoring device.

Education

  • MBA/PGDM (PART-TIME)

    Aegis School of Data Science
  • B Tech/ BE: Engineering

    SRM University (2015)

Certifications

  • Product Management from Institute of Product Leadership (IPL)

  • Blockchain - Advanced Distributed Ledger Technology from IIIT Hyderabad

Interests

  • Watching Movies
  • Driving
  • Games

AI-interview Questions & Answers

    Okay, so this is Sai Vignan Malyala. I have around 8.5 years of experience in the field of data science. I'm very good at working on real-world applications, and I have developed products from scratch. I use Python, machine learning, deep learning, NLP, and generative AI. I have led teams, and I have good experience deploying models and architectures to production. I'm good at understanding products and at taking them to production. So, that's kind of my background. I worked at Infosys and a couple of startups like Senseforth and Theatro. Then I worked at Oorwin, where I had a long stint of four years. As a core team member, I built AI platforms and pipelines from scratch. I work with AWS and Hugging Face Transformers. I'm very good at working with OpenAI, generative AI, and RAG models, even the latest Llama models. I know how to integrate them, how to deploy them, and how to use them on AWS Fargate and AWS Bedrock. So, I understand the end-to-end architecture, have good experience, and am very good at hands-on work as well as leadership. That's my background. Thank you.

    So the selection of the loss function purely depends on the use case. It could be a regression loss, binary cross-entropy, categorical cross-entropy, or a max-margin loss function, depending on the use case. Suppose I take the example of SVM: we go with the max-margin loss function, which is based on the points nearest to the hyperplane. In deep learning, the appropriate loss function likewise depends on the use case. We can even write our own loss functions. I used raw metrics to build loss functions. In deep learning, I have used many losses: categorical cross-entropy, binary cross-entropy, and even skewing loss entropy. So I have used a lot of new techniques in loss implementation as well. That's a very good implementation from my perspective.
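As a minimal sketch of how the loss depends on the task, the functions below implement mean squared error (regression), binary cross-entropy (binary classification), and hinge loss (the max-margin loss associated with SVMs). They are plain-Python illustrations; the function names are hypothetical and not taken from any particular library.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average squared difference."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary classification loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def hinge_loss(y_true, scores):
    """Max-margin loss for labels in {-1, +1} and raw decision scores."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)) / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
print(hinge_loss([1, -1], [2.0, -0.5]))
```

Hinge loss only penalizes points that fall inside the margin or on the wrong side of it, which is why it depends on the points nearest to the hyperplane.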

    Yeah, so to train my model on a large dataset, I would primarily use GPUs or high RAM. Obviously, using cloud infrastructure like AWS is also an option. The techniques I use for a large model are basically setting up the input layer, hidden layers, and output layer when defining the deep learning architecture. Then, while implementing the architecture, I make sure the data is batch normalized, and I apply best practices, including dropout to randomly drop nodes so the model learns better, and activation functions such as ReLU. These are all techniques in deep learning that I would apply to train large models, because finding patterns is very important in large datasets. Applying normalization, activation functions like ReLU, or even increasing the number of nodes or layers are all ways to enhance the learning of the model. And, obviously, the loss function and the optimizer I use also have an effect on training. On the compute side, I would require good RAM, and I would not directly start with the larger dataset. I would take some samples from it, using a sampling strategy to get the right samples, then train the model and check whether it is really working against my metrics. If it works well on the small dataset, at least to a minimum p-value, then I can go ahead and load the larger dataset. So that is a bare-minimum check.
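The idea of checking the model on a strategically chosen sample before loading the full dataset could look like the sketch below. It assumes stratified sampling by label as the strategy (the answer does not name one); `stratified_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Draw roughly `fraction` of the records from each label group."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)

    sample = []
    for group in groups.values():
        k = max(1, round(len(group) * fraction))  # keep at least one per label
        sample.extend(rng.sample(group, k))
    return sample


# Toy dataset: 90 records of class "a" and 10 of class "b".
records = [(i, "a") for i in range(90)] + [(i, "b") for i in range(10)]
subset = stratified_sample(records, label_of=lambda r: r[1], fraction=0.1)
print(len(subset))
```

Sampling per label keeps rare classes represented in the pilot run, so metrics measured on the subset are less likely to be misleading.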

    To continuously improve an NLP model with incoming data, you would implement something like active learning. Active learning is a technique where we continuously train an NLP model: as data comes in, we apply a threshold. Suppose I'm doing NER, entity detection in NLP. Whenever new data comes, we try to classify it with the NER model, but we apply a threshold, and we build something like an entity classifier, a sub-model that classifies in real time. If the confidence is at the bare minimum or on the border, we do not accept the prediction as is; we send it to human feedback or to a reinforcement learning reward model. So there are multiple techniques: a reward model, human feedback, or threshold-based, rule-based checks. All of these can be applied to the NER values that are not detected confidently, the ones on the border with low prediction probability. These values can be continuously added to the model through the MLOps engine, which is also part of my active learning technique. It continuously feeds in the data and trains on it; where required, it brings in a human in the loop, RLHF, a reward model, or some rules. So I can use rules, I can use rewards, and I can use human feedback. All of these are applied at prediction time: based on a set of rules and the probabilities, we filter the predictions and send them to the active learning or MLOps engine to train again. I took the example of NER, how we detect a particular word. New words come into use every day, and the NLP model has to understand those new words as well, so retraining can be triggered accordingly.
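The threshold-based routing described above can be sketched as follows. This is an illustrative assumption of how such a router might look: the `low` and `high` thresholds, the three queues, and the function name `route_predictions` are all hypothetical.

```python
def route_predictions(predictions, low=0.4, high=0.8):
    """Split (text, label, confidence) predictions into three queues.

    - confident predictions are accepted automatically
    - borderline predictions go to human review and later retraining
    - very low-confidence predictions are rejected
    """
    accepted, needs_review, rejected = [], [], []
    for text, label, confidence in predictions:
        if confidence >= high:
            accepted.append((text, label))
        elif confidence >= low:
            needs_review.append((text, label))
        else:
            rejected.append((text, label))
    return accepted, needs_review, rejected


ner_output = [
    ("Acme Corp", "ORG", 0.95),  # confident entity
    ("Zyxo", "ORG", 0.55),       # new word, borderline confidence
    ("the", "ORG", 0.10),        # almost certainly not an entity
]
accepted, needs_review, rejected = route_predictions(ner_output)
print(needs_review)
```

Items in the review queue, once corrected by a human or a rule, are the examples that would be fed back into the retraining pipeline.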

    So, how would I manage the versioning of both data and models? I would put versions in DVC (Data Version Control) together with Git version control. I would use both to version my data and models and push and pull them accordingly. That's the real technique. In MLOps, when I'm continuously iterating and training a new model, I make sure each new model's version is recorded in Git or DVC. These are quite straightforward techniques, and I can even use S3 for customized versioning. So, all of this is quite manageable in real time.
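Tools such as Git and DVC identify tracked content by a hash of its bytes. The sketch below illustrates that general idea with a toy in-memory registry that ties each model version to the exact data it was trained on. It is not how DVC or Git are implemented or invoked; `VersionRegistry` and the sample byte strings are hypothetical.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes."""
    return hashlib.sha256(payload).hexdigest()


class VersionRegistry:
    """Toy registry pairing a dataset hash with a model hash per version."""

    def __init__(self):
        self.entries = []

    def register(self, data: bytes, model: bytes, note: str = "") -> dict:
        entry = {
            "version": len(self.entries) + 1,
            "data_hash": content_hash(data),
            "model_hash": content_hash(model),
            "note": note,
        }
        self.entries.append(entry)
        return entry

    def versions_for_data(self, data: bytes) -> list:
        """List every model version trained on exactly this data."""
        target = content_hash(data)
        return [e["version"] for e in self.entries if e["data_hash"] == target]


registry = VersionRegistry()
registry.register(b"train-v1 rows", b"weights A", note="baseline")
registry.register(b"train-v1 rows", b"weights B", note="retrained")
registry.register(b"train-v2 rows", b"weights C", note="new data")
print(registry.versions_for_data(b"train-v1 rows"))
```

Recording both hashes together makes it possible to answer which data produced which model when iterating in an MLOps loop.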

    For testing and development of generative models, there are three factors, often called the 3H triplet: helpful, honest, and harmless. So there are three things generative AI models have to get right. They should not hallucinate. They should not be harmful: when people ask dangerous questions, such as how to make a bomb, the model should not answer. And they should be honest: they should not give wrong facts, for example saying that the president of India is Modi. Can generative models create something new? Right? So the 3H formula is what we have to make sure of. The way to make sure the model gives the right answers is human feedback while building it, and RLHF, reinforcement learning from human feedback. There's also something called constitutional AI, where you define sets of rules and then run a secondary check in the testing phase to make sure the model gives the right results. For scores, we have ROUGE metrics and we have benchmarks. So we can test against the benchmarks and the ROUGE metrics and make sure the model is good enough to deploy. So testing covers the 3H formula, which I mentioned already, plus ROUGE metrics and benchmarks. These things help produce a good model. As for deploying, we have to deploy safely on AWS servers with customized models, or use third parties with custom APIs. So it should be a robust deployment that can auto-scale based on load. These are the strategies I would use.

    Yeah. So this is a simple issue. The error says the Transformer model has no attribute from_pretrained. That means either your import is wrong, for example the import from Hugging Face names the wrong class, so that is the primary reason. Secondly, that particular class may not be available for that model. Sometimes we use AutoModelForSequenceClassification; sometimes we use a Llama model for token classification. So it depends on the model, and it depends on the library. If the error still appears after checking all this, the imported object does not have that functionality at all. Or your import is aliased, something like importing Hugging Face Transformers as xyz. So if you're using xyz, then xyz will not work here. Basically, from_pretrained will not work if the model class does not support loading from a pre-trained model. So that's an easy issue.

    Why might this loss function be inappropriate? It's a custom loss function, which is fine. The loss is defined as the reduce-mean of the absolute value of y true minus y predicted. So why are you trying to reduce the mean absolute error for a generative model? I don't think this is an effective way to decrease the loss for a generative model. A mean absolute error is not suitable for generative models. In a generative model, y true and y predicted are words, not numbers, so you can't compute y true minus y predicted like this. You have to use something like the ROUGE metric, maybe something like the mean of the absolute difference between the number of true words matching and the number of predicted words matching. Suppose I'm predicting "here is my house" and the actual is "here is the house". Three words have matched; "my" and "the" haven't matched. So three out of four words have matched. You can say that the number of true words matching is 3 and the number of predicted words matching is 3. So 3 minus 3, the absolute value of that, and the mean of that is 0. So you can say the ROUGE score is 0.
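For comparison, a standard ROUGE-1 style unigram overlap can be computed on the same sentence pair. The sketch below is a simplified illustration (lowercasing, whitespace tokenization, no stemming), not the official ROUGE implementation, and the function name is hypothetical. Because it measures overlap rather than a difference of counts, the three shared words out of four give 0.75 for precision, recall, and F1.

```python
from collections import Counter


def unigram_overlap(reference, candidate):
    """Return (precision, recall, f1) over whitespace-split unigrams.

    Assumes both strings contain at least one token.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


scores = unigram_overlap("here is the house", "here is my house")
print(scores)
```

Scores like this are evaluation metrics; during training, generative language models are typically optimized with a token-level cross-entropy loss instead.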

    Design a very high-level architecture for a scalable generative AI system focused on text generation. Okay, the high-level architecture. We have a front end, and the front end calls the back end. The back end has integrations with AI servers. The AI servers deal with RAG methods, using LangChain or something similar. The RAG method handles how to access the models, like OpenAI or Llama 2, which are hosted on some GPUs or reached through third-party APIs. You should have vector databases for this, and for manual actions you have to access some microservices. Then you have Airflow or MLOps engines, which continuously feed in inputs. You have a reward model being trained, and sub-models such as entity classifiers. These are all sub-components. We link the AI server to LangChain, that is, to a RAG method, then to the API of the underlying model, a Llama model or something, and then to the vector databases. Then come the reward models and the entity classification sub-models, so all of these are part of the ecosystem. Inside this, there are multiple serverless components and multiple hits to internal APIs. Then this call gives a response back to the back end. The back end has access to internal session management and databases, and it gives the response back to the front end and manages it. All of this is part of the AWS cloud, and you have to do this in real time with MLOps, with Bitbucket, cloud versioning, and cloud model versioning. So all these things are part and parcel.

    In a multi-project environment, how would you ensure consistent performance of generative models across different teams and datasets? When you say consistent performance of generative models across teams and datasets, that means the purpose for which the generative model is used must not change. You have developed it for a particular reason, and if it is not being used for that same reason, it will obviously not perform consistently. So it will have a consistent layer with a set of rules, what it does and what it does not do, and you measure the performance of the model across teams and different datasets. One thing is, if you are working with different teams, they have to prompt-engineer the model according to their need and use case. The model is already on a cloud server, so they can access it. Every team has a different use case, so every team has to write proper prompt engineering steps accordingly and hit the server. Secondly, different datasets. Obviously, every team has its own datasets. They can fine-tune the model and store it as a version. With custom access, they can create instruction datasets. Thirdly, if they're not doing training, they can host their data in vector databases or semantic DBs, where the data is put in their own collections; the relevant context is retrieved and sent with the prompt, and then the model gives a good response. So you have a vector database and a generative AI model. You query the vector database, get the real context, mix it with the question, send it to the generative model, and get the response. The model does not even need to change; only the interaction happens with multiple teams. So it's all centered around the generative model, and the teams and datasets access it according to their respective needs. Here, the database, or the collection, is separate for everyone. The use case is also different for everyone, so they have to write their own instructions. So it's all a combination of instructions, databases, and then the model.
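The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the hand-written 3-dimensional vectors stand in for real embeddings, the list `team_collection` stands in for one team's collection in a vector database, and `retrieve` and `build_prompt` are hypothetical helper names. The call to the generative model itself is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, collection, top_k=2):
    """Return the texts of the top_k entries most similar to the query."""
    ranked = sorted(collection, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, contexts):
    """Mix the retrieved context with the question for the generative model."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"


# One team's collection: (text, toy embedding) pairs.
team_collection = [
    ("Refunds are accepted within 30 days.", [1.0, 0.0, 0.0]),
    ("Shipping takes five business days.", [0.0, 1.0, 0.0]),
    ("Support is open 9 to 5.", [0.0, 0.0, 1.0]),
]

question = "How long do I have to return an item?"
query_vec = [0.9, 0.1, 0.0]  # assumed embedding of the question
top = retrieve(query_vec, team_collection, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Each team would point the same shared model at its own collection and its own prompt template, so the model stays unchanged while the context differs.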

    Pick a pre-trained model for a chatbot project and justify your choice. Okay. For a chatbot project, a very good Hugging Face model. Obviously, I would say there are many. We can use Mistral. We can use Llama 2. We can even use BERT, why not? Many times, if the chatbot use case is a very small domain or a small set of tasks, we can go for smaller models, because smaller models can be fine-tuned very easily and they're fast in inference. Small models are good for small, specific tasks. You don't need to wait, because when you hit a larger model, it takes time to infer. For a generic task, you need a bigger model. For a small task, small models are good. And sometimes, if we need specific rule-based understanding in the chatbot, you can use three to four models, like a small intent classifier, a small entity classifier, and a small dialogue generation model. So you can use three models and then combine them. If it is broader, you can use a small task-specific generation model like Llama 7B or Mistral 7B or 13B, or even BERT or BigBird if fine-tuned. So that's a good choice. If it is a really broad use case, then we have a bigger model like the Llama 2 70B model. If it's really big, it's very good at understanding and replying accordingly. There are also a lot of small distilled models, so we can use those as well. They're very fast and small, but have the same accuracy. Secondly, you can also apply quantization, for example 16-bit or 8-bit instead of 32-bit, which is faster. Because chatbots have to be fast. We can't wait for a response. It can't be generating; it has to be giving the response.
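The quantization idea mentioned above can be illustrated with a small symmetric 8-bit scheme. This is a sketch under assumptions: real frameworks apply quantization per tensor or per channel with optimized kernels, and the bit width, scheme, and function names here are illustrative only.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus a scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.003]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```

Storing one small integer per weight instead of a 32-bit float cuts memory, at the cost of a rounding error bounded by about half the scale factor.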

    Propose an approach to fine-tune a GPT-2 model specifically for a client's domain-specific language. Okay, so it is quite straightforward. First, you need to have the domain knowledge, then you create the dataset for that particular use case, and then you use a GPT-2 model, where you have your input and output, or two inputs and one output, or whatever the design is. So it's input and output. You are using a decoder-only model. You have to use the same embeddings: whatever input you have, embed it with the GPT-2 embeddings, then give your input and output, and it trains on them one by one. Based on the loss, you can understand the performance of the model and then retrain if needed. So the fine-tuning approach is straightforward: get the dataset, embed it, give the input and output in the right, embedded formats, and then train. A few things to make sure of: the data should be very good, and you have to choose the inputs accordingly. You can have five inputs as well, but the inputs and the output have to be related. Lastly, GPT-2 is basically next-word prediction, next-sentence prediction; it's causal language modeling. As you iterate, for every word the model is asked to predict the next word. So that is an end-to-end approach. You can also implement pretraining if required, not just fine-tuning.
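The next-word objective described above can be made concrete by showing how one input/output pair turns into training examples. The sketch below is illustrative only: it uses whitespace tokens instead of GPT-2's byte-pair encoding, the `" => "` separator and helper names are hypothetical, and `<|endoftext|>` is GPT-2's end-of-text token.

```python
def to_training_text(prompt, response, sep=" => ", eos=" <|endoftext|>"):
    """Join an input/output pair into one sequence for a decoder-only model."""
    return f"{prompt}{sep}{response}{eos}"


def next_token_examples(tokens):
    """For each position, pair the preceding context with the token to predict."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


text = to_training_text("claim status", "approved")
tokens = text.split()  # toy tokenizer; a real setup would use the GPT-2 tokenizer
examples = next_token_examples(tokens)
for context, target in examples:
    print(context, "->", target)
```

Every position contributes one prediction, so a single domain example yields as many training signals as it has tokens minus one.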