
Gelli Tarun

Vetted Talent
A skilled machine learning engineer passionate about solving real-world problems, eager to apply cutting-edge technology to help organizations develop and integrate new products. Collaborated with cross-functional product development teams to integrate trained models and gauge performance improvements. Planned, researched, and developed SOTA deep learning models for semantic segmentation, object detection, and classification. Developed data analysis and data preparation pipelines.
  • Role

    AI/ML Engineer and Data Scientist

  • Years of Experience

    4.3 years

Skillsets

  • Deep Learning
  • NLP
  • Python Programming
  • Classification
  • Convolutional Neural Networks
  • Data Analysis
  • Data Preparation
  • Object Detection
  • Semantic Segmentation
  • Speech to Text
  • Statistical Modeling
  • Python - 5 Years
  • LLM - 3 Years
  • AI/ML - 4 Years

Vetted For

10 Skills
  • Roles & Skills
  • Results
  • Details
  • Machine Learning Scientist II (Places) - Remote (AI Screening)
  • 66%
  • Skills assessed: Large POI Database, Text Embeddings Generation, ETL Pipeline, LLM, Machine Learning Model, NLP, Problem Solving Attitude, Python, R, SQL
  • Score: 59/90

Professional Summary

4.3 Years
  • Jan 2021 - Present (4 yr 8 months)

    AI Engineer and Data Scientist

    Biomed Informatics

Applications & Tools Known

  • R Programming

Work History

4.3 Years

AI Engineer and Data Scientist

Biomed Informatics
Jan 2021 - Present (4 yr 8 months)
    Building AI models, explaining their usefulness to a wide range of stakeholders within the organization, and developing infrastructure for data transformation and ingestion.

Achievements

  • Built AI models
  • Built working models using deep learning
  • Developed infrastructure for data transformation and ingestion
  • Applied data science techniques

Major Projects

24 Projects

Detecting Diabetic Retinopathy

    Built a CNN model for detecting diabetic retinopathy and deployed it using TensorFlow Serving.
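The entry above is brief, so here is a minimal, hedged sketch of what such a pipeline could look like: a small Keras CNN for binary retinopathy detection exported in SavedModel format for TensorFlow Serving. This is not the original code; the `train_ds`/`val_ds` datasets and the export path are assumptions.

```python
# Minimal sketch (not the production model): a small Keras CNN for binary
# diabetic-retinopathy detection, exported as a versioned SavedModel so it
# can be served with TensorFlow Serving.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(224, 224, 3)):
    return models.Sequential([
        layers.Rescaling(1.0 / 255, input_shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # retinopathy present / absent
    ])

model = build_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # assumed tf.data datasets

# Export version "1"; TensorFlow Serving watches the parent directory and
# loads new versions automatically.
tf.saved_model.save(model, "retinopathy_model/1")
```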

Stock Price Prediction Using Deep Q-Learning

    Built an agent using Deep Q-Learning that can perform unsupervised stock trading. The aim of the project was to train an agent that uses Q-learning with neural networks to predict profit or loss, building the model and evaluating it on an available dataset.
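As a rough illustration of the Deep Q-Learning idea described here (not the original implementation), the sketch below uses a window of recent price changes as the state, hold/buy/sell as actions, and realized profit as the reward. The `prices` array, network sizes, and hyperparameters are all assumptions.

```python
# Hedged Deep Q-Learning sketch for a trading agent: Q-network over a price
# window, epsilon-greedy action selection, and one replay training step.
import random
from collections import deque
import numpy as np
from tensorflow.keras import layers, models

WINDOW, ACTIONS = 10, 3  # actions: 0 = hold, 1 = buy, 2 = sell
memory = deque(maxlen=10_000)  # replay buffer of (state, action, reward, next_state)

def build_q_network():
    model = models.Sequential([
        layers.Input(shape=(WINDOW,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(ACTIONS, activation="linear"),  # one Q-value per action
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def act(model, state, epsilon):
    # Epsilon-greedy exploration.
    if random.random() < epsilon:
        return random.randrange(ACTIONS)
    q_values = model.predict(state[None, :], verbose=0)
    return int(np.argmax(q_values[0]))

def replay(model, batch_size=32, gamma=0.95):
    # One training step on a random minibatch of past transitions.
    batch = random.sample(memory, min(batch_size, len(memory)))
    states = np.array([s for s, _, _, _ in batch])
    next_states = np.array([ns for _, _, _, ns in batch])
    targets = model.predict(states, verbose=0)
    next_q = model.predict(next_states, verbose=0)
    for i, (_, action, reward, _) in enumerate(batch):
        targets[i, action] = reward + gamma * np.max(next_q[i])
    model.fit(states, targets, epochs=1, verbose=0)

# Usage sketch: walk over the price series, append (state, action, reward,
# next_state) tuples to `memory`, call replay() each step, and decay epsilon.
```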

Healthcare: Cardiovascular Diseases

    Cardiovascular diseases are the leading cause of death globally, so it is important to identify the causes. I developed a system to predict heart attacks effectively.

Stock Price Prediction Using Deep Learning

    Built an agent using deep learning that can perform unsupervised stock trading. The aim of the project was to train an agent that uses neural network models such as RNNs and LSTMs to predict profit or loss, building the model and evaluating it on an available dataset.

Healthcare: Diabetic Retinopathy

    Built a CNN model using distributed training to detect diabetic retinopathy and deployed it using TensorFlow Serving.

Predictive Modeling

    Built a machine learning model for a US client to predict the runs a batsman will score and the number of wickets a bowler will take in T20 matches.

Healthcare: Lung Infection

    Built a convolutional neural network model that classifies lung infections from medical imagery.

Chatbot Using Generative AI

    Developed a real-time chatbot using LLMs (OpenAI's GPT-3, Microsoft T5 for sequencing, and Whisper for speech-to-text processing) to engage with customers and boost business growth through NLP and speech recognition. We deployed it using Flask for the web layer and Microsoft Azure for hosting. The chatbot is very helpful for its 24/7 presence and ability to reply instantly.
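A hedged sketch of the general request flow such a chatbot could use (not the deployed code), assuming Flask and the current openai Python SDK: an uploaded audio clip is transcribed with Whisper, the transcript is answered by a chat model, and the reply is returned as JSON. The `/chat` route, model names, and file handling are illustrative choices.

```python
# Rough Flask + OpenAI sketch of a voice-driven support chatbot.
import os
import tempfile
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@app.route("/chat", methods=["POST"])
def chat():
    audio = request.files["audio"]  # customer voice message
    suffix = os.path.splitext(audio.filename)[1] or ".wav"
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        audio.save(tmp.name)
    # Speech-to-text with Whisper.
    with open(tmp.name, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
    # Generate a reply with a chat model.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful customer-support assistant."},
            {"role": "user", "content": transcript.text},
        ],
    )
    return jsonify({"question": transcript.text,
                    "answer": completion.choices[0].message.content})

if __name__ == "__main__":
    app.run()  # fronted by Azure App Service / gunicorn in production
```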

Looker Studio

    A Looker Studio solution for our relationship managers so that, before contacting a client, they can easily understand the client's likes and dislikes using data aggregated across all features, from demographics to text to investments.

Top Up Model

    Developed a model that uses textual data from our clients to prioritize whom to contact, based on predicted probabilities. We used meeting notes, call notes, and email data, along with additional features engineered from that data.

Digital Footprint

    A dashboard that uses client and prospect data, from email, call, and meeting counts to their digital activities, and an XGBoost model to predict which prospect is low-hanging fruit likely to become a client, or which existing client is likely to do a top-up. Based on this activity, it makes it easy for our RMs to understand whom to contact and in what order.
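Purely as an illustration of the propensity-scoring idea (not the internal model or data), a minimal XGBoost sketch: train a classifier on engagement counts and rank contacts by predicted probability. The file name, column names, and hyperparameters are all made up.

```python
# Hedged sketch: score clients/prospects by conversion or top-up propensity.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("digital_footprint.csv")  # hypothetical aggregated activity data
features = ["email_count", "call_count", "meeting_count", "web_visits", "is_client"]
X, y = df[features], df["converted_or_topped_up"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

# Rank contacts by predicted probability ("low-hanging fruit" first).
df["score"] = model.predict_proba(X)[:, 1]
priority_list = df.sort_values("score", ascending=False)[["client_id", "score"]]
print(priority_list.head(10))
```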

Chatbot (Speech-to-Text for Customer Support)

    Developed a real-time chatbot that engages customers through voice commands and resolves queries using NLP and speech recognition to boost business growth. The chatbot is very helpful for its 24/7 presence and ability to reply instantly.

Detection of Lung Infection

    Built a CNN model to classify lung infections in patients using medical imagery.

Healthcare / Cardiovascular Diseases

    Developed a system to predict heart attacks effectively, addressing a leading global cause of death.

Chatbot Development

    Developed a real-time chatbot with NLP and Speech Recognition to engage with customers and enhance business growth.

Facial Recognition

    Built a deep convolutional neural network (CNN) in Keras to perform facial recognition.

Emotion Recognition

    Built a convolutional neural network model that can classify a person's emotion. Future customizations around understanding human emotions could lead to a range of advancements, such as determining whether a person likes a specific statement, item, product, or food, or how they are feeling in a particular circumstance.

Lending Loan Data Analysis

    For lenders, correctly predicting whether or not a loan will default is very important. In this project, I built a deep learning model on historical data to predict the chance of default for future loans.

Education

  • PGP in AI/ML Engineering

    Purdue University, USA (2023)
  • B.Tech in Computer Science

    Sreenidhi Institute of Science and Technology (2022)

Certifications

  • PGP in AI/ML Engineering

AI-interview Questions & Answers

Sure. I'm Gelli Tarun, currently working as an AI engineer at Biomed Informatics. My background: I did my bachelor's in computer science engineering and then a PG diploma in AI and ML, and after that I started working with Biomed Informatics as an AI engineer. Here we mostly develop AI/ML models used in the healthcare sector. Using computer vision, we build predictive models that can analyze CT scans and X-ray images and detect what type of tumor or fracture a patient is suffering from. We also do predictive analysis for business clients based on their data; for example, we recently worked on a project predicting IPL runs for batsmen and wickets for bowlers from data provided by the client. We also develop chatbots integrated with LLMs, implementing pipelines and then fine-tuning the models for specific use cases, and we quantize the models so they can be deployed easily with lower hardware requirements. So that is the main background of my education and work experience.

So, basically, ETL pipelines are extract, transform, and load pipelines used to turn raw data into a processed form the model can digest. Geospatial data here means data with diverse kinds of features, such as text, numbers, and categorical values. To set up the procedure in this pipeline, we first need to convert the categorical features, for example a weather value like "sunny", into vectors, so the model works with numbers. We also have to choose a scaling procedure for the data: depending on the kind of data and its values we use MinMaxScaler, StandardScaler, or RobustScaler. We scale so that all the values fall into a comparable range and the model doesn't overfit or underfit because of one feature's magnitude. To actually set up the ETL pipeline, we use the Pipeline class from scikit-learn, and in it we include the scaling steps, the model we want to use, and the metrics we need to check for the model. That is the basic, high-level procedure for implementing an ETL pipeline to handle geospatial data: using scikit-learn's Pipeline, we can create a pipeline that extracts, transforms, and loads the data, whatever kind it is, and handles all the preprocessing techniques.
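A minimal sketch of the preprocessing-plus-model pipeline described above, using scikit-learn's ColumnTransformer and Pipeline. The file name and column names ("weather", "region", "lat", "lon", "elevation", "label") are assumptions for illustration; the real geospatial features would differ.

```python
# Hedged scikit-learn pipeline sketch: encode categoricals, scale numerics,
# then fit a model and report metrics.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("geospatial_data.csv")        # hypothetical input file
X, y = df.drop(columns=["label"]), df["label"]

preprocess = ColumnTransformer([
    # Categorical text columns (e.g. "sunny"/"rainy") -> one-hot vectors.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["weather", "region"]),
    # Numeric columns scaled to a common range so no feature dominates.
    ("numeric", MinMaxScaler(), ["lat", "lon", "elevation"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(n_estimators=200, random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```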

So, the question is about a system designed to automate the recognition and flagging of outdated POI listings. Honestly, I am not familiar with this specific kind of system, although I have a bit of knowledge about automated recognition in general. A POI listing is... sorry, I'm not able to answer this one exactly.

An LLM is the best choice to enhance an existing NLP-based system because such systems are usually built with LSTMs or encoder-decoder models that are not trained on huge amounts of data, so they cannot give really good responses to users. If we instead use an LLM on the data we care about, it is a very effective way to make the chatbot something users like, use more, and are satisfied with. It is also easier to train and leverage, because an existing LLM trained on huge amounts of data with billions of parameters already has broad knowledge and can absorb our own data once we convert the words to vectors. Using an LLM, the system can not only generate good responses but respond more like a human rather than a plain chatbot, and it needs far fewer training hours, which makes it cost-effective: instead of hundreds of GPUs, we can fine-tune a model, even a multi-billion-parameter multimodal one, on a single GPU, which is a big improvement over the existing NLP-based system.
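To make the idea concrete, here is a hedged sketch of swapping a custom LSTM/encoder-decoder responder for a pretrained model via the Hugging Face transformers pipeline. The model name ("gpt2") is only a small placeholder; in practice a larger instruction-tuned or fine-tuned LLM would be used, and the prompt format is an assumption.

```python
# Hedged sketch: generate chatbot replies with a pretrained language model
# instead of a from-scratch LSTM encoder-decoder.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # placeholder model

def reply(user_message: str) -> str:
    prompt = f"Customer: {user_message}\nAssistant:"
    out = generator(prompt, max_new_tokens=64, do_sample=False)
    # generated_text contains the prompt plus the continuation; keep the new part.
    return out[0]["generated_text"][len(prompt):].strip()

print(reply("How do I reset my password?"))
```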

So the question is about scaling an R script for a ten-fold increase in data size. Basically, instead of just running the existing R code, we optimize it so it can handle the data efficiently: using data.table instead of data.frame for faster data manipulation; memory management, cleaning up unused objects with rm() and gc() to free memory; and vectorization, replacing loops with vectorized operations wherever possible. R also supports parallel processing, using packages like parallel and foreach for parallelized tasks, and distributed computing can be used as well, for example sparklyr to interface with Apache Spark for distributed data processing. Then there is algorithm optimization, choosing algorithms that scale well with data size, such as stochastic gradient descent for linear models, plus sampling techniques and batch processing for algorithms that can process data in batches instead of loading everything at once. We can also use efficient data storage and database integration with PostgreSQL or MySQL, and better hardware utilization: machines with more RAM and CPU cores, cloud solutions like AWS for scalability, and GPU acceleration using packages like tensorflow and keras for GPU-accelerated machine learning. For benchmarking and profiling we can use the profvis package, and for cloud and distributed computing, H2O for scalable machine learning with a Hadoop interface. So we can conclude that scaling an R script for a ten-fold increase in data size requires a combination of good coding practices, leveraging parallel and distributed computing, and potentially using more powerful hardware or cloud services. By systematically applying these strategies, we can ensure the script remains performant even with significantly larger datasets.

So, basically, first we design the system architecture: real-time data streams via Apache Kafka, Amazon Kinesis, or Google Cloud Pub/Sub, and a database of POI information stored in a relational database like PostgreSQL or MySQL. Then the data-processing layer: stream processing using frameworks like Apache Flink, Spark Streaming, or Kafka Streams, and microservices implementing different tasks such as data ingestion, data processing, and data enrichment. The POI information is stored in a relational table with appropriate indexing, and we enrich the incoming real-time data by parsing it and performing geospatial enrichment against the POI information, for example geospatial queries using the PostGIS extension in PostgreSQL. Then monitoring and scaling: implementing monitoring with appropriate tools to track performance and detect bottlenecks, and autoscaling using Kubernetes or cloud-based scaling to handle varying loads. So, by leveraging real-time stream processing frameworks, an efficient database, the geospatial capabilities of PostGIS, and microservices, we can build a robust SQL-based solution for real-time POI data enrichment, and we need to ensure we monitor and scale the system as needed to handle growing data volumes.
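A hedged sketch of just the enrichment step in that design: consume location events from a Kafka topic and attach nearby POIs via a PostGIS query. The topic, table, column names, and radius are assumptions; the full system would add Flink/Spark, microservices, monitoring, and autoscaling around this.

```python
# Hedged sketch: Kafka consumer + PostGIS nearest-POI enrichment.
import json
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "location-events",                       # assumed topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
conn = psycopg2.connect("dbname=poi user=etl")  # assumed connection details

NEARBY_POIS = """
    SELECT name, category
    FROM pois
    WHERE ST_DWithin(location::geography,
                     ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
                     %s)  -- radius in metres
"""

for event in consumer:
    lon, lat = event.value["lon"], event.value["lat"]
    with conn.cursor() as cur:
        cur.execute(NEARBY_POIS, (lon, lat, 500))
        nearby = cur.fetchall()
    enriched = {**event.value, "nearby_pois": nearby}
    print(enriched)  # in practice: publish to an "enriched-events" topic or table
```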

So we are looking at what a query selecting from a POI table with certain attributes is trying to accomplish. Looking at the statement, I can see that we are selecting the name and category from the POIs table. Then it filters by category, attempting to keep only the POIs whose category is either hotel or restaurant, and it filters out POIs where the location field is null. Then there is sorting: the results are intended to be ordered by the length of the name field in descending order. Finally, the results are limited to the top ten rows. I think there are a few syntax errors in it, for example in the select field list, but that is what the SQL is trying to do.
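Since the original query isn't shown, here is a hedged reconstruction of what the intended statement likely looks like, with the described filters, ordering, and limit. The pois schema is a stand-in, and SQLite is used only to make the snippet self-contained.

```python
# Hedged reconstruction of the intended query: hotel/restaurant POIs with a
# non-null location, ordered by name length (descending), top 10 rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pois (name TEXT, category TEXT, location TEXT)")
conn.executemany("INSERT INTO pois VALUES (?, ?, ?)", [
    ("Grand Plaza Hotel", "hotel", "POINT(78.4 17.4)"),
    ("Corner Cafe", "restaurant", "POINT(78.5 17.5)"),
    ("Unnamed place", "museum", None),
])

query = """
    SELECT name, category
    FROM pois
    WHERE category IN ('hotel', 'restaurant')
      AND location IS NOT NULL
    ORDER BY LENGTH(name) DESC
    LIMIT 10
"""
for row in conn.execute(query):
    print(row)
```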

So we are examining this Python code for parsing geospatial files, because there is code that could lead to unhandled exceptions. Yes, I think there are a few issues in the code. Firstly, an indentation error: the code inside the try block is not properly indented, which raises an IndentationError. Then the import statement imports GeoPandas as gpd, but the code later uses a different alias; it should be consistent throughout. The read_file call has mismatched quotes around the file name. And there are syntax errors in the exception handling: the string inside the first print statement has mismatched quotes and is missing its closing parenthesis and double quote, while the print statement in the second, general exception handler is missing an opening quote for the format string and the closing parenthesis. Those are the errors I can see in the code.
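The snippet under review isn't included here, so the following is a hedged reconstruction of what a corrected version could look like, with the fixes described applied: a consistent gpd alias, matched quotes, proper indentation inside the try block, and complete print() calls in both handlers. The file name is a placeholder.

```python
# Corrected sketch of the described geospatial-parsing snippet.
import geopandas as gpd

def load_geodata(path="regions.shp"):
    try:
        gdf = gpd.read_file(path)          # matched quotes around the file name
        print(f"Loaded {len(gdf)} features")
        return gdf
    except FileNotFoundError as err:
        print(f"File not found: {err}")    # closing quote and parenthesis present
    except Exception as err:
        print(f"Failed to parse geospatial file: {err}")
    return None

gdf = load_geodata()
```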

So, basically, extending an ETL pipeline to integrate third-party POI data feeds involves several steps. In the extract phase: API integration, using the RESTful APIs or web services provided by the third-party POI data providers, with tools like requests in Python for fetching data, and scheduled extraction, implementing cron jobs or a scheduling tool like Apache Airflow to pull data regularly. In the transform phase: data cleansing and normalization, schema mapping (mapping the third-party data schema to our internal schema), and data validation, validating data types, removing duplicates, and handling missing data. Then the load phase, with transactional loading and staging tables: using database transactions to ensure ACID properties, loading data into staging tables first to validate it before merging into the main tables. To maintain ACID properties, we ensure each ETL step is atomic, so in case of failure the system reverts to the previous consistent state; we use constraints, triggers, and validation rules in the database to maintain integrity; we implement proper transaction isolation levels so that transactions do not interfere with each other; and for durability we ensure that once a transaction is committed it is stored permanently, using reliable storage solutions and regular backups. That is the process for integrating third-party POI data feeds while maintaining data integrity.
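A hedged sketch of that extract, stage, and merge flow, using requests and SQLite as stand-ins. The feed URL, record schema, and merge rule are assumptions; a real pipeline would add Airflow scheduling and a production database such as PostgreSQL.

```python
# Hedged ETL sketch: fetch third-party POIs, validate, load into a staging
# table, and merge into the main table inside one transaction.
import requests
import sqlite3

def extract(url="https://example.com/api/pois"):   # placeholder feed URL
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()  # expected: list of {"id", "name", "category", "lat", "lon"}

def transform(records):
    seen, clean = set(), []
    for r in records:
        if r.get("id") in seen or r.get("lat") is None or r.get("lon") is None:
            continue  # drop duplicates and rows with missing coordinates
        seen.add(r["id"])
        clean.append((r["id"], r["name"].strip(), r.get("category", "unknown"),
                      float(r["lat"]), float(r["lon"])))
    return clean

def load(conn, rows):
    # One transaction: a failure rolls back to the previous consistent state.
    with conn:  # sqlite3 commits on success, rolls back on error
        conn.execute("DELETE FROM pois_staging")
        conn.executemany("INSERT INTO pois_staging VALUES (?, ?, ?, ?, ?)", rows)
        conn.execute("""
            INSERT OR REPLACE INTO pois (id, name, category, lat, lon)
            SELECT id, name, category, lat, lon FROM pois_staging
        """)

conn = sqlite3.connect("poi.db")
conn.execute("CREATE TABLE IF NOT EXISTS pois "
             "(id TEXT PRIMARY KEY, name TEXT, category TEXT, lat REAL, lon REAL)")
conn.execute("CREATE TABLE IF NOT EXISTS pois_staging "
             "(id TEXT, name TEXT, category TEXT, lat REAL, lon REAL)")
# load(conn, transform(extract()))  # run once the real feed URL is configured
```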

So, basically, the goal is to optimize the SQL queries used in an ETL process for greater efficiency without sacrificing data integrity. As in the previous question, we keep the extract phase (API integration using RESTful APIs or web services), the transform phase (schema mapping and data validation), and the load phase (transactional loading and staging tables), and we still ensure the ACID properties. On top of that, we can optimize by reviewing and refactoring queries, analyzing existing queries to identify bottlenecks; using indexes, identifying columns frequently used in WHERE clauses and join conditions; limiting the use of DISTINCT, since it can significantly impact query performance, and making sure it is only used where actually necessary; using UNION ALL instead of UNION where duplicates cannot occur; and denormalizing the data model for read-heavy operations to reduce joins and improve query performance. We should also do query execution plan analysis, choosing appropriate join algorithms such as hash, merge, or nested-loop joins based on the sizes of the tables and the available indexes; parameterization, using parameterized queries instead of embedding values directly into the SQL to avoid SQL injection; and regular maintenance and testing. These are the things that can be used to optimize the SQL queries in an ETL process for greater efficiency without sacrificing data integrity.
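A small illustration of three of those points (an index on a filtered column, UNION ALL when duplicates are impossible, and parameterized values), using SQLite as a stand-in. Table and column names are assumptions.

```python
# Hedged sketch: index + UNION ALL + parameterized query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2023 (customer_id INTEGER, amount REAL, region TEXT);
    CREATE TABLE orders_2024 (customer_id INTEGER, amount REAL, region TEXT);
    -- Index the column used in WHERE clauses and joins.
    CREATE INDEX idx_orders_2024_region ON orders_2024 (region);
""")

# UNION ALL avoids the sort/dedup pass that UNION performs; safe here because
# the two yearly tables cannot contain the same row twice.
query = """
    SELECT customer_id, amount FROM orders_2023 WHERE region = ?
    UNION ALL
    SELECT customer_id, amount FROM orders_2024 WHERE region = ?
"""
# Parameterized values prevent SQL injection and let the engine reuse the plan.
rows = conn.execute(query, ("EMEA", "EMEA")).fetchall()
print(rows)
```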

The question is how to integrate an LLM into an existing Python-based ETL pipeline for enhanced NLP processing. So, basically, first we choose the model, then do data preprocessing, then model inference, running the LLM on the input data depending on the use case, and then post-processing, handling the output generated by the LLM as needed; this may include decoding, formatting, or further analysis, plus error handling. After that we do testing and validation, followed by performance optimization, improving the LLM integration by leveraging techniques like batch processing and parallelization, and then documentation and training, documenting the integration process and providing training and support for the team members working with the integrated LLM. We also use CI/CD pipelines and continuously monitor and evaluate the performance of the LLM integration. In other cases we can also use A/B testing, and prompt tuning, prefix tuning, or fine-tuning the model on the data we have; we can check metrics like ROUGE, or do model evaluation using another LLM as a judge. Nowadays, Weights & Biases (W&B) can also be wired into the ETL pipeline to regularly monitor the model's performance and send an email alert if the model slips behind the scores we have benchmarked. This is the process by which we can enhance the NLP processing.
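A hedged sketch of the inference step inside such an ETL transform: batch the text records, run a pretrained transformer over them, and attach the output as new columns. The default sentiment model and the column names are illustrative; monitoring (e.g. Weights & Biases), CI/CD, and fine-tuning are out of scope here.

```python
# Hedged sketch: NLP enrichment step inside a Python ETL transform.
import pandas as pd
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a small default model

def enrich_with_sentiment(df: pd.DataFrame, batch_size: int = 32) -> pd.DataFrame:
    texts = df["note_text"].fillna("").tolist()
    results = []
    for start in range(0, len(texts), batch_size):   # batch for throughput
        results.extend(classifier(texts[start:start + batch_size]))
    df = df.copy()
    df["sentiment"] = [r["label"] for r in results]
    df["sentiment_score"] = [r["score"] for r in results]
    return df

# Usage inside the pipeline's transform phase:
notes = pd.DataFrame({"note_text": ["Great call, client wants to top up.",
                                    "No response after three emails."]})
print(enrich_with_sentiment(notes))
```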