
Consultant, Deloitte
Associate Data Scientist, Reflexion AI
Associate Data Scientist, Reflexion.ai
Python Developer, Biosense Technologies
AI Developer, Dataviv Technologies
Associate Data Scientist, Sciffer Analytics Pte Ltd
Machine Learning Intern, Biosense Technologies
Cyber Ambassador

Python
Docker
AWS (Amazon Web Services)

Google Cloud Platform

Django
FastAPI
Hugging Face

TensorFlow Hub

PyTorch

TensorFlow

OpenCV
Yuki Tushy. I have approximately 5 years of experience in data science, AI, and ML, as well as Python development. I have a versatile portfolio covering many different types of machine learning projects, mostly deep learning and computer vision. My expertise lies in deep learning and computer vision, but as the trend has shifted towards large language models (LLMs), I have also been getting familiar with those technologies. About a year ago, I started working on LLMs. One of the projects I am working on is based on vision transformers as well as the latest video released with the code. I have expertise in the Python language as well. Apart from that, I have a good understanding of core principles such as database management, data pipelines, and deployment. I'm also eager to learn new technologies: I have recently gained exposure to MLflow and other deployment technologies, and I also have expertise in the latest version of the PostgreSQL database.
The techniques depend on the size of the data we need to train on. Preferably, I'll use continuous integration and continuous development techniques, where I have a base model prepared on a large dataset. Depending on the base model's accuracy and its performance benchmarks on different datasets, I'll build on that with hyperparameter tuning. I can do manual hyperparameter tuning based on experimentation, or opt for an automated hyperparameter tuning tool. There are also libraries such as Weights & Biases and TensorBoard that help with logging and monitoring the parameters used in each experiment.
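As a rough illustration of automated hyperparameter tuning, here is a minimal random search sketch. The `validation_loss` function is a hypothetical stand-in for training and evaluating the base model, and the parameter names and ranges are assumptions chosen only for the example; the `history` list stands in for what a tracker such as Weights & Biases or TensorBoard would log.

```python
import random


def validation_loss(learning_rate, weight_decay):
    """Hypothetical stand-in for fine-tuning the base model and scoring it.

    A real run would train and return a validation metric; this toy
    bowl-shaped function simply has its minimum at (0.01, 0.001).
    """
    return (learning_rate - 0.01) ** 2 + (weight_decay - 0.001) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best = None
    history = []
    for trial in range(n_trials):
        params = {
            # Log-uniform sampling between 1e-4 and 1e-1 (assumed range).
            "learning_rate": 10 ** rng.uniform(-4, -1),
            # Log-uniform sampling between 1e-5 and 1e-2 (assumed range).
            "weight_decay": 10 ** rng.uniform(-5, -2),
        }
        loss = validation_loss(**params)
        history.append((trial, params, loss))  # what a tracker would log
        if best is None or loss < best[1]:
            best = (params, loss)
    return best, history


best, history = random_search(n_trials=50)
print("best params:", best[0])
```

Random search is only one option; grid search or a dedicated tuning library would slot into the same loop.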
There are different ways to approach loss functions. Primarily, for the problem we are tackling, I use the loss function from the algorithms that have worked best in research papers. That is one of the mechanisms I'll use when approaching deep learning and choosing the loss function. If other resources or materials suggest better loss functions, I can opt for those as well. Primarily, though, published research papers and conferences indicate which loss function has performed better for a given case, and my approach would be to use that one. If I have to switch loss functions, I would take an experimental approach and try multiple similar types of loss functions. For example, if I'm choosing the cross-entropy loss function, it is also worth making sure that it works well on classification-type data.
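To make the idea of trying several similar loss functions concrete, here is a small sketch that compares plain cross-entropy with a label-smoothed variant on one classification example. The label-smoothed variant and the `smoothing` value are assumptions picked for illustration, not something named in the answer.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for a single example with an integer class label."""
    probs = softmax(logits)
    return -math.log(probs[target])


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    probs = softmax(logits)
    n = len(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        # The true class gets 1 - smoothing, plus its share of the uniform mass.
        q = smoothing / n + (1.0 - smoothing if i == target else 0.0)
        loss -= q * math.log(p)
    return loss


logits = [2.0, 0.5, -1.0]  # hypothetical model outputs for three classes
print(cross_entropy(logits, target=0))
print(smoothed_cross_entropy(logits, target=0))
```

In an experiment, each candidate loss would be used for a full training run and the runs compared on the same validation set.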
Assigning tasks is what matters most when working on cross-functional tasks, whatever form they take. As the latest trends and technologies have evolved, teams have started combining multiple sources or types of technology to get a better output. One example is PostgresML, where you can now run model inference through a basic SQL script. That is something I take as a lesson. I would prioritize individual tasks over team tasks, because only once individual tasks have been completed can the dependencies on other team members' tasks be fulfilled. So while implementing AI features in cross-functional development, I would prioritize individual tasks over team tasks: in a cross-functional environment, individual work matters more than teamwork, because the individual's work is tied into the cross-functional team's dependencies.
There are multiple approaches to implementing an NLP model that trains continuously on new incoming data. One of the most efficient is to have a storage mechanism where incoming traffic is stored and then fed to a data pipeline. The data pipeline's job is to clean the data and prepare it for the model. Then a training pipeline takes the cleaned data and trains the model to perform the NLP task. Once training is done, there are two further pipelines: a deployment pipeline and an inference pipeline. Within the deployment pipeline, we need to add a scheduling mechanism so that the model periodically retrains itself, depending on the solution required, without suffering from catastrophic forgetting, where it forgets its previous weights and outputs. That is something we need to monitor. To tackle those problems, I think newer technologies such as vector databases can be used as the storage mechanism.
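The store, clean, train, and retrain flow can be sketched roughly as below. This is an illustration under assumptions: `train_fn` is a hypothetical training callable, the in-memory lists stand in for real storage, and mixing older samples into each retraining run (rehearsal) is one common way to reduce catastrophic forgetting that is swapped in here for the example.

```python
import random


def clean_records(raw_records):
    """Data pipeline step: normalize whitespace and case, drop empty texts."""
    cleaned = []
    for text in raw_records:
        text = " ".join(text.split()).lower()
        if text:
            cleaned.append(text)
    return cleaned


def build_training_set(new_data, replay_buffer, replay_fraction=0.5, seed=0):
    """Mix new data with a random sample of older data (rehearsal)."""
    rng = random.Random(seed)
    n_replay = min(len(replay_buffer), int(len(new_data) * replay_fraction))
    return new_data + rng.sample(replay_buffer, n_replay)


def retraining_cycle(incoming_store, replay_buffer, train_fn):
    """One scheduled run: clean, mix, train, then archive the new data."""
    new_data = clean_records(incoming_store)
    if not new_data:
        return None  # nothing to train on this cycle
    model = train_fn(build_training_set(new_data, replay_buffer))
    replay_buffer.extend(new_data)  # keep new data for future rehearsal
    incoming_store.clear()          # the store is drained once consumed
    return model


# Hypothetical usage: the "model" here is just a summary of the batch.
store = ["  Hello   World ", "", "New text"]
replay = ["old a", "old b", "old c"]
model = retraining_cycle(store, replay, train_fn=lambda batch: {"n_examples": len(batch)})
print(model)
```

A scheduler (cron, a workflow orchestrator, or similar) would call `retraining_cycle` periodically, and the resulting model would then be handed to the deployment and inference pipelines.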
There are multiple mitigation strategies to avoid bias in the data. One is to collect representative data that covers all types of data, regardless of the problem we need to solve. Another is to make sure the data covers all demographics, regions, and countries. That is also one of the approaches we can use to mitigate skewness and bias in the data. There are also imputation and augmentation techniques that can be used to normalize the data. I think these are the strategies we can use. Apart from that, other strategies can be researched and implemented depending on the data and the scenario we are working on.
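One simple way to check representation and compensate for skew is sketched below. The region labels are made-up example data, and inverse-frequency weighting is an assumed technique chosen for illustration; it is only one of several possible mitigation strategies.

```python
from collections import Counter


def group_shares(groups):
    """Fraction of the dataset that belongs to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {g: c / total for g, c in counts.items()}


def balancing_weights(groups):
    """Inverse-frequency weights so every group carries equal total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    total = len(groups)
    return [total / (n_groups * counts[g]) for g in groups]


# Hypothetical region label per training example.
regions = ["EU"] * 6 + ["ASIA"] * 3 + ["AFRICA"] * 1
print(group_shares(regions))
weights = balancing_weights(regions)
print(weights[0], weights[6], weights[9])
```

The weights can be passed to a weighted loss or a weighted sampler so that under-represented groups are not drowned out during training.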
The self-attention mechanism wouldn't take three inputs of x; it would take a single input. The basic transformer block has the attention layer followed by a feedforward neural network, so I think that is what's missing. Also, before the attention layer there should be some additional layers, such as an embedding layer. If we are incorporating it within the transformer block, we can add the positional embedding layer there as well. The input shape for the self-attention mechanism, I think that's something which is
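
The single-input point can be illustrated with a minimal single-head self-attention sketch in plain Python. It is not the code under review in the interview, which is not shown here; the weight matrices and the toy input are assumptions, and the embedding, positional embedding, and feedforward layers are left out for brevity.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: one input x, projected three ways."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Scaled dot-product scores between every pair of positions.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Toy input: 3 tokens with 2 features each, and identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, attn = self_attention(x, identity, identity, identity)
print(len(output), len(output[0]))
```

The queries, keys, and values are all derived from the same `x`, which is why the function takes one input rather than three.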
Mean absolute error is a somewhat nonstandard loss for an LLM. The most commonly used loss for any kind of generative model is mostly root mean squared error, so that the steps the model takes along the gradient are larger and it reaches its global minimum faster. So I think that is something we can implement here.
There are multiple readily available architectures we can use for text generation. One of them is the BERT mechanism, a bidirectional encoder representation from transformers, where basically it has only the decoder layer. Stacking the decoder layers together can be helpful for text summarization problems, and the output layer can be a softmax layer.
For an ingestion pipeline, there are multiple cleaning and preprocessing steps that can be done beforehand to identify anomalies. One step, once the data meets the required conditions, is to clean and preprocess it, which is useful for avoiding anomalies. Apart from that, we can also have a CloudWatch mechanism that checks the data. It has crawlers, and we can use different types of crawlers even before the data is ingested into the pipeline to sort out and eliminate any kind of anomaly.
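A pre-ingestion check along these lines could look like the sketch below. The record schema (`id`, `value`), the threshold, and the use of a median-based outlier score are all assumptions made for the example; a real pipeline would use checks that match its own data.

```python
import statistics

REQUIRED_FIELDS = ("id", "value")  # hypothetical schema for illustration


def find_anomalies(records, threshold=3.5):
    """Split records into (clean, rejected) before they enter the pipeline."""
    valid, rejected = [], []
    for rec in records:
        has_fields = all(rec.get(f) is not None for f in REQUIRED_FIELDS)
        if has_fields and isinstance(rec["value"], (int, float)):
            valid.append(rec)
        else:
            rejected.append((rec, "missing or non-numeric field"))
    if len(valid) < 3:
        return valid, rejected  # too few points to judge outliers
    values = [r["value"] for r in valid]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return valid, rejected  # no spread, so no outlier score is defined
    clean = []
    for rec in valid:
        # Modified z-score based on the median absolute deviation.
        score = 0.6745 * abs(rec["value"] - median) / mad
        if score > threshold:
            rejected.append((rec, "outlier"))
        else:
            clean.append(rec)
    return clean, rejected


records = [{"id": i, "value": v} for i, v in enumerate([10, 11, 9, 10, 12, 500])]
records.append({"id": 99, "value": None})
clean, rejected = find_anomalies(records)
print(len(clean), len(rejected))
```

Rejected records are returned with a reason so they can be logged or reviewed rather than silently dropped.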
There are different benchmarks for validating LLM output. For the sentiment analysis part, I think there are benchmarks readily available, but I don't remember them off the top of my head. Apart from the readily available benchmarks, we can also have a human-in-the-loop mechanism, where a human goes through the output and validates whether it is acceptable.
Depending on the use case, we can choose either TensorFlow or PyTorch. In both cases, the programming model and logic are fairly similar, so choosing either one won't hamper overall development. For simplicity's sake, I would prefer PyTorch as it is open source. TensorFlow is also open source, but it has its own limitations and works well with TPU-based architectures, whereas PyTorch can handle almost all types of architecture. Converting a TensorFlow or PyTorch model to a quantized or optimized version is also fairly similar: both can be condensed into smaller forms, and both have batch fetching mechanisms that can be useful. Both have their own limitations and enhancements, so I think either is fine. For transformer-based text work, most examples of transformers have been made available in PyTorch, so I think PyTorch is preferable to TensorFlow as the backend for the Hugging Face CP, which uses PyTorch. But again, if we are choosing to develop a Google-based architecture like Bard, then I think TensorFlow is much better. So depending on which you choose for the alternative model, it can be beneficial to have a