
Data Scientist, Info Edge India Ltd.
NLP Developer Intern, Crion Tech
Deputy Head, International and Alumni Relations Student Council, IIT Madras
SKlearn
Elasticsearch
Tensorflow
Keras
Docker
Responsible for managing the complete seeker search environment for Naukri.com
This includes context-rich feature extraction from jobs for indexing in the Elasticsearch index, autosuggestors, and job retrieval and sorting, along with the topmost personalization layer of the 'Learning to Rank' architecture
Invented a new deep learning Learning to Rank architecture, a Siamese network of BERT encoders, increasing CTR@5 by 7.8% and daily applies by 8%
Led a team of 50 members; raised Rs. 44 million from the hostel donation campaign #KeepItFlowing
Spearheaded Alumni Reunion, the 3-day flagship event with 500+ alumni, by managing 30+ volunteers
Strengthened the connection between alumni and students by introducing structured and efficient webinars
Hello, I'm Rampar Sadra Kone. I'm from Akola, Maharashtra, where I was born and brought up. I attended IIT Madras for college, and during my college tenure I was a three-time national finalist in deep learning competitions organized by Amazon and ABN Web. Currently I'm working with Info Edge, for Naukri.com at Noida, as a Data Scientist handling seeker search. For seeker search, we maintain the complete user search journey, which has three components. First, whenever a user comes and searches for a query, we fetch all the relevant jobs; then we re-rank those jobs; and on top of that there is a personalization layer that re-ranks the last 20 or 40 jobs. I developed an innovative algorithm for this last layer, which has given us massive gains of 7.5% in CTR@5 and 7% in applies.
I'm not sure about this.
It depends on the dataset that we are working with.
I'm not sure about this.
It depends on what NLP-based system we are talking about. If it is one that can benefit from embeddings, then embeddings generated by an LLM can be used. If not, the existing algorithms can be used directly as well.
I'm not sure about this.
The query is trying to fetch 10 rows from points, specifically name, location, and category, where the category should be 'hotel' or 'restaurant' and the location shouldn't be NULL, ordered by the length of the name. But the problem here is that LIMIT 10 and the first statement are separated by a semicolon. That means the first statement runs entirely on its own and then the second runs, which doesn't achieve the intended purpose. The semicolon before the LIMIT clause should be removed.
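A minimal sketch of the fix described above, run through sqlite3 so it is self-contained. The points table name and its columns come from the answer; the sample rows are invented for illustration:

```python
import sqlite3

# Hypothetical 'points' table matching the query discussed in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (name TEXT, location TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [
        ("Inn", "Delhi", "hotel"),
        ("Grand Palace Hotel", "Noida", "hotel"),
        ("Cafe", None, "restaurant"),   # excluded: NULL location
        ("Diner", "Pune", "restaurant"),
        ("Park", "Mumbai", "park"),     # excluded: wrong category
    ],
)

# Corrected query: LIMIT must belong to the same statement,
# so there is no semicolon before it.
query = """
SELECT name, location, category
FROM points
WHERE category IN ('hotel', 'restaurant')
  AND location IS NOT NULL
ORDER BY LENGTH(name)
LIMIT 10
"""
result = conn.execute(query).fetchall()
print([r[0] for r in result])  # -> ['Inn', 'Diner', 'Grand Palace Hotel']
```

With the stray semicolon, the first statement would run (and return the full, unlimited result), while `LIMIT 10;` on its own is not valid SQL at all.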
In this Python code, we are trying to read a data frame, and if the file is not found we handle the error. But what if the file is found and some other error occurs? That error is not being caught, so there is a bug there. Instead of catching only FileNotFoundError, we can add a broader except block as a last resort.
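A sketch of that exception-handling pattern, using the stdlib csv module instead of pandas so the example is self-contained (the function name and file path are hypothetical):

```python
import csv

def load_rows(path):
    """Read a CSV into a list of dicts, handling more than just a missing file."""
    try:
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    except FileNotFoundError:
        print(f"{path} not found")
        return []
    except Exception as exc:  # e.g. permission errors, decoding errors
        print(f"unexpected error while reading {path}: {exc}")
        return []

print(load_rows("missing.csv"))  # -> [] (file-not-found branch)
```

The specific FileNotFoundError branch still runs first; the broad `except Exception` only catches what the narrower handler missed, which is exactly the gap the answer points out.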
I'll talk about this in steps: first data gathering, then EDA on the data to get a feel for it, what it looks like and what the features are; then feature selection and preprocessing of the features; then the model decision; then the choice of metric, which here is a high-accuracy metric; and then the actual model building. Let's start with data collection. If we already have the data with us, that's great; we will parse that data and perform EDA on it. Whatever findings come out of the EDA will be helpful for the further process; for example, we will get to know whether there are any correlated features. In the feature selection step we will check for correlation between the features, and if there are many features in the data, which ones should be used can be checked as well. In preprocessing, categorical data will be converted to numbers using an encoding method that depends on which algorithm we are using. Here I'll just assume that we will be using random forest, which will suffice for the given problem. So we will encode the categorical data, and the numerical data will be standardized or normalized; then a model can be built on that data. Coming to the metric part, since we need high accuracy, the model's predictions should match the labels as often as possible.
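The preprocessing step above can be sketched with stdlib Python: ordinal-encode a categorical column (fine for tree models like the random forest mentioned, which only split on thresholds) and min-max normalize a numeric one. The column names and rows here are invented for illustration:

```python
# Hypothetical raw rows with one categorical and one numeric feature.
rows = [
    {"city": "Akola", "salary": 40000},
    {"city": "Noida", "salary": 90000},
    {"city": "Akola", "salary": 65000},
]

# Ordinal encoding: map each category to an integer.
cities = sorted({r["city"] for r in rows})
city_code = {c: i for i, c in enumerate(cities)}

# Min-max normalization for the numeric feature.
salaries = [r["salary"] for r in rows]
lo, hi = min(salaries), max(salaries)

features = [
    [city_code[r["city"]], (r["salary"] - lo) / (hi - lo)]
    for r in rows
]
print(features)  # -> [[0, 0.0], [1, 1.0], [0, 0.5]]
```

In practice this would be done with scikit-learn's encoders and scalers; the sketch just makes the transformation concrete.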
If it were a precision- or recall-oriented metric, then we would have to focus on one side, maximizing precision or maximizing recall while keeping a decent value in the other. After this, it mostly comes down to validating the model on a validation dataset, ideally data similar to production. And after that comes the deployment part.
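To make the precision/recall trade-off concrete, here is a toy computation of both metrics from scratch (the labels are invented):

```python
# Hypothetical ground-truth and predicted binary labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was found
print(precision, recall)    # -> 0.75 0.75
```

Raising the decision threshold typically trades recall for precision, which is why the answer talks about maximizing one while keeping the other decent.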
I'm not sure about that.
As I mentioned previously, the best approach would be to use the embeddings generated by the LLM instead of the previous embeddings used in the NLP-based processing. That will be the fastest approach and give us a lot of gains. Another way can be using the generative capabilities of the LLM in the processing itself; if there is any other task, we can use the LLM there for a bit more quality. This is a basic example I'm giving for a small task. But depending on what we are doing, LLMs can be integrated elsewhere in the NLP processing as well.
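A minimal sketch of the embedding-swap idea: rank candidates by cosine similarity to a query embedding. The `embed()` function here is a hypothetical stand-in for any LLM embedding endpoint (it is just a toy bag-of-characters vector so the sketch runs end to end), and the job titles are invented:

```python
import math

def embed(text):
    # Stand-in for an LLM embedding call: a 26-dim character-count vector.
    return [text.lower().count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Rank candidate jobs by similarity to the query embedding; swapping a
# better embed() in changes the ranking quality without changing this code.
query = embed("python data scientist")
jobs = ["senior data scientist", "truck driver", "python developer"]
ranked = sorted(jobs, key=lambda j: cosine(embed(j), query), reverse=True)
print(ranked)
```

The point of the design is that the retrieval and ranking code stays the same; only the embedding source is replaced, which is why this is described as the fastest way to integrate an LLM.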