Vetted Talent

Anubhav Kamal

I am a passionate Machine Learning engineer with a strong background in mathematics and computing, eager to solve challenging data problems and deliver tangible results. My expertise spans the entire ML lifecycle—from data exploration and feature engineering to model development, optimization, and deployment. I thrive on building robust, scalable pipelines and have hands-on experience with deep learning frameworks, MLOps best practices, and performance benchmarking techniques.

Leveraging a solid foundation in algorithmic thinking, I excel at translating research-level concepts into real-world applications and ensuring that complex ML solutions not only work efficiently but also integrate seamlessly into production environments. Above all, I’m driven by the impact of AI and the exciting potential it holds for powering innovative products and services.

Role
Associate Staff & Compiler Engineer
Years of Experience
6.5 years

Skillsets

Deep Learning - 3.8 Years
CNN - 3 Years
Optimization
Symbolic Regression
Performance Benchmarking
Classification - 3 Years
Regression - 3 Years
TensorFlow - 3 Years
Python - 3.8 Years
C++
C
Machine Learning

Vetted For

15 Skills

Roles & Skills
Results
Details

Founding ML Engineer/Scientist (Remote) AI Screening
49%

Skills assessed: Excellent Communication, Classification, CNN, Deep Learning, LLMs, ML libraries, PyTorch, Regression, RNN, Scikit-learn, Supervised ML, TensorFlow, Positive Approach towards Work, Proactive, Python
Score: 44/90

Professional Summary

6.5 Years

Jun, 2025 - Present 10 months
Associate
JPMorganChase
Mar, 2024 - Jun, 2025 1 yr 3 months
Machine Learning Engineer
Samsung Semiconductor
Jun, 2022 - Jun, 2025 3 yr
Senior Engineer
Samsung Semiconductor
May, 2019 - Jul, 2019 2 months
Research Student
University of Warwick
Apr, 2020 - Jul, 2020 3 months
Research Associate
Indian Institute of Technology, Kharagpur
Jun, 2021 - Jun, 2022 1 yr
Deep Learning Engineer
Ceremorphic, Inc.

Applications & Tools Known

Python
C
C++
MATLAB
Mathematica
Docker
Valgrind
Shell Scripting
Linux
LaTeX

Work History

6.5 Years

Associate

JPMorganChase

Jun, 2025 - Present 10 months

Machine Learning Engineer

Samsung Semiconductor

Mar, 2024 - Jun, 2025 1 yr 3 months

Senior Engineer

Samsung Semiconductor

Jun, 2022 - Jun, 2025 3 yr

Collaborated with the Foundry team to develop AI-based techniques for predicting circuit characteristics. Replaced traditional empirical methods (CDF) with an explainable AI-based technique (XAI), significantly improving accuracy and efficiency in 80.60% of cases. This data analysis provided crucial insights into device performance, significantly impacting the development and optimization of PDKs.

PIMLibrary - Development and Testing (GitHub)

Developed and restructured a comprehensive testing framework for proprietary hardware, PIM, using Python and C++. Implemented a unified environment for different platforms (HIP and OpenCL) using Docker Compose to streamline development and testing. Conducted performance benchmarking and analysis with tools such as GProf, Valgrind, and NVIDIA Nsight to identify and optimize critical performance bottlenecks. Used Linux for development, testing, and deployment, relying on shell scripting and Linux command-line tools for automation and system management.

Symbolic Regression for Lithium-ion Battery State Estimation

Applied Python-based symbolic regression techniques for state estimation to predict voltage profiles and battery parameters. Used MATLAB to generate the simulated P2D dataset. Developed algorithms for on-device state estimation in Battery Management Systems (BMS), focusing on minimizing computational overhead. Conducted simulations and validations using state-of-the-art Pseudo-2D (P2D) model data to ensure the accuracy and reliability of the estimation. The predictions take the form of analytical solutions, which improves the explainability of the AI model (XAI).

QEMU

Created a working QEMU environment with Linux installed, facilitating hardware and software emulation. Enabled huge pages in the Linux environment and modified their default size to optimize memory usage. Gained an in-depth understanding of memory management in QEMU and extracted virtual and physical addresses using custom plugins.

Deep Learning Engineer

Ceremorphic, Inc.

Jun, 2021 - Jun, 2022 1 yr

Research and development of deep learning compiler

Researched and developed mathematical approximations that closely track the original functions. Implemented critical mathematical functions (log, sigmoid, softmax, tanh, and sqrt) in the backend software stack using C++.

Owner of testing framework

Built a comprehensive testing and debugging framework for hardware and software testing, ensuring readiness for tape-out. Designed and developed test cases for each datatype to check each hardware pipeline and its functionality on heterogeneous hardware (CPU and NPU).

Design and optimized solutions for ML algorithms

Developed different ML/computer vision layers (Convolution and Pooling) from scratch, taking the ISA and hardware into consideration.

Research Associate

Indian Institute of Technology, Kharagpur

Apr, 2020 - Jul, 2020 3 months

Research Student

University of Warwick

May, 2019 - Jul, 2019 2 months

Optimized power dissipated by the rollers for an important industrial process called metal sheet rolling. Worked on a novel Adjoint-based optimization technique and formulated an example to demonstrate the application. Aimed to develop an adjoint-based optimization solver in OpenFOAM for industrial applications.

Major Projects

4 Projects

Non-Photorealistic Rendering Using Evolutionary Algorithm

Generated an image, starting from a white background, to match the input image using evolutionary algorithms. Used the concept of mutation to control the sizes, colours, and positions of dots so that the result matched the original image very closely.

Error Based Classification Using Non-linear SVM

Classified 6 classes using 3 kernel functions (Linear, Polynomial, and Gaussian RBF) and the one-vs-all technique. Used a convex optimization technique to find the optimal hyperplane and the corresponding non-linear decision boundaries. Determined the most suitable kernel function for the given dataset by evaluating performance with F1-score, Precision, and Recall.

Multi-Layer Brinkman Solution with Application in Modelling of Arterial LDL Transport

Conducted an extensive literature review to gather relevant data on arterial physiology and LDL transport mechanisms. Formulated and solved the Brinkman equations for multi-layer fluid flow in arterial walls.

Fluid Flow Inside a Wavy Channel Filled with Porous Medium

Identified a gap in research on the mathematical modelling of fluid flow in an anisotropic porous medium in a wavy channel. Developed mathematical models to describe fluid flow behaviour in a wavy channel filled with an anisotropic porous medium. Performed a theoretical analysis of the corner cases of the analytic solution obtained by solving the system of PDEs.

Education

Integrated M.Sc. in Mathematics and Computing
Indian Institute of Technology Kharagpur (2021)

Certifications

Machine Learning Operations (MLOps): Getting Started
NLP Course

AI-interview Questions & Answers

Yeah. So I'm Anubhav Kamal. I graduated from the Indian Institute of Technology, Kharagpur, and I have a master's degree in mathematics and computing. I've worked with an organization called Ceremorphic. It is a startup building semiconductors. We were working on proprietary hardware called a neural processing unit, or NPU. Currently, I'm working with Samsung Semiconductor India Research, in the AI computing department there. I've been working on proprietary hardware called processing-in-memory, with a lot of AI- and ML-driven projects in between. I've been working in collaboration with the foundry team on machine learning tasks and on solutions to the real problems they were facing with their PDKs. They were trying to find an analytical solution for their PDK, and we helped them. This is in the domain of explainable AI. I used a technique called symbolic regression and a Python package called PySR. I've been working with AI/ML for the past 3 years, first with Ceremorphic on the deep learning compiler, and now with Samsung Semiconductor India Research on processing-in-memory and other projects.

And then balance it in it.

When integrating a model into production, at least with model serving, I'm not aware of that term for models. So integrating a PyTorch model into production is a challenging task in itself. We were working with PyTorch Lite and tried to run PyTorch Lite on ARM devices, like mobile phones. We used a Samsung S24 Ultra to deploy PyTorch Lite, and the problems we were facing were compatibility issues with the architecture and with building the pipelines with certain compilers. When building for ARM devices, we generally use Clang++ or the LLVM compilers, and a version mismatch can happen with PyTorch. Those are the major issues, along with some issues on Linux. So I'm not familiar with model serving as such.

So we do this computation first. So, dimension reduction. Okay. Dimension reduction can be applied in various forms. One of the popular techniques is called PCA. The full form is principal component analysis. It is a linear-algebra-based technique where you find the importance of each of the features and either combine those features with given weights or, if the importance of a feature is very low, you actually remove it. That is one of the techniques, and it is the most popular one that we use, because it does not automatically make a feature useless; it can also combine features into one. That is one way. The other is to perform various feature engineering on the whole dataset and look at the correlation matrix to find how much each feature is correlated with the output. There is a simple correlation formula for each pair of random variables, and that can be organized into a matrix, giving a number between minus 1 and 1. Minus 1 is inversely correlated: if the output is increasing, then your feature would be decreasing. That is absolute negative correlation. And 1 is positive correlation, and 0 is no correlation. So a feature with a number around 0 is something we do not want in our dataset, and that's how we can reduce the dimension of the data as well.
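The correlation-based filtering described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names, the toy data, and the 0.3 cutoff are hypothetical and do not come from the interview.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold.

    The threshold value is an assumption chosen for this example.
    """
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Hypothetical toy data: one positively correlated feature, one negatively
# correlated feature, and one feature uncorrelated with the output.
features = {
    "f_pos": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f_neg": [5.0, 4.0, 3.0, 2.0, 1.0],
    "f_flat": [1.0, -1.0, 0.0, -1.0, 1.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = select_features(features, target)
```

Filtering on the absolute value keeps strongly negative features as well, since a correlation near minus 1 is as informative as one near 1; only values near 0 are dropped.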

I think we're just lost in dollars. So, for class imbalance: class imbalance can be addressed with certain... I'm not sure about this question. No. Yeah.

So, when setting up a supervised machine learning pipeline, there are a few considerations to keep in mind. First, you always make sure that the split of the dataset is random enough to give a holistic view of the entire dataset we are working on. Second, when working on feature engineering, you always normalize the data and then continue with the feature engineering. You always make sure of this: the output does not need to be normalized, but the input does. And you always select the right feature engineering tools and the right feature engineering methods. One of them can be data visualization with different tools: selecting a couple of features and plotting them against the output data. That can give you a sense of how each feature behaves along with the data. There can be a nonlinear relationship between the output and the features, but there should be some correlation. The second would be looking into the correlation matrix and selecting the features with high correlation scores. And, yeah.
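The two considerations named here, a random split and normalizing the inputs but not the output, could look roughly like the sketch below. The helper names, the 80/20 split, and the toy data are hypothetical. Fitting the scaler on the training portion only is an added assumption (a common way to keep test data out of the preprocessing), not something stated in the answer.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(values):
    """Return the mean and population standard deviation of the values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero
    return mean, std

def scale(values, mean, std):
    """Standardize values with a previously fitted mean and std."""
    return [(v - mean) / std for v in values]

# Hypothetical toy dataset of (input, output) pairs.
rows = [(float(x), 3.0 * x + 1.0) for x in range(10)]

train, test = train_test_split(rows)

# Normalize the inputs only; the outputs are left in their original units.
mean, std = fit_scaler([x for x, _ in train])
train_x = scale([x for x, _ in train], mean, std)
test_x = scale([x for x, _ in test], mean, std)
train_y = [y for _, y in train]
test_y = [y for _, y in test]
```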

Let me start at one end. It's been the issue and how often I get experience. So the optimizer used here is optim.SGD. And then we have in files module dot. We have not specified what kind of optimizer yet.

In the second statement, there is a division. It is inside a try, and there is an except to catch the error. So division by zero is handled, and that is really the major concern when you're dividing two numbers. Now, in the second print statement, we are dividing 10 by the string "2". That is also a major issue, because we should be dividing two numbers, not a number by a string. So we need to either convert "2" into a number, or this raises an error in Python.
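The snippet under discussion is not included in the transcript, so the following is only a hypothetical reconstruction of the two failure modes mentioned: division by zero and dividing a number by a string. The function name `safe_divide` and the choice to return `None` on failure are assumptions made for illustration.

```python
def safe_divide(numerator, denominator):
    """Divide two values, converting numeric strings first.

    Returns None when the division cannot be performed.
    """
    try:
        return float(numerator) / float(denominator)
    except ZeroDivisionError:
        return None  # the denominator was zero
    except (TypeError, ValueError):
        return None  # an operand could not be converted to a number

# Dividing an int by a str directly raises TypeError in Python.
try:
    10 / "2"
    raised_type_error = False
except TypeError:
    raised_type_error = True

converted = safe_divide(10, "2")     # the string is converted before dividing
by_zero = safe_divide(10, 0)         # handled, returns None
not_a_number = safe_divide(10, "x")  # handled, returns None
```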

So, I have deployed a few models on mobile devices, basically on ARM CPUs. To look into latency issues, we generally benchmark whatever application we are working on. We benchmark using some of the benchmarking tools; simpleperf is a great benchmarking tool. You can launch it directly from ADB, and you can get all the cache information, all the memory information, and all the computation information as well. That is how you can decide whether to work on the computation part, or whether there is a problem with memory management in the system, or whether the cache is under- or overutilized, and it is mostly underutilized. So, in certain cases, you make certain changes according to the need.

So, selecting a pretrained model. It should be trained on a dataset very similar to the one I want to work on. The input should be of similar or the same size as what I want it to work on, including the filters and the number of image channels that we want. So the RGB layers, the channels, should be the same as we want our testing images to have. And it should be tunable enough: the source code should be available so that we can make changes, debug the code, and fine-tune the model for our needs, and also freeze the gradients in between if required.

We want a CW client for the 7.99, so that's 1008, because Okay.