Vetted Talent

Anubhav Kamal

I am a passionate Machine Learning engineer with a strong background in mathematics and computing, eager to solve challenging data problems and deliver tangible results. My expertise spans the entire ML lifecycle—from data exploration and feature engineering to model development, optimization, and deployment. I thrive on building robust, scalable pipelines and have hands-on experience with deep learning frameworks, MLOps best practices, and performance benchmarking techniques.

Leveraging a solid foundation in algorithmic thinking, I excel at translating research-level concepts into real-world applications and ensuring that complex ML solutions not only work efficiently but also integrate seamlessly into production environments. Above all, I’m driven by the impact of AI and the exciting potential it holds for powering innovative products and services.

  • Role

    Associate Staff & Compiler Engineer

  • Years of Experience

    6.5 years

Skillsets

  • Deep Learning - 3.8 Years
  • CNN - 3 Years
  • Optimization
  • Symbolic regression
  • Performance benchmarking
  • Classification - 3 Years
  • Regression - 3 Years
  • TensorFlow - 3 Years
  • Python - 3.8 Years
  • C++
  • C
  • Machine Learning

Vetted For

15 Skills
  • Founding ML Engineer/Scientist (Remote), AI Screening
  • 49%
  • Skills assessed: Excellent Communication, Classification, CNN, Deep Learning, LLMs, ML libraries, PyTorch, Regression, RNN, Scikit-learn, Supervised ML, TensorFlow, Positive Approach towards Work, Proactive, Python
  • Score: 44/90

Professional Summary

6.5 Years
  • Jun, 2025 - Present 1 yr 1 month

    Associate

    JPMorganChase
  • Mar, 2024 - Jun, 2025 1 yr 3 months

    Machine Learning Engineer

    Samsung Semiconductor
  • Jun, 2022 - Jun, 2025 3 yr

    Senior Engineer

    Samsung Semiconductor
  • May, 2019 - Jul, 2019 2 months

    Research Student

    University of Warwick
  • Apr, 2020 - Jul, 2020 3 months

    Research Associate

    Indian Institute of Technology, Kharagpur
  • Jun, 2021 - Jun, 2022 1 yr

    Deep Learning Engineer

    Ceremorphic, Inc.

Applications & Tools Known

  • Python
  • C
  • C++
  • MATLAB
  • Mathematica
  • Docker
  • Valgrind
  • Shell Scripting
  • Linux
  • LaTeX

Work History

6.5 Years

Associate

JPMorganChase
Jun, 2025 - Present 1 yr 1 month

Machine Learning Engineer

Samsung Semiconductor
Mar, 2024 - Jun, 2025 1 yr 3 months

Senior Engineer

Samsung Semiconductor
Jun, 2022 - Jun, 2025 3 yr
    Collaborated with the Foundry team to develop AI-based techniques for predicting circuit characteristics. Replaced the traditional empirical method (CDF) with an explainable AI (XAI) technique, improving accuracy and efficiency in 80.60% of cases. The analysis provided insight into device performance and informed the development and optimization of PDKs.

    PIMLibrary - Development and Testing (GitHub)
    Developed and restructured a comprehensive testing framework for proprietary hardware, PIM, using Python and C++. Implemented a unified environment for different platforms (HIP and OpenCL) with Docker Compose to streamline development and testing. Benchmarked and analyzed performance with GProf, Valgrind, and NVIDIA Nsight to identify and optimize critical bottlenecks. Used Linux for development, testing, and deployment, with shell scripting and command-line tools for automation and system management.

    Symbolic Regression for Lithium-ion Battery State Estimation
    Applied Python-based symbolic regression to state estimation, predicting voltage profiles and battery parameters. Used MATLAB to generate the simulated P2D dataset. Developed algorithms for on-device state estimation in Battery Management Systems (BMS), focusing on minimizing computational overhead. Ran simulations and validations on state-of-the-art Pseudo-2D (P2D) model data to check the accuracy and reliability of the estimates. The predictions are analytical expressions, which improves the explainability of the AI model (XAI).

    QEMU
    Created a working QEMU environment with Linux installed for hardware and software emulation. Enabled huge pages in the Linux environment and changed their default size to optimize memory usage. Gained an in-depth understanding of memory management in QEMU and extracted virtual and physical addresses using custom plugins.

Deep Learning Engineer

Ceremorphic, Inc.
Jun, 2021 - Jun, 2022 1 yr
    Research and development of deep learning compiler
    Researched and developed mathematical approximations that closely track the original function. Implemented critical mathematical functions (log, sigmoid, softmax, tanh, and sqrt) in the backend software stack using C++.

    Owner of testing framework
    Built a comprehensive testing and debugging framework for hardware and software testing, ensuring readiness for tape-out. Designed and developed test cases for each datatype to check every hardware pipeline and function on heterogeneous hardware (CPU and NPU).

    Design and optimised solution of ML algorithms
    Developed ML/computer vision layers (Convolution and Pooling) from scratch, taking the ISA and hardware into consideration.

Research Associate

Indian Institute of Technology, Kharagpur
Apr, 2020 - Jul, 2020 3 months

Research Student

University of Warwick
May, 2019 - Jul, 2019 2 months
    Optimized power dissipated by the rollers for an important industrial process called metal sheet rolling. Worked on a novel Adjoint-based optimization technique and formulated an example to demonstrate the application. Aimed to develop an adjoint-based optimization solver in OpenFOAM for industrial applications.

Major Projects

4 Projects

Non-Photorealistic Rendering Using Evolutionary Algorithm

    Generated an image, starting from a white background, to match the input image using evolutionary algorithms. Used mutation to control the sizes, colours and positions of dots so that the result matched the original image very closely.
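
The mutate-and-select loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: it uses a 1-D grayscale "image" instead of a 2-D colour one, a simple keep-the-child-if-no-worse rule, and hypothetical names (`render`, `mutate`, `evolve`) and parameter values.

```python
import random

def render(dots, width):
    """Draw dots onto a white 1-D canvas (1.0 = white, 0.0 = black)."""
    canvas = [1.0] * width
    for pos, radius, shade in dots:
        for x in range(max(0, pos - radius), min(width, pos + radius + 1)):
            canvas[x] = shade
    return canvas

def error(canvas, target):
    """Sum of squared pixel differences between canvas and target."""
    return sum((c - t) ** 2 for c, t in zip(canvas, target))

def mutate(dots, width, rng):
    """Return a copy with one dot's position, size and shade perturbed."""
    child = list(dots)
    i = rng.randrange(len(child))
    pos, radius, shade = child[i]
    pos = min(width - 1, max(0, pos + rng.randint(-2, 2)))
    radius = min(4, max(0, radius + rng.randint(-1, 1)))
    shade = min(1.0, max(0.0, shade + rng.uniform(-0.2, 0.2)))
    child[i] = (pos, radius, shade)
    return child

def evolve(target, n_dots=8, generations=2000, seed=0):
    """Keep a mutated child only when it is no worse than its parent."""
    rng = random.Random(seed)
    width = len(target)
    # Start from white dots, i.e. an all-white canvas.
    best = [(rng.randrange(width), 1, 1.0) for _ in range(n_dots)]
    initial_err = best_err = error(render(best, width), target)
    for _ in range(generations):
        child = mutate(best, width, rng)
        child_err = error(render(child, width), target)
        if child_err <= best_err:
            best, best_err = child, child_err
    return best, initial_err, best_err

# Target: a dark band in the middle of a white strip.
target = [1.0] * 10 + [0.2] * 10 + [1.0] * 10
best, initial_err, best_err = evolve(target)
```

Because a child is accepted only when its error does not increase, the final error can never exceed the error of the starting white canvas.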

Error Based Classification Using Non-linear SVM

    Classified 6 classes using 3 kernel functions (Linear, Polynomial and Gaussian RBF) and the one-vs-all technique. Used convex optimization to find the optimal hyperplane and the corresponding non-linear decision boundaries. Determined the most suitable kernel function for the given data set by comparing F1-score, Precision and Recall.
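
The evaluation step above relies on per-class precision, recall and F1 in a one-vs-all setting. The sketch below shows how these are commonly computed; the labels and predictions are made-up toy data, and the function names and the macro-averaging choice are assumptions for illustration, not taken from the project.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class treated as 'positive' (one-vs-all)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(per_class_scores(y_true, y_pred, l)[2] for l in labels) / len(labels)

# Toy data: three classes, six samples (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

scores = {label: per_class_scores(y_true, y_pred, label) for label in (0, 1, 2)}
overall = macro_f1(y_true, y_pred)
```

Running the same comparison once per kernel and picking the kernel with the best scores is one way to make the selection described above.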

Multi-Layer Brinkman Solution with Application in Modelling of Arterial LDL Transport

    Conducted extensive literature review to gather relevant data on arterial physiology and LDL transport mechanisms. Formulated and solved the Brinkman equations for multi-layer fluid flow in arterial walls.

Fluid Flow Inside a Wavy Channel Filled with Porous Medium

    Identified a gap in research on mathematical modelling of fluid flow in an anisotropic porous medium in a wavy channel. Developed mathematical models to describe fluid flow behaviour in a wavy channel filled with an anisotropic porous medium. Carried out a theoretical analysis of the corner cases of the analytic solution obtained by solving the system of PDEs.

Education

  • Integrated M.Sc. in Mathematics and Computing

    Indian Institute of Technology Kharagpur (2021)

Certifications

  • Machine Learning Operations (MLOps): Getting Started

  • NLP course

AI-interview Questions & Answers

Yeah, so I am Anubhav Kamal. I graduated from the Indian Institute of Technology, where I earned a master's degree in mathematics and computing. I have worked with an organization called Ceremorphic, a startup building semiconductors, where we were working on proprietary hardware called a neural processing unit, or NPU. Currently, I'm working with Samsung Semiconductor India Research in the AI computing department, where I've been working on proprietary hardware called processing in memory. With a lot of AI- and ML-driven projects in between, I've been collaborating with the foundry team on machine learning solutions to the real problems they were facing. They had a problem related to their PDK, and we helped them find an analytical solution to it. This is in the domain of explainable AI. I use a technique called symbolic regression and a Python package called PySR. I've been working in AI/ML for the past three years: first with Ceremorphic, on the deep learning compiler, and now with Samsung Semiconductor India Research, on processing in memory and other projects.

And then balance it out.

When integrating a model into production with model serving... I'm not aware of the term "model serving." Integrating a PyTorch module into production is a challenging task in itself. We were working with PyTorch Lite and tried to run it on ARM devices such as mobile phones. We used the Samsung S24 Ultra to deploy PyTorch Lite, and the problems we faced were compatibility issues with the architecture and with building the pipelines, as well as with certain compilers. When building for ARM devices, we generally use C++ or LLVM compilers, and a version mismatch can happen with Python. Those are the major issues, along with some issues in Linux. I'm not aware of model serving and stuff.

So we do this computation first. Dimensionality reduction can be applied in various forms. One of the popular techniques is PCA, which stands for principal component analysis. It is a linear-algebra-based technique where you find the importance of each of the features and either combine those features with given weights or, if the importance of a feature is very low, remove it. That is the most popular technique we use, because it does not simply discard a feature; it can also combine features into one. That is one way. Another is to perform feature engineering on the whole dataset and look at the correlation matrix to see how strongly each feature is correlated with the output. There is a simple correlation formula for each pair of random variables, and the results can be organized into a matrix of numbers between -1 and 1. A value of -1 is perfect negative correlation: if the output increases, the feature decreases. A value of 1 is perfect positive correlation, and 0 is no correlation. Features whose correlation is around 0 are something we do not want in our dataset, and removing them is how we can reduce the dimension of the data as well.
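
The correlation-based filtering described in this answer can be sketched as follows. This is an illustrative example only: the feature names, the toy data, and the 0.3 cut-off are assumptions, and the Pearson formula is written out by hand so the snippet needs nothing beyond the standard library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # convention chosen here: a constant column carries no signal
    return cov / (spread_x * spread_y)

def select_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]

# Toy data (hypothetical): one feature rises with the output,
# one falls as it rises, and one is unrelated to it.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "rising": [2.0, 4.0, 6.0, 8.0, 10.0],
    "falling": [5.0, 4.0, 3.0, 2.0, 1.0],
    "unrelated": [1.0, -1.0, 0.0, -1.0, 1.0],
}

kept = select_features(features, target)
```

Note that the filter uses the absolute value, so a strongly negative correlation is kept just like a strongly positive one; only the near-zero feature is dropped.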

I think we're just lost in dollars. So, for class imbalance: class imbalance can be addressed with certain... I'm not sure about this question. No, yes.

When setting up a supervised learning pipeline, it is deep consumerizing, which aligns with GMS directions. While setting up a supervised machine learning pipeline, there are a few considerations to keep in mind. First, always make sure that the split of the dataset is random enough to give a holistic view of the entire dataset you are working on. Second, when working with feature engineering, always normalize the data first and then continue with feature engineering. The output does not need to be normalized along with the input. Third, always select the right tools and methods for feature engineering. One of them is data visualization: selecting a couple of features and plotting them against the output can give you a sense of how each feature behaves with the data. There can be a nonlinear relationship between the output and the features, but there should be some correlation. Another is looking at the correlation matrix and selecting the features with high correlation.
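
The first two considerations in this answer, a random split and input normalization, can be sketched as below. This is a minimal standard-library illustration, not a production pipeline: the 25% test fraction, the seed, and the toy data are assumptions, and fitting the scaler on the training split only is a common practice added here rather than something stated in the answer.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then cut off the test portion."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(values):
    """Mean and standard deviation used for z-score normalization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, (std if std > 0 else 1.0)

def scale(values, mean, std):
    return [(v - mean) / std for v in values]

# Toy dataset (hypothetical): one input feature x and an output y = 3x + 1.
rows = [(float(x), 3.0 * x + 1.0) for x in range(20)]

train, test = train_test_split(rows)

# Fit the scaler on the training inputs only, then apply it to both splits.
mean, std = fit_scaler([x for x, _ in train])
train_x = scale([x for x, _ in train], mean, std)
test_x = scale([x for x, _ in test], mean, std)

# The outputs are left unscaled, as the answer suggests.
train_y = [y for _, y in train]
test_y = [y for _, y in test]
```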

Let's start at one end, it's been the issue and how often I get experience. So the optimizer used here is optim.SGD. And then we have the in-file module. We have not specified what kind of optimized code yet.

In the 2nd system, it's in the 1st division. It requires and it's saying why it's not much. Much outside is accepted. When we need this and accept to generate a short. The division where she was handled is really the major concern when you're dividing between numbers. Right now, in the second print section, we are dividing 10 by 2. That is also a major issue because that is incorrect. And then we are dividing two numbers, not one number within a string. So we need to either convert it to a number, or this will raise an error in Python.
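
The code under review is not part of this transcript, so the following is only a guess at the kind of issue being described: dividing a numeric string by a number raises a `TypeError` in Python unless the string is converted first. The helper name `safe_divide` and the choice to return `None` on division by zero are hypothetical.

```python
def safe_divide(a, b):
    """Divide two values that may arrive as numeric strings.

    Returns None when the divisor is zero. A non-numeric string still
    raises ValueError from float(), which is left to the caller.
    """
    try:
        return float(a) / float(b)
    except ZeroDivisionError:
        return None

# Dividing a string by an int fails before any arithmetic happens.
try:
    "10" / 2
except TypeError as exc:
    message = str(exc)

result = safe_divide("10", 2)       # the string is converted first
zero_case = safe_divide(10, 0)      # the zero divisor is handled explicitly
```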

I have deployed a few modules on mobile devices with an ARM CPU. To look into a latency issue, we generally benchmark the server or whatever application we are working on, using benchmarking tools. simpleperf is a great benchmarking tool, and you can launch it directly from ADB. What you also get is all the cache information, all the memory information, and all the computation information. From that you can decide whether to work on the computation part, whether there's a problem with memory management in the system, or whether the cache is over- or underutilized (mostly underutilized). Then, in each case, you make changes according to the need.

Selecting a pre-trained model. It should be trained on a dataset very similar to the one I'm working on. Its input should be of a similar or the same size as the data I want it to work on. The filters and the number of image channels should be tunable, so the RGB channels should match those of the images we want to test on. The source code should be available so that we can make changes, debug the code, and fine-tune the model for our needs, and also, like, freeze the gradients in between if required.

We want a CW client for $7.99, so that's $1008.