Uz Zama Khamar

As a Data Scientist, I apply my skills in Python and data science to develop and improve solutions that are secure, scalable, and user-friendly. I graduated with a master's degree in Data and Knowledge Engineering from Otto-von-Guericke University Magdeburg in Dec 2021, where I wrote my thesis on a novel method for detecting liver fibrosis using the heartbeat as the excitation mechanism. I have over 3 years of work experience in data science, data analytics, and research across domains such as telecommunications, environmental science, automotive, automation, and pharma. I am passionate about finding innovative ways to leverage data and knowledge to solve real-world problems and create value.

  • Role

    RAG Engineer

  • Years of Experience

    4 years

Skillsets

  • OpenAI
  • Databricks
  • DeepEval
  • Gephi
  • GitLab
  • HMM
  • LangGraph
  • LSTM
  • Machine Learning
  • Mosaic AI
  • CSS
  • OpenAPI
  • OpenCV
  • Pydantic
  • PySpark
  • RAG
  • SageMaker
  • vector search
  • YOLOv8
  • AWS
  • Git
  • HTML
  • LLM
  • Kubernetes - 1 Year
  • Oracle
  • JavaScript
  • Python - 5 Years
  • SQL - 1 Year
  • Design patterns
  • Java
  • Flask
  • Tableau
  • OCR
  • LangChain - 2 Years
  • ChromaDB
  • CI/CD
  • crewAI

Professional Summary

4 Years
  • Dec, 2024 - Present (1 yr 3 months)

    Machine Learning Engineer

    Amgen
  • Feb, 2022 - Nov, 2024 (2 yr 9 months)

    Data Scientist

    Fortuna Identity
  • Jun, 2017 - Dec, 2018 (1 yr 6 months)

    Software Development Engineer (SDE)

    CDK Global
  • Data Science Intern

    Conscript HR Advisors
  • Research Assistant

    Helmholtz Center for Environmental Research - UFZ
  • Data Analyst Intern

    Deutsche Telekom
  • Software Engineer Intern

    ReverEye Tech Labs

Applications & Tools Known

  • Python
  • Tableau
  • Pandas
  • Keras
  • NLTK
  • XGBoost
  • Flask
  • MLflow
  • ChatGPT
  • Visual Studio
  • Google Colab
  • AWS SageMaker
  • MySQL
  • MongoDB
  • HTML5
  • CSS3

Work History

4 Years

Machine Learning Engineer

Amgen
Dec, 2024 - Present (1 yr 3 months)
  • Designed and built an end-to-end Retrieval-Augmented Generation (RAG) pipeline that ingests data from external APIs, chunks and transforms it, and generates synthetic data for indexing into the Databricks vector store.
  • Developed scalable vector ingestion jobs using PySpark and implemented Databricks Mosaic AI Vector Search to enable enterprise-grade semantic search and retrieval.
  • Created prompt-tuned instances of CustomGPT and integrated them with Databricks vector store endpoints.
  • Collaborated with OpenAI and Databricks support teams to troubleshoot and resolve enterprise integration challenges, including domain authorization mismatches, vector query latency, and performance optimization.
  • Led the transition from traditional Databricks vector-store-based RAG to Model Context Protocol (MCP)-based RAG to address token limitations, improve accuracy, and reduce hallucination.
  • Architected MCP tools with Pydantic models and output parsers to guarantee consistency across the entire MCP tools domain, while mentoring other developers on design and implementation.
  • Developed MCP TestBench, an internal GenAI evaluation tool built on DeepEval, to evaluate application performance.
  • Integrated MCP TestBench into the GitLab CI/CD pipeline to evaluate merge requests before any code is merged into the application repository.
  • Supported deployment teams during rollout, ensuring smooth releases, secure connectivity between components, and improved overall system reliability.
  • Improved deployment stability by debugging MCP client-server communication issues, optimizing Kubernetes pod allocation, and hardening MCP client configurations against network interruptions.
  • Implemented a multi-agent POC called SQL Agent, built with LangGraph, that dynamically translates natural-language queries into SQL, then validates, executes, and analyzes the results.
  • Implemented another multi-agent POC called Auth Agent, built with the CrewAI multi-agent framework, that classifies queries based on users' access levels.
  • Owned agile delivery activities (sprint planning, backlog refinement, demos, code reviews) and collaborated cross-functionally to integrate domain expertise.

Data Scientist

Fortuna Identity
Feb, 2022 - Nov, 2024 (2 yr 9 months)
  • Fine-tuned YOLOv8 image segmentation on a UI elements dataset to detect on-screen UI elements for EZYBot.
  • Worked on image processing algorithms using OpenCV and Google OCR, resulting in a surface automation tool named EZYBot that reduced automation script development time by 80% and eliminated the need for coding.
  • Collaborated with cross-functional teams to integrate EZYBot with various automation platforms, enhancing its versatility and adoption across different workflows.
  • Developed an LLM framework using LangChain that breaks down given tasks into subtasks and resolves them by making accurate API calls based on the EZYBot OpenAPI specification.
  • Developed LLM functionality to automatically fill the global parameters needed for API calls, prompting the user when data is missing.
  • Enabled the framework to make safe API calls automatically while requesting user confirmation for sensitive ones.
  • Added memory functionality to remember users' chat history across multiple sessions.
  • Implemented output parsing to verify the correctness of the received responses.
  • Integrated LangSmith for tracing LLM requests and performance analytics.
  • Added a Retrieval-Augmented Generation (RAG) feature using the Chroma vector database to answer bot developers' queries from multiple documents, PDFs, and manuals.
  • Created insightful visualizations of graph data using Gephi, aiding interpretation of complex clusters in the graphs.
  • Executed projects on Hidden Markov Models (HMMs) and Long Short-Term Memory (LSTM) networks for sequence-dependent predictions.
  • Deployed machine learning models as endpoints via Amazon SageMaker on AWS and created RESTful microservices using Python Flask.

Software Development Engineer (SDE)

CDK Global
Jun, 2017 - Dec, 2018 (1 yr 6 months)
  • Collaborated closely with product management to design and build Tableau dashboards tailored for executive-level insights.
  • Collaborated with backend developers to create server-side logic using Java.
  • Optimized data migration between dealerships and OEMs (General Motors), reducing downtime from 3 hours to 35 minutes.
  • Worked closely with frontend developers to implement user interfaces using HTML, CSS, and JavaScript.
  • Applied design patterns to improve code organization, reusability, and maintainability.
  • Developed Java APIs and microservices to handle data requests and responses between the frontend and backend.
  • Executed SQL queries on Oracle to manage and retrieve data efficiently.
  • Used Git version control for tracking changes and collaboration.

Data Analyst Intern

Deutsche Telekom

Research Assistant

Helmholtz Center for Environmental Research - UFZ

Data Science Intern

Conscript HR Advisors

Software Engineer Intern

ReverEye Tech Labs

Major Projects

3 Projects

Research Team Project - Large-Scale Supervised Link Prediction

    Predicted collaboration links between authors and authored a research paper based on findings and conclusions.

Data Science Seminar - Machine Learning in Games

    Compared traditional and deep learning algorithms for playing the game of Othello and authored a research paper based on findings and conclusions.

Master Thesis: Computer Vision at Inka

Apr, 2021 - Dec, 2021 (8 months)
    Led research assessing the viability of using heartbeat-induced stimuli for staging liver fibrosis. Implemented a U-Net encoder-decoder model for image segmentation and OpenCV optical flow methods for image processing in Python. Validated the hypothesis and authored a master's thesis.

Education

  • Master's in Data Science

    Otto-von-Guericke University (2021)
  • B. Tech. in Computer Science and Engineering

    Mahatma Gandhi Institute of Technology (2017)

Certifications

  • Data Science: Natural Language Processing (NLP) in Python

  • Applied Deep Learning

  • Tableau for Data Science