Vijay Tadikamalla

Data Scientist with expertise in junk URL detection, machine learning, and big data, aiming to enhance user experiences by leveraging deep analytical techniques and sophisticated algorithms.
  • Role

    Data Scientist II

  • Years of Experience

    3.5 years

Skillsets

  • Python - 5 Years
  • SQL
  • Bash
  • C++
  • LaTeX
  • Haskell
  • Cloud - 3 Years
  • NLP - 3 Years
  • LLM - 2 Years
  • APIs - 3 Years
  • Backend Development - 1 Year
  • AI/ML - 5 Years

Professional Summary

3.5 Years
  • Mar, 2024 - Present 1 yr 1 month

    Data Scientist 2

    Microsoft
  • Aug, 2021 - Feb, 2024 2 yr 6 months

    Data Scientist

    Microsoft
  • Mar, 2021 - Aug, 2021 5 months

    Mentor

    TensorFlow (Google Summer of Code)
  • May, 2018 - Jun, 2018 1 month

    Content Developer

    Easy Prepare
  • May, 2019 - Sep, 2019 4 months

    Software Engineer

    Haskell (Google Summer of Code)
  • May, 2020 - Jul, 2020 2 months

    Data Scientist Intern

    Microsoft

Applications & Tools Known

  • Apache Spark
  • Kafka
  • PyTorch
  • Scikit-learn
  • Git
  • Selenium
  • Microsoft Azure

Work History

3.5 Years

Data Scientist 2

Microsoft
Mar, 2024 - Present 1 yr 1 month
    Enhancing the Microsoft Bing user experience by removing junk (dead and low-quality) URLs from search results. Fine-tuned a BERT-based model and applied knowledge distillation, achieving 99% precision and 86% AUPRC for junk URL detection. Integrated unsupervised clustering with a high-precision junk classifier to build a scalable platform that detects over 12 billion junk URLs. Automated URL categorization with a GPT-based LLM, cutting human labeling costs by $100K annually.

Data Scientist

Microsoft
Aug, 2021 - Feb, 2024 2 yr 6 months
    Engineered robust Big Data pipelines using Spark streaming APIs and Kafka, enabling low-latency blocking of junk pages. Revamped legacy junk pipelines using Azure tools (Logic Apps, Functions, and ARM), significantly reducing on-call workload. Achieved a 75% reduction in false positives in junk detection through in-depth analysis of user behavior in the Edge browser.

Mentor

TensorFlow (Google Summer of Code)
Mar, 2021 - Aug, 2021 5 months
    Led a collaborative effort with fellow mentors from Google Brain to enhance the open-source TensorFlow Datasets library. Mentored a student in using contributions from the TensorFlow and Hugging Face communities, effectively doubling the number of readily accessible datasets.

Data Scientist Intern

Microsoft
May, 2020 - Jul, 2020 2 months
    Improved the accessibility of scanned PDFs by adding an Optical Character Recognition (OCR) feature to the Edge PDF reader, enabling users to select and search text in images within PDF files. Created a network communication workflow for making Azure Cognitive Services API calls via the browser network layer.

Software Engineer

Haskell (Google Summer of Code)
May, 2019 - Sep, 2019 4 months
    Developed HsYAML, an open-source library for serializing and deserializing YAML documents in Haskell. Extended the data model to allow round-trips while preserving comments, anchors, etc. Achieved 99% accuracy on the YAML Test Suite, establishing HsYAML as the best YAML processor.

Content Developer

Easy Prepare
May, 2018 - Jun, 2018 1 month
    Crafted educational materials to aid students preparing for JEE Main and Advanced exams.

Achievements

  • Operational Excellence Award from Microsoft Bing Leadership
  • Team Recognition Award from Microsoft Bing Leadership
  • Google Research AI Summer School participant
  • Top 20 out of 6000 teams in Flipkart GRiD All India ML Challenge
  • Teaching Assistant for Algorithms, Programming Principles and AI courses
  • Core Member of Machine Learning and Software Development clubs
  • All India Rank 709 out of 1 million applicants in JEE Advanced Examination

Major Projects

2 Projects

Enhancement of Microsoft Bing

    Participated in the enhancement of Microsoft Bing by developing clustering algorithms and engineering high-performance Big Data pipelines with Spark and Kafka for bulk junk page detection.

OCR Feature in Edge PDF Reader

    Developed an Optical Character Recognition feature for the Edge PDF reader, allowing users to select and search text in scanned PDF documents.

Education

  • B.Tech. in Computer Science and Engineering

    Indian Institute of Technology Hyderabad (2021)