Saket Kumar

My passion for developing solutions stems from my dedication to optimizing and enhancing user experiences. Through my work on diverse projects, such as creating a central LLM interaction service with load balancing and building high-performance backend systems for various applications, I aim to push the boundaries of innovation.
  • Role

    Lead AI Engineer

  • Years of Experience

    7.8 years

Skillsets

  • LlamaIndex
  • ArcFace
  • AdaFace
  • Federated memory
  • Structural drift mitigation
  • Self-healing AI
  • WebSockets
  • Multi-Agent Systems
  • Model Context Protocol
  • ARM Neon
  • LangGraph
  • Langfuse
  • GraphRAG
  • Go
  • Federated search
  • Episodic memory
  • Debezium
  • Speaker diarization
  • Scoring pipelines
  • LLM evaluation
  • Structured inference
  • YOLOv8
  • XNNPACK
  • WhisperX
  • vLLM
  • TFLite
  • Memori
  • Silero VAD
  • RetinaFace
  • pyannote.audio
  • Prompt Engineering
  • OpenAI Whisper
  • CUDA
  • Biometric Security
  • Redis
  • Distributed Systems
  • Data Modeling
  • Celery
  • API Design
  • SQL
  • Docker
  • Python
  • Postgres
  • OpenAI
  • Message Queues
  • LangChain
  • Firebase
  • FastAPI
  • Django
  • Grafana
  • Ollama
  • nginx
  • Qdrant
  • Prometheus
  • Retrieval-Augmented Generation
  • Serverless computing
  • CI/CD
  • Cloud task queues
  • GCP
  • LLM safety
  • Mem0
  • Policy-driven moderation

Professional Summary

7.8 Years
  • Apr, 2025 - Present (1 yr 1 month)

    Lead AI Engineer

    Opkey
  • Jul, 2022 - Apr, 2025 (2 yr 9 months)

    Lead SDE

    proshort.ai
  • Apr, 2020 - Jul, 2022 (2 yr 3 months)

    Senior Software Developer

    NSEIT
  • Aug, 2018 - Apr, 2020 (1 yr 8 months)

    Software Development Engineer

    eClerx

Work History

7.8 Years

Lead AI Engineer

Opkey
Apr, 2025 - Present (1 yr 1 month)
  • Architected a distributed, real-time ERP workflow automation platform with an event-driven architecture, reducing manual effort by 75%. Designed scalable WebSocket/REST APIs with FastAPI, Apache Kafka, and Redis, with Langfuse for observability, then reimplemented the system in Go using goroutines and channels, achieving a 5x latency improvement.
  • Designed and implemented an Agent Context Management System for session-scoped agent memory that groups turns into episodes, auto-tags them, and performs token-budgeted recall. It sustains coherent 60+ turn conversations while maintaining >85% recall of critical facts and cutting redundant context tokens by 40%, with pluggable storage backends and provider-agnostic embedding and reflection pipelines.
  • Architected an autonomous multi-schema extraction engine using LangGraph and Pydantic, implementing a self-healing detect-extract-recover loop that dynamically recalibrates extraction logic upon structural drift, ensuring 100% data integrity across unmapped, heterogeneous datasets.
  • Architected an AI Data Layer by standardizing isolated Qdrant and Elasticsearch deployments into a Model Context Protocol (MCP) infrastructure, enabling federated search that lets multi-agent systems securely discover and retrieve cross-project patterns without compromising client-data boundaries.
  • Designed and implemented a scalable, high-performance, near-real-time search engine with Elasticsearch and an LLM, enabling natural-language retrieval across more than 100k records. Delivered a robust FastAPI-based API with top-k most frequent results, sub-second response times, and easy extensibility.
  • Developed a centralized RAG system using FastAPI, MySQL, Qdrant, and Nomic embeddings, applying Strategy/Factory patterns for a modular architecture, with Docker containerization and Kubernetes deployment, enabling scalable semantic search and easy addition of new models and parsers.
  • Architected a policy-driven prompt guardrailing framework for multi-agent systems, combining regex, heuristic, and LLM-based detectors with per-agent configuration to enforce input/output compliance.

Lead SDE

proshort.ai
Jul, 2022 - Apr, 2025 (2 yr 9 months)
  • Architected a central LLM interaction service supporting multiple language models with intelligent load balancing, ensuring 99.9% service availability.
  • Pioneered a specialized prompt repository enabling dynamic prompt selection across services, improving code maintenance efficiency by 4x.
  • Engineered a shorts-creation and video-summarizing module, accelerating production time by 5x while achieving an industry-leading 60% publishable rate.
  • Co-developed a text-to-video engine generating multilingual videos at scale from raw text, serving 100k+ unique users and enterprise clients.
  • Built the proshorts web and mobile application from the ground up, reaching 10k+ DAU, with Firebase-powered authentication.
  • Orchestrated an end-to-end Payment Module using Stripe, managing complex subscription lifecycles and transaction flows involving 10k+ users and hundreds of transactions per week.
  • Unified the Payment Module across diverse platforms (proshort+, L&D platform, video consumption platform) through centralized user identity management.
  • Delivered a data-intensive CRM integration module syncing millions of records (deals, contacts, companies, emails) via an event-driven architecture, using schedulers, webhooks, queues, Pub/Sub, and strategic caching.
  • Implemented CDC-based real-time search updates using Debezium (Postgres WAL), Pub/Sub, and Elasticsearch, enabling near-instant search without impacting primary DB performance.

Senior Software Developer

NSEIT
Apr, 2020 - Jul, 2022 (2 yr 3 months)
  • Built AI proctoring for remote examinations with OpenCV and deep learning.
  • Worked on anomaly detection in NSE's order-trade financial data using DBSCAN and Greenplum.
  • Built text extraction for eKYC on PDFs and images, succeeding on over 75% of documents, with Axis Bank as the client.
  • Led development of two Django websites for naviinsurance.com as Team Lead, consulting directly with the client; exceeded expectations in both yearly appraisals.

Software Development Engineer

eClerx
Aug, 2018 - Apr, 2020 (1 yr 8 months)
  • Built sales forecasting and manpower-requirement prediction models with around 80% accuracy using RNN/ARIMA.
  • Developed and enhanced flow-based chatbots on Rasa and ChatterBot using BERT, deployed with Flask.

Major Projects

2 Projects

gleanr (PyPI)

    Session-scoped agent memory SDK with a three-tier hierarchy (turns, episodes, facts), deduplication, consolidation with supersede pointers, and provider-agnostic storage (SQLite, pgvector, Chroma); released across versions v0.2–v0.5.

NLP-powered wine ratings and recommendations for columbiawineco.com

    Built NLP-powered wine ratings, food recommendations, and review summarization using Python and NLTK, increasing engagement by 40%+ and CTR by 25%.

Education

  • M.Tech., Computer Engineering

    NIT, Kurukshetra (2018)
  • B.Tech., Computer Science and Engineering

    BCET, Durgapur (2014)