Abdul Rehman Khairdi

Seasoned engineering leader with over 12 years of experience driving innovation, scaling high-performing teams, and delivering complex technical solutions.
  • Role

    Head of Site Reliability Engineering

  • Years of Experience

    18.42 years

Skillsets

  • Prometheus
  • Incident Response
  • JavaScript
  • Jenkins
  • Lambda
  • MySQL
  • OpenTelemetry
  • PHP
  • PostgreSQL
  • Grafana
  • RDS
  • SigNoz
  • SLI
  • SLO
  • Terraform
  • WAF
  • WAFv2
  • Workers
  • Datadog
  • Java
  • MongoDB
  • Python
  • Redis
  • Cloudflare
  • Kubernetes
  • Node.js
  • CloudWatch
  • AWS
  • Go
  • Bot Management
  • CDN
  • ClickHouse
  • CloudFront
  • DDoS mitigation
  • EC2
  • Elasticsearch

Professional Summary

18.42 Years
  • Apr, 2024 - Present (2 yr)

    Head of Site Reliability Engineering

    Sportskeeda
  • Oct, 2021 - Apr, 2024 (2 yr 6 months)

    Principal Engineer & Site Reliability Engineering Manager

    Sportskeeda
  • Jun, 2019 - Sep, 2021 (2 yr 3 months)

    Tech Lead

    Sportskeeda
  • Jan, 2015 - Dec, 2016 (1 yr 11 months)

    Senior Software Engineer

    Flatpi Tech
  • Nov, 2015 - Nov, 2016 (1 yr)

    Senior Software Engineer

  • Dec, 2016 - Apr, 2019 (2 yr 4 months)

    Senior Software Engineer

    YourDOST
  • Jun, 2012 - Nov, 2015 (3 yr 5 months)

    Tech Lead

    Eccentric
  • Jan, 2012 - Jan, 2015 (3 yr)

    Tech Lead

    Eccentric Engine

Work History

18.42 Years

Head of Site Reliability Engineering

Sportskeeda
Apr, 2024 - Present (2 yr)

Principal Engineer & Site Reliability Engineering Manager

Sportskeeda
Oct, 2021 - Apr, 2024 (2 yr 6 months)
    Platform Scale: Infrastructure supporting 40M+ pageviews/day, ~6.5K backend RPS, and 300K+ concurrent users across teams.

    Migration Platform & Tooling: Built a reusable migration platform enabling 6+ zero-downtime migrations (Redis, MongoDB, MySQL, Elasticsearch) across 4 teams. Created a shadow dual-write, gradual-cutover pattern with automated consistency checks and rollback capabilities.
    Results: Migration incident rate 40%→5%, zero data loss across all migrations. Redis Enterprise: 50% cost reduction ($50K/year), sub-10ms p99 latency. MongoDB 3.0→4.4 upgrade saved $30K/year.
    Technical: Dual-write synchronization, percentage-based cutover, SLO-based gates, real-time monitoring.

    CDN Platform & Edge Infrastructure: Migrated from CloudFront to Cloudflare and solved an A/B test caching inefficiency (100 cache variants→2-3). Built Cloudflare Workers to normalize A/B segments at the edge, later adopted by the frontend team for other use cases. Configured WAF rules and bot management to protect against DDoS attacks and malicious traffic.
    Results: 97% cache footprint reduction, 82%→97% hit rate, $40K/year savings.
    Cost optimization: Identified an AWS egress cost increase and optimized the cache strategy, reducing origin requests by 90%.

    Platform Caching Infrastructure: Designed multi-layer caching (CDN→Redis→App LRU→DB) adopted by 3 backend teams, with a 60% DB load reduction. Built a reusable golang-lru wrapper (10K items, 30s TTL) with monitoring dashboards and integration patterns.
    Performance: P99 latency 180ms→25ms during peak traffic, infrastructure costs down $30K/year.

    Developer Platform & Testing Infrastructure: Built a production-like testing platform by extending the A/B framework (cookie-based routing to isolated environments).
    Results: $55K/year savings vs traditional staging, 40% production bug reduction, parallel testing for 3 teams.

    Infrastructure Platform & Reliability:
    Observability: Migrated from Datadog to a self-hosted stack while maintaining distributed tracing.
    Compute: AWS Graviton migration (40 instances, 20% cost reduction, $4.7K/year) with compatibility playbooks.
    Database: MySQL read replicas with analytics isolation; MongoDB 14-node scaling based on actual load.
    SRE: Established an SLO framework (99.9%+ uptime), blameless postmortems, 30% MTTR reduction.

    Real-Time Platform (WebSocket Infrastructure): Built WebSocket infrastructure from scratch for the Cricrocket App, delivering fastest live cricket score updates. Designed a scalable architecture: connection management, message broadcasting, horizontal scaling across instances.
    Results: Supported 20K+ concurrent WebSocket connections in production, stress tested to 50K+ concurrent users.
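
The dual-write and percentage-based cutover pattern named under Migration Platform & Tooling can be illustrated with a minimal Go sketch. It is not the platform's actual code: the `Migrator` type, the in-memory maps standing in for the old and new datastores, and the FNV-based `bucket` function are all assumptions made for illustration. Writes go to both stores, reads are routed by a stable hash of the key, and rollback amounts to setting the percentage back to 0.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a key to a stable value in [0, 100), so a given key
// always lands on the same side of the cutover threshold.
func bucket(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() % 100
}

// Migrator is a hypothetical stand-in for a migration layer.
// The maps play the role of the legacy and target datastores.
type Migrator struct {
	legacy  map[string]string
	target  map[string]string
	percent uint32 // share of keys (0-100) read from the target store
}

func NewMigrator() *Migrator {
	return &Migrator{legacy: map[string]string{}, target: map[string]string{}}
}

// Set performs the dual write: both stores receive every update.
func (m *Migrator) Set(key, value string) {
	m.legacy[key] = value
	m.target[key] = value
}

// Get reads from the target store for keys below the cutover
// threshold and from the legacy store otherwise.
func (m *Migrator) Get(key string) (string, bool) {
	if bucket(key) < m.percent {
		v, ok := m.target[key]
		return v, ok
	}
	v, ok := m.legacy[key]
	return v, ok
}

// Mismatches counts legacy keys that are missing or different in the
// target store; a cutover gate could require this to be zero.
func (m *Migrator) Mismatches() int {
	n := 0
	for k, v := range m.legacy {
		if tv, ok := m.target[k]; !ok || tv != v {
			n++
		}
	}
	return n
}

func main() {
	m := NewMigrator()
	for i := 0; i < 1000; i++ {
		m.Set(fmt.Sprintf("user:%d", i), "profile")
	}
	for _, pct := range []uint32{0, 25, 100} {
		m.percent = pct
		fromTarget := 0
		for k := range m.legacy {
			if bucket(k) < m.percent {
				fromTarget++
			}
		}
		fmt.Printf("cutover=%d%%: %d of 1000 keys read from target\n", pct, fromTarget)
	}
	fmt.Println("mismatches:", m.Mismatches())
}
```

Hashing the key rather than picking randomly per request keeps each key on one store at a given percentage, which makes consistency checks and rollback easier to reason about.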

Tech Lead

Sportskeeda
Jun, 2019 - Sep, 2021 (2 yr 3 months)
    Platform Evolution: Scaled infrastructure from 5K to 9K RPS, supporting 200K+ concurrent users. Built distributed backend services (wallets, predictions, transactions) with strong consistency guarantees. Created production-replica environments, reducing production incidents by 85%. Implemented connection pooling platform patterns for MySQL, MongoDB, and Redis, eliminating connection overhead. Optimized queries and caching to reduce tail latency without over-provisioning.
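
As an illustration of the connection pooling idea above, here is a minimal Go sketch of an idle-connection pool. It is an assumption-laden stand-in, not the pattern actually deployed: `Pool`, `NewPool`, and the `conn` type are hypothetical names, and real MySQL, MongoDB, and Redis clients typically ship their own pools. The sketch reuses idle connections and dials only when none is available; it does not cap the total number of open connections.

```go
package main

import "fmt"

// Pool keeps up to size idle connections for reuse.
// T is whatever connection type the caller dials.
type Pool[T any] struct {
	idle chan T
	dial func() (T, error)
}

func NewPool[T any](size int, dial func() (T, error)) *Pool[T] {
	return &Pool[T]{idle: make(chan T, size), dial: dial}
}

// Get returns an idle connection if one exists, otherwise dials a new one.
func (p *Pool[T]) Get() (T, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return p.dial()
	}
}

// Put hands a connection back; if the pool is already full the
// connection is dropped (a real pool would also close it).
func (p *Pool[T]) Put(c T) {
	select {
	case p.idle <- c:
	default:
	}
}

// Idle reports how many connections are waiting for reuse.
func (p *Pool[T]) Idle() int { return len(p.idle) }

// conn is a hypothetical connection used only for the demo below.
type conn struct{ id int }

func main() {
	dials := 0
	p := NewPool(2, func() (*conn, error) {
		dials++
		return &conn{id: dials}, nil
	})
	for i := 0; i < 5; i++ {
		c, err := p.Get()
		if err != nil {
			fmt.Println("dial failed:", err)
			return
		}
		// A real caller would run a query on c here.
		p.Put(c)
	}
	fmt.Println("requests: 5, dials:", dials)
}
```

The saving comes from the sequential requests sharing one dialed connection instead of paying the handshake cost each time.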

Senior Software Engineer

YourDOST
Dec, 2016 - Apr, 2019 (2 yr 4 months)
    Designed event-driven backend systems for expert-matching workflows with high correctness requirements. Built resilient payment integrations with retry logic, circuit breakers, and failure-handling patterns.
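
The retry and circuit-breaker combination mentioned above can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the original integration code: `Breaker`, `Retry`, `ErrOpen`, and `errGateway` are hypothetical names, the breaker is not safe for concurrent use, and the thresholds are arbitrary. The breaker opens after a run of consecutive failures, rejects calls during a cooldown, then allows a single trial call; the retry helper backs off exponentially and stops immediately when the breaker is open.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	ErrOpen    = errors.New("circuit open")
	errGateway = errors.New("gateway timeout") // simulated downstream failure
)

// Breaker opens after maxFailures consecutive errors and rejects
// calls until cooldown has elapsed. Not safe for concurrent use.
type Breaker struct {
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

func (b *Breaker) Call(fn func() error) error {
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		return ErrOpen // fail fast without touching the downstream
	}
	// Either closed, or cooldown elapsed and this is the trial call.
	if err := fn(); err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

// Retry runs fn up to attempts times, doubling the wait after each
// failure. It gives up at once if the breaker reports ErrOpen.
func Retry(attempts int, base time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if errors.Is(err, ErrOpen) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(base << i)
		}
	}
	return err
}

func main() {
	b := &Breaker{maxFailures: 3, cooldown: time.Second}
	calls := 0
	charge := func() error { // hypothetical payment call: fails twice, then succeeds
		calls++
		if calls < 3 {
			return errGateway
		}
		return nil
	}
	err := Retry(5, 10*time.Millisecond, func() error { return b.Call(charge) })
	fmt.Println("calls:", calls, "err:", err)
}
```

For payment calls specifically, retries are generally only safe when the downstream request is idempotent, for example through an idempotency key.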

Senior Software Engineer

Nov, 2015 - Nov, 2016 (1 yr)

Senior Software Engineer

Flatpi Tech
Jan, 2015 - Dec, 2016 (1 yr 11 months)
    Backend API development and database optimization.

Tech Lead

Eccentric
Jun, 2012 - Nov, 2015 (3 yr 5 months)

Tech Lead

Eccentric Engine
Jan, 2012 - Jan, 2015 (3 yr)
    Led backend development team, built REST APIs and database systems.

Major Projects

2 Projects

High-Traffic Sports Media Platform

    Designed horizontally scalable services with aggressive caching and CDN offload to sustain 300K+ concurrent users. Explicitly balanced latency, consistency, and cost under bursty traffic patterns.
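
To make the in-process layer of such a caching setup concrete, here is a minimal read-through cache with a TTL in Go. It is a sketch under assumptions, not the project's implementation: `TTLCache` and `GetOrLoad` are hypothetical names, a plain map with expiry timestamps is used instead of a bounded LRU library, and the `load` callback stands in for the slower layer behind it (for example Redis or a database).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value   string
	expires time.Time
}

// TTLCache is an in-process read-through cache. It has no size
// bound or eviction; a production version would add both.
type TTLCache struct {
	mu   sync.Mutex
	ttl  time.Duration
	data map[string]entry
}

func NewTTLCache(ttl time.Duration) *TTLCache {
	return &TTLCache{ttl: ttl, data: make(map[string]entry)}
}

// GetOrLoad returns the cached value while it is fresh; otherwise it
// calls load (the slower layer) and stores the result for ttl.
// Concurrent misses on the same key may each call load.
func (c *TTLCache) GetOrLoad(key string, load func(string) (string, error)) (string, error) {
	c.mu.Lock()
	e, ok := c.data[key]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.value, nil
	}
	v, err := load(key)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.data[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return v, nil
}

func main() {
	cache := NewTTLCache(30 * time.Second)
	backendHits := 0
	fetchScore := func(matchID string) (string, error) { // hypothetical backend lookup
		backendHits++
		return "score for " + matchID, nil
	}
	for i := 0; i < 1000; i++ {
		if _, err := cache.GetOrLoad("match:42", fetchScore); err != nil {
			fmt.Println("load failed:", err)
			return
		}
	}
	fmt.Println("requests: 1000, backend hits:", backendHits)
}
```

A short TTL bounds staleness while still absorbing bursts of reads for the same hot key, which is the trade-off between latency, consistency, and cost that the project description refers to.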

Distributed Cache & Data Layer Redesign

    Re-architected data access using Redis Enterprise and replica clusters to address tail latency and failure propagation. Achieved ~20% latency reduction and improved operational predictability.

Education

  • B.Sc (IT)

    University of Mumbai (2012)