
Himanshu Dahiya

Experienced Cloud Technical Lead with a proven record of building and optimizing cloud infrastructure for India's largest EV charging network. Skilled in leading teams, optimizing AWS infrastructure, and implementing robust monitoring and security measures. Proficient in developing scalable microservices and architecting efficient inter-service communication. Dedicated to driving success in cloud architecture and technology initiatives.

  • Role

    Site Reliability Engineer | MEAN, MEVN stack developer

  • Years of Experience

    6.8 years

Skillsets

  • Node.js
  • HAProxy
  • Helm
  • IAM
  • Java
  • Kafka
  • Kustomize
  • MongoDB
  • MySQL
  • New Relic
  • Go
  • PostgreSQL
  • Prometheus
  • Redis
  • S3
  • Sentry
  • Terraform
  • VPC
  • WireGuard
  • Bitbucket
  • Kubernetes
  • Docker
  • Grafana
  • TypeScript
  • Aerospike
  • Alertmanager
  • Ansible
  • ArgoCD
  • AWS
  • Python
  • Cilium
  • Cloudflare
  • CloudWatch
  • Druid
  • EC2
  • EKS
  • Elasticsearch
  • GitOps

Professional Summary

6.8 years
  • Jul 2024 - Present (1 yr 8 months)

    Site Reliability Engineer

    Cool.Co
  • Apr 2021 - Jun 2024 (3 yr 2 months)

    Technical Lead

    Bolt.Earth
  • Apr 2020 - Mar 2021 (11 months)

    Software Development Engineer II

    Bolt.Earth
  • Jun 2019 - Mar 2020 (9 months)

    Software Development Engineer I

    Bolt.Earth

Applications & Tools Known

  • EKS
  • EC2
  • Cloud9
  • VPC
  • S3
  • Lambda
  • Route 53
  • Firebase
  • Kubernetes
  • Docker
  • Helm
  • Cloudflare
  • Nginx
  • Redis
  • MongoDB
  • Elasticsearch
  • Rancher
  • ArgoCD
  • Prometheus
  • Grafana
  • Sentry
  • New Relic

Work History

6.8 years

Site Reliability Engineer

Cool.Co
Jul 2024 - Present (1 yr 8 months)
    • Optimized Linux kernel and OS-level parameters to improve resource utilization, increasing application performance by 25% under production traffic.
    • Deployed a K3s cluster across 50+ dedicated Leaseweb servers, improving compute utilization by 30%.
    • Built an observability stack (Prometheus, Grafana, Alertmanager) integrated with Slack, PagerDuty, and email; defined SLIs/SLOs for latency (p95 < 200 ms), availability (99.9%), and error budgets.
    • Reduced mean time to detection (MTTD) by 40% and mean time to recovery (MTTR) by 35% through alerting tied to SLIs/SLOs and standardized runbooks.
    • Designed a high-availability PostgreSQL cluster with ZFS-backed storage, streaming replication, and pg_auto_failover, achieving zero data loss and recovery in under 60 s during failover drills.
    • Tuned Kafka brokers to sustain 500 MB/s throughput, adjusting partitioning, replication factor, and JVM GC settings to keep produce and consume latencies under 30 ms.
    • Tuned the Apache Druid ingestion pipeline to handle 500K events/sec with consistent query latency under 300 ms.
    • Automated rolling updates for PostgreSQL, HAProxy, and K3s clusters with Ansible, reducing operator toil by 70%.

Technical Lead

Bolt.Earth
Apr 2021 - Jun 2024 (3 yr 2 months)
    • Designed a monitoring framework with Prometheus, Grafana, Rancher, and Sentry, reducing customer-facing issues by 30%.
    • Integrated alerting across Slack, Microsoft Teams, and email, enabling swift incident response and sustaining a 99.9% SLA for microservices.
    • Reinforced system security with Cloudflare (traffic-anomaly mitigation, rate limiting, and bot and DDoS protection), thwarting 9 DDoS incidents.
    • Optimized AWS infrastructure through database cold storage, EKS cluster sizing, and auto-scaling, reducing costs by 30%.
    • Led the migration from EC2 to EKS, enabling seamless deployment and management of MongoDB, Apache Kafka, Redis, and 30+ microservices, resulting in a 30% infrastructure improvement.
    • Engineered a microservice to manage 2 million TCP and IoT connections for the Bolt.Earth product line, streamlining operations and improving scalability.

Software Development Engineer II

Bolt.Earth
Apr 2020 - Mar 2021 (11 months)
    • Re-architected the platform from a monolith to microservices on the MERN stack, improving infrastructure scalability by 75%.
    • Integrated Redis caching into the authentication and authorization layers, achieving a 6x reduction in HTTPS request latency and improving system responsiveness.

Software Development Engineer I

Bolt.Earth
Jun 2019 - Mar 2020 (9 months)
    • Developed client Software as a Service (SaaS) solutions for electric vehicle (EV) inventory, sales, post-sales management, and EV fleet monitoring, deepening domain expertise.
    • Initiated and led development of the company's SaaS product portfolio from proof of concept to implementation.

Achievements

  • Built the entire cloud infrastructure from scratch for India's largest EV charging network
  • Developed a microservice capable of scaling to accommodate over a million TCP connections
  • Led initiatives to optimize AWS infrastructure resulting in a 30% cost reduction
  • Reduced average deployment time per service to minutes and rollback time to under a minute
  • Proactively identified and addressed security threats including DDoS attacks, bot-driven network bombardment, and client-side vulnerabilities such as credential leakage

Major Projects

2 Projects

Near Real-Time Ingestion Pipelines (Kafka and Druid)

    Implemented near real-time ingestion pipelines with Kafka and Druid, handling 300K events/sec with minimal downtime, and established clear SLIs and SLOs to track performance.

Automated Infrastructure Provisioning

    Automated infrastructure provisioning with Terraform and Ansible, achieving 25% faster deployments and fewer manual errors, which lowered rollback rates by 15%.

Education

  • B.Tech Computer Science

    Indian Institute of Technology, Ropar (2019)