
I build machine learning systems that scale, from robotics production floors to large-scale generative AI platforms.
My work sits at the intersection of modeling, infrastructure, and reliability, with a focus on systems that don’t just train models, but continuously improve them in real-world environments.
Currently at Suki:
- Re-architected training infrastructure to cut cycle times by 97% (2 hrs → 3.5 min) while reducing idle storage costs by 85%.
- Led GPT-4 → Gemini 2.5 Pro migration analysis, modeling token consumption and GSU capacity to ensure scalable throughput.
- Developed distributed RL training pipelines and custom reward functions for automated clinical-note evaluation.
Previously at Dexterity:
- Designed a production Prediction Monitoring system for instance segmentation models that improved mAP by 12% and reduced false negatives by 5–7% through targeted retraining workflows.
- Standardized debugging frameworks for 24/7 robotics fleets, reducing average remote recovery time to <15 minutes.
Technically, I work across distributed training (DeepSpeed, Accelerate, Unsloth), RL-based evaluation design, and cloud-native ML systems on GCP and AWS. I’m particularly interested in ML systems engineering, large-scale model training, and infrastructure that balances performance, cost, and reliability.
ML Ops Engineer
Volt AI
Robotics Engineer II
Dexterity Inc.
Robotics Engineer, Software
Dexterity Inc.
Robot Deployment Engineer (Intern)
Dexterity Inc.
Robot Systems Engineering Associate
Dexterity Inc.
ECS
Docker