- Apr 19, 2026
Profiling vLLM on the GH200
What we learned benchmarking vLLM on the NVIDIA GH200.
Part 2 of 3 from my Master's thesis at ETH Zurich.
- Apr 11, 2026
The GPU Memory Wall in LLM Serving
Why GPU memory is the bottleneck, and what the GH200 changes.
Part 1 of 3 from my Master's thesis at ETH Zurich.
- Dec 26, 2025
Scaling LLM Training with DeepSpeed
Making the most of limited GPU memory with DeepSpeed.
- Oct 12, 2025
CUDA Programming - Optimizing GEMM
A work-in-progress attempt at writing a fast GEMM kernel.
- Sep 16, 2025
CUDA Programming - Fundamentals
The fundamentals of GPU programming, from first principles.
- Feb 10, 2025
Running DeepSeek R1 locally
Running inference for DeepSeek R1 with llama.cpp on a 4×A100 node.
- Jan 27, 2025
Biased Coin
A nice analytical solution I came up with for a quantitative trading interview question, and a valuable life lesson for me.
- Jan 26, 2025
Ensemble Methods
A deep dive into ensemble methods, from decision trees to XGBoost.