Notes

Mostly about making things faster.

Apr 19, 2026

Profiling vLLM on the GH200

What we learned benchmarking vLLM on the NVIDIA GH200.
Part 2 of 3 from my Master's thesis at ETH Zurich.

Apr 11, 2026

The GPU Memory Wall in LLM Serving

Why GPU memory is the bottleneck, and what the GH200 changes.
Part 1 of 3 from my Master's thesis at ETH Zurich.

Dec 26, 2025

Scaling LLM Training with DeepSpeed

Making the most of limited GPU memory with DeepSpeed.

Oct 12, 2025

CUDA Programming - Optimizing GEMM

A work-in-progress attempt at writing a fast GEMM kernel.

Sep 16, 2025

CUDA Programming - Fundamentals

The fundamentals of GPU programming, from first principles.

Feb 10, 2025

Running DeepSeek R1 locally

Running inference for DeepSeek R1 with llama.cpp on a 4×A100 node.

Jan 27, 2025

Biased Coin

A nice analytical solution I came up with for a quantitative trading interview question, and a valuable life lesson along the way.

Jan 26, 2025

Ensemble Methods

A deep dive into ensemble methods, from decision trees to XGBoost.