Mufeez

Blog

Mar 11, 2026

Optimizing NVFP4 Grouped GEMM on Blackwell – Worklog

238 µs to 23.8 µs using tcgen05 MMA, TMA async loads, cluster multicast, etc.

ml sys, performance
19 min
Apr 1, 2025

Flushable SSTables in Pebble

~60% performance improvement when ingested SSTs overlap with the memtable

databases, performance
7 min
Oct 6, 2024

Enforcing Robust Type Safety with ESLint

Surfacing ~250 type inconsistencies in Linear's type definitions

dev-ex
5 min
Aug 13, 2024

Shipping Linear Drafts

Making drafts a first-class entity throughout Linear

product
10 min
Jul 15, 2024

Concurrent Manual Compactions

Segmenting key space and parallelizing execution, 30% perf improvement

databases, performance
7 min