Abhijit Ramesh

MS CS at UCSC, working on the WebGPU backend for ggml-org/llama.cpp. Before grad school I spent five years shipping medical AI, which is what convinced me the bottleneck sits a layer below the model, in the systems that serve it.

Looking for Summer 2026 internships in ML performance / GPU systems: arames12@ucsc.edu.

I led ML at Theta Tech AI and TexNano: segmentation models for endoscopic ultrasound, training systems for an FDA-pathway cardiac device, and the production cloud inference services clinicians used at the point of care. Those models shipped and are in clinical use. Running that infrastructure taught me how much money and engineering it takes to serve modern models optimally, and how much of that pressure points toward the edge. Hospitals would rather not send patient data out at all, and the hardware they already own is often capable enough, provided someone writes device-specific kernels for it. That gap is where I wanted to work.

So I joined Tyler Sorensen’s Concurrency and Heterogeneous Programming Lab at UCSC, working with Reese Levine on the WebGPU backend for llama.cpp. Day to day: writing WGSL kernels, working within the dispatch and memory limits that vary across browsers, and maintaining the cross-browser benchmark I built to validate it all.
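
To make "dispatch and memory limits" concrete: every WebGPU implementation reports its limits through the standard adapter API, and the numbers can differ across browsers and GPUs. Here is a minimal TypeScript sketch, my illustration rather than the lab's benchmark (the function name `reportComputeLimits` is made up; the API calls are standard WebGPU):

```ts
// Minimal sketch: print the WebGPU adapter limits that most directly
// constrain compute kernels. Uses only the standard WebGPU API; in a
// TypeScript project the types come from @webgpu/types.
async function reportComputeLimits(): Promise<void> {
  // navigator.gpu is undefined in browsers without WebGPU support.
  const adapter = await navigator.gpu?.requestAdapter();
  if (!adapter) {
    console.log("WebGPU is not available in this browser.");
    return;
  }
  const limits = adapter.limits;
  // These values vary across browsers and GPUs, which is what makes
  // cross-browser kernel work (and benchmarking) necessary.
  console.table({
    maxComputeWorkgroupSizeX: limits.maxComputeWorkgroupSizeX,
    maxComputeInvocationsPerWorkgroup: limits.maxComputeInvocationsPerWorkgroup,
    maxComputeWorkgroupStorageSize: limits.maxComputeWorkgroupStorageSize,
    maxComputeWorkgroupsPerDimension: limits.maxComputeWorkgroupsPerDimension,
    maxStorageBufferBindingSize: limits.maxStorageBufferBindingSize,
    maxBufferSize: limits.maxBufferSize,
  });
}

reportComputeLimits();
```

A tile size or workgroup shape that is legal on one implementation can exceed these caps on another, which is why kernels end up parameterized per device rather than hard-coded.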