Live in the browser.
Two demos of work coming out of ggml-org/llama.cpp and the UCSC CHPL lab. Both run on your machine — nothing leaves the browser.
webgpu-bench
Cross-browser benchmark for the llama.cpp WebGPU backend. Measures inference throughput on your GPU. Tracks 10 models across 194 quantization variants. Runs in Chrome and Safari with WebGPU enabled.
Embedded Hugging Face Space. Submit results from your browser via OAuth.
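Before a WebGPU benchmark can run, the page has to confirm the API is actually usable. A minimal feature-detection sketch (a hypothetical helper, not code from webgpu-bench itself): `navigator.gpu` is the WebGPU entry point, and `requestAdapter()` resolves to null when no suitable GPU is available.

```javascript
// Hypothetical helper: returns true only if WebGPU is exposed AND an
// adapter (a usable GPU) can be obtained. Takes the navigator object as
// a parameter so it can be exercised outside a browser.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;             // API not exposed (browser or flag)
  const adapter = await nav.gpu.requestAdapter(); // resolves to null if no usable GPU
  return adapter !== null;
}
```

In Chrome, WebGPU is on by default on recent versions; in Safari it may need to be enabled under the Feature Flags settings, which is why a check like this is worth running before starting the benchmark.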
wllama
In-browser LLM inference via WebAssembly, originally by @ngxson. We maintain this fork at the UCSC CHPL lab to demo GGUF models running directly in the browser while we land changes upstream in ggml-org/llama.cpp.
Forked from ngxson/wllama. The embedded view runs single-threaded WASM, since an embedded iframe is typically not cross-origin isolated and so cannot use SharedArrayBuffer, which WASM threads require. Open it in a new tab for multi-threaded performance.
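The single-threaded fallback can be detected at runtime. A minimal sketch (a hypothetical helper, not wllama's actual loader): WASM threads depend on SharedArrayBuffer, which browsers only expose when the page is cross-origin isolated, i.e. served with the COOP/COEP headers.

```javascript
// Hypothetical helper: decide whether the multi-threaded WASM build can
// be loaded. Takes the global environment as a parameter so the check
// can be exercised outside a browser.
function canUseWasmThreads(env = globalThis) {
  // crossOriginIsolated is true only when COOP/COEP headers are set;
  // an embedded iframe on a non-isolated page reports false.
  return Boolean(env.crossOriginIsolated) &&
         typeof env.SharedArrayBuffer === "function";
}
```

A loader would branch on this to pick the single-threaded or multi-threaded WASM binary; opening the demo in its own tab lets the Space's headers apply, which is why the standalone view can run multi-threaded.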