Live in the browser.
Two demos of work coming out of ggml-org/llama.cpp and the UCSC CHPL lab. Both run on your machine — nothing leaves the browser.
webgpu-bench
Cross-browser benchmark for the llama.cpp WebGPU backend. Measures inference throughput on your GPU. Tracks 10 models across 194 quantization variants. Runs in Chrome and Safari with WebGPU enabled.
Embedded Hugging Face Space. Submit results from your browser via OAuth.
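Before a WebGPU benchmark can run, the page has to confirm the API is actually usable. A minimal feature-detection sketch (a hypothetical helper, not code from webgpu-bench itself): `navigator.gpu` is the WebGPU entry point, and `requestAdapter()` resolves to null when no suitable GPU is available.

```javascript
// Hypothetical helper: returns true only if WebGPU is exposed AND an
// adapter (a usable GPU) can be obtained. Takes the navigator object as
// a parameter so it can be exercised outside a browser.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;             // API not exposed (browser or flag)
  const adapter = await nav.gpu.requestAdapter(); // resolves to null if no usable GPU
  return adapter !== null;
}
```

In Chrome, WebGPU is on by default on recent versions; in Safari it may need to be enabled under the Feature Flags settings, which is why a check like this is worth running before starting the benchmark.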
wllama
In-browser LLM inference via WebAssembly, originally by @ngxson. We maintain this fork at the UCSC CHPL lab to demo GGUF models running directly in the browser while we land changes upstream in ggml-org/llama.cpp.
Forked from ngxson/wllama. The embedded view runs single-threaded WASM, since an embedded iframe is typically not cross-origin isolated and so cannot use SharedArrayBuffer, which WASM threads require. Open it in a new tab for multi-threaded performance.
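The single-threaded fallback can be detected at runtime. A minimal sketch (a hypothetical helper, not wllama's actual loader): WASM threads depend on SharedArrayBuffer, which browsers only expose when the page is cross-origin isolated, i.e. served with the COOP/COEP headers.

```javascript
// Hypothetical helper: decide whether the multi-threaded WASM build can
// be loaded. Takes the global environment as a parameter so the check
// can be exercised outside a browser.
function canUseWasmThreads(env = globalThis) {
  // crossOriginIsolated is true only when COOP/COEP headers are set;
  // an embedded iframe on a non-isolated page reports false.
  return Boolean(env.crossOriginIsolated) &&
         typeof env.SharedArrayBuffer === "function";
}
```

A loader would branch on this to pick the single-threaded or multi-threaded WASM binary; opening the demo in its own tab lets the Space's headers apply, which is why the standalone view can run multi-threaded.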