Probe 21: WebNN vs WebGPU — Interactive Chat

Multi-turn chat with the same model on different hardware. Toggle between Neural Engine (WebNN) and GPU (WebGPU).

Model

KV cache (context): 512 tokens · 96 MB

Context cap is model-limited.
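
As a rough illustration of how a figure like "512 tokens · 96 MB" can arise, here is a minimal sketch of the usual KV cache size estimate: two tensors (keys and values) per layer, per KV head, per head dimension, per token. The page does not name the model or its dimensions, so the `KvShape` values below are purely hypothetical, chosen only because they happen to reproduce the displayed number when MB is read as MiB and the cache is stored in fp16.

```typescript
// Hypothetical model shape; the page does not state the model or its dimensions.
interface KvShape {
  layers: number;          // number of transformer layers
  kvHeads: number;         // key/value heads per layer
  headDim: number;         // dimension of each head
  bytesPerElement: number; // 2 for fp16, 1 for int8
}

// Bytes needed to cache keys and values for `tokens` positions.
function kvCacheBytes(shape: KvShape, tokens: number): number {
  // Factor 2 accounts for storing both a key and a value vector per position.
  const perToken =
    2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerElement;
  return perToken * tokens;
}

// Assumed values for illustration only.
const assumedShape: KvShape = { layers: 24, kvHeads: 16, headDim: 128, bytesPerElement: 2 };
const mib = kvCacheBytes(assumedShape, 512) / (1024 * 1024);
console.log(`${mib} MiB`); // prints "96 MiB"
```

Under this estimate the cache grows linearly with the context length, so halving the context to 256 tokens would halve the memory as well.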

Max tokens (reply): 500 tokens

Reply cap is based on free context.
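
One plausible reading of this rule is that the reply length is clamped to whatever context remains after the system prompt, chat history, and current message are counted. The sketch below shows that budgeting under that assumption; the names `ContextUsage`, `freeContext`, and `replyCap` are hypothetical and not taken from the page's code.

```typescript
// Token counts already occupying the context window (hypothetical structure).
interface ContextUsage {
  system: number;  // system prompt tokens
  history: number; // tokens from earlier turns
  current: number; // tokens in the message being sent
}

// Tokens still available in a context window of `capacity` tokens.
function freeContext(capacity: number, usage: ContextUsage): number {
  const used = usage.system + usage.history + usage.current;
  return Math.max(0, capacity - used);
}

// Effective reply limit: the requested maximum, clamped to the free context.
function replyCap(requestedMax: number, capacity: number, usage: ContextUsage): number {
  return Math.min(requestedMax, freeContext(capacity, usage));
}

// With an empty 512-token context, a 500-token reply fits as requested.
console.log(replyCap(500, 512, { system: 0, history: 0, current: 0 })); // prints 500
// Once most of the context is used, the cap shrinks to what is left.
console.log(replyCap(500, 512, { system: 20, history: 400, current: 30 })); // prints 62
```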

🟢 WebNN (Neural Engine)

🟣 WebGPU (GPU)

int8 · q4 · GPU transpose ✓ · single pass ✓

Browser/API availability

Checking…
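
An availability check of this kind can be sketched as follows, assuming it probes the standard entry points `navigator.gpu` (WebGPU) and `navigator.ml` (WebNN). The `NavigatorLike` type and both function names are hypothetical; the navigator object is passed in as a parameter so the logic can also run outside a browser. How this page performs its check is not shown, so treat this as an illustration only.

```typescript
// Minimal structural view of the two entry points (hypothetical helper type).
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
  ml?: { createContext(): Promise<object> };
}

interface Availability {
  webgpu: boolean;
  webnn: boolean;
}

// Cheap synchronous check: are the API entry points exposed at all?
function hasEntryPoints(nav: NavigatorLike): Availability {
  return { webgpu: nav.gpu !== undefined, webnn: nav.ml !== undefined };
}

// Deeper check: can an adapter or context actually be obtained?
async function checkAvailability(nav: NavigatorLike): Promise<Availability> {
  let webgpu = false;
  let webnn = false;
  if (nav.gpu) {
    try {
      // requestAdapter() resolves to null when no suitable adapter exists.
      webgpu = (await nav.gpu.requestAdapter()) !== null;
    } catch {
      webgpu = false;
    }
  }
  if (nav.ml) {
    try {
      // createContext() rejects when no backend is usable.
      await nav.ml.createContext();
      webnn = true;
    } catch {
      webnn = false;
    }
  }
  return { webgpu, webnn };
}
```

In a browser, the call would look like `checkAvailability(navigator as unknown as NavigatorLike)`. An exposed entry point does not guarantee a working backend, which is why the sketch goes on to request an adapter and a context.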

Setup help

Pipeline diagram

Backend

—

Decode

—

Tokens

Prefill

—

Load a model and start chatting.

Context

0 / 256

System: 0

History: 0

Current: 0

Free: 256