From scratch to 97%
Desktop recommended
This chapter contains interactive visualizations and code explorers that work best on a larger screen.
We have covered the individual parts: the neuron, the matrix, the layers, and the "blame game" of backpropagation. Now, we are going to put them all together and build something real.
You are about to build a neural network using WebGPU compute shaders, the same technology that powers modern AI training.
How this works
Below you will find a series of code cells, similar to a Jupyter notebook. Each cell builds on the previous one:
- Run cells in order: Click the Run button on each cell, starting from the first.
- Watch the output: Each cell shows what happened, from tensor values to computation results.
- Explore the shaders: Click the "GPU" tabs to see the WGSL shader code that runs on the graphics card (a minimal sketch of one such shader follows this list).
- Train the network: The final cell lets you train on 300 handwritten digits and watch the loss converge.
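To give a feel for what's behind those GPU tabs, here is a minimal sketch of a compute kernel: an illustrative elementwise-add shader, not the exact WGSL from the cells below, written as a JavaScript template literal and compiled into a pipeline. It assumes a WebGPU `device`, which the cells acquire for you.

```js
// Hypothetical example: an elementwise-add kernel in WGSL, launching
// one GPU thread per tensor element. The cells below use the same
// pattern for matrix multiplies, activations, and gradients.
const shaderCode = /* wgsl */ `
  @group(0) @binding(0) var<storage, read>       a   : array<f32>;
  @group(0) @binding(1) var<storage, read>       b   : array<f32>;
  @group(0) @binding(2) var<storage, read_write> out : array<f32>;

  @compute @workgroup_size(64)
  fn main(@builtin(global_invocation_id) id : vec3<u32>) {
    let i = id.x;
    if (i < arrayLength(&out)) {  // guard: the grid may overshoot the data
      out[i] = a[i] + b[i];
    }
  }
`;

// Compile the WGSL into a compute pipeline (assumes a WebGPU `device`):
const module = device.createShaderModule({ code: shaderCode });
const pipeline = device.createComputePipeline({
  layout: "auto",
  compute: { module, entryPoint: "main" },
});
```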
By the end, you will have trained a neural network to recognize handwritten digits.
Interactive code explorer
Browse the complete neural network implementation below. Use ⌘ + click to jump to function definitions, or ⌥ + click to find all references.
Use the Next button to step through the execution trace and see how data flows through the network during training.
```js
// ────────────────────────────────────────────────────────
// GPU TENSORS: Data lives in Video RAM (VRAM), not RAM
// ────────────────────────────────────────────────────────
//
// Why separate memory? GPUs are optimized for massive
// parallelism, not for talking to the CPU. The data pathway
// between CPU and GPU (PCIe bus) is relatively slow.
// So we want to:
//   1. Copy data to GPU once
//   2. Do ALL our compute work there
//   3. Only copy results back when absolutely necessary

// ────────────────────────────────────────────────────────
// CREATE A TENSOR FROM A JAVASCRIPT ARRAY
// This is where the CPU → GPU copy happens
// ────────────────────────────────────────────────────────
function createTensor(device, data, shape) {
  // How much GPU memory do we need?
  // Each number is a 32-bit float = 4 bytes
  const byteSize = data.length * 4;

  // ──────────────────────────────────────────────────────
  // ALLOCATE GPU MEMORY
  // ──────────────────────────────────────────────────────
  // This is like malloc() but on the GPU. We specify:
  //
  // size: How many bytes to allocate
  //
  // usage: What operations we'll perform on this buffer
  //   - STORAGE:  Can be read/written by compute shaders
  //   - COPY_SRC: Can copy data FROM this buffer (GPU→CPU)
  //   - COPY_DST: Can copy data TO this buffer (CPU→GPU)
  //
  // mappedAtCreation: Start with buffer "mapped" to CPU.
  // The driver sets up a memory region both can access.
  // While mapped, we can write to it like a regular array.
  // The driver handles the actual transfer when we unmap.
  const buffer = device.createBuffer({
    size: byteSize,
    usage: GPUBufferUsage.STORAGE |
           GPUBufferUsage.COPY_SRC |
           GPUBufferUsage.COPY_DST,
    mappedAtCreation: true,
  });

  // ──────────────────────────────────────────────────────
  // WRITE DATA TO GPU MEMORY
  // ──────────────────────────────────────────────────────
  // getMappedRange() returns an ArrayBuffer backed by GPU
  // memory. We wrap it in a Float32Array to write 32-bit
  // floats. This is a direct memory write - straight to VRAM!
  const gpuMemory = buffer.getMappedRange();
  const gpuArray = new Float32Array(gpuMemory);
  gpuArray.set(data);

  // ──────────────────────────────────────────────────────
  // UNMAP THE BUFFER
  // ──────────────────────────────────────────────────────
  // CRITICAL: We MUST unmap before the GPU can use this!
  // While mapped, the buffer is "owned" by the CPU.
  // Unmapping transfers ownership back to the GPU.
  // After this, gpuArray becomes invalid.
  buffer.unmap();

  return { device, buffer, shape };
}

// ────────────────────────────────────────────────────────
// READ TENSOR DATA BACK TO CPU
// This is the reverse: GPU → CPU copy
// ────────────────────────────────────────────────────────
async function readTensor(tensor) {
  const { device, buffer, shape } = tensor;
  const size = shape.reduce((a, b) => a * b, 1);

  // ──────────────────────────────────────────────────────
  // CREATE A STAGING BUFFER
  // ──────────────────────────────────────────────────────
  // We can't map the original buffer directly (it's busy).
  // Instead, we create a temporary "staging" buffer with
  // MAP_READ usage, copy our data into it, then map THAT.
  const stagingBuffer = device.createBuffer({
    size: size * 4,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });

  // Copy GPU → staging buffer
  const cmd = device.createCommandEncoder();
  cmd.copyBufferToBuffer(buffer, 0, stagingBuffer, 0, size * 4);
  device.queue.submit([cmd.finish()]);

  // Map and read
  await stagingBuffer.mapAsync(GPUMapMode.READ);
  const copyArray = new Float32Array(stagingBuffer.getMappedRange());
  const result = new Float32Array(copyArray);
  stagingBuffer.unmap();

  return result;
}

// ────────────────────────────────────────────────────────
// TEST: Create a 2×3 tensor
// ────────────────────────────────────────────────────────
const data = new Float32Array([1, 2, 3, 4, 5, 6]);
const tensor = createTensor(device, data, [2, 3]);

// The data is now on the GPU!
//
// IMPORTANT: The GPU buffer is just a flat array of numbers.
// The "shape" is metadata WE keep track of - it tells us
// how to interpret the flat data.
//
// shape [2, 3] means "read this as 2 rows, 3 columns":
//   Buffer: [1, 2, 3, 4, 5, 6]
//           └──row 0──┘└──row 1──┘
//   Row 0: [1, 2, 3]
//   Row 1: [4, 5, 6]
//
// To find element [row, col] in a matrix with C columns:
//   index = row * C + col
// Example: element [1, 2] = buffer[1 * 3 + 2] = buffer[5] = 6

console.log("Created tensor with shape:", tensor.shape);
```
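Putting the two functions together, here is a minimal round-trip sketch. The cell above assumes `device` already exists; in a standalone page you would acquire it yourself, as shown here (hypothetical usage, not part of the cell):

```js
// Minimal round trip: CPU → GPU → CPU. Assumes a browser with WebGPU
// support; `device` is acquired once and reused for every tensor.
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

const t = createTensor(device, new Float32Array([1, 2, 3, 4, 5, 6]), [2, 3]);
const back = await readTensor(t);   // GPU → CPU copy
console.log(Array.from(back));      // [1, 2, 3, 4, 5, 6]

// Flat-index arithmetic from above: element [row, col] = row * C + col
const C = t.shape[1];               // 3 columns
console.log(back[1 * C + 2]);       // 6  (element [1, 2])
```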
Neural network training
Step through gradient descent
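As a reference for what each step of the walkthrough performs, here is a minimal CPU-side sketch of one gradient-descent update. It is illustrative only; in the actual network, the same update runs as a GPU compute pass over all weights at once.

```js
// One gradient-descent step: nudge each weight against its gradient.
//   w ← w − η · ∂L/∂w
// Illustrative CPU version of the update the trainer runs in a shader.
function gradientStep(weights, grads, learningRate = 0.01) {
  for (let i = 0; i < weights.length; i++) {
    weights[i] -= learningRate * grads[i];
  }
}

// Example: a weight with a positive gradient gets pushed down,
// one with a negative gradient gets pushed up.
const w = [0.5, -0.2];
const g = [0.3, -0.1];
gradientStep(w, g, 0.1);
console.log(w); // ≈ [0.47, -0.19]
```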