Quick Answer
- Training: feeding data to update model weights (happens once, costs millions)
- Inference: running the trained model on new inputs (happens billions of times, costs pennies)
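Both usually run on GPUs, but in very different patterns: training is a long, throughput-bound job on a large cluster, while inference is a short, latency-sensitive request that can run on much smaller hardware.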
What Do These Terms Mean?
During training, gradients flow backward through the network and an optimizer uses them to adjust billions of parameters. During inference, the weights stay frozen: each forward pass turns the tokens so far into one next token, and no learning happens (Stanford HAI AI Index, 2024; NVIDIA developer docs).
How Each Works
Training
- Feed a batch of data (e.g., 1M tokens)
- Compute the loss between prediction and ground truth
- Backpropagate gradients
- Update weights with an optimizer (AdamW, Shampoo)
- Repeat, typically for hundreds of thousands to millions of steps (15T tokens at 1M tokens per batch is 15M steps)
A GPT-4-class training run reportedly uses around 25,000 GPUs for months and costs $100M+.
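The steps above can be sketched on a toy problem. This is a minimal illustration, not how a large model is actually trained: it assumes a one-feature linear model instead of a neural network, uses plain SGD instead of AdamW or Shampoo, and the name `train_linear` and all hyperparameters are made up for the example.

```python
import random


def train_linear(data, lr=0.05, steps=2000, batch_size=4, seed=0):
    """Toy training loop: fit y = w*x + b with mini-batch SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the "weights" that training updates
    for _ in range(steps):                        # 5. repeat
        batch = rng.sample(data, batch_size)      # 1. feed a batch of data
        grad_w = grad_b = 0.0
        for x, y in batch:
            pred = w * x + b                      # forward pass
            err = pred - y                        # 2. loss is mean(err ** 2)
            grad_w += 2 * err * x / batch_size    # 3. gradient of loss w.r.t. w
            grad_b += 2 * err / batch_size        #    gradient of loss w.r.t. b
        w -= lr * grad_w                          # 4. optimizer update (plain SGD)
        b -= lr * grad_b
    return w, b


# Noise-free samples of y = 3x + 1 on [-1, 1]
data = [(k / 10, 3 * (k / 10) + 1) for k in range(-10, 11)]
w, b = train_linear(data)
print(f"w={w:.3f}, b={b:.3f}")
```

On this noise-free data the loop recovers weights close to w = 3 and b = 1. Real training differs mainly in scale: the same loop runs over billions of parameters with the gradients computed by automatic differentiation.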
Inference
- Load pre-trained weights into GPU memory
- Receive user input tokens
- Forward pass through all layers
- Sample next token
- Repeat until stop token
Inference for one chat response: <1 second, $0.001-0.10.
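The inference steps can be sketched as a decoding loop. This is a toy under stated assumptions: `WEIGHTS` is a hypothetical hand-written lookup table standing in for pre-trained weights, `forward` looks only at the last token (a real model attends to the whole context), and sampling is greedy.

```python
# Hypothetical "pre-trained weights": next-token scores for each token.
WEIGHTS = {
    "<s>": {"training": 0.1, "inference": 0.9},
    "training": {"</s>": 1.0},
    "inference": {"uses": 1.0},
    "uses": {"the": 1.0},
    "the": {"model": 0.8, "</s>": 0.2},
    "model": {"</s>": 1.0},
}


def forward(tokens):
    """Forward pass: return next-token scores for the current context."""
    return WEIGHTS[tokens[-1]]  # toy model: depends on the last token only


def generate(prompt, max_new_tokens=10, stop="</s>"):
    """Autoregressive decoding: one forward pass per generated token."""
    tokens = list(prompt)                          # receive input tokens
    for _ in range(max_new_tokens):
        scores = forward(tokens)                   # forward pass
        next_token = max(scores, key=scores.get)   # greedy "sampling"
        if next_token == stop:                     # stop token ends generation
            break
        tokens.append(next_token)                  # feed the output back in
    return tokens


print(generate(["<s>"]))
# ['<s>', 'inference', 'uses', 'the', 'model']
```

Nothing in `generate` writes to `WEIGHTS`, which is the whole difference from training: the loop only reads the model.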
Examples
- Training: Meta trained Llama 3 on more than 15T tokens in a run lasting months
- Inference: ChatGPT serves 300M weekly users — trillions of inferences
- Fine-tuning: a small training run on 10K examples from your support data
- Edge inference: phone model summarizes a webpage offline
- Batch inference: overnight job classifies 10M documents
Training vs Inference Costs
| Aspect | Training | Inference |
|---|---|---|
| Frequency | Once (or periodic) | Every user request |
| Cost scale | Millions of dollars | Cents per call |
| Hardware | H100 / B200 clusters | Anything from phones to H100s |
| Duration | Weeks to months | Milliseconds to seconds |
| Memory pattern | Weights + gradients + optimizer states + activations | Weights + KV cache |
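The memory row can be made concrete with a rough estimate. The sketch below assumes mixed-precision training with an Adam-style optimizer (2 bytes of weights, 2 bytes of gradients, and 12 bytes of fp32 optimizer state per parameter) and ignores activations. The 7B-parameter configuration (32 layers, 32 KV heads, head dimension 128) is an illustrative assumption, and both function names are hypothetical.

```python
GB = 1e9  # decimal gigabytes


def training_memory_gb(n_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=12):
    """Weights + gradients + optimizer states; activations are NOT included."""
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / GB


def inference_memory_gb(n_params, n_layers, n_kv_heads, head_dim,
                        seq_len, batch=1, bytes_per_value=2):
    """Weights + KV cache (one key and one value vector per layer, head, token)."""
    weights = n_params * bytes_per_value
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_value
    return (weights + kv_cache) / GB


train_gb = training_memory_gb(7e9)
infer_gb = inference_memory_gb(7e9, n_layers=32, n_kv_heads=32,
                               head_dim=128, seq_len=4096)
print(f"training:  {train_gb:.1f} GB")   # training:  112.0 GB
print(f"inference: {infer_gb:.1f} GB")   # inference: 16.1 GB
```

Under these assumptions the same model needs roughly seven times more memory to train than to serve one 4,096-token request, before counting activations.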
At scale, cumulative inference cost can exceed the one-time training cost, because a per-request cost of cents is multiplied across billions of requests. ChatGPT reportedly spends more on inference than it did on training.
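The break-even point is simple arithmetic. The sketch below plugs in figures from this article ($100M to train, about $0.01 per response) and an assumed traffic level of 1 billion requests per day, which is a hypothetical number chosen only to show the calculation.

```python
def break_even_requests(training_cost_usd, cost_per_request_usd):
    """Number of requests at which total inference spend equals training spend."""
    return training_cost_usd / cost_per_request_usd


def days_to_break_even(training_cost_usd, cost_per_request_usd, requests_per_day):
    """Days of serving until cumulative inference cost matches training cost."""
    return break_even_requests(training_cost_usd, cost_per_request_usd) / requests_per_day


requests = break_even_requests(100e6, 0.01)
days = days_to_break_even(100e6, 0.01, requests_per_day=1e9)  # assumed traffic
print(f"{requests:.2e} requests, about {days:.0f} days")
# 1.00e+10 requests, about 10 days
```

With these inputs, inference spending passes the training bill after about 10 billion requests. At the low end of the per-response range ($0.001) the break-even point moves out to 100 billion requests.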
When Each Matters
- Builders of foundation models: training dominates
- App developers using APIs: only inference matters
- Enterprises fine-tuning: small training cost + ongoing inference
- Researchers: both
Conclusion
Training builds the brain; inference uses it. App builders rarely train; they focus on prompts, retrieval, and evaluation.
