Quick Answer
- Parameter: learned during training (weights, biases)
- Hyperparameter: set before training, controls the learning process (learning rate, batch size)
Parameters are updated automatically by the optimizer; hyperparameters are chosen by a person or by an automated search.
What Do These Terms Mean?
Parameters are the internal values a model adjusts to fit the data. Hyperparameters govern how that adjustment happens, or define the model's architecture itself (Stanford CS231n; Google AI Glossary, 2024).
If parameters are the words the model writes, hyperparameters are the rules of grammar the author sets first.
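As a minimal sketch of the distinction, the toy example below fits a line with plain gradient descent. The data, the function name `train`, and the values of `LEARNING_RATE` and `NUM_STEPS` are illustrative assumptions; the point is only that `w` and `b` are learned while the two constants are picked beforehand.

```python
# Hyperparameters: chosen by us before training starts (illustrative values).
LEARNING_RATE = 0.05
NUM_STEPS = 2000


def train(xs, ys, learning_rate=LEARNING_RATE, num_steps=NUM_STEPS):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    # Parameters: start at arbitrary values and are adjusted by training.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(num_steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # The update rule uses the hyperparameter but never changes it.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = train(xs, ys)
print(w, b)
```

On this data the learned `w` and `b` should land very close to 2 and 1, while the learning rate and step count stay exactly as they were set.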
How They Differ
Parameters
- Usually initialized randomly (or loaded from a pretrained checkpoint)
- Updated by the optimizer at every training step
- Number in the millions to trillions for modern deep networks
- Fixed once training ends; changing them means further training, such as fine-tuning
Hyperparameters
- Chosen before training
- Fixed during a training run (usually)
- Typically number in the dozens
- Changed by re-running training with new values, chosen by hand or by hyperparameter optimization (HPO) tools
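The "usually" above covers cases such as learning-rate schedules, where the effective learning rate changes during a run but is still fully determined by values chosen up front. A minimal sketch, assuming linear warmup to a constant peak; the function name `lr_at_step` and the default of 1000 warmup steps are hypothetical choices for illustration:

```python
def lr_at_step(step, peak_lr=3e-4, warmup_steps=1000):
    """Linear warmup to peak_lr, then constant.

    peak_lr and warmup_steps are hyperparameters: they are set before
    training and never updated by the optimizer.
    """
    if step < warmup_steps:
        # Ramp up linearly so that the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


for step in (0, 499, 999, 5000):
    print(step, lr_at_step(step))
```

The learning rate varies from step to step, but nothing about it is learned from the data.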
Examples
Parameters
- Layer weights
- Biases
- Embedding table entries
- Layer norm scales
Hyperparameters
- Learning rate (e.g., 3e-4)
- Batch size (e.g., 256)
- Number of layers (e.g., 32)
- Hidden dimension (e.g., 4096)
- Dropout rate (e.g., 0.1)
- Warmup steps
- Weight decay
- Optimizer choice (AdamW vs Lion)
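Architecture hyperparameters such as layer count and hidden dimension determine how many parameters exist in the first place. The sketch below counts weights and biases for a stack of fully connected layers; the function name `count_parameters` and the layer sizes are illustrative assumptions, not taken from any particular model.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes is a hyperparameter: [784, 256, 10] means 784 inputs,
    one hidden layer of 256 units, and 10 outputs.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix entries
        total += fan_out           # one bias per output unit
    return total


small = count_parameters([784, 256, 10])
wide = count_parameters([784, 512, 10])
print(small, wide)
```

Doubling the hidden width here roughly doubles the parameter count, even though only a single hyperparameter changed.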
Hyperparameter vs Parameter
| Aspect | Parameter | Hyperparameter |
|---|---|---|
| Set by | Training | Human / search |
| Count | Millions to trillions | Dozens |
| Updated during training | Yes | No (usually) |
| Stored in model file | Yes | As config or metadata, if at all |
| Tuning method | Gradient descent | HPO (grid, random, or Bayesian search; tools such as Optuna) |
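The two tuning methods in the last row can be shown side by side. In this minimal sketch, gradient descent tunes the parameters inside `final_loss`, while an outer random search tunes the learning rate. The toy data, the function names, the search range of 1e-4 to 1e-1, and the trial count are all assumptions made for illustration; real HPO tools add features such as pruning and smarter sampling.

```python
import random


def final_loss(learning_rate, num_steps=200):
    """Train y = w * x + b on toy data; return the final mean squared error."""
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]
    w, b = 0.0, 0.0  # parameters, tuned by gradient descent below
    n = len(xs)
    for _ in range(num_steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def random_search(num_trials=20, seed=0):
    """Try random learning rates and keep the one with the lowest loss."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        # Sample log-uniformly so small and large rates are equally covered.
        lr = 10 ** rng.uniform(-4, -1)
        trials.append((final_loss(lr), lr))
    best_loss, best_lr = min(trials)
    return best_lr, best_loss, trials


best_lr, best_loss, trials = random_search()
print(best_lr, best_loss)
```

Each trial is a full training run, which is why hyperparameter search is far more expensive per step than a gradient update.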
When Hyperparameters Matter Most
- Pre-training: a wrong learning rate or batch size can waste months and millions of dollars
- Fine-tuning: poor hyperparameters cause overfitting or catastrophic forgetting
- Reproducing papers: matching hyperparameters matters as much as the architecture
- Inference: sampling settings such as temperature and top_p are also often called hyperparameters, even though they are set after training
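For the inference-time knobs in the last bullet, here is a rough sketch of what temperature and top_p do to a model's output distribution. The function names and the example logits are hypothetical, and production samplers differ in details such as tie-breaking and batching.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.1]  # made-up scores for three tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
nucleus = top_p_filter(softmax_with_temperature(logits), top_p=0.8)
print(sharp, flat, nucleus)
```

Neither setting touches the model's weights; both only reshape the distribution that the trained parameters produce.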
Conclusion
Parameters are what the model learned; hyperparameters are how it learned. Knowing which one you are adjusting can save months of frustration.
