Quick Answer
- Parameter: learned during training (weights, biases)
- Hyperparameter: set before training, controls the learning process (learning rate, batch size)
Parameters update automatically during training; hyperparameters are tuned by humans or by automated search.
What Do These Terms Mean?
Parameters are the internal values a model adjusts to fit the data. Hyperparameters govern how that adjustment happens, or define the model's architecture itself (Stanford CS231n; Google AI Glossary, 2024).
If parameters are the words the model writes, hyperparameters are the rules of grammar the author sets first.
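To make the split concrete, here is a tiny plain-Python sketch (the data, learning rate, and epoch count are illustrative): the learning rate and epoch count are fixed up front, while the weight w is whatever gradient descent leaves behind.

```python
learning_rate = 0.1   # hyperparameter: chosen before training
epochs = 50           # hyperparameter: chosen before training

w = 0.0               # parameter: starts arbitrary, learned from the data
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # underlying rule: y = 2x

for _ in range(epochs):
    for x, y in data:
        grad = 2 * (w * x - y) * x     # d/dw of the squared error (w*x - y)^2
        w -= learning_rate * grad      # gradient descent updates w, never learning_rate

print(round(w, 3))  # converges toward 2.0
```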
How They Differ
Parameters
- Initialized randomly
- Updated by the optimizer at every training step
- Count in millions to trillions
- Cannot be changed after training without retraining
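The "millions to trillions" scale adds up fast. A quick sketch, assuming PyTorch (the layer size is illustrative): a single 4096-wide projection layer already holds about 16.8 million learned values.

```python
import torch.nn as nn

layer = nn.Linear(4096, 4096)  # one transformer-sized projection layer
num_params = sum(p.numel() for p in layer.parameters())
print(f"{num_params:,}")  # 4096*4096 weights + 4096 biases = 16,781,312
```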
Hyperparameters
- Chosen before training
- Fixed during a training run (usually)
- Dozens at most
- Can be changed by re-running training or via HPO tools
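Because hyperparameters stay frozen for the length of a run, changing one means launching a fresh run rather than editing a trained model. A minimal sketch, again assuming PyTorch (the model, data, and values are placeholders):

```python
import torch
import torch.nn as nn

def train_run(learning_rate: float, batch_size: int, steps: int = 100) -> nn.Module:
    model = nn.Linear(10, 1)                                    # parameters live here
    opt = torch.optim.SGD(model.parameters(), lr=learning_rate)
    for _ in range(steps):
        x, y = torch.randn(batch_size, 10), torch.randn(batch_size, 1)  # stand-in data
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                                              # parameters change here
    return model

# Changing a hyperparameter means starting a new run, not editing a trained model.
model_a = train_run(learning_rate=1e-2, batch_size=32)
model_b = train_run(learning_rate=1e-3, batch_size=256)
```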
Examples
Parameters
- Layer weights
- Biases
- Embedding table entries
- Layer norm scales
Hyperparameters
- Learning rate (e.g., 3e-4)
- Batch size (e.g., 256)
- Number of layers (e.g., 32)
- Hidden dimension (e.g., 4096)
- Dropout rate (e.g., 0.1)
- Warmup steps
- Weight decay
- Optimizer choice (AdamW vs Lion)
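In practice these choices usually live in one config object written down before training starts. A sketch using a plain Python dataclass, mirroring the examples above (the warmup and weight-decay values are illustrative defaults, not recommendations):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: hyperparameters should not change mid-run
class TrainConfig:
    learning_rate: float = 3e-4
    batch_size: int = 256
    num_layers: int = 32
    hidden_dim: int = 4096
    dropout: float = 0.1
    warmup_steps: int = 2000     # illustrative value
    weight_decay: float = 0.01   # illustrative value
    optimizer: str = "adamw"

config = TrainConfig()
print(config)
```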
Hyperparameter vs Parameter
| Aspect | Parameter | Hyperparameter |
| --- | --- | --- |
| Set by | Training | Human / search |
| Count | Millions to trillions | Dozens |
| Updated during training | Yes | No (usually) |
| Stored in model file | Yes | Metadata only |
| Tuning method | Gradient descent | HPO (grid, random, Bayesian, Optuna) |
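For the HPO row, here is a minimal sketch with Optuna (assuming it is installed); the objective below is a synthetic stand-in for training a model and returning its validation loss.

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])
    # Stand-in score: in practice, train with (lr, batch_size) and
    # return the validation loss instead.
    return (lr - 3e-4) ** 2 + abs(batch_size - 256) / 1000

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```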
When Hyperparameters Matter Most
- Pre-training: a wrong learning rate or batch size wastes months of compute and millions of dollars
- Fine-tuning: poor hyperparameters cause overfitting or catastrophic forgetting
- Reproducing papers: matching hyperparameters matters as much as the architecture
- Inference: some inference-time knobs (temperature, top_p) are also called hyperparameters
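Those inference-time knobs behave like hyperparameters too: set before generation, never learned. A small NumPy sketch of how temperature reshapes the token distribution and top_p truncates its tail (the logits and default values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(logits: np.ndarray, temperature: float = 0.8, top_p: float = 0.95) -> int:
    probs = np.exp(logits / temperature)          # temperature rescales the logits
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # most likely tokens first
    cumulative = np.cumsum(probs[order])
    nucleus = order[: np.searchsorted(cumulative, top_p) + 1]  # smallest set covering top_p
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

print(sample(np.array([2.0, 1.0, 0.2, -1.0])))    # illustrative logits for a 4-token vocab
```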
FAQs
Is temperature a hyperparameter? Yes — an inference-time hyperparameter.
Are hyperparameters learnable? Usually no. Some research explores "learned hyperparameters" (meta-learning).
What is HPO? Hyperparameter optimization — automated search over hyperparameter combos.
Which hyperparameter matters most? For LLMs: learning rate schedule, batch size, data mix.
Can I transfer hyperparameters across model sizes? Not directly — scaling laws give rough rules, but each size needs tuning.
Does architecture count as a hyperparameter? Yes, technically — number of layers, heads, dimension are hyperparameters.
How do I track hyperparameters? Tools like MLflow, Weights and Biases, or plain YAML configs.
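As a minimal example of that last point, a sketch assuming MLflow is installed (a plain YAML or JSON file works just as well):

```python
import mlflow

hyperparameters = {"learning_rate": 3e-4, "batch_size": 256, "dropout": 0.1}

with mlflow.start_run():
    mlflow.log_params(hyperparameters)            # recorded alongside the run
    # ... training goes here; log metrics as the run progresses, e.g.:
    mlflow.log_metric("val_loss", 0.42)           # placeholder value
```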
Conclusion
Parameters are the "what" the model learned; hyperparameters are "how" it learned. Tune the right one and save months of frustration. More on Misar Blog.