Table of Contents
Quick Answer
- Parameter: any learnable number in the model (weights + biases)
- Weight: the multiplicative coefficient in a layer (the most common parameter)
"70B parameters" counts every learnable value; weights dominate that count.
What Do These Terms Mean?
A neural network is a giant function with millions-to-trillions of adjustable numbers. Each one is a parameter. Most parameters are weights — multipliers on inputs. A smaller set are biases — additive shifts. Both are learned during training (Google AI Glossary; Stanford CS231n).
How They Differ in Math
For a single neuron:
output = activation(w1*x1 + w2*x2 + ... + wn*xn + b)
w1 ... wnare weightsbis a bias- All are parameters
In a 175B-parameter model, ~98% are weights, ~1-2% are biases, and a tiny fraction are layernorm scales and other learned scalars.
Examples
- Llama 3 70B: 70 billion parameters (overwhelmingly weights)
- GPT-3 175B: 175 billion parameters
- Tiny model: a single-layer perceptron with 10 weights + 1 bias = 11 parameters
- Embedding layer: one weight vector per token — 50,000 vocab * 4096 dim = 200M parameters
- Attention head: query, key, value, output matrices — millions of weights per head
Weights vs Parameters vs Hyperparameters
| Term | Learned? | Examples |
|---|---|---|
| Weight | Yes | Connection strengths |
| Bias | Yes | Per-neuron offsets |
| Parameter | Yes | Weights + biases + other learned scalars |
| Hyperparameter | No (set before training) | Learning rate, batch size, number of layers |
The big distinction: parameters change during training; hyperparameters do not.
When the Distinction Matters
- Model size marketing: "7B parameters" is the industry convention
- Memory math: a 7B model in fp16 = 7B * 2 bytes = 14 GB
- Fine-tuning: updating 100% of parameters = full FT; updating <1% = LoRA
- Safety: some research distinguishes weight-based vs activation-based interventions
Conclusion
Weights are the dominant type of parameter; in most sentences the two words are interchangeable. Distinguish parameters from hyperparameters to avoid confusion. More on Misar Blog.
