Quick Answer
Fine-tune open-source models (Llama 3.3, Qwen 2.5, Mistral Small) using LoRA on 100-10,000 examples for domain-specific tasks. Train on a rented A100 for $2-20; deploy via vLLM on your own GPU.
- Fine-tune only when prompting + RAG isn't enough
- 500-5000 well-curated examples beat 50k noisy ones
- LoRA is roughly 10x cheaper than full fine-tuning and typically keeps about 95% of the quality
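To see where LoRA's savings come from, here is a back-of-the-envelope sketch. LoRA freezes a weight matrix of shape d x k and trains two low-rank factors that hold r(d + k) weights in total. The 4096 x 4096 shape is an assumption chosen for illustration, not a figure from a specific model; the rank of 32 matches `lora_r` in the sample config under Steps.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Weights in the two LoRA factors: B is d x r, A is r x k."""
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Weights updated when the whole d x k matrix is fine-tuned."""
    return d * k


# Assumed shape for illustration only; real models have many such matrices.
d = k = 4096
r = 32  # same rank as lora_r in the sample config

lora = lora_trainable_params(d, k, r)
full = full_params(d, k)
print(f"LoRA trains {lora:,} weights instead of {full:,} ({full // lora}x fewer)")
```

Fewer trainable weights means less optimizer state and smaller checkpoints. The actual cost ratio also depends on batch size, sequence length, and whether the frozen base is quantized, so treat this as intuition rather than a price estimate.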
What You'll Need
- Hugging Face account
- GPU: rent from Runpod, Modal, or Lambda Labs ($1-3/hr for A100)
- Dataset: 500+ input/output pairs in JSONL
- Python environment with `transformers`, `peft`, `trl`
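Before paying for GPU time, it can help to check the dataset locally. The sketch below validates JSONL lines against the `messages` format shown under Steps. The function names and the specific checks (known roles, non-empty content, assistant speaks last) are assumptions about what a reasonable validator might cover, not requirements of any particular trainer.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_record(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not an object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            problems.append(f"message {i} has unknown role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty content")
    # The model learns from assistant turns, so each example should end on one.
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


def validate_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems for every bad line in the file."""
    bad = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            problems = validate_record(line)
            if problems:
                bad[lineno] = problems
    return bad
```

Calling `validate_file("train.jsonl")` (a hypothetical filename) returns an empty dict when every line passes.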
Steps
- Prepare dataset. Format as JSONL with `messages` arrays (ChatML).
{"messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}
- Choose base model. Qwen 2.5 7B or Llama 3.1 8B: both are strong bases that fit on one A100.
- Rent a GPU. Use a Runpod template with `axolotl` or `unsloth` preinstalled.
- Configure training. `unsloth` advertises roughly 2x faster training, including on consumer GPUs. Sample config:
model_name: unsloth/Meta-Llama-3.1-8B-Instruct
lora_r: 32
learning_rate: 2e-4
num_train_epochs: 3
- Train. Run `python train.py` and monitor loss in Weights & Biases.
- Evaluate. Hold out 10% of data. Measure with task-specific metrics.
- Merge LoRA weights. `model.merge_and_unload()`.
- Deploy with vLLM. `vllm serve ./merged-model --port 8000` starts an OpenAI-compatible endpoint.
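The last two steps can be sketched as follows. This is a minimal example, assuming the adapter was saved in the standard PEFT format, the base model is loaded unquantized, and the server runs on localhost. The helper names `merge_adapter`, `build_chat_request`, and `ask` are hypothetical. The model name sent to the server is assumed to be the path passed to `vllm serve`.

```python
import json
import urllib.request


def merge_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Fold trained LoRA weights into the base model and save the result."""
    # Imported lazily so the request helpers below work without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # returns a plain transformers model
    merged.save_pretrained(output_dir)
    # The server also needs the tokenizer files next to the weights.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)


def build_chat_request(
    prompt: str,
    model: str = "./merged-model",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send one prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Run `merge_adapter(...)` once after training, start `vllm serve ./merged-model --port 8000`, then call `ask("...")` from any machine that can reach the port. Because the endpoint is OpenAI-compatible, an existing OpenAI client pointed at the same base URL should work as well.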
Common Mistakes
- Tiny, noisy dataset. Curate ruthlessly.
- Too many epochs. 2-3 is standard; more causes overfitting.
- Wrong chat template. Must match the base model's template exactly.
- No eval set. You have no idea if it improved without one.
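A held-out set takes only a few lines to set up. The sketch below makes a reproducible 90/10 split and scores predictions with exact match. Exact match is only one illustrative metric, chosen here as an assumption; pick whatever fits your task. The function names are hypothetical.

```python
import random


def split_holdout(records, eval_fraction=0.1, seed=42):
    """Shuffle with a fixed seed and return (train, eval) lists."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def exact_match(predictions, references):
    """Fraction of predictions equal to the reference, ignoring outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Score the base model and the fine-tuned model on the same eval split. Without the base-model number there is nothing to compare the fine-tuned result against.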
Top Tools
| Tool | Purpose |
|---|---|
| Unsloth | Fast LoRA training |
| Axolotl | Configurable training framework |
| vLLM | Production inference |
| Runpod | Affordable GPU rental |
| Weights & Biases | Experiment tracking |
Conclusion
Fine-tuning in 2026 is accessible to any developer with $20 and a weekend. Use Unsloth, LoRA, and vLLM — never train from scratch. Misar Dev includes a hosted fine-tuning workflow.