Quick Answer
Fine-tune open-source models (Llama 3.3, Qwen 2.5, Mistral Small) using LoRA on 100-10,000 examples for domain-specific tasks. Train on a rented A100 for $2-20; deploy via vLLM on your own GPU.
- Fine-tune only when prompting + RAG isn't enough
- 500-5000 well-curated examples beat 50k noisy ones
- LoRA is 10x cheaper than full fine-tuning with 95% of the quality
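The cost gap in the last bullet comes from how few parameters LoRA actually trains. A back-of-the-envelope sketch (the layer count and hidden size are illustrative, loosely Llama-8B-shaped; real models use grouped-query attention, so the true number is somewhat lower):

```python
# Rough trainable-parameter count for LoRA on attention projections.
# Architecture numbers are illustrative, not exact for any specific model.
hidden = 4096          # model hidden size
n_layers = 32          # transformer layers
r = 32                 # LoRA rank (matches the sample config below)

# LoRA adds two low-rank matrices (d_in x r and r x d_out) per target matrix.
# Targeting the four attention projections q/k/v/o, approximated as square:
lora_params_per_layer = 4 * (hidden * r + r * hidden)
lora_total = n_layers * lora_params_per_layer

full_model = 8_000_000_000  # ~8B parameters
print(f"LoRA trains ~{lora_total:,} params "
      f"({100 * lora_total / full_model:.2f}% of the full model)")
```

Training well under 1% of the weights is why the GPU bill drops by an order of magnitude: optimizer state and gradients only exist for the adapter matrices.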
What You'll Need
- Hugging Face account
- GPU: rent from Runpod, Modal, or Lambda Labs ($1-3/hr for A100)
- Dataset: 500+ input/output pairs in JSONL
- Python environment with transformers, peft, trl
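Before renting a GPU, sanity-check the JSONL file: one malformed line can abort a paid training run halfway through. A minimal validator using only the standard library (the strict role set and the last-message-is-assistant rule are assumptions; adjust to your data):

```python
import json

VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(path: str) -> int:
    """Return the number of valid examples; raise on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on invalid JSON
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {lineno}: missing 'messages' array")
            for msg in messages:
                if msg.get("role") not in VALID_ROLES:
                    raise ValueError(f"line {lineno}: bad role {msg.get('role')!r}")
                if not isinstance(msg.get("content"), str):
                    raise ValueError(f"line {lineno}: content must be a string")
            # training examples should end with the model's answer
            if messages[-1]["role"] != "assistant":
                raise ValueError(f"line {lineno}: last message must be 'assistant'")
            count += 1
    return count
```

Run it once before uploading; most trainers will also choke on malformed records, but usually with a far less helpful error.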
Steps
- Prepare dataset. Format as JSONL, one conversation per line, with OpenAI-style messages arrays.
{"messages":[{"role":"user","content":"..."},{"role":"assistant","content":"..."}]}
- Choose base model. Qwen 2.5 7B or Llama 3.1 8B: a strong base that fits on one A100. (Llama 3.3 ships only as a 70B model.)
- Rent a GPU. Pick a Runpod template with Axolotl or Unsloth preinstalled.
- Configure training. Unsloth roughly doubles training speed on consumer GPUs. Sample config:
model_name: unsloth/Llama-3.1-8B-Instruct
lora_r: 32
learning_rate: 2e-4
num_train_epochs: 3
- Train. python train.py — monitor loss in Weights & Biases.
- Evaluate. Hold out 10% of data. Measure with task-specific metrics.
- Merge LoRA weights. model.merge_and_unload().
- Deploy with vLLM. vllm serve ./merged-model --port 8000 — OpenAI-compatible endpoint.
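Once step 8 is serving, any OpenAI-compatible client can hit the endpoint. A stdlib-only sketch (the model name, port, and prompt are assumptions; the official openai Python client works against the same URL):

```python
import json
import urllib.request

def chat_request(model: str, user_msg: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-style chat-completions payload for a vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }

def query_vllm(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST to vLLM's OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server from step 8 running, `query_vllm(chat_request("./merged-model", "Summarize LoRA in one line."))` returns the completion text.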
Common Mistakes
- Tiny, noisy dataset. Curate ruthlessly.
- Too many epochs. 2-3 is standard; more causes overfitting.
- Wrong chat template. Must match the base model's template exactly.
- No eval set. You have no idea if it improved without one.
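The chat-template mistake is easy to make because the raw strings look plausible either way. The safe path is `tokenizer.apply_chat_template(...)` from transformers, which reads the template shipped with the base model; the sketch below hand-renders Qwen-style ChatML purely to show what that template produces (for illustration only, never hand-roll this in training code):

```python
def render_chatml(messages: list[dict]) -> str:
    """Render messages in ChatML, the format Qwen 2.5 models are trained on.
    Llama models use a different template, so feeding this string to a Llama
    base model would silently degrade quality rather than error out.
    """
    out = []
    for msg in messages:
        out.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    return "\n".join(out) + "\n"

example = [
    {"role": "user", "content": "Ping"},
    {"role": "assistant", "content": "Pong"},
]
print(render_chatml(example))
```

Because the mismatch is silent, it belongs on the eval checklist: render one training example with the tokenizer's own template and eyeball the special tokens.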
Top Tools
| Tool | Purpose |
| --- | --- |
| Unsloth | Fast LoRA training |
| Axolotl | Configurable training framework |
| vLLM | Production inference |
| Runpod | Affordable GPU rental |
| Weights & Biases | Experiment tracking |
FAQs
Should I fine-tune or use RAG? RAG first. Fine-tune when you need style, format, or domain knowledge that RAG can't inject.
How expensive is it? A 7B LoRA on 5000 examples: $10-30 of GPU time.
Can I fine-tune GPT-4? OpenAI offers fine-tuning for some models — expensive and locked in.
Does it help with hallucinations? Not directly. RAG helps with hallucinations; fine-tuning helps with tone and format.
How do I version models? Push to Hugging Face with semantic versioning and a model card.
Can I fine-tune on customer data? Only with explicit consent and contractual rights. Check GDPR/DPDP.
Conclusion
Fine-tuning in 2026 is accessible to any developer with $20 and a weekend. Use Unsloth, LoRA, and vLLM — never train from scratch. Misar Dev includes a hosted fine-tuning workflow.