Quick Answer
For most use cases, RAG (retrieval-augmented generation) beats fine-tuning. But when you need the model to match a specific style, output format, or domain language, fine-tune an open model (Llama 3.1 8B, Mistral, Qwen 2.5) with LoRA using Unsloth, Together.ai, or Modal. Budget: $5-50 for a single run.
- Dataset size: 500 examples minimum; 2,000-10,000 for solid results
- Cost per run: $5-50 (LoRA) or $200+ (full)
- Time: 2-12 hours
What You'll Need
- 500+ high-quality input/output pairs (JSONL)
- GPU access (Colab free, Modal, RunPod, or Together)
- Python + PyTorch basics (AI assistants can fill the gaps)
- Evaluation set (100+ held-out examples)
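The JSONL pair format called for above is easy to sanity-check before training. A minimal sketch, assuming the chat-messages schema shown in the Steps below; the validator is illustrative, not any particular trainer's official check, so confirm your tool's exact schema in its docs.

```python
import json

def validate_jsonl_record(line: str) -> bool:
    """Check one JSONL line against the chat-pair schema:
    at least one user turn and a final assistant turn,
    all with non-empty content (an assumed convention)."""
    record = json.loads(line)
    messages = record.get("messages", [])
    roles = [m.get("role") for m in messages]
    return (
        len(messages) >= 2
        and roles[0] == "user"
        and roles[-1] == "assistant"
        and all(m.get("content") for m in messages)
    )

example = json.dumps({
    "messages": [
        {"role": "user", "content": "Summarize: LoRA adds low-rank adapters."},
        {"role": "assistant", "content": "LoRA trains small adapter matrices."},
    ]
})
print(validate_jsonl_record(example))  # True
```

Running a check like this over every line of your training file catches the mixed-format mistake listed further down before you pay for a GPU hour.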
Steps
- Decide: RAG or fine-tune? If knowledge changes often → RAG. If style/format/tone matters → fine-tune. If both → hybrid.
- Build dataset. Format as JSONL with {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. Quality beats quantity: 500 great examples beat 5,000 okay ones.
- Pick base model. Llama 3.1 8B for general use, Qwen 2.5 7B for multilingual, Phi-3 for tiny/edge. Ask AI: "Which open model is best for my task: [describe]?"
- Fine-tune with Unsloth (easiest & fastest). Notebook template handles LoRA config. Set rank 16-32, alpha 16-32, learning rate 2e-4, epochs 1-3.
- Run training. On Colab free T4: ~2-4 hours for 1K examples, Llama 3.1 8B. On Modal A100: 30 min, costs ~$2.
- Evaluate. Compare fine-tuned vs. base on the held-out set using a rubric: correctness, format match, style. If the fine-tuned model loses across the board, suspect a dataset problem.
- Deploy. Merge the LoRA adapter into the base model, then either convert to GGUF with llama.cpp and serve via Ollama, or serve the merged weights directly with vLLM on a VPS.
- Iterate. Log production failures, add them to training set, re-tune monthly.
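The evaluation step above can be sketched with plain Python. This is a minimal illustration, not a full harness: the format check below (does the output parse as JSON?) is a placeholder you would swap for whatever your task's rubric actually requires, and the model outputs are hypothetical.

```python
import json

def format_match(output: str) -> bool:
    """Placeholder rubric check: does the output parse as JSON?
    Replace with your task's real format requirement."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def score_model(outputs: list) -> float:
    """Fraction of held-out outputs passing the format check."""
    return sum(format_match(o) for o in outputs) / len(outputs)

# Hypothetical outputs from the base vs. fine-tuned model on the same prompts.
base_outputs = ['{"a": 1}', "not json", "also not json", '{"b": 2}']
tuned_outputs = ['{"a": 1}', '{"b": 2}', '{"c": 3}', "oops"]

print(score_model(base_outputs))   # 0.5
print(score_model(tuned_outputs))  # 0.75
```

Scoring correctness and style the same way (one function per rubric category) gives you the per-category comparison the evaluate step asks for.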
Common Mistakes
- Tiny dataset: <200 examples won't meaningfully shift the model; you'll overfit instead of generalizing.
- Mixed formats: an inconsistent JSONL structure across examples confuses training. Use one schema throughout.
- No eval set: You can't claim improvement without measuring.
- Tuning for knowledge: Models forget. Use RAG for facts.
- Over-tuning: >3 epochs on small data = catastrophic forgetting.
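The "no eval set" mistake above is cheap to avoid: carve off a held-out split before training ever starts. A minimal sketch; the 100-example size follows the What You'll Need list, and the fixed seed is just a convention for reproducibility.

```python
import random

def split_dataset(records: list, eval_size: int = 100, seed: int = 42):
    """Shuffle and carve off a held-out eval set before training.
    These examples must never appear in the training file."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[eval_size:], shuffled[:eval_size]

records = [{"id": i} for i in range(600)]
train, eval_set = split_dataset(records)
print(len(train), len(eval_set))  # 500 100
```

Split once, save both files, and reuse the same eval set across tuning runs so the scores stay comparable.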
Top Tools
| Tool | Best For | Price |
| --- | --- | --- |
| Unsloth | Fast LoRA tuning | Free |
| Together.ai | Hosted fine-tuning | $0.80/M tokens |
| Modal | Serverless GPU | Pay per second |
| Ollama | Local inference | Free |
| vLLM | Fast serving | Free |
FAQs
Q: How many examples do I need?
500 minimum for visible effect; 2-5K for solid results; 10K+ for hard domains.
Q: LoRA vs full fine-tuning?
LoRA for 95% of use cases. Full for frontier research or when LoRA caps out.
Q: Will my data leak?
Use local (Ollama, vLLM) or self-hosted inference. Avoid hosted if data is sensitive.
Q: Can I fine-tune closed models like GPT-4?
OpenAI offers fine-tuning for its models, but it's banned under our AI policy; use open models instead.
Q: How much VRAM needed?
QLoRA on 8B model: 16GB. LoRA on 8B: 24GB. Full 8B: 60GB+.
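Those VRAM figures come from a back-of-envelope calculation you can reproduce. The overhead multipliers below are rough assumptions covering optimizer state, gradients, and activations, not measured constants; real usage varies with sequence length and batch size.

```python
def vram_gb(params_b: float, bytes_per_weight: float, overhead: float) -> float:
    """Rough VRAM estimate: weight memory times an assumed overhead
    factor for optimizer state, gradients, and activations."""
    weight_gb = params_b * bytes_per_weight  # 1B params * 1 byte = 1 GB
    return weight_gb * overhead

# 8B-parameter model under three regimes (overhead factors assumed):
print(round(vram_gb(8, 0.5, 4.0)))  # QLoRA, 4-bit weights: ~16 GB
print(round(vram_gb(8, 2.0, 1.5)))  # LoRA, 16-bit weights: ~24 GB
print(round(vram_gb(8, 2.0, 4.0)))  # full fine-tune:       ~64 GB
```

The pattern to remember: quantizing weights shrinks the base cost, while training more parameters (full fine-tuning) multiplies the overhead.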
Q: Can I fine-tune image models?
Yes. Stable Diffusion LoRAs follow a similar process with different tooling.
Conclusion
Fine-tuning is powerful but overused. Always try RAG first. When you do tune, invest 80% of your effort in dataset quality; model choice is secondary. A small, clean dataset beats a sloppy big one every time.