LoRA vs QLoRA
When you submit a tune, Yachay automatically picks LoRA or QLoRA based on model size. Override the choice manually only if you have a specific reason.
Yachay defaults
| Model size | Default | Typical GPU | Why |
|---|---|---|---|
| ≤ 4B params | LoRA | L4 or A10G (Spot) | Fits comfortably in fp16 on a single mid-tier GPU; no quantisation needed. |
| 4B – 16B | LoRA | A100 40 GB (Spot) | Single A100 holds the model in fp16; LoRA gives the best quality/speed trade-off. |
| 16B – 24B | LoRA | A100 80 GB (Spot) | fp16 weights spill past 40 GB, so the job moves to an 80 GB card. Still LoRA, because the quality difference vs QLoRA is measurable at this size. |
| 24B – 75B | QLoRA | A100 80 GB (Spot) | Above ~24B, LoRA on an fp16 base needs multiple GPUs, which roughly doubles the price. QLoRA's 4-bit base fits on a single A100 80 GB with a ~1–2% quality cost. |
| > 75B (incl. MoE) | QLoRA | H100 80 GB (Spot) | Models like Llama 4 Scout need H100-class memory bandwidth to train cost-effectively. QLoRA keeps the per-job cost bounded. |
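The table above can be read as a simple lookup on parameter count. The sketch below is only an illustration of that reading, not Yachay's actual selection code: the names `pick_default` and `TuneDefault` are hypothetical, and it assumes each upper bound is inclusive, since the table does not say which row a model of exactly 16B or 24B falls into.

```python
from typing import NamedTuple


class TuneDefault(NamedTuple):
    method: str  # "LoRA" or "QLoRA"
    gpu: str     # typical GPU from the defaults table


# (upper bound in billions of parameters, method, typical GPU), ascending.
# Assumption: upper bounds are inclusive; the table leaves exact boundaries open.
_DEFAULTS = [
    (4.0, "LoRA", "L4 or A10G"),
    (16.0, "LoRA", "A100 40 GB"),
    (24.0, "LoRA", "A100 80 GB"),
    (75.0, "QLoRA", "A100 80 GB"),
]


def pick_default(params_billion: float) -> TuneDefault:
    """Return the table's default method and typical GPU for a model size."""
    if params_billion <= 0:
        raise ValueError("params_billion must be positive")
    for upper, method, gpu in _DEFAULTS:
        if params_billion <= upper:
            return TuneDefault(method, gpu)
    # Last row of the table: anything above 75B, including MoE models.
    return TuneDefault("QLoRA", "H100 80 GB")


for size in (3, 14, 20, 70, 109):
    print(size, pick_default(size))
```

Under these assumptions a 14B model maps to LoRA on an A100 40 GB and a 70B model maps to QLoRA on an A100 80 GB, matching the rows above.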
Trade-off matrix
| Dimension | LoRA | QLoRA |
|---|---|---|
| Quality | Higher. Trains adapter weights against an unquantised 16-bit base. | Slightly lower, usually within 1–2% on benchmarks. Hard to detect on most downstream tasks. |
| Memory | Heavy. 16-bit base weights must fit in GPU VRAM. | Light. 4-bit base plus adapter; fits 2–3× larger models on the same card. |
| Speed | Faster forward passes (no dequantisation overhead). | ~30–50% slower per step due to dequantisation; often still wins on cost because you use a smaller machine. |
| Cost | Higher for ≥32B models, which need more or bigger GPUs. | Materially cheaper for 32B+. About the same for ≤14B. |
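The memory trade-off above comes down to bytes per parameter: 2 bytes at 16-bit, 0.5 bytes at 4-bit. The sketch below is a back-of-the-envelope estimate only. The helper name `base_weight_gb` is hypothetical, and the figure covers base weights alone; adapter weights, optimizer state, activations, and quantisation metadata all add to it.

```python
def base_weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Memory for the base weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bytes per param = gigabytes
    return params_billion * bits_per_param / 8


examples = [
    (14, 16),  # 14B in fp16
    (24, 16),  # 24B in fp16
    (70, 16),  # 70B in fp16
    (70, 4),   # 70B with a 4-bit base
]
for params, bits in examples:
    print(f"{params}B at {bits}-bit: {base_weight_gb(params, bits):.0f} GB")
# 14B at 16-bit: 28 GB
# 24B at 16-bit: 48 GB
# 70B at 16-bit: 140 GB
# 70B at 4-bit: 35 GB
```

Weights alone shrink 4× going from 16-bit to 4-bit. The 2–3× figure quoted for QLoRA is smaller, presumably because the rest of the training memory does not shrink with the base.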
When to override
- Force LoRA on 70B+: if you’ve benchmarked QLoRA on this task and the small quality regression matters more than cost. Be ready for 2–3× the price.
- Force QLoRA on 14B: if you’re running many small tunes and want the cheapest possible per-job cost. Quality difference is rarely measurable at this size.
- Stick with the default: for everything else. Yachay’s defaults are tuned for the Pareto frontier of cost and accuracy on standard benchmarks.
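To put a number on an override before committing to it, you can compare per-job cost as hours times hourly rate. The sketch below is a rough model under stated assumptions: both methods run the same number of steps, QLoRA's per-step slowdown carries straight through to wall-clock time, and every rate and duration is a placeholder rather than Yachay pricing. The function names are hypothetical.

```python
def job_cost(hours: float, hourly_rate: float) -> float:
    """Cost of one job: wall-clock hours times the machine's hourly rate."""
    return hours * hourly_rate


def compare_costs(lora_hours: float, lora_rate: float,
                  qlora_rate: float, qlora_slowdown: float) -> tuple[float, float]:
    """Return (lora_cost, qlora_cost) for the same number of training steps.

    qlora_slowdown is the extra time per step as a fraction (0.5 = 50% slower).
    """
    lora_cost = job_cost(lora_hours, lora_rate)
    qlora_cost = job_cost(lora_hours * (1 + qlora_slowdown), qlora_rate)
    return lora_cost, qlora_cost


# Placeholder inputs: a 10-hour LoRA job on a multi-GPU machine billed at
# 4.0 units/hour vs QLoRA on a single GPU at 1.0 units/hour, 50% slower per step.
lora_cost, qlora_cost = compare_costs(10, 4.0, 1.0, 0.5)
print(f"LoRA: {lora_cost:.1f}, QLoRA: {qlora_cost:.1f}, "
      f"ratio: {lora_cost / qlora_cost:.2f}x")
```

With these placeholder inputs LoRA costs about 2.7× as much, even though QLoRA runs for 15 hours instead of 10. Substitute your own rates and measured step times before deciding.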