Catalog

17 commercial-safe base models

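We’ve vetted each license. Models tuned from a Llama-licensed base must put “Llama” at the start of the model name; the other licenses here (Apache 2.0, MIT, and the Gemma Terms of Use) allow commercial use with no naming requirement. Tuned weights belong to you, and Yachay never reads them.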
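The naming rule is the one license condition that changes what you ship, so it is easy to check before publishing. Meta's Llama 3.1, 3.2, 3.3, and 4 community licenses phrase it as putting “Llama” at the beginning of the model name, so this minimal Python sketch tests the prefix. The function name `name_complies` and the choice to key on the catalog's license label are illustrative assumptions, not a Yachay API.

```python
# Hypothetical pre-publish check for the Llama naming rule.
# The license label mirrors the one shown on this catalog's cards;
# the function itself is an illustrative assumption, not a Yachay API.

LLAMA_LICENSE = "Llama Community"


def name_complies(base_license: str, tuned_name: str) -> bool:
    """Return True if tuned_name meets the naming rule for a model tuned from base_license.

    Only Llama-licensed bases carry a naming requirement: the published name
    must begin with "Llama". Every other license is treated as having none.
    """
    if base_license != LLAMA_LICENSE:
        return True
    # Case-sensitive prefix test, matching the word as the license quotes it.
    return tuned_name.lstrip().startswith("Llama")


if __name__ == "__main__":
    checks = [
        ("Llama Community", "Llama-3.1-8B-support-bot"),
        ("Llama Community", "support-bot-llama-8b"),
        ("Apache 2.0", "qwen3-8b-support-bot"),
    ]
    for license_name, name in checks:
        verdict = "ok" if name_complies(license_name, name) else "rename"
        print(f"{name!r} on {license_name}: {verdict}")
```

The prefix test is deliberately case-sensitive, since the license quotes the word as “Llama”.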

Llama (5)

Meta's open-weight models, the strongest general-purpose bases here.

Llama 4 Scout

109B params (17B active, mixture-of-experts) · Llama Community

5–15 min cold start

general chat · code · long context · multilingual

Typical tune: ~$80 · LoRA

Llama 3.3 70B

70B params · Llama Community

Instant start

general chat · instruction following · reasoning

Typical tune: ~$80 · QLoRA

Llama 3.1 8B

8B params · Llama Community

Instant start

general chat · instruction following · popular tuning base

Typical tune: ~$10 · LoRA

Llama 3.2 3B

3B params · Llama Community

Instant start

edge · fast tuning · summarization

Typical tune: ~$10 · LoRA

Llama 3.2 1B

1B params · Llama Community

Instant start

edge · extraction · classification

Typical tune: ~$10 · LoRA

Qwen (4)

Alibaba's Apache 2.0 multilingual workhorse.

Qwen 3 32B

32B params · Apache 2.0

5–15 min cold start

multilingual · code · reasoning

Typical tune: ~$40 · QLoRA

Qwen 3 14B

14B params · Apache 2.0

5–15 min cold start

multilingual · code

Typical tune: ~$10 · LoRA

Qwen 3 8B

8B params · Apache 2.0

Instant start

multilingual · tool use · Apache 2.0 mid-tier

Typical tune: ~$10 · LoRA

Qwen 3 4B

4B params · Apache 2.0

Instant start

edge · Apache 2.0 small

Typical tune: ~$10 · LoRA

Gemma (2)

Google's open-weight family, derived from Gemini research.

Gemma 3 27B

27B params · Gemma Terms of Use

5–15 min cold start

general chat · vision-ready architecture

Typical tune: ~$30 · QLoRA

Gemma 3 12B

12B params · Gemma Terms of Use

Instant start

general chat · mid-size workhorse

Typical tune: ~$10 · LoRA

Phi (2)

Microsoft's synthetic-data-heavy reasoners (MIT).

Phi-4 14B

14B params · MIT

Instant start

reasoning · math · synthetic-data-heavy

Typical tune: ~$10 · LoRA

Phi-4-mini

3.8B params · MIT

Instant start

edge · function calling

Typical tune: ~$10 · LoRA

Mistral (2)

European Apache 2.0 family — strong cost/perf.

Mistral Small 3.1 24B

24B params · Apache 2.0

5–15 min cold start

general chat · Apache 2.0 mid-tier

Typical tune: ~$30 · QLoRA

Mistral Nemo 12B

12B params · Apache 2.0

5–15 min cold start

multilingual · long context

Typical tune: ~$10 · LoRA

DeepSeek Distill (2)

DeepSeek-R1 reasoning distilled into Qwen and Llama bases: math and chain-of-thought.

DeepSeek-R1-Distill-Qwen-14B

14B params · MIT

5–15 min cold start

reasoning · math

Typical tune: ~$10 · LoRA

DeepSeek-R1-Distill-Llama-8B

8B params · MIT, on a Llama 3.1 Community base

5–15 min cold start

reasoning · math · edge

Typical tune: ~$10 · LoRA
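Each card labels its typical tune as LoRA or QLoRA. LoRA freezes the base weights and trains a pair of low-rank matrices per adapted layer; QLoRA applies the same adapters on top of a base stored in 4-bit precision. This Python sketch does the back-of-envelope arithmetic for both. The 4096 × 4096 layer shape, the rank of 16, and the helper names are illustrative assumptions, and the numbers are not derived from Yachay's pricing or job sizing.

```python
# Back-of-envelope numbers behind the "LoRA" / "QLoRA" labels.
# Assumptions (not from the catalog): a square 4096 x 4096 projection,
# rank r = 16, and 16-bit vs 4-bit storage for the frozen base weights.
# Activations, adapter weights, optimizer state, and quantization
# constants are ignored.


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA trains B (d_out x rank) and A (rank x d_in) instead of W (d_out x d_in)."""
    return rank * (d_out + d_in)


def base_weight_gb(n_params: float, bits: int) -> float:
    """Memory to hold frozen base weights at the given precision, in GB (1e9 bytes)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    d = 4096
    full = d * d
    lora = lora_trainable_params(d, d, rank=16)
    print(f"full update: {full:,} params; LoRA r=16: {lora:,} ({lora / full:.2%})")
    for bits in (16, 4):
        print(f"70B base at {bits}-bit: ~{base_weight_gb(70e9, bits):.0f} GB")
```

For the assumed 4096 × 4096 projection at rank 16, LoRA trains 131,072 parameters instead of 16,777,216 (about 0.78%), and a 70B base shrinks from roughly 140 GB of weights at 16-bit to roughly 35 GB at 4-bit, before activations, optimizer state, and quantization overhead.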