Eval reports

How Yachay evaluates a finished fine-tune — what we record, what you get, and what’s coming.

v1 status

The per-job dashboard panel for loss curves and held-out perplexity is on the v1.1 roadmap. v1 records the underlying data (see “What we capture today” below), but there’s no built-in viewer yet. If you need the metrics now, email hello@condorbox.ai with your job ID and we’ll send the raw JSON within one business day.

What we capture today

Per-step training loss: every gradient-update step writes a (step, loss) tuple to the orchestrator’s structured log. The log is available through the Cloud Logging export to GCS for the lifetime of the log retention window (30 days by default).
Validation perplexity at every checkpoint: we hold out 5% of your dataset (deterministic split, seed=42) and compute perplexity on it at each checkpoint. The final value is stamped on the Firestore job doc as finalValPerplexity.
Training args: the full hyperparameter set used by the trainer (epochs, batch size, learning rate, optimizer, dtype, LoRA rank/alpha) ships in trainer_args.json inside the adapter bundle. It survives the standard 30-day retention.
Tokens-seen and wall-clock: stamped on the Firestore doc as trainTokens and trainSecondsActual. Both already appear on the dashboard’s per-job page.
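
To make the validation numbers easier to reason about, here is a minimal sketch of a deterministic 5% hold-out and of how perplexity follows from per-token losses. It is an illustration only: the docs above state the split fraction and the seed, but not how the split is implemented, so the shuffle-based approach and the function names split_holdout and perplexity are assumptions, and a split produced this way is not guaranteed to match the one Yachay uses.

```python
import math
import random


def split_holdout(examples, holdout_frac=0.05, seed=42):
    """Deterministically split examples into (train, validation).

    Assumption: a seeded shuffle of indices. The real trainer may
    split differently; only the 5% fraction and seed=42 are documented.
    """
    rng = random.Random(seed)  # local RNG, leaves global random state alone
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    n_val = round(len(examples) * holdout_frac)
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


if __name__ == "__main__":
    data = [f"example-{i}" for i in range(200)]
    train, val = split_holdout(data)
    print(len(train), len(val))
    # A model that assigns probability 1/4 to every token has perplexity ~4.
    print(perplexity([math.log(4)] * 3))
```

Because the random generator is seeded and local to the function, repeated calls on the same dataset return the same validation set, which is what makes perplexity values comparable across checkpoints.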

v1.1 roadmap

Loss-curve panel — per-job dashboard chart of (step, train loss, val perplexity) across the whole run. Source data already exists in Cloud Logging; this is a UI build.
Benchmark probe (opt-in) — for a flat $1.50 add-on at submit time, we run your tuned adapter against a small fixed benchmark (MMLU 5-shot, HumanEval pass@1) and surface the score next to your baseline.
Side-by-side eval — pick two completed jobs and compare loss curves + perplexity head-to-head. Useful for hyperparameter sweeps.
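
Until the side-by-side view exists, a rough comparison can be done client-side once you have the (step, loss) tuples for two jobs. The sketch below assumes the tuples have already been parsed out of the raw JSON into Python lists; the JSON schema itself is not documented here, and the helper names summarize_run and compare_runs are hypothetical.

```python
def summarize_run(points):
    """Return final and best loss for one run.

    points: iterable of (step, loss) tuples, in any order.
    """
    ordered = sorted(points)  # sort by step
    if not ordered:
        raise ValueError("run has no logged steps")
    final_step, final_loss = ordered[-1]
    best_step, best_loss = min(ordered, key=lambda p: p[1])
    return {
        "final_step": final_step,
        "final_loss": final_loss,
        "best_step": best_step,
        "best_loss": best_loss,
    }


def compare_runs(run_a, run_b):
    """Compare loss at the steps both runs logged.

    Returns a list of (step, loss_a, loss_b, loss_a - loss_b),
    ordered by step. Steps present in only one run are skipped.
    """
    a, b = dict(run_a), dict(run_b)
    shared = sorted(a.keys() & b.keys())
    return [(step, a[step], b[step], a[step] - b[step]) for step in shared]


if __name__ == "__main__":
    run_a = [(1, 2.0), (2, 1.5), (3, 1.25)]
    run_b = [(1, 2.0), (2, 1.0), (4, 0.75)]
    print(summarize_run(run_a))
    for row in compare_runs(run_a, run_b):
        print(row)
```

A positive difference in the last column means the second run had lower loss at that step. Comparing only shared steps avoids interpolating between logged points, at the cost of ignoring steps that one run did not reach.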

What we will NOT do

We won’t evaluate your adapter on a held-out dataset we don’t share with you. The v1.1 benchmark probe runs against fixed public benchmarks (MMLU, HumanEval) so the score is comparable and reproducible. If you need a custom eval set, run it client-side against the downloaded adapter — that keeps the eval data on your machine.
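
A client-side custom eval can be as small as the sketch below, which scores exact-match accuracy over your own (prompt, expected answer) pairs. The generate argument is a hypothetical callable that you supply: it would wrap whatever inference stack you use to load the downloaded adapter, and it is not a Yachay API. Exact match is also just one possible metric, chosen here for simplicity.

```python
from typing import Callable, Iterable, Tuple


def exact_match_accuracy(
    eval_set: Iterable[Tuple[str, str]],
    generate: Callable[[str], str],
) -> float:
    """Fraction of prompts whose generated output equals the expected answer.

    Leading and trailing whitespace is ignored on both sides.
    """
    total = 0
    correct = 0
    for prompt, expected in eval_set:
        total += 1
        if generate(prompt).strip() == expected.strip():
            correct += 1
    if total == 0:
        raise ValueError("eval set is empty")
    return correct / total


if __name__ == "__main__":
    # Stand-in for a real model call, used only to show the harness shape.
    canned = {"2+2=": "4", "Capital of Peru?": "Lima ", "3*3=": "6"}

    def fake_generate(prompt: str) -> str:
        return canned[prompt]

    my_eval_set = [("2+2=", "4"), ("Capital of Peru?", "Lima"), ("3*3=", "9")]
    print(exact_match_accuracy(my_eval_set, fake_generate))
```

Since both the eval set and the model call stay in your own process, nothing about the held-out data leaves your machine.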
