Eval reports

How Yachay evaluates a finished fine-tune — what we record, what you get, and what’s coming.

v1 status

The per-job dashboard panel for loss curves and held-out perplexity is on the v1.1 roadmap. v1 records the underlying data (see “What we capture today” below), but there is no built-in viewer yet. If you need the metrics now, email hello@condorbox.ai with your job ID and we’ll send the raw JSON within one business day.

What we capture today

  • Per-step training loss: every gradient-update step writes a (step, loss) tuple to the orchestrator’s structured log. These records are available through the Cloud Logging export to GCS for as long as the log retention window lasts (30 days by default).
  • Validation perplexity at every checkpoint: we hold out 5% of your dataset (deterministic split, seed=42) and compute perplexity on it at each checkpoint. The final value is stamped on the Firestore job doc as finalValPerplexity.
  • Training args: the full hyperparameter set used by the trainer (epochs, batch size, learning rate, optimizer, dtype, LoRA rank/alpha) ships in trainer_args.json inside the adapter bundle. The file is kept for the standard 30-day retention period.
  • Tokens-seen and wall-clock: stamped on the Firestore job doc as trainTokens and trainSecondsActual. Both already appear on the dashboard’s per-job page.
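
To make the validation-perplexity item above more concrete, here is a minimal Python sketch of a seeded 5% hold-out split and of perplexity computed as the exponential of the mean per-token negative log-likelihood. The function names (split_holdout, perplexity) and the shuffle-based split are assumptions for illustration only. Yachay’s actual split procedure is not documented here, so this will not reproduce the exact validation set used by a job; it only shows why a fixed seed makes the split repeatable.

```python
import math
import random


def split_holdout(examples, holdout_fraction=0.05, seed=42):
    """Return (train, validation) using a seeded shuffle of the indices.

    Hypothetical helper: the same seed and input order always yield the same split.
    """
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_holdout = max(1, round(len(examples) * holdout_fraction))
    holdout = set(indices[:n_holdout])
    train = [ex for i, ex in enumerate(examples) if i not in holdout]
    validation = [ex for i, ex in enumerate(examples) if i in holdout]
    return train, validation


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Illustrative data only.
examples = [f"example-{i}" for i in range(200)]
train, validation = split_holdout(examples)
print(len(train), len(validation))
```

Because the shuffle uses its own random.Random(seed) instance, calling split_holdout again on the same list returns the same validation set regardless of any other random state in the program.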

v1.1 roadmap

  • Loss-curve panel: per-job dashboard chart of (step, train loss, val perplexity) across the whole run. Source data already exists in Cloud Logging; this is a UI build.
  • Benchmark probe (opt-in): for a flat $1.50 add-on at submit time, we run your tuned adapter against a small fixed benchmark (MMLU 5-shot, HumanEval pass@1) and surface the score next to your baseline.
  • Side-by-side eval: pick two completed jobs and compare loss curves + perplexity head-to-head. Useful for hyperparameter sweeps.
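
Until the side-by-side view exists, a similar comparison can be approximated by hand from two jobs’ per-step loss records. The sketch below assumes each run has already been loaded into a list of (step, loss) tuples; the function name compare_runs and the sample numbers are made up for illustration and do not reflect any Yachay API or real job.

```python
def compare_runs(run_a, run_b):
    """Compare two runs given as lists of (step, loss) tuples.

    Returns rows of (step, loss_a, loss_b, loss_a - loss_b) for the steps
    that both runs logged, in ascending step order.
    """
    loss_a = dict(run_a)
    loss_b = dict(run_b)
    shared_steps = sorted(set(loss_a) & set(loss_b))
    return [(s, loss_a[s], loss_b[s], loss_a[s] - loss_b[s]) for s in shared_steps]


# Made-up numbers, for illustration only.
run_lr_low = [(100, 2.10), (200, 1.80), (300, 1.65)]
run_lr_high = [(100, 2.00), (200, 1.85), (300, 1.70), (400, 1.60)]

rows = compare_runs(run_lr_low, run_lr_high)
for step, a, b, delta in rows:
    print(f"step {step}: A={a:.2f} B={b:.2f} delta={delta:+.2f}")
```

Only steps present in both runs are compared, so runs of different lengths line up without interpolation. A negative delta means run A had the lower loss at that step.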

What we will NOT do

We won’t evaluate your adapter on a held-out dataset we don’t share with you. The v1.1 benchmark probe runs against fixed public benchmarks (MMLU, HumanEval) so the score is comparable and reproducible. If you need a custom eval set, run it client-side against the downloaded adapter — that keeps the eval data on your machine.
