Response Quality

When the agent gets answers wrong, this page walks you through fixing them. There are three levers (sources, prompt, model), applied in that order.

Diagnose first

Before changing anything, classify the failure. Almost every bad answer falls into one of three buckets:

  • Hallucination — the agent is confidently wrong about a fact. Usually a sources problem (the right info isn’t indexed) or a prompt problem (no fallback for unknowns).
  • Off-topic drift — the agent answers the question but wanders into something else. System-prompt constraints fix this.
  • Tone or length issue — the answer is correct but too long, too short, or off-brand. Prompt tweak.
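
The triage above can be written down as a small lookup table. The sketch below is a hypothetical illustration in Python, not part of Chatmount: the names FailureType, LEVERS, and first_lever are assumptions made up for this example.

```python
from enum import Enum


class FailureType(Enum):
    HALLUCINATION = "hallucination"
    OFF_TOPIC_DRIFT = "off_topic_drift"
    TONE_OR_LENGTH = "tone_or_length"


# Levers to try for each bucket, in the order this page suggests.
# Hypothetical table for illustration; not a Chatmount API.
LEVERS = {
    FailureType.HALLUCINATION: ["sources", "prompt"],
    FailureType.OFF_TOPIC_DRIFT: ["prompt"],
    FailureType.TONE_OR_LENGTH: ["prompt"],
}


def first_lever(failure: FailureType) -> str:
    """Return the first lever to try for a classified failure."""
    return LEVERS[failure][0]
```

Tagging each bad answer with one of these buckets before touching anything keeps you from reaching for a model upgrade when a source or prompt change would do.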

Lever 1 — Sources

Source coverage is the cheapest fix. Three patterns:

  • Missing answer — write a Q&A pair. Five minutes, immediately authoritative
  • Wrong answer in a source — find the source, edit it, retrain. Or toggle the source off if the wrong answer is part of marketing copy you can’t change
  • Conflicting sources — the Suggestions panel surfaces these (Pro). Resolve by tightening one of the sources

See Data sources for the full source-type guide.

Lever 2 — System prompt

When the sources have the answer but the agent still misuses it, look at the prompt. The most common moves:

  • Add an explicit Constraint — e.g. ‘Never give pricing — always direct to /pricing’
  • Add a fallback — e.g. ‘If you don’t know, say so and offer to escalate’
  • Tighten the Role — replace ‘help with anything’ with the specific job

The presets that ship with the playground (Customer support, Sales agent) are good starting points — they already include fallback + constraint patterns. See Best Practices for the full prompt template.
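
To make the three moves concrete, here is a minimal sketch that assembles a prompt from a role, a list of constraints, and a fallback. It is an illustration only: the function build_system_prompt and the "Role / Constraints / Fallback" labels are assumptions for this example, and the actual template in Best Practices may be laid out differently.

```python
def build_system_prompt(role: str, constraints: list[str], fallback: str) -> str:
    """Assemble a system prompt from a role, constraints, and a fallback."""
    lines = [f"Role: {role}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", f"Fallback: {fallback}"]
    return "\n".join(lines)


# Example values are placeholders; swap in your agent's real job and rules.
prompt = build_system_prompt(
    role="You answer billing and account questions for our product.",
    constraints=["Never give pricing. Always direct to /pricing."],
    fallback="If you don't know, say so and offer to escalate.",
)
print(prompt)
```

Keeping the three parts separate makes it easy to change one (say, add a constraint) and retest without rewriting the whole prompt.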

Lever 3 — Model

Save model upgrades for last. They’re expensive and they rarely fix what sources + prompt couldn’t. The exceptions:

  • Multi-step reasoning — the agent gets each fact right but assembles them wrong
  • Code or math — Mini models miss subtle bugs that frontier models catch
  • Long-context synthesis — pulling from a 50-page document, where Mini loses thread

Use Compare to A/B-test before committing. Often the upgrade isn’t worth the credit cost.
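
One way to make the "worth the credit cost" call explicit is to score both models on the same set of test questions and look at the extra credits paid per percentage point of accuracy gained. The sketch below assumes you tally correct answers by hand; the function name, its parameters, and the numbers in the usage lines are hypothetical, and it does not reflect how Chatmount actually bills credits.

```python
def upgrade_worth_it(
    mini_correct: int,
    frontier_correct: int,
    total: int,
    extra_credits_per_answer: float,
    max_credits_per_point: float,
) -> bool:
    """Return True if the accuracy gain justifies the extra credit cost.

    The upgrade passes when the larger model is more accurate and each
    percentage point gained costs at most max_credits_per_point extra
    credits per answer.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    gain_points = 100 * (frontier_correct - mini_correct) / total
    if gain_points <= 0:
        return False  # no improvement, so no reason to pay more
    return extra_credits_per_answer / gain_points <= max_credits_per_point


# Made-up inputs: 20 test questions, 2 extra credits per answer.
small_gain = upgrade_worth_it(14, 15, 20, 2.0, 0.25)  # 5-point gain
large_gain = upgrade_worth_it(14, 19, 20, 2.0, 0.25)  # 25-point gain
```

With these illustrative inputs, the 5-point gain fails the threshold and the 25-point gain passes it. The threshold itself is a judgment call for your own budget.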

Build a feedback loop

You can’t improve what you don’t see. Three surfaces give you that visibility:

  • Activity (chat logs) — read real conversations weekly. Edit any AI response that needs a fix; the edit feeds back into training
  • Suggestions (Pro) — Chatmount auto-flags low-confidence questions. One-click accept turns each into a Q&A snippet
  • Analytics — sentiment trends tell you whether quality is moving up or down over time
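
If you keep notes during the weekly read-through, a simple tally shows which kind of failure dominates and therefore which lever to pull next. This is a sketch under assumptions: the conversation ids and tags below are made-up placeholders you would record by hand, not data exported from Chatmount.

```python
from collections import Counter

# Hypothetical review notes: one (conversation id, failure tag) pair per bad answer.
reviewed = [
    ("c-101", "hallucination"),
    ("c-102", "tone_or_length"),
    ("c-103", "hallucination"),
    ("c-104", "off_topic_drift"),
    ("c-105", "hallucination"),
]

# Count how often each failure tag appears, most frequent first.
tally = Counter(tag for _, tag in reviewed)
for tag, count in tally.most_common():
    print(f"{tag}: {count}")
```

In this placeholder data, hallucination leads with three of five entries, which would point at sources first.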

Related