AI Performance & Quality
Response Quality
When the agent gets answers wrong, this is the page that walks you through fixing them. Three levers — sources, prompt, model — applied in order.
Diagnose first
Before changing anything, classify the failure. Almost every bad answer falls into one of three buckets:
- Hallucination — the agent is confidently wrong about a fact. Usually a sources problem (the right info isn’t indexed) or a prompt problem (no fallback for unknowns).
- Off-topic drift — the agent answers the question but wanders into something else. System-prompt constraints fix this.
- Tone or length issue — the answer is correct but too long, too short, or off-brand. Prompt tweak.
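The triage above can be written down as a small lookup, which may help if you tag bad answers in a script or spreadsheet during review. This is a minimal sketch: the `Failure` names and the `first_levers` helper are illustrative and not part of any Chatmount API.

```python
from enum import Enum


class Failure(Enum):
    HALLUCINATION = "hallucination"    # confidently wrong about a fact
    OFF_TOPIC_DRIFT = "off_topic"      # answers, then wanders elsewhere
    TONE_OR_LENGTH = "tone_or_length"  # correct, but wrong length or voice


# Levers to try for each bucket, in order (mirrors the list above).
LEVERS = {
    Failure.HALLUCINATION: ["sources", "prompt"],
    Failure.OFF_TOPIC_DRIFT: ["prompt"],
    Failure.TONE_OR_LENGTH: ["prompt"],
}


def first_levers(failure: Failure) -> list[str]:
    """Return the levers to try for a classified failure, cheapest first."""
    return LEVERS[failure]
```

Classifying before touching anything keeps you from reaching for a model upgrade when a prompt tweak would do.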
Lever 1 — Sources
Source coverage is the cheapest fix. Three patterns:
- Missing answer — write a Q&A pair. Five minutes, immediately authoritative
- Wrong answer in a source — find the source, edit it, retrain. Or toggle the source off if the wrong answer is part of marketing copy you can’t change
- Conflicting sources — the Suggestions panel surfaces these (Pro). Resolve by editing one of the sources so the two no longer disagree
See Data sources for the full source-type guide.
Lever 2 — System prompt
When sources have the answer but the agent uses them weirdly, look at the prompt. Most common moves:
- Add an explicit Constraint — e.g. ‘Never give pricing — always direct to /pricing’
- Add a fallback — e.g. ‘If you don’t know, say so and offer to escalate’
- Tighten the Role — replace ‘help with anything’ with the specific job
The presets that ship with the playground (Customer support, Sales agent) are good starting points — they already include fallback + constraint patterns. See Best Practices for the full prompt template.
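To make the three moves concrete, here is a rough sketch that assembles a prompt from a specific role, explicit constraints, and a fallback. The `build_system_prompt` helper, the `Role:` / `Constraints:` / `Fallback:` labels, and the example role text are assumptions for illustration; the playground's own template may be laid out differently.

```python
def build_system_prompt(role: str, constraints: list[str], fallback: str) -> str:
    """Assemble a system prompt from a specific role, constraints, and a fallback."""
    lines = [f"Role: {role}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", f"Fallback: {fallback}"]
    return "\n".join(lines)


prompt = build_system_prompt(
    # Hypothetical role: a specific job instead of "help with anything".
    role="You answer billing and account questions for our customers.",
    constraints=["Never give pricing. Always direct to /pricing."],
    fallback="If you don't know, say so and offer to escalate.",
)
print(prompt)
```

Keeping the three parts separate makes it easier to see which one you changed when you re-test an answer.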
Lever 3 — Model
Save model upgrades for last. They’re expensive and they rarely fix what sources + prompt couldn’t. The exceptions:
- Multi-step reasoning — the agent gets each fact right but assembles them wrong
- Code or math — Mini models miss subtle bugs that frontier models catch
- Long-context synthesis — pulling from a 50-page document, where Mini loses the thread
Use Compare to A/B-test before committing. Often the upgrade isn’t worth the credit cost.
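One way to decide whether an upgrade pays for itself is to grade the same test questions by hand for both models and weigh the net fixes against the extra credits. The sketch below assumes you record those grades yourself; `upgrade_summary` and the per-answer credit costs are hypothetical placeholders, not Compare's output format or real pricing.

```python
def upgrade_summary(
    base_correct: list[bool],
    upgraded_correct: list[bool],
    base_cost: float,
    upgraded_cost: float,
) -> dict[str, float]:
    """Summarize a side-by-side run on the same questions.

    Costs are credits per answer (placeholder numbers, not real pricing).
    """
    if len(base_correct) != len(upgraded_correct) or not base_correct:
        raise ValueError("grade both models on the same non-empty question set")

    n = len(base_correct)
    pairs = list(zip(base_correct, upgraded_correct))
    # Questions the upgrade fixed vs. questions it broke.
    fixed = sum(u and not b for b, u in pairs)
    broken = sum(b and not u for b, u in pairs)
    net = fixed - broken
    extra_credits = (upgraded_cost - base_cost) * n
    return {
        "base_accuracy": sum(base_correct) / n,
        "upgraded_accuracy": sum(upgraded_correct) / n,
        "net_fixed": net,
        # Extra credits paid per net answer fixed; infinite if nothing improved.
        "credits_per_fix": extra_credits / net if net > 0 else float("inf"),
    }
```

For example, if the upgrade fixes one of four questions and costs 5 credits per answer instead of 1, you are paying 16 extra credits for that single fix, which is the kind of number that makes the trade-off easy to judge.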
Build a feedback loop
You can’t improve what you don’t see. Three surfaces give you the visibility:
- Activity (chat logs) — read real conversations weekly. Edit any AI response that needs a fix; the edit feeds back into training
- Suggestions (Pro) — Chatmount auto-flags low-confidence questions. One-click accept turns each into a Q&A snippet
- Analytics — sentiment trends tell you whether quality is moving up or down over time
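If you track a weekly sentiment number outside the dashboard, a simple before-and-after comparison is enough to tell whether quality is moving. This sketch assumes you have a list of weekly average sentiment scores that you recorded yourself; the `sentiment_trend` function, the four-week window, and the 0.02 tolerance are illustrative choices, not Analytics features.

```python
from statistics import mean


def sentiment_trend(
    weekly_scores: list[float], window: int = 4, tolerance: float = 0.02
) -> str:
    """Compare the mean of the latest `window` weeks with the `window` before it."""
    if len(weekly_scores) < 2 * window:
        return "not enough data"
    recent = mean(weekly_scores[-window:])
    previous = mean(weekly_scores[-2 * window:-window])
    delta = recent - previous
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "flat"
```

Comparing two windows rather than two single weeks smooths out one-off spikes, so a single bad week does not read as a decline.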
Related
Best Practices
Prompt patterns + source curation that prevent quality issues before they show up.
Data sources
Source curation is the highest-leverage fix for most response issues.
Activity
Where you read real conversations and edit responses to retrain on the fly.
Analytics
Sentiment trends + topic clusters that signal quality drift over time.