AI Performance & Quality

Response Quality

When the agent gets answers wrong, this page walks you through fixing them. There are three levers (sources, prompt, model), applied in that order.

Diagnose first

Before changing anything, classify the failure. Almost every bad answer falls into one of three buckets:

Hallucination — the agent is confidently wrong about a fact. Usually a sources problem (the right info isn’t indexed) or a prompt problem (no fallback for unknowns).
Off-topic drift — the agent answers the question but wanders into something else. System-prompt constraints fix this.
Tone or length issue — the answer is correct but too long, too short, or off-brand. Prompt tweak.
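
The triage above can be written down as a small lookup, which helps when several people review conversations and need to tag failures the same way. This is a minimal sketch in Python: the bucket names and the first_lever helper are hypothetical labels for this page's categories, not part of any Chatmount API.

```python
# Hypothetical triage table: failure bucket -> levers to try, in order.
# The buckets mirror the three categories described on this page.
TRIAGE = {
    "hallucination": ["sources", "prompt"],
    "off_topic_drift": ["prompt"],
    "tone_or_length": ["prompt"],
}


def first_lever(bucket: str) -> str:
    """Return the first lever to try for a classified failure."""
    try:
        return TRIAGE[bucket][0]
    except KeyError:
        raise ValueError(f"unknown failure bucket: {bucket!r}") from None


if __name__ == "__main__":
    for bucket in TRIAGE:
        print(f"{bucket}: start with {first_lever(bucket)}")
```

The model lever is deliberately absent from the table: it is held back until sources and prompt have been tried.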

Lever 1 — Sources

Source coverage is the cheapest fix. Three patterns:

Missing answer — write a Q&A pair. It takes five minutes and is immediately authoritative.
Wrong answer in a source — find the source, edit it, retrain. Or toggle the source off if the wrong answer is part of marketing copy you can’t change.
Conflicting sources — the Suggestions panel (Pro) surfaces these. Resolve each one by editing one of the sources so they no longer disagree.

See Data sources for the full source-type guide.

Lever 2 — System prompt

When sources have the answer but the agent misuses them, look at the prompt. Most common moves:

Add an explicit Constraint — e.g. ‘Never give pricing — always direct to /pricing’
Add a fallback — e.g. ‘If you don’t know, say so and offer to escalate’
Tighten the Role — replace ‘help with anything’ with the specific job

The presets that ship with the playground (Customer support, Sales agent) are good starting points — they already include fallback + constraint patterns. See Best Practices for the full prompt template.
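
To see how the three moves fit together, here is a minimal sketch in Python that assembles a prompt from a role, a list of constraints, and a fallback. The build_system_prompt helper, the "Role / Constraints / Fallback" section labels, and the example role text are assumptions made for illustration; the actual template is the one described in Best Practices.

```python
def build_system_prompt(role: str, constraints: list[str], fallback: str) -> str:
    """Assemble a system prompt from a role, constraints, and a fallback.

    The section labels are a hypothetical layout, not a required format.
    """
    lines = [f"Role: {role}", "", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", f"Fallback: {fallback}"]
    return "\n".join(lines)


prompt = build_system_prompt(
    # A specific job instead of "help with anything" (example text is invented).
    role="You answer billing and account questions for our product.",
    constraints=["Never give pricing. Always direct to /pricing."],
    fallback="If you don't know, say so and offer to escalate.",
)
print(prompt)
```

Keeping the three parts separate makes it easy to change one (say, add a constraint) and retest without touching the others.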

Lever 3 — Model

Save model upgrades for last. They’re expensive and they rarely fix what sources + prompt couldn’t. The exceptions:

Multi-step reasoning — the agent gets each fact right but assembles them wrong
Code or math — Mini models miss subtle bugs that frontier models catch
Long-context synthesis — pulling from a 50-page document, where Mini loses the thread

Use Compare to A/B-test before committing. Often the upgrade isn’t worth the credit cost.
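
One way to make that call is to grade both models on the same set of real questions and weigh the gain against the extra credits. The sketch below assumes you record pass/fail grades by hand after running the comparison: the ModelRun structure, the credit figures, the sample grades, and the 10-point threshold are all hypothetical, and nothing here calls Compare or any Chatmount API.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    passed: list[bool]          # one hand-assigned grade per test question
    credits_per_answer: float   # hypothetical cost figure

    @property
    def pass_rate(self) -> float:
        if not self.passed:
            raise ValueError("no graded questions")
        return sum(self.passed) / len(self.passed)


def upgrade_worth_it(base: ModelRun, candidate: ModelRun, min_gain: float = 0.10) -> bool:
    """True if the candidate beats the base pass rate by at least min_gain."""
    if len(base.passed) != len(candidate.passed):
        raise ValueError("both runs must cover the same questions")
    return candidate.pass_rate - base.pass_rate >= min_gain


# Made-up grades for five questions, for illustration only.
mini = ModelRun("mini", [True, True, False, True, False], credits_per_answer=1.0)
frontier = ModelRun("frontier", [True, True, True, True, False], credits_per_answer=10.0)

gain = frontier.pass_rate - mini.pass_rate
extra = frontier.credits_per_answer - mini.credits_per_answer
print(f"pass-rate gain: {gain:.0%}, extra credits per answer: {extra:g}")
print("upgrade" if upgrade_worth_it(mini, frontier) else "stay")
```

The threshold is a judgment call. Pick it before looking at the results so the decision is not rationalized after the fact.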

Build a feedback loop

You can’t improve what you don’t see. Three surfaces give you the visibility:

Activity (chat logs) — read real conversations weekly. Edit any AI response that needs a fix; the edit feeds back into training
Suggestions (Pro) — Chatmount auto-flags low-confidence questions. One-click accept turns each into a Q&A snippet
Analytics — sentiment trends tell you whether quality is moving up or down over time
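
If you tag each problem conversation with its failure bucket during the weekly read, a simple tally shows which lever deserves attention first. A minimal sketch, assuming hand-assigned tags: the tag names and the sample list are hypothetical, not an export format from Activity.

```python
from collections import Counter

# Hypothetical tags assigned by hand while reading one week of conversations.
weekly_tags = [
    "hallucination",
    "tone_or_length",
    "hallucination",
    "off_topic_drift",
    "hallucination",
]

tally = Counter(weekly_tags)
top_bucket, top_count = tally.most_common(1)[0]
print(f"most common failure: {top_bucket} ({top_count} of {len(weekly_tags)})")
```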

Best Practices

Prompt patterns + source curation that prevent quality issues before they show up.

Data sources

Source curation is the highest-leverage fix for most response issues.

Activity

Where you read real conversations and edit responses to retrain on the fly.

Analytics

Sentiment trends + topic clusters that signal quality drift over time.