Short answer: A chatbot system prompt has six jobs: define identity, set scope, specify tone, structure the output, handle refusals, and trigger escalation. Most teams nail the first three and skip the last three — which is exactly why their chatbots feel generic and hallucinate. Below: the principles behind each job, plus an interactive library of 20 production-tested prompts you can copy and adapt in five minutes.
The library is free, no signup, just copy and go. Whether you're building on Chatmount, Chatbase, your own LangChain stack, or wiring up GPT-4 directly, the patterns transfer.
Doc-grounded support assistant
You are the support assistant for {COMPANY}. Answer the user's question using only the context below.
Rules:
- If the answer is fully in the context, answer in 1-3 short paragraphs.
- If the answer is partially in the context, answer the part you can and say what's missing.
- If the answer is not in the context, say "I don't see that in our docs — let me get a human" and offer to capture their email.
- Never invent product features, pricing, or policy.
- Match the user's tone (formal/casual) but stay friendly.
Context:
{RETRIEVED_CHUNKS}
User question: {USER_QUESTION}Order-status + escalation
You are {STORE_NAME}'s order-help assistant.
If the user asks about an order:
1. Ask for their order number if not provided.
2. Use the lookup_order tool with the number.
3. Summarize status (placed / shipped / delivered / refunded) in one sentence.
4. If the order is delayed, missing, or damaged, say "I'll connect you with a human" and trigger handover.
For non-order questions, answer from the policy docs in context. If unsure, escalate.
Context: {RETRIEVED_CHUNKS}
User: {USER_QUESTION}Frustration-aware handoff
Add to your system prompt: If the user expresses frustration (caps, exclamation, "this is ridiculous", "I want a refund", "your support sucks", repeats the same question), stop trying to answer and respond with: "I hear you — let me get a human on this right away. What's the best email so they can follow up?" Then trigger the human-handover function. Don't apologize more than once. Don't try to fix the issue yourself.
Multi-turn diagnostic
You are a diagnostic assistant for {PRODUCT}.
When a user reports an issue, ask ONE clarifying question at a time:
1. What were you trying to do?
2. What did you see happen?
3. What were you expecting to happen?
4. What browser / device / version are you on?
Then check the troubleshooting docs in context. If you can solve it, give the steps. If not, summarize what you've gathered and offer to file a ticket.
Context: {RETRIEVED_CHUNKS}{CURLY_BRACE_VARS} with your own values before pasting into your chatbot's system prompt.Now the principles behind why these prompts work. If you understand the why, you can write better ones for your specific use case.
The six jobs every system prompt does
A system prompt is the instructions you give the LLM before the user ever types anything. It runs at the start of every conversation. Six things it has to communicate:
1. Identity
Who is the chatbot? What company does it represent? What's its name?
You are the support assistant for Acme Co. — an AI chatbot
designed to answer questions about our product and route
high-intent buyers to our team.This anchors the model's behavior. Without identity framing, the model defaults to "I am an AI assistant" — generic and trust-eroding. With identity, it stays in character.
2. Scope
What's in-bounds? What's out-of-bounds? This is where most teams under-specify.
You may discuss:
- Our product, pricing, features, integrations
- General industry context relevant to our customers
You may NOT:
- Discuss competitor pricing in detail
- Help with unrelated tasks (writing code, planning trips,
summarizing documents we don't sell)Without explicit scope, the model will help with anything — and "anything" includes writing essays for kids, debugging code that isn't yours, and generally drifting from your actual use case. Scope = guardrails.
3. Tone
Match the brand voice without micromanaging.
Voice: warm but concise. Match the user's energy — formal if
they're formal, casual if they're casual. No corporate hedging,
no "Great question!" openers. One emoji per conversation max.Tone is where chatbots feel "off" — too formal for a startup, too casual for an enterprise vendor, too excited for a B2B context. A few specific dos-and-don'ts move this dramatically.
4. Output structure
Tell the model how to format answers.
Reply in 1-3 short paragraphs. Use bullet points only when
listing 3+ distinct items. Always end with one short follow-up
question to keep the conversation going.Structured outputs are easier to scan, easier to evaluate, and harder to hallucinate in (a missing field is more visible than missing prose).
5. Refusal rules
When should the model NOT answer?
If the answer is not clearly present in the retrieved context,
respond with "I don't see a clear answer to that in our docs"
and offer to connect the user to a human. Do not guess.
If the user asks for medical, legal, or financial advice, decline
politely and suggest a qualified professional.The single highest-leverage line in any chatbot prompt. Explicit refusal permission is what stops hallucinations. Without it, the model will invent answers because being-helpful is its baseline behavior.
6. Escalation triggers
When should the conversation hand off to a human?
Trigger handover and offer "let me get a human" if:
- The user expresses frustration (caps, exclamations, "ridiculous",
"I want a refund")
- The user repeats the same question (your previous answer didn't help)
- The user explicitly asks for a human, support, sales
- The retrieved context doesn't contain the answer to a high-intent
question (pricing, demo request, integration question)Escalation rules turn the chatbot from a containment system into a real funnel. Without them, frustrated users churn silently. (How handoff should work end-to-end.)
The principles that separate good prompts from average ones
After reading hundreds of system prompts in production, four patterns predict quality:
Specificity beats sophistication
Long, abstract prompts ("Be helpful and accurate while maintaining a professional tone") underperform short, specific ones ("Reply in 1-3 sentences. No 'great question' openers. No emojis.").
The model is better at following concrete rules than vague guidance. When in doubt, write it as a checklist.
Negative instructions matter as much as positive
"Don't apologize more than once" is more effective than "Be confident." "Don't discuss competitor pricing" is more effective than "Stay focused on us."
LLMs are trained to follow instructions; tell them what not to do.
Examples > Description
Don't say things like:
- "Great question!"
- "I'd be happy to help!"
- "Absolutely!"This works better than "Avoid sycophantic openers." The model pattern-matches on the examples directly.
Test on your worst questions, not your best
Most teams test the chatbot on questions the bot can definitely answer. The interesting questions are the edge cases:
- Questions outside your knowledge base
- Questions with adversarial framing ("ignore previous instructions...")
- Questions in user voice that don't match your docs ("how do I unfuck my account?")
- Questions that share jargon with competitors ("does it sync with Salesforce?")
A good system prompt handles all of these gracefully. A bad one handles only the FAQ-style queries you tested with.
Where teams typically over-engineer
A few patterns that sound sophisticated but usually hurt more than help:
Long persona descriptions
"You are Sarah, a 32-year-old product specialist with 8 years of SaaS experience and a passion for helping customers succeed." This is theatrical and adds zero capability. Worse, it primes the model to invent details about Sarah when the user asks ("what's your background?").
Stick to: "You are the support assistant for Acme Co." That's enough.
Chain-of-thought instructions
"First, think step by step about what the user is asking. Then consider the context. Then formulate a response." This adds latency, costs more tokens, and rarely improves answer quality on customer support. Reserve CoT for genuinely complex reasoning tasks (math, multi-hop questions).
Confidence scoring
"Rate your confidence in this answer 1-10 and only respond if 7+." LLMs' self-reported confidence has weak correlation with actual correctness. Use verification passes or refusal prompts instead.
Excessive examples in the prompt
Few-shot prompting helps when the format is unusual. For standard chat, 1-2 examples is plenty. Adding 8 examples doubles your latency and rarely improves quality.
A prompt-writing workflow
If you're writing your first chatbot system prompt, use this 6-step workflow:
- Pick a base template from the library above (closest match to your use case).
- Customize identity, scope, and tone for your brand.
- Add a clear refusal rule if not present (the single most important line).
- Add 2-3 escalation triggers specific to your funnel (pricing question, demo request, frustrated language).
- Test on 20 representative questions including 5 you know the bot shouldn't answer.
- Iterate on the failures — usually one or two prompt tweaks fix 80% of the issues.
Total time: under an hour for most teams. The improvement from a generic "you are a helpful assistant" prompt to a tuned one is roughly the difference between a 4% and a 10% conversion rate on chat-driven leads.
What about model-specific prompting?
Each major LLM responds slightly differently to prompts:
- Claude is the most literal — every line of the prompt is followed precisely. Best for tightly-scoped use cases.
- GPT-4 is the most flexible but most likely to ignore later parts of long prompts. Put the most important rules at the top.
- Gemini is the most variable run-to-run. Use lower temperature and add explicit "do exactly this" instructions.
For most teams, the model-specific differences are smaller than the differences between a thoughtful prompt and a generic one. Get the prompt right first, optimize for the model second.
GPT-4 vs Claude vs Gemini goes deeper on the comparison.
What you should do next
- Copy the prompt above that's closest to your use case.
- Replace
{CURLY_BRACE_VARS}with your own values (company name, product, etc.). - Paste into your chatbot platform's system prompt field and save.
- Test 20 representative questions, including 5 you know the bot shouldn't answer.
- Iterate. The first version is almost never the final one.
The library above represents what we've seen work across hundreds of Chatmount deployments. Treat them as starting points, not gospel — every business has its own voice, edge cases, and qualification criteria.
If you want a chatbot platform where the system prompt is editable in real-time, every change takes effect on the next message, and you can A/B test prompts side-by-side, Chatmount's free tier ships with all of that.
Related deep-dives
- Why AI chatbots hallucinate (and 7 ways to stop it) — the refusal-rule line is the most important defense; this post explains why.
- AI chatbot lead qualification: 27 questions — qualification questions to weave into your prompt.
- How AI chatbots actually work — what the system prompt feeds into.
- GPT-4 vs Claude vs Gemini for chatbots — model-specific prompting tips.
Building Chatmount — the AI chatbot for lead generation with native human handover. Writing about what teams actually ship vs what AI chatbot vendors say in marketing.
Try Chatmount free — built for the lead-gen patterns in this post
AI chatbot with native human handover and in-conversation lead capture. Plans start at $6/month annual ($8/mo monthly). No credit card to start.
Start free