Website Crawl · AI Training

Paste Your URL.
Get a Chatbot Trained on It.

Chatmount auto-crawls your website, indexes every page, and turns your existing content into a chatbot's knowledge base. Smart crawling, source-cited answers, scheduled re-training, and a one-line embed. No copy-paste, no developers, no spreadsheets of URLs.
7-day free trial
Cancel anytime
5 min setup

How to train an AI chatbot on your website

You train an AI chatbot on your website by pointing a chatbot builder at your URL. The platform crawls your pages, extracts the readable text, generates embeddings, and indexes everything in a vector database. When a user asks a question, the chatbot retrieves the most relevant pages and uses them as context for the answer — a pattern called retrieval-augmented generation (RAG).
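For the curious, here is a minimal sketch of that retrieve-then-answer loop in Python. It is not Chatmount's implementation. The `embed` function is a toy word-count vector standing in for a real embedding model, a plain list stands in for the vector database, and every function name is hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(pages: dict) -> list:
    """Index every page as (url, text, vector); a vector database plays this role in practice."""
    return [(url, text, embed(text)) for url, text in pages.items()]

def retrieve(index: list, question: str, k: int = 2) -> list:
    """Return the k pages most similar to the question, as (url, text) pairs."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(url, text) for url, text, _ in ranked[:k]]

def build_prompt(question: str, context: list) -> str:
    """Paste the retrieved pages, with their URLs, into the prompt sent to the language model."""
    sources = "\n".join(f"[{url}] {text}" for url, text in context)
    return f"Answer only from these sources and cite the URL:\n{sources}\n\nQuestion: {question}"
```

With a pricing page in the index, `retrieve(index, "How much does the Pro plan cost?", k=1)` returns that page, and `build_prompt` hands it to the model together with its URL. Passing the URL along is what makes source-cited answers possible.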

In Chatmount the flow is: paste URL, click train, done. The crawler discovers pages via your sitemap.xml or by following internal links. JavaScript-rendered content (Next.js, React, Vue) is rendered properly, so single-page apps work too. Indexing typically takes 2-5 minutes for sites under 500 pages.

You never set up a vector database, choose an embedding model, or write a chunking algorithm. You paste a URL.

A real website crawler, not a one-page scraper

Six things that separate a real website-trained chatbot from a weekend prototype.

Auto-crawl every page

Paste a single URL. Chatmount discovers and crawls your site recursively, respecting your robots.txt and any crawl-depth limits you set.
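A recursive crawl is essentially a breadth-first search over same-host links. The sketch below illustrates the idea under assumptions and is not Chatmount's crawler. `fetch` is any function you supply that returns a page's HTML (or `None`). `allowed` is where a robots.txt or exclusion check would plug in. `max_depth` plays the role of the crawl-depth limit. It uses only the Python standard library and does not render JavaScript.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def internal_links(page_url, html):
    """Resolve links to absolute URLs, drop #fragments, and keep only same-host links."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).netloc == host:
            links.append(absolute)
    return links

def crawl(start_url, fetch, max_depth=2, allowed=lambda url: True):
    """Breadth-first crawl from start_url; returns {url: html} for every page fetched."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        if not allowed(url):          # e.g. robots.txt disallow or an excluded pattern
            continue
        html = fetch(url)
        if html is None:              # unreachable page: skip it
            continue
        pages[url] = html
        if depth == max_depth:        # depth limit reached: index it, but don't follow links
            continue
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```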

Sitemap-aware indexing

Has a sitemap.xml? Chatmount uses it to find pages. No sitemap? The crawler discovers pages via internal links. Either way, every public page can be indexed.
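Sitemaps follow the sitemaps.org XML format, and large sites often split them into a sitemap index that points at child sitemaps. Here is a minimal standard-library sketch of reading both forms. It assumes a hypothetical `fetch(url)` that returns XML text or `None`. The `limit` cap on sitemap files is also an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) from a <urlset> or a <sitemapindex>."""
    root = ET.fromstring(xml_text)
    pages = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    children = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", NS) if el.text]
    return pages, children

def urls_from_sitemaps(fetch, sitemap_url, limit=50):
    """Follow sitemap indexes down to page URLs, reading at most `limit` sitemap files."""
    pages, todo, seen = [], [sitemap_url], set()
    while todo and len(seen) < limit:
        url = todo.pop()
        if url in seen:
            continue
        seen.add(url)
        xml_text = fetch(url)
        if xml_text is None:      # missing sitemap: a crawler would fall back to links
            continue
        found, children = parse_sitemap(xml_text)
        pages.extend(found)
        todo.extend(children)
    return pages
```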

Source-cited answers

Every chatbot answer can include the URL of the page it came from, so answers stay grounded in your own content and visitors can click through to read more.

Scheduled re-training

Set the chatbot to re-crawl your site daily, weekly, or monthly. New pages get indexed automatically. Removed content gets cleaned up.

Page-level control

Exclude specific URLs (like the admin area or marketing-only pages). Include specific subdomains. Throttle to be polite to your origin server.
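Throttling usually means enforcing a minimum gap between requests to the same host. Here is a hedged sketch. The `Throttle` class and its interval are hypothetical. The clock and sleep functions are injectable so the behavior can be tested without actually waiting.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive requests to the same host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}  # host -> time of the previous request to that host

    def wait(self, host):
        """Block until a request to `host` is allowed, then record it."""
        now = self.clock()
        last = self.last_request.get(host)
        if last is not None:
            remaining = self.min_interval_s - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last_request[host] = now
```

A crawler would call `throttle.wait(urlparse(url).netloc)` before every fetch. A 3-second interval, for example, would match the "one request every few seconds" pace described in the FAQ.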

Respects your site

The crawler identifies itself as Chatmount in its User-Agent header, honors robots.txt, and runs at a configurable crawl rate, so you never see a spike from us in your analytics.
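Python's standard library can show what honoring robots.txt for a named crawler looks like. In this sketch the `Chatmount` user-agent token is an assumption about how the crawler matches robots.txt groups (the real header string may carry a version or URL). `robots_checker` and `build_request` are hypothetical helpers.

```python
import urllib.request
import urllib.robotparser

USER_AGENT = "Chatmount"  # assumption: the token robots.txt groups are matched against

def robots_checker(robots_txt, user_agent=USER_AGENT):
    """Parse a robots.txt body and return a url -> bool 'may I fetch this?' function."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)

def build_request(url):
    """Build (but don't send) a request that identifies the crawler in its User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Robots.txt groups are not merged. Once a `User-agent: Chatmount` group exists, this crawler follows only that group and ignores the `User-agent: *` rules.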

The flow, in three steps

  1. Paste your URL

    https://your-site.com is all it takes.

  2. Click train

    The crawler discovers pages, extracts the text, chunks it, embeds it, and indexes it. Expect 2-5 minutes for a typical site.

  3. Chat or embed

    Test in the playground. Copy the one-line embed script. Done.
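Step 2 mentions chunking before embedding. One common approach is sketched here as an assumption, not as Chatmount's actual chunker: split each page into fixed-size word windows that overlap slightly, so text near a boundary keeps some surrounding context in both neighbouring chunks. The 200-word window and 40-word overlap are illustrative defaults.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Split text into word windows of up to max_words, each sharing `overlap` words with the last."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window reached the end of the page
            break
    return chunks
```

For example, a 10-word page chunked with `max_words=4, overlap=1` yields three chunks: words 0-3, 3-6, and 6-9.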

What gets indexed (and what doesn't)

Indexed by default

  • Marketing site pages
  • Blog posts and articles
  • Product / pricing pages
  • Help center articles
  • Public docs
  • JS-rendered content

Skipped by default

  • Anything in robots.txt disallow
  • Pages with noindex meta
  • Auth-gated dashboards
  • PDF / image-only pages (use PDF training instead)
  • URL patterns you exclude in settings
  • Off-domain external links
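Two of the skip rules above, off-domain links and noindex pages, can be checked with the standard library. This sketch assumes a hypothetical `skip_reason` helper that receives a page's URL, its HTML, and your site's host. It reads only the `<meta name="robots">` tag, not the `X-Robots-Tag` HTTP header, which a production crawler may also honor. `content="none"` counts as noindex because it is shorthand for `noindex, nofollow`.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class RobotsMetaParser(HTMLParser):
    """Detect <meta name="robots" content="...noindex..."> in a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            directives = {d.strip().lower() for d in attrs.get("content", "").split(",")}
            if "noindex" in directives or "none" in directives:
                self.noindex = True

def skip_reason(url, html, site_host):
    """Return why a page should be skipped ('off-domain' or 'noindex'), or None to index it."""
    if urlparse(url).netloc != site_host:
        return "off-domain"
    parser = RobotsMetaParser()
    parser.feed(html)
    if parser.noindex:
        return "noindex"
    return None
```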

Mix sources for the best chatbot

Most teams train one chatbot on multiple sources: the website crawl for breadth, PDFs for technical depth, Q&A pairs for hand-tuned FAQ answers.

  • Website crawl
  • PDFs
  • Q&A pairs

Frequently Asked Questions

Everything teams ask before pointing a chatbot at their website.

How do I train an AI chatbot on my website?

Paste your website URL into Chatmount and click train. The crawler walks your pages (using your sitemap.xml if available, otherwise following internal links), extracts the readable text, splits it into chunks, generates embeddings, and indexes everything. The chatbot starts answering from your content as soon as indexing finishes — usually 2-5 minutes for sites under 500 pages.

Will the crawler hit my server hard?

No. Chatmount's crawler runs at a polite default rate (one request every few seconds) and respects your robots.txt. You can throttle it further or set crawl-depth limits if you want to be even gentler. The crawler identifies itself as 'Chatmount' in the User-Agent header so you can see it in your access logs.

Can the chatbot stay in sync as I publish new content?

Yes. You can schedule re-crawls daily, weekly, or monthly. New pages are indexed automatically; removed content is cleaned up. Re-training is incremental — only changed pages get re-embedded — so it's fast and doesn't run up your monthly credits.
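Incremental re-training can be pictured as a diff between crawls. This sketch assumes, hypothetically, that a hash of each page's extracted text is stored after every crawl. Pages that are new or whose hash changed get re-embedded, and pages that disappeared get deleted from the index. `content_hash` and `plan_reindex` are illustrative names.

```python
import hashlib

def content_hash(text):
    """Fingerprint a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(previous, current_pages):
    """previous: {url: hash} from the last crawl; current_pages: {url: text} from this crawl.

    Returns (urls_to_embed, urls_to_delete, hashes_to_store).
    """
    new_hashes = {url: content_hash(text) for url, text in current_pages.items()}
    to_embed = sorted(url for url, h in new_hashes.items() if previous.get(url) != h)
    to_delete = sorted(set(previous) - set(new_hashes))
    return to_embed, to_delete, new_hashes
```

Only the URLs in `to_embed` cost embedding work, which is why re-crawling a mostly unchanged site is cheap. A real system would likely also avoid deleting pages that merely failed to load during a single crawl.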

What if my site is JavaScript-heavy (Next.js, React, Vue)?

Chatmount's crawler renders JavaScript so single-page apps and JS-rendered content work correctly. Server-rendered pages and statically generated content work even better — they index faster and use fewer crawl credits.

Can I exclude specific pages from training?

Yes. You can: (1) Add disallow rules in your robots.txt for the Chatmount crawler. (2) Specify URL patterns to exclude in the chatbot settings (e.g., /admin/*, /private/*). (3) Use the noindex meta tag — Chatmount respects it.
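For option (2), a pattern like `/admin/*` can be read as a shell-style glob matched against the URL path. That reading is an assumption about how the settings field behaves. The hypothetical `is_excluded` helper below uses Python's `fnmatch`, where `*` also matches `/`. So `/admin/*` covers nested paths like `/admin/users/42` but not `/admin` itself.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

def is_excluded(url, patterns):
    """True if the URL's path (query string ignored) matches any exclusion pattern."""
    path = urlparse(url).path or "/"
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```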

How does this differ from training on a sitemap or PDF?

A website crawl gives you breadth and freshness: it follows links and stays current. A sitemap gives you a curated list of pages. A PDF gives you static, point-in-time content. Most teams combine sources: a site crawl for marketing and blog content, PDFs for product manuals, and Q&A pairs for FAQ-style answers.

What about large sites — 1,000+ pages?

Chatmount handles large sites well, but each plan caps how many pages a crawl can index: Free, 200 pages; Go, 1,000; Plus, 5,000; Pro, 20,000; Enterprise, unlimited. You can also exclude entire subdirectories to focus the crawl on the content that actually matters for chatbot answers.

Will the chatbot answer questions about content I haven't indexed?

By default the chatbot answers only from your trained content and admits when it doesn't know. You can configure looser behavior (allow general-knowledge fallback for off-topic questions) or stricter (refuse anything not in your sources).
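The three behaviors can be modeled as a decision based on how well the best retrieved source matches the question. This is a minimal sketch. The `Mode` names, the `choose_response` helper, and the 0.3 similarity threshold are all hypothetical, and a real system might also ask the model itself to judge relevance.

```python
from enum import Enum

class Mode(Enum):
    STRICT = "strict"      # refuse anything not covered by your sources
    DEFAULT = "default"    # answer from sources, admit when nothing relevant was found
    FALLBACK = "fallback"  # allow general knowledge for off-topic questions

def choose_response(best_score, mode, threshold=0.3):
    """Pick a response strategy from the best retrieval similarity score."""
    if best_score >= threshold:
        return "answer_from_sources"
    if mode is Mode.FALLBACK:
        return "answer_from_general_knowledge"
    if mode is Mode.STRICT:
        return "refuse"
    return "say_i_dont_know"
```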

Train a chatbot on your site in 5 minutes

7-day free trial. Paste your URL, click train, embed the script. Your content, conversational.