Crawltest Industries
A synthetic website that exists only to be crawled.
Every page here is generated from a static dataset, so the crawl is finite and deterministic. The site deliberately mixes route shapes (static, dynamic, catch-all, paginated), content types (prose, tables, lists, images, forms, structured data), and crawler edge cases (redirects, canonicals, robots rules, slow responses, client-only content).
Start here
- About: long-form prose, headings h1–h6, JSON-LD
- Blog: listing, pagination, /blog/[slug], tags, RSS
- Products: /products/[id], schema.org Product, categories
- Docs: a real catch-all (/docs/[...path]) with nesting
- Kitchen sink: one page exercising every HTML content element
Crawler edge cases
- /old-page → 301 to /about (de-dup on canonical)
- /redirect-me → 307 temporary redirect to /about
- /blog/legacy/hello-crawler → 308 to the real post
- /private/secret: renders fine but disallowed in robots.txt and sends X-Robots-Tag: noindex
- /slow: delays ~3.5s before responding (tests timeouts / waitFor)
- /js-rendered: real content is injected client-side only
- /maze/start: links cycle back on themselves (tests visited-set + depth limits)
- /this-page-does-not-exist: 404
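
The visited-set and depth-limit behaviour that /maze/start is meant to test can be sketched roughly as below. This is a minimal illustration, not the site's own tooling: the MAZE_LINKS graph is a hypothetical link structure made up for the example (the real maze pages may link differently), and crawl is an illustrative helper name.

```python
from collections import deque

# Hypothetical link graph for illustration only; the real /maze pages
# may link to each other differently.
MAZE_LINKS = {
    "/maze/start": ["/maze/north", "/maze/east"],
    "/maze/north": ["/maze/loop", "/maze/dead-end"],
    "/maze/east": ["/maze/treasure", "/maze/start"],  # cycles back to start
    "/maze/loop": ["/maze/start"],                    # cycles back to start
    "/maze/dead-end": [],
    "/maze/treasure": [],
}


def crawl(start, links, max_depth):
    """Breadth-first crawl visiting each URL at most once, up to max_depth hops."""
    visited = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not expand this page's links
        for target in links.get(url, []):
            if target not in visited:  # visited-set breaks link cycles
                visited.add(target)
                queue.append((target, depth + 1))
    return order
```

With this assumed graph, crawl("/maze/start", MAZE_LINKS, 10) terminates after six pages even though two of them link back to the start, and a limit of 1 stops after the start page and its direct neighbours.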
Machine-readable
- /sitemap.xml
- /robots.txt
- /llms.txt
- /feed.xml (RSS 2.0)
- /api/products (JSON)
- /files/datasheet.pdf (non-HTML resource)
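
A crawler would normally consult /robots.txt before fetching anything else. The sketch below uses Python's standard urllib.robotparser. The ROBOTS_TXT body is an assumption chosen to match the /private/* disallow described on this page (the site's actual file may contain other rules), and the user-agent string is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body for illustration; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_body):
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


def allowed(parser, path, user_agent="example-crawler"):
    """Return True if the given user agent may fetch the path."""
    return parser.can_fetch(user_agent, path)
```

Under the assumed rules, allowed(build_parser(ROBOTS_TXT), "/private/secret") is False and "/about" is allowed. The X-Robots-Tag: noindex header is a separate signal that has to be checked on the response itself.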
Full page index (61 crawlable HTML pages, excluding the robots-disallowed /private/*)
A crawler that follows links from /, obeys robots.txt, and de-duplicates on the canonical URL should converge on this set. (The 307 from /blog to /blog/page/1 and the 301 from /old-page to /about mean a few extra URLs get *visited* before collapsing.)
- /
- /about
- /blog
- /blog/client-rendered-note
- /blog/hello-crawler
- /blog/images-and-media
- /blog/lists-tables-and-code
- /blog/page/1
- /blog/page/2
- /blog/page/3
- /blog/redirects-and-canonicals
- /blog/the-long-one
- /category/gizmos
- /category/misc
- /category/widgets
- /contact
- /deep
- /deep/level-1
- /deep/level-1/level-2
- /deep/level-1/level-2/level-3
- /deep/level-1/level-2/level-3/level-4
- /deep/level-1/level-2/level-3/level-4/level-5
- /deep/level-1/level-2/level-3/level-4/level-5/level-6
- /deep/level-1/level-2/level-3/level-4/level-5/level-6/level-7
- /deep/level-1/level-2/level-3/level-4/level-5/level-6/level-7/level-8
- /docs
- /docs/guides
- /docs/guides/install
- /docs/guides/install/troubleshooting
- /docs/intro
- /docs/reference
- /docs/reference/errors
- /external-links
- /files
- /gallery
- /js-rendered
- /kitchen-sink
- /maze/dead-end
- /maze/east
- /maze/loop
- /maze/north
- /maze/start
- /maze/treasure
- /products
- /products/gizmo-mini
- /products/gizmo-xl
- /products/thingamajig
- /products/widget-1000
- /products/widget-2000-pro
- /search
- /slow
- /tags
- /tags/getting-started
- /tags/html
- /tags/images
- /tags/javascript
- /tags/long-form
- /tags/media
- /tags/meta
- /tags/seo
- /tags/structure
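
How extra visited URLs collapse into the index above can be illustrated with a small redirect resolver. This is a simplified sketch: it models only the two redirects to /about named in the edge-case list, treats the redirect target as the canonical URL, and ignores rel=canonical tags, which a real crawler would also need to read. The names REDIRECTS, resolve and canonical_set are illustrative.

```python
# Redirects taken from the edge-case list; status codes are noted for
# reference only and do not affect the resolution logic here.
REDIRECTS = {
    "/old-page": "/about",     # 301
    "/redirect-me": "/about",  # 307
}


def resolve(path, redirects, max_hops=5):
    """Follow redirects until a path that does not redirect is reached."""
    seen = set()
    while path in redirects:
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or too many hops at {path}")
        seen.add(path)
        path = redirects[path]
    return path


def canonical_set(visited, redirects):
    """Collapse visited paths onto their final targets and de-duplicate."""
    return sorted({resolve(p, redirects) for p in visited})
```

For example, canonical_set(["/old-page", "/about", "/redirect-me", "/contact"], REDIRECTS) yields ["/about", "/contact"]: four URLs visited, two pages indexed.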