Crawltest Industries

A synthetic website that exists only to be crawled.

Every page here is generated from a static dataset, so the crawl is finite and deterministic. The site deliberately mixes route shapes (static, dynamic, catch-all, paginated), content types (prose, tables, lists, images, forms, structured data), and crawler edge cases (redirects, canonicals, robots rules, slow responses, client-only content).

Start here

- Crawler edge cases
- Machine-readable
- Full page index (61 crawlable HTML pages, excluding the robots-disallowed /private/* section)

A crawler that follows links from /, obeys robots.txt, and de-duplicates on the canonical URL should converge on this set. (The 307 from /blog to /blog/page/1 and the 301 from /old-page to /about mean a few extra URLs get *visited* before collapsing.)
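The convergence behavior described above can be sketched with a toy breadth-first crawler over an in-memory site model. This is a minimal sketch, not the site's actual crawl logic: the page table below is hypothetical except for the routes named in the text (`/blog` → `/blog/page/1` via 307, `/old-page` → `/about` via 301, and the robots-disallowed `/private/*`). It shows how redirected URLs get *visited* but collapse away once de-duplication happens on the canonical URL.

```python
from collections import deque

# Hypothetical in-memory site modeling the redirects described above.
# Each entry: path -> (status, redirect-target-or-canonical-URL, outgoing links)
SITE = {
    "/":            (200, "/",            ["/about", "/blog", "/old-page"]),
    "/about":       (200, "/about",       []),
    "/blog":        (307, "/blog/page/1", []),   # temporary redirect
    "/blog/page/1": (200, "/blog/page/1", ["/blog/page/2"]),
    "/blog/page/2": (200, "/blog/page/2", []),
    "/old-page":    (301, "/about",       []),   # permanent redirect
}

DISALLOWED_PREFIX = "/private/"  # mirrors the robots.txt disallow rule

def crawl(start="/"):
    """BFS from start: follow redirects, skip robots-disallowed paths,
    and de-duplicate the final set on the canonical URL."""
    visited, canonical = set(), set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited or url.startswith(DISALLOWED_PREFIX):
            continue
        visited.add(url)
        status, target, links = SITE[url]
        if status in (301, 307):
            queue.append(target)   # redirect: visited, but never indexed
            continue
        canonical.add(target)      # second field is the canonical URL here
        queue.extend(links)
    return visited, canonical

visited, canonical = crawl()
# /blog and /old-page appear in `visited` but not in `canonical`.
```

Note the gap between the two sets: `visited` includes the redirect sources, while `canonical` is the smaller set a well-behaved crawler should converge on.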