Crawltest Industries

External & non-HTML links

A crawler scoped to http://localhost:3005 should not follow any of the links below into another origin (it may record them as outbound references). Links that use non-HTTP schemes should be skipped entirely.
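
The scoping rule above can be sketched as a small classifier. This is a minimal illustration using only Python's standard library, not the behavior of any particular crawler; the names `SCOPE`, `origin`, and `classify`, and the labels "follow", "outbound", and "skip", are assumptions made for this example.

```python
from urllib.parse import urljoin, urlsplit

# Crawl scope taken from the text above.
SCOPE = "http://localhost:3005"


def origin(url):
    """Return (scheme, host, port), filling in the default port if absent."""
    parts = urlsplit(url)
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default_port)


def classify(href, base=SCOPE + "/"):
    """Hypothetical helper: label a link 'follow', 'outbound', or 'skip'."""
    absolute = urljoin(base, href)
    if urlsplit(absolute).scheme not in ("http", "https"):
        return "skip"      # non-HTTP scheme: ignore entirely
    if origin(absolute) == origin(SCOPE):
        return "follow"    # same origin: in scope
    return "outbound"      # other origin: record, do not fetch
```

For instance, `classify("/kitchen-sink")` gives "follow", `classify("https://example.com/")` gives "outbound", and `classify("mailto:someone@example.com")` gives "skip". Comparing the full (scheme, host, port) triple rather than the hostname alone means https://localhost:3005 also counts as a different origin.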

For contrast: internal links it *should* follow

A protocol-relative link

//example.com/protocol-relative — resolves to the current scheme, still a different origin.
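
As a quick check of the resolution described above, Python's `urllib.parse.urljoin` can resolve the protocol-relative reference against a base page. The base URL is the scope from this page; the helper name `resolve` is an assumption for this sketch.

```python
from urllib.parse import urljoin, urlsplit


def resolve(href, base="http://localhost:3005/"):
    """Hypothetical helper: resolve a link against the page it appears on."""
    return urljoin(base, href)


resolved = resolve("//example.com/protocol-relative")
# The scheme is inherited from the base; the host comes from the link itself.
print(resolved)                     # http://example.com/protocol-relative
print(urlsplit(resolved).hostname)  # example.com
```

The scheme matches the current page, but the host is example.com, so a crawler scoped to localhost:3005 would treat the link as outbound.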

A link with a fragment to another page

/kitchen-sink#anchor — same page as /kitchen-sink, just a different fragment; should not be a separate URL.
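
One way to avoid queuing the same page twice is to drop the fragment before deduplicating. The sketch below uses `urllib.parse.urldefrag` from the standard library; the helper name `canonical` and the `seen` set are assumptions for illustration.

```python
from urllib.parse import urldefrag, urljoin


def canonical(href, base="http://localhost:3005/"):
    """Hypothetical helper: absolute URL with any #fragment removed."""
    absolute = urljoin(base, href)
    url, _fragment = urldefrag(absolute)
    return url


# Both links collapse to one entry in the set of visited URLs.
seen = {canonical("/kitchen-sink"), canonical("/kitchen-sink#anchor")}
print(seen)  # {'http://localhost:3005/kitchen-sink'}
```

Fragments are resolved by the client and are not sent to the server, so stripping them does not change which document is fetched.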