The Kitchen Sink
If a content extractor handles this page, it handles most of the web. Inline bits: strong, bold, emphasis, italic, underline, strikethrough, highlighted, small print, subscript, superscript, inline code, Ctrl+C, sample output, x, an HTML abbreviation, a real external link, an internal link, a mailto link, a tel link, and a same-page anchor. There is also a line break here →
← and a short inline quotation.
Lists
Unordered, nested
- Top item
  - Nested item
    - Doubly-nested item
  - Another nested item
  - Nested item
- Second top item
Ordered, with start offset
3. Third
4. Fourth
5. Fifth
Definition list
- Crawler: A program that follows links and records pages.
- Fixture: A controlled input used for testing.
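As a rough illustration of how the two definitions fit together, the sketch below runs a breadth-first crawler against a fixture. Everything here is hypothetical: `fixtureSite` is a made-up in-memory "site" that maps each URL to the links on that page, and `getLinks` stands in for the fetch-and-parse step that a real crawler would perform over the network.

```javascript
// Hypothetical fixture: each key is a page URL, each value lists the links on that page.
const fixtureSite = {
  "/": ["/a", "/b"],
  "/a": ["/b", "/"],
  "/b": ["/c"],
  "/c": [],
};

// Breadth-first crawl. getLinks(url) is an assumed callback that returns
// the outgoing links of a page; here it reads from the fixture.
function crawlFixture(startUrl, getLinks) {
  const seen = new Set();
  const queue = [startUrl];
  while (queue.length > 0) {
    const next = queue.shift();
    if (seen.has(next)) continue; // already recorded, skip
    seen.add(next);               // record the page
    for (const link of getLinks(next)) {
      if (!seen.has(link)) queue.push(link); // follow unseen links
    }
  }
  return [...seen]; // pages in the order they were first visited
}

const visited = crawlFixture("/", (url) => fixtureSite[url] ?? []);
console.log(visited); // [ '/', '/a', '/b', '/c' ]
```

Because the input is a fixed object rather than the live web, the visit order is deterministic, which is what makes a fixture useful for testing.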
Table
| Quarter | Widgets | Gizmos | Total |
|---|---|---|---|
| Q1 | 120 | 80 | 200 |
| Q2 | 150 | 95 | 245 |
| Total | 270 | 175 | 445 |
Quotes & code
The web is a worse-is-better system. — Somebody, probably
    function crawl(url) {
      const seen = new Set();
      const queue = [url];
      while (queue.length) {
        const next = queue.shift();
        if (seen.has(next)) continue;
        seen.add(next);
        // ... fetch, parse, enqueue links
      }
      return seen;
    }

Media & figures

A <picture> element with multiple <source> candidates:
Disclosure widgets
Click to expand a <details> block
Hidden-by-default content. The text is still in the HTML, so extractors should see it.
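To make that point concrete, here is a minimal sketch of a text extractor that works on the raw markup. The `detailsHtml` string is an assumed approximation of this widget's markup, and `extractText` is a hypothetical helper that strips tags with a regular expression. That is fine for a demonstration but is not a substitute for a real HTML parser.

```javascript
// Naive text extraction: replace every tag with a space, then collapse whitespace.
// This ignores entities, comments, scripts, and other edge cases a real parser handles.
function extractText(html) {
  return html
    .replace(/<[^>]*>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Assumed markup for the disclosure widget. Note there is no `open` attribute,
// so a browser would render the paragraph collapsed.
const detailsHtml = `
<details>
  <summary>Click to expand a details block</summary>
  <p>Hidden-by-default content.</p>
</details>`;

const text = extractText(detailsHtml);
console.log(text);
// Click to expand a details block Hidden-by-default content.
```

The collapsed state is purely a rendering decision; an extractor that reads the markup rather than the rendered page gets the summary and the body alike.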
A small form
Progress & meter
70%
Disk usage:
— end of the kitchen sink.