Header

02 Structured page 02 - table layout

Sidebar

Wrapper logo

Content

This HTML4 (not XHTML) page uses an old-style table layout.

The content area is still quite easy to identify, although there are no semantic clues to use. You need to look at the page structure and invent a rule.

In this case, we want the first td in the third tr of the root table. XPath: /html/body/table/tr[3]/td[1].

We need to anchor the pattern to the top of the document; otherwise cells in nested tables may also match.

A table with content in the page may also trigger matches. You don't want this cell, so the pattern should be quite explicit.
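
The difference between an anchored and an unanchored pattern can be sketched in Python with the standard library's ElementTree, which supports a small XPath subset. The markup below is a hypothetical, well-formed stand-in for this page, not its actual source; real HTML4 is often not valid XML and would likely need a proper HTML parser instead. Because ElementTree paths are relative to the root element (here html), "./body/table/tr[3]/td[1]" plays the role of /html/body/table/tr[3]/td[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page (an assumption, not the
# real source). The content cell contains a nested table with three rows.
SAMPLE = """
<html><body>
<table>
  <tr><td>Header</td></tr>
  <tr><td>Nav</td></tr>
  <tr>
    <td>Content
      <table>
        <tr><td>a</td></tr>
        <tr><td>b</td></tr>
        <tr><td>You don't want this cell</td></tr>
      </table>
    </td>
    <td>Sidebar</td>
  </tr>
  <tr><td>Footer</td></tr>
</table>
</body></html>
"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Anchored: every step from the root is spelled out, so only the
# third row of the root table can match.
anchored = root.findall("./body/table/tr[3]/td[1]")

# Unanchored: any third row at any depth matches, including the
# third row of the nested table.
loose = root.findall(".//tr[3]/td[1]")

print(len(anchored), anchored[0].text.strip())
print(len(loose), [td.text.strip() for td in loose])
```

In this sketch the anchored path finds only the content cell, while the unanchored one also picks up the cell from the nested table.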

A plugin like XPath Checker for Firefox can help you find the pattern you need.
But take care to examine the actual source: the in-browser view of the document works on a 'corrected' version of the source. In this case, XPath Checker reported a tbody element in the page, though there was none in the source.
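
The tbody pitfall can be illustrated with a small Python sketch, again using ElementTree on hypothetical, well-formed markup rather than this page's real source. The path containing tbody is what a browser-based tool might report from its corrected DOM; the path without it reflects the raw markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw source (an assumption): no <tbody> is written in the markup.
RAW = (
    "<html><body><table>"
    "<tr><td>Header</td></tr>"
    "<tr><td>Nav</td></tr>"
    "<tr><td>Content</td></tr>"
    "</table></body></html>"
)

root = ET.fromstring(RAW)  # root is the <html> element

# Path as a browser tool might suggest it, based on the corrected DOM.
browser_path = "./body/table/tbody/tr[3]/td[1]"
# Path that matches the markup as actually written.
source_path = "./body/table/tr[3]/td[1]"

from_browser = root.find(browser_path)  # no tbody in the source, so no match
from_source = root.find(source_path)

print(from_browser)
print(from_source.text)
```

When a pattern copied from the browser returns nothing against the fetched source, an inserted tbody step is a likely cause worth checking first.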

Footer