When semantic markup goes bad
In HTML, there are semantic elements and presentational elements. Semantic elements (like `a href`, `cite`, `code`, `em`, `p`, `samp`, and `strong`) have specific meanings. Presentational elements (like `br`, `font`, `tt`, `b`, and `i`) do not: they only alter visual presentation.
Using semantic elements is good because software can interpret them and do useful things. For example:
- Google’s Web search interprets `a href` links as a measure of the importance of Web pages.
- Google’s definition function interprets `dl`, `dt`, and `dd` (among other things) to find definitions for words and phrases.
- Technorati interprets `a href` links to Amazon to determine what books people are talking about. It could also interpret `cite` elements to include references from people who mentioned the title but didn’t link to Amazon.
- Google Sets uses `ul`, `ol`, and `li` to find words and phrases related to each other.
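
To make these cases concrete, here is a small illustrative fragment of the kind of markup such tools can read. The glossary entry, the sentence, and the list items are invented for illustration, and the sketch makes no claim about how any particular service actually parses them.

```html
<!-- A definition list: dt holds the term, dd holds its definition. -->
<dl>
  <dt>Bronze Age</dt>
  <dd>The period in which bronze replaced stone as the primary
      material for tools and weapons.</dd>
</dl>

<!-- A book mentioned without a link: cite still marks it as a title. -->
<p>I just finished <cite>Guns, germs, and steel</cite> by Jared Diamond.</p>

<!-- An unordered list: the items are presented as belonging together. -->
<ul>
  <li>bronze</li>
  <li>iron</li>
  <li>stone</li>
</ul>
```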
So the more people use semantic HTML properly, the more we all benefit. But this benefit isn’t immediate, and it’s hard to explain. So in their eagerness to be good Web citizens, people sometimes use semantic markup when they shouldn’t.
Most commonly, this happens with bold and italics. HTML has presentational elements for these: `b` and `i`. It also has semantic elements for some of the things you might want presented in bold or italics.
Reason for using italics | Example | Semantic HTML element |
---|---|---|
Emphasized text | I’m *really* annoyed. | `em` |
Citing a work | Jared Diamond’s *Guns, germs, and steel* | `cite` |
Mathematical variables | *E* = *mc*² | `var` |
Terms defined in-line | In the *Bronze Age*, bronze replaced stone as the primary material for tools and weapons. | `dfn` |
Taxonomical names | *Homo sapiens sapiens* | (none) |
Text in other languages | She has a *je ne sais quoi* about her. | (none) |
(The Safari browser also italicizes abbreviations, `abbr` and `acronym`. This is a bug in Safari, because hardly any designers or readers ever want them italicized.)
Reason for using bold | Example | Semantic HTML element |
---|---|---|
Strongly emphasized text | I’m not just annoyed, I’m **furious**! | `strong` |
Terms defined in-line | In the **Bronze Age**, bronze replaced stone as the primary material for tools and weapons. | `dfn` |
Headings | **Introduction** | `h1`, `h2`, `h3`, `h4`, `h5`, `h6` |
Header cells in tables | **Total** | `th` |
Vectors and vector spaces | The set of all vectors with two real-number components is **R**². | (none) |
(`dfn` is rendered sometimes in italics, sometimes in bold. If you want to specify one or the other, or even something else, like colored text, you can do that in a style sheet; then if you change your mind later, you need only change the style sheet.)
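
As a minimal sketch of that style-sheet approach, the rule below makes every `dfn` bold and upright. The choice of bold and the sample sentence are assumptions for illustration; in a real page the `style` element would sit in the document's `head`, or the rule would live in an external style sheet.

```html
<style>
  /* One rule decides how every dfn is presented.
     Change it here and every definition on the page follows. */
  dfn {
    font-style: normal;
    font-weight: bold;
  }
</style>

<p>In the <dfn>Bronze Age</dfn>, bronze replaced stone as the primary
material for tools and weapons.</p>
```

Switching to italics or colored text later means editing only the rule, not the markup.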
These aren’t exhaustive lists, but as you can see, some reasons for using bold and italics don’t have their own semantic HTML elements. This is why `b` and `i` exist. Nevertheless, some people try to shoehorn all uses of bold and italics into HTML’s few semantic elements. I have been guilty of this myself: I used `cite` for text in another language, for example, which wasn’t a citation at all. I should have been using `i`.
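
For the cases with no semantic element of their own, the markup might look like the sketch below. The sentences are adapted from the tables above; the `lang` attribute is an addition of this example (a standard HTML attribute for recording the language of the text), not something the tables call for.

```html
<!-- Text in another language: no dedicated element, so use i.
     lang="fr" is an optional extra that records the language. -->
<p>She has a <i lang="fr">je ne sais quoi</i> about her.</p>

<!-- Taxonomical name: italic by convention, no dedicated element. -->
<p>Modern humans are classified as <i>Homo sapiens sapiens</i>.</p>

<!-- Vector space: bold by convention, no dedicated element. -->
<p>The set of all vectors with two real-number components
is <b>R</b><sup>2</sup>.</p>

<!-- The shoehorned version to avoid: this phrase is not a citation. -->
<p>She has a <cite>je ne sais quoi</cite> about her.</p>
```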
More often, people switch from `i` to `em` for everything they want italicized, and from `b` to `strong` for everything they want in bold. Authoring software often does this: tools like the original Wiki, Wikipedia, and Markdown make it easy to use `em` and `strong`, and harder (or impossible) to use `i`, `b`, or any of the other italic or bold elements.
This is bad, because using the wrong semantic element confuses tools that try to use that semantic information. In advanced speech browsers, for example, pages that use `i` when they mean `cite` will merely sound flat; but pages that use `em` when they mean `cite` will sound stupid, with every title stressed as if it were being emphasized. And Google’s definition collector is smart enough to tell when `b` is being used to mean `dt`, but (as far as I can tell) it assumes `strong` must be something other than a glossary item, so definition terms using `strong` won’t be included.
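
The difference can be seen side by side in the following sketch. The sentences are invented for illustration, and the comments restate the behavior described above rather than documenting any particular tool.

```html
<!-- Misleading: em claims the title is being stressed. -->
<p>Have you read <em>Guns, germs, and steel</em>?</p>

<!-- Neutral: i promises nothing beyond italic presentation. -->
<p>Have you read <i>Guns, germs, and steel</i>?</p>

<!-- Accurate: cite says what the text actually is. -->
<p>Have you read <cite>Guns, germs, and steel</cite>?</p>

<!-- Glossary term in strong: claims strong emphasis, not a definition. -->
<p><strong>Bronze Age</strong>: the period in which bronze replaced stone.</p>

<!-- Glossary term in b: makes no claim, so a tool is free to guess. -->
<p><b>Bronze Age</b>: the period in which bronze replaced stone.</p>
```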
So if you want to use bold or italics, and HTML doesn’t have a semantic element for what you mean, use `b` or `i`. If you’re not sure which semantic element to use, use `b` or `i`. And if you’re creating an authoring tool for people who won’t know or care about semantics, please leave the semantic markup alone, and just stick to `b` and `i`. Thank you.