When semantic markup goes bad

In HTML, there are semantic elements and presentational elements. Semantic elements (like a href, cite, code, em, p, samp, and strong) have specific meanings. Presentational elements (like br, font, tt, b, and i) do not; they only alter visual presentation.

Using semantic elements is good because software can interpret them and do useful things. For example:

So the more people use semantic HTML properly, the more we all benefit. But this benefit isn’t immediate, and it’s hard to explain. So in their eagerness to be good Web citizens, people sometimes use semantic markup when they shouldn’t.

Most commonly, this happens with bold and italics. HTML has presentational elements for these: b and i. It also has semantic elements for some of the things you might want presented in bold or italics.

Reason for using italics | Example | Semantic HTML element
--- | --- | ---
Emphasized text | I’m really annoyed. | em
Citing a work | Jared Diamond’s Guns, germs, and steel | cite
Mathematical variables | E = mc² | var
Terms defined in-line | In the Bronze Age, bronze replaced stone as the primary material for tools and weapons. | dfn
Taxonomical names | Homo sapiens sapiens | (none)
Text in other languages | She has a je ne sais quoi about her. | (none)

(The Safari browser also italicizes abbreviations, abbr and acronym. This is a bug in Safari, because hardly any designers or readers ever want them italicized.)
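
As a rough sketch, the rows of the italics table might be marked up like this. Exactly which words were italicized in each example is my assumption, since only the plain text of the examples is given.

```html
<!-- Italics that have a matching semantic element -->
<p>I’m <em>really</em> annoyed.</p>
<p>Jared Diamond’s <cite>Guns, germs, and steel</cite></p>
<p><var>E</var> = <var>m</var><var>c</var><sup>2</sup></p>
<p>In the <dfn>Bronze Age</dfn>, bronze replaced stone as the primary
   material for tools and weapons.</p>

<!-- Italics with no matching semantic element: fall back to i -->
<p><i>Homo sapiens sapiens</i></p>
<p>She has a <i>je ne sais quoi</i> about her.</p>
```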

Reason for using bold | Example | Semantic HTML element
--- | --- | ---
Strongly emphasized text | I’m not just annoyed, I’m furious! | strong
Terms defined in-line | In the Bronze Age, bronze replaced stone as the primary material for tools and weapons. | dfn
Headings | Introduction | h1, h2, h3, h4, h5, h6
Header cells in tables | Total | th
Vectors and vector spaces | The set of all vectors with two real-number components is R². | (none)

(dfn is rendered sometimes in italics, sometimes in bold. If you want to specify one or the other — or even something else, like colored text — you can do that in a style sheet; then if you change your mind later, you need only change the style sheet.)
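
Here is a minimal sketch of the bold cases, together with a style rule for dfn. The choice of bold for dfn, the placeholder table value, and which words carry the markup are all assumptions made for illustration.

```html
<style>
  /* Assumed preference: show defined terms in bold rather than italics.
     Changing your mind later means editing only this one rule. */
  dfn { font-style: normal; font-weight: bold; }
</style>

<h2>Introduction</h2>
<p>I’m not just annoyed, I’m <strong>furious</strong>!</p>
<p>In the <dfn>Bronze Age</dfn>, bronze replaced stone as the primary
   material for tools and weapons.</p>

<table>
  <!-- 42 is a placeholder value, not taken from the article -->
  <tr><th>Total</th><td>42</td></tr>
</table>

<!-- No semantic element exists for vector spaces, so b is the right choice -->
<p>The set of all vectors with two real-number components is
   <b>R</b><sup>2</sup>.</p>
```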

These aren’t exhaustive lists, but as you can see, some reasons for using bold and italics don’t have their own semantic HTML elements. This is why b and i exist. Nevertheless, some people try to shoe-horn all uses of bold and italics into HTML’s few semantic elements. I have been guilty of this myself — using cite for text in another language, for example, which wasn’t a citation at all. I should have been using i.
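
The mistake I describe looks something like this, using the example sentence from the italics table. The lang attribute is an optional addition of mine, not something the table calls for.

```html
<!-- Wrong: the phrase is not a citation, so cite misleads -->
<p>She has a <cite>je ne sais quoi</cite> about her.</p>

<!-- Better: presentational italics, since no semantic element fits -->
<p>She has a <i lang="fr">je ne sais quoi</i> about her.</p>
```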

More often, people switch from i to em for everything they want italicized, and from b to strong for everything they want in bold. Authoring software often does this — tools like the original Wiki, Wikipedia, and Markdown make it easy to use em and strong, and harder (or impossible) to use i, b, or any of the other italic or bold elements.

This is bad. It’s bad because using the wrong semantic element confuses tools trying to use that semantic information. In advanced speech browsers, for example, pages that use i when they mean cite will merely sound flat; but pages that use em when they mean cite will sound stupid. And Google’s definition collector is smart enough to tell when b is being used to mean dt, but (as far as I can tell) it assumes strong must be something other than a glossary item, so definition terms using strong won’t be included.
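
To make the contrast concrete, here is a sketch of three ways to mark up a book title, followed by a glossary entry written both ways. The glossary wording is adapted from the Bronze Age example purely for illustration, and how any particular speech browser or search engine treats each form is not something the sketch can confirm.

```html
<!-- Semantic and correct: a tool can tell this is a cited work -->
<p>Jared Diamond’s <cite>Guns, germs, and steel</cite></p>

<!-- Presentational: carries no meaning, but does not mislead -->
<p>Jared Diamond’s <i>Guns, germs, and steel</i></p>

<!-- Semantic but wrong: claims the title is being stressed -->
<p>Jared Diamond’s <em>Guns, germs, and steel</em></p>

<!-- A glossary entry where dt marks the term explicitly -->
<dl>
  <dt>Bronze Age</dt>
  <dd>The period in which bronze replaced stone as the primary
      material for tools and weapons.</dd>
</dl>

<!-- The same entry with b standing in for dt -->
<p><b>Bronze Age</b>: the period in which bronze replaced stone as the
   primary material for tools and weapons.</p>
```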

So if you want to use bold or italics, and HTML doesn’t have a semantic element for what you mean, use b or i. If you’re not sure which semantic element to use, use b or i. And if you’re creating an authoring tool for people who won’t know or care about semantics, please leave the semantic markup alone, and just stick to b and i. Thank you.
