A Roadmap to Standards

April 30, 2004

This afternoon I was asked by a friend what I would recommend to an old designer who wants to learn more about web standards, CSS, XML, and XHTML.

This is a perfect example of when an email response is better posted here for a wider audience (and Google). So here’s my answer: this is a comprehensive, informal, and somewhat long-winded roadmap for anyone who has heard about web standards, thinks they might want web standards, but doesn’t know where to start.

Stop! Before you do anything, the most important thing you can do for your learning process is accept that a) it’s going to take time, and b) you will be frustrated along the way.

But you’re not alone. Plenty of us who have taken the plunge into standards went through the same, and there is a growing body of work devoted to helping make your life easier. The old-timers had to figure out the hard way all the tricks and techniques we now take for granted; lucky folks who came in later (myself included) can benefit from their sweat and tears.

In the end, when your skill using standards-based design eclipses your skill using old-school table-based methods, you’ll look back and marvel at how much more sense it makes to lay out a page with CSS. Oh sure, the actual methods used might not make much sense (considering there are better options in the CSS2 spec than what we currently use) but we have a major browser manufacturer to blame for their lack of support.

Uh, I seem to have wandered. Sorry, that’s another thing you’ll have to get used to: unbridled hostility against Internet Explorer. It’ll take some experience to understand why, but you’ll know exactly what we mean in short order.

So. On to the practical information you can start using immediately. First, run and buy Designing With Web Standards. Don’t even think about it, just do it. Got it? Good, now don’t let it collect dust — read it. Everything I’m about to cover is detailed in DWWS. It’s equal parts manifesto (WHY would you want to do this?) and tutorial (HOW do you do this?). You need it.

Now. The first thing to do is get into an XHTML mindset. Whether you choose HTML 4.01 or XHTML 1.0 Strict (there are reasons to go with either; ignore them for now, and then ignore them even more, until you’re ready for some mind-numbing drudgery), it all begins with a DOCTYPE. Telling a browser which language your document is marked up with isn’t only a good idea, it prevents unwanted rendering glitches that will otherwise drive you crazy. Think of it like this: if I want to fly to Chicago, I have to tell my travel agent of choice where I want to go. Sure, it might be fun to take my chances and see where I end up, but it’s not exactly practical. Setting a DOCTYPE ensures I get to my destination of choice without unintended side-trips to Vienna.
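For reference, here is roughly what those two DOCTYPEs look like in practice. This is only a sketch in Python: the two declaration strings are the standard W3C ones for the Strict flavours, but the `page_skeleton` helper, its flavour keys, and the placeholder body text are made up for illustration (and the title is not escaped).

```python
# The public DOCTYPE declarations for the two Strict flavours mentioned above.
DOCTYPES = {
    "html4-strict": (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">'
    ),
    "xhtml1-strict": (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    ),
}


def page_skeleton(flavour: str, title: str) -> str:
    """Return a bare page whose very first line is the chosen DOCTYPE.

    Hypothetical helper for illustration; the title is inserted as-is.
    """
    if flavour == "xhtml1-strict":
        # XHTML documents also declare the XHTML namespace on the root element.
        html_open = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
    else:
        html_open = '<html lang="en">'
    return "\n".join([
        DOCTYPES[flavour],
        html_open,
        "<head><title>" + title + "</title></head>",
        "<body><p>Hello, standards.</p></body>",
        "</html>",
    ])


print(page_skeleton("xhtml1-strict", "My first page"))
```

Whichever flavour you pick, the DOCTYPE goes on the first line of the file, before the opening html element.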

Next goal: well-formed markup. This is pretty easy to grasp, actually. Always quote your attributes (e.g. <a href="link">, and <input type="text" />). Don’t improperly nest tags; close them in the reverse of the order you opened them. And if you open a tag, close it again; every tag, or element, requires an opening and a closing counterpart.

A quick diversion here: somewhere along the way, tags became ‘elements’. Same syntax, different theory. Call them what you will, the proper label is now element; maybe it always has been. I don’t know. No one ever explained this to me.

So anyway, each element needs to be closed properly. If you’re using HTML 4.01, this doesn’t include stand-alones like <br>, <hr>, and <input>. With XHTML, it does. There’s a hack-ish, but practical syntax that degrades gracefully in older browsers: just add a space and a slash to the end. <br> becomes <br />, for example.

Now there’s one more little confusing thing about XHTML attributes: they always need to have a value, even when that value doesn’t make sense. Example: <input type="radio" checked="checked" />. You can get away with just checked in HTML 4.01, but XHTML needs the redundancy.

Finally, XHTML requires all your element and attribute names to be in lowercase. HTML doesn’t differentiate, but XHTML uses XML syntax, which is case-sensitive. And the case ends up being lower.
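If you want a quick feel for these rules before going anywhere near a validator, a generic XML parser will complain about most of them. The sketch below assumes Python's standard-library XML parser as a stand-in; `is_well_formed` is a made-up helper name. It only tests XML well-formedness of a single-rooted fragment, so it knows nothing about the XHTML DTD and will happily accept consistently uppercase tags or invented elements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper).

    This catches unquoted or valueless attributes, bad nesting,
    unclosed elements, and mismatched case between open and close tags.
    """
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    '<p><a href="link">properly nested</a></p>',
    '<p><em>badly nested</p></em>',
    '<p>line one<br>line two</p>',
    '<p>line one<br />line two</p>',
    '<input type="radio" checked />',
    '<input type="radio" checked="checked" />',
    '<a href=link>unquoted attribute</a>',
    '<P>mismatched case</p>',
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

Treat it as a smoke test only; it is no substitute for validating against your DOCTYPE.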

That’s it for markup! You’re done! Take a breather, grab a cerveza, and chill out for the afternoon. Because that’s only step number one.

Now that we’ve got you writing proper HTML/XHTML, try running it through the W3C’s validator. If you’ve crossed your i’s and dotted your t’s (or in this case, quoted your attributes and nested your elements) you’ll achieve a nice yellow-on-blue success message. Learn to love this colour/text combination; it can be your best friend.

Why is validating important, or even relevant? Because poorly-written markup is completely unpredictable; you end up relying on a browser’s error-handling, and even though most browsers are very good about this, it’s a bad practice to rely on it. Hey, it’s what got us into the non-standard, proprietary browser wars in the first place. (Microsoft was able to compete with Netscape in the Netscape-dominated world of 1995 because IE was built to handle errors exactly the same as Netscape.)

The single point is, validating helps you spot errors in your code, which in turn ensures more consistent rendering of your page. The very first technique I try when debugging problematic layouts is validating my code. You should too.

Yeah, okay, it’s tough when you validate your first site and get back a list of 78 arcane errors. Unfortunately, though the validator helps, it’s not perfect. It’s run by volunteers who are doing their best, but sometimes the error messages are as helpful as Vogon poetry. The good news is that problems cascade, and if you can find one missing </p> tag, for example, a lot of the time you bump that number down to 24 errors without doing anything else. The short of it: it looks bad, but it’s often not.

Now that you’re validating, you’re following the letter of the law. But as with many real-world laws, at this point you’re adhering to only the rigid guidelines of the rules and missing the bigger picture about why those guidelines are there in the first place.

The next step is to take this well-formed markup structure you’ve built for your document, strip out the presentational attributes that have been deprecated in some of the more recent DOCTYPEs, and move the presentation into a completely separate file. This is the infamous separation of structure and presentation, and this is where CSS comes into the picture.

It’s like this: your text is content. Content is nice, but without any hints about the content’s structure (which includes things like spaces and headers and lists) you end up with a jumbled mess of text. Completely unusable. Structure is an extra layer which breaks down that messy text into logical groupings and organizes them in a way that conveys extra information about individual elements in that document. How that information looks may be implied by the structure (for example, you’ll most often find that a primary page heading will be larger than the body text) but it doesn’t dictate it any further.

That’s where presentation comes in. Presentation is the formatting cue that tells the primary page header to be red, italicized, and 150% of the body copy’s size. Presentation is an extra layer of information on top of a document’s structure that builds up the (non-visual) structure into something far more appealing to the eye. CSS is the presentational layer, and it can take a very simply marked-up document and turn it into something amazing. View the css Zen Garden for a live demonstration of this in action.

So what’s the best way to start separating your presentation out of your structure? Consider any HTML element or attribute that offers a visual cue to be an offending piece of legacy code. It’s time to start killing those bgcolors and <center> tags. Here’s a pop quiz:

In each of the following examples, which attributes and tags would you remove to eliminate all traces of presentational markup?

  1. <center><h1><font face="Verdana">This is my first web site.</font></h1></center>
  2. <table border="0" cellpadding="0" cellspacing="0">
  3. <body bgcolor="#ffffff" topmargin="0" leftmargin="0" marginwidth="0" marginheight="0">
  4. <td bgcolor="#ffffff" valign="top" align="center"><p>They're coming to take me away...</p></td>

Got your answers ready? Great, compare to the list below. These are proper structural elements without a trace of presentational formatting:

  1. <h1>This is my first web site.</h1>
  2. <table>
  3. <body>
  4. <td><p>They're coming to take me away...</p></td>

That’s it? That’s it.
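If you’d rather have a machine take the quiz, the same clean-up can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the tag and attribute lists are just the ones from the quiz above (far from exhaustive), and `strip_presentation` is a hypothetical helper built on the standard library’s HTML parser. It lowercases tag and attribute names and silently drops comments and DOCTYPE declarations, so treat it as a toy, not a conversion tool.

```python
from html import escape
from html.parser import HTMLParser

# Assumed, non-exhaustive lists taken from the quiz above.
PRESENTATIONAL_TAGS = {"center", "font"}
PRESENTATIONAL_ATTRS = {
    "align", "bgcolor", "border", "cellpadding", "cellspacing", "face",
    "leftmargin", "marginheight", "marginwidth", "topmargin", "valign",
}


class PresentationStripper(HTMLParser):
    """Rebuilds the markup it is fed, minus presentational tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def _open(self, tag, attrs, closer):
        # Keep only the attributes that are not on the presentational list.
        kept = [(k, v) for k, v in attrs if k not in PRESENTATIONAL_ATTRS]
        text = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in kept
        )
        self.parts.append(f"<{tag}{text}{closer}")

    def handle_starttag(self, tag, attrs):
        if tag not in PRESENTATIONAL_TAGS:
            self._open(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        if tag not in PRESENTATIONAL_TAGS:
            self._open(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag not in PRESENTATIONAL_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is kept; only the wrappers around it go away.
        self.parts.append(escape(data, quote=False))


def strip_presentation(markup: str) -> str:
    parser = PresentationStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_presentation(
    '<center><h1><font face="Verdana">This is my first web site.</font></h1></center>'
))
```

Run against the four quiz lines, it should give back the four answers. The real work, of course, is rebuilding that presentation in a style sheet afterwards.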

While it’s not explicitly stated in any spec, a further goal of this separation is to use the proper elements for the job. Using a table to lay out a page is then, by this definition, an improper use of a table. In the above example, it might have even been prudent to go further and remove the <table> and <td> elements, but it’s hard to say without surrounding context. Tables aren’t deprecated; they can still be quite useful. But they should be used properly: to contain data that’s structurally tabular.

So we’ve stripped the formatting from our page. Hooray. What now? Those are some ugly elements, all Times-New-Romaned and linear. Where’s the interest? Where are the compelling visuals we were promised?

Head back to the Zen Garden example. See the lovely designs? See how different they are? The key here is that underneath each is the same XHTML, just as bland as your currently-unformatted document. No really, it is.

Having a bland and ugly base is a good thing, in fact. What you may have noticed is that this unformatted HTML looks an awful lot like the web of 1994 (if you were around back then). In fact, with a few notable exceptions, that’s what it is. This stuff is as old as the web itself. <h2>’s have been around since the days of Mosaic. Which, as you may have guessed by now, will still handle a properly-structured page with surprising fidelity. Try saying that about a late-90’s table-fest.

The benefits don’t end there of course. Accessibility to those with special needs is almost free, search engine optimization is built-in, bandwidth costs go way down (along with development times), and on and on and on. Jeffrey Veen wrote up the Business Value of Web Standards late last year, and Roger Johansson expanded on the benefits and techniques of standards-based design with his recent Developing With Web Standards.

CSS is well-supported across all the major browsers today, and there are countless resources for learning the syntax, the basics of CSS layout, and the advanced theory behind high-impact CSS-based design. I’ll point you to a few of the better ones: WestCiv offers an ongoing, free CSS course that will help you get started and bring you up to speed in a hurry. Andrew Fernandez has indexed a huge listing of CSS resources that should help you no matter where your skill level is. Eric Meyer has written a bunch of books that you should have on your desk, including the project-based Eric Meyer on CSS and its follow-up More Eric Meyer on CSS (no, really, that’s what it’s called). His O’Reilly-published reference CSS: The Definitive Guide is in its second edition, and should be on your desk. Also check out Molly Holzschlag’s CSS: The Designer’s Edge, and Chris Schmitt’s Designing CSS Web Pages.

Going into the ins and outs of applying CSS and building layouts could take me many times the space I’ve already used up so far and some financial motivation. We’ll cut it short here, considering it’s taken a Herculean effort on your part to even get this far…

Getting plugged in is probably the single biggest piece of advice I can give anyone looking to get a start with web standards. Through ongoing reading and sharing of what you know, we all grow as a community. There are many of us active in the development community, and there are bound to be many times more coming on board over the next few years. We have a global communication network at our disposal; make sure to use it.

And if nothing else, we’re here to commiserate when you hit the wall. Hey, it happens to all of us.

Reader Comments

Bart says:
April 30, 01h

I started designing with XHTML & CSS about half a year ago. Everyone told me XHTML and CSS was so great I would never turn back to tables. I never believed them because tables were great but now I love CSS and haven’t used a table for layout in months. Once past the frustrating first months, it gets real easy.

April 30, 02h

Great article. I’d just add a couple more resources that have been invaluable in getting up to speed and staying current with all of the recent developments.

1. Get yourself a good RSS aggregator (FeedDemon, Shrook, etc.) and start subscribing to all of the various blogs you’re likely to encounter during your research. I’ve found this to be the single best way to stay current with the many developments that are taking place in the standards community.

2. Take a look at O’Reilly and Associates Safari service. Many of the books mentioned in Dave’s article are available online. It’s a great way to affordably sample a large number of books on the various technologies you’ll be researching.

April 30, 03h

Great work, very useful to have this around.

I suggest putting this up on the Essentials page here, so that everyone can easily find this handy-dandy resource / introduction to moving towards proper, standards-aware webdevelopment.

Yannick says:
April 30, 03h

I must say, well-written article, and it contains lots of info. That is exactly what I was looking for. I too want to take the plunge into web standards design and so this was definitely a worthwhile read and helpful. Thank you very much.

scrapmonkey says:
April 30, 04h

great little article. thanks. but this tangent caught my attention:

“A quick diversion here: somewhere along the way, tags became ‘elements’. Same syntax, different theory. Call them what you will, the proper label is now element; maybe it always has been. I don’t know. No one ever explained this to me.”

…am i the only one who is concerned about this? the proper way to refer to things just seems to change every now and then, and even though no one seems to know why, or even how, they are quick to make sure everyone else does it.

there almost seems to be a subtle strain leaning towards groupthink in the standards community which could prove to be a liability in the future, not to mention simply unfortunate for all the reasons that groupthink anywhere is unfortunate :)

April 30, 05h

HTML has used the term “element” since it was first formalised as an SGML application in HTML 2.0, and probably before that. The proper term has been “element” for at least the past nine years.

It seems to me that the proper terms are coming into more widespread use as people actually start paying attention to the specifications. The past few years seem to have started a shift away from the “bung a load of code together, and if it works in Both Browsers™, sell it” attitude.

Georg says:
April 30, 05h

Great article…

BTW: HTML Tidy is still a good tool, and gives you more feedback than the validator.

Denis says:
April 30, 08h

Great job! Thanks a lot for summarizing the WS process so well.

April 30, 09h

This is really well put Dave, an excellent resource for all beginners. I especially like the bit about Vogon poetry, kind of drew it all home for me ;->

Andrew says:
April 30, 09h

Thanks Dave, I’ve been trying to explain this sort of thing to people for ages, and have always made a bit of a mess of it. What with the Zen Garden, which has always been a fantastic resource for showing what can be done with CSS, this has just made my life a whole lot easier!

April 30, 09h

All roads start with Zeldman.

web says:
April 30, 09h

This is a great resource for beginners, I have a few friends that have started this journey this month. They just received an email link.

Thanks Dave, U da man!

Manuzhai says:
April 30, 09h

Forwarded it to a friend who was asking about this.

April 30, 10h

That’s a really good explanation, Dave. A couple of comments:

Tags vs elements vs element types:

<p> is an element type.

A particular <p>…</p> section of a particular document is an element.

The tags are the actual delimiters; it’s rarely useful to talk about them, people usually mean “element” or “element type” when they talk about “tags”.

Think of the relationship between “element type” and “element” being the same as the relationship between “class” and “instance”, and “tags” being the equivalent of curly braces.

XHTML attribute redundancy:

I believe you need things like ‘checked=”checked”’, because in HTML, a bare ‘checked’ is actually the attribute value, not the attribute name. It would make more sense to have something like ‘checked=”true”’, but that would void the HTML compatibility.

Sean King says:
April 30, 10h

Thank you for this entry! I’ve been working on an outline to teach Standards to my co-workers, and this will definitely help.

I’ve also found that Paul Scrivens’ Resources page on CSS Vault has a bunch of great links, and Roger Johansson has put together ver comprehensive

April 30, 10h

comprehensive and succinct. not bad at all…

Sean King says:
April 30, 10h

Sorry, I didn’t read your comment policy before putting in the links. Here they are again:

Roger Johansson “Developing With Web Standards” -

CSS Vault Resource page -

Mike P. says:
April 30, 10h

‘you will be frustrated along the way’

For sure, but it’s worth it. In the end, things make sense. Most of the time ;-]

Great write up Dave.

April 30, 10h

Great standards intro. But if you’ve crossed your i’s and dotted your t’s you better look into other, more basic training.

April 30, 10h

Thanks Dave. I guess at this point I can say that you can teach an old designer new tricks. (And you should!)

I appreciate all the info. I have ordered Zeldman’s book, and it will be here soon. It’s going to be a lot of fun/pain/anguish.

I guess I’m a sucker for punishment.

Thanks again!

Craig says:
April 30, 11h

Isn’t the correct answer to #2 to remove the entire tag? ;)

theresa says:
April 30, 11h

I think this really needed to be written. The best part is that it’s in plain english, so that anyone can understand it. Great job.

April 30, 11h

Tags vs elements:

<p> and </p> are tags.

<p>Some text</p> is an element.

In HTML some elements (in some circumstances) may omit their closing tags.

In X(HT)ML, all elements must have closing tags, though empty elements can have abbreviated “short” tags: <p></p> is equivalent to <p />. Note that this abbreviation is, strictly speaking, not legal HTML. In HTML, the closing tag is often optional (and hence can just be omitted), but it can’t be abbreviated as above.

“Whether you choose HTML 4.01 or XHTML 1.0 Strict (there are reasons to go with either; ignore them for now, and then ignore them even more, until you’re ready for some mind-numbing drudgery)”

Aside from the bit about short tags, your advice holds equally well for both HTML 4 and XHTML 1.0. But once you tell people to write <img /> instead of simply <img>, you’d better tell them to slap an XHTML DOCTYPE on that puppy.

April 30, 11h

N.b.: I’m not advocating that they *should* use XHTML. Rather, whichever DOCTYPE they choose, they should adhere to the syntax of that specification.

If they’re authoring HTML 4, they should write <img> rather than <img /> and conversely if they are authoring XHTML.

April 30, 11h

Great write-up. Now we need two more articles from you. 8^)

The first is a clear, concise introductory explanation of the box model for CSS formatting rules. The CSS and XHTML specs do a woeful job here.

The second is a simple introduction to where the main gotchas are with default attribute values. For example, which element types are block and which are inline by default? Which elements by default need to be contained within a <p> or <div> tag to be standards compliant without changing the display attribute in the CSS for the element? Which elements have attributes that you must clear in the CSS in order to get the desired result?

These two things created issues that tripped me up the most when switching to standards compliance.

April 30, 12h

I would mention Layout-o-Matic, as this gives a good start to CSS layout.

I would also have mentioned List-o-Matic, which is based on Listamatic.

These are both really good tools to let people try stuff quickly and have a starting point to play around with CSS.

May 01, 01h

Dave S. wrote:

“I purposely glossed over the MIME type issue because it’s completely moot in 2004. This isn’t the time or place to argue the merits of HTML over XHTML.”

I just brought it up so that people who want to serve their XHTML with the proper MIME type can go research more themselves. It was meant more as a note than anything else.

Besides, you can use server side magic to serve it as text/html to legacy UAs and application/xhtml+xml to the rest.

May 01, 03h

It should be noted that as of right now, XHTML should be avoided and you should stick to HTML 4.01 Strict unless you have a real need for XHTML (like MathML). XHTML doesn’t provide any advantages over HTML (as far as I’ve seen) besides the aforementioned ability to include other XML namespaces in it.

In addition, XHTML should be sent as application/xhtml+xml, which Everyone’s Favorite Browser chokes on.

In short, don’t use XHTML unless you grok it.

May 01, 09h

Re: XHTML vs HTML. According to XHTML 1.0 MAY be served as text/html, XHTML 1.1 SHOULD NOT, unless you have a very good reason. It isn’t MUST NOT, which means something quite different.

I serve my XHTML 1.1 as text/html. My very good reason is that Everybody’s Favourite Browser would otherwise choke on it :-)

I prefer XHTML 1.1 as it is the most strict flavour - which helps to keep me on the straight and narrow! When I was converting to a CSS layout, using XHTML 1.1 helped me pick out most of the layout stuff in the HTML so I could put it where it belonged - in the style sheet.

Dave S. says:
May 01, 09h

Um, hello?

“(there are reasons to go with either; ignore them for now, and then ignore them even more, until you’re ready for some mind-numbing drudgery)”

I purposely glossed over the MIME type issue because it’s completely moot in 2004. This isn’t the time or place to argue the merits of HTML over XHTML.

Let it go.

May 01, 10h

“I purposely glossed over the MIME type issue because it’s completely moot in 2004. This isn’t the time or place to argue the merits of HTML over XHTML.”

It’s not moot, it’s simply irrelevant to what *most* people want to do with (X)HTML. Those who care about the *technical* benefits of using XHTML care about MIME-type issues.

For the other 99% of the web-authoring population, MIME-types are irrelevant and the only reason to prefer XHTML over HTML is that the former has a simpler syntax, presumably making it easier to learn to author *correctly*.

However, as Evan Goer discovered, even that is ridiculously hard for most people.

Michael Williams says:
May 01, 10h

Hi Dave,

thanks for this. I was about to write a similar article with a slightly different audience in mind. In fact, reading your article, I think it’s suitable for both the table-jockey and the web virgin, so I probably won’t bother! I’ll paste the foreword here though, so you can see what I had in mind. Please don’t interpret any of this as a dig at your fantastic roadmap, which I hadn’t read when I wrote this! I’d be interested in any comments.


Imagine this is a school textbook. If you’re the kind of person who didn’t read prefaces in textbooks, don’t read this.

This article is written for an audience free of the emotional baggage of table-jockeys and the histrionics of gurus; a virginal audience that just wants to learn how to design modern web sites. This is me, one year ago.

It neglects to mention that there was ever a way of doing things other than the accessible, usable, semantically-meaningful, valid way striven for today. There are no snide references to broken browsers (except when the extant version is still broken). It doesn’t impeach the reader for sins they never committed.

One year ago, all these things elicited from me were, “Huh?”

It avoids these history lessons to provide a well-marked path of least resistance to modern web design for the untainted but computer-literate. If you initially take this path but wander off into the pretty woodland and dangerous swamps then your mileage may vary. But if, like me, you get off on acquiring useless knowledge then you may have more fun.

For me, one year ago, the many references to not doing it like we used to, or to educating clients stuck in 1997, or to doing the Right Thing were strictly *unnecessary*. It is *possible* to teach the Only Thing.

As an example, mentioning that tables were ever used for layout to someone who didn’t previously know this serves three purposes, one good, and two bad:

- It’s a useful counter-example when explaining semantic design
- It buries a seed that table layout is quick and easy (which is the oft-cited response of the Platonic antagonist the writer is attempting to convert).
- It takes up time that is initially better spent learning the basics.

I’m not suggesting the bad old days should be forgotten or hidden. Security through obscurity does not work. I’m not saying there isn’t a very large audience that needs and wants to be *re*-educated.

I’m merely trying to provide a quick and easy route to our way of doing things for web virgins, lest they do it another way, or perhaps even worse, they don’t bother at all.

Ruth says:
May 01, 10h

Fantastic intro to web standards. I wish I had read something like this a few years ago; it would have saved me a lot of time and frustration. Thanks for this post.

[offtopic] “Vogon poetry!” I can’t get this grin off my face. I wonder how many of your readers have read Douglas Adams’s increasingly inaccurately named Hitchhiker’s trilogy? [/offtopic]

Dave S. says:
May 01, 10h

Point taken, wrong word used. This article is for the 99%, so we’ll go with “irrelevant”.

May 01, 12h


So, let me expand on my other point: XHTML is easier for people to learn than HTML.

Consider the following three elements:
a) <img>
b) <img></img>
c) <img />
Which is valid HTML? Which is valid XHTML? And how can you remember the answer?

In XHTML, the rule is simple: every element has a start tag and an end tag. Empty elements can be abbreviated with short tags. So b) and c) are valid XHTML, but a) is invalid.

In HTML, some elements have mandatory start and end tags. Some elements (in some circumstances) have optional start/end tags. For some elements, the end tag is actually *forbidden*.

The image tag is one of the latter. So a) is valid HTML, and b) is invalid. [c) is never valid HTML. That one, at least, is easy.]

I’ve made this sound a bit worse than it actually is. The elements for which the end tag is FORBIDDEN are, to the best of my knowledge, just those elements declared EMPTY in the HTML Spec.

But, for God’s sake, poring over the Spec should not be a prerequisite for learning (X)HTML. The syntax of (X)HTML is complicated enough [The <blockquote> element contains block-level content. The <q> element contains inline content. Geez! Which elements are block-level? What’s inline content? …] Complicated rules for start and end tags don’t help matters.

May 02, 08h

That was a great intro, and those articles that you linked to will be a great help to sell web standards to others. I’ve been working with web standards for a couple months now, and it can be frustrating, but once you get the hang of it, and get used to closing tags, quoting attributes etc… it comes pretty quick and easily.

For people who think it is too hard, or takes longer to code… it really isn’t once you get the hang of it… plus it looks so much prettier when you view the source of your page and it is all nicely formed as opposed to the blob of text that tables and dynamic sites can generate.

May 02, 09h

Oh yeah, the other thing that is great about using web standards, is that you get used to not using browser-specific tags, and it is much much easier to create cross-browser compatible sites…. now IE is the real pain in the ***…. it would be nice if Microsoft added some standards support to IE 6 with XP SP2…. but it probably isn’t too likely… so I think we’ll be dealing with it for a while to come.

Jimmy C. says:
May 02, 09h

Jacques Distler,

The XML flavor of HTML has many benefits over SGML-ized HTML. I’m currently writing some JavaScripts* which fix IE’s CSS handling to some extent… but it’s necessary to use some XML features in the script**. That’s impossible, by definition, in SGML.

Furthermore, XHTML is more portable beyond the web because it is “just XML”: XSL (FOoT) style sheets can be used to make PDF, Docbook, or other XHTML web pages. Several of the high-quality and cheaper (than SGML) XML editors can be used to edit and store those documents. And there are no document structure ambiguities (unlike HTML which could have no html, head, or closing tags among others), which makes XHTML easier to parse in the future (when you have to deal with legacy documents in XHTML 1.0, and transform them to XHTML 10.01A, etcetera).

Jimmy C.

* (Inspired and partially stolen from Thank you Dean.)

** (If you must know, I have to add custom elements via xml namespaces - it breaks validity, but that’s not as important for JavaScript-generated content as it is for human-generated content.)

May 02, 10h

This is a tremendous article, and addresses some of the issues that I’ve been debating on a recent project of mine, in which I’m trying to work out a basic standards-compliant “template” of sorts.

Standards should be fundamental to web development. I’ve been heartened by the noise made in the past couple of years about standards-compliant design. Generally no longer seen as an obscure arguing-point by fringe elitists, standards are becoming an accepted part of the web development scene. CSS-based design has achieved a firm and permanent position alongside table- and graphic-based design – and one is slowly outpacing the other.

May 02, 11h

“but it’s necessary to use some XML features in the script**. That’s impossible, by definition, in SGML.”

I’m not sure what you mean, as XML is a strict subset of SGML.

Furthermore, unless you serve your document with an XML MIME-Type (application/xhtml+xml), then it will be processed by the browser’s SGML parser in *all* current browsers.

“Furthermore, XHTML is more portable beyond the web because it is ‘just XML’: XSL (FOoT) style sheets can used to make PDF, Docbook, or other XHTML web pages. Several of the high-quality and cheaper (than SGML) XML editors can be used to edit and store those documents.”

This is an important point. Just because it is *possible* to do everything with SGML that you can do with XML, doesn’t mean there will be off-the-shelf tools for doing so.

Just as the simpler syntax of XML makes it easier for humans to learn (IMO), it has also become more popular to write tools for.

Thus there are many more technologies and automated tools for manipulating XML documents than SGML documents. In the long run, that certainly makes it a better platform (unless you are willing to write your own tools, or pay through the nose for someone else to write them).

That’s a powerful argument, but not one that is relevant to the “99%” Dave was aiming at. As a rule, the people who care about namespaces and other advanced features of XHTML are the same ones who care about MIME-Types, etc. — the sort of esoterica that Dave explicitly wanted to sidestep.

“And there are no document structure ambiguities (unlike HTML which could have no html, head, or closing tags among others)…”

These are *not* document structure ambiguities. The rules of SGML syntax are more complicated than XML, but they are not ambiguous.

See, e.g.,

for a discussion inspired by this comment thread.

Jon Hicks says:
May 03, 02h

At last! Something to redirect all those people emailing me for an answer to “How?”. You’ve saved me a lot of time…

May 03, 02h

Have you seen this post? Robert Scoble will pass comments on to the IE team!!!!

Gotta be worth organising some rational postings there and maybe, just maybe, stuff will happen!

May 03, 06h

“Robert Scoble will pass comments on to the IE team!!!!”

And what, the IE team is sequestered in a deep underground bunker with no access to the Internet?

I mean, the bugs in IE are well-known. Entire cottage industries are devoted to developing workarounds for IE’s broken CSS support, IE’s lack of support for PNG transparency, etc.

The IE team is *surely* already aware of the serious bugs in their browser. They’d have to be either brain-dead, or cut off from the Internet to be unaware.

I don’t doubt Scoble’s sincerity, but this is a little like those non-functioning thermostats they install in individual offices in large office buildings to give the occupants an illusory sense of control over their environment.

Trent says:
May 03, 08h

Re: Jacques [43]

Personally, I can’t even get excited about the IE team “fixing” bugs. So what if they do? Is that going to change our coding habits? Probably not. IE6 is going to be the de facto browser for YEARS. The way I see it, IE7 will just be another browser that needs to be tested (although I don’t know *how* I will do that, since I don’t plan on upgrading…)

My assumption is that we had better get used to the current browsers, because this is the way it is going to be for a while. Firebird 3 and Opera 15 might be fantastic, but I still fear we will have to deal with IE6 in a decade. Heck, many organizations that should know better still have Netscape 4.7 installed.

May 04, 01h

Like many other posters (we’re all up on a wall somewhere?!), I recently discovered the thrill of standards compliance. Now I wonder, how can we get more web developers to abandon the non-compliant practices that are so prevalent?

brad says:
May 04, 06h

Thanks for the summary, Dave. I am now adhering as best i can to standards, using xhtml 1.0 trans and an external stylesheet. my sites validate … for now. but i have only recently begun to see references to MIME types. can someone point me to a reference about what a MIME type is and why it is important? if i don’t consider / incorporate MIME types now, am i missing something? thanks in advance.

May 04, 07h


and references therein. The summary table

tells you the way things SHOULD be done. Ian Hickson’s famous, if somewhat misleading, article

explains *why*. (The bottom line is that, if you serve XHTML as text/html, *even* if you validate everything, your site may still break horribly if you were to switch to application/xhtml+xml. The only way to ensure your site will work when served with the correct MIME type is to actually serve it that way.)

Right now, Gecko-based browsers (Mozilla, Firefox, …) and IE6 with the MathPlayer 2.0 plugin installed can handle application/xhtml+xml. Most everyone else should still get text/html.
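
One common way to act on this is to look at the Accept header the browser sends and only hand out application/xhtml+xml to clients that explicitly ask for it. What follows is a rough sketch in Python, not a definitive implementation: `choose_content_type` is a made-up helper name, and it assumes your server-side code can read the raw Accept header. It deliberately ignores the finer points of q-value ranking and treats a bare `*/*` as "send text/html".

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from a raw Accept header.

    Hypothetical helper: returns application/xhtml+xml only when the
    client lists it explicitly with a non-zero q-value, else text/html.
    """
    for item in accept_header.split(","):
        # Each item looks like "type/subtype" or "type/subtype;q=0.9".
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "application/xhtml+xml" and q > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are not treated as a request for XHTML.
    return "text/html"
```

The returned string would go into the Content-Type response header. Remember the caveat above: a page only counts as tested once it has actually been served as application/xhtml+xml to a browser that parses it as XML.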

May 04, 07h

Great article, I shall be pointing a few colleagues in this direction. I have one little comment to make: I thought this line in paragraph 9 was a little confusing: “close [tags] in the order you opened them”.

This should read “close [tags] in the REVERSE order you opened them”. If you closed your tags in the order you opened them, you may end up with something like this:

<p><b><i>Hello World!</p></b></i>
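
The "reverse order" rule is really just a stack: every closing tag has to match the most recently opened tag that is still open. Here is a small sketch in Python that checks this using the standard library's `html.parser`. The names `NestingChecker` and `nesting_errors` are invented for illustration, and the check is deliberately naive: it assumes XHTML-style markup where every element is closed, so an unclosed `<br>` would throw it off, while a self-closed `<br />` is fine.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record closing tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are still open
        self.errors = []     # (closing tag seen, tag that was expected) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            # Correct: this closes the innermost open element.
            self.open_tags.pop()
        else:
            # Wrong order: note what should have been closed first.
            expected = self.open_tags[-1] if self.open_tags else None
            self.errors.append((tag, expected))


def nesting_errors(markup):
    """Return the list of mis-nested closing tags found in markup."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors
```

Fed the snippet above, `nesting_errors` reports `</p>` and `</b>` as arriving while `<i>` is still open. The properly nested version, `<p><b><i>Hello World!</i></b></p>`, produces an empty list.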

brad says:
May 04, 08h

thanks, jacques. very helpful.

k. says:
May 05, 06h

*how can we get more web developers to abandon the non-compliant practices that are so prevalent?*

Good freakin’ question. What do we do with a web designer like my boss who says things like “Don’t reinvent the wheel”? You know what? If no one ever reinvented the wheel, we’d all be driving around in Flintstones cars.