Weblog Entry

Semantics and Bad Code

August 26, 2003

Jason Kottke has written a thoughtful piece on the difference between standards and semantics, and if you’re just learning this stuff by way of my site (heaven forbid) or some of the many others out there, it’s worth your time to read it and consider his points.

Let me start out by saying that Jason is absolutely correct. You are allowed to use <font> tags in your valid XHTML. You are allowed to continue designing with tables. As long as you lowercase your tags and watch your nesting, there is nothing stopping you from building your XHTML sites without a lick of CSS.

But if this is all you’re doing, you are like the driver of the VW Beetle carefully pulling to a stop at the light, failing to move for the Mack truck barrelling toward him at 120km/h. Sure he’s following the letter of the law, but the light was put there for his safety. He’s completely missing the spirit in which it was written.

The W3C has a commitment to backwards compatibility. XHTML is meant to work back to 4.x and older browsers. (Which is debatable given the 2.0 issue, but please, let’s focus.) This requires a certain amount of support for deprecated elements. Tables will always be with us, because they were created for tabular data; you will be able to author table-based sites for years to come. But stay with me now, here’s where it gets tricky.

XHTML is a clean slate: it’s HTML, but it’s XML. It’s the best of both worlds, but it’s halfway between. HTML has a long and ugly history of visual hacks, proprietary extensions, and general abuse. The browsers that support it are lax, overly tolerant, and encourage sloppy coding. XML, on the other hand, is defined by a tight set of rules; it is structurally complex, rigidly demanding, and doesn’t offer any presentation cues whatsoever. It is completely reliant on external styling, and in this case that styling happens to be CSS.

The move from HTML to XML requires a huge shift in developer mindset. There are a lot of obstacles to overcome yet, not the least of which is solid browser support. We’ve only started down the road, and XHTML is how we’ll get there. Here’s the path, in 3 steps:

  1. XHTML allows for easy introduction to proper coding methods by weeding out bad techniques, while requiring no more than a simple change in syntax. Drop your tags’ case, watch your nesting, and close everything. Easy. The first step is one which most are willing to make because it requires the least.
  2. Once developers latch on to this, they will realize sooner or later that the next step is to move toward a separation of content and presentation. This is where most are stuck today, and until the baseline browser moves up a generation from 5 to 6 (or 7), there will generally be dissent and lack of cooperation. Some are aware of this step, some aren’t, but it’s the next big paradigm shift and it has already started happening en masse.
  3. After making the prior transition, the next step is full-on XML rendering, which is going to be a real doozy. Validating only your home page and allowing lax code on other less-prominent pages is a practice that will have to stop. What happens when the switch is flicked between HTML rendering (which, chances are, is how your site is being parsed right now) and XML is that if you have an error in your code, just the slightest little bug, your site won’t render at all. You will see a happy little error message, and that will be that until you fix it. Seems a bit harsh doesn’t it? Well, consider that programmers have been dealing with compile errors from sloppy code for decades; welcome to their world. Not only does your own code have to pass muster, but code from third parties and external sources must play ball too. This one will take years, folks.
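The difference between the two parsing modes in step 3 can be sketched in a few lines. This is only an illustration, assuming Python's standard library as a stand-in for a browser: `xml.etree.ElementTree` plays the strict XML parser and `html.parser` the forgiving HTML one. The sample markup and the helper names (`parses_as_xml`, `tags_seen_as_html`) are invented for the example; no real browser is implemented this way.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample markup: the second snippet closes its tags in the wrong order.
GOOD = "<p>Hello <em>world</em></p>"
BAD = "<p>Hello <em>world</p></em>"


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # An XML parser gives up at the first well-formedness error.
        return False


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_as_html(markup):
    """Feed markup to the tolerant HTML parser and list the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(GOOD))     # well-formed, so the XML parser accepts it
    print(parses_as_xml(BAD))      # mis-nested, so the XML parser rejects it
    print(tags_seen_as_html(BAD))  # the HTML parser carries on regardless
```

The mis-nested snippet is rejected outright by the XML parser, while the HTML parser reports both start tags and keeps going. That is the behaviour gap the "switch" refers to.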

Standards, as Jeffrey Zeldman notes in the highly recommended Designing With Web Standards, are a continuum. You’ve gotta get on the bus sooner or later, but once you do there are a lot of stops along the way. You’ve made the transition to XHTML? Great! How’s your separation of content and presentation coming along?

Don’t stop with the easy stuff; this is one rabbit hole that goes real deep.

You’ve got help along the way - hey, I’m just finding out about this stuff myself. I don’t have all the answers, but the more I read the further I go into Wonderland. There are a lot of very knowledgeable authors publishing on the subject, and as long as you keep learning, sooner or later we’ll all get there.

Reader Comments

Eric says:
August 27, 01h

Lots of good comments here, but I think you have missed the point that XHTML is not “HTML in XML.” XHTML as it is now is simply the foundation of the next generation of web pages. Ones with embedded SVG, audio links, meta information, etc. The W3C is just testing the waters, getting people to think in XML, letting browsers catch up, and then all these good things will come.

August 27, 02h

Well said. It’s going to take a long time but we’ll get there. Of course the more people who are convinced to use a safe, stable and up to date browser the better.
I’m off to convert some more, have fun.

August 27, 03h

“What happens when the switch is flicked between HTML rendering and XML”

i would posit that, as long as there are any major (read: generating income and/or owned by large corporations etc) sites out there which do not use clean xhtml, there will not be a “switch” to anything. even after the majority of the web has transitioned to xhtml, there will still be backwards compatibility requirements.
what i could imagine happening, however, is the creation of completely new browsing paradigms, new software that may or may not look like a traditional browser, which leverages the strengths of xhtml (with all the rich possibilities on the horizon with xforms and co.). or perhaps hybrid applications which switch from modular xhtml/xml to “quirks mode” style html (not just in their rendering, but in the whole way a document is treated and processed).
yes, it will be a long process, requiring years…and by the time we think we’re done the standards will - as you touched on - already have evolved into something completely different.
however, the benefit of moving to clean xhtml/xml will be that then, hopefully, any valid documents will be easily (?) convertible programmatically to whatever the new standard may be. heck, you can already go a long way with XSLT today…
that’s where the strength of xhtml lies, in my opinion…it makes content far easier to repurpose and transform compared to traditional tag-soup.
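As a rough illustration of that repurposing point: once a page is well-formed XHTML, any generic XML tool can pull structure out of it. The sketch below is not XSLT; it assumes Python's standard `xml.etree.ElementTree` instead, and the sample document and the `outline` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page; any well-formed XHTML document would do.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <h1>Semantics</h1>
    <p>Some text.</p>
    <h2>Standards</h2>
  </body>
</html>"""


def outline(xhtml):
    """Return (level, text) pairs for every h1-h6 element, in document order."""
    root = ET.fromstring(xhtml)
    prefix = "{" + XHTML_NS + "}"
    result = []
    for element in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if not element.tag.startswith(prefix):
            continue
        local = element.tag[len(prefix):]
        if len(local) == 2 and local[0] == "h" and local[1] in "123456":
            text = "".join(element.itertext()).strip()
            result.append((int(local[1]), text))
    return result


if __name__ == "__main__":
    for level, text in outline(DOC):
        print("  " * (level - 1) + text)
```

The same call on tag-soup HTML would fail at `ET.fromstring`, which is exactly why clean markup is easier to transform.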

August 27, 03h

This article explains the principles behind XHTML perfectly. This really is one everyone interested in standards should read.

What are your thoughts on full-on XML rendering though? The advantage of the web is that any man, woman or child can make a webpage. The ease of HTML (and XHTML) is the reason for this freedom. Will the introduction of XML reduce accessibility to the Web in terms of writing pages? As you said, any error will make the page unrenderable. This obviously makes it tougher on the casual designer.
Are we to rely on the implementation of XML web editing tools? FrontPage-Xml? I know I’m looking far into the future here…
HTML is flexible. More flexible than it should be, but it is this flexibility which has allowed it to grow at the enormous rate it has over the last decade. The internet never would have expanded as it has if the code required was as strict as C++ or Java.

And what is to happen to the reams of information that have been put in place in badly rendered sites over the history of the net? The fact that the site has been inexpertly created does not mean that its information is not valuable.

Just some thoughts…

M says:
August 27, 03h

Why not just write clean HTML, with all tags lowercase, nested properly, and closed, and content separate from presentation? Then you don’t have to deal with the mess that is Appendix C of XHTML1.0…

Dave S. says:
August 27, 04h

Eric - that’s the ‘Why’, as in, ‘Why are we moving to XML anyway?’ There are benefits for doing it, but so far it’s gone unexplained why shifting to an XML-based web is desirable. I guess it’s assumed to be obvious; thanks for pointing out that it’s not.

anon says:
August 27, 06h

I think the idea of css and xml is great, but web designers should know that there are things which presently are a real pain or impossible with css, which are easy with tables. To not use tables is to limit yourself with web design. Certainly you know this, and css proponents know this, but encourage people to use css exclusively. Fact is designing now is no less of a big hack than it was several years ago, if you want to go beyond columns of text. css is a great additional tool is all, which makes doing borders, padding, and margins easier, and formatting text and backgrounds easier. People should not be given the impression otherwise. If you read a lot of css books and material people are encouraged to go all css. That is a bad idea if you are a designer, imo. Maybe I’ll come out with a book on how to convert an all-css site to tables…
In other words a mix of tables and css is the best for me now, but I had to figure that out after trying to go all css, after reading all the religious arguments. People look for a black and white solution I guess. That would be nice.

Jai says:
August 27, 06h

I’m inclined to think that this shift will not negate HTML, nor will it make the “next generation” browsers stop rendering badly coded sites. What I do see, though, is the possibility that this will cause a much needed casm (in my opinion) between the “real deal” web designers/developers and the casual “I want to show grandma pictures of her grandkids” designer/“developer”. Right now, employers often can’t tell a coding “rocket scientist” from a coding “kid with a chemistry set”. This shift will make a nice clean separation.

XML- either you code right, or you go home.

HTML - <sarcasm>you need to code this? I thought that’s what Word–> save as HTML was for! </sarcasm>

free says:
August 27, 06h

there’s much more to the subject than you guys are hitting here. first of all, dave, a few things you say are flat inaccurate:
1) “You are allowed to use <font> tags in your valid XHTML” - not true in STRICT XHTML DTD…which is what a developer should probably try to achieve if they’re writing XHTML (unless they have to use TRANSITIONAL DTD)
2) “if you have an error in your code, just the slightest little bug, your site won’t render at all.” That was true years ago - and theoretically that’s how a standards-compliant browser is _supposed_ to react to invalid XHTML. But in practice modern browsers are still parsing and rendering bad XHTML (I think they are just falling back on HTML parsing)…
3) “[XHTML is] the best of both worlds, but it’s halfway between.” No, it’s not “halfway between”…XHTML is XML through and through. Its resemblance to HTML is there only so the community could hit the ground running with this new dialect…

And while it’s true the W3C is committed to backwards-compatibility, that is a secondary goal with respect to XHTML. In fact, the XHTML spec is more concerned with forwards-compatibility since it looks towards devices like mobile phones, or bluetooth enabled gadgets - and attempts to build a spec those cutting-edge browsers can utilize.

And I can speak from experience, if you code an XHTML/CSS site for standards-compliant browsers - you will have a big headache trying to make the site render correctly in Netscape 4.X. That browser sucks (=chock-full of bugs) and should be buried by the community (AOL did why can’t we?). For example, Netscape 4.X often drops CSS for no apparent reason. Or take the proprietary LAYER tag for example, and the lack of support for SPAN/DIV elements (which are crucial for heavy CSS layouts). Netscape 4.X just doesn’t cut the standards-based muster and should be completely abandoned - to the chagrin of all 486 users - and to the delight of web developers across the globe.

anyway coding in XHTML/CSS requires a new approach to web site development. you’ll find that as you code XHTML you will learn new tags that you never knew existed. you’ll learn principles about HTML, such as “block” and “inline” elements that have always been there - but you never knew about. regardless of what J Kottke says - you’ll learn about making semantically correct code as well as syntactically correct code. this will make the content on your site(s) more accessible, more portable to other devices, and easier to distribute to other publishers. On top of it the QA cycle for browser compatibility will be greatly reduced - and what developer wouldn’t want that!

August 27, 06h

The world of XHTML 1.0/1.1 is still unclear. Even (almost all) the X-Philes are still violating the specification (of course a lot less the i.e. But at least XHTML documents are convertible, with XHTML1.1 or XHTML Basic you can use modules and switch to XFrames/XForms easily. Or if you are some math/physics guy you could code MathML. This isn’t possible with HTML, not that easy and backwards/forward compatible and above all accessible.

Owen says:
August 27, 06h

I can’t agree completely that XHTML is a “clean slate”. It is just a reformulation of HTML using XML’s rigour. But that’s fine, I think, and it will form the basis of the Web for many years to come, regardless of other XML-related developments. Browsers will continue to support HTML as well as add and improve support for newer, shinier Web technologies.

There is a need, though, to keep the notions of markup validation and robust document structure together in the minds of new (and not so new) Web designers, not as ends in themselves nor as sticks to beat each other over the head with, but as methods of ensuring that content is available and comprehensible to as many people as possible.

August 27, 07h

I’ve often wondered about the ability for the everyman to continue creating websites, as we move more towards an XML world. It concerns me. However, I always look at the programming world – remember when ANYONE could write a program? Remember when we did it in school? My personal favorite was:

20 GOTO 10

Anyway, the point is that this has happened before, and we got through it. Still, it concerns me. What I hope will happen is that HTML will be frozen at some point. Or maybe even pared down, and then frozen. Then, average Joes and Janes who want to create a personal site can use the relatively simple HTML to make a relatively simple page. If they grow out of it, they’ll just have to learn XHTML or XML. It’s an option, anyway.

I also think that CMS systems are key here. Something like TypePad, LiveJournal, etc. will keep the everyday person able to have a voice. We need CMS systems like those that aren’t so blog-oriented, though.

August 27, 07h

There’s nothing about the future that will prevent ordinary folks from creating crappy web pages with their family pictures…FrontPage is still going to exist (God help us all…) as will other WYSIWYG editors. And browsers, even good, forward compatible ones, will still render old tag-soup pages.

This doesn’t make the web less accessible. Errors will make the page unrenderable, unless the errors are in HTML 4, for example. There is no browser, even the really compliant ones, that will refuse to render the billions of non-compliant web pages in the tag-soup world.

There are good architects in the world, but you can still buy land and build a really ugly house with a really bad floor plan. But those in the know can tell the difference between the good and bad architecture.

I think Jai is right on that the chasm will be more defined, and this will be better for the entire web community.

Adam says:
August 27, 07h

That’s a good point Jeff, as we move farther and farther away from Grandma’s HTML, CMS’s will, in a way, be the bridge for the common folk who couldn’t care less about this discussion we all enjoy having so much. Today, web-based CMS’s mainly deal with blogs because that’s the obvious starting point; 90% text, a header, a footer, a sidebar - it’s easy to control that type of layout with simple textfields and checkboxes. The challenge will be making an attractive, fast, usable, web-based CMS that could control a university’s site, or a small business, etc. There’s a lot of functionality questions with something like that (how would Grandma design the templates? how would she add a section?) that I don’t believe TypePad or TextPattern have even begun answering. But, I guess we should just be patient. First the blogs, and then … THE WORLD!!!

Jeremy says:
August 27, 08h

this kottke article is generating a lot of buzz.

i have already seen it mentioned on several sites this morning.

Dave S. says:
August 27, 08h

Re: XML and amateurs.

I touched on this in my DMXZone interview -

To quote myself:

“Services have been filling this gap for years. What we can’t lose sight of is in publishing HTML, you first have to grasp FTP and directory structures. This in itself has been a significant barrier to entry, and most amateurs never bother. Web sites like Fotki and Kodak’s Picture Playground offer a solution for photo storage. Now that weblogs are the Big Thing, all-in-one services like BlogSpot, TypePad, and AOL Journals will continue the trend.”

Patrick - the metaphorical switch is a site-specific difference in rendering mode. Some browsers (Mozilla) are fully capable of native XML rendering - but you have to turn it on. This is accomplished through special attention to MIME types.

Anne - “Even (almost all) the X-Philes are still violating the specification” - how? By serving XHTML 1.1 as text/html to IE and lesser browsers? A necessary evil, and the alternative - destroying backwards compatibility - is a silly purist ideal that doesn’t solve any real world problems. Not that serving XHTML 1.1 does in most cases, but I digress.

MikeyC says:
August 27, 08h

free wrote: “That was true years ago - and theoretically that’s how a standards-compliant browser is _supposed_ to react to invalid XHTML. But in practice modern browsers are still parsing and rendering bad XHTML (I think they are just falling back on HTML parsing)…”

I think you need to get your facts straight before you start telling people that they are “flat inaccurate”. In point #2 Dave specifically stated: “full-on XML rendering” which implies sending out documents with a proper “application/xhtml+xml” MIME type and not “text/html” in which case a single mistake results in an error and the page doesn’t render. These “modern” browsers you are referring to aren’t “falling back on HTML rendering” but simply rendering the documents as HTML because they have a “text/html” MIME type.
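The MIME type decision described here is usually made on the server. The following is a simplified sketch, assuming the server inspects the request's `Accept` header and only sends `application/xhtml+xml` to clients that list it explicitly. The function name `choose_content_type` is hypothetical, wildcards such as `*/*` are deliberately ignored, and q-values are only checked for zero rather than ranked.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_content_type(accept_header):
    """Pick a Content-Type for an XHTML page from the client's Accept header.

    Returns application/xhtml+xml only when the client names that type
    explicitly with a non-zero q-value; otherwise falls back to text/html.
    """
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0  # q defaults to 1 when the parameter is absent
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not accepted"
        if media_type == XHTML_TYPE and quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


if __name__ == "__main__":
    # A header that names the XHTML type, and one that does not.
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("image/gif, image/jpeg, */*"))
```

A browser that receives the page as `application/xhtml+xml` uses its XML parser, so a single well-formedness error stops rendering. The same bytes sent as `text/html` go through the forgiving HTML parser.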

Jai says:
August 27, 08h

Back to the tables argument again? Tables are called Tables because they should be used for Tabular Data. Sure, you COULD use them for layout, but it makes more sense to code your content separate from your design. It’s not Gospel, but it may as well be - it makes more sense and is far more flexible than table based layouts. Here’s a challenge for you “Anon”- take the CSS zen garden and make it table based. See how far you get. Trust me, it’s not worth the time.

But I’m not dissing tables at all. Tables are the PERFECT solution for tabular data. CSS doesn’t fly when it comes to these applications.

But for flexibility in design… tables don’t cut it.

MikeyC says:
August 27, 09h

Jai: “[CSS-P] makes more sense and is far more flexible than table based layouts.”

While you are correct and I completely agree that tables shouldn’t be used for layout purposes but instead to display tabular data, Anon’s point:

“but web designers should know that there are things which presently are a real pain or impossible with css, which are easy with tables”

His point is very, very true. There are certain limitations with CSS (at the moment) not only in terms of browser support for the recommendations but also in terms of the recommendations themselves. There are issues with centering horizontally + vertically on a page in CSS which is a piece of cake in table-based designs. There are issues with positioning objects below other objects when they are absolutely positioned (trying to create a footer below a 3-column layout that doesn’t result in overlapping, for example) and, of course, let us not forget general browser bugginess like floats and fixed positioning in IE.

CSS, as it stands, works very well when laying things out horizontally across a page, but there are some very real limitations to vertical layout.

August 27, 10h

Now I’m not talking about that IE thing; that’s not us, that’s the browser. It’s about some hidden specifications:

The first time I saw that page I confused SHOULD with MUST. If it was MUST it was a bigger problem (I’m talking about the xml way to include style sheets). But then again, using the correct mime-type is also specified as “should”:

A good thing of uncovering those “hidden” specifications is that we get to know what we have to do to validate against them.

Besides, we may validate our XHTML and CSS, but I think we can never follow the spec to the letter. Just abbr and acronym alone are sometimes difficult to tell apart. And even if you get all of that right, you always have the accessibility guidelines to fail on.

August 27, 11h

“if you have an error in your code, just the slightest little bug, your site won’t render at all.”

As much as I’m happy with progress in standards, I don’t believe browsers should ever behave like that.

A validator’s job is to validate against a standard. It gives pass/fail grades.

A browser’s job is to do its best to render markup–even when that markup is broken. It can give warnings, or create goofy-looking layout. Basically, it can grade however it wants.

These two concepts should be kept separate.

Sure, you can have a browser that validates. But its main function (displaying stuff) shouldn’t be dependent on the validation.

August 27, 11h

You are talking about HTML; this is not the case in XML.

You think it is a disadvantage but in fact it is not. If it is sent as XML the browser only has to think right or wrong and not: “how do I fix this mess”, this will result in a much faster browser experience.

Besides that the errors are quickly noted by the authors of the site, which is a good thing IMO.

August 27, 12h

“this will result in a much faster browser experience.”

That’s not a huge consideration in real terms surely. My “browser experience” is limited by my 56k modem bandwidth.
Once I get a larger bandwidth, the browser delay will still not be noticeable. I mean the speed with which a browser renders a page is negligible in terms of time taken to download the page, etc…

August 28, 01h

It’s a lot more fun than “click here”…

Seriously, I agree it is not good practice for accessibility. I think that in a forum like this, it is interesting and I enjoy exploring the links. To the uninitiated, I think it would probably be extremely confusing.

In fact, the reason for not using “click here” is to make the subject of the link the point of attention, not the way that it is arrived at (and not all users click!). The action is not important, the content is.

In this light, the “some” “of the” “many” thing is the same violation. The text of the link should not have higher importance than the link destination.

I wouldn’t use this technique on anything other than this sort of web forum - even I look at my status bar to see where the link will take me before I click it.

Jakob would not be happy with you, Dave… ;)

Eugene says:
August 28, 03h

I believe that one thing that will make CSS a better styling language (e.g., making the common footer-below-3-column-layout easier to do) is to push for the adoption of CSS-based websites. As more and more people use CSS in their sites, then there will be an impetus to improve the succeeding versions of CSS. If we all stick to table-based layouts just because many popular layout designs are easier to do with it, then there would be no reason at all to improve CSS.

August 28, 06h

To those talking about how tables allow things CSS doesn’t allow: Design is _always_ going to be limited by many variables.

Yes, there are a few things that CSS doesn’t do well. However, my site has a footer at the bottom that doesn’t overlap anything, and is done totally with CSS. It is possible. Even if something isn’t possible, I choose to limit myself to CSS because the benefits of this far outweigh a table-based layout. I maybe can’t do the exact same things that I could with tables, but I can do new things, and some limitations often allow creativity to be at its best.

anon again says:
August 28, 08h

“Here’s a challenge for you “Anon”- take the CSS zen garden and make it table based. See how far you get. Trust me, it’s not worth the time.

But I’m not dissing tables at all. Tables are the PERFECT solution for tabular data. CSS doesn’t fly when it comes to these applications.

But for flexibility in design… tables don’t cut it.”

Actually, for flexibility in design, dogmatism doesn’t cut it. I tried to make the point as diplomatically as possible but someone was bound to jump up and down I guess.

Now, when I wrote my comment, I had never seen the css zen garden. After writing it, I discovered it. What a great resource! But it doesn’t change my opinion. And my opinion is beware of those preaching an approach to design as the one true way.

But I admit those who encouraged me to push the limits of css are responsible for the css skills I do have - and I would be almost incapable of working without css. So the enthusiasm is great.

It is arguable that almost all data, or “content” as it is often called, is “tabular” if you’re getting it out of a database (you are aren’t you?) and putting it on web pages. But I know what you mean and it’s a useful distinction. In fact it’s sort of the key to the solution I’ve found for a particular database driven site - the tables are almost part of the data, and are then styled by css. Again, I found this by first doing as much as possible with css. But it’s nowhere near the ideal, and it is a worthwhile cause.


August 28, 08h

Talking about semantics, I am thinking of one more aspect, which some people seem to have missed. I was told not to use “Click Here” kind of phrases for links. On the same point, I’d like to draw your attention to the set of links at the top of this page:
“if you’re just learning this stuff by way of my site (heaven forbid) or some of the many others out there, it’s worth your time to read it and consider his points.”

Does this link text “some”, “of the”, “many”… make any more sense than “Click Here”?

Instead of doing this, shouldn’t we rather use citation style used in printed technical papers:
“some of the many others out there [StopDesign, SimpleBits, …]”?


mini-d says:
August 29, 07h

Correct me if I’m wrong, but the FONT tag has been deprecated since HTML 4.01, and XHTML 1.0 Transitional doesn’t support it.

September 01, 10h

I read this post and thread earlier in the day and, after pondering it, have come to the conclusion that Jai’s comment is, with all due respect, 100% misguided.

You will recall that he said that XHTML hopefully will “cause a much needed casm [sic] between the ‘real deal’ web designers/developers and the casual ‘I want to show grandma pictures of her grandkids’ designer/‘developer’”. This is a troubling statement, and I hope XHTML and new design patterns are not being implemented for the sake of a web slinger pissing contest, but, rather, to arrive at a simpler, more elegant, more powerful method of creating content for this still emerging medium.

HTML was designed to be usable by everyone, and truly landed very near the mark (admittedly, before “everyone” was onboard and back when the web was pretty straightforward) until the browser wars made the matter overly confusing. If you are a web designer (with the emphasis on “design”) then you should be able to appreciate the beauty in making a simple task simple. If Jai’s prediction holds true then XHTML will be a complete failure in that it will have neglected to meet one of its basic goals.

As to Jai’s elitist tone, I fear that Grandma’s pictures of her beloved grandchildren, Uncle Larry’s poetry, and Aunt Gertrude’s recipes and all the “amateur” odds and ends that exist are more central to the web than pages about how to make pages and the personal sites of the weenies that make them. Give me one good reason why Grandma shouldn’t be able to easily post her pictures? Why does there need to be a more advanced tier of web development? It’s freakin’ markup, after all, not assembly code! Get yer kicks elsewhere, dood, if the simple, powerful web of the future isn’t making you the man! (I heard Chinese algebra is a real bitch! Juggling chainsaws! Breathin’ fire! Those’r real tough-like!)

After all, the web is about content, first and foremost. And lots of it! You can use whatever approach you think is best, but you’d best base those decisions upon their own technical merits as I can guarantee you that 99.9% of your audience will never know or care to know just how clever the code behind the content is. And why the hell would they?

Power to the People, Jai!

Joe Clark says:
September 02, 07h

Noel D. Jackson recoded in CSS for him. Time for somebody to eat his own dogfood.

Web says:
September 02, 11h

I think that developing as a whole is getting trickier and trickier. Especially since the addition of XHTML and CSS into the melting pot. Furthermore try to explain how to use divs and linked style sheets to set up imaginary boundaries to the non-savvy weekend developer. You might as well start speaking Greek. There is definitely a steeper learning curve when introduced to such things as opposed to HTML and table tags.

Unfortunately we can abandon almost all hope of Microsoft or any wysiwyg tool rendering semantically correct, XHTML compliant, human friendly code.

I can say for a fact that the general populace of developers worth their salt will start to develop better more accessible websites. But the weekenders will be producing their trash for a long.. long.. time.

qwerty says:
September 08, 09h

please, see my comment #25 at