My Say: Rebooting HTML for the Semantic Web

"Making standards is hard work," writes Tim Berners-Lee in a recent blog post. And he should know. The creator of the World Wide Web, Berners-Lee is responsible for developing and popularising some of the most significant open standards in computing. His current project, the Semantic Web, is an attempt to carry Web standards to a level beyond anything we've known so far. Its goal is to transform today's Web into a semi-intelligent network of information resources, one in which machines can analyse and understand the meaning of information. If successful, it will absolutely revolutionise information retrieval. And the key to its success is the rigorous application of standards.

But there's a catch: It's hard enough to get people to comply with the standards we have already. HTML, in particular, has a troubled past. The headaches began in the bad old days of the browser wars, when competing browser makers would implement the specifications in dubious ways and add nonstandard features to their software. Confounded by conflicting results, developers got into the habit of writing code that worked, no matter what sins against the standards they had to commit.

Some years ago, the engineers at the W3C reasoned the best way to get back on track would be to start with Web developers. Get developers to write HTML that adheres to the published standards and end-users would naturally gravitate toward browsers that did a better job of implementing the standards. In turn, this would create incentives for vendors to make standards compliance a top priority. It was a logical enough plan. The folly lay in its execution. Because the way the W3C chose to reach developers was with -- you guessed it -- another standard.

Enter XHTML. A successor to the original browser markup language, XHTML combined the vocabulary of HTML with the syntax of XML, and in the process it stripped away many inconsistencies and bad coding practices HTML developers had accumulated. XHTML actually has a lot going for it. Because of its strict syntax, it encourages more rigorous coding. It is also easy to check with automated tools, so Web developers can find out when they've made errors, just as programmers do. What's more, it encourages the use of CSS (Cascading Style Sheets), which helps to keep actual Web content separate from the details of how it is presented onscreen.
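To make the point about automated checking concrete, here is a minimal sketch in Python. It assumes that "checking" means nothing more than testing whether a fragment is well-formed XML, which is only the first hurdle for real XHTML (full validation against a DTD is a separate step). The function name `is_well_formed` and the two sample fragments are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments: one follows XML syntax, one is old-style tag soup.
strict_fragment = "<p>Hello<br/>world</p>"
sloppy_fragment = "<p>Hello<br>world"

strict_ok = is_well_formed(strict_fragment)
sloppy_ok = is_well_formed(sloppy_fragment)
```

An XML parser rejects the second fragment outright because neither the `br` nor the `p` element is closed. That is the kind of immediate, compiler-style feedback the paragraph above describes.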

The problem? "The attempt to get the world to switch to XML ... all at once," writes Berners-Lee, "didn't work". In other words, very few Web developers use XHTML. Or if they do, they don't use it properly. Berners-Lee blames the browsers for not requiring well-formed code, but his colleague Hakon Wium Lie, CTO of Opera Software and inventor of CSS, believes there's more to it than that. Lie suspects XHTML is unpopular because it tends to "punish the good guys" by being too rigid and unforgiving in its syntax. Writing good XHTML is laborious, a pursuit better suited to engineers or library science majors than Web designers.
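The leniency Berners-Lee complains about is easy to see with a forgiving HTML parser. The sketch below uses Python's standard `html.parser` module as a stand-in for a browser, which is an assumption made for illustration: real browsers apply far more elaborate error recovery. The class name `TagCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag the lenient parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Tag soup: neither element is ever closed.
sloppy_fragment = "<p>Hello<br>world"

collector = TagCollector()
collector.feed(sloppy_fragment)  # no exception is raised
collector.close()

seen_tags = collector.tags
```

The parser reports both the `p` and `br` elements without complaint, so the author of the sloppy markup never learns that anything is wrong.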

So it's back to the drawing board. In his post, Berners-Lee announced a brand-new working group within the W3C that would once again try to address the challenges and shortcomings of HTML, while working on the XHTML standards in parallel. The new group will take input from engineers, browser vendors, and Web developers, and make incremental improvements to the standards.

It's a good step. But it does make me wonder about the future of Berners-Lee's vision of the Semantic Web. The lesson learned from XHTML is that, when it comes to standards, just because you build it doesn't mean they will come. And yet, XHTML is only the beginning of the standards compliance that the Semantic Web would require. If the Semantic Web is to succeed, it will have to find ways to accommodate human nature, and not just good engineering.

