Usher in a New Era of Web with HTML 5


The Internet and its usage are constantly evolving. Every day sees the launch of new and interesting ways of accessing data and interacting with people, pushing the boundaries of HTML in every vertical. The current version of HTML, 4.01, has been in use for almost a decade now. Yet publishers are constantly looking for more evolved techniques to provide enhanced functionality that has, till now, been restrained by the programming languages as well as the browsers.

To give authors more flexibility and interoperability, and enable more interactive websites and applications, HTML 5 introduces and enhances a wide range of features, including form controls, APIs, multimedia, structure and semantics.

HTML 5 will be the first major change to our lingua franca since XHTML 1.0 in 2000 (XHTML 2.0, first published as a working draft in 2002, was never finalised). You must have already seen the ‘HTML 5 Working Draft’ at the start of this year. The W3C HTML Working Group and WHATWG (Web Hypertext Application Technology Working Group) have been working overtime, trying to satisfy everyone in an open process. That is not an easy task, a fact that is easy to forget amongst the concerns and the questions.

A lot of us believe that the introduction of the new specifications is just another preposterous attempt by the bigwigs in the browser arena to foist what they want onto developers. But then again, there are others who see it as the way forward to develop powerful multimedia Web apps on an open architecture, without Flash, Silverlight or similar proprietary technologies. As Doug Schepers, the W3C’s Team Contact for the SVG and Web Apps Working Group says, “HTML 5 is not a technical achievement, it’s a social movement.”

The varied opinions regarding HTML 5 are because it is much more than just a mark-up syntax for documents. The very name of the language’s specifications working group, Web Hypertext Application Technology Working Group, suggests how much HTML means to the Web. The original goal for HTML 5 was to make it easier to develop Web applications. There’s evidence of this in the rash of new JavaScript APIs and support for offline development, some of which are already available in a browser near you.

The elements

So, what is this set of new ‘elements’ of HTML 5 that’s making waves on the Web and elsewhere?

While there is a beautiful article on the Web by Elliotte Rusty Harold that I would strongly recommend for a complete overview of the new elements of HTML 5, some of the more important aspects that I plan to discuss in this article are:

  • Canvas—inline SVG and MathML
  • Audio interface
  • Video elements
  • Offline Web applications
  • Drag and drop
  • getElementsByClassName
  • Web Forms 2.0

However, you can consult Table 1 to figure out which of the elements are already supported by today’s browsers.

The latest HTML mark-up

We’ll start by thinking about marking up a typical blog. Like the vast majority of sites on the Web, blogs comprise a header, some navigation (often a sidebar or two), a main content area, and a footer.

Currently, there is no way in HTML 4 to mark up these elements in a semantic fashion; that is, HTML 4 offers no footer or header elements of its own. Instead, they’re usually wrapped in a generic div element, a technique that is described in the HTML 4 specification: “The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These elements define content to be inline (SPAN) or block-level (DIV), but impose no other presentational idioms on the content. Thus, authors may use these elements in conjunction with style sheets, the lang attribute, etc, to tailor HTML to their own needs and tastes.”

When developing the HTML 5 specs, the editor, Ian Hickson of Google, analysed over a billion Web pages to find out how authors were actually using these elements. He discovered that the top 20 class names used in the mark-up across this huge set of data included classes for common requirements: footers, headers, nav, menus, content, and main.

So the HTML 5 specs have a host of new, structural tags such as header, footer, nav, article, and section, which fit these common requirements and allow us to mark up our archetypal blog with more meaningful elements.

Refurbished HTML 5 Semantics

The <header> element

The header element is for page or section headings. It is not to be confused with a traditional masthead, which often holds just a logo mark: a header should also contain one of <h1> to <h6> in hierarchical rank. It could also contain meta information for that page or section of a page, like “last updated”, a version number, or blog entry information like published date, author, etc.

A simple example for a page using a semantic class name that corresponds to the HTML 5 header might be:

<div class="header">
    <h1>Page Title</h1>
</div>

You could include the logo mark and other meta information within the div. The next example, for blog articles, includes information on the author and the date published (as well as an example of referencing the section and article elements with semantic class names):

<div class="section">
    <div class="article">
        <div class="header">
            <h1>Page Title</h1>
            <p>By <a href="#">Author</a> on [date]</p>
        </div>
        [Article content]
    </div>
</div>

The <nav> element

The nav element should contain a set of navigation links, either to other pages, or fragment identifiers in the current page. Referencing it with semantic class names is simple:

<div class="nav">
    <ul>
        <li><a href="*">Menu item 1</a></li>
        <li><a href="*">Menu item 2</a></li>
    </ul>
</div>

The <aside> element

The aside element is for content that is tangentially related to the content around it, and is typically useful for marking up sidebars.

<div class="aside"><ul>
        <li><a href="/2007/09/">Sep 09</a></li>
        <li><a href="/2007/08/">Aug 09</a></li>
        <li><a href="/2007/07/">Jul 09</a></li>
</ul></div>

The <section> element

The section element represents a generic section of a document or application, such as a chapter, for example.

<div class="section">
    <h1>Chapter 1: The Period</h1>
    <p>It was the best of times, it was the worst of times,
    it was the age of wisdom, it was the age of foolishness,
    it was the epoch of belief, it was the epoch of incredulity,
    it was the season of Light, it was the season of Darkness, […]</p>
</div>

The <figure> element

The figure element contains embedded media like <img> and the new elements of <audio> and <video>. It also contains an optional <legend> element performing the function of a caption. Our semantic class name version could be like what follows:

<div class="figure">
    <img src="#" alt="*">
    <p class="legend">[…]</p>
</div>

HTML 5 is being written in two syntaxes: html and XML (Source: html5-is-html-and-xml.html)

Extensible semantics

Many pages around the Web use micro-formats to add more structured semantics than what’s available in HTML’s impoverished set of elements and attributes. In this case, the values used for the class attribute come from agreed-upon vocabularies, sometimes adopted from other standards, such as vCard, and sometimes from newly minted vocabularies where no solid pre-existing standard exists.

This is a real problem that needs to be solved here. We need mechanisms in HTML that clearly and unambiguously enable developers to add richer, more meaningful semantics—not pseudo semantics—to their mark-up. This is perhaps the single most pressing goal for the HTML 5 project.

But it’s not as simple as coming up with a mechanism to create richer semantics in HTML content: there are significant constraints on any solution. Perhaps the biggest one is backward compatibility. The solution can’t break the millions of browsing devices in use today, which will continue to be used for years to come. Any solution that isn’t backward compatible won’t be widely adopted by developers for fear of excluding readers.

Table 1: Compatibility of HTML 5 elements with today’s browsers

| Feature | Chrome | Firefox | Internet Explorer | Opera | Safari |
|---|---|---|---|---|---|
| contentEditable | Yes | Yes | Yes | Yes | Yes |
| Stylable Elements (new) | Yes | Yes | No | Yes | Yes |
| getElementsByClassName | Yes | Yes | No | Yes | Yes |
| Cross-Document Messaging | Yes | Yes | Yes | Yes | Yes |
| Web Forms 2.0 | Partial | No | No | Yes | Partial |
| Drag and Drop | Yes | Yes | Yes | No | Yes |
| <audio> | No | 3.5 | No | No | Yes |
| <video> | 3.0 | 3.5 | No | Labs release, partial | Yes |
| <canvas> | Yes | Yes | No | Yes | Yes |
| Server-sent DOM Events | No | No | No | Partial | No |
| Client-side Storage (name/value) | 2.x | Yes | Yes | No | Yes |
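
Since support varies this much from browser to browser, a page can check for a feature at runtime instead of relying on a table like the one above. What follows is only a minimal sketch: the function name detectFeatures is my own invention, and the checks simply look for methods that a supporting browser exposes (getContext on a canvas element, canPlayType on media elements).

```javascript
// Runtime feature detection for a few of the features listed in the table.
// `doc` is expected to be the browser's `document` object; it is passed in
// as a parameter so the function can also be exercised outside a browser.
// detectFeatures is a hypothetical helper name, not part of any specification.
function detectFeatures(doc) {
    var canvas = doc.createElement('canvas');
    var video = doc.createElement('video');
    var audio = doc.createElement('audio');
    return {
        // A canvas-capable browser exposes getContext() on the element.
        canvas: !!(canvas && canvas.getContext),
        // Media-capable browsers expose canPlayType() on <video> and <audio>.
        video: !!(video && video.canPlayType),
        audio: !!(audio && audio.canPlayType),
        getElementsByClassName: !!doc.getElementsByClassName
    };
}

// In a browser: log which of these features are available.
if (typeof document !== 'undefined') {
    console.log(detectFeatures(document));
}
```

A page could then fall back to a plug-in or a plain link whenever one of the flags comes back false.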

Canvas—inline SVG and MathML

The <canvas> element is another exciting addition to the HTML 5 specifications, second only to <video> (which will be discussed later in the article). HTML 5 Canvas gives you an easy and powerful way to draw using JavaScript. For each canvas element you can use a ‘context’ (think of a page on a drawing pad), into which you can issue JavaScript commands to draw anything you want. Browsers can implement multiple canvas contexts, and the different APIs provide the drawing functionality.

Most of the major browsers include the 2D canvas context capabilities—Opera, Firefox, Konqueror and Safari. In addition, there are experimental builds of Opera that include support for a 3D canvas context.

A demo of HTML 5 Canvas

The baby steps

Creating a canvas context on your page is as simple as adding the <canvas> element to your HTML document. Here’s an example:

<canvas id="CanvasID" height="300" width="500">
    Show this text if the browser is not Canvas compatible
</canvas>

It is advisable to define an element ID. This will be helpful later in referencing the element in the JS code. Moreover, the height and width of the canvas also need to be defined.

Now that you have managed to create your very first canvas, let’s shake things up a bit. To draw inside your canvas, you need to use JavaScript. First find your canvas element using getElementById, then initialise the context you want. Once you do that, you can start drawing into the canvas using the available commands in the context API. Let us now learn to draw a little rectangle inside the canvas we just created.

// Referencing the 'CanvasID' element
var elem = document.getElementById('CanvasID');

// Checking that the browser supports canvas
if (elem && elem.getContext) {
    // You can only initialize one context per element.
    var context = elem.getContext('2d');
    if (context) {
        // Here comes the rectangle: X and Y co-ordinates first, then the width and height
        context.fillRect(0, 0, 300, 250);
    }
}

Borders and key-strokes

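Now, the rectangle can be made a bit more exciting by using the fillStyle and strokeStyle properties, which set the colours used for fills and outlines. The fillRect method draws a rectangle filled with the current fillStyle, strokeRect draws only its outline, and clearRect erases a rectangular area. Here’s an example: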

context.fillStyle   = '#00f';
context.strokeStyle = '#f00';
context.lineWidth   = 4;

// a few more rectangles
context.fillRect  (0,   0, 150, 50);
context.strokeRect(0,  60, 150, 50);
context.clearRect (30, 25,  90, 60);
context.strokeRect(30, 25,  90, 60);

Exciting opportunities with Canvas

The drawImage method allows you to insert other images (img and Canvas elements) into your Canvas context. In Opera you can also draw SVG images inside your canvas. <canvas> allows pixel-based manipulation as well. The 2D Context API provides you three methods that help you draw pixel-by-pixel: createImageData, getImageData, and putImageData. Moreover, the fillStyle and strokeStyle properties can also have CanvasGradient objects assigned to them, instead of CSS ‘colour’ strings—these allow you to use colour gradients to colour your lines and fills instead of solid colours.
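
To make the last two ideas a little more concrete, here is a rough sketch of a gradient fill and of pixel-by-pixel manipulation. The function names fillWithGradient and invertPixels are made up for this illustration; createLinearGradient, addColorStop, getImageData and putImageData are methods of the 2D context API. The browser part assumes the CanvasID element created earlier.

```javascript
// Fill a horizontal blue-to-red gradient across a 2D canvas context.
// (fillWithGradient is a hypothetical helper name.)
function fillWithGradient(context, width, height) {
    var gradient = context.createLinearGradient(0, 0, width, 0);
    gradient.addColorStop(0, '#00f');
    gradient.addColorStop(1, '#f00');
    context.fillStyle = gradient;   // a CanvasGradient instead of a colour string
    context.fillRect(0, 0, width, height);
}

// Invert the colours of an ImageData-like object in place.
// `imageData.data` holds the pixels as consecutive R, G, B, A values.
// (invertPixels is a hypothetical helper name.)
function invertPixels(imageData) {
    var data = imageData.data;
    for (var i = 0; i < data.length; i += 4) {
        data[i]     = 255 - data[i];       // red
        data[i + 1] = 255 - data[i + 1];   // green
        data[i + 2] = 255 - data[i + 2];   // blue
        // data[i + 3] is the alpha channel and is left untouched
    }
    return imageData;
}

// In a browser, assuming the <canvas id="CanvasID"> element from earlier:
if (typeof document !== 'undefined') {
    var canvasElem = document.getElementById('CanvasID');
    if (canvasElem && canvasElem.getContext) {
        var ctx = canvasElem.getContext('2d');
        fillWithGradient(ctx, canvasElem.width, canvasElem.height);
        var pixels = ctx.getImageData(0, 0, canvasElem.width, canvasElem.height);
        ctx.putImageData(invertPixels(pixels), 0, 0);
    }
}
```

Keeping the pixel loop separate from the canvas calls means the same function could be reused on any region returned by getImageData.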

For a few demonstrations of the capabilities of the canvas element, you can visit the following URLs:

Video elements

The increasingly competitive browser market has at last created an environment in which emerging Web standards can flourish. One of the harbingers of the open Web renaissance is HTML 5, the next major version of the W3C’s ubiquitous HTML standard. Although HTML 5 is still in the draft stage, several of its features have already been widely adopted by browsers like Firefox, Safari and Chrome. Among the most compelling is the ‘video’ element, which has the potential to free Web video from its plug-in prison and make video content a native first-class citizen on the Web—if codec disagreements don’t stand in the way.

Video is one of the most significant areas where this trend will have a major impact. Some of the giants of Internet video are exploring standards-based solutions as a means to break free from the constraints imposed by proprietary browser plug-ins. During the Google I/O conference at the end of May, the search giant demonstrated a YouTube mock-up built with HTML 5. In addition to using the HTML 5 video element, it also uses new HTML structural elements and other features introduced in the upcoming version of the standard. The demonstration illustrates how open technologies can be used to deliver a high-quality user experience for streaming video playback.

This is how a video looks after embedding

For content providers like YouTube and DailyMotion, the HTML 5 video element offers numerous advantages. It integrates seamlessly with conventional HTML content and can be manipulated with JavaScript and CSS. This enables Web developers to build video player interfaces that are more consistent with the rest of their website. The ability to control playback with JavaScript allows video to be a more native part of the user experience in interactive Web applications.

This has given rise to a dedicated HTML tag called <video>. The <video> tag defines video, such as a movie clip or other video streams. All one needs to do is point it at the URL of a video file, and the video gets embedded into the page. The various attributes available for the <video> tag are reproduced in the table below.

Optional attributes of the <video> tag

| Attribute | Value | Description |
|---|---|---|
| autoplay | true or false | If true, the video will start playing as soon as it is ready. |
| controls | true or false | If true, the user is shown some controls, such as a play button. |
| end | numeric value | Defines where in the video stream the player should stop playing. By default, the video is played to the end. |
| height | pixels | Sets the height of the video player. |
| loopend | numeric value | Defines where in the video stream the loop should stop, before jumping to the start of the loop. Default is the end attribute’s value. |
| loopstart | numeric value | Defines where in the video stream the loop should start. Default is the start attribute’s value. |
| playcount | numeric value | Defines how many times the video clip should be played. Default is 1. |
| poster | url | The URL of an image to show before the video is ready. |
| src | url | The URL of the video to play. |
| start | numeric value | Defines where in the video stream the player should start playing. By default, the video starts playing at the beginning. |
| width | pixels | Sets the width of the video player. |

Explaining Ogg Theora and H.264 codecs

Ogg Theora is an open format that is thought to be unencumbered by patents. The primary reference implementation is distributed under an open source licence and is being developed by the non-profit Xiph.Org Foundation, with funding from Mozilla. Ogg is strongly preferred by the FOSS community because it can be freely redistributed without requiring licensing fees.

H.264 is a high-performance codec that is maintained by the ISO/IEC Moving Picture Experts Group (MPEG) as part of the MPEG-4 family. It is emerging as the dominant codec for both streaming video and optical media, as it is said to deliver the visual quality of MPEG-2 (used on DVDs) at roughly half the bit-rate. The MPEG LA consortium manages licensing of the underlying patents that cover H.264 compression algorithms and other software methods needed to implement the codec. In order to use the format, adopters have to pay licensing fees to MPEG LA.

Debates over patents on video codecs

Patent encumbrance is one of the driving forces behind the HTML 5 video codec controversy. The patent licensing requirements mean that H.264 codecs can’t be freely redistributed, making the format a non-starter for Mozilla and most other open source browser vendors. Opera also objects, saying that the licensing fees are too high. Mozilla and Opera strongly advocate Ogg Theora as an alternative because its freedom from known patents could ensure that there are no licensing barriers that prevent ubiquitous adoption.

Apple objects to Ogg Theora, claiming that the lack of known patents on Theora doesn’t rule out the threat of submarine patents that could eventually be used against adopters. Apple is also concerned about the lack of widespread support for hardware-based Theora decoding, a factor that diminishes the format’s viability on mobile devices. Google shares Apple’s scepticism about the potential of Theora in the marketplace. The search giant claims that Theora’s lack of quality relative to H.264 will make it an impractical choice for large-scale streaming video services such as YouTube. (The complete e-mail sent by Ian Hickson on this subject can be found in the WHATWG mailing list archives.)

A solution that seems logical on the surface is to simply expose each platform’s underlying media playback engine through the HTML 5 video element—DirectShow on Windows, GStreamer on Linux, and QTKit on Mac OS X. This would make it possible for the browser to play any video formats that are supported natively on the user’s computer.

Now, there’s a push for hardware decoding that makes Theora on mobiles technically possible and working well. If Apple is satisfied on the legal aspects and jumps on board, that changes the game. I think Google is mostly ambivalent since it supports both right now. Opera doesn’t want H.264 anyway. IE 8 can likely be handled by a plug-in. Apple really is the deciding factor. So, Theora seems to be, pretty much, the future.

Audio interface

It is just as simple to embed audio into a page using the audio element. Most of the attributes are common between the video and audio elements, although for obvious reasons, the audio element lacks the width, height, and poster attributes.

<audio src="audio.oga" controls>
    <a href="audio.oga">get the track</a>
</audio>

HTML 5 provides the source element to specify alternative video and audio files that the browser may choose from, based on its media type or codec support. The media attribute can be used to specify a media query for selection based on the device limitations and the type attribute for specifying the media type and codecs. Note that when using the source elements, the src attribute needs to be omitted from the parent video or audio element, or the alternatives given by the source elements will be ignored.

<video poster="VidPost.jpg">
    <source src="vid.3gp" type="video/3gpp" media="handheld">
    <source src="vid.ogv" type='video/ogg; codecs="theora, vorbis"'>
    <source src="vid.mp4" type="video/mp4">
</video>

<audio>
    <source src="audio.oga" type="audio/ogg">
    <source src="audio.mp3" type="audio/mpeg">
</audio>
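
The same selection can be approximated from a script by asking a media element which types it can handle. The sketch below is illustrative only: pickPlayableSource and the candidates list are names I have made up, while canPlayType is the method media elements provide for this purpose. It returns an empty string when the type cannot be played, and ‘maybe’ or ‘probably’ otherwise (the check for ‘no’ is a precaution for early implementations).

```javascript
// Return the first source that the media element reports it can play,
// or null if none of them is playable.
// (pickPlayableSource is a hypothetical helper name.)
function pickPlayableSource(media, sources) {
    for (var i = 0; i < sources.length; i++) {
        var answer = media.canPlayType(sources[i].type);
        if (answer !== '' && answer !== 'no') {
            return sources[i];
        }
    }
    return null;
}

// Candidate files, in order of preference (file names follow the example above).
var candidates = [
    { src: 'vid.ogv', type: 'video/ogg; codecs="theora, vorbis"' },
    { src: 'vid.mp4', type: 'video/mp4' }
];

// In a browser: probe with a throwaway <video> element.
if (typeof document !== 'undefined') {
    var probe = document.createElement('video');
    if (probe.canPlayType) {
        var chosen = pickPlayableSource(probe, candidates);
        console.log(chosen ? 'Would play ' + chosen.src : 'No playable source');
    }
}
```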

For authors who want a little more control over the user interface so that they can make it fit the overall design of the Web page, the extensive API provides several methods and events to let scripts control the playback of the media. The simplest to use are the play() and pause() methods, along with setting currentTime to 0 to rewind to the beginning.
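
As a small illustration, custom controls could be wired up along these lines. The helper names togglePlayback and rewind, and the element IDs player, toggle and rewind, are assumptions made for this sketch; play(), pause(), paused and currentTime belong to the media element API.

```javascript
// Switch between playing and paused.
// (togglePlayback is a hypothetical helper name.)
function togglePlayback(media) {
    if (media.paused) {
        media.play();
    } else {
        media.pause();
    }
}

// Stop playback and jump back to the beginning.
// (rewind is a hypothetical helper name.)
function rewind(media) {
    media.pause();
    media.currentTime = 0;
}

// In a browser, assuming a media element with id "player" and two buttons
// with ids "toggle" and "rewind" (all three IDs are hypothetical):
if (typeof document !== 'undefined') {
    var player = document.getElementById('player');
    var toggleButton = document.getElementById('toggle');
    var rewindButton = document.getElementById('rewind');
    if (player && toggleButton && rewindButton) {
        toggleButton.onclick = function () { togglePlayback(player); };
        rewindButton.onclick = function () { rewind(player); };
    }
}
```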

Drag and drop

Drag and drop is one of the most fundamental interactions afforded by graphical user interfaces. In one gesture, it allows users to pair the selection of an object with the execution of an action, often including a second object in the operation. It’s a simple yet powerful UI concept used to support copying, list reordering, deletion (a la the Trash/Wastebin), and even the creation of link relationships.

Since it’s so fundamental, offering drag-and-drop in Web applications has been a no-brainer ever since browsers first offered mouse events in DHTML (Dynamic HTML). But, although mousedown, mousemove, and mouseup made it possible, the implementation has been limited to the bounds of the browser window. Additionally, since these events refer only to the object being dragged, there’s a challenge to find the subject of the drop when the interaction is completed.

Of course, that doesn’t prevent most modern JavaScript frameworks from abstracting away most of the problems and throwing in some flourishes while they’re at it. But, wouldn’t it be nice if browsers offered first-class support for drag-and-drop, and maybe even extended it beyond the window sandbox?

As it turns out, this very wish is answered by the HTML 5 specification section on new drag-and-drop events, and Firefox 3.5 includes an implementation of those events.

The latest events

The latest drag and drop events specified for HTML 5 are:

  • dragstart – A drag has been initiated, with the dragged element as the event target.
  • drag – The mouse has moved, with the dragged element as the event target.
  • dragenter – The dragged element has been moved into a drop listener, with the drop listener element as the event target.
  • dragover – The dragged element has been moved over a drop listener, with the drop listener element as the event target. Since the default behaviour is to cancel drops, returning false or calling preventDefault() in the event handler indicates that a drop is allowed here.
  • dragleave – The dragged element has been moved out of a drop listener, with the drop listener element as the event target.
  • drop – The dragged element has been successfully dropped on a drop listener, with the drop listener element as the event target.
  • dragend – A drag has been ended, successfully or not, with the dragged element as the event target.

For a detailed list and for explanations, read

Consider the following example, which uses jQuery:

<div id="newschool">
    <div class="dragme">Drag me!</div>
    <div class="drophere">Drop here!</div>
</div>

<script type="text/javascript">
    $(document).ready(function() {
        $('#newschool .dragme')
            .attr('draggable', 'true')
            .bind('dragstart', function(ev) {
                // Attach the data that will travel with the drag
                var dt = ev.originalEvent.dataTransfer;
                dt.setData("Text", "Dropped in zone!");
                return true;
            })
            .bind('dragend', function(ev) {
                return false;
            });
        $('#newschool .drophere')
            .bind('dragenter', function(ev) {
                return false;
            })
            .bind('dragleave', function(ev) {
                return false;
            })
            .bind('dragover', function(ev) {
                // Returning false signals that a drop is allowed here
                return false;
            })
            .bind('drop', function(ev) {
                // Read back the data set in dragstart and show it
                var dt = ev.originalEvent.dataTransfer;
                alert(dt.getData("Text"));
                return false;
            });
    });
</script>
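
For readers who don’t use jQuery, roughly the same wiring can be sketched with plain DOM calls. The helper names makeDraggable and makeDropZone are invented for this illustration, and the selectors assume the newschool mark-up above; the events and the dataTransfer object are the ones described in the list of events.

```javascript
// Mark an element as draggable and attach a text payload when a drag starts.
// (makeDraggable is a hypothetical helper name.)
function makeDraggable(el, text) {
    el.setAttribute('draggable', 'true');
    el.addEventListener('dragstart', function (ev) {
        ev.dataTransfer.setData('Text', text);
    }, false);
}

// Turn an element into a drop target that hands the dropped text to onDrop.
// (makeDropZone is a hypothetical helper name.)
function makeDropZone(el, onDrop) {
    // Cancelling dragenter and dragover signals that a drop is allowed here.
    function allow(ev) {
        ev.preventDefault();
    }
    el.addEventListener('dragenter', allow, false);
    el.addEventListener('dragover', allow, false);
    el.addEventListener('drop', function (ev) {
        ev.preventDefault();
        onDrop(ev.dataTransfer.getData('Text'));
    }, false);
}

// In a browser, assuming the "newschool" mark-up from the jQuery example:
if (typeof document !== 'undefined' && document.querySelector) {
    var dragSource = document.querySelector('#newschool .dragme');
    var dropTarget = document.querySelector('#newschool .drophere');
    if (dragSource && dropTarget) {
        makeDraggable(dragSource, 'Dropped in zone!');
        makeDropZone(dropTarget, function (text) { alert(text); });
    }
}
```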

Since a detailed explanation of the features and demonstrations is out of the scope of this article, you can always refer to a lot of interesting content on:


Last, but definitely not the least

If you fancy getting more involved, there’s still time: try using HTML 5 and give your feedback to the specification group via the WHATWG mailing lists. The editor, Ian Hickson, has put out a call for people to review the specs, looking for confusing items, typos, and other small problems. If you find one, you’ll be mentioned in the acknowledgements. There’s plenty to do, so dive in!


