Dealing with unqualified HREF values

When I was building my extension for finding unused CSS rules, I needed a way of qualifying any href value into a complete URI. I needed this because I wanted it to support stylesheets inside IE conditional comments, but of course to Firefox these are just comments — I had to parse each comment node with a regular expression to extract what’s inside it, and therefore, the href value I got back was always just a string, not a property or a qualified path.

This isn’t the first time I’ve needed this ability, but in the past the circumstances were predictable: I already knew the domain name and path. Here they were not. I needed a solution that would work for any domain name, any path, and any href format, bearing in mind that an href value could take any one of several forms:

  • relative: "test.css"
  • relative with directories: "foo/test.css"
  • relative from here: "./test.css"
  • relative from higher up the directory structure: "../../foo/test.css"
  • relative to the http root: "/test.css"
  • absolute: ""
  • absolute with port: ""
  • absolute with different protocol: ""
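To make the differences between these formats concrete, here is a rough sketch of how a script might tell them apart. The function name classifyHREF and the labels it returns are hypothetical, chosen only for illustration, and the sketch ignores forms not listed above (such as protocol-relative values beginning with "//"):

```javascript
//classify an href string by its format (illustrative sketch only;
//classifyHREF and its return labels are hypothetical names)
function classifyHREF(href)
{
    //a scheme followed by :// means the value is already absolute
    if(/^[a-z][a-z0-9+.\-]*:\/\//i.test(href))
    {
        return 'absolute';
    }
    //a leading slash means relative to the http root
    if(href.charAt(0) == '/')
    {
        return 'root-relative';
    }
    //one or more leading ../ tokens means relative from higher up
    if(/^(\.\.\/)+/.test(href))
    {
        return 'up-reference';
    }
    //a leading ./ token means relative from here
    if(/^\.\//.test(href))
    {
        return 'relative-from-here';
    }
    //anything else is treated as a plain relative path
    return 'relative';
}
```

Note that "absolute", "absolute with port" and "absolute with different protocol" all fall into the same branch, because none of them needs any further qualification.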

When are HREFs qualified?

When we retrieve an href with JavaScript, the value that comes back has some cross-browser quirks. What mostly happens is that a value retrieved with the shorthand .href property will come back as a qualified URI, whereas a value retrieved with getAttribute('href') will (and should, according to specification) come back as the literal attribute value. So with this link:

<a id="testlink" href="/test.html">test page</a>

We should get these values:

document.getElementById('testlink').href == '';
document.getElementById('testlink').getAttribute('href') == '/test.html';

And in Opera, Firefox and Safari that is indeed what we get. However in Internet Explorer (all versions, up to and including IE7) that isn’t what happens — for both examples we get back a fully-qualified URI, not a raw attribute value:

document.getElementById('testlink').href == '';
document.getElementById('testlink').getAttribute('href') == '';

This behavioral quirk is documented in Kevin Yank and Cameron Adams’ recent book, Simply JavaScript; but it gets quirkier still. Although this behavior applies with the href of a regular link (an <a> element), if we do the same thing for a <link> stylesheet, we get exactly the opposite behavior in IE. This HTML:

<link id="teststylesheet" rel="stylesheet" type="text/css" href="/test.css" />

Produces this result:

document.getElementById('teststylesheet').href == '/test.css';
document.getElementById('teststylesheet').getAttribute('href') == '/test.css';

In both cases we get the raw attribute value (whereas in other browsers we get the same results as for an anchor — .href is fully qualified while getAttribute produces a literal value).


Behavioral quirks aside, I have to say that IE’s behavior with links is almost always what I want. Deriving a path or file name from a URI is fairly simple, but doing the opposite is rather more complex.

So I wrote a helper function to do it. It accepts an href in any format and returns a qualified URI based on the current document location (or if the value is already qualified, it’s returned unchanged):

//qualify an HREF to form a complete URI
function qualifyHREF(href)
{
    //get the current document location object
    var loc = document.location;

    //build a base URI from the protocol plus host (which includes port if applicable)
    var uri = loc.protocol + '//' + loc.host;

    //if the input path is relative-from-here
    //just delete the ./ token to make it relative
    if(/^(\.\/)([^\/]?)/.test(href))
    {
        href = href.replace(/^(\.\/)([^\/]?)/, '$2');
    }

    //if the input href is already qualified, copy it unchanged
    if(/^([a-z]+):\/\//.test(href))
    {
        uri = href;
    }

    //or if the input href begins with a leading slash, then it's base relative
    //so just add the input href to the base URI
    else if(href.substr(0, 1) == '/')
    {
        uri += href;
    }

    //or if it's an up-reference we need to compute the path
    else if(/^((\.\.\/)+)([^\/].*$)/.test(href))
    {
        //get the last part of the path, minus up-references
        var lastpath = href.match(/^((\.\.\/)+)([^\/].*$)/);
        lastpath = lastpath[lastpath.length - 1];

        //count the number of up-references
        var references = href.split('../').length - 1;

        //get the path parts and delete the last one (this page or directory)
        var parts = loc.pathname.split('/');
        parts = parts.splice(0, parts.length - 1);

        //for each of the up-references, delete the last part of the path
        for(var i=0; i<references; i++)
        {
            parts = parts.splice(0, parts.length - 1);
        }

        //now rebuild the path
        var path = '';
        for(i=0; i<parts.length; i++)
        {
            if(parts[i] != '')
            {
                path += '/' + parts[i];
            }
        }
        path += '/';

        //and add the last part of the path
        path += lastpath;

        //then add the path and input href to the base URI
        uri += path;
    }

    //otherwise it's a relative path,
    else
    {
        //calculate the path to this directory
        //(path, parts and i are declared with var above, so they're hoisted to function scope)
        path = '';
        parts = loc.pathname.split('/');
        parts = parts.splice(0, parts.length - 1);
        for(i=0; i<parts.length; i++)
        {
            if(parts[i] != '')
            {
                path += '/' + parts[i];
            }
        }
        path += '/';

        //then add the path and input href to the base URI
        uri += path + href;
    }

    //return the final uri
    return uri;
}
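As a cross-check rather than a replacement, current browsers and Node.js provide a standard URL constructor that performs the same kind of resolution. The sketch below assumes that API is available (it was not an option in the browsers discussed here). The name qualifyAgainst is hypothetical, the base URI is passed in explicitly instead of being read from document.location, and the example.com addresses are placeholders:

```javascript
//qualify an href against an explicit base URI (illustrative sketch;
//qualifyAgainst is a hypothetical name, and the base is a parameter
//so that the function can also run outside a browser)
function qualifyAgainst(href, base)
{
    //the URL constructor resolves href relative to base,
    //or ignores base if href is already absolute
    return new URL(href, base).href;
}

//placeholder base standing in for the current document location
var base = 'http://www.example.com/foo/bar/page.html';

qualifyAgainst('test.css', base);           //'http://www.example.com/foo/bar/test.css'
qualifyAgainst('../../foo/test.css', base); //'http://www.example.com/foo/test.css'
qualifyAgainst('/test.css', base);          //'http://www.example.com/test.css'
```

One difference worth knowing: the URL constructor normalizes its output (lowercasing the scheme and host, for example), whereas qualifyHREF returns an already-qualified value exactly as it was given.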

One more for the toolkit!


Time: 2007-08-10

