I could just say “Use trailing slashes!” and be done with it. But that would leave you, dear reader, underwhelmed and grumbly. You may have already read this article on A List Apart regarding using trailing slashes. In that article the author mentions three reasons for using trailing slashes when linking to directories (and I quote):
- We’re doing ourselves a favor, as this is the correct way to do things.
- We’re doing our server a favor, as this means less disk access.
- And most importantly, we’re doing our visitors a favor, because they’re no longer losing a few seconds while our server tries to find first a file and then a directory. And in this industry, you and I both know that a few seconds is a long, long time.
Now, this article was written in 2002, when almost everyone was still on dial-up and servers were much slower in general. So number 3 doesn’t really apply anymore. In this article, I’m going to give you a new reason number 3 and go into more detail on number 1, to help you understand why this is the correct way to do things.
Let’s first look at what happens when your browser requests a normal page. We’ll mimic a simple browser session using telnet from the command line.
We first initiate the telnet session with the server:
```
Aletheia:~ wknechtel$ telnet www.sheldoncomics.com 80
```
Once we’re connected to the server, telnet reports:
```
Trying 208.122.50.173...
Connected to sheldoncomics.com.
Escape character is '^]'.
```
We then issue the commands that a browser would. This is a little simplistic, since a browser would also tell the server what sorts of encodings and content it can accept, but it will work:
```
GET / HTTP/1.1
Host: www.sheldoncomics.com
```
The request we’ve just issued breaks down like this. GET is the request method. There are other request methods, but you’re probably most familiar with GET and POST. Next comes the URI we’re requesting. In this case a single slash indicates that we’re looking for the top-most root document the server will hand us. Finally, we specify the protocol, HTTP, version 1.1.
HTTP/1.1 made virtual hosting practical by requiring a Host header, which lets more than one domain share a single IP address. This brings us to the second line: since we’re using HTTP/1.1, we also have to declare which host we’re looking for. Make sure you hit Enter twice after this line. The blank line tells the server that you’re done entering the request. Now the server will begin to deliver its response:
```
HTTP/1.1 200 OK
Date: Sat, 15 Nov 2008 16:32:37 GMT
Server: Apache/2.2.10 (Unix)
Vary: Host
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked

1e4c
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
...
</html>
0

Connection closed by foreign host.
```
We’re only really interested in the first few lines, called the header, and I’ve snipped out the vast majority of the returned HTML. Specifically, look at that “200 OK”. The 200 status code tells us that everything is good to go: the document we’ve requested exists, we have permission to view it, and nothing went wrong internally while retrieving it. The other header lines tell us several more things. They give the version of the server software and the type of content to expect (HTML in UTF-8). They also say how the body will be transmitted: “chunked”, which is why the HTML is preceded by a hexadecimal chunk length (1e4c) and followed by a final 0.
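You can also script this session in Python instead of typing it into telnet. Below is a minimal sketch of the same exchange, together with a small parser for the response head. The names `build_request`, `fetch_head`, `parse_head` and `chunk_size` are hypothetical and chosen for illustration. The sketch makes several assumptions:

- It uses plain HTTP on port 80.
- It reads only up to the blank line that ends the headers, so it doesn’t wait for the server to drop the connection the way telnet did.
- It expects the status line to carry a reason phrase.
- If a header name repeats, it keeps only the last value.

```python
import socket
from typing import Dict, Tuple

def build_request(path: str, host: str) -> bytes:
    """Build the same minimal GET request typed into telnet above."""
    # Request line (method, URI, protocol), then the Host header HTTP/1.1
    # requires, then an empty line: the "hit Enter twice" step.
    # HTTP lines are terminated by CRLF.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")

def fetch_head(host: str, path: str = "/", port: int = 80) -> str:
    """Send the request and return the status line plus headers as text."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(path, host))
        data = b""
        # Read until the blank line that separates headers from the body.
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            data += chunk
    head = data.split(b"\r\n\r\n", 1)[0]
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    return head.decode("iso-8859-1")

def parse_head(raw: str) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a raw response into (version, status code, reason, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]  # drop the body, if present
    status_line, *header_lines = head.split("\r\n")
    # e.g. "HTTP/1.1 200 OK" -> "HTTP/1.1", "200", "OK"
    version, code, reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        # Split on the first colon only: values such as Date contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive in HTTP, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers

def chunk_size(size_line: str) -> int:
    """Chunk lengths in a chunked body are hexadecimal, e.g. "1e4c"."""
    return int(size_line.strip(), 16)
```

Calling `print(parse_head(fetch_head("www.sheldoncomics.com", "/store")))` should show a status code and headers, though the live server will very likely answer differently today than it did in 2008. Applied to the transcript above, `parse_head` reports status 200 with `headers["server"] == "Apache/2.2.10 (Unix)"`. `chunk_size("1e4c")` is 7756, the length in bytes of the first chunk of HTML.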
Now let’s look at what happens when we request a directory without a trailing slash:
```
GET /store HTTP/1.1
Host: www.sheldoncomics.com
```
This time the response from the server is different:
```
HTTP/1.1 302 Found
Date: Sat, 15 Nov 2008 18:40:42 GMT
Server: Apache/2.2.10 (Unix)
Location: http://www.sheldoncomics.com/store/
Content-Length: 334
Content-Type: text/html; charset=iso-8859-1
...
```
This is what’s known as a 302 redirect. It was originally implemented so that webmasters could change the structure of their website and redirect visitors from the old (perhaps bookmarked) page to the new page that has the same content. This lets you rename your HTML files, or swap out whatever dynamic back-end you’re using, without worrying about visitors getting lost during the changes.
Now, because our request URI didn’t include a trailing slash, Apache couldn’t find what we were looking for: it thought we were requesting a file, not a directory. Before issuing a 404 (Not Found), Apache tried to be nice and checked, just in case, whether a directory matched the requested URI. In this case one did, so Apache automatically issued a 302 telling us the location of what we were really looking for. That is why its response includes a “Location: ” header.
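The decision the server makes here can be modeled in a few lines. This is a toy model, not Apache’s actual code. The functions `resolve` and `requests_needed` are hypothetical, directories are listed with their trailing slash, and the 302 status simply mirrors the transcript above. What the model makes visible is the cost of the missing slash: the browser has to send a second request to the `Location` it was given.

```python
from typing import Optional, Set, Tuple
from urllib.parse import urlsplit

def resolve(path: str, files: Set[str], dirs: Set[str],
            host: str) -> Tuple[int, Optional[str]]:
    """Toy model of the lookup described above: returns (status, Location)."""
    if path in files or path in dirs:
        return 200, None  # a file, or a directory requested with its slash
    if path + "/" in dirs:
        # No such file, but a directory of that name exists: redirect to it.
        return 302, f"http://{host}{path}/"
    return 404, None

def requests_needed(path: str, files: Set[str], dirs: Set[str],
                    host: str) -> Tuple[int, int]:
    """Follow redirects like a browser; return (final status, request count)."""
    for count in range(1, 6):  # stop after 5 requests to avoid redirect loops
        status, location = resolve(path, files, dirs, host)
        if status != 302 or location is None:
            return status, count
        path = urlsplit(location).path  # the browser re-requests the Location
    return status, count
```

With `files = {"/index.html"}` and `dirs = {"/", "/store/"}`, `requests_needed("/store", ...)` returns `(200, 2)`, while `requests_needed("/store/", ...)` returns `(200, 1)`.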
So why use trailing slashes when linking to directories? Because the 302 is not supposed to compensate for an incorrectly structured link. This, thinking back to reason number 1 above, is the correct way to do it. Also, thinking back to number 2 above, it keeps Apache from doing unnecessary work and saves the browser a second request to follow the redirect. And now for your new reason number 3: it’s good for the search engines. Proper structure – everything from proper and valid (X)HTML to correctly formed links – is more easily digested by search engine spiders. If you want good placement, you should make it as easy as possible for the engines to crawl your site. A trailing slash may seem insignificant, but it’s easy to do and makes web servers happy :-) Enjoy!
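If your links come out of a script or template, you can automate the fix. Below is a sketch of a hypothetical helper, `with_directory_slash`. It assumes you already know which paths on your site are directories, passed in as a set written with trailing slashes. It appends the slash to links that point at those directories and leaves everything else untouched, including query strings and fragments.

```python
from typing import Set
from urllib.parse import urlsplit, urlunsplit

def with_directory_slash(url: str, directories: Set[str]) -> str:
    """Append a trailing slash to links that point at a known directory."""
    parts = urlsplit(url)
    path = parts.path
    if not path and parts.netloc:
        path = "/"  # a bare host such as http://example.com means the root
    if path and not path.endswith("/") and path + "/" in directories:
        path += "/"
    # Reassemble the URL, keeping scheme, host, query and fragment as they were.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, with `directories = {"/", "/store/"}`, the link `/store?page=2` becomes `/store/?page=2`, while `/about.html` is returned unchanged.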
Oh, by the way, you really should check out http://www.sheldoncomics.com/, a masterfully written and illustrated comic by Dave Kellett about a boy genius, his talking duck, and their adventures in life. I’ve been following this strip for about six years now, and I abusively used his server in my examples for this article. Tell Dave I said hello when you stop by.