[Reader-list] URLs exposing server architecture
Kiran Jonnalagadda
jace at pobox.com
Wed Mar 23 18:06:39 IST 2005
Follow-up to my last post on The URL as User Interface:
http://mail.sarai.net/pipermail/reader-list/2005-February/005074.html
Part 3: URLs exposing server architecture.
An HTML file typically carries an extension of ".html". However,
consider these examples:
1. http://www.bbc.co.uk/worldservice/index.shtml
2. http://www.royal.gov.uk/output/Page1.asp
3. http://www.xanga.com/register.aspx
4. http://gimp-print.sourceforge.net/MacOSX.php3
5. http://www.fanniemae.com/index.jhtml
6. http://www.poets.org/index.cfm
7. http://squishdot.org/987802018/index_html
8. http://www.telegram.com/apps/pbcs.dll/frontpage
Each of these has a different extension that reveals the technology
platform in use. In order: Apache Server Side Includes, Microsoft ASP,
ASP.NET, PHP 3, Java, ColdFusion, Zope and Windows Dynamic Link
Libraries. The trouble with including such a blatant platform signature
in the URL is that, should you switch to a different platform, all
your URLs change. Some platforms, like Zope, are insensitive to file
extensions: you can use whatever you want and it will still work. (In a
case of taking this insensitivity too far, Zope is littered with
index_html URLs.) Others, like Apache-based platforms, can be configured
to use different extensions, but this typically requires a system-wide
configuration change that your ISP may not be willing to make for you.
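As a rough illustration of how much the extension gives away, here is a
minimal Python sketch. The extension table is drawn only from the
examples above, and the function names platform_hint and neutral_url
are my own, not part of any library. It does not catch Zope's
index_html, which is a naming habit rather than an extension.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Illustrative table built from the examples in this post; not exhaustive.
PLATFORM_EXTENSIONS = {
    ".shtml": "Apache Server Side Includes",
    ".asp": "Microsoft ASP",
    ".aspx": "ASP.NET",
    ".php3": "PHP 3",
    ".jhtml": "Java",
    ".cfm": "ColdFusion",
    ".dll": "Windows Dynamic Link Library",
}


def platform_hint(url):
    """Return the platform that a URL's path extension reveals, or None."""
    for segment in urlsplit(url).path.split("/"):
        ext = posixpath.splitext(segment)[1].lower()
        if ext in PLATFORM_EXTENSIONS:
            return PLATFORM_EXTENSIONS[ext]
    return None


def neutral_url(url):
    """Drop a known platform extension from the last path segment."""
    parts = urlsplit(url)
    root, ext = posixpath.splitext(parts.path)
    if ext.lower() in PLATFORM_EXTENSIONS:
        parts = parts._replace(path=root)
    return urlunsplit(parts)


print(platform_hint("http://www.poets.org/index.cfm"))
print(neutral_url("http://www.xanga.com/register.aspx"))
```

A URL like the one neutral_url returns would survive a platform switch,
provided the server is set up to map it to whatever handler is current.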
It is best to avoid identifying the platform in your URLs. These
examples are even worse:
1. http://www.amazon.com/exec/obidos/subst/home/home.html/104-0744072-3248744
2. http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa
3. http://plone.org/search?SearchableText=plone&b_start:int=30
4. http://www.telegraph.co.uk/news/main.jhtml?xml=/news/2005/01/30/wgerm30.xml
Notice that all URLs at amazon.com begin with "/exec/obidos", making
that part of the URL semantically meaningless cruft. Further,
home.html is followed by a slash and another path
component. This breaks the file and folder hierarchy that the Web is
built around. Browsers expect that folders contain other folders and
files, and that files contain no sub-items. This is required for links
with relative references to resolve properly. (When folder/a.html links
to b.html, is it referring to folder/b.html or folder/a.html/b.html?)
When a path component can behave like a file at times and a folder at
other times, it risks confusing the browser. (Zope's object database
also has this problem. Zope solves it by inserting a base href tag in
all HTML pages. This works well but is not an elegant solution.)
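The ambiguity is easy to see with Python's standard urljoin, which
resolves relative references in the usual way. This is only a sketch:
example.com is a placeholder host and the variable names are mine.

```python
from urllib.parse import urljoin

# The same path component, once treated as a file and once as a folder.
as_file = "http://example.com/folder/a.html"
as_folder = "http://example.com/folder/a.html/"

# Without a trailing slash, the last component is replaced.
print(urljoin(as_file, "b.html"))
# -> http://example.com/folder/b.html

# With a trailing slash, the same name is treated as a folder.
print(urljoin(as_folder, "b.html"))
# -> http://example.com/folder/a.html/b.html

# The Amazon URL from the list above: a relative link replaces the
# trailing component and lands "inside" home.html.
amazon = ("http://www.amazon.com/exec/obidos/subst/home/"
          "home.html/104-0744072-3248744")
print(urljoin(amazon, "b.html"))
# -> http://www.amazon.com/exec/obidos/subst/home/home.html/b.html
```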
The second is the home page of Apple's online store, listing all their
products. It seems like a simple matter to copy a link to any of the
products listed there, but you'll find the link does not work when used
anywhere else. Jim Roepcke deconstructs the Apple Store URL [1] to find
that this is because it includes session-related data that is not valid
for anyone but the user it was generated for.
[1] http://jim.roepcke.com/1721
The third exhibits a characteristic of the Zope platform (which Plone
is built on). The "b_start:int" in the URL signifies that b_start is an
integer parameter. Zope includes several others like ":list" and
":tokens". These are matters of internal architecture and should not
appear in the URL.
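One way to keep type information out of the URL is to hold it on the
server. The Python sketch below is not how Zope works; PARAM_TYPES and
read_params are hypothetical names of mine, and the clean query string
is an invented alternative to the Plone one.

```python
from urllib.parse import parse_qs

zope_query = "SearchableText=plone&b_start:int=30"   # from the Plone URL
clean_query = "SearchableText=plone&b_start=30"      # hypothetical cleaner form

# Hypothetical server-side schema: the type lives here, not in the URL.
PARAM_TYPES = {"b_start": int}


def read_params(query):
    """Parse a query string, converting values according to PARAM_TYPES."""
    result = {}
    for name, values in parse_qs(query).items():
        convert = PARAM_TYPES.get(name, str)
        result[name] = convert(values[0])
    return result


# A generic parser sees the type suffix as part of the parameter name.
print(read_params(zope_query))
# The clean form yields a real integer without exposing the type.
print(read_params(clean_query))
```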
The final example, from the UK Telegraph, is rather interesting.
main.jhtml is
taking a parameter that appears to refer to a file on disk. What if you
change the path and make it read another file, one that was not
supposed to be shown to the public? This may seem a humorous hack, but
it could be worse. Philip Greenspun describes a case of Harvard
Business School rejecting 119 applicants [2] who edited a URL to check
their application status.
[2] http://blogs.law.harvard.edu/philg/2005/03/08#a7726
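I do not know how the Telegraph handles that parameter. If a server
must accept a file path from the URL, a check along these lines is one
common precaution. This is a Python sketch under my own assumptions:
resolve_public_file and the content directory are hypothetical, and
the code assumes POSIX-style paths.

```python
import os


def resolve_public_file(base_dir, requested):
    """Map a requested path to a file under base_dir.

    Returns the resolved path, or None if the request would escape
    base_dir (for example through ".." components or a symlink).
    """
    base = os.path.realpath(base_dir)
    # Treat a leading slash as relative to base_dir, not the filesystem root.
    candidate = os.path.realpath(os.path.join(base, requested.lstrip("/")))
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate


# Hypothetical content directory; the first request stays inside it,
# the second tries to climb out and is refused.
print(resolve_public_file("/srv/content", "/news/2005/01/30/wgerm30.xml"))
print(resolve_public_file("/srv/content", "../../etc/passwd"))
```

A check like this guards only against reading files outside the
content directory. It does nothing for unpublished files that sit
inside it, which is another reason to keep file paths out of the URL.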
Further Reading
---------------
Matthew P. Thomas documents cruft in URLs generated by various
weblogging systems:
http://mpt.phrasewise.com/2003/07/26#a534
Mark Pilgrim documents the process to make Movable Type generate
cruft-free URLs (warning! technical jargon):
http://diveintomark.org/archives/2003/08/15/slugs
Nathan Ashby-Kuhlman presents more real world examples:
http://www.ashbykuhlman.net/blog/2003/07/27/2227
http://www.ashbykuhlman.net/blog/2003/08/02/2224
Conclusion
----------
We have looked at various ways to construct a URL and the roles they
serve. If there is still any doubt about how URLs are relevant to
community, the answer is simple. To discuss the content of any web page, you
need a URL that can be shared. Without a URL, you are left attempting
to reproduce the content (which may be non-trivial for graphical or
Flash content), and have no reference that others can visit. A simple
URL is friendlier, and therefore a better URL.
My next few posts will explore the human side of the UI-Community
linkup.
--
Kiran Jonnalagadda
http://www.pobox.com/~jace