12.31.07

RFC2396: Per-Path-segment URI Parameters.

Posted in Coding, Protocols at 4:22 pm by elliot

OK, just a short rant.

RFC2396 specifies support for “URI parameters” within segments of a URI “path”.  For those of you who don’t enjoy reading RFCs with your morning coffee, URI parameters are the “;SESSION=f12aa”-looking thingys that sometimes appear in HTTP URLs.  Yahoo and some other well-known websites use them, but most don’t.

I see the usefulness of URI parameters, but making them allowed on a per-segment basis just seems like overkill to me.  I’ve also seen no production environment that makes use of per-segment parameters (if you have, let me know!).

With per-segment URI params, you can do lovely stuff like this:

/foo;blah=x/bar;blah2=y/abc.txt;blah3=z

Which converts down to an actual URI path of:

/foo/bar/abc.txt (params: blah=x, blah2=y, blah3=z)

sheesh!

Looking through URI parsing routines in some common open source code, it seems not everybody is handling this scenario.  RFC2396 (Section 3.3) states that it’s allowed, though it doesn’t really expand on it with any meaningful examples.  Many folks who parse URI params seem to assume they can only appear on the last path segment, but this is definitely not the case.
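To make the difference concrete, here is a minimal Python sketch of splitting parameters off every path segment rather than just the last one. The function name split_segment_params and its return shape are made up for illustration (this is not the AlchemyPoint code), and it assumes the input is only the path component, with the query string already removed and no percent-decoding applied:

```python
def split_segment_params(path):
    """Return (clean_path, params) for a URI path with per-segment parameters.

    params is a list of (segment, name, value) tuples in order of appearance.
    Hypothetical helper for illustration only.
    """
    clean_segments = []
    params = []
    for segment in path.split("/"):
        # Everything before the first ';' is the segment itself;
        # each following ';'-separated chunk is one parameter.
        name, *raw_params = segment.split(";")
        clean_segments.append(name)
        for raw in raw_params:
            if raw:  # skip empty chunks such as a trailing ';'
                key, _, value = raw.partition("=")
                params.append((name, key, value))
    return "/".join(clean_segments), params


clean, params = split_segment_params("/foo;blah=x/bar;blah2=y/abc.txt;blah3=z")
print(clean)   # /foo/bar/abc.txt
print(params)  # [('foo', 'blah', 'x'), ('bar', 'blah2', 'y'), ('abc.txt', 'blah3', 'z')]
```

For comparison, Python's own urllib.parse.urlparse documents its params field as the parameters for the last path element, so it leaves ;blah=x and ;blah2=y sitting inside the path. That is exactly the last-segment-only assumption described above.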

This sort of encoding/decoding tedium reminds me of my days building Network Intrusion Detection Systems, where even a slight encoding error can have truly disastrous results (see the ’98 Ptacek/Newsham paper on NIDS Insertion/Evasion for the reasons why).  I’ve gone ahead and implemented per-path-segment URI parameter support in our AlchemyPoint core code, but I shudder to think of how many HTTP application-layer proxies, firewalls, and NIDS aren’t handling this sort of thing correctly.

12.19.07

A Car In Every Garage..

Posted in API, Orchestr8, Scraping at 12:42 pm by elliot

And a Content Scraping API On Every Desktop :)