URLDecoder framework for Webware

The current implementation

The code is here: urldecode.tar.gz.

This code has progressed a lot since the discussion below. It works in an old, slightly hacked version of Webware, but it needs a thorough review and a good unit test before it is ready for prime time.

The original design thought-process

From the discussion, it seems desirable to have more than one way to
decode a URL into a server-side path.  The suggestion to subclass
Application and override serverSidePathForRequest seems more complex
than it needs to be.  Applying the Strategy [1] pattern gives an easy
way to encapsulate the behavior and decouple it from the Application
class.  After a long design journey, I came up with 21 short, numbered
pieces of pseudocode that form the basis of a very flexible design.
(For the impatient: see {10} - {21} below.)  After some feedback from
the list, I will proceed to implement this design and make it available
to whoever is interested.

What remains to be solved is the creation of these strategies.
Application.addContext() may be a good place to add them to Application.



Here we go...

First Attempt: Apply Strategy: move the decoding behavior into a
separate class.

{1}

class URLDecoder:
    ...
    def decodeURL( self, pathinfo, request ):
        """return the server side path of the script for the URL

        May produce side effects in request.

        Return value may be None if pathinfo is not in the domain
        of this decoder.
        """
        ...

Now we can say

{2}
    def serverSidePathForRequest( self, pathinfo, request ):
        return self.decoder.decodeURL( pathinfo, request )
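A minimal runnable sketch of {1} and {2} in modern Python. `DocumentRootDecoder` is a hypothetical strategy that maps every absolute URL path under one document root. `Application` here is a stand-in that has only the delegating method; it is not Webware's real class.

```python
import posixpath


class URLDecoder:
    """The strategy interface of {1}."""

    def decodeURL(self, pathinfo, request):
        raise NotImplementedError


class DocumentRootDecoder(URLDecoder):
    """Hypothetical strategy: every absolute URL path maps under one root.

    No '..' normalization is done; a real decoder must handle that.
    """

    def __init__(self, root):
        self.root = root

    def decodeURL(self, pathinfo, request):
        if not pathinfo.startswith("/"):
            return None  # not in the domain of this decoder
        return posixpath.join(self.root, pathinfo.lstrip("/"))


class Application:
    """Stand-in for Webware's Application; only the delegation matters."""

    def __init__(self, decoder):
        self.decoder = decoder

    def serverSidePathForRequest(self, pathinfo, request):
        return self.decoder.decodeURL(pathinfo, request)


app = Application(DocumentRootDecoder("/var/www"))
print(app.serverSidePathForRequest("/Examples/Welcome.py", {}))
# /var/www/Examples/Welcome.py
```

To change how URLs are decoded, you swap the decoder object. Application itself does not change.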


Problem:  If decodeURL has side-effects, the resulting path is not
cacheable.  As the path lookup may be expensive, caching is desirable.
Furthermore, caching should be independent of the concrete decoding
strategy.

Solution: Apply Memento [2]

{3}
class URLDecoder:
    ...
    def getURLDecoding( self, pathinfo, request ):
        """return a callable object that will decode the URL

        When called with (pathinfo, request), the returned object
        returns the server-side path of the script for the URL and
        applies any side effects.

        Return value may be None if pathinfo is not in the domain
        of this decoder.
        """
        ...

Now we can say

{4}
    def serverSidePathForRequest( self, pathinfo, request ):
        decoding = self.decoder.getURLDecoding( pathinfo, request )
        return decoding( pathinfo, request )
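The following sketch shows the Memento idea from {3} and {4}. It assumes a hypothetical `LanguagePrefixDecoder` that turns `/de/Welcome` into a path plus a `lang` field, and it uses a plain dict to stand in for the request. The returned object holds no per-request state, so the same decoding can be replayed against different requests. That is what makes it cacheable.

```python
class PathWithField:
    """Hypothetical memento: a path plus one field to set on the request."""

    def __init__(self, path, name, value):
        self.path = path
        self.name = name
        self.value = value

    def __call__(self, pathinfo, request):
        request[self.name] = self.value  # the side effect, replayed per call
        return self.path


class LanguagePrefixDecoder:
    """Hypothetical strategy: '/de/Welcome' -> '/site/Welcome.py', lang='de'."""

    def getURLDecoding(self, pathinfo, request):
        parts = pathinfo.strip("/").split("/", 1)
        if len(parts) != 2 or parts[0] not in ("en", "de"):
            return None  # not in the domain of this decoder
        lang, page = parts
        return PathWithField("/site/%s.py" % page, "lang", lang)


class Application:
    def __init__(self, decoder):
        self.decoder = decoder

    def serverSidePathForRequest(self, pathinfo, request):  # as in {4}
        decoding = self.decoder.getURLDecoding(pathinfo, request)
        return decoding(pathinfo, request)


app = Application(LanguagePrefixDecoder())
request = {}
path = app.serverSidePathForRequest("/de/Welcome", request)
print(path, request)  # /site/Welcome.py {'lang': 'de'}

# The memento holds no per-request state, so it can be replayed later.
decoding = LanguagePrefixDecoder().getURLDecoding("/en/Welcome", None)
first, second = {}, {}
decoding("/en/Welcome", first)
decoding("/en/Welcome", second)
```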

Or we can cache the result

{5}
    def serverSidePathForRequest( self, pathinfo, request ):
        cache = self.URL_decoding_cache

        if pathinfo in cache:
            decoding = cache[pathinfo]
        else:
            decoding = self.decoder.getURLDecoding( pathinfo, request )
            cache[pathinfo] = decoding

        return decoding( pathinfo, request )

Or better yet, we can encapsulate even the caching behavior

{6}
class CachingURLDecoder(URLDecoder):

    def __init__( self, decoder ):
        self.decoder = decoder  # the URLDecoder whose results are cached
        self.cache = {}         # pathinfo -> decoding

    def getURLDecoding( self, pathinfo, request ):

        if pathinfo in self.cache:
            decoding = self.cache[pathinfo]
        else:
            decoding = self.decoder.getURLDecoding( pathinfo, request )
            self.cache[pathinfo] = decoding

        return decoding

and use {4} where decoder happens to be an instance of CachingURLDecoder.
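This self-contained sketch checks that {6} really keeps the expensive lookup out of the hot path. `CountingDecoder` is a hypothetical strategy that records how often it is consulted. `SimplePath` is the side-effect-free memento from {19`}, written with `__call__` so that the decoding can be called directly, as in {4}.

```python
class SimplePath:
    """Memento with no side effects: always decodes to the same path."""

    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class CountingDecoder:
    """Hypothetical expensive strategy that counts how often it is asked."""

    def __init__(self):
        self.calls = 0

    def getURLDecoding(self, pathinfo, request):
        self.calls += 1
        return SimplePath("/srv" + pathinfo)


class CachingURLDecoder:
    """Wraps another decoder and memoizes its decodings by pathinfo."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.cache = {}

    def getURLDecoding(self, pathinfo, request):
        if pathinfo in self.cache:
            decoding = self.cache[pathinfo]
        else:
            decoding = self.decoder.getURLDecoding(pathinfo, request)
            self.cache[pathinfo] = decoding
        return decoding


inner = CountingDecoder()
decoder = CachingURLDecoder(inner)
for _ in range(3):
    decoding = decoder.getURLDecoding("/index.py", {})
print(decoding("/index.py", {}), inner.calls)  # /srv/index.py 1
```

The wrapper offers exactly the interface of the decoder it wraps, so callers cannot tell the difference. In Design Patterns terms, the caching wrapper is a Decorator as well as a Strategy.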


But what if some decodings should not be cached? Let's extend the
interface given in {3}.

{7}
class URLDecoder:
    ...
    def getURLDecoding( self, pathinfo, request ):
        """return a pair (decoding, cacheable)

        decoding is a callable object that will return the server
        side path of the script for the URL, and apply any side-effects.

        if cacheable is true, the decoding may be cached.

        decoding may be None if pathinfo is not in the domain
        of this decoder.
        """
        ...

{8}
class CachingURLDecoder(URLDecoder):

    def getURLDecoding( self, pathinfo, request ):

        if pathinfo in self.cache:
            decoding = self.cache[pathinfo]
        else:
            decoding,cacheable = self.decoder.getURLDecoding(
                pathinfo, request )
            if cacheable:
                self.cache[pathinfo] = decoding

        return decoding, False # no point in caching a cache

{9}
class Application:
    ...
    def serverSidePathForRequest( self, pathinfo, request ):
        decoding = self.decoder.getURLDecoding( pathinfo, request )[0]
        return decoding( pathinfo, request )
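The cacheable flag exists because some decodings depend on more than pathinfo. This sketch assumes a hypothetical `AdminAwareDecoder` whose `/admin/` paths depend on a `user` entry in the request. Its `CachingURLDecoder` follows {8} but returns a `(decoding, cacheable)` pair, which is what the `[0]` in {9} expects.

```python
class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class AdminAwareDecoder:
    """Hypothetical strategy: /admin/ pages depend on the logged-in user."""

    def getURLDecoding(self, pathinfo, request):
        if pathinfo.startswith("/admin/"):
            # depends on request state, not just pathinfo: never cache it
            return SimplePath("/srv/admin/%s.py" % request["user"]), False
        return SimplePath("/srv" + pathinfo), True


class CachingURLDecoder:
    """Like {8}, but returning a (decoding, cacheable) pair."""

    def __init__(self, decoder):
        self.decoder = decoder
        self.cache = {}

    def getURLDecoding(self, pathinfo, request):
        if pathinfo in self.cache:
            decoding = self.cache[pathinfo]
        else:
            decoding, cacheable = self.decoder.getURLDecoding(pathinfo, request)
            if cacheable:
                self.cache[pathinfo] = decoding
        return decoding, False  # no point in caching a cache


decoder = CachingURLDecoder(AdminAwareDecoder())
alice = decoder.getURLDecoding("/admin/home", {"user": "alice"})[0]
bob = decoder.getURLDecoding("/admin/home", {"user": "bob"})[0]
print(alice("/admin/home", {}), bob("/admin/home", {}))
# /srv/admin/alice.py /srv/admin/bob.py
decoder.getURLDecoding("/index.py", {"user": "alice"})
print(sorted(decoder.cache))  # ['/index.py']
```

If the `/admin/` decoding had been cached by pathinfo alone, bob would have been sent to alice's page.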



We might add one more feature to our caching scheme. In a long running
process, we might want to expire some decodings from the cache.

{10}
class URLDecoder:
    ...
    def getURLDecodingAndTTL( self, pathinfo, request ):
        """return a pair (decoding, time_to_live)

        decoding is a callable object that will return the server
        side path of the script for the URL, and apply any side-effects.

        time_to_live is the maximum number of milliseconds that decoding
        should be cached.  -1 indicates INFINITY.

        decoding may be None if pathinfo is not in the domain
        of this decoder.
        """
        ...

{11}
class CachingURLDecoder(URLDecoder):

    def getURLDecodingAndTTL( self, pathinfo, request ):

        if pathinfo in self.cache:
            decoding = self.cache[pathinfo]
        else:
            decoding,ttl = self.decoder.getURLDecodingAndTTL(
                pathinfo, request )
            if ttl < 0: # ignore all TTLs except INFINITY
                self.cache[pathinfo] = decoding

        return decoding, 0 # no point in caching a cache

{12}
class Application:
    ...
    def serverSidePathForRequest( self, pathinfo, request ):
        decoding = self.decoder.getURLDecodingAndTTL(
            pathinfo, request )[0]
        return decoding( pathinfo, request )


{11} is still the simple kind of cache but it does conform to the
docstring in {10}.  A more sophisticated cache is left as an exercise.
Is this change worthwhile?  "You Aren't Gonna Need It" [3]
says no, but see {21} and comments for why we might need it.
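Here is one possible answer to the exercise: a cache that honors the TTL from {10} instead of ignoring finite ones. The sketch rests on several assumptions:

- times are in milliseconds, as the docstring says;
- any negative TTL means forever, as in {11};
- expired entries are evicted lazily on lookup;
- the clock is injectable, so the example runs without sleeping.

`TTLCachingURLDecoder` and `FixedTTLDecoder` are hypothetical names.

```python
import time


class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class TTLCachingURLDecoder:
    """Caches each decoding for as long as the wrapped decoder allows.

    ttl is in milliseconds: negative means forever, 0 means do not cache.
    """

    def __init__(self, decoder, clock=time.monotonic):
        self.decoder = decoder
        self.clock = clock  # returns seconds
        self.cache = {}     # pathinfo -> (decoding, expiry time or None)

    def getURLDecodingAndTTL(self, pathinfo, request):
        entry = self.cache.get(pathinfo)
        if entry is not None:
            decoding, expiry = entry
            if expiry is None or self.clock() < expiry:
                return decoding, 0  # no point in caching a cache
            del self.cache[pathinfo]  # expired: evict and decode again
        decoding, ttl = self.decoder.getURLDecodingAndTTL(pathinfo, request)
        if ttl < 0:
            self.cache[pathinfo] = (decoding, None)
        elif ttl > 0:
            self.cache[pathinfo] = (decoding, self.clock() + ttl / 1000.0)
        return decoding, 0


class FixedTTLDecoder:
    """Hypothetical inner decoder: every decoding may live for 500 ms."""

    def __init__(self):
        self.calls = 0

    def getURLDecodingAndTTL(self, pathinfo, request):
        self.calls += 1
        return SimplePath("/srv" + pathinfo), 500


now = [0.0]  # a fake clock, in seconds
inner = FixedTTLDecoder()
decoder = TTLCachingURLDecoder(inner, clock=lambda: now[0])
decoder.getURLDecodingAndTTL("/a.py", {})  # miss: asks inner
now[0] = 0.4
decoder.getURLDecodingAndTTL("/a.py", {})  # hit: still fresh
now[0] = 0.6
decoder.getURLDecodingAndTTL("/a.py", {})  # expired: asks inner again
print(inner.calls)  # 2
```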


Now let's look at some consequences.  Here are some possible URLDecoder
strategies:

{13}
class SimpleURLDecoder:
    # implements the original algorithm, looking in the filesystem
    ...

{14}
class ExtraPathInfoURLDecoder:
    # implements the current extra path info algorithm
    ...
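The following sketch shows one way an extra-path-info strategy could work. It tries the longest URL prefix first. The first prefix that names an existing file becomes the servlet path, and the remainder of the URL is recorded in the request. This is a generic longest-prefix version and not necessarily Webware's exact algorithm. The `extraPathInfo` field name, the 60-second TTL and the injectable `exists` function are all assumptions.

```python
import os.path
import posixpath


class PathAndFields:
    """Memento as in {20}; fields is a dict here (an assumption)."""

    def __init__(self, path, fields):
        self.path = path
        self.fields = fields

    def __call__(self, pathinfo, request):
        request.update(self.fields)
        return self.path


class ExtraPathInfoURLDecoder:
    """Hypothetical sketch: the longest URL prefix naming an existing file
    is the servlet; the rest becomes an 'extraPathInfo' field."""

    TTL = 60000  # ms; files come and go, so do not cache forever (arbitrary)

    def __init__(self, root, exists=os.path.isfile):
        self.root = root
        self.exists = exists  # injectable, so the sketch runs without files

    def getURLDecodingAndTTL(self, pathinfo, request):
        # NOTE: no '..' handling; a real decoder must reject such paths
        parts = [p for p in pathinfo.split("/") if p]
        for i in range(len(parts), 0, -1):
            candidate = posixpath.join(self.root, *parts[:i])
            if self.exists(candidate):
                extra = "/" + "/".join(parts[i:]) if i < len(parts) else ""
                return PathAndFields(candidate, {"extraPathInfo": extra}), self.TTL
        return None, 0  # no prefix names a file: not in this decoder's domain


decoder = ExtraPathInfoURLDecoder("/srv", exists={"/srv/Blog.py"}.__contains__)
decoding, ttl = decoder.getURLDecodingAndTTL("/Blog.py/2003/05", {})
request = {}
print(decoding("/Blog.py/2003/05", request), request)
# /srv/Blog.py {'extraPathInfo': '/2003/05'}
```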

{15}
class LocalRegularExpressionURLDecoder:
    # implements my translate_path algorithm
    ...

{16}
class GlobalRegularExpressionURLDecoder:
    # implements something like mod_rewrite
    ...
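Here is a sketch of what a mod_rewrite-like strategy ({16}) might look like. The rule format is an assumption: a list of `(pattern, template)` pairs, where named groups become request fields. `PathAndFields` follows {20} but takes a dict of fields.

```python
import re


class PathAndFields:
    """Memento as in {20}; fields is a dict here (an assumption)."""

    def __init__(self, path, fields):
        self.path = path
        self.fields = fields

    def __call__(self, pathinfo, request):
        request.update(self.fields)
        return self.path


class GlobalRegularExpressionURLDecoder:
    """Hypothetical mod_rewrite-like decoder.

    rules is a list of (pattern, template) pairs.  The first pattern that
    matches the whole pathinfo wins; the template may use \\g<name>
    backreferences, and named groups become request fields.
    """

    def __init__(self, rules):
        self.rules = [(re.compile(pattern), template) for pattern, template in rules]

    def getURLDecodingAndTTL(self, pathinfo, request):
        for pattern, template in self.rules:
            match = pattern.fullmatch(pathinfo)
            if match:
                path = match.expand(template)
                # rules are fixed and only pathinfo was consulted: cache forever
                return PathAndFields(path, match.groupdict()), -1
        return None, 0  # no rule matched: not in this decoder's domain


decoder = GlobalRegularExpressionURLDecoder([
    (r"/blog/(?P<year>\d{4})/(?P<slug>[\w-]+)", "/srv/Blog/Entry.py"),
    (r"/(?P<page>\w+)\.html", r"/srv/\g<page>.py"),
])
request = {}
decoding, ttl = decoder.getURLDecodingAndTTL("/blog/2003/url-decoder", request)
print(decoding("/blog/2003/url-decoder", request), request)
# /srv/Blog/Entry.py {'year': '2003', 'slug': 'url-decoder'}
```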

Since we now have several strategy objects, we can compose them:

CachingURLDecoder # see {11}

{17}
class PrefixURLDecoder:
    # easily supports the current Context system
    def __init__( self, contexts ):
        self.contexts = contexts  # maps a context name to a URLDecoder

    def getURLDecodingAndTTL( self, pathinfo, request ):
        prefix = pathinfo.split("/")[1]  # assumes pathinfo starts with "/"
        if prefix in self.contexts:
            decoder = self.contexts[prefix]
            return decoder.getURLDecodingAndTTL( pathinfo , request )
        return None, 0  # pathinfo is not in the domain of this decoder
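A runnable sketch of {17}. It assumes that each context maps to a hypothetical `ContextRootDecoder`, which serves the rest of the URL from one directory. A dict stands in for the request.

```python
class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class ContextRootDecoder:
    """Hypothetical per-context strategy: '/<context>/rest' -> root/rest."""

    def __init__(self, root):
        self.root = root

    def getURLDecodingAndTTL(self, pathinfo, request):
        parts = pathinfo.split("/", 2)  # ['', context, rest]
        rest = parts[2] if len(parts) == 3 else ""
        return SimplePath("%s/%s" % (self.root, rest)), -1


class PrefixURLDecoder:
    """Dispatches on the first path component, one decoder per context."""

    def __init__(self, contexts):
        self.contexts = contexts  # context name -> decoder

    def getURLDecodingAndTTL(self, pathinfo, request):
        parts = pathinfo.split("/")
        prefix = parts[1] if len(parts) > 1 else ""
        decoder = self.contexts.get(prefix)
        if decoder is None:
            return None, 0  # not in the domain of this decoder
        return decoder.getURLDecodingAndTTL(pathinfo, request)


decoder = PrefixURLDecoder({
    "Examples": ContextRootDecoder("/srv/Examples"),
    "Admin": ContextRootDecoder("/srv/Admin"),
})
decoding, ttl = decoder.getURLDecodingAndTTL("/Examples/Welcome.py", {})
print(decoding("/Examples/Welcome.py", {}))  # /srv/Examples/Welcome.py
print(decoder.getURLDecodingAndTTL("/Nope/page", {}))  # (None, 0)
```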

{18}
class MultiplexURLDecoder:
    # This is why None is so important.
    # MultiplexURLDecoder goes through a sequence of URLDecoders
    # until it finds one that returns something other than None.
    ...
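Here is a sketch of how {18} might be filled in. It uses a hypothetical `SuffixURLDecoder` that claims only paths ending in a given suffix. Because each decoder returns `None` for "not mine", the multiplexer can fall through to the next decoder.

```python
class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class SuffixURLDecoder:
    """Hypothetical strategy that only claims paths ending in one suffix."""

    def __init__(self, suffix, root):
        self.suffix = suffix
        self.root = root

    def getURLDecodingAndTTL(self, pathinfo, request):
        if not pathinfo.endswith(self.suffix):
            return None, 0  # not in this decoder's domain
        return SimplePath(self.root + pathinfo), -1


class MultiplexURLDecoder:
    """Ask each decoder in turn; the first non-None decoding wins."""

    def __init__(self, decoders):
        self.decoders = list(decoders)

    def getURLDecodingAndTTL(self, pathinfo, request):
        for decoder in self.decoders:
            decoding, ttl = decoder.getURLDecodingAndTTL(pathinfo, request)
            if decoding is not None:
                return decoding, ttl
        return None, 0  # nobody claimed this pathinfo


decoder = MultiplexURLDecoder([
    SuffixURLDecoder(".py", "/srv/servlets"),
    SuffixURLDecoder(".html", "/srv/static"),
])
decoding, ttl = decoder.getURLDecodingAndTTL("/about.html", {})
print(decoding("/about.html", {}))  # /srv/static/about.html
print(decoder.getURLDecodingAndTTL("/logo.gif", {}))  # (None, 0)
```

Passing through the winner's TTL is a simplification. The combined answer also depends on the earlier decoders continuing to decline. A stricter version would return the smallest TTL of all the decoders it consulted, treating -1 as infinite.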


Now let's look at some of the memento classes:

{19}
class SimplePath:
    # just a path: no side-effects
    def __init__(self,path):
        self.path = path

    def decodeURL( self, pathinfo, request ):
        return self.path



{20}
class PathAndFields:
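    # used by the ExtraPathInfo, LocalRegularExpression and
    # GlobalRegularExpression decoders ({14}-{16}) to record additions
    # to the request; fields is a sequence of (key, value) pairs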
    def __init__(self,path,fields):
        self.path = path
        self.fields = fields

    def decodeURL( self, pathinfo, request ):
        for key,value in self.fields:
            request[key]=value
        return self.path


{21}
class ExceptionalPath:
    """
    examples:
        NotFoundPath (raise 404)
        ServerError  (raise 5xx)
        AccessDenied (should not be cached: the cache key is pathinfo,
            but determining accessibility usually requires more
            information from the request)
        Redirect  ( raise HTTPRedirect( location ) )
    """
    def __init__( self, exception ):
        self.exception = exception

    def decodeURL( self, pathinfo, request ):
        raise self.exception

Caveat:  At this point we are not actually handling the request. We are
only decoding the URL.  The decoder should be careful to not do too much.

With ExceptionalPath, the time-to-live caching strategy becomes more
important. We can now cache negative information. This can save time
if it is cached, but it certainly should not live forever in most cases.
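Here is a sketch of {21} combined with negative caching. It assumes a hypothetical `HTTPNotFound` exception, defined locally rather than taken from Webware, and a decoder backed by a fixed table of pages. A miss produces an `ExceptionalPath` with a short, arbitrary TTL of 5 seconds. A cache that honors TTLs will therefore remember the 404 briefly, but not forever.

```python
class HTTPNotFound(Exception):
    """Hypothetical stand-in for a 404 exception."""


class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


class ExceptionalPath:
    """{21}: a memento that raises instead of returning a path."""

    def __init__(self, exception):
        self.exception = exception

    def __call__(self, pathinfo, request):
        # Drop any traceback left by an earlier raise, so a cached
        # instance does not carry frames from previous requests.
        raise self.exception.with_traceback(None)


class KnownPagesURLDecoder:
    """Hypothetical strategy backed by a fixed table of pages."""

    NEGATIVE_TTL = 5000  # ms: remember a miss briefly, never forever (arbitrary)

    def __init__(self, pages):
        self.pages = pages  # pathinfo -> server-side path

    def getURLDecodingAndTTL(self, pathinfo, request):
        if pathinfo in self.pages:
            return SimplePath(self.pages[pathinfo]), -1
        return ExceptionalPath(HTTPNotFound(pathinfo)), self.NEGATIVE_TTL


decoder = KnownPagesURLDecoder({"/": "/srv/Main.py"})
decoding, ttl = decoder.getURLDecodingAndTTL("/missing", {})
try:
    decoding("/missing", {})
except HTTPNotFound as exc:
    print("404 for", exc, "cached for", ttl, "ms")  # 404 for /missing cached for 5000 ms
```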

One small change: since our URLDecoder interface is so simple (one
method), we can simplify the clients by using a single callable object
(maybe a function, maybe a bound method, maybe a class) instead of an
object with one method.
Thus
    decoding = self.decoder.getURLDecodingAndTTL(pathinfo,request)[0]
becomes
    decoding = self.decoder(pathinfo,request)[0]
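The following sketch shows the callable-decoder variant: any callable that takes `(pathinfo, request)` and returns `(decoding, ttl)` will do. `examplesDecoder` and `LegacyDecoder` are hypothetical. The bound-method case shows that object-style decoders still plug in unchanged.

```python
class SimplePath:
    def __init__(self, path):
        self.path = path

    def __call__(self, pathinfo, request):
        return self.path


def examplesDecoder(pathinfo, request):
    """Hypothetical decoder written as a plain function."""
    if pathinfo.startswith("/Examples/"):
        return SimplePath("/srv" + pathinfo), -1
    return None, 0  # not in the domain of this decoder


class LegacyDecoder:
    """Hypothetical object-style decoder with the {10} method."""

    def getURLDecodingAndTTL(self, pathinfo, request):
        return SimplePath("/legacy" + pathinfo), -1


class Application:
    def __init__(self, decoder):
        self.decoder = decoder  # any callable (pathinfo, request) -> (decoding, ttl)

    def serverSidePathForRequest(self, pathinfo, request):
        decoding = self.decoder(pathinfo, request)[0]
        return decoding(pathinfo, request)


app = Application(examplesDecoder)
print(app.serverSidePathForRequest("/Examples/Welcome.py", {}))
# /srv/Examples/Welcome.py

# A bound method is a callable too, so object-style decoders still fit.
app2 = Application(LegacyDecoder().getURLDecodingAndTTL)
print(app2.serverSidePathForRequest("/x.py", {}))  # /legacy/x.py
```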

One more: I got the signature of serverSidePathForRequest() wrong,
but that is easy to fix.


-------------
[1] Gamma, Erich, et al. Design Patterns: Elements of Reusable
    Object-Oriented Software. Addison-Wesley, 1995. ISBN 0-201-63361-2. p. 315+

[2] Design Patterns. p. 283+

[3] http://c2.com/cgi/wiki?YouArentGonnaNeedIt
