unsafe.host

really quite safe!
March 27, 2018 at 10:00 PM

Efficient route matching for micro-frameworks

Route matching refers to the process of a web server (or front end framework, given the times) selecting the appropriate code path given a URL.

Route matching can be complex, as routes these days may contain arguments within their paths. Generally, frameworks use some form of regular expression, or abstraction on top of regexs, to represent these dynamic paths.

Right now, the general approach to route matching in micro-framworks like Flask seems to be to build a list of routes (usually via decorators) mapped to controllers, and then iterate through them checking for matches. When a match is found, arguments are extracted, and passed on to the appropriate controller.

This seems a bit naive to me. Let's see if we can't squeeze some better performance out of such a system.

A simple test

Say we have a list of regular expressions for matching routes (ignoring that in a real routing framework, they would each need to be tied to a handler).

expressions = [
    '^/user/(\w*)$',
    '^/user/(\w*)/posts$',
    '^/user/(\w*)/posts/(\w*)$',
    # ...
]

When running a match, a naive approach would simply iterate through them for each route and check against its expression.

for e in expressions:
    result = re.search(e, route)
    if result:
        return result.groups()

Timing a quick test on 300 routes with 300 unique requests, we see 0.232 actual user mode CPU seconds. This is pretty abysmal.

real  0m0.261s
user  0m0.232s
sys   0m0.022s

Switching to a slightly less naive, complied approach

compiled_expressions = map(lambda e: re.compile(e), expressions)

We see 0.154s

real  0m0.198s
user  0m0.154s
sys   0m0.028s

Not bad!

But I have a hunch we can actually do better.


Let's try something different. Instead of compiling each of our expressions, but still running our O(n) cost naive iteration, lets see if we can take advantage of some of our regex engine's features.

Python's regular expression library in particular supports the (?P<name>...) special sequence. Using this, we can compose all of our routes into a single regular expression. We simply have to assign each route an identifier, which we will use for our argument extraction and controller routing (a simple dict of functions would do nicely).

We compose our singular expression as follows:

(?<id_1>{exp_1}) | (?<id_2>{exp_2}) | ...

Trying this this out...

0.104s, including the overhead of building the expression!

real  0m0.132s
user  0m0.104s
sys   0m0.022s

Is this actually practical?

There exist some caveats with this technique, of course.

Still, 32.47% improvement per request from the non-naive approach is not bad in my opinion. Not bad at all...