Pooling

Author: Ludvig Ericson <ludvig circled-a lericson dot se>
See also: Pooling with pylibmc (this document, first revision)
See also: Pooling with pylibmc pt. 2 (follow-up)

Note

This was originally a blog post. Edited and provided here for your convenience.

I was discussing how to implement pooling for pylibmc when I realized what libmemcachedutil’s pooling is - or rather, what it isn’t.

It’s not a magical solution for doing things concurrently; it just helps you with thread-safety.

In Python, however, we’ve got the global interpreter lock, the GIL. This lock must always be held by the thread that is dealing with anything Python. The Python interpreter itself isn’t thread-safe; or rather, it is thread-safe only because of the GIL.

This means that whenever Python code is running, you can be sure to have exclusive access to all of Python’s memory (unless something is misbehaving). In turn, this means that the use case for libmemcachedutil in a Python library is rather slim.

An example with Werkzeug

This is a Werkzeug-based WSGI application which would be run in multiple threads concurrently and still not have issues with races:

# Configuration
n_threads = 12
mc_addrs = "10.0.1.1", "10.0.1.2", "10.0.1.3"
mc_pool_size = n_threads

# Application
import pylibmc
from contextlib import contextmanager
from pprint import pformat
from werkzeug.wrappers import Request, Response
from werkzeug.exceptions import NotFound

class ClientPool(list):
    @contextmanager
    def reserve(self):
        mc = self.pop()
        try:
            yield mc
        finally:
            self.append(mc)

mc = pylibmc.Client(mc_addrs)
mc_pool = ClientPool(mc.clone() for i in range(mc_pool_size))

@Request.application
def my_app(request):
    with mc_pool.reserve() as mc:
        key = request.path[1:].encode("ascii")
        val = mc.get(key)
        if val is None:
            return NotFound(key)
        return Response(pformat(val))

if __name__ == "__main__":
    from werkzeug.serving import run_simple
    run_simple("0.0.0.0", 5050, my_app)

It’s a fully functional example of how one could implement pooling with pylibmc, very much in the same way that people do with libmemcachedutil. Paste it into a script file and it runs out of the box.
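The list-based pool has one sharp edge worth seeing in isolation: list.pop() raises IndexError on an empty list, so the pool needs at least as many clients as there are concurrent threads (hence mc_pool_size = n_threads in the configuration). The sketch below reuses the same ClientPool class with a hypothetical FakeClient stand-in, which is not part of pylibmc, so that it runs without a memcached server.

```python
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for pylibmc.Client; not part of pylibmc."""

    def __init__(self, name):
        self.name = name


class ClientPool(list):
    """Same list-based pool as in the Werkzeug example."""

    @contextmanager
    def reserve(self):
        mc = self.pop()  # raises IndexError when the pool is empty
        try:
            yield mc
        finally:
            self.append(mc)  # always hand the client back


pool = ClientPool(FakeClient("client-%d" % i) for i in range(2))

with pool.reserve() as a:
    with pool.reserve() as b:
        # Both clients are checked out, so the pool is empty here.
        clients_left = len(pool)
        try:
            with pool.reserve():
                pass
            exhausted = False
        except IndexError:
            exhausted = True

# Leaving the with-blocks returned both clients to the pool.
clients_after = len(pool)
```

If a third thread could ever ask for a client while two are checked out, it would see that IndexError, which is one reason to prefer a pool that can block instead.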

FIFO-like pooling

The aforementioned type of pool is already implemented in pylibmc as pylibmc.ClientPool, with a couple of other bells & whistles as well as tests (hint: don’t implement it yourself). Its documentation speaks for itself:

class pylibmc.ClientPool(mc=None, n_slots=0)

Client pooling helper.

This is mostly useful in threaded environments, because a client isn’t thread-safe at all. Instead, what you want to do is have each thread use its own client, but you don’t want to reconnect these all the time.

The solution is a pool, and this class is a helper for that.

>>> from pylibmc.test import make_test_client
>>> mc = make_test_client()
>>> pool = ClientPool()
>>> pool.fill(mc, 4)
>>> with pool.reserve() as mc:
...     mc.set("hi", "ho")
...     mc.delete("hi")
... 
True
True

fill(mc, n_slots)

Fill n_slots of the pool with clones of mc.

reserve(block=False)

Context manager for reserving a client from the pool.

If block is true and the pool is exhausted, reserve waits for another thread to return a client to the pool before proceeding.

Usage is identical to what was demonstrated above, apart from initialization, which would look like this:

mc = pylibmc.Client(mc_addrs)
mc_pool = pylibmc.ClientPool(mc, mc_pool_size)
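To illustrate what the block argument to reserve() means in practice, here is a minimal sketch of a queue-backed pool. It is an imitation under stated assumptions, not pylibmc’s code: SketchPool and FakeClient are hypothetical names, and the only behaviour it aims to show is that a non-blocking reserve fails immediately on an exhausted pool while a blocking one waits for a client to be returned.

```python
import queue
import threading
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for a memcached client."""

    def clone(self):
        return FakeClient()


class SketchPool(queue.Queue):
    """Illustrative queue-backed pool; not pylibmc's implementation."""

    def fill(self, mc, n_slots):
        for _ in range(n_slots):
            self.put(mc.clone())

    @contextmanager
    def reserve(self, block=False):
        mc = self.get(block)  # raises queue.Empty if empty and not blocking
        try:
            yield mc
        finally:
            self.put(mc)


pool = SketchPool()
pool.fill(FakeClient(), 1)
events = []
held = threading.Event()     # set once the holder has the client
release = threading.Event()  # set when the holder may give it back


def holder():
    # Take the only client and keep it until told to let go.
    with pool.reserve():
        events.append("holder reserved")
        held.set()
        release.wait()


t = threading.Thread(target=holder)
t.start()
held.wait()

# Non-blocking: the pool is exhausted, so this fails right away.
try:
    with pool.reserve():
        pass
except queue.Empty:
    events.append("non-blocking reserve failed")

# Blocking: waits until the holder thread puts its client back.
release.set()
with pool.reserve(block=True):
    events.append("blocking reserve succeeded")

t.join()
```

Which mode suits an application depends on whether a request should fail fast or queue up when every client is busy.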

Thread-mapped pooling

Another possibility is to have a data structure that remembers which client belongs to which thread (i.e. a mapping keyed by thread ID or similar).

Each thread would reserve its client in the dict on each request. If none exists, it would clone a master instance. Again, the documentation:

class pylibmc.ThreadMappedPool(master)

Much like the ClientPool, helps you with pooling.

In a threaded environment, you’d most likely want to have a client per thread. And there’d be no harm in one thread keeping the same client at all times. So, why not map threads to clients? That’s what this class does.

If a client is reserved, this class checks for a key based on the current thread, and if none exists, clones the master client and inserts that key.

Of course, this requires that you let the pool know when a thread is done with its reserved instance, so relinquish must be called before thread exit.

>>> from pylibmc.test import make_test_client
>>> mc = make_test_client()
>>> pool = ThreadMappedPool(mc)
>>> with pool.reserve() as mc:
...     mc.set("hi", "ho")
...     mc.delete("hi")
... 
True
True

reserve()

Reserve a client.

Creates a new client based on the master client if none exists for the current thread.

relinquish()

Relinquish any reserved client for the current context.

Call this method before exiting any thread that might have used this pool.

A note on relinquishing

You must be sure to call ThreadMappedPool.relinquish() before exiting a thread that has used the pool, from that thread! Otherwise, some clients will never be reclaimed and you will have stale, useless connections.
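One way to make sure relinquish() always runs, and runs from the exiting thread itself, is to wrap the thread’s work in try/finally. The sketch below assumes a hypothetical SketchThreadPool keyed by threading.get_ident() and a hypothetical FakeClient, neither of which is part of pylibmc, so that it runs without a memcached server; with a real pylibmc.ThreadMappedPool the relinquish() call would sit in the same place.

```python
import threading
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for a memcached client."""

    def clone(self):
        return FakeClient()


class SketchThreadPool(dict):
    """Illustrative thread-keyed pool; not pylibmc's implementation."""

    def __init__(self, master):
        super().__init__()
        self.master = master

    @contextmanager
    def reserve(self):
        key = threading.get_ident()
        if key not in self:
            # First use from this thread: clone the master client.
            self[key] = self.master.clone()
        yield self[key]

    def relinquish(self):
        # Drop the current thread's client, if it has one.
        return self.pop(threading.get_ident(), None)


pool = SketchThreadPool(FakeClient())
seen = []  # (thread name, client) pairs, one per simulated request


def worker():
    name = threading.current_thread().name
    try:
        for _ in range(3):  # three simulated requests
            with pool.reserve() as mc:
                seen.append((name, mc))
    finally:
        # Runs in this thread, even if a request raised.
        pool.relinquish()


threads = [
    threading.Thread(target=worker, name="worker-%d" % i) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker reuses a single client across its requests, and the mapping is empty again once all workers have exited.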