Behaviors
libmemcached is a lot more flexible than python-memcached, and has provisions
for configuring so-called behaviors. pylibmc wraps these in a Python
interface.
Not all of the available behaviors make sense for Python, and some are hard to
make use of, so some behaviors have been intentionally hidden or exposed in
some other way (UDP and the binary protocol are examples of this).
Generally, a behavior’s value should be an integer. The exceptions are
hashing and distribution, which pylibmc translates to and from the C constants’
string equivalents, for readability.
Other than that, the behaviors are more or less one-to-one mappings of
libmemcached behavior constants.
- "hash"
- Specifies the default hashing algorithm for keys. See Hashing for more
information and possible values.
- "distribution"
- Specifies different means of distributing values to servers. See
Distribution for more information and possible values.
- "ketama"
- Setting this behavior to True is a shortcut for setting "hash" to
"md5" and "distribution" to "consistent ketama".
- "ketama_weighted"
- Exactly like the "ketama" behavior, but also enables the weighting
support.
- "ketama_hash"
- Sets the hashing algorithm for host mapping on the continuum. Possible values
include those for the "hash" behavior.
- "buffer_requests"
- Enabling buffered I/O causes commands to be buffered instead of being sent
immediately. Any action that gets data causes this buffer to be sent to the
remote connection. Quitting or closing the connection will also cause the
buffered data to be pushed to the remote connection.
- "cache_lookups"
- Enables the named lookups cache, so that DNS lookups are made only once.
- "no_block"
- Enables asynchronous I/O. This is the fastest transport available for storage
functions.
- "failure_limit"
- Setting this behavior will remove a server from the server list after it has
failed the specified number of times in a row.
- "tcp_nodelay"
- Setting this behavior will enable the TCP_NODELAY socket option, which
disables Nagle’s algorithm. This obviously only makes sense for TCP
connections.
- "cas"
- Enables support for CAS operations.
- "verify_keys"
- Setting this behavior will test the keys for validity before sending them to
memcached.
- "connect_timeout"
- In non-blocking mode, this specifies the timeout of the socket connection.
- "receive_timeout"
- “This sets the microsecond behavior of the socket against the SO_RCVTIMEO
flag. In cases where you cannot use non-blocking IO this will allow you to
still have timeouts on the reading of data.”
- "send_timeout"
- “This sets the microsecond behavior of the socket against the SO_SNDTIMEO
flag. In cases where you cannot use non-blocking IO this will allow you to
still have timeouts on the sending of data.”
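As a rough sketch of how the behaviors listed above fit together, the snippet below builds a behaviors dict and hands it to a client. The server address and the chosen values are illustrative assumptions, and make_client is a hypothetical helper, not part of pylibmc; it imports pylibmc lazily so the dicts can be inspected without the library installed.

```python
# Most behaviors take integers (or booleans); "hash" and "distribution"
# take the string names described in the sections on hashing and distribution.
behaviors = {
    "tcp_nodelay": True,
    "no_block": True,
    "failure_limit": 3,  # illustrative value, not a recommendation
    "ketama": True,      # shortcut for the two settings in ketama_explicit
}

# According to the description of "ketama", this is what the shortcut sets.
ketama_explicit = {
    "hash": "md5",
    "distribution": "consistent ketama",
}


def make_client(servers, behaviors):
    """Hypothetical helper: create a pylibmc client with behaviors applied."""
    import pylibmc  # imported here so the dicts above work without pylibmc

    return pylibmc.Client(servers, behaviors=behaviors)
```

With a memcached server available, make_client(["127.0.0.1"], behaviors) would return a configured client, and individual behaviors could still be changed afterwards through its behaviors attribute.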
Hashing
Basically, the hasher decides how a key is mapped to a specific memcached
server.
The available hashers are:
- "default" - libmemcached’s home-grown hasher
- "md5" - MD5
- "crc" - CRC32
- "fnv1_64" - 64-bit FNV-1
- "fnv1a_64" - 64-bit FNV-1a
- "fnv1_32" - 32-bit FNV-1
- "fnv1a_32" - 32-bit FNV-1a
- "murmur" - MurmurHash
If pylibmc was built against a libmemcached using
--enable-hash_hsieh, you can also use "hsieh".
Hashing and python-memcached
python-memcached up until version 1.45 used a CRC32-based hashing algorithm not
reproducible by libmemcached. You can change the hasher for python-memcached
using the cmemcache_hash module, which will make it not only compatible with
cmemcache, but also the "crc" hasher in libmemcached.
python-memcached 1.45 and later incorporated cmemcache_hash as its default
hasher, and so will interoperate with libmemcached provided the libmemcached
clients are told to use the CRC32-style hasher. This can be done in
pylibmc as follows:
>>> mc.behaviors["hash"] = "crc"
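To get a feel for what a CRC32-style hasher computes, here is a minimal sketch using only the standard library. It assumes the hash is bits 16 to 30 of the ordinary CRC32 checksum; crc_hash is a hypothetical name, and this is not a definitive reproduction of either library's code (an implementation may, for instance, treat a result of zero specially).

```python
import zlib


def crc_hash(key: bytes) -> int:
    """CRC32-style key hash: keep 15 bits from the upper half of the CRC32."""
    return (zlib.crc32(key) >> 16) & 0x7FFF


print(crc_hash(b"foo"))  # 3187
```

Two clients only agree on where a key lives if they agree on this number and on the distribution method, which is why the hasher has to be set explicitly when mixing client libraries.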
Distribution
When using multiple servers, there are a few takes on how to choose a server
from the set of specified servers.
The default method is "modula", which is what most implementations use.
You can enable consistent hashing by setting distribution to "consistent".
Modula-based distribution is very simple. It works by taking the hash value,
modulo the length of the server list. For example, consider the key "foo"
under the "crc" hasher:
>>> servers = ["a", "b", "c"]
>>> key = "foo"
>>> crc32_hash(key)  # stand-in for the "crc" hasher
3187
>>> 3187 % len(servers)
1
>>> servers[1]
'b'
However, if one were to add or remove a server, most keys would map to a
different server - in effect, changing your server list would more or less
reset the cache.
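The effect of changing the server list can be measured with a small experiment. The sketch below assumes a CRC32-style hash (bits 16 to 30 of the CRC32 checksum); crc_hash and modula_server are hypothetical helpers and the key names are made up for illustration.

```python
import zlib


def crc_hash(key: bytes) -> int:
    """Assumed CRC32-style hash: 15 bits from the upper half of the CRC32."""
    return (zlib.crc32(key) >> 16) & 0x7FFF


def modula_server(key: bytes, servers: list) -> str:
    """Modula distribution: hash value modulo the length of the server list."""
    return servers[crc_hash(key) % len(servers)]


keys = [f"key{i}".encode() for i in range(1000)]
before = ["a", "b", "c"]
after = before + ["d"]  # one server added

# Count the keys whose server changes when the list grows from 3 to 4.
moved = sum(modula_server(k, before) != modula_server(k, after) for k in keys)
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.0%} of keys changed server")
```

A key keeps its server only when its hash gives the same remainder modulo 3 and modulo 4, so for evenly spread hashes roughly three quarters of the keys should move.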
Consistent hashing solves this at the price of a more costly key-to-server
lookup function. RJ of last.fm explains how it works.
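The general idea behind consistent hashing can be sketched as a ring of hashed points. This is a simplified illustration, not ketama's exact algorithm: the use of MD5, the 100 points per server, the point labels, and all function names are assumptions made for the example.

```python
import bisect
import hashlib


def point(label: str) -> int:
    """Map a string to a 32-bit position on the ring using MD5."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")


def build_ring(servers, points_per_server=100):
    """Place several points per server on the ring, sorted by position."""
    ring = sorted(
        (point(f"{server}-{i}"), server)
        for server in servers
        for i in range(points_per_server)
    )
    positions = [pos for pos, _ in ring]
    owners = [server for _, server in ring]
    return positions, owners


def ring_server(key: str, ring) -> str:
    """Pick the first server point at or after the key, wrapping around."""
    positions, owners = ring
    idx = bisect.bisect_left(positions, point(key)) % len(positions)
    return owners[idx]


keys = [f"key{i}" for i in range(1000)]
old_ring = build_ring(["a", "b", "c"])
new_ring = build_ring(["a", "b", "c", "d"])  # one server added

# Keys whose server differs between the two rings.
moved = [k for k in keys if ring_server(k, old_ring) != ring_server(k, new_ring)]
moved_fraction = len(moved) / len(keys)
print(f"{moved_fraction:.0%} of keys changed server")
```

In this scheme a key only moves when one of the new server's points lands between the key and its previous owner, so every moved key ends up on the new server and the rest of the cache stays valid. The binary search is the extra lookup cost compared with a single modulo.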