Keeping extension docstrings in sync with the native implementation

August 9, 2010

In snakeoil, a core rule of our codebase is that extensions must be optional- this is primarily done to make the initial steps of porting pkgcore to a platform easier, and to enable some usages that aren’t extension friendly.

One problem I’ve run into, however, is how best to keep your extension docstrings in sync with the native implementation’s docstrings. I’m not a fan of relying on developers to update docstrings in two places, and I’m not a fan of just storing the documentation on a website or something similar (I generally like my pydoc invocations to be useful, rather than stubs telling me to load a webpage- personal preference, however).

Anyone got any suggestions? One thought I’d had was to split the docstrings out into a module that both the native and extension implementations could import, but that seems a bit too much like a kludge to me- plus it complicates the code a fair bit. The other thought I’d had was to add a set of tests checking extensions vs native, to ensure the two are in sync, but that doesn’t solve the core problem- it just makes the dual maintenance easier.
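For the second idea, a minimal sketch of what such a check could look like. Everything here is made up for illustration (the helper name, the two stand-in namespaces); in real usage you'd pass the pure python module and the compiled one.

```python
import inspect
import types


def find_docstring_mismatches(native, extension, names):
    """Return the names whose docstrings differ between two namespaces."""
    def doc_of(namespace, name):
        obj = getattr(namespace, name, None)
        # inspect.getdoc normalizes indentation, so layout differences
        # between the two sources don't count as a mismatch.
        return None if obj is None else inspect.getdoc(obj)

    return [name for name in names
            if doc_of(native, name) != doc_of(extension, name)]


# Stand-ins for a native module and its extension counterpart.
def _native_join(parts):
    """Join parts with a slash."""
    return "/".join(parts)


def _ext_join(parts):
    """Join parts with '/'."""  # this docstring has drifted
    return "/".join(parts)


native_mod = types.SimpleNamespace(join=_native_join)
extension_mod = types.SimpleNamespace(join=_ext_join)

drifted = find_docstring_mismatches(native_mod, extension_mod, ["join"])
in_sync = find_docstring_mismatches(native_mod, native_mod, ["join"])
```

As said, this only catches the drift; somebody still has to edit both copies.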

So how are others solving this?

Reducing dict footprints

August 3, 2010

In pkgcore back around 2006/2007, we ran into a bit of an issue.  Specifically, during certain usage scenarios it was possible to get all package instances loaded into memory, including their cache contents- essentially we’d get 24000 some package instances in memory, each referencing a dictionary holding cache values.

Obviously, we ran into some memory issues. Via guppy, we did the usual analysis and applied __slots__ pretty much all over the place (most instances were immutable, so this was a good thing anyway). For the cache entries, I came up with something of a trick.

The cache dicts always had a guaranteed set of keys- a given key didn’t necessarily always have a value, but the possible keys were fully known. If you’re familiar w/ the implementation of hashes, implicitly they have some unused space in them- this is intentional, and usually a good thing (perfect hash functions are rather rare after all). With python dictionaries however, it’s pretty easy to have the dictionary consuming a fair bit more memory than is actually used by its contents due to the bounds of its resizing.

What we did is actually pretty simple. Slotted instances store their values w/in an array that sits more or less right after the PyObject struct bits. This is how they get the memory savings- they don’t use a dictionary for storing instance attributes, they store them w/in that trailing array itself. In addition to that, they add a getter (a descriptor) for each slot to the class definition- in a way, it’s essentially a single dict that’s stored in the class itself (to find the getter) instead of a dict per instance. What I wound up doing is just creating some functionality to automatically create such a class, one that mimics the mapping/dictionary protocol, but has a limited set of keys that can be stored in it. The end result is snakeoil.obj.make_SlottedDict_kls; given a set of allowed keys, it generates and returns just such a class.
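To make the idea concrete, here is a rough sketch of such a class factory. This is not the snakeoil code- make_slotted_dict_class and the key names are hypothetical, it only covers part of the mapping protocol, and it assumes the keys are valid python identifiers.

```python
def make_slotted_dict_class(allowed_keys):
    """Build a mapping-like class that stores its values in __slots__."""
    keys = tuple(sorted(allowed_keys))

    class SlottedDict(object):
        # One slot per allowed key; instances get no per-instance __dict__.
        __slots__ = keys

        def __init__(self, **values):
            for key, value in values.items():
                self[key] = value

        def __getitem__(self, key):
            if key not in keys:
                raise KeyError(key)
            try:
                return getattr(self, key)
            except AttributeError:
                # The slot exists but has never been assigned.
                raise KeyError(key)

        def __setitem__(self, key, value):
            if key not in keys:
                raise KeyError(key)
            setattr(self, key, value)

        def __delitem__(self, key):
            if key not in keys or not hasattr(self, key):
                raise KeyError(key)
            delattr(self, key)

        def __contains__(self, key):
            return key in keys and hasattr(self, key)

        def __iter__(self):
            return (key for key in keys if hasattr(self, key))

        def __len__(self):
            return sum(1 for _ in self)

    return SlottedDict


# Hypothetical cache keys, purely for illustration.
CacheEntry = make_slotted_dict_class(["depends", "eapi", "slot"])
entry = CacheEntry(eapi="3", slot="0")
```

Keys that were never assigned simply read as missing, which matches the "known keys, not necessarily all set" shape of the cache entries.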

What prompted this posting is that at the time of writing that code (python2.4 or 2.5), there was no sys.getsizeof, so we couldn’t track exactly how much of a saving it gave us (we knew it was roughly a 25% reduction in memory for pmerge invocations, but couldn’t pin down the per instance saving). In writing out docs for snakeoil, I did some testing using the lovely sys.getsizeof: for a dict w/ a thousand keys, it’s a reduction from 49432 bytes per instance to 8048- roughly 84%. Around 100 items, it’s near 95% if memory serves- and for the case of only a couple of keys (3 in this testing), it’s a 74% reduction. Note these stats are from a 64bit machine.
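If you want to repeat the measurement, something along these lines should do. It’s a sketch using a bare slotted class rather than the snakeoil one, and the byte counts you get will differ from mine depending on python version and platform. Keep in mind sys.getsizeof is shallow- it counts the container, not the keys or values it references.

```python
import sys

KEYS = ["key%d" % i for i in range(1000)]


class Slotted(object):
    # Bare stand-in for a generated slotted dict class.
    __slots__ = tuple(KEYS)


as_dict = dict.fromkeys(KEYS)
as_slots = Slotted()
for key in KEYS:
    setattr(as_slots, key, None)

dict_size = sys.getsizeof(as_dict)
slots_size = sys.getsizeof(as_slots)
measured_reduction = 100.0 * (dict_size - slots_size) / dict_size

# The arithmetic behind the figure quoted above: 49432 -> 8048 bytes.
quoted_reduction = 100.0 * (49432 - 8048) / 49432

print("dict: %d bytes, slots: %d bytes, reduction: %.1f%%"
      % (dict_size, slots_size, measured_reduction))
```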

While the differing storage definitely helps, there is a non-obvious memory optimization afoot here- python interns strings up to a certain length, then stops trying (I thought the value was a couple of chars, but in looking at 2.6.5 it looks to be 1 char… weak sauce). This means that it’s definitely possible to wind up with two dictionaries that have the same keys, but where each key is a separate string in memory. With the slotteddict approach, you’re guaranteed to get an interned string- so you pay the memory cost of the keys only once, rather than (worst case) per instance.
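A quick way to see the difference for yourself, assuming a modern python where intern lives at sys.intern (in the 2.x series it was the builtin intern). The build_key helper is just a stand-in for however your keys get created at runtime, e.g. parsed out of a cache file.

```python
import sys


def build_key():
    # Assembled at runtime, so each call can hand back its own string object.
    return "".join(["cache", "_", "depends"])


first = build_key()
second = build_key()

# Equal by value; whether they are the same object is up to the interpreter.
same_value = (first == second)

# After interning, both names point at one shared string.
shared_first = sys.intern(first)
shared_second = sys.intern(second)
same_object = shared_first is shared_second
```

With a dict per instance you’d have to intern the keys yourself to get that sharing; with slots the names live once, on the class.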

There are two caveats to using this class:

  1. The allowed set of keys is locked- if you create one of these classes with allowed keys “1”, “2”, “3”,  you cannot do instance[“4”] = some_value; that throws a KeyError. This isn’t an implementation quirk, it’s a raw restriction of cpython’s __slots__ implementation.
  2. These instances are not serializable- this however is an arbitrary limitation, patches welcome. The only reason it’s there is that I didn’t think of it when I first implemented this- it’s a limitation I’ll be removing before snakeoil 0.4 is released.

Presuming those limitations aren’t a problem for people using it, I’d suggest taking a hard look at using it- in addition, snakeoil has an extension that makes this run at basically builtin dict speeds.

__del__ without the gc issues

August 3, 2010

I’ve been meaning to post about this for a while, but over the years I’ve wound up in some code situations where the best solution for a resource reclamation/finalization was a __del__ method- to be clear, not all situations are addressable via context managers, nor necessarily atexit.register. There are some cases where a __del__ really is the best solution and they’re damn annoying to deal with in a way that doesn’t involve bad compromises and having to leave warnings in the docstrings about the potential.

That said, as most folk know, having a __del__ means the object must never participate in a cycle- if it does, cpython’s garbage collector can’t automatically break that cycle and do reclamation (details here, look for object.__del__), leaving it up to the developer to explicitly break the cycle.

Frankly this situation sucks from my standpoint (although I fully understand why it is the way it is and agree w/ the __del__ limitation, even if I dislike said limitation)- developers are fallible, so relying on them to always do something is suboptimal. Further, for some cases I’ve dealt with, __del__ was the only sane option.

Getting to the point, people know that weakref finalization is the best alternative, but anyone who has tried it knows that you wind up having to do some nasty separation of your finalizer (and the data it needs) from the object that you’re waiting on to die. In reality you usually wind up having to implement a solution per usage.

The alternative is to tweak the class to specify the attributes that must be available to the finalizer- ActiveState has several such attempts. I find some faults with these attempts-

  • For the attempts that rely on binding a set of values to the finalizer, that’s a partial solution that can bite you in the ass if you ever inadvertently replace that reference.
  • Said attempts also make it a serious pain in the ass if you’re deriving from such a class, and need to add one more attribute into what’s bound to the finalizer.
  • The next evolution of this is a class attribute listing everything that must be bound in. Step in the right direction, but critically, it’s reliant on people maintaining it perfectly. People screw up, and if you have a complex finalizer pathway this can quickly bite you in the ass.

In snakeoil, we’ve got a nasty bit of voodoo in snakeoil.obj that is designed for basically transparent proxying to another object. This includes slot methods (which most implementations miss), lying about class to isinstance, and a whole bunch of other things that I’m reasonably sure people will hate me for. Either way, the sucker works, and at least for native classes you have to go out of your way to spot its presence.

Which leads into the point of this blog: snakeoil.weakrefs.WeakRefFinalizer. This metaclass works by rewriting the class slightly (primarily shifting its __del__ to a method named __finalizer__ to avoid the gc marking it as unbreakable if somehow cyclic), and abusing the proxy I’d mentioned.

The trick behind this is that when you create an instance of a target class, you’re not actually getting that instance- you get the proxy. The real instance is jammed into a strong ref mapping hidden away on the class object itself. The weakref is created for the proxy- when the proxy falls out of memory, the weakref finalizer fires, invoking the real instance’s __finalizer__ method. Since that instance is still strongly ref’d, the original __del__ has access to all attributes of the instance- you don’t have to track what you want during finalization. After that’s invoked, it then wipes the class’s strong reference to it- meaning the instance falls out of memory.
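Stripped way down, the mechanics look something like the following. To be clear, this is a toy and not the snakeoil implementation- the names (_Proxy, FinalizedByWeakref, _strong_refs) are invented, the proxy only forwards plain attribute access (no slot methods, no lying to isinstance), and it’s written in python 3 metaclass syntax.

```python
import weakref


class _Proxy(object):
    """Forwards attribute access to a hidden real instance."""
    __slots__ = ("_target", "__weakref__")

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, "_target"), name, value)


class FinalizedByWeakref(type):
    def __new__(mcs, name, bases, namespace):
        namespace = dict(namespace)
        # Move __del__ aside so the gc never sees a finalizer on the class.
        if "__del__" in namespace:
            namespace["__finalizer__"] = namespace.pop("__del__")
        cls = super().__new__(mcs, name, bases, namespace)
        cls._strong_refs = {}
        return cls

    def __call__(cls, *args, **kwargs):
        real = super().__call__(*args, **kwargs)
        proxy = _Proxy(real)
        key = id(real)

        def _on_proxy_death(ref, cls=cls, key=key):
            # The real instance is still fully intact at this point.
            real_obj, _ref = cls._strong_refs.pop(key)
            finalizer = getattr(real_obj, "__finalizer__", None)
            if finalizer is not None:
                finalizer()

        # Strong ref to the real instance, weak ref to the proxy.
        cls._strong_refs[key] = (real, weakref.ref(proxy, _on_proxy_death))
        return proxy


events = []


class Handle(metaclass=FinalizedByWeakref):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("closed %s" % self.name)


handle = Handle("socket")
handle_name = handle.name
del handle
```

Note the finalizer reads self.name straight off the instance- nothing had to be copied out ahead of time, which is the whole point.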

Basically, best I can tell from a fair bit of experimenting with this, you get __del__ w/out the gc issues, at the cost of slightly slower attribute/method access, the inability to resurrect instances from deletion (by the time the __del__ fires, the proxy is dead- thus you can’t really resurrect it anywhere), and one caveat.

The caveat’s an annoyance I’ve not yet figured out how to address, nor frankly have I decided if it’s worth the time to do so- if you have a method that returns another method, you’re not returning the proxied method- you’re returning the real instance’s method. This means it’s possible for the proxy to be deleted from memory while a ref to the real instance effectively still exists, leading to an early finalization invocation. I’ve yet to see this in any real usage, but thought experiment wise, I know it’s possible.

Finally, I apologize to anyone who looks in obj. Read the docstrings, there is a very good reason it has some voodoo in it- it’s the only way I could come up with to ensure that the proxy behaved exactly like the proxied target, leading to the vm executing the same codepaths.

So which python version you want?

June 1, 2010

For pkgcore, we run a pretty comprehensive set of buildslave targets for testing. Specifically:

  • python 2.4
  • python 2.5
  • python 2.6
  • unladen swallow; python 2.6 based
  • python 2.7 snapshot (20100523)
  • python 3.1
  • python 3.2 VCS snapshot (20100523)

Originally, I’d run this as separate KVM instances. This is nonoptimal however, since each instance is basically the exact same OS just w/ a differing python version overlaid, and w/ buildbot’s buildslave running within each. So via bastardizing some LXC work from diego, we now run a single kvm instance w/ each python version (and its buildslave) tucked away into its own container. Each container (including the raw parent) is intentionally externally addressable so that developers can reach into the container and tinker as needed, or experiment w/ that particular version of python.

Couple of folk have poked me for access to a copy of the vm image, so the puppy was stripped down (buildslave machinery removed among other things) and posted to gentoo mirrors (and here is a direct link to allpython-amd64-qemu-20100531.qcow2.xz); it’s still propagating in full, but hit whatever your favorite local mirror is and raid it from there.

Few things to note about this vm image:

  1. it’s configured for, and expects to get access to its block device as virtio; this is tweakable, but really not recommended (virtio performance is quite nice)
  2. the containers are currently named buildbot-py$VER; this is a hold over from stripping down pkgcore’s buildslave vm… and my own laziness in not changing the names
  3. each container root is actually an AUFS2 union of the raw parent FS. This was done so that the container could still share dentry cache for its libs w/ the parent, and to keep each container’s footprint minimal.
  4. these are *full* containers, intentionally so to keep them from screwing up the parent/each other in any fashion.
  5. the root password is ‘python’. Strongly suggest you change that if you expose this puppy publicly.
  6. this is a bit of a custom setup- this is a patched version of lxc (backport adding init shutdown support), and a nasty little trick in the lxc init scripts to allow the parent to cleanly tell the guest container to shutdown (lxc-ps --lxc auxf # is a good command to look at- note that init is not the first process in each container).
  7. if you’re just running this for your own usage in a non deployed manner, feel free to remove acpid. acpid runs by default since the buildslave VM this was derived from is run via init scripts, so there needed to be a way to tell it to shutdown (monitor shutdown events trigger acpi events, thus acpid).
  8. This was an oversight on my part during the scrub/releasing, but the default python in each container is still set to python-2.6. Feel free to run `eselect python set` to change the default version. In our buildslave usage, we leave the default python as 2.6 and force the target python via the buildbot steps themselves; this was done to avoid having to modify buildbot bits hardcoding the python version into their shebangs.
  9. Bugger is running a snapshot of pkgcore/snakeoil; mainly wanted a couple of unreleased fixes in there. VM and containers have been maintained/created via pkgcore in addition (issues/bug reports welcome).

Finally, there is *zero* support for this. I’m interested in bugs mind you, but I’m not supporting this- I’m just putting it out there since people have asked for it. Also if you’re interested in building your own buildslave setup targeting multiple python versions based on this, feel free to either find me in irc or email me. I’ve been quite happy w/ this setup, including its minimal resource usage.

Hope it’s useful.