reducing dict footprints

August 3, 2010

In pkgcore back around 2006/2007, we ran into a bit of an issue.  Specifically, during certain usage scenarios it was possible to get all package instances loaded into memory, including their cache contents- essentially we’d get 24000 some package instances in memory, each referencing a dictionary holding cache values.

Obviously, we ran into some memory issues.  Via guppy, we did the usual analysis and leveled __slots__ generally all over the place (most instances were immutable, so this was a good thing also).  For the cache entries, I came up with something of a trick.

For the cache dicts, they always had a guranteed set of keys- they didn’t necessarily always have a key/value pair, but the possible keys were fully known.  If you’re familiar w/ the implementation of hashes, implicitly they have some unused space in them- this is intentional, and usually a good thing (perfect hash functions are rather rare after all).  With python dictionaries however, it’s pretty easy to have the dictionary consuming a fair bit more memory than is actually used by it’s contents due to the bounds of it’s resizing.

What we did is actually pretty simple- slotted instances store their values w/in an array that roughly follows the PyObject struct bits.  This is how they get the memory savings- they don’t use a dictionary for storing instance attributes, they store it w/in that trailing array itself.  In addition to that, they add a getter to the class definition- in a way, it’s essentially a single dict that’s stored in the class itself (to get the getter) instead of a dict per instance.  What I wound up doing is just creating some functionality to automatically create such a class, one that mimics the mapping/dictionary protocol, but has a limited set of keys that can be stored to it.   The end result is snakeoil.obj.make_SlottedDict_kls; given a set of allowed keys, it generates and returns just such a class.

What prompted this posting is that at the time of writing that code (python2.4 or 2.5), there was no sys.getsizeof, so we couldn’t track exactly how much of a saving it leveled (we knew it was a 25% reduction of pmerge invocations, but couldn’t pin down the per instance saving).  In writing out docs for snakeoil, I did some testing using the lovely sys.getsizeof: for a dict w/ a thousand keys, it’s a reduction from 49432 bytes per instance to 8048- roughly 84%.   Around 100 items, it’s near 95% if memory serves- and for the case of only a couple of keys (3 in this testing), it’s a 74% reduction.  Note these stats are from a 64bit machine.

While the differing storage definitely helps, there is a non-obvious memory optimization afoot here- python interns strings up to a certain length, then stops trying (I thought the value was a couple of chars, but in looking at 2.6.5 it looks to be 1 char… weak sauce).  This means that it’s definitely possible to wind up with the two dictionaries that have the same keys, but each key is a seperate string in memory.  With the slotteddict approach, you’re guranteed to get an interned string- so you pay the memory cost of the keys only once, rather than (worst case) per instance.

There are two caveats to using this class:

  1. The allowed set of keys is locked- if you create one of these classes with allowed keys “1”, “2”, “3”,  you cannot do instance[“4”] = some_value; that throws a KeyError. This isn’t an implementation quirk, it’s a raw restriction of cpython’s __slots__ implementation.
  2. These instances are not serializable- this however is an arbitrary limitation, patches welcome. Only reason this is there is I didn’t think of that when I first implemented it- this is a limitation I’ll be removing before snakeoil 0.4 is released.

Presuming those limitations aren’t a problem for people using it, I’d suggest taking a hard look at using it- in snakeoil we have an extension that is used to render this basically at builtin dict speeds in addition.


__del__ without the gc issues

August 3, 2010

I’ve been meaning to post about this for a while, but over the years I’ve wound up in some code situations where the best solution for a resource reclamation/finalization was a __del__ method- to be clear, not all situations are addressable via context managers, nor necessarily atexit.register. There are some cases where a __del__ really is the best solution and they’re damn annoying to deal with in a way that doesn’t involve bad compromises and having to leave warnings in the docstrings about the potential.

That said, as most folk know, this however means that object must never participate in a cycle- doing so means that cpython’s garbage collector can’t automatically break that cycle and do reclamation (details here, look for object.__del__), leaving it up to the developer to explicitly break the cycle.

Frankly this situation sucks from my standpoint (although I fully understand why it is the way it is and agree w/ the __del__ limitation, even if I dislike said limitation)- developers are fallible thus trying to rely on them to always do something is suboptimal. Further, for some cases I’ve dealt with __del__ was the only sane option.

Getting to the point, people know that weakref finalization is the best alternative, but anyone who has tried it knows that you wind up having to do some nasty seperation of your finalizer (and the data it needs) from the object that you’re waiting for to die. In reality you wind up having to implement a solution per usage usually.

The alternative is to tweak the class specifying the attributes that must be available to the finalizer- activate state has several such attempts. I find some faults with these attempts-

  • For the attempts that rely on binding a set of values to the finalizer, that’s a partial solution that can bite you in the ass if you ever inadvertantly replace that reference.
  • Said attempts also make it a serious pain in the ass if you’re deriving from such a class, and need to add one more attribute into what’s bound to the finalizer.
  • The next evolution of this is a class attribute listing what all must be bound in. Step in the right direction, but critically, it’s reliant on people maintaining it perfectly. People screw up, and if you have a complex finalizer pathway this can quickly prove to bite you in the ass.

In snakeoil, we’ve got a nasty bit of voodoo in snakeoil.obj that is designed for basically transparent proxying to another object. This includes slot methods (which most implementations miss), lieing about class to isinstance, and a whole bunch of other things that I’m reasonably sure people will hate me for. Either way, the sucker works, and at least for native classes you have to go out of your way to spot it’s presence.

Leading into the point of this blog, snakeoil.weakrefs.WeakRefFinalizer. This metaclass works by rewriting the class slightly (primarily shifting it’s __del__ to a method named __finalizer__ to avoid the gc marking it as unbreakable if somehow cyclic), and abusing the proxy I’d mentioned.

The trick behind this is that when you create an instance of a target class, you’re not actually getting that instance- you get the proxy. The real instance is jammed into a strong ref mapping  hidden away on the class object itself.   The trick is that the weakref is created for the proxy– when the proxy falls out of memory, the weakref finalizer fires invoking the real instances __finalizer__ method.   Since that instance is still strongly ref’d, the original __del__ has access to all attributes of the instance- you don’t have to track what you want during finalization. After that’s invoked, it then wipes the classes strong reference to it- meaning the instance falls out of memory.

Basically, best I can tell in a fair bit of experimenting with this, you get __del__ w/out the gc issues, at the cost of a slightly increased attribute/method access, the inability to resurrect instances from deletion (by the time the __del__ fires, the proxy is dead- thus you can’t really resurrect it anywhere), and one caveat.

The caveat’s an annoyance I’ve not yet figured out how to address, nor frankly have I decided if it’s worth the time to do so- if you have a method that returns another method, you’re not returning the proxied method- you’re returning the real instances method. This means it’s possible for the proxy to be deleted from memory while a ref effectively exists, leading to an early finalization invocation. I’ve yet to see this in any real usage, but thought experiment wise, I know it’s possible.

Finally, I apologize to anyone who looks in obj. Read the docstrings, there is a very good reason it has some voodoo in it- it’s the only way I could come up with to ensure that the proxy behaved exactly like the proxied target, leading to the vm executing the same codepaths.