In pkgcore back around 2006/2007, we ran into a bit of an issue. Specifically, during certain usage scenarios it was possible to get all package instances loaded into memory, including their cache contents- essentially we’d get 24000 some package instances in memory, each referencing a dictionary holding cache values.
Obviously, we ran into some memory issues. Via guppy, we did the usual analysis and leveled __slots__ generally all over the place (most instances were immutable, so this was a good thing also). For the cache entries, I came up with something of a trick.
For the cache dicts, they always had a guranteed set of keys- they didn’t necessarily always have a key/value pair, but the possible keys were fully known. If you’re familiar w/ the implementation of hashes, implicitly they have some unused space in them- this is intentional, and usually a good thing (perfect hash functions are rather rare after all). With python dictionaries however, it’s pretty easy to have the dictionary consuming a fair bit more memory than is actually used by it’s contents due to the bounds of it’s resizing.
What we did is actually pretty simple- slotted instances store their values w/in an array that roughly follows the PyObject struct bits. This is how they get the memory savings- they don’t use a dictionary for storing instance attributes, they store it w/in that trailing array itself. In addition to that, they add a getter to the class definition- in a way, it’s essentially a single dict that’s stored in the class itself (to get the getter) instead of a dict per instance. What I wound up doing is just creating some functionality to automatically create such a class, one that mimics the mapping/dictionary protocol, but has a limited set of keys that can be stored to it. The end result is snakeoil.obj.make_SlottedDict_kls; given a set of allowed keys, it generates and returns just such a class.
What prompted this posting is that at the time of writing that code (python2.4 or 2.5), there was no sys.getsizeof, so we couldn’t track exactly how much of a saving it leveled (we knew it was a 25% reduction of pmerge invocations, but couldn’t pin down the per instance saving). In writing out docs for snakeoil, I did some testing using the lovely sys.getsizeof: for a dict w/ a thousand keys, it’s a reduction from 49432 bytes per instance to 8048- roughly 84%. Around 100 items, it’s near 95% if memory serves- and for the case of only a couple of keys (3 in this testing), it’s a 74% reduction. Note these stats are from a 64bit machine.
While the differing storage definitely helps, there is a non-obvious memory optimization afoot here- python interns strings up to a certain length, then stops trying (I thought the value was a couple of chars, but in looking at 2.6.5 it looks to be 1 char… weak sauce). This means that it’s definitely possible to wind up with the two dictionaries that have the same keys, but each key is a seperate string in memory. With the slotteddict approach, you’re guranteed to get an interned string- so you pay the memory cost of the keys only once, rather than (worst case) per instance.
There are two caveats to using this class:
- The allowed set of keys is locked- if you create one of these classes with allowed keys “1″, “2″, “3″, you cannot do instance["4"] = some_value; that throws a KeyError. This isn’t an implementation quirk, it’s a raw restriction of cpython’s __slots__ implementation.
- These instances are not serializable- this however is an arbitrary limitation, patches welcome. Only reason this is there is I didn’t think of that when I first implemented it- this is a limitation I’ll be removing before snakeoil 0.4 is released.
Presuming those limitations aren’t a problem for people using it, I’d suggest taking a hard look at using it- in snakeoil we have an extension that is used to render this basically at builtin dict speeds in addition.