Python Memory Management
So let us begin with the size of basic objects. In Python, there’s not alot of primitive data types: there are ints, longs (an unlimitedprecision version of ints), floats (which are doubles), tuples, strings,lists, dictionaries, and classes.
What is the size of ? A programmer with a C or C++ background willprobably guess that the size of a machine-specific int
is somethinglike 32 bits, maybe 64; and that therefore it occupies at most 8 bytes. Butis that so in Python?
Let us first write a function that shows the sizes of objects (recursivelyif necessary):
We can now use the function to inspect the sizes of the different basicdata types:
- show_sizeof(None)
- show_sizeof(3)
- show_sizeof(2**63)
- show_sizeof(102947298469128649161972364837164)
- show_sizeof(918659326943756134897561304875610348756384756193485761304875613948576297485698417)
If you have a 32-bit 2.7x Python, you’ll see:
- 8 None
- 12 3
- 22 9223372036854775808
- 28 102947298469128649161972364837164
- 48 918659326943756134897561304875610348756384756193485761304875613948576297485698417
and if you have a 64-bit 2.7x Python, you’ll see:
- 16 None
- 24 3
- 36 9223372036854775808
- 40 102947298469128649161972364837164
- 60 918659326943756134897561304875610348756384756193485761304875613948576297485698417
Let us focus on the 64-bit version (mainly because that’s what we need themost often in our case). None
takes 16 bytes. int
takes 24 bytes,three times as much memory as a C int64_t
, despite being some kind of“machine-friendly” integer. Long integers (unbounded precision), used torepresent integers larger than 263-1, have a minimum size of 36bytes. Then it grows linearly in the logarithm of the integer represented.
Python’s floats are implementation-specific but seem to be C doubles.However, they do not eat up only 8 bytes:
- show_sizeof(3.14159265358979323846264338327950288)
Outputs
- 16 3.14159265359
on a 32-bit platform and
- 24 3.14159265359
on a 64-bit platform. That’s again, three times the size a C programmerwould expect. Now, what about strings?
- show_sizeof("")
- show_sizeof("My hovercraft is full of eels")
outputs, on a 32 bit platform:
and
- 37
- 66 My hovercraft is full of eels
An empty string costs 37 bytes in a 64-bit environment! Memory usedby string then linearly grows in the length of the (useful) string.
Other structures commonly used, tuples, lists, and dictionaries areworthwhile to examine. Lists (which are implemented as arraylists, not as , with everything itentails) arearrays of references to Python objects, allowing them to beheterogeneous. Let us look at our sizes:
- show_sizeof([])
- show_sizeof([4, "toaster", 230.1])
outputs
- 32 []
- 44 [4, 'toaster', 230.1]
- 72 []
- 96 [4, 'toaster', 230.1]
on a 64-bit platform. An empty list eats up 72 bytes. The size of anempty, 64-bit C++ is only 16 bytes, 4-5 times less. Whatabout tuples? (and dictionaries?):
- show_sizeof({})
- show_sizeof({'a':213, 'b':2131})
outputs, on a 32-bit box
- 136 {}
- 136 {'a': 213, 'b': 2131}
- 32 ('a', 213)
- 22 a
- 12 213
- 32 ('b', 2131)
- 22 b
- 12 2131
and
- 280 {}
- 280 {'a': 213, 'b': 2131}
- 72 ('a', 213)
- 38 a
- 24 213
- 72 ('b', 2131)
- 38 b
- 24 2131
for a 64-bit box.
This last example is particularly interesting because it “doesn’t add up.”If we look at individual key/value pairs, they take 72 bytes (while their componentstake 38+24=62 bytes, leaving 10 bytes for the pair itself), but thedictionary takes 280 bytes (rather than a strict minimum of 144=72×2bytes). The dictionary is supposed to be an efficient data structure forsearch and the two likely implementations will use more space that strictlynecessary. If it’s some kind of tree, then we should pay the cost ofinternal nodes that contain a key and two pointers to children nodes; ifit’s a hash table, then we must have some room with free entries to ensuregood performance.
The (somewhat) equivalent std::map
C++ structure takes 48 bytes whencreated (that is, empty). An empty C++ string takes 8 bytes (then allocatedsize grows linearly the size of the string). An integer takes 4 bytes (32 bits).
Why does all this matter? It seems that whether an empty string takes 8bytes or 37 doesn’t change anything much. That’s true. That’s true _until_you need to scale. Then, you need to be really careful about how manyobjects you create to limit the quantity of memory your program uses. It isa problem in real-life applications. However, to devise a really goodstrategy about memory management, we must not only consider the sizes ofobjects, but how many and in which order they are created. It turns out tobe very important for Python programs. One key element to understand is howPython allocates its memory internally, which we will discuss next.
To speed-up memory allocation (and reuse) Python uses a number of listsfor small objects. Each list will contain objects of similar size: therewill be a list for objects 1 to 8 bytes in size, one for 9 to 16, etc.When a small object needs to be created, either we reuse a free block inthe list, or we allocate a new one.
There are some internal details on how Python manages those lists intoblocks, pools, and “arena”: a number of block forms a pool, pools aregathered into arena, etc., but they’re not very relevant to the point wewant to make (if you really want to know, read Evan Jones’ ). The important point is thatthose lists never shrink.
Indeed: if an item (of size x) is deallocated (freed by lack ofreference) its location is not returned to Python’s global memory pool (andeven less to the system), but merely marked as free and added to the freelist of items of size x. The dead object’s location will be reused ifanother object of compatible size is needed. If there are no dead objectsavailable, new ones are created.
If small objects memory is never freed, then the inescapable conclusion isthat, like goldfishes, these small object lists only keep growing, nevershrinking, and that the memory footprint of your application is dominatedby the largest number of small objects allocated at any given point.
Therefore, one should work hard to allocate only the number of smallobjects necessary for one task, favoring (otherwise unpythonèsque) loopswhere only a small number of elements are created/processed rather than(more pythonèsque) patterns where lists are created using list generationsyntax then processed.
While the second pattern is more à la Python, it is rather the worstcase: you end up creating lots of small objects that will come populate thesmall object lists, and even once the list is dead, the dead objects (nowall in the free lists) will still occupy a lot of memory.
The fact that the free lists grow does not seem like much of a problembecause the memory it contains is still accessible to the Python program.But from the OS’s perspective, your program’s size is the total (maximum)memory allocated to Python. Since Python returns memory to the OS on theheap (that allocates other objects than small objects) only on Windows, ifyou run on Linux, you can only see the total memory used by your programincrease.
Let us prove my point using memory_profiler, a Python add-on module(which depends on the python-psutil
package) by (the module’s github page). This add-on provides thedecorator @profile
that allows one to monitor one specific functionmemory usage. It is extremely simple to use. Let us consider the followingprogram:
invoking
- Filename: memory-profile-me.py
- Line # Mem usage Increment Line Contents
- ================================================
- 4 @profile
- 5 9.11 MB 0.00 MB def function():
- 6 40.05 MB 30.94 MB x = list(range(1000000)) # allocate a big list
- 7 89.73 MB 49.68 MB y = copy.deepcopy(x)
- 8 82.10 MB -7.63 MB del x
- 9 82.10 MB 0.00 MB return y
This program creates a list of n=1,000,000 ints (n x 24 bytes = ~23 MB) and anadditional list of references (n x 8 bytes = ~7.6 MB), which amounts to a totalmemory usage of ~31 MB. copy.deepcopy
copies both lists, which allocatesagain ~50 MB (I am not sure where the additional overhead of 50 MB - 31 MB = 19MB comes from). The interesting part is : it deletes x
, but thememory usage only decreases by 7.63 MB! This is because del
only deletes thereference list, not the actual integer values, which remain on the heap andcause a memory overhead of ~23 MB.
This example allocates in total ~73 MB, which is more than twice the amountof memory needed to store a single list of ~31 MB. You can see that memory canincrease surprisingly if you are not careful!
Note that you might get different results on a different platform or with adifferent python version.
On a related note: is pickle
wasteful?
is the standard wayof (de)serializing Python objects to file. What is its memory footprint?Does it create extra copies of the data or is it rather smart about it?Consider this short example:
- import memory_profiler
- import pickle
- import random
- def random_string():
- return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])
- @profile
- def create_file():
- x = [(random.random(),
- random_string(),
- random.randint(0, 2 ** 64))
- for _ in xrange(1000000)]
- pickle.dump(x, open('machin.pkl', 'w'))
- @profile
- def load_file():
- y = pickle.load(open('machin.pkl', 'r'))
- return y
- if __name__=="__main__":
- create_file()
- #load_file()
With one invocation to profile the creation of the pickled data, and oneinvocation to re-read it (you comment out the function not to becalled). Using memory_profiler
, the creation uses a lot of memory:
- Filename: test-pickle.py
- Line # Mem usage Increment Line Contents
- ================================================
- 8 @profile
- 9 9.18 MB 0.00 MB def create_file():
- 10 9.33 MB 0.15 MB x=[ (random.random(),
- 11 random_string(),
- 12 random.randint(0,2**64))
- 13 246.11 MB 236.77 MB for _ in xrange(1000000) ]
- 14
- 15 481.64 MB 235.54 MB pickle.dump(x,open('machin.pkl','w'))
and re-reading a bit less:
- Filename: test-pickle.py
- Line # Mem usage Increment Line Contents
- 18 @profile
- 19 9.18 MB 0.00 MB def load_file():
- 20 311.02 MB 301.83 MB y=pickle.load(open('machin.pkl','r'))
- 21 311.02 MB 0.00 MB return y
So somehow, pickling is very bad for memory consumption. The initial listtakes up more or less 230MB, but pickling it creates an extra 230-somethingMB worth of memory allocation.
Unpickling, on the other hand, seems fairly efficient. It does create morememory than the original list (300MB instead of 230-something) but it doesnot double the quantity of allocated memory.
Overall, then, (un)pickling should be avoided for memory-sensitiveapplications. What are the alternatives? Pickling preserves all thestructure of a data structure, so you can recover it exactly from thepickled file at a later time. However, that might not always be needed. Ifthe file is to contain a list as in the example above, then maybe a simpleflat, text-based, file format is in order. Let us see what it gives.
A naïve implementation would give:
- import memory_profiler
- import random
- import pickle
- def random_string():
- return "".join([chr(64 + random.randint(0, 25)) for _ in xrange(20)])
- @profile
- def create_file():
- x = [(random.random(),
- random_string(),
- random.randint(0, 2 ** 64))
- for _ in xrange(1000000) ]
- f = open('machin.flat', 'w')
- for xx in x:
- print >>f, xx
- f.close()
- @profile
- def load_file():
- y = []
- f = open('machin.flat', 'r')
- for line in f:
- y.append(eval(line))
- f.close()
- return y
- if __name__== "__main__":
- create_file()
- #load_file()
Creating the file:
- Filename: test-flat.py
- Line # Mem usage Increment Line Contents
- ================================================
- 8 @profile
- 9 9.19 MB 0.00 MB def create_file():
- 10 9.34 MB 0.15 MB x=[ (random.random(),
- 11 random_string(),
- 12 random.randint(0, 2**64))
- 13 246.09 MB 236.75 MB for _ in xrange(1000000) ]
- 14
- 15 246.09 MB 0.00 MB f=open('machin.flat', 'w')
- 16 308.27 MB 62.18 MB for xx in x:
- 17 print >>f, xx
and reading the file back:
Memory consumption on writing is now much better. It still creates a lot oftemporary small objects (for 60MB’s worth), but it’s not doubling memoryusage. Reading is comparable (using only marginally less memory).
This particular example is trivial but it generalizes to strategies whereyou don’t load the whole thing first then process it but rather read a fewitems, process them, and reuse the allocated memory. Loading data to aNumpy array, for example, one could first create the Numpy array, then readthe file line by line to fill the array: this allocates one copy of thewhole data. Using pickle, you would allocate the whole data (at least)twice: once by pickle, and once through Numpy.
Or even better yet: use Numpy (or PyTables) arrays. But that’s a differenttopic. In the mean time, you can have a look at loading and savinganother tutorial in the Theano/doc/tutorial directory.