Awasu » Embedding Python: Object references
Sunday 30th November 2014 8:10 AM

Python is a language that offers automatic garbage collection, that is, it automatically frees up memory that is no longer being used. This results in code that is a little slower than if you did things yourself, but it is much more reliable and less prone to errors that would result in memory leaks.

When your program needs to work on some data, it needs to store it somewhere in memory. At some point, we need to release this memory and give it back to the system, otherwise we would just keep using up memory until it was all gone! But how do we know when it's safe to do this?[1]

The easiest way is for the program to simply do it itself when it's finished using the data, and this is what languages like C and C++ (that don't offer automatic garbage collection) do, but it is easy to forget to do it, or get it wrong, resulting in memory leaks and/or crashes.

Languages like Python manage memory automatically for you, and the way they know when it's safe to release the memory is by tracking how many times a piece of data is being used i.e. the number of object references. When this number drops to zero, Python knows that no-one is using the memory, and it's safe to release the memory.
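To watch the count move, the standard CPython interpreter exposes it through sys.getrefcount(). The sketch below is only an illustration: Payload and count_deltas() are made-up names, and because the absolute numbers are an implementation detail (the call itself can temporarily add a reference), it reports only the changes relative to a baseline.

```python
import sys

class Payload(object):
    """Hypothetical stand-in for 'some data' (not part of the original article)."""
    pass

def count_deltas():
    data = Payload()
    baseline = sys.getrefcount(data)

    # Store the object in a list twice: two more references to the same object.
    holders = [data, data]
    added = sys.getrefcount(data) - baseline

    # Empty the list: both of those references are released again.
    del holders[:]
    remaining = sys.getrefcount(data) - baseline

    return added, remaining

print(count_deltas())
```

On CPython the first number is 2 (the two list slots) and the second is 0 (back to the baseline). Other Python implementations may not use reference counting at all, so treat this as a CPython-only experiment.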

The simplest case

This function allocates some memory to hold the string, prints it out, then exits:

def foo() :
    s = "Hello, world!"
    print(s)

When the function starts, it allocates memory for the string and sets its reference count to 1 (since the only thing using it is the variable s).

When the function exits, the variable s goes out of scope and is no longer usable. Since it can no longer be used to access the block of memory holding the string data, the reference count is decremented i.e. there is one less variable using this piece of data.

Since the reference count is now zero, Python knows no-one is using the memory and can safely release it. It's important to understand that while Python might release the memory immediately after the reference count becomes zero (the standard CPython interpreter normally does), it doesn't have to. For efficiency, an implementation might wait until it has more inaccessible objects, then clean them all up together. Since the data is inaccessible, it doesn't really make any difference; the only down-side is that the memory is held on to for a little longer than necessary.[2]
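Because the moment of clean-up isn't guaranteed, code that holds something like a file handle or database connection usually releases it explicitly instead of waiting for the reference count to reach zero. Here is a minimal sketch of that idea, assuming a made-up Connection class with a close() method (neither is from this article). contextlib.closing() is a standard-library helper that calls close() when the with block ends.

```python
from contextlib import closing

class Connection(object):
    """Hypothetical resource holder, for illustration only."""
    def __init__(self):
        self.is_open = True

    def close(self):
        # Release the underlying resource now, regardless of reference counts.
        self.is_open = False

def use_connection():
    conn = Connection()
    with closing(conn):
        was_open_inside = conn.is_open
    # close() has already run here, even though conn is still referenced.
    return was_open_inside, conn.is_open

print(use_connection())
```

The resource is released at a known point (the end of the with block), so it doesn't matter when the interpreter gets around to destroying the object itself.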

Passing variables around

Here's a more involved example:

g = None

def foo( arg ) :
    global g
    g = arg

s = "Hello, world!"
foo( s )

del s
g = 42

As in the previous example, we allocate some memory to hold the string data, and since it's being used by the variable s, we set its reference count to 1.

We pass the string into the function foo() via the arg parameter. Since there is now another variable referencing the string data, its reference count is incremented to 2.

The function assigns the string to the global variable g, again incrementing the reference count, this time to 3.

When the function exits, the arg variable goes out of scope, there is one less reference to the string data, and the reference count is now 2.

Back on the main line of execution, the code deletes the s variable. The name s is no longer usable, so the string can no longer be reached through it, and the reference count for our string data drops to 1 (held by the global g variable).

Finally, we set the global g variable to point to something else, so while the variable still exists, it now references something else. The reference count on our string data drops to 0, and since no-one is now using it, it is eligible to be cleaned up.
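The walk-through above can be checked with a weak reference, which lets us ask whether an object still exists without keeping it alive. This is a sketch under a few assumptions: it relies on CPython destroying an object as soon as its count reaches zero, it uses a made-up Message class in place of the string (plain strings can't be weakly referenced), and walk_through() is a hypothetical wrapper around the main-line code.

```python
import weakref

class Message(object):
    """Hypothetical stand-in for the string; str objects can't be weakly referenced."""
    def __init__(self, text):
        self.text = text

g = None

def foo(arg):
    global g
    g = arg

def walk_through():
    global g
    s = Message("Hello, world!")
    probe = weakref.ref(s)      # a weak reference does not add to the reference count

    foo(s)                      # arg comes and goes; g now refers to the object
    del s                       # one reference fewer, but g still holds the object
    alive_after_del = probe() is not None

    g = 42                      # the last reference is gone
    alive_after_rebind = probe() is not None

    return alive_after_del, alive_after_rebind

print(walk_through())
```

On CPython the object survives the del (g still refers to it) and is gone once g is re-bound, so the function returns True followed by False.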

Borrowing and stealing references

One important aspect of references is who owns them. If you create a new Python object, it will have a single reference to it (the one you just created), and you own that reference i.e. you are responsible for releasing it, by decrementing the reference count. If someone else takes a reference to the object (so that there are now 2 references), they own that reference i.e. they are responsible for releasing it.[3]

It is possible to borrow a reference i.e. use an object that someone else has a reference to, without taking your own (new) reference to it. Obviously, you need to be very careful here, since if the other person releases their reference to the object while you are still using it, the object could get cleaned up, and your code will crash.

Python will also sometimes steal a reference, that is, take ownership of the reference from you. For example, if you add an object to a tuple using PyTuple_SetItem(), the tuple will take ownership of your reference to the object. You no longer need to release it; the tuple will do that when it gets destroyed.

// create a tuple (we own the reference to it)
PyObject* pTuple = PyTuple_New( 1 ) ;

// create an int object to put in the tuple (we own the reference to it)
// NOTE: PyInt_FromLong() is the Python 2 API; on Python 3, use PyLong_FromLong() instead
PyObject* pVal = PyInt_FromLong( 42 ) ;

// add the int object to the tuple (the tuple now owns the reference to it)
int rc = PyTuple_SetItem( pTuple , 0 , pVal ) ;

// the tuple has stolen our reference to the int object - we DON'T need to do this!
// Py_DecRef( pVal ) ;

// we still own the reference to the tuple, so we DO need to do this
Py_DecRef( pTuple ) ;
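The end state of the C code (the tuple holds the only reference to the value, and releases it when the tuple is destroyed) can also be observed from the Python side. This is a rough analogue rather than the same mechanism: Python-level code can't steal a reference the way PyTuple_SetItem() does, so here the tuple takes its own reference and we drop ours with del. The Item class and the function name are made up for the example, and the result assumes CPython's immediate clean-up.

```python
import weakref

class Item(object):
    """Hypothetical object to store in the tuple (weak references need a class instance)."""
    pass

def tuple_keeps_item_alive():
    item = Item()
    probe = weakref.ref(item)   # lets us check for the object without owning it

    holder = (item,)            # the tuple takes its own reference to the object
    del item                    # drop ours; only the tuple's reference is left
    alive_in_tuple = probe() is not None

    del holder                  # destroying the tuple releases its reference
    alive_after_tuple = probe() is not None

    return alive_in_tuple, alive_after_tuple

print(tuple_keeps_item_alive())
```

On CPython this returns True followed by False: the object lives exactly as long as the tuple that owns the last reference to it.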

More information about reference counting can be found here.

1. If we release the memory while someone is still using it, our program will crash.
2. However, this timing issue can be a problem for more complex objects that need to do some clean-up when they are destroyed e.g. to close file handles or database connections. Since you don't know exactly when this will happen, you might not be able to re-open the file, because while the object that had it open has now gone out of scope, it hasn't been cleaned up yet. In such cases, you need to clean up explicitly. The del statement only removes your reference, and the object is destroyed only once no other references remain, so it is more reliable to release the resource directly e.g. by calling the object's close() method.
3. You are still responsible for cleaning up your reference.