Binary Layout Of Python Lists
Solution 1:
The list object is defined in Include/listobject.h
. The structure is really simple:
typedefstruct {
PyObject_VAR_HEAD
/* Vector of pointers to list elements. list[0] is ob_item[0], etc. */
PyObject **ob_item;
/* ob_item contains space for 'allocated' elements. The number
* currently in use is ob_size.
* Invariants:
* 0 <= ob_size <= allocated
* len(list) == ob_size
* ob_item == NULL implies ob_size == allocated == 0
* list.sort() temporarily sets allocated to -1 to detect mutations.
*
* Items must normally not be NULL, except during construction when
* the list is not yet visible outside the function that builds it.
*/
Py_ssize_t allocated;
} PyListObject;
and PyObject_VAR_HEAD
is defined as
typedefstruct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
struct _typeobject *ob_type;
} PyObject;
typedefstruct {
PyObject ob_base;
Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;
Basically, then, a list object looks like this:
[ssize_t ob_refcnt]
[type *ob_type]
[ssize_t ob_size]
[object **ob_item] -> [object *][object *][object *]...
[ssize_t allocated]
Note that len
retrieves the value of ob_size
.
ob_item
points to an array of PyObject *
pointers. Each element in a list is a Python object, and Python objects are always passed by reference (at the C-API level, as pointers to the actual PyObject
s). Therefore, lists only store pointers to objects, and not the objects themselves.
When a list fills up, it will be reallocated. allocated
tracks how many elements the list can hold at maximum (before reallocation). The reallocation algorithm is in Objects/listobject.c
:
/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsibility to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/staticint
list_resize(PyListObject *self, Py_ssize_t newsize)
{
PyObject **items;
size_t new_allocated;
Py_ssize_t allocated = self->allocated;
/* Bypass realloc() when a previous overallocation is large enough
to accommodate the newsize. If the newsize falls lower than half
the allocated size, then proceed with the realloc() to shrink the list.
*/if (allocated >= newsize && newsize >= (allocated >> 1)) {
assert(self->ob_item != NULL || newsize == 0);
Py_SIZE(self) = newsize;
return0;
}
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
if (newsize == 0)
new_allocated = 0;
items = self->ob_item;
if (new_allocated <= (PY_SIZE_MAX / sizeof(PyObject *)))
PyMem_RESIZE(items, PyObject *, new_allocated);
else
items = NULL;
if (items == NULL) {
PyErr_NoMemory();
return -1;
}
self->ob_item = items;
Py_SIZE(self) = newsize;
self->allocated = new_allocated;
return0;
}
As you can see from the comments, lists grow rather slowly, in the following sequence:
0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
Post a Comment for "Binary Layout Of Python Lists"