Memory Management in Lists and Tuples

0
6718
list and tuples python

Python has more than one data structure type to save items in an ordered way.
This article looks at lists and tuples to create an understanding of their commonalities and the need for two different data structure types. It also looks at how the memory is managed for both of these types. This article is written with reference to CPython implementation.

The commonalities between lists and tuples are:

  • Both are sequence data types.
  • Each item stored in a list can be of any data type.
  • Elements can be accessed by indexing and slicing.
  • Both allow duplicates.
    Let’s see how they are different with examples.

Lists
Lists are mutable in nature, and are sortable. A list can be used to save any kind of object. An example is:

# example of list
list1 = [1, 2, “three”, 4, 5]
print(type(list1))
print(list1)

The output is:

<class ‘list’>
[1, 2, ‘three’, 4, 5]

Slicing
You can access the contents of a list in the following ways:

# access elements using index ; first 2 indexes
print(list1[0:2])

# access the last 2 indexes
print(list1[-2:])

The output is:

[1, 2]
[4, 5]

Mutable
We can edit the values in the list as follows:

# list are mutable, which means we can change the value of the list
list1[0] = 100
print(type(list1))
print(list1)

The output is:

<class ‘list’>
[100, 2, ‘three’, 4, 5]

Memory allocation
We have now come to the crux of this article — how memory is managed while storing the items in the list. We will first see how much memory is currently allocated, and later see how the size changes each time new items are allocated. Let’s check the memory allocated currently:

# check the memory allocated
import sys
print(sys.getsizeof(list1))

The output is:

96

Here is a common function to see how much memory is allocated before and after values are appended:

# appending the new item
def append_into_list(value):
print(“address: “, id(list1))
print(“before size of list: “, sys.getsizeof(list1))
list1.append(value)
print(“updated list: “, list1)
print(“address remains the same: “, id(list1))
print(“after sizes of list: “, sys.getsizeof(list1))
print(“”)

Please closely observe the size and memory address of the list before and post update.

Note: In the scenario given below, the memory address didn’t change, but this will not always be the case.
# lets see after adding couple of more values the list size
append_into_list(6)
append_into_list(7)
append_into_list(8)
append_into_list(9)
append_into_list(10)
append_into_list(11)
append_into_list(12)

The output is:

address: 140509666477824
before size of list: 96
updated list: [100, 2, ‘three’, 4, 5, 6]
address remains the same: 140509666477824
after size of list: 128
address: 140509666477824
before size of list: 128
updated list: [100, 2, ‘three’, 4, 5, 6, 7]
address remains the same: 140509666477824
after size of list: 128
address: 140509666477824
before size of list: 128
updated list: [100, 2, ‘three’, 4, 5, 6, 7, 8]
address remains the same: 140509666477824
after size of list: 128
address: 140509666477824
before size of list: 128
updated list: [100, 2, ‘three’, 4, 5, 6, 7, 8, 9]
address remains the same: 140509666477824
after size of list: 128
address: 140509666477824
before size of list: 128
updated list: [100, 2, ‘three’, 4, 5, 6, 7, 8, 9, 10]
address remains the same: 140509666477824
after size of list: 192
address: 140509666477824
before size of list: 192
updated list: [100, 2, ‘three’, 4, 5, 6, 7, 8, 9, 10, 11]
address remains the same: 140509666477824
after size of list: 192
address: 140509666477824
before size of list: 192
updated list: [100, 2, ‘three’, 4, 5, 6, 7, 8, 9, 10, 11, 12]
address remains the same: 140509666477824
after size of list: 192

As you can see, the size of the list first expanded from 96 to 128, but didn’t change for the next couple of items and stayed there for some time. Then the size expanded to 192. The reason is that in CPython the memory is preallocated in chunks beforehand. This is to avoid making frequent heavy system calls. And if you see, the allocation is not static but mild and linear.

The reallocation happens to extend the current memory needed. So when you have a huge array in need and the realloc does not have so much space, it will create new memory and copy; this will be a very expensive operation. To avoid this, we can preallocate the required memory. The code snippet of C implementation of list is given below. The pictorial representation is given in Figure 1.

Figure 1: Memory allocation in list
(Credits: https://www.laurentluce.com/images/blog/list/list_insert.png)
#### Cpython : https://github.com/python/cpython/blob/master/Objects/listobject.c
‘’’
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* Add padding to make the allocated size multiple of 4.
* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, 88, 120, 160 ...
* Note: new_allocated won’t overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
/* Do not overallocate if the new size is closer to overalocated size
* than to the old size.
*/
‘’’
### eg: if the current size is 24
### ((size_t)newsize + (newsize >> 3) + 6) == increases the one eighth which is 15 so 24+3 => 24+6 = 30
### ~(size_t)3 = -4
### 30 & -4 = 32

### Note: The calculations vary based on the size of the object used in the list

Empty list
When an empty list is created, it will always point to a different address. It will also hold preallocated memory as well.

# When creating two empty list it will be point different address space
a = []
b = []
if a is not b :
print(“A and B are not mapped to same address”)
print(“address of a: “, id(a))
print(“address of b: “, id(b))

print(“A size of list: “, sys.getsizeof(a))
print(“B size of list: “, sys.getsizeof(b))

The output is:

A and B are not mapped to same address
address of a: 140509666702080
address of b: 140509666510912
A size of list: 56
B size of list: 56

Removal and insertion
When we perform removal, the allocated memory will shrink without changing the address of the variable. While performing insert, the allocated memory will expand and the address might get changed as well.

# pop an element from the between of the array
print(“address before change: “, id(list1))
print(“Remove element from list”, list1.pop(2))
print(“Size of list after removal: “, sys.getsizeof(list1))
print(“Remove element from list”, list1.pop(2))
print(“Size of list after removal: “, sys.getsizeof(list1))
print(“Remove element from list”, list1.pop(2))
print(“Size of list after removal: “, sys.getsizeof(list1))
print(“Remove element from list”, list1.pop(2))
print(“Size of list after removal: “, sys.getsizeof(list1))
print(“Remove element from list”, list1.pop(2))
print(“Size of list after removal: “, sys.getsizeof(list1))
print(list1)
print(“address after change: “, id(list1))

The output is:

The output is:

address before change:  140509666477824
Remove element from list three
Size of list after removal:  192
Remove element from list 4
Size of list after removal:  192
Remove element from list 5
Size of list after removal:  192
Remove element from list 6
Size of list after removal:  192
Remove element from list 7
Size of list after removal:  136
[100, 2, 8, 9, 10, 11, 12]
address after change:  140509666477824

# pop an element from the between of the array
print(“address before change: “, id(list1))
list1.insert(3, 30)
print(list1)
print(“address after change: “, id(list1))
Output
address before change:  140509666477824
[100, 2, 8, 30, 9, 10, 11, 12]
address after change:  140509666477824

Sort
The address of the list doesn’t get changed before and after the sort operation.

# sort the array
print(“address before change: “, id(list1))
list1.sort()
print(list1)
print(“address after change: “, id(list1))

The output is given below.

Note: Even after sort, the address remains the same.
address before change: 140509666477824
[2, 8, 9, 10, 11, 12, 30, 100]
address after change: 140509666477824

Performance

  • Append – O(1)
  • Extend – O(k)
  • Pop last element – O(1)
  • Pop anywhere else – O(n) worst case
  • Insert – O(n) worst case
  • Index – O(1)
  • Slice – O(k)
  • in Operator – O(n)
  • Sort – O(nlogn) – tim sort is used

Here, n = number of elements; k = k’th index; 1 = order of 1.

Tuples
Let’s observe how tuples are defined, and how they differ in the allocation of memory compared to lists. Tuples are:

  • Immutable in nature
  • Consume less memory as compared to lists

Definition
Here’s a quick example of how a tuple is defined:

# example of tuple
tuple1 = (‘one’, ‘two’, ‘three’)
print(type(tuple1))
print(tuple1)

The output is:

<class ‘tuple’>
(‘one’, ‘two’, ‘three’)

Changing the single value
As tuples are immutable in nature, we cannot change their value. You can find the error that comes up while trying to change the value of the tuple as follows:

# tuple are immutable
tuple1[0] = “cherry”
print(type(tuple1))
print(tuple1)

The output is:

TypeError Traceback (most recent call last)
<ipython-input-13-a780f574006b> in <module>
1 # tuple are immutable
----> 2 tuple1[0] = “cherry”
3 print(type(tuple1))
4 print(tuple1)

TypeError: ‘tuple’ object does not support item assignment

Replacing a tuple with a new tuple
We can overwrite the existing tuple to get a new tuple; the address will also be overwritten:

# but we can assign the tuple1 variable to any other value also another tuple
print(“address before the change: “, id(tuple1))
tuple1 = (1, 2, 3, 4)
print(type(tuple1))
print(tuple1)
print(“address after the change: “, id(tuple1))

The output is:

address before the change: 140509666846336
<class ‘tuple’>
(1, 2, 3, 4)
address after the change: 140509667383792

Changing the list inside tuple
We know that the tuple can hold any value. We have tried to save a list inside tuple. Let’s try editing its value. Can we edit? Will it change the list? Let’s find out:

t = (1, 2, [10, 20, 30])
print(id(t), t)
t[2] += [40, 50]

The output is:

140509666702784 (1, 2, [10, 20, 30])

TypeError Traceback (most recent call last)
<ipython-input-15-cc42d1fec8cf> in <module>
1 t = (1, 2, [10, 20, 30])
2 print(id(t), t)
----> 3 t[2] += [40, 50]

TypeError: ‘tuple’ object does not support item assignment

It has clearly thrown an error, so it should not have updated the values as well:

print(id(t), t)

The output is:

140509666702784 (1, 2, [10, 20, 30, 40, 50])

But if you see carefully, the values are appended. This is an edge case where Python behaves strangely. So, putting mutable items in tuples is not a good idea.

Memory allocation
Check the memory allocated – a tuple uses only required memory. It is not over allocated as it is not resizable:

print(sys.getsizeof(tuple1))

The output is:

72

Reuse memory
To reduce memory fragmentation and speed up allocations, Python reuses old tuples. If a tuple is no longer needed and has less than 20 items, instead of deleting it permanently, Python moves it to a free list and uses it later.

Note: Though it’s not certain always, the same memory will be used.
a = (1,2)
print(id(a))
del a
b = (1,2)
print(id(b))

The output is:

140509665739520
140509665739520

Empty tuple
When two empty tuples are created, they will point to the same address space. Empty tuples act as singletons, that is, there is always only one tuple with a length of zero. When creating an empty tuple, Python points to the already preallocated one in such a way that any empty tuple has the same address in the memory. This is possible because tuples are immutable, and sometimes this saves a lot of memory:

a = ()
b = ()
if a is b :
print(“A and B are not mapped to same address”)
print(“address of a: “, id(a))
print(“address of b: “, id(b))

print(“A size of list: “, sys.getsizeof(a))
print(“B size of list: “, sys.getsizeof(b))

The output is:

A and B are not mapped to same address
address of a:  140509600395328
address of b:  140509600395328
A size of list:  40
B size of list:  40

Removal and insertion
We cannot update the existing tuple, but we can create new tuple with it; it will be copied into a new address:

tuple1 = (‘one’, ‘two’, ‘three’)
print(id(tuple1), tuple1)
print()
tuple2 = (‘1’, ‘2’, ‘3’)
print(id(tuple2), tuple2)
print()
tuple3 = tuple1 + tuple2
print(id(tuple3), tuple3)
print()

The output is:

140509667635968 (‘one’, ‘two’, ‘three’)
140509666902208 (‘1’, ‘2’, ‘3’)
140509666693664 (‘one’, ‘two’, ‘three’, ‘1’, ‘2’, ‘3’)

Sort
As tuples are immutable, we cannot implicitly sort them. But we can make use of the sort function to do so.

Note: It will return as list.
sorted_tuple1 = sorted(tuple1)
print(id(sorted_tuple1), type(sorted_tuple1), sorted_tuple1)

The output is:

140509667589312 <class ‘list’> [‘one’, ‘three’, ‘two’]

Named tuple
The named tuple and normal tuple use exactly the same amount of memory because the field names are stored in the class. So we can either use tuple or named tuple. However, named tuple will increase the readability of the program. That’s a bonus!

from collections import namedtuple
City = namedtuple(“City”, “name country status”)
chennai = City(“Chennai”, “India”, “Red”)
print(chennai)
print(“Size of named tuple: “, sys.getsizeof(chennai))

normaltuple = (“Chennai”, “India”, “Red”)
print(normaltuple)
print(“Size of normaltuple tuple: “, sys.getsizeof(normaltuple))

The output is:

City(name=’Chennai’, country=’India’, status=’Red’)
Size of named tuple: 64
(‘Chennai’, ‘India’, ‘Red’)
Size of normaltuple tuple: 64

The performance is:

• Index - O(1)
• Slice - O(k)
• in Operator - O(n)

n = number of elements
k = k’th index
1 = order of 1
Note: As tuples are immutable they have only limited features.

To sum up, we should use lists when the collection needs to be changed constantly. We should use tuples when:

  • Working with arguments
  • Returning two or more items from a function
  • Iterating over a dictionary’s key-value pairs
  • Using string formatting

Lists are complex to implement, while tuples save memory and time (a list uses 3000+ lines of code while tuple needs only 1000+ lines of C code).

LEAVE A REPLY

Please enter your comment!
Please enter your name here