Python Collections
Much of what you need to do with Python can be done using built-in containers like dict, list, set, and tuple. But these aren’t always the best fit. In this guide, I’ll cover why and when to use the containers in the collections module and provide interesting examples of each. This is designed to supplement the documentation with examples and explanation, not replace it.
from collections import Counter
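A Counter is a dictionary-like object designed to keep tallies. Each key is an item being counted and each value is its count. You could certainly use a regular dictionary to keep a count, but a Counter adds counting-specific conveniences: missing keys count as 0, most_common ranks the items for you, and counters can be combined with arithmetic and set-style operators.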
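To make the comparison concrete, here is a minimal sketch that tallies the same list both ways. The pets list is made up for illustration.

```python
from collections import Counter

# Hypothetical sample data, not from any real dataset.
pets = ['birds', 'lizards', 'birds', 'hamsters', 'birds', 'lizards']

# Tallying with a plain dict: every increment has to handle a missing key.
plain = {}
for pet in pets:
    plain[pet] = plain.get(pet, 0) + 1

# Tallying with a Counter: pass the iterable and the counting is done for you.
tally = Counter(pets)

print(plain)  # {'birds': 3, 'lizards': 2, 'hamsters': 1}
print(tally)  # Counter({'birds': 3, 'lizards': 2, 'hamsters': 1})
```

Both approaches end up with the same counts; the Counter version just has less bookkeeping.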
A Counter looks just like a dictionary and, because it is a subclass of dict, supports the usual dictionary interface.
ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})
ctr['hamsters']  # 120
One thing to note is that if you try to access a key that doesn’t exist, the counter will return 0 rather than raising a KeyError as a standard dictionary would.
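A quick sketch of that behaviour, reusing the counter from above with a made-up 'snakes' key. Note that looking up a missing key does not add it to the counter.

```python
from collections import Counter

ctr = Counter({'birds': 200, 'lizards': 340, 'hamsters': 120})

# A missing key reads as 0 instead of raising KeyError...
print(ctr['snakes'])    # 0
# ...and the lookup does not create an entry.
print('snakes' in ctr)  # False

# A plain dict with the same data raises KeyError.
try:
    dict(ctr)['snakes']
except KeyError:
    print('KeyError from the plain dict')

# Because missing keys count as 0, incrementing a new key just works.
ctr['snakes'] += 1
print(ctr['snakes'])    # 1
```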
Counters come with a brilliant set of methods that will make your life easier if you learn how to use them.
Get the most common word in a text file
import re
# \w+ matches runs of word characters
with open('ipencil.txt') as f:
    words = re.findall(r'\w+', f.read().lower())
Counter(words).most_common(1)  # [('the', 148)]
Get the count of each number in a long string of numbers
numbers = """
73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450
"""
# strip the newlines so only digits are counted
numbers = re.sub("\n", "", numbers)
Counter(numbers).most_common()
# [('2', 112), ('5', 107), ('4', 107), ('6', 103), ('9', 100),
#  ('8', 100), ('1', 99), ('0', 97), ('7', 91), ('3', 84)]
most_common is a very valuable method. If you pass in an integer n as the first argument, it returns the n most common elements. If you call it without any arguments, it returns every element, ordered from most to least frequent. As you can see, it returns a list of tuples, each structured as (value, frequency).
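Here is a small sketch of both call styles on a made-up sentence, so the results are easy to check by eye. Elements with equal counts come back in the order they were first seen.

```python
from collections import Counter

# Made-up sentence for illustration.
words = 'the cat and the dog and the bird'.split()
counts = Counter(words)

# Top two elements only.
print(counts.most_common(2))  # [('the', 3), ('and', 2)]

# No argument: every element, most frequent first.
print(counts.most_common())

# Each result is a (value, frequency) tuple, so it unpacks cleanly in a loop.
for word, freq in counts.most_common(2):
    print(word, freq)
```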
When dealing with multiple Counter objects, you can perform operations across them. For instance, adding two counters (+) sums the counts for each key. You can also take the intersection (&), which keeps the minimum count for each key, or the union (|), which keeps the maximum.
For example, a student has taken 4 quizzes two times each. She is allowed to keep the highest score for each quiz.
first_attempt = Counter({1: 90, 2: 65, 3: 78, 4: 88})
second_attempt = Counter({1: 88, 2: 84, 3: 95, 4: 92})
final = first_attempt | second_attempt
final  # Counter({3: 95, 4: 92, 1: 90, 2: 84})
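The quiz example uses union. The other operators can be sketched the same way; the two shop inventories below are hypothetical data chosen to keep the arithmetic easy to follow.

```python
from collections import Counter

# Hypothetical inventory counts for two shops.
shop_a = Counter({'apples': 5, 'pears': 2})
shop_b = Counter({'apples': 3, 'plums': 4})

# Addition sums the counts for each key.
print(shop_a + shop_b)  # Counter({'apples': 8, 'plums': 4, 'pears': 2})

# Subtraction keeps only the positive results.
print(shop_a - shop_b)  # Counter({'apples': 2, 'pears': 2})

# Intersection keeps the minimum count for each key.
print(shop_a & shop_b)  # Counter({'apples': 3})

# Union keeps the maximum count for each key.
print(shop_a | shop_b)  # Counter({'apples': 5, 'plums': 4, 'pears': 2})
```

All four operators drop keys whose resulting count is zero or negative, which is why 'plums' disappears from the subtraction and 'pears' from the intersection.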
from collections import deque
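deque stands for “double-ended queue” and can be used as a stack or a queue. Lists offer many of the same operations, but they are optimized for fixed-length use: inserting or removing at the front of a list is O(n) because every other element has to shift, while a deque appends and pops at either end in roughly O(1).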
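A rough way to see the difference is to drain a list and a deque from the front and time both. This is only a sketch: the size N is arbitrary, and the actual timings depend on your machine, so no expected numbers are shown.

```python
from collections import deque
from timeit import timeit

N = 20_000  # arbitrary size chosen for illustration

def drain_list():
    """Remove every item from the front of a list; return how many were removed."""
    items = list(range(N))
    removed = 0
    while items:
        items.pop(0)      # the remaining elements all shift left
        removed += 1
    return removed

def drain_deque():
    """Remove every item from the front of a deque; return how many were removed."""
    items = deque(range(N))
    removed = 0
    while items:
        items.popleft()   # no elements are moved
        removed += 1
    return removed

print('list.pop(0):    ', timeit(drain_list, number=1))
print('deque.popleft():', timeit(drain_deque, number=1))
```

On most machines the deque version should finish noticeably faster, and the gap widens as N grows.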
How do you know when to use a deque versus a list?
Basically, if you’re structuring the data in a way that requires quickly appending to either end or retrieving from either end, then you want a deque. For instance, if you’re creating a queue of objects that need to be processed and you want to process them in the order they arrived, you would append new objects to one end and pop objects off the other end for processing.
queue = deque()

# append values to wait for processing
queue.appendleft("first")
queue.appendleft("second")
queue.appendleft("third")

# pop values when ready (process is a placeholder for your own handler)
process(queue.pop())  # would process "first"

# add values while processing
queue.appendleft("fourth")

# what does the queue look like now?
queue  # deque(['fourth', 'third', 'second'])
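The same first-in, first-out behaviour works in the mirrored direction (append on the right, popleft on the left), and using a single end gives you a stack. A minimal, self-contained sketch of both:

```python
from collections import deque

# Queue (first in, first out): append on the right, pop from the left.
queue = deque()
queue.append('first')
queue.append('second')
queue.append('third')
print(queue.popleft())  # first
print(queue)            # deque(['second', 'third'])

# Stack (last in, first out): append and pop on the same end.
stack = deque()
stack.append('first')
stack.append('second')
print(stack.pop())      # second
print(stack)            # deque(['first'])
```

Which end you treat as the "front" of the queue is a matter of taste, as long as you add on one end and remove from the other.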