21 itertools.groupby
itertools.groupby groups consecutive elements in an iterable that share the same key.
Signature: itertools.groupby(iterable, key=None)
Returns: iterator of (key_value, group_iterator) pairs
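A quick look at the pairs groupby yields. This is a minimal sketch on a made-up string. With key left as None, each element serves as its own key:

```python
from itertools import groupby

# key=None (the default): each character is its own grouping key
pairs = [(k, list(g)) for k, g in groupby("aabbbc")]
print(pairs)
# [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c'])]
```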
21.1 The Critical Rule
If you want exactly one group per key, the data must first be sorted by the same key function. Otherwise the same logical group can appear multiple times, because groupby only groups consecutive runs.
```
Sorted by key:  [A, A, A, B, B, C, C, C]
                 AAA | BB | CCC                ← 3 groups ✅
NOT sorted:     [A, B, A, C, B, A, C, C]
                 A | B | A | C | B | A | CC    ← 7 groups ❌ (A appears 3 times!)
```
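The diagram can be checked directly. Here is a small sketch using the same letter sequence. It counts the groups with and without sorting first:

```python
from itertools import groupby

letters = ["A", "B", "A", "C", "B", "A", "C", "C"]

# Unsorted: every change of value starts a new group
unsorted_keys = [k for k, _ in groupby(letters)]
print(unsorted_keys, len(unsorted_keys))
# ['A', 'B', 'A', 'C', 'B', 'A', 'C'] 7

# Sorted first: one group per distinct value
sorted_keys = [k for k, _ in groupby(sorted(letters))]
print(sorted_keys, len(sorted_keys))
# ['A', 'B', 'C'] 3
```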
21.2 Basic Example
```python
from itertools import groupby

data = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by first letter; the list is already in first-letter order
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, "→", list(group))
```

Output:

```
a → ['apple', 'avocado']
b → ['banana', 'blueberry']
c → ['cherry']
```
The group is a lazy iterator that shares the underlying input with groupby itself. You must consume it (e.g., with list(group)) before moving to the next iteration, or its items are gone.
21.3 Sort-Then-Group Pattern
This is the most common real-world pattern:
```python
from itertools import groupby
from operator import itemgetter

records = [
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery", "name": "Bob"},
    {"dept": "radiology", "name": "Charlie"},
    {"dept": "surgery", "name": "Diana"},
]

key_func = itemgetter("dept")

# Step 1: Sort by key (in place; the sort is stable, so names keep their order within a dept)
records.sort(key=key_func)

# Step 2: Group by the same key
for dept, members in groupby(records, key=key_func):
    print(dept, "→", [m["name"] for m in members])
```

Output:

```
radiology → ['Alice', 'Charlie']
surgery → ['Bob', 'Diana']
```
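records.sort() reorders the caller's list in place. If that list must keep its original order, you can group a sorted copy made with sorted() instead. This sketch wraps the idea in a hypothetical helper, group_sorted, which is not part of itertools:

```python
from itertools import groupby
from operator import itemgetter

def group_sorted(items, key):
    """Hypothetical helper: yield (key, list_of_items) from a sorted copy of items."""
    for k, g in groupby(sorted(items, key=key), key=key):
        yield k, list(g)  # consume each group before groupby advances

records = [
    {"dept": "surgery", "name": "Bob"},
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery", "name": "Diana"},
]

for dept, members in group_sorted(records, itemgetter("dept")):
    print(dept, "→", [m["name"] for m in members])
# radiology → ['Alice']
# surgery → ['Bob', 'Diana']

print(records[0]["name"])  # the original list is untouched
# Bob
```

The trade-off is one extra list of references (the sorted copy) in exchange for not mutating the input.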
21.4 Collecting into a Dict
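A handy one-liner to build a grouped dictionary:

```python
# sorted_data must already be sorted by key_func
grouped = {k: list(g) for k, g in groupby(sorted_data, key=key_func)}
```

The sort matters even more here. If a key appears in two separate runs, the later run silently overwrites the earlier one in the dict.

21.5 Practical Example: Run-Length Encoding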
groupby shines at detecting consecutive runs. No sorting is needed here because we want positional grouping (sorting first would merge the two separate runs of A):
```python
from itertools import groupby

signal = "AAABBCDDDDA"

# key=None: each character is its own key; count each run without building a list
encoded = [(char, sum(1 for _ in group)) for char, group in groupby(signal)]
print(encoded)
# [('A', 3), ('B', 2), ('C', 1), ('D', 4), ('A', 1)]
```

21.6 Gotcha: The Exhausted Iterator
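```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# ❌ Keep the group iterators around without consuming them
saved = [group for _, group in groupby(data)]

for group in saved:
    print(list(group))  # invalidated: groupby already advanced past it
# []
# []
# []
```

In current CPython every saved group comes back empty, even the last one. The final call that advances groupby to exhaustion also invalidates the group it had just handed out.

Fix: always consume group immediately (e.g., with list(group) or by looping over it) before groupby advances.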
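Here is the fix applied to the same data. Materializing each group as soon as groupby yields it keeps every member. A minimal sketch:

```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# ✅ Consume each group (here with list()) before groupby moves on
saved = [(key, list(group)) for key, group in groupby(data)]
print(saved)
# [(1, [1, 1]), (2, [2, 2]), (3, [3])]
```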
21.7 groupby vs dict Accumulation
```
Use groupby when ...           Use defaultdict(list) when ...
─────────────────────────────  ──────────────────────────────
• Data is already sorted       • Data is unsorted
• You want lazy streaming      • You need random access later
• Processing consecutive runs  • Building a lookup table
```
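For comparison, the groupby route to the same dept → names mapping has to sort first. This sketch uses a small records list shaped like the one in 21.3:

```python
from itertools import groupby
from operator import itemgetter

records = [
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery", "name": "Bob"},
    {"dept": "radiology", "name": "Charlie"},
    {"dept": "surgery", "name": "Diana"},
]

# groupby approach: sort by dept first, then collect each run's names
by_dept = itemgetter("dept")
grouped = {
    dept: [m["name"] for m in members]
    for dept, members in groupby(sorted(records, key=by_dept), key=by_dept)
}
print(grouped)
# {'radiology': ['Alice', 'Charlie'], 'surgery': ['Bob', 'Diana']}
```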
```python
# defaultdict approach (no sorting needed); reuses `records` from 21.3
from collections import defaultdict

grouped = defaultdict(list)
for r in records:
    grouped[r["dept"]].append(r["name"])
```

21.8 Summary
```
itertools.groupby(iterable, key)
│                    │       │
│                    │       └─ function to extract group key (None = the element itself)
│                    └─ must be SORTED by key (unless grouping runs)
│
└─ yields (key_value, group_iterator) pairs
   │
   └─ consume IMMEDIATELY or lose it
```