21 `itertools.groupby`

itertools.groupby groups consecutive elements in an iterable that share the same key.

Signature:  itertools.groupby(iterable, key=None)
Returns:    iterator of (key_value, group_iterator) pairs
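
As a small illustration of the signature, the sketch below compares the default key=None (each element acts as its own key) with an explicit key function. The sample string and variable names are made up for this example.

```python
from itertools import groupby

letters = "aaBBbA"

# key=None: elements are compared directly, so case matters.
default_runs = [(k, "".join(g)) for k, g in groupby(letters)]

# key=str.lower: elements are compared by their lowercase form.
folded_runs = [(k, "".join(g)) for k, g in groupby(letters, key=str.lower)]

print(default_runs)  # [('a', 'aa'), ('B', 'BB'), ('b', 'b'), ('A', 'A')]
print(folded_runs)   # [('a', 'aa'), ('b', 'BBb'), ('a', 'A')]
```

Note that the key reported for each group is the value returned by the key function, not the original element.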

21.1 The Critical Rule

To get exactly one group per distinct key, sort the data by the same key function first. Otherwise the same key can show up in several groups, because groupby only merges consecutive elements whose keys are equal.

Sorted by key:     [A, A, A, B, B, C, C, C]
                    |--A--|  |-B-|  |--C--|    ← 3 groups ✅

NOT sorted:        [A, B, A, C, B, A, C, C]
                    A  B  A  C  B  A  |C-|    ← 7 groups ❌ (A appears 3 times!)
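
The two diagrams above can be reproduced in a few lines. This is a minimal sketch using the same letters; the variable names are illustrative only.

```python
from itertools import groupby

unsorted_items = ["A", "B", "A", "C", "B", "A", "C", "C"]

# Group keys in the order groupby yields them.
keys_unsorted = [key for key, _ in groupby(unsorted_items)]
keys_sorted = [key for key, _ in groupby(sorted(unsorted_items))]

print(keys_unsorted)  # ['A', 'B', 'A', 'C', 'B', 'A', 'C']  (7 groups)
print(keys_sorted)    # ['A', 'B', 'C']                      (3 groups)
```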

21.2 Basic Example

from itertools import groupby

data = ["apple", "avocado", "banana", "blueberry", "cherry"]

for key, group in groupby(data, key=lambda x: x[0]):
    print(key, "→", list(group))

a → ['apple', 'avocado']
b → ['banana', 'blueberry']
c → ['cherry']

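Each group is a lazy iterator that shares the underlying iterable with groupby. Consume it (e.g., list(group)) before advancing to the next group; once groupby moves on, the remaining items of the previous group are no longer available.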

21.3 Sort-Then-Group Pattern

A common pattern for unsorted data is to sort by the key, then group by the same key:

from itertools import groupby
from operator import itemgetter

records = [
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery",   "name": "Bob"},
    {"dept": "radiology", "name": "Charlie"},
    {"dept": "surgery",   "name": "Diana"},
]

key_func = itemgetter("dept")

# Step 1: Sort by key
records.sort(key=key_func)

# Step 2: Group by key
for dept, members in groupby(records, key=key_func):
    print(dept, "→", [m["name"] for m in members])

radiology → ['Alice', 'Charlie']
surgery → ['Bob', 'Diana']
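
If mutating the input with records.sort() is undesirable, the same pattern works with sorted(), which returns a new list. The sketch below assumes a hypothetical helper named group_names and a sample list named staff; neither comes from the example above. Because Python's sort is stable, names keep their original relative order within each group.

```python
from itertools import groupby
from operator import itemgetter

def group_names(records, field):
    """Return [(field value, [names])] without mutating `records`."""
    key_func = itemgetter(field)
    ordered = sorted(records, key=key_func)  # new list; input is untouched
    return [(k, [r["name"] for r in g]) for k, g in groupby(ordered, key=key_func)]

staff = [
    {"dept": "surgery",   "name": "Bob"},
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery",   "name": "Diana"},
    {"dept": "radiology", "name": "Charlie"},
]

by_dept = group_names(staff, "dept")
print(by_dept)
# [('radiology', ['Alice', 'Charlie']), ('surgery', ['Bob', 'Diana'])]
```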

21.4 Collecting into a Dict

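A one-liner builds a grouped dictionary, provided sorted_data is already sorted by key_func. On unsorted input, a key that occurs in several runs silently overwrites its earlier group:

grouped = {k: list(g) for k, g in groupby(sorted_data, key=key_func)}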
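
To make the one-liner concrete, here is a self-contained sketch with made-up data. The names words and first_letter are assumptions for illustration. It also shows what happens when the sort step is skipped.

```python
from itertools import groupby

words = ["banana", "apple", "blueberry", "avocado"]

def first_letter(word):
    """Hypothetical key function: group words by their first character."""
    return word[0]

# Sorted input: one run per key, so every word is kept.
complete = {
    k: list(g)
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
}

# Unsorted input: 'b' and 'a' each occur in two runs, and the later run wins.
lossy = {k: list(g) for k, g in groupby(words, key=first_letter)}

print(complete)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}
print(lossy)     # {'b': ['blueberry'], 'a': ['avocado']}
```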

21.5 Practical Example: Run-Length Encoding

groupby is well suited to detecting consecutive runs. No sorting is needed here because the grouping is meant to be positional:

from itertools import groupby

signal = "AAABBCDDDDA"

encoded = [(char, sum(1 for _ in group)) for char, group in groupby(signal)]
print(encoded)
# [('A', 3), ('B', 2), ('C', 1), ('D', 4), ('A', 1)]
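
A quick way to check the encoding is to decode it again. The sketch below wraps the comprehension above in a function and adds a matching decoder; the names rle_encode and rle_decode are hypothetical helpers, not part of itertools.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of equal characters into a (char, count) pair."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

signal = "AAABBCDDDDA"
encoded = rle_encode(signal)
decoded = rle_decode(encoded)

print(encoded)            # [('A', 3), ('B', 2), ('C', 1), ('D', 4), ('A', 1)]
print(decoded == signal)  # True
```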

21.6 Gotcha: The Exhausted Iterator

from itertools import groupby

data = [1, 1, 2, 2, 3]
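pairs = list(groupby(data))  # exhausts the outer iterator before any group is read

for key, group in pairs:
    print(key, list(group))  # ❌ empty: groupby has already moved past this group
# 1 []
# 2 []
# 3 []   ← on current CPython; some older versions left [3] in the last group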

Fix: always consume group immediately (e.g., list(group), loop over it, etc.).
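
One way to apply the fix is to turn each group into a list while the outer iterator is still positioned on it. The sketch below also shows a group going stale once groupby is advanced by hand; the variable names are illustrative.

```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# Safe: each group is materialised before groupby moves to the next key.
safe = [(key, list(group)) for key, group in groupby(data)]
print(safe)  # [(1, [1, 1]), (2, [2, 2]), (3, [3])]

# Unsafe: hold on to the first group, advance the outer iterator, then read it.
outer = groupby(data)
first_key, first_group = next(outer)
next(outer)                # groupby moves on to the key 2
stale = list(first_group)  # the first group can no longer yield its items
print(first_key, stale)    # 1 []
```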

21.7 `groupby` vs `dict` Accumulation

Use groupby when ...              Use defaultdict(list) when ...
─────────────────────────         ──────────────────────────────
• Data is already sorted          • Data is unsorted
• You want lazy streaming         • You need random access later
• Processing consecutive runs     • Building a lookup table

# defaultdict approach (no sorting needed)
from collections import defaultdict

grouped = defaultdict(list)
for r in records:
    grouped[r["dept"]].append(r["name"])
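
For comparison, the sketch below runs both approaches on the same unsorted input and checks that they produce the same groups. The records list is redefined here so the block is self-contained, and the names via_dict and via_groupby are illustrative.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

records = [
    {"dept": "surgery",   "name": "Bob"},
    {"dept": "radiology", "name": "Alice"},
    {"dept": "surgery",   "name": "Diana"},
    {"dept": "radiology", "name": "Charlie"},
]

# defaultdict: a single pass, no sorting; keys appear in first-seen order.
via_dict = defaultdict(list)
for r in records:
    via_dict[r["dept"]].append(r["name"])

# groupby: needs a sort first; keys appear in sorted order.
by_dept = itemgetter("dept")
via_groupby = {
    dept: [r["name"] for r in rows]
    for dept, rows in groupby(sorted(records, key=by_dept), key=by_dept)
}

print(dict(via_dict) == via_groupby)  # True: same groups, same members
print(list(via_dict))                 # ['surgery', 'radiology']
print(list(via_groupby))              # ['radiology', 'surgery']
```

The grouped contents match; only the key order differs, since the defaultdict version never sorts.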

21.8 Summary

itertools.groupby(iterable, key)
         │            │       │
         │            │       └─ function to extract group key
         │            └─ must be SORTED by key (unless grouping runs)
         │
         └─ yields (key_value, group_iterator) pairs
                                    │
                                    └─ consume IMMEDIATELY or lose it