17  Regex

import re

17.1 Common Functions

17.1.1 re.search()

  • Searches for the first occurrence of a pattern within a string.
  • Returns a match object if the pattern is found; otherwise, returns None.
import re

text = "hello world"
match = re.search(r"hello", text)
print(match)

if match:
    print("Pattern found!")
else:
    print("Pattern not found.")
<re.Match object; span=(0, 5), match='hello'>
Pattern found!
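Because re.search() returns a match object rather than a plain string, it helps to know what that object exposes. The following is a minimal sketch using the standard Match methods group(), start(), end(), and span(); the sample text and patterns are illustrative only.

```python
import re

text = "hello world"
match = re.search(r"world", text)

if match:
    # group() is the matched text; span() is (start, end), with end exclusive
    print(match.group())  # world
    print(match.start())  # 6
    print(match.end())    # 11
    print(match.span())   # (6, 11)

# A pattern that does not occur gives None rather than a match object
missing = re.search(r"xyz", text)
print(missing)  # None
```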

17.1.2 re.match()

  • Checks if the pattern matches at the beginning of the string.
  • Returns a match object if it matches the start of the string, otherwise returns None.
text = "hello world"
match = re.match(r"hello", text)

if match:
    print("Pattern matches the start!")
else:
    print("No match at the start.")
Pattern matches the start!
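The difference between the two functions shows up as soon as the pattern is not at position 0. In this sketch (the sample string is illustrative), re.search() scans the whole string, while re.match() gives up because the string does not begin with the pattern.

```python
import re

text = "say hello"

# re.search() scans the whole string and finds "hello" at index 4
search_result = re.search(r"hello", text)
# re.match() only tries position 0, where the string starts with "say"
match_result = re.match(r"hello", text)

print(search_result)  # <re.Match object; span=(4, 9), match='hello'>
print(match_result)   # None
```

Even with re.MULTILINE set, re.match() still checks only the beginning of the string, not the beginning of each line.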

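17.1.3 re.findall()

  • Finds all non-overlapping matches of a pattern in a string.
  • Returns them as a list of strings; if the pattern contains groups, the list holds the group contents instead (tuples when there is more than one group).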
text = "My phone number is 1234, and my zip code is 56789."
matches = re.findall(r"\d+", text)
matches
['1234', '56789']
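The shape of the list that re.findall() returns depends on whether the pattern contains groups. A small sketch with an illustrative key=value string:

```python
import re

text = "alice=30, bob=25"

# No groups: each item is the whole match
whole = re.findall(r"\w+=\d+", text)
# One group: each item is that group's text
names = re.findall(r"(\w+)=\d+", text)
# Several groups: each item is a tuple with one entry per group
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['alice=30', 'bob=25']
print(names)  # ['alice', 'bob']
print(pairs)  # [('alice', '30'), ('bob', '25')]
```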

17.1.4 re.sub()

  • Substitutes all occurrences of a pattern with a replacement string.
  • Returns a new string with the substitutions.
text = "I have a dog. My neighbor has a dog too."
new_text = re.sub(r"dog", "cat", text)
new_text
'I have a cat. My neighbor has a cat too.'
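re.sub() can do more than swap one fixed word for another. This sketch shows three standard options: the count argument, group references such as \1 in the replacement string, and a function as the replacement. The sample strings are illustrative.

```python
import re

text = "I have a dog. My neighbor has a dog too."

# count limits the number of replacements (the default, 0, means replace all)
first_only = re.sub(r"dog", "cat", text, count=1)
print(first_only)  # I have a cat. My neighbor has a dog too.

# \1 and \2 in the replacement refer back to captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")
print(swapped)  # example at user

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears
```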

17.2 Regex Syntax

Regular expressions use special characters to define patterns. Here are some of the most commonly used characters:

17.2.1 Metacharacters:

  • . : Matches any single character except newline (\n).
  • ^ : Matches the start of a string.
  • $ : Matches the end of a string (or just before a newline at the very end of the string).
  • * : Matches 0 or more repetitions of the preceding element (a character, character class, or group).
  • + : Matches 1 or more repetitions of the preceding element.
  • ? : Matches 0 or 1 occurrence of the preceding element.
  • {} : Specifies the number of repetitions (e.g., {2} means exactly two, {2,4} means between two and four).
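To see how the quantifiers differ, it helps to run each of them against the same input. A minimal sketch with an illustrative string, using re.findall() to list every match:

```python
import re

text = "a ab abb abbb"

star = re.findall(r"ab*", text)        # 'a' followed by zero or more 'b'
plus = re.findall(r"ab+", text)        # 'a' followed by one or more 'b'
optional = re.findall(r"ab?", text)    # 'a' followed by at most one 'b'
exact = re.findall(r"ab{2}", text)     # 'a' followed by exactly two 'b'
ranged = re.findall(r"ab{2,3}", text)  # 'a' followed by two or three 'b'

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb']
print(ranged)    # ['abb', 'abbb']

# A quantifier after a group repeats the whole group
grouped = re.search(r"(ab)+", "xababab").group()
print(grouped)  # ababab

# . matches any character except a newline
dot = re.findall(r"a.b", "a-b a\nb")
print(dot)  # ['a-b']
```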

17.2.2 Character Classes:

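  • \d : Matches any decimal digit (equivalent to [0-9] for ASCII text; in str patterns Python also matches other Unicode digits unless re.ASCII is set).
  • \w : Matches any word character: letters, digits, and the underscore (equivalent to [a-zA-Z0-9_] for ASCII text; Unicode letters and digits also match unless re.ASCII is set).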
  • \s : Matches any whitespace character (spaces, tabs, newlines).
  • \D, \W, \S : Match the opposite of \d, \w, and \s.
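A short sketch of the character classes on a mixed string (the sample text is illustrative). The last two lines use U+0663, ARABIC-INDIC DIGIT THREE, to show that \d in a str pattern is broader than [0-9] unless the re.ASCII flag is passed.

```python
import re

text = "Order #42 ships\tMonday"

digits = re.findall(r"\d", text)      # individual digits
words = re.findall(r"\w+", text)      # runs of word characters
spaces = re.findall(r"\s", text)      # whitespace, including the tab
non_word = re.findall(r"\W", text)    # everything that is not a word character

print(digits)    # ['4', '2']
print(words)     # ['Order', '42', 'ships', 'Monday']
print(spaces)    # [' ', ' ', '\t']
print(non_word)  # [' ', '#', ' ', '\t']

# In str patterns, \d also matches non-ASCII decimal digits unless re.ASCII is set
arabic_three = "\u0663"
unicode_mode = bool(re.fullmatch(r"\d", arabic_three))
ascii_mode = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))
print(unicode_mode)  # True
print(ascii_mode)    # False
```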

17.2.3 Anchors:

  • ^ : Anchors the pattern to the start of the string.
  • $ : Anchors the pattern to the end of the string.

Example:

pattern = r"^\d+"  # Matches digits at the start of the string
text = "1234abc"
match = re.search(pattern, text)
if match:
    print("Found at the start:", match.group())  # Output: Found at the start: 1234
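Combining both anchors is a common way to require that the whole string fits the pattern, not just part of it. In this sketch (sample inputs are illustrative), only the all-digit string passes. It also shows that $ accepts a single trailing newline, whereas \Z matches only at the absolute end of the string.

```python
import re

pattern = r"^\d+$"  # one or more digits and nothing else

results = {text: bool(re.search(pattern, text)) for text in ["1234", "1234abc", "abc1234"]}
print(results)  # {'1234': True, '1234abc': False, 'abc1234': False}

# $ also matches just before a single trailing newline; \Z matches only at the very end
with_dollar = bool(re.search(r"^\d+$", "1234\n"))
with_z = bool(re.search(r"^\d+\Z", "1234\n"))
print(with_dollar)  # True
print(with_z)       # False
```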

17.2.4 Groups:

  • Parentheses () are used to create groups in regex.
  • Use .group(n) to get the text of the nth group (group 0 is the entire match) and .groups() to get all groups as a tuple.

Example:

pattern = r"(hello) (world)"
text = "hello world"
match = re.search(pattern, text)

if match:
    print(match.group(1))  # Output: hello
    print(match.group(2))  # Output: world
hello
world
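The same match object also offers group 0, the groups() tuple, and the position of each group. A minimal sketch, with a leading word added to the illustrative text so that the positions are not all zero:

```python
import re

match = re.search(r"(hello) (world)", "say hello world")

if match:
    print(match.group(0))     # hello world  (group 0 is the entire match)
    print(match.groups())     # ('hello', 'world')
    print(match.group(1, 2))  # ('hello', 'world'), several groups at once
    print(match.span(2))      # (10, 15), position of group 2 in the string
```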

17.2.5 Escaping Special Characters

If you want to match one of the special regex characters (such as ., ^, $, *, +, ?, (, ), [, {, | or \) literally, you need to escape it with a backslash (\).

Example:

pattern = r"\$100"  # Matches the string "$100"
text = "The price is $100."
match = re.search(pattern, text)

if match:
    print("Price found:", match.group()) 
Price found: $100
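When the literal text comes from a variable or from user input, escaping by hand is error-prone. The standard function re.escape() adds the backslashes for you. In this sketch (the price strings are illustrative), the unescaped version finds nothing, because $ acts as an end-of-string anchor and . matches any character.

```python
import re

literal = "$100.00"
text = "Total: $100.00 (was $100500)"

pattern = re.escape(literal)  # backslash-escapes the special characters $ and .
print(pattern)  # \$100\.00

found = re.findall(pattern, text)
print(found)  # ['$100.00']

# Without escaping, $ is treated as an anchor, so nothing matches
unescaped = re.findall(literal, text)
print(unescaped)  # []
```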

17.2.6 Flags in Regex

You can modify the behavior of regex with flags, such as:

  • re.IGNORECASE or re.I : Makes the regex case-insensitive.
  • re.MULTILINE or re.M : Allows ^ and $ to match the start and end of each line in a multi-line string.
  • re.DOTALL or re.S : Makes . match newlines as well.

pattern = r"hello"
text = "HELLO world"
match = re.search(pattern, text, re.IGNORECASE)

if match:
    print("Case-insensitive match found!") 
Case-insensitive match found!
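The other two flags mentioned above are easiest to understand on a string that contains a newline. In this sketch (the sample text is illustrative), each flag changes the result, and several flags can be combined with the | operator.

```python
import re

text = "first line\nSecond line"

# Without re.M, ^ only matches at the very start of the string
single = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after each newline
multi = re.findall(r"^\w+", text, re.M)
print(single)  # ['first']
print(multi)   # ['first', 'Second']

# Without re.S, . does not match the newline
no_dotall = re.search(r"line.Second", text)
# With re.S, . matches the newline too
dotall = re.search(r"line.Second", text, re.S)
print(no_dotall)  # None
print(repr(dotall.group()))  # 'line\nSecond'

# Flags combine with |
combined = re.findall(r"^second", text, re.M | re.I)
print(combined)  # ['Second']
```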