17 Regex

import re

17.1 Common Functions

17.1.1 `re.search()`

Searches for the first occurrence of a pattern within a string.
Returns a match object if the pattern is found; otherwise, returns None.

import re

text = "hello world"
match = re.search(r"hello", text)
print(match)

if match:
    print("Pattern found!")
else:
    print("Pattern not found.")

<re.Match object; span=(0, 5), match='hello'>
Pattern found!
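
The example above happens to match at position 0, so it does not show that `re.search()` scans the whole string. The sketch below uses a made-up sentence (an assumption for illustration) and reads the match position from the match object with `.span()` and `.group()`.

```python
import re

text = "say hello to the world"
match = re.search(r"hello", text)

# The pattern is found even though it does not sit at the start of the string.
found = match.group()        # the matched text
start, end = match.span()    # start (inclusive) and end (exclusive) indices

print(found, start, end)
print(text[start:end])
```

Slicing the original string with the span returns the same text as `.group()`.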

17.1.2 `re.match()`

Checks if the pattern matches at the beginning of the string.
Returns a match object if it matches the start of the string, otherwise returns None.

text = "hello world"
match = re.match(r"hello", text)

if match:
    print("Pattern matches the start!")
else:
    print("No match at the start.")

Pattern matches the start!
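
To make the difference from `re.search()` concrete, here is a small comparison on a string where the pattern is not at the beginning. It also shows `re.fullmatch()`, which is not covered above and is included only as a related option for matching the entire string. The variable names are illustrative.

```python
import re

text = "say hello"

at_start = re.match(r"hello", text)    # None: "hello" is not at position 0
anywhere = re.search(r"hello", text)   # match: search scans the whole string

# fullmatch succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r"hello", "hello")
partial = re.fullmatch(r"hello", "hello world")

print(at_start, anywhere.group(), whole.group(), partial)
```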

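17.1.3 `re.findall()`

Finds all non-overlapping occurrences of a pattern in a string.
Returns a list of the matched strings; the list is empty if nothing matches.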

text = "My phone number is 1234, and my zip code is 56789."
matches = re.findall(r"\d+", text)
matches

['1234', '56789']
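
One detail that often surprises: when the pattern contains groups, `re.findall()` returns the groups rather than the whole match. A minimal sketch, using an invented `key=value` string as the input:

```python
import re

text = "width=640, height=480"

# No groups: each list item is the full match.
values = re.findall(r"\d+", text)

# Two groups: each list item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(values)
print(pairs)
```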

17.1.4 `re.sub()`

Substitutes all occurrences of a pattern with a replacement string.
Returns a new string with the substitutions.

text = "I have a dog. My neighbor has a dog too."
new_text = re.sub(r"dog", "cat", text)
new_text

'I have a cat. My neighbor has a cat too.'
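
`re.sub()` also accepts a `count` argument and lets the replacement refer back to captured groups. The sketch below illustrates both, together with `re.subn()`, which additionally reports how many substitutions were made. The `user@host` string is just sample data.

```python
import re

text = "I have a dog. My neighbor has a dog too."

# Replace only the first occurrence.
first_only = re.sub(r"dog", "cat", text, count=1)

# \1 and \2 in the replacement refer to the first and second groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# subn returns a (new_string, number_of_substitutions) tuple.
new_text, n = re.subn(r"dog", "cat", text)

print(first_only)
print(swapped)
print(new_text, n)
```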

17.2 Regex Syntax

Regular expressions use special characters to define patterns. Here are some of the most commonly used characters:

17.2.1 Metacharacters:

. : Matches any single character except newline (\n).
^ : Matches the start of a string.
$ : Matches the end of a string, or just before a newline at the end of the string.
* : Matches 0 or more repetitions of the preceding element (a character, character class, or group).
+ : Matches 1 or more repetitions of the preceding element.
? : Matches 0 or 1 occurrence of the preceding element.
{} : Specifies the number of repetitions (e.g., {2} means exactly two, {2,4} means between two and four).
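
The quantifiers above are easiest to compare side by side on the same input. The following sketch uses invented test strings and shows how `*`, `+`, `?`, and `{m,n}` differ in what they accept.

```python
import re

text = "a ab abbb"

star = re.findall(r"ab*", text)      # "a" followed by zero or more "b"
plus = re.findall(r"ab+", text)      # "a" followed by one or more "b"
optional = re.findall(r"ab?", text)  # "a" followed by at most one "b"

# {2,4} matches runs of two to four digits; longer runs are matched greedily.
ranged = re.findall(r"\d{2,4}", "1 22 333 55555")

# A quantifier after a group repeats the whole group.
repeated_group = re.fullmatch(r"(ab)+", "ababab") is not None

print(star, plus, optional, ranged, repeated_group)
```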

17.2.2 Character Classes:

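\d : Matches any digit. For str patterns this includes Unicode digits; with the re.ASCII flag it is equivalent to [0-9].
\w : Matches any word character: letters, digits, and the underscore. With the re.ASCII flag it is equivalent to [a-zA-Z0-9_].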
\s : Matches any whitespace character (spaces, tabs, newlines).
\D, \W, \S : Match the opposite of \d, \w, and \s.
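
A quick way to get a feel for these classes is to run each one over the same string. The sample text below is made up and contains a space, a tab, and an underscore so that every class has something to match.

```python
import re

text = "Room 42,\tfloor_3"

digits = re.findall(r"\d", text)     # individual digits
words = re.findall(r"\w+", text)     # runs of word characters (underscore included)
spaces = re.findall(r"\s", text)     # the space and the tab

# \D is the opposite of \d, so removing every \D leaves only the digits.
only_digits = re.sub(r"\D", "", text)

print(digits, words, spaces, only_digits)
```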

17.2.3 Anchors:

^ : Anchors the pattern to the start of the string.
$ : Anchors the pattern to the end of the string.

Example:

pattern = r"^\d+"  # Matches digits at the start of the string
text = "1234abc"
match = re.search(pattern, text)
if match:
    print("Found at the start:", match.group())  # Output: Found at the start: 1234
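
The `$` anchor and the combination `^...$` can be sketched the same way. Note that `$` also matches just before a trailing newline, so `re.fullmatch()` (shown here as an alternative, not something the text above prescribes) is the stricter check when the whole string must match.

```python
import re

# $ anchors the match to the end of the string.
ends_with_digits = re.search(r"\d+$", "abc1234")

# ^ and $ together require the whole string to consist of digits.
all_digits = re.search(r"^\d+$", "1234abc")

# $ still matches before a single trailing newline...
with_newline = re.search(r"^\d+$", "1234\n")

# ...whereas fullmatch does not tolerate the extra character.
strict = re.fullmatch(r"\d+", "1234\n")

print(ends_with_digits.group(), all_digits, with_newline.group(), strict)
```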

17.2.4 Groups:

Parentheses () are used to create groups in regex.
You can extract matched groups using .group() or .groups().

Example:

pattern = r"(hello) (world)"
text = "hello world"
match = re.search(pattern, text)

if match:
    print(match.group(1))  # Output: hello
    print(match.group(2))  # Output: world

hello
world
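
Beyond `.group(1)` and `.group(2)`, the match object offers `.groups()` for all groups at once, and groups can be given names with `(?P<name>...)`. A minimal sketch, where the date string and the group names `year` and `month` are chosen purely for illustration:

```python
import re

match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-05")

whole = match.group(0)        # group 0 is the entire match
all_groups = match.groups()   # tuple of all captured groups
year = match.group("year")    # access a group by name
by_name = match.groupdict()   # dict mapping group names to matched text

print(whole, all_groups, year, by_name)
```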

17.2.5 Escaping Special Characters

If you want to match one of the special regex characters literally, you need to escape it using a backslash (\).

Example:

pattern = r"\$100"  # Matches the string "$100"
text = "The price is $100."
match = re.search(pattern, text)

if match:
    print("Price found:", match.group())

Price found: $100
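
When the literal text comes from a variable, or contains several special characters, escaping by hand gets error-prone. One option is `re.escape()`, which escapes the special characters for you. The sketch below assumes a made-up literal containing `$`, parentheses, and a dot.

```python
import re

text = "The price is $100 (approx.) today"
literal = "$100 (approx.)"

# Unescaped, "$" is treated as the end-of-string anchor, so nothing matches.
unescaped = re.search(r"$100", text)

# re.escape() builds a pattern that matches the literal text exactly.
escaped = re.search(re.escape(literal), text)

print(unescaped)
print(escaped.group())
```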

17.2.6 Flags in Regex

You can modify the behavior of regex with flags, such as:

re.IGNORECASE or re.I : Makes the regex case-insensitive.
re.MULTILINE or re.M : Allows ^ and $ to match the start and end of each line in a multi-line string.
re.DOTALL or re.S : Makes . match newlines as well.

pattern = r"hello"
text = "HELLO world"
match = re.search(pattern, text, re.IGNORECASE)

if match:
    print("Case-insensitive match found!")

Case-insensitive match found!
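
The other two flags can be illustrated on a two-line string. This is a small sketch with invented sample text; it also shows that flags can be combined with the `|` operator.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, "." stops at the newline, so the pattern cannot span both lines.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, re.DOTALL)

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^SECOND", text, re.I | re.M)

print(default_starts, line_starts, no_dotall, with_dotall.group(), combined)
```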
