import re
17 Regex
17.1 Common Functions
17.1.1 re.search()
- Searches for the first occurrence of a pattern within a string.
- Returns a match object if the pattern is found; otherwise, returns None.
import re

text = "hello world"
match = re.search(r"hello", text)
print(match)
if match:
    print("Pattern found!")
else:
    print("Pattern not found.")
<re.Match object; span=(0, 5), match='hello'>
Pattern found!
17.1.2 re.match()
- Checks if the pattern matches at the beginning of the string.
- Returns a match object if the pattern matches at the start of the string; otherwise, returns None.
text = "hello world"
match = re.match(r"hello", text)

if match:
    print("Pattern matches the start!")
else:
    print("No match at the start.")
Pattern matches the start!
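The difference from re.search() is easiest to see when the pattern occurs later in the string. The following is a small sketch; the sample string and the variable names at_start and anywhere are chosen here for illustration only.

```python
import re

text = "say hello world"

# re.match only tries the pattern at position 0, so it finds nothing here.
at_start = re.match(r"hello", text)

# re.search scans through the string and finds the later occurrence.
anywhere = re.search(r"hello", text)

print(at_start)         # None
print(anywhere.span())  # (4, 9)
```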
17.1.3 re.findall()
- Finds all non-overlapping occurrences of a pattern in a string.
- Returns the matches as a list; the list is empty if nothing matches.
text = "My phone number is 1234, and my zip code is 56789."
matches = re.findall(r"\d+", text)
matches
['1234', '56789']
17.1.4 re.sub()
- Substitutes all occurrences of a pattern with a replacement string.
- Returns a new string with the substitutions.
text = "I have a dog. My neighbor has a dog too."
new_text = re.sub(r"dog", "cat", text)
new_text
'I have a cat. My neighbor has a cat too.'
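re.sub() also accepts an optional count argument that limits how many occurrences are replaced, and the replacement may be a function that receives each match object. A brief sketch of both follows; the sample strings and variable names are illustrative only.

```python
import re

text = "I have a dog. My neighbor has a dog too."

# count=1 replaces only the first occurrence.
first_only = re.sub(r"dog", "cat", text, count=1)
print(first_only)  # I have a cat. My neighbor has a dog too.

# A callable replacement is called once per match and returns the new text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```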
17.2 Regex Syntax
Regular expressions use special characters to define patterns. Here are some of the most commonly used characters:
17.2.1 Metacharacters:
- . : Matches any single character except newline (\n).
- ^ : Matches the start of a string.
- $ : Matches the end of a string.
- * : Matches 0 or more repetitions of the preceding character or group.
- + : Matches 1 or more repetitions of the preceding character or group.
- ? : Matches 0 or 1 occurrence of the preceding character or group.
- {} : Specifies the number of repetitions (e.g., {2} means exactly two, {2,4} means between two and four).
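To see how the quantifiers differ, it can help to test several patterns against the same small set of strings. The sketch below uses re.fullmatch(), which requires the entire string to match; the sample strings and the results dictionary are made up for illustration.

```python
import re

samples = ["ct", "cat", "caat", "caaaaat"]
patterns = [r"ca*t", r"ca+t", r"ca?t", r"ca{2,4}t"]

# For each pattern, collect the samples that it matches in full.
results = {}
for pattern in patterns:
    results[pattern] = [s for s in samples if re.fullmatch(pattern, s)]
    print(pattern, results[pattern])

# ca*t      ['ct', 'cat', 'caat', 'caaaaat']  (zero or more "a")
# ca+t      ['cat', 'caat', 'caaaaat']        (at least one "a")
# ca?t      ['ct', 'cat']                     (zero or one "a")
# ca{2,4}t  ['caat']                          (two to four "a"; "caaaaat" has five)
```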
17.2.2 Character Classes:
- \d : Matches any digit (equivalent to [0-9] for ASCII text).
- \w : Matches any word character: a letter, a digit, or an underscore (equivalent to [a-zA-Z0-9_] for ASCII text).
- \s : Matches any whitespace character (spaces, tabs, newlines).
- \D, \W, \S : Match the opposite of \d, \w, and \s.
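The character classes above can be compared side by side on one string. This is a minimal sketch; the sample text and variable names are illustrative only.

```python
import re

text = "Room 42, floor_3!"

digits = re.findall(r"\d+", text)    # runs of digits
words = re.findall(r"\w+", text)     # runs of letters, digits, underscores
non_words = re.findall(r"\W+", text) # everything \w does not match

print(digits)     # ['42', '3']
print(words)      # ['Room', '42', 'floor_3']
print(non_words)  # [' ', ', ', '!']

# \s+ treats any run of spaces, tabs, or newlines as one separator.
parts = re.split(r"\s+", "a  b\tc\nd")
print(parts)      # ['a', 'b', 'c', 'd']
```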
17.2.3 Anchors:
- ^ : Anchors the pattern to the start of the string.
- $ : Anchors the pattern to the end of the string.
Example:
pattern = r"^\d+"  # Matches digits at the start of the string
text = "1234abc"
match = re.search(pattern, text)
if match:
    print("Found at the start:", match.group())  # Output: Found at the start: 1234
17.2.4 Groups:
- Parentheses () are used to create groups in regex.
- You can extract matched groups using .group() or .groups().
Example:
pattern = r"(hello) (world)"
text = "hello world"
match = re.search(pattern, text)

if match:
    print(match.group(1))  # Output: hello
    print(match.group(2))  # Output: world
hello
world
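The .groups() method returns all captured groups at once as a tuple, and .group(0) returns the entire match. The sketch below shows both on a date-like string; the pattern and sample text are illustrative and not taken from the examples above.

```python
import re

pattern = r"(\d{4})-(\d{2})-(\d{2})"
text = "Released on 2024-03-15."
match = re.search(pattern, text)

if match:
    print(match.group(0))  # 2024-03-15 (the whole match)
    print(match.groups())  # ('2024', '03', '15')

    # The tuple can be unpacked directly into separate variables.
    year, month, day = match.groups()
    print(year, month, day)  # 2024 03 15
```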
17.2.5 Escaping Special Characters
If you want to match one of the special regex characters literally, you need to escape it using a backslash (\).
Example:
pattern = r"\$100"  # Matches the string "$100"
text = "The price is $100."
match = re.search(pattern, text)

if match:
    print("Price found:", match.group())
Price found: $100
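When the literal text contains several special characters, or comes from user input, escaping each one by hand is error-prone. The standard library's re.escape() adds the backslashes for you. A small sketch, with a made-up sample string:

```python
import re

literal = "3.50 (USD)"
text = "Total: 3.50 (USD), not 3x50 (USD)."

# re.escape backslash-escapes the characters that are special in a regex.
safe_pattern = re.escape(literal)
exact = re.findall(safe_pattern, text)
print(exact)  # ['3.50 (USD)']

# Without escaping, "." matches any character, so "3x50" is matched too.
loose = re.findall(r"3.50", text)
print(loose)  # ['3.50', '3x50']
```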
17.2.6 Flags in Regex
You can modify the behavior of regex with flags, such as:
- re.IGNORECASE or re.I : Makes the regex case-insensitive.
- re.MULTILINE or re.M : Allows ^ and $ to match the start and end of each line in a multi-line string.
- re.DOTALL or re.S : Makes . match newlines as well.
pattern = r"hello"
text = "HELLO world"
match = re.search(pattern, text, re.IGNORECASE)

if match:
    print("Case-insensitive match found!")
Case-insensitive match found!
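The other two flags can be illustrated the same way. The sketch below runs each pattern with and without the flag on a two-line string; the sample text and variable names are illustrative only.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
print(default_starts)    # ['first']
print(multiline_starts)  # ['first', 'second']

# Without re.DOTALL, . stops at the newline, so the pattern cannot span both lines.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, re.DOTALL)
print(no_dotall)                 # None
print(repr(with_dotall.group())) # 'first line\nsecond'
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.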