What Are Regular Expressions?
Regular expressions (regex or regexp) are powerful sequences of characters that define search patterns. They're used for finding, matching, and manipulating text in virtually every programming language and many text editors.
Think of regex as a super-powered "Find and Replace" feature. Instead of searching for exact text, you can search for patterns—like "any email address" or "all phone numbers" or "words that start with 'pre'".
Learning regex is worthwhile because it lets you:
- Validate user input (emails, phone numbers, passwords)
- Search and replace text efficiently
- Extract data from logs and documents
- Parse and process structured text

It is also an essential skill for developers and data analysts.
While regex can look intimidating at first (patterns like ^[\w.-]+@[\w.-]+\.\w{2,}$), the patterns become remarkably logical and useful once you understand the building blocks.
Basic Regex Syntax
Let's start with the fundamental building blocks of regular expressions:
Literal Characters
The simplest regex patterns are literal characters. The pattern cat matches the exact text "cat" wherever it appears.
Example: Literal Match
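A minimal sketch of a literal match, assuming Python's built-in re module. The sample sentence and variable names are made up for illustration:

```python
import re

text = "The cat sat next to a concatenated string."

# A literal pattern matches wherever those exact characters appear,
# including inside a longer word such as "concatenated".
matches = re.findall(r"cat", text)
positions = [m.start() for m in re.finditer(r"cat", text)]

print(matches)    # ['cat', 'cat']
print(positions)
```

The second hit comes from the middle of "concatenated", which is why literal patterns are often combined with the boundaries covered later.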
Special Characters (Metacharacters)
These characters have special meanings in regex:
| Character | Meaning | Example |
|---|---|---|
| . | Matches any single character (except newline) | c.t matches "cat", "cut", "c9t" |
| ^ | Start of string/line | ^Hello matches "Hello" at the start |
| $ | End of string/line | world$ matches "world" at the end |
| * | Zero or more of the previous character | ab*c matches "ac", "abc", "abbc" |
| + | One or more of the previous character | ab+c matches "abc", "abbc" but not "ac" |
| ? | Zero or one of the previous character | colou?r matches "color" and "colour" |
| \ | Escape special characters | \. matches a literal dot |
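The metacharacters in the table can be tried out with a short script. This is a sketch using Python's re module; the test strings and variable names are illustrative assumptions:

```python
import re

# . matches any single character except a newline
dot = re.findall(r"c.t", "cat cut c9t ct")

# * allows zero or more b's; + requires at least one
candidates = ["ac", "abc", "abbc"]
star = [s for s in candidates if re.fullmatch(r"ab*c", s)]
plus = [s for s in candidates if re.fullmatch(r"ab+c", s)]

# ? makes the preceding u optional
optional = re.findall(r"colou?r", "color and colour")

# \. matches a literal dot instead of "any character"
literal_dot = re.findall(r"a\.b", "a.b axb")

print(dot)          # ['cat', 'cut', 'c9t']
print(star, plus)
print(optional)     # ['color', 'colour']
print(literal_dot)  # ['a.b']
```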
Character Classes
Character classes let you match any one character from a set of characters:
| Pattern | Meaning | Example |
|---|---|---|
| [abc] | Matches a, b, or c | [aeiou] matches any vowel |
| [^abc] | Matches anything except a, b, or c | [^0-9] matches non-digits |
| [a-z] | Matches any lowercase letter | [A-Za-z] matches any letter |
| [0-9] | Matches any digit | [0-9]{3} matches three digits |
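As a rough illustration of the character classes above, here is a Python sketch. The sample text and variable names are assumptions chosen for the example:

```python
import re

text = "Room 101, floor B2"

# [aeiou] matches one vowel at a time
vowels = re.findall(r"[aeiou]", "regex")

# [^0-9] matches every non-digit; removing those leaves only the digits
digits_only = re.sub(r"[^0-9]", "", text)

# [A-Za-z]+ matches runs of ASCII letters
letters = re.findall(r"[A-Za-z]+", text)

# [0-9]{3} matches exactly three consecutive digits
three_digits = re.findall(r"[0-9]{3}", text)

print(vowels)        # ['e', 'e']
print(digits_only)   # 1012
print(letters)       # ['Room', 'floor', 'B']
print(three_digits)  # ['101']
```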
Shorthand Character Classes
Common character classes have convenient shortcuts:
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [A-Za-z0-9_] | Any word character |
| \W | [^A-Za-z0-9_] | Any non-word character |
| \s | [ \t\n\r\f\v] | Any whitespace |
| \S | [^ \t\n\r\f\v] | Any non-whitespace |
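The shorthand classes can be sketched in Python as follows. The sample line and names are illustrative. One Python-specific caveat, shown at the end: for ordinary string patterns, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed, so the "Equivalent" column holds exactly only in ASCII mode.

```python
import re

line = "order_42 shipped\t2024-01-15"

digits = re.findall(r"\d+", line)   # runs of digits
words = re.findall(r"\w+", line)    # runs of word characters
fields = re.split(r"\s+", line)     # split on any run of whitespace

# Python-specific: \d matches Unicode decimal digits by default.
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
unicode_digit = bool(re.fullmatch(r"\d", arabic_three))
ascii_digit = bool(re.fullmatch(r"\d", arabic_three, flags=re.ASCII))

print(digits)  # ['42', '2024', '01', '15']
print(words)   # ['order_42', 'shipped', '2024', '01', '15']
print(fields)  # ['order_42', 'shipped', '2024-01-15']
print(unicode_digit, ascii_digit)  # True False
```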
Quantifiers
Quantifiers specify how many times a character or group should be matched:
| Quantifier | Meaning | Example |
|---|---|---|
| {n} | Exactly n times | \d{4} matches exactly 4 digits |
| {n,} | n or more times | \d{2,} matches 2 or more digits |
| {n,m} | Between n and m times | \d{2,4} matches 2 to 4 digits |
| * | Same as {0,} | a* matches "", "a", "aa", "aaa"... |
| + | Same as {1,} | a+ matches "a", "aa", "aaa"... |
| ? | Same as {0,1} | a? matches "" or "a" |
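To see how the counted quantifiers differ, the following Python sketch checks a few made-up strings against each pattern. re.fullmatch is used so the whole string has to fit the pattern:

```python
import re

samples = ["7", "42", "2024", "12345"]

# Keep only the samples that match the pattern from start to end
exactly_four = [s for s in samples if re.fullmatch(r"\d{4}", s)]
two_or_more = [s for s in samples if re.fullmatch(r"\d{2,}", s)]
two_to_four = [s for s in samples if re.fullmatch(r"\d{2,4}", s)]

print(exactly_four)  # ['2024']
print(two_or_more)   # ['42', '2024', '12345']
print(two_to_four)   # ['42', '2024']
```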
Groups and Alternation
Capturing Groups
Parentheses () create capturing groups, which let you:
- Apply quantifiers to multiple characters
- Capture matched text for later use
- Create backreferences
Example: Grouping
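The pattern (ha)+ matches one or more repetitions of "ha": "ha", "haha", "hahaha". Without parentheses, ha+ would instead match "ha", "haa", "haaa", because the + would apply only to the a.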
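The three uses of capturing groups listed above can be sketched in Python. The sample strings and variable names are assumptions for illustration:

```python
import re

# 1. Quantifying several characters: + repeats the whole group "ha"
grouped = re.fullmatch(r"(ha)+", "hahaha")
# Without the group, + repeats only the "a", so "hahaha" does not fit
ungrouped = re.fullmatch(r"ha+", "hahaha")
stretched = re.fullmatch(r"ha+", "haaa")

# 2. Capturing: each pair of parentheses stores what it matched
found = re.search(r"(\d{4})-(\d{2})", "Released 2024-03")
year, month = found.group(1), found.group(2)

# 3. Backreference: \1 must repeat exactly what group 1 matched,
#    which finds an accidentally doubled word
doubled = re.findall(r"\b(\w+) \1\b", "it is is what it is")

print(grouped is not None, ungrouped is None, stretched is not None)
print(year, month)  # 2024 03
print(doubled)      # ['is']
```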
Alternation (OR)
The pipe | acts like an OR operator: cat|dog matches either "cat" or "dog".
Example: Alternation
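A short Python sketch of alternation, using made-up sample strings. It also shows a common surprise: | has the lowest precedence, so anchors bind to only one side of it unless the alternatives are grouped.

```python
import re

pets = re.findall(r"cat|dog", "I have a cat, a dog, and a bird.")

# ^cat|dog$ means "cat at the start" OR "dog at the end"
loose = re.search(r"^cat|dog$", "hotdog")
# Grouping makes both anchors apply to both alternatives
strict = re.search(r"^(cat|dog)$", "hotdog")

print(pets)            # ['cat', 'dog']
print(loose.group(0))  # dog
print(strict)          # None
```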
Non-Capturing Groups
Use (?:...) when you need grouping but don't need to capture the matched text. For example, (?:ab)+ matches "abab" without creating a numbered group.
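The difference is easiest to see with Python's re.findall, which returns the captured group when a pattern has one and the whole match when it has none. This is a sketch with an illustrative input string:

```python
import re

text = "2024-03-15"

# Capturing group: findall returns only what the group captured
capturing = re.findall(r"(\d+)-", text)

# Non-capturing group: findall returns the entire match
non_capturing = re.findall(r"(?:\d+)-", text)

# Non-capturing groups are not counted as groups at all
group_count = re.compile(r"(?:ab)+(c)").groups

print(capturing)      # ['2024', '03']
print(non_capturing)  # ['2024-', '03-']
print(group_count)    # 1
```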
Anchors and Boundaries
Anchors don't match characters—they match positions:
| Anchor | Matches | Example |
|---|---|---|
| ^ | Start of string | ^Hello - string must start with "Hello" |
| $ | End of string | bye$ - string must end with "bye" |
| \b | Word boundary | \bcat\b matches "cat" but not "category" |
| \B | Non-word boundary | \Bcat matches "cat" in "bobcat" but not in "category" |
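Because anchors match positions rather than characters, it helps to look at where matches start. The Python sketch below uses an illustrative string in which "cat" appears as a whole word, at the start of a word, and at the end of a word:

```python
import re

text = "cat category bobcat"

starts_with = bool(re.search(r"^cat", text))
ends_with = bool(re.search(r"cat$", text))

# \bcat\b: "cat" with a word boundary on both sides (whole word only)
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat: "cat" that is NOT at the start of a word
inside_words = [m.start() for m in re.finditer(r"\Bcat", text)]

print(starts_with, ends_with)  # True True
print(whole_words)             # [0]
print(inside_words)            # [16]
```

Index 16 is the "cat" inside "bobcat". The "cat" that begins "category" is skipped by \Bcat because a word boundary sits right before it.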
Practical Regex Examples
Here are commonly used regex patterns with explanations:
1. Email Validation
Basic Email Pattern: ^[\w.-]+@[\w.-]+\.\w{2,}$
Breakdown:
^ - Start of string
[\w.-]+ - One or more word chars, dots, or hyphens
@ - Literal @ symbol
[\w.-]+ - Domain name
\. - Literal dot
\w{2,} - TLD with 2+ characters
$ - End of string
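The breakdown above corresponds to the pattern ^[\w.-]+@[\w.-]+\.\w{2,}$. A minimal Python sketch of using it follows; the helper name is_valid_email and the sample addresses are hypothetical:

```python
import re

EMAIL = re.compile(r"^[\w.-]+@[\w.-]+\.\w{2,}$")

def is_valid_email(candidate: str) -> bool:
    """Return True if candidate has the basic shape local@domain.tld."""
    return EMAIL.match(candidate) is not None

print(is_valid_email("user@example.com"))            # True
print(is_valid_email("first.last@sub.example.org"))  # True
print(is_valid_email("no-at-sign.com"))              # False
print(is_valid_email("user@example"))                # False

# The pattern is deliberately basic, so some odd input still passes
print(is_valid_email("a..b@-.xx"))                   # True
```

As the last line suggests, a pattern like this checks the overall shape only. It is a reasonable first filter, not proof that an address exists.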
2. Phone Number (US Format)
US Phone Number
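One possible pattern for common US formats is sketched below in Python. The pattern, the helper name is_us_phone, and the sample numbers are all assumptions chosen for illustration; real-world phone validation usually needs more rules than this.

```python
import re

# Optional parentheses around the area code, then optional separators
# (hyphen, dot, or whitespace) between the 3-3-4 digit groups.
US_PHONE = re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

def is_us_phone(candidate: str) -> bool:
    return US_PHONE.match(candidate) is not None

for number in ["(555) 123-4567", "555-123-4567", "555.123.4567",
               "5551234567", "555-1234", "123-45-6789"]:
    print(number, is_us_phone(number))
```

Because each parenthesis is optional on its own, this sketch also accepts an unbalanced input such as "(555 123-4567". Tightening that requires alternation between the two area-code styles.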
3. URL Pattern
Simple URL Pattern
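A simple URL pattern might look like the Python sketch below. The pattern and the helper name looks_like_url are illustrative assumptions; it only recognizes http and https URLs with a plain host name, an optional port, and an optional path.

```python
import re

SIMPLE_URL = re.compile(
    r"^https?://"    # scheme: http or https
    r"[\w.-]+"       # host name
    r"(?::\d+)?"     # optional port
    r"(?:/\S*)?$"    # optional path, query, and fragment
)

def looks_like_url(candidate: str) -> bool:
    return SIMPLE_URL.match(candidate) is not None

print(looks_like_url("https://example.com"))               # True
print(looks_like_url("http://example.com:8080/path?q=1"))  # True
print(looks_like_url("ftp://example.com"))                 # False
print(looks_like_url("example.com"))                       # False
```

For anything beyond a quick shape check, a dedicated parser such as Python's urllib.parse is usually a better fit than a regex.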
4. Password Validation
Strong Password (8+ chars, uppercase, lowercase, digit)
Uses lookaheads (?=...) to check conditions without consuming characters.
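A lookahead-based check for the stated rules (8+ characters, uppercase, lowercase, digit) could be sketched in Python as follows. The exact pattern and the helper name is_strong_password are assumptions for illustration:

```python
import re

STRONG_PASSWORD = re.compile(
    r"^(?=.*[a-z])"  # lookahead: at least one lowercase letter somewhere
    r"(?=.*[A-Z])"   # lookahead: at least one uppercase letter somewhere
    r"(?=.*\d)"      # lookahead: at least one digit somewhere
    r".{8,}$"        # the only part that consumes text: 8+ characters
)

def is_strong_password(candidate: str) -> bool:
    return STRONG_PASSWORD.match(candidate) is not None

print(is_strong_password("Passw0rd!"))     # True
print(is_strong_password("password1"))     # False (no uppercase)
print(is_strong_password("Sh0rt"))         # False (too short)
print(is_strong_password("NoDigitsHere"))  # False (no digit)
```

Each lookahead starts from the beginning of the string and checks one condition without moving the match position, which is why the conditions can be satisfied in any order.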
5. Date Format (YYYY-MM-DD)
ISO Date Format
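One way to check the YYYY-MM-DD shape is sketched below in Python. The pattern and both helper names are assumptions for illustration. A regex can confirm the format and plausible ranges, but not calendar rules, so the sketch pairs it with the standard library's date.fromisoformat for a full check.

```python
import re
from datetime import date

ISO_DATE = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", re.ASCII
)

def has_iso_shape(candidate: str) -> bool:
    """Check the format only: four digits, month 01-12, day 01-31."""
    return ISO_DATE.match(candidate) is not None

def is_real_date(candidate: str) -> bool:
    """Check the format, then ask datetime whether the date exists."""
    if not has_iso_shape(candidate):
        return False
    try:
        date.fromisoformat(candidate)
    except ValueError:
        return False
    return True

print(has_iso_shape("2024-03-15"))  # True
print(has_iso_shape("2024-13-01"))  # False (no month 13)
print(has_iso_shape("2024-02-30"))  # True (right shape)
print(is_real_date("2024-02-30"))   # False (February has no 30th)
```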
6. IP Address (IPv4)
IPv4 Address
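An IPv4 pattern has to limit each of the four numbers to 0-255, which takes an alternation per octet. The Python sketch below is one possible formulation; the names OCTET, IPV4, and is_ipv4 are hypothetical, and this version chooses to reject leading zeros such as "01".

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Three "octet." groups followed by a final octet
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$", re.ASCII)

def is_ipv4(candidate: str) -> bool:
    return IPV4.match(candidate) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("255.255.255.255"))  # True
print(is_ipv4("256.1.1.1"))        # False (256 is out of range)
print(is_ipv4("1.2.3"))            # False (only three octets)
print(is_ipv4("01.2.3.4"))         # False (leading zero)
```

In production Python code, the standard library's ipaddress module is usually a more robust way to validate addresses.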
Regex Flags
Flags modify how the regex engine interprets patterns:
| Flag | Name | Effect |
|---|---|---|
| i | Case Insensitive | /cat/i matches "cat", "Cat", "CAT" |
| g | Global | Find all matches, not just the first |
| m | Multiline | ^ and $ match line starts/ends |
| s | Dotall | . matches newline characters too |
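The table uses the slash-and-letter notation found in JavaScript and many tools. As a sketch of the same ideas in Python, the flags are constants on the re module, and there is no g flag: re.search returns the first match while re.findall returns all of them. The sample text is made up for illustration.

```python
import re

text = "Cat\ncat\nCAT"

# i: ignore case
any_case = re.findall(r"cat", text, flags=re.IGNORECASE)

# "global" behaviour: search stops at the first match, findall does not
first_only = re.search(r"cat", text).group(0)

# m: ^ matches at the start of every line, not just the string
one_line = re.findall(r"^cat", text, flags=re.IGNORECASE)
every_line = re.findall(r"^cat", text, flags=re.IGNORECASE | re.MULTILINE)

# s: . also matches the newline between the first two lines
default_dot = re.search(r"Cat.cat", text)
dotall = re.search(r"Cat.cat", text, flags=re.DOTALL)

print(any_case)     # ['Cat', 'cat', 'CAT']
print(first_only)   # cat
print(one_line)     # ['Cat']
print(every_line)   # ['Cat', 'cat', 'CAT']
print(default_dot)  # None
print(dotall is not None)  # True
```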
Tips for Writing Better Regex
- Start simple: build patterns incrementally
- Test thoroughly: use a regex tester with various inputs
- Be specific: avoid overly broad patterns like .*
- Use anchors: ^ and $ prevent partial matches
- Comment complex patterns: use verbose mode or external documentation
- Consider edge cases: empty strings, special characters, long inputs

Common mistakes to avoid:
- Forgetting to escape special characters (., *, ?)
- Confusing greedy and lazy quantifiers (.* vs. .*?)
- Not anchoring patterns when needed
- Writing overly complex patterns that are hard to maintain
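Three of the points above (greedy vs. lazy quantifiers, escaping, and verbose mode) are easier to judge with a concrete case. The following Python sketch uses made-up inputs; re.escape and re.VERBOSE are part of Python's standard re module.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as possible, so one match spans everything
greedy = re.findall(r"<.*>", html)
# Lazy: .*? grabs as little as possible, so each tag matches separately
lazy = re.findall(r"<.*?>", html)

# Escaping: an unescaped dot matches any character
unescaped = re.search(r"3.14", "3514")
escaped = re.search(re.escape("3.14"), "3514")

# Verbose mode: whitespace is ignored and # starts a comment
ISO_DATE = re.compile(r"""
    ^(\d{4})   # year
    -(\d{2})   # month
    -(\d{2})$  # day
""", re.VERBOSE)
parts = ISO_DATE.match("2024-03-15").groups()

print(greedy == [html])  # True
print(lazy)              # ['<b>', '</b>', '<i>', '</i>']
print(unescaped is not None, escaped is None)  # True True
print(parts)             # ('2024', '03', '15')
```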
Conclusion
Regular expressions are an incredibly powerful tool once you understand the fundamentals. Start with simple patterns and gradually build complexity as you become comfortable with the syntax.
Key takeaways:
- Regex patterns are built from literal characters and metacharacters
- Character classes ([...]) match sets of characters
- Quantifiers (*, +, {n}) control repetition
- Groups ((...)) and alternation (|) add structure
- Anchors (^, $, \b) match positions
- Always test your patterns with our Regex Tester
With practice, you'll find regex becomes second nature and saves countless hours of manual text processing!