Regular Expressions Explained
Regular expressions (regex) are a concise language for matching patterns in text. They're supported by nearly every major programming language and text editor, and once you learn the syntax, you can validate input, extract data, search and replace, and parse logs with a single short pattern. This guide covers the core concepts with real-world examples you can test immediately.
- Learn regular expressions from scratch.
- Covers what a regular expression is.
- Covers literal characters & escaping.
- Covers character classes.
- Covers quantifiers.
What Is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. When you write /hello/, you're telling the regex engine to find the literal text "hello" in a string. When you write /\d{3}-\d{4}/, you're matching a phone number pattern like "555-1234" — three digits, a dash, then four digits.
In JavaScript, regex patterns are enclosed in forward slashes: /pattern/flags. In other languages like Python, they're passed as strings to regex functions. The pattern syntax itself is nearly universal across languages, though some advanced features vary.
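As a minimal sketch of the two patterns above, here is how they could be run in JavaScript (Node.js or a browser console). The sample string is made up for illustration, and the RegExp constructor form is shown only to illustrate the "pattern as a string" style other languages use.

```javascript
// A regex literal and the equivalent RegExp constructor call.
const literal = /\d{3}-\d{4}/;
const constructed = new RegExp("\\d{3}-\\d{4}"); // backslashes are doubled inside a string

const sample = "Call 555-1234 today"; // hypothetical input

console.log(literal.test(sample));        // true
console.log(constructed.test(sample));    // true
console.log(sample.match(literal)[0]);    // "555-1234"
console.log(/hello/.test("say hello"));   // true
```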
Literal Characters & Escaping
Most characters match themselves literally. The pattern /cat/ matches the text "cat" inside "concatenate" or "my cat is sleeping". But some characters have special meaning in regex and need to be escaped with a backslash if you want to match them literally.
The special characters (metacharacters) are: . * + ? ^ $ { } [ ] ( ) | \
The dot . is the most commonly confused metacharacter. Without escaping, it matches any single character except a newline. To match an actual period, escape it as \. instead.
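A short sketch of the difference escaping makes, using made-up file names and a made-up price string:

```javascript
const unescaped = /file.txt/;  // the dot matches any character
const escaped = /file\.txt/;   // the dot matches only a literal period

console.log(unescaped.test("file.txt"));  // true
console.log(unescaped.test("fileXtxt"));  // true, probably not what you wanted
console.log(escaped.test("file.txt"));    // true
console.log(escaped.test("fileXtxt"));    // false

// Dollar signs need escaping too when you mean them literally.
console.log(/\$5\.00/.test("Price: $5.00")); // true
```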
Character Classes
Character classes let you match any one character from a set. Square brackets define a class — [aeiou] matches any single vowel, and [0-9] matches any digit.
You can combine ranges inside a class: [a-zA-Z0-9] matches any alphanumeric character. The caret ^ at the start of a class negates it: [^0-9] matches anything that isn't a digit.
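The classes described above could be tried out like this. The input strings are invented for the example, and the g flag (covered under Flags) is used so that match returns every hit rather than just the first.

```javascript
const text = "Order A7 shipped in 2024"; // made-up sample

const vowels = text.match(/[aeiou]/g);   // lowercase vowels only
const digits = text.match(/[0-9]/g);     // every digit
const alnum = "user_42!".match(/[a-zA-Z0-9]/g).join(""); // combined ranges
const onlyDigits = "a1b2c3".replace(/[^0-9]/g, "");      // negated class strips non-digits

console.log(vowels);      // ["e", "i", "e", "i"]
console.log(digits);      // ["7", "2", "0", "2", "4"]
console.log(alnum);       // "user42"
console.log(onlyDigits);  // "123"
```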
Quantifiers
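Quantifiers specify how many times the preceding element should match. The basic ones are * (zero or more), + (one or more), ? (zero or one), and the counted forms {n} (exactly n), {n,} (n or more), and {n,m} (between n and m).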
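Here is a small sketch of the common quantifiers in action. Each pattern is wrapped in ^ and $ (explained in the anchors section) so that the whole string has to match, which makes the effect of each quantifier easier to see.

```javascript
const zeroOrMore = /^ab*c$/.test("ac");        // * allows zero b's
const oneOrMore = /^ab+c$/.test("ac");         // + needs at least one b
const optional = /^colou?r$/.test("color");    // ? makes the u optional
const exactly = /^\d{3}$/.test("123");         // {3} means exactly three
const range = /^\d{2,4}$/.test("12345");       // {2,4} means two to four
const atLeast = /^\d{2,}$/.test("12345");      // {2,} means two or more

console.log(zeroOrMore, oneOrMore, optional);  // true false true
console.log(exactly, range, atLeast);          // true false true
```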
Greedy vs lazy
By default, quantifiers are greedy — they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible. This matters when you're extracting content between delimiters:
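For instance, pulling the first bold element out of a made-up HTML snippet behaves very differently depending on whether the quantifier is greedy or lazy:

```javascript
const html = "<b>one</b> and <b>two</b>"; // hypothetical input

// Greedy: .* runs to the LAST closing tag it can find.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? stops at the FIRST closing tag.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // "<b>one</b> and <b>two</b>"
console.log(lazy);   // "<b>one</b>"
```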
Anchors & Boundaries
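Anchors don't match characters; they match positions in the string. The most common are ^ (start of the string), $ (end of the string), and \b (a word boundary, the position between a word character and a non-word character).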
Word boundaries are especially useful for matching whole words. Without \b, searching for "cat" would also match inside "category", "concatenate", and "scatter".
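A quick sketch of ^, $, and \b, reusing the "cat" words mentioned above:

```javascript
const startsWith = /^cat/.test("category");            // ^ anchors to the start
const endsWith = /cat$/.test("bobcat");                // $ anchors to the end
const wholeString = /^cat$/.test("concatenate");       // both: the entire string must be "cat"

// \b restricts the match to whole words.
const inSentence = /\bcat\b/.test("my cat is sleeping");
const insideWord = /\bcat\b/.test("concatenate");
const wholeWords = "cat category scatter".match(/\bcat\b/g);

console.log(startsWith, endsWith, wholeString); // true true false
console.log(inSentence, insideWord);            // true false
console.log(wholeWords);                        // ["cat"]
```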
Groups & Capturing
Parentheses create groups that serve two purposes: they let you apply quantifiers to a sequence of characters, and they capture the matched text for later use.
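Both purposes can be sketched in a few lines. The date string and its YYYY-MM-DD layout are assumptions chosen for the example.

```javascript
// Purpose 1: apply a quantifier to a whole sequence.
const repeatedGroup = /^(ha)+$/.test("hahaha"); // + applies to "ha"
const noGroup = /^ha+$/.test("hahaha");         // + applies only to "a"

// Purpose 2: capture parts of the match for later use.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const dateMatch = "2024-03-15".match(datePattern);
const reordered = "2024-03-15".replace(datePattern, "$3/$2/$1"); // $1..$3 refer to the groups

console.log(repeatedGroup, noGroup);                  // true false
console.log(dateMatch[1], dateMatch[2], dateMatch[3]); // 2024 03 15
console.log(reordered);                               // "15/03/2024"
```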
Non-capturing groups
If you need grouping for logic but don't need to capture the result, use (?:...). This is slightly more efficient and keeps your capture group numbering clean:
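As an illustration with a made-up URL, compare how the group numbering changes when the protocol group is non-capturing:

```javascript
const url = "https://example.com"; // hypothetical input

// Capturing: the protocol takes slot 1, the host takes slot 2.
const withCapture = url.match(/(https?):\/\/(\S+)/);

// Non-capturing: the protocol is grouped but not stored, so the host is slot 1.
const withoutCapture = url.match(/(?:https?):\/\/(\S+)/);

console.log(withCapture[1], withCapture[2]); // https example.com
console.log(withoutCapture[1]);              // example.com
console.log(withoutCapture.length);          // 2 (full match plus one group)
```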
Named groups
Named capture groups use the (?<name>...) syntax, which makes your regex more readable because you refer to each group by name instead of by number:
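A minimal sketch using the JavaScript form (?<name>...), with group names (year, month, day) and a date string chosen purely for illustration:

```javascript
const namedDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Named captures are exposed on the match's .groups object.
const { year, month, day } = "2024-03-15".match(namedDate).groups;

// In replacement strings, $<name> refers to a named group.
const european = "2024-03-15".replace(namedDate, "$<day>/$<month>/$<year>");

console.log(year, month, day); // 2024 03 15
console.log(european);         // "15/03/2024"
```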
Alternation (OR)
The pipe character | acts as a logical OR. It matches the pattern on either side:
Without parentheses, alternation applies to everything on each side. /abc|def/ matches "abc" or "def", not "ab(c or d)ef". Use groups to limit the scope of the alternation.
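The scope rule is easy to get wrong, so here is a short sketch with invented inputs showing plain alternation, a grouped alternation, and what happens when anchors are left outside the group:

```javascript
const either = /cat|dog/.test("hot dog");        // matches either word
const grouped = /^gr(a|e)y$/.test("grey");       // only the a/e choice alternates
const groupedMiss = /^gr(a|e)y$/.test("groy");

// Without a group, ^ belongs to "abc" and $ belongs to "def".
const looseScope = /^abc|def$/.test("abcXYZ");   // true: the string starts with abc
const tightScope = /^(abc|def)$/.test("abcXYZ"); // false: the whole string must be abc or def

console.log(either, grouped, groupedMiss); // true true false
console.log(looseScope, tightScope);       // true false
```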
Lookaheads & Lookbehinds
Lookaheads and lookbehinds check whether a pattern exists before or after the current position without consuming characters. They're zero-width assertions, meaning they don't include the looked-at text in the match.
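Lookaheads are supported by most backtracking regex engines, including those in JavaScript, Python, Java, and .NET. Lookbehinds are available in JavaScript (since ES2018, though Safari only added them in version 16.4), Python, Java, and .NET. Engines built for guaranteed linear-time matching, such as RE2 and Go's regexp package, support neither.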
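The four lookaround forms are (?=...), (?!...), (?<=...), and (?<!...). A rough sketch of each, assuming a JavaScript runtime with lookbehind support and using made-up strings:

```javascript
const css = "12px 5em 30px";       // hypothetical input
const bill = "Cost: $42, qty 7";   // hypothetical input

// Positive lookahead: numbers followed by "px" (the "px" is not part of the match).
const pxValues = css.match(/\d+(?=px)/g);

// Negative lookahead: a "q" that is NOT followed by "u".
const qWithoutU = /q(?!u)/.test("Iraq");
const qWithU = /q(?!u)/.test("quit");

// Positive lookbehind: numbers preceded by a dollar sign.
const prices = bill.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers NOT preceded by a dollar sign.
const nonPrices = bill.match(/(?<!\$)\b\d+/g);

console.log(pxValues);          // ["12", "30"]
console.log(qWithoutU, qWithU); // true false
console.log(prices);            // ["42"]
console.log(nonPrices);         // ["7"]
```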
Flags
Flags modify how the regex engine interprets the pattern. They're placed after the closing slash in JavaScript:
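The most commonly used JavaScript flags are g (global), i (case-insensitive), m (multiline), and s (dot matches newlines). A small sketch with invented inputs:

```javascript
const pets = "Cat cat CAT"; // hypothetical input

const firstOnly = pets.match(/cat/);          // no flags: first case-sensitive match
const allExact = pets.match(/cat/g);          // g: every match
const allAnyCase = pets.match(/cat/gi);       // g + i: every match, ignoring case

// m: ^ and $ match at the start and end of each line, not just the whole string.
const lines = "one\ntwo".match(/^\w+$/gm);

// s: the dot also matches newline characters.
const dotDefault = /a.b/.test("a\nb");
const dotAll = /a.b/s.test("a\nb");

console.log(firstOnly[0]);       // "cat"
console.log(allExact);           // ["cat"]
console.log(allAnyCase);         // ["Cat", "cat", "CAT"]
console.log(lines);              // ["one", "two"]
console.log(dotDefault, dotAll); // false true
```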
Common Patterns
Email validation (basic)
This matches most standard email formats. Perfect email validation via regex is notoriously complex — for production use, combine a reasonable regex with server-side verification.
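One deliberately loose pattern of this kind is sketched below. It is an illustrative assumption rather than a definitive rule: it only checks for "something, an @, something, a dot, something" with no spaces, and it will accept some addresses that cannot actually receive mail.

```javascript
// Assumed basic pattern: non-space/non-@ text, "@", a domain, ".", and a suffix.
const basicEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

console.log(basicEmail.test("user@example.com"));   // true
console.log(basicEmail.test("user@example"));       // false, no dot in the domain
console.log(basicEmail.test("user @example.com"));  // false, contains a space
console.log(basicEmail.test("@example.com"));       // false, nothing before the @
```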
URL matching
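A simple sketch for pulling http and https links out of free text. The pattern and sample sentence are assumptions for illustration; the pattern stops at whitespace, quotes, or angle brackets, so trailing punctuation such as a sentence-ending period would be included in the match and may need trimming.

```javascript
// Assumed pattern: scheme, "://", then everything up to whitespace, quotes, or < >.
const urlPattern = /https?:\/\/[^\s<>"']+/g;

const note = "See https://example.com/docs?page=2 or http://localhost:8080/health for details";
const links = note.match(urlPattern);

console.log(links);
// ["https://example.com/docs?page=2", "http://localhost:8080/health"]
```

For validating a single URL rather than finding links in text, the built-in URL constructor is usually a sturdier choice than a regex.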
Hex color code
Matches both 3-digit shorthand (#FFF) and 6-digit (#FF6B6B) hex colors.
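One way to write such a pattern, offered as a sketch rather than the only correct form, is to alternate between exactly three and exactly six hex digits after the #:

```javascript
// Assumed pattern: "#" followed by exactly 3 or exactly 6 hex digits.
const hexColor = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

console.log(hexColor.test("#FFF"));     // true
console.log(hexColor.test("#FF6B6B"));  // true
console.log(hexColor.test("#FF6B"));    // false, four digits
console.log(hexColor.test("FF6B6B"));   // false, missing #
console.log(hexColor.test("#GGG"));     // false, G is not a hex digit
```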
Phone number (US)
Handles formats like (555) 123-4567, 555-123-4567, and 555.123.4567.
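A sketch of a pattern that accepts those three formats is shown below. It is an assumption for illustration and is intentionally permissive: the parentheses are each optional, so an unbalanced input such as "(555-123-4567" would also pass.

```javascript
// Assumed pattern: optional parentheses around the area code,
// then an optional dash, dot, or space between each block of digits.
const usPhone = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

console.log(usPhone.test("(555) 123-4567")); // true
console.log(usPhone.test("555-123-4567"));   // true
console.log(usPhone.test("555.123.4567"));   // true
console.log(usPhone.test("555-1234"));       // false, area code missing
```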
IP address (IPv4)
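A plain \d{1,3} for each part would accept values like 999, so the sketch below spells out the 0 to 255 range explicitly. The octet and ipv4 names are hypothetical, and building the pattern from a string is just one way to avoid repeating the octet rule four times.

```javascript
// One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
const octet = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

// Three "octet." groups followed by a final octet.
const ipv4 = new RegExp(`^(?:${octet}\\.){3}${octet}$`);

console.log(ipv4.test("192.168.1.1"));      // true
console.log(ipv4.test("255.255.255.255"));  // true
console.log(ipv4.test("256.1.1.1"));        // false, 256 is out of range
console.log(ipv4.test("1.2.3"));            // false, only three parts
console.log(ipv4.test("01.2.3.4"));         // false, leading zero rejected
```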
HTML tag content
The \1 backreference matches whatever the first group captured, ensuring the opening and closing tags match. Test all of these patterns in the Regex Tester to see them work in real time.
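A pattern along these lines might look like the sketch below. It is an assumed example for simple, non-nested snippets only; the sample markup is made up.

```javascript
// Group 1 captures the tag name, group 2 the content; \1 requires the same name in the closing tag.
const tagContent = /<([a-z][a-z0-9]*)\b[^>]*>(.*?)<\/\1>/i;

const matched = '<p class="x">Hello</p>'.match(tagContent);
const mismatched = "<b>bold</i>".match(tagContent);

console.log(matched[1]);  // "p"
console.log(matched[2]);  // "Hello"
console.log(mismatched);  // null, because the closing tag is not </b>
```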
Common Pitfalls
Greedy matching is the most frequent source of regex bugs. If your pattern matches more text than expected, try adding ? after the quantifier to make it lazy.
Forgetting to escape special characters is another common mistake. If your pattern includes dots, brackets, parentheses, or dollar signs as literal characters, make sure they're preceded by a backslash.
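When the literal text comes from user input, escaping by hand isn't practical. A widely used approach is a small helper that backslash-escapes every metacharacter before the text is passed to the RegExp constructor. The escapeRegExp name below is a hypothetical helper, not a built-in, and the input strings are invented.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& is the matched character
}

const userInput = "1+1=2 (approx.)";
const haystack = "I think 1+1=2 (approx.) is right";

const unsafe = new RegExp(userInput).test(haystack);             // + ( ) . act as syntax
const safe = new RegExp(escapeRegExp(userInput)).test(haystack); // matched literally

console.log(escapeRegExp("a.b")); // a\.b
console.log(unsafe, safe);        // false true
```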
Regex for HTML parsing is tempting but fragile. Regular expressions can't reliably handle nested structures, self-closing tags, or attributes with quotes. For anything beyond simple extraction, use a proper HTML parser instead.
Performance can be a concern with complex patterns on large inputs. Patterns with nested quantifiers like (a+)+ can cause catastrophic backtracking, where the engine tries exponentially many paths. Avoid nesting quantifiers that can match the same characters.
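In many cases the nested form can simply be flattened. The sketch below compares (a+)+ with an equivalent un-nested pattern; the inputs are kept very short on purpose, because a long run of "a" followed by a non-matching character is exactly what makes the nested version slow in a backtracking engine.

```javascript
const risky = /^(a+)+$/; // nested quantifiers over the same character
const flat = /^a+$/;     // accepts the same strings with a single quantifier

const inputs = ["a", "aaaa", "aaab", ""];
const sameResults = inputs.every((input) => risky.test(input) === flat.test(input));

console.log(sameResults);       // true
console.log(flat.test("aaaa")); // true
console.log(flat.test("aaab")); // false
```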
Use the Regex Tester to write, test, and debug regular expressions against sample text with real-time highlighting of matches.