Developer · April 2026 · 13 min read

Regular Expressions Explained

Regular expressions (regex) are a concise language for matching patterns in text. They're built into every major programming language and text editor, and once you learn the syntax, you can validate input, extract data, search-and-replace, and parse logs in seconds. This guide covers the core concepts with real-world examples you can test immediately.

Derek Giordano
Designer & Developer
In this guide
01 What Is a Regular Expression?
02 Literal Characters & Escaping
03 Character Classes
04 Quantifiers
05 Anchors & Boundaries
06 Groups & Capturing
07 Alternation (OR)
08 Lookaheads & Lookbehinds
09 Flags
10 Common Patterns
11 Common Pitfalls
⚡ Key Takeaways
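  • Regular expressions describe text patterns and work in every major programming language and text editor.
  • Most characters match themselves; metacharacters such as . and $ must be escaped with a backslash to match literally.
  • Character classes like [aeiou] match one character from a set, and shorthands such as \d, \w, and \s cover common sets.
  • Quantifiers such as *, +, ?, and {n,m} control how many times an element repeats.
  • Quantifiers are greedy by default; adding ? after one makes it lazy.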

What Is a Regular Expression?

A regular expression is a sequence of characters that defines a search pattern. When you write /hello/, you're telling the regex engine to find the literal text "hello" in a string. When you write /\d{3}-\d{4}/, you're matching a phone number pattern like "555-1234" — three digits, a dash, then four digits.

In JavaScript, regex patterns are enclosed in forward slashes: /pattern/flags. In other languages like Python, they're passed as strings to regex functions. The pattern syntax itself is nearly universal across languages, though some advanced features vary.
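In JavaScript specifically, a pattern can be written as a literal or built from a string with the built-in RegExp constructor. The sketch below shows both forms side by side; the sample strings are illustrative. In the string form every backslash has to be doubled, which is the same escaping issue that leads Python code to write patterns as raw strings such as r"\d+".

```javascript
// The same pattern, written as a literal and built from a string.
const literal = /\d{3}-\d{4}/;
// Inside a string literal, "\\d" is needed to produce the two characters \d.
const constructed = new RegExp("\\d{3}-\\d{4}");

console.log(literal.test("Call 555-1234 today"));     // true
console.log(constructed.test("Call 555-1234 today")); // true
console.log(literal.source === constructed.source);   // true

// With the constructor, flags go in the second argument instead of after a slash.
const withFlags = new RegExp("hello", "gi");
console.log(withFlags.flags); // gi
```

The constructor form is mainly useful when the pattern is only known at runtime.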

Literal Characters & Escaping

Most characters match themselves literally. The pattern /cat/ matches the text "cat" inside "concatenate" or "my cat is sleeping". But some characters have special meaning in regex and need to be escaped with a backslash if you want to match them literally.

The special characters (metacharacters) are: . * + ? ^ $ { } [ ] ( ) | \

/hello\.com/ matches "hello.com"
/\$9\.99/ matches "$9.99"
/file\.txt/ matches "file.txt"
/2 \+ 2 = 4/ matches "2 + 2 = 4"

The dot . is the most commonly confused metacharacter. Without escaping, it matches any single character (except newlines). To match an actual period, escape it as \. instead.
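A quick way to see the difference is to test an escaped and an unescaped version of the same pattern against a string where some other character sits in the period's position. The variable names and inputs below are illustrative.

```javascript
// Unescaped, the dot matches any single character.
const loose = /hello.com/;
// Escaped, \. matches only a literal period.
const strict = /hello\.com/;

for (const s of ["hello.com", "helloXcom"]) {
  console.log(s, loose.test(s), strict.test(s));
}
// hello.com true true
// helloXcom true false
```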

Character Classes

Character classes let you match any one character from a set. Square brackets define a class — [aeiou] matches any single vowel, and [0-9] matches any digit.

[abc] matches 'a', 'b', or 'c'
[a-z] matches any lowercase letter
[A-Z] matches any uppercase letter
[0-9] matches any digit
[^abc] matches any character EXCEPT 'a', 'b', or 'c'
\d is shorthand for [0-9]: any digit
\w is shorthand for [a-zA-Z0-9_] in JavaScript: any "word" character
\s matches any whitespace character (space, tab, newline, and similar)
\D matches any non-digit (the opposite of \d)
\W matches any non-word character (the opposite of \w)
. matches any character except a newline

You can combine ranges inside a class: [a-zA-Z0-9] matches any alphanumeric character. The caret ^ at the start of a class negates it — [^0-9] matches anything that isn't a digit.
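Here is a small JavaScript sketch that runs a shorthand class, a combined range, and a negated class over one sample string. It uses the g flag (covered under Flags) so that match() returns every hit instead of only the first; the sample text is made up for illustration.

```javascript
const text = "Order #A7-42b, shipped 3 days ago!";

// \d: every individual digit.
const digits = text.match(/\d/g);
console.log(digits); // ["7", "4", "2", "3"]

// Combined ranges plus +: runs of letters and digits.
const words = text.match(/[a-zA-Z0-9]+/g);
console.log(words); // ["Order", "A7", "42b", "shipped", "3", "days", "ago"]

// Negated class: anything that is not a letter, digit, or space.
const symbols = text.match(/[^a-zA-Z0-9 ]/g);
console.log(symbols); // ["#", "-", ",", "!"]
```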

Quantifiers

Quantifiers specify how many times the preceding element should match.

* means zero or more times
+ means one or more times
? means zero or one time (optional)
{3} means exactly 3 times
{2,5} means between 2 and 5 times
{3,} means 3 or more times

/\d+/ matches one or more digits: "42", "100", "7"
/colou?r/ matches "color" and "colour"
/\d{3}-\d{4}/ matches "555-1234"
/[a-z]{2,4}/ matches 2 to 4 lowercase letters
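The same quantifiers can be exercised directly in JavaScript. This sketch wraps the optional-letter pattern in ^ and $ so the whole word must match, and it shows that a bounded quantifier like {2,4} takes as many characters as it is allowed before moving on. Inputs are illustrative.

```javascript
// ? makes the "u" optional; ^ and $ require the whole string to match.
const spelling = /^colou?r$/;
for (const word of ["color", "colour", "colouur"]) {
  console.log(word, spelling.test(word));
}
// color true, colour true, colouur false

// + with the g flag collects every run of digits.
const numbers = "a 42 b 100 c 7".match(/\d+/g);
console.log(numbers); // ["42", "100", "7"]

// {2,4} takes 4 letters when it can, then matches whatever is left (3 here).
const chunks = "abcdefg".match(/[a-z]{2,4}/g);
console.log(chunks); // ["abcd", "efg"]
```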

Greedy vs lazy

By default, quantifiers are greedy — they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible. This matters when you're extracting content between delimiters:

Text: <b>hello</b> and <b>world</b>
/<b>.*<\/b>/ greedy: matches "<b>hello</b> and <b>world</b>"
/<b>.*?<\/b>/ lazy: matches "<b>hello</b>", then "<b>world</b>"
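Running both patterns with the g flag in JavaScript makes the difference concrete. The last line adds a capture group (covered under Groups & Capturing) and uses the built-in matchAll() to pull out only the text between the tags.

```javascript
const html = "<b>hello</b> and <b>world</b>";

// Greedy: .* runs to the end, then backs off only as far as the LAST </b>.
const greedy = html.match(/<b>.*<\/b>/g);
console.log(greedy); // ["<b>hello</b> and <b>world</b>"]

// Lazy: .*? stops at the FIRST </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/g);
console.log(lazy); // ["<b>hello</b>", "<b>world</b>"]

// A lazy capture group extracts just the content between the tags.
const inner = [...html.matchAll(/<b>(.*?)<\/b>/g)].map((m) => m[1]);
console.log(inner); // ["hello", "world"]
```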

Anchors & Boundaries

Anchors don't match characters — they match positions in the string.

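^ matches the start of the string (or of each line, with the m flag)
$ matches the end of the string (or of each line, with the m flag)
\b matches a word boundary: a position between a \w and a \W character, or between a \w character and the start or end of the string
\B matches any position that is not a word boundary

/^Hello/ matches "Hello world" but not "Say Hello"
/\.com$/ matches strings ending in ".com"
/\bcat\b/ matches "cat" but not "concatenate"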

Word boundaries are especially useful for matching whole words. Without \b, searching for "cat" would also match inside "category", "concatenate", and "scatter".
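This JavaScript sketch compares a plain /cat/ against /\bcat\b/ on the words from the paragraph above, then checks the two anchor examples. The sample sentence "the cat sat" is illustrative.

```javascript
const inside = /cat/;
const wholeWord = /\bcat\b/;

for (const s of ["the cat sat", "category", "concatenate", "scatter"]) {
  console.log(s, inside.test(s), wholeWord.test(s));
}
// the cat sat true true
// category true false
// concatenate true false
// scatter true false

// Anchors pin the match to the start or end of the string.
console.log(/^Hello/.test("Hello world")); // true
console.log(/^Hello/.test("Say Hello"));   // false
console.log(/\.com$/.test("example.com")); // true
```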

Groups & Capturing

Parentheses create groups that serve two purposes: they let you apply quantifiers to a sequence of characters, and they capture the matched text for later use.

/(ha)+/ matches "ha", "haha", "hahaha"
/(\d{3})-(\d{4})/ captures the three-digit prefix and the four-digit line number separately
// In JavaScript:
const match = "555-1234".match(/(\d{3})-(\d{4})/);
// match[1] = "555", match[2] = "1234"
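Captured text can also be reused inside a replacement string, where $1 and $2 refer to the groups in order. The sketch below (variable names are illustrative) destructures the match array, swaps the two groups with replace(), and shows that a repeated group keeps only its last repetition.

```javascript
const phone = "555-1234";
const pattern = /(\d{3})-(\d{4})/;

// Index 0 is the full match; groups follow in the order their "(" appears.
const [full, prefix, line] = phone.match(pattern);
console.log(full, prefix, line); // 555-1234 555 1234

// $1 and $2 in the replacement refer to the captured groups.
const swapped = phone.replace(pattern, "$2-$1");
console.log(swapped); // 1234-555

// The quantifier repeats the whole group; the group itself holds the last repetition.
const laughs = "hahaha!".match(/(ha)+/);
console.log(laughs[0], laughs[1]); // hahaha ha
```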

Non-capturing groups

If you need grouping for logic but don't need to capture the result, use (?:...). This is slightly more efficient and keeps your capture group numbering clean:

/(?:https?:\/\/)?([\w.-]+)/
// Groups the protocol but only captures the domain
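To see how (?:...) keeps the numbering clean, this sketch runs a capturing and a non-capturing version of the same domain pattern on one illustrative URL and compares where the domain ends up.

```javascript
const capturing = /(https?:\/\/)?([\w.-]+)/;
const nonCapturing = /(?:https?:\/\/)?([\w.-]+)/;

const url = "https://example.com/docs";
const a = url.match(capturing);
const b = url.match(nonCapturing);

// Capturing the protocol pushes the domain to index 2.
console.log(a[1], a[2]); // https:// example.com
// With (?:...), the domain is the first and only group.
console.log(b[1], b.length); // example.com 2

// The protocol stays optional in both versions.
console.log("example.net".match(nonCapturing)[1]); // example.net
```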

Named groups

Named capture groups use (?<name>...) syntax, making your regex more readable:

/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
// match.groups.year, match.groups.month, match.groups.day
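In JavaScript the named groups land on match.groups, and a replacement string can refer to them as $<name>. A short sketch with an illustrative date string:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const match = "Released on 2026-04-15.".match(datePattern);
const { year, month, day } = match.groups;
console.log(year, month, day); // 2026 04 15

// $<name> in a replacement string inserts the named group's text.
const reordered = "2026-04-15".replace(datePattern, "$<day>/$<month>/$<year>");
console.log(reordered); // 15/04/2026
```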

Alternation (OR)

The pipe character | acts as a logical OR. It matches the pattern on either side:

/cat|dog/ matches "cat" or "dog"
/red|blue|green/ matches any of the three colors
/(Mr|Mrs|Ms)\.\s\w+/ matches "Mr. Smith", "Ms. Jones"

Without parentheses, alternation applies to everything on each side. /abc|def/ matches "abc" or "def", not "ab(c or d)ef". Use groups to limit the scope of the alternation.
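Anchors are where unscoped alternation most often bites. In this sketch, ^ and $ in /^abc|def$/ each attach to only one alternative, while a non-capturing group makes both anchors apply to the whole choice. The inputs are illustrative.

```javascript
// Without a group: "^abc" OR "def$".
const unscoped = /^abc|def$/;
// With a group: the entire string must be "abc" or "def".
const scoped = /^(?:abc|def)$/;

console.log(unscoped.test("abcxyz")); // true
console.log(scoped.test("abcxyz"));   // false
console.log(scoped.test("def"));      // true

// Inside a longer pattern, the group limits which part is alternated.
const greeting = "Dear Mrs. Smith,".match(/(Mr|Mrs|Ms)\.\s\w+/)[0];
console.log(greeting); // Mrs. Smith
```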

Lookaheads & Lookbehinds

Lookaheads and lookbehinds check whether a pattern exists before or after the current position without consuming characters. They're zero-width assertions, meaning they don't include the looked-at text in the match.

(?=...) positive lookahead: followed by...
(?!...) negative lookahead: NOT followed by...
(?<=...) positive lookbehind: preceded by...
(?<!...) negative lookbehind: NOT preceded by...
/\d+(?=px)/ matches digits followed by "px": "16" in "16px"
/\d+(?!px)/ matches digits NOT followed by "px"
/(?<=\$)\d+/ matches digits preceded by "$": "99" in "$99"
/(?<!\$)\d+/ matches digits NOT preceded by "$"

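Lookaheads are supported by the backtracking engines in JavaScript, Python, Java, .NET, and PCRE. Lookbehinds are also widely available in JavaScript (since ES2018), Python, Java, and .NET, although Python's built-in re module only allows fixed-width lookbehinds. Linear-time engines such as RE2, Go's regexp package, and Rust's regex crate support neither lookaheads nor lookbehinds.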
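The JavaScript sketch below runs the lookahead and lookbehind examples on illustrative strings. It also shows a common surprise with negative lookaheads: \d+(?!px) can backtrack to a shorter run of digits, so on "16px" it matches "1". Excluding a following digit inside the lookahead avoids that.

```javascript
const css = "width: 16px; height: 2em; margin: 8px";

// Positive lookahead: digits only when "px" follows; "px" is not in the match.
const pxValues = css.match(/\d+(?=px)/g);
console.log(pxValues); // ["16", "8"]

// Positive lookbehind: digits preceded by "$" (requires an ES2018+ engine).
const prices = "Total: $99 for 3 items".match(/(?<=\$)\d+/g);
console.log(prices); // ["99"]

// Pitfall: \d+ backs off to "1", which is followed by "6", not "px".
const partial = "16px".match(/\d+(?!px)/);
console.log(partial[0]); // 1

// Also forbidding a following digit rules out the shorter match.
const none = "16px".match(/\d+(?!\d|px)/);
console.log(none); // null
```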

Flags

Flags modify how the regex engine interprets the pattern. They're placed after the closing slash in JavaScript:

g (global) finds all matches, not just the first
i (ignore case) enables case-insensitive matching
m (multiline) makes ^ and $ match at line boundaries
s (dotAll) makes the dot (.) also match newlines
u (unicode) enables full Unicode support

/hello/gi matches "Hello", "HELLO", "hElLo" everywhere in the string
/^start/gm matches "start" at the beginning of any line
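This sketch turns the g, i, m, and s flags on and off against small multi-line strings so the effect of each one is visible. The strings are illustrative.

```javascript
const text = "Hello there.\nhello again. HELLO!";

// i alone: only the first match.
console.log(text.match(/hello/i)[0]); // Hello
// g + i: every occurrence, ignoring case.
const allHellos = text.match(/hello/gi);
console.log(allHellos); // ["Hello", "hello", "HELLO"]

const log = "start one\nskip\nstart two";
// Without m, ^ only matches at the very start of the string.
const firstOnly = log.match(/^start \w+/g);
console.log(firstOnly); // ["start one"]
// With m, ^ matches at the start of every line.
const everyLine = log.match(/^start \w+/gm);
console.log(everyLine); // ["start one", "start two"]

// Without s, the dot refuses to cross the newline; with s it matches it.
console.log(/one.skip/.test(log));  // false
console.log(/one.skip/s.test(log)); // true
```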

Common Patterns

Email validation (basic)

/^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/

This matches most standard email formats. Perfect email validation via regex is notoriously complex — for production use, combine a reasonable regex with server-side verification.
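Testing the basic pattern against a few illustrative addresses shows both what it catches and why it is only a first filter: it rejects a missing @ or a one-letter top-level domain, but it still accepts "x@y..com", which is not a valid address.

```javascript
const email = /^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/;

const inputs = [
  "jane.doe@example.com",     // valid
  "dev-team@mail.example.co", // valid, with a subdomain
  "no-at-sign.com",           // rejected: no @
  "a@b.c",                    // rejected: top-level domain too short
  "x@y..com",                 // accepted, even though it is invalid
];
for (const input of inputs) {
  console.log(input, email.test(input));
}
```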

URL matching

/https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/[\w./?#&=-]*)?/
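Adding the g flag turns this pattern into a URL extractor. In the sketch below (sample text is illustrative), backtracking means the trailing sentence period after the second URL is not included, because the pattern has to end the host on a dot followed by at least two letters.

```javascript
// Same pattern as above, with the g flag added to find every URL.
const urlPattern = /https?:\/\/[\w.-]+(?:\.[a-zA-Z]{2,})(?:\/[\w./?#&=-]*)?/g;

const text = "Docs at https://example.com/docs?page=2 and http://test.example.org.";
const found = text.match(urlPattern);
console.log(found); // ["https://example.com/docs?page=2", "http://test.example.org"]
```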

Hex color code

/^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/

Matches both 3-digit shorthand (#FFF) and 6-digit (#FF6B6B) hex colors.
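A quick check against illustrative inputs shows that the alternation accepts exactly 3 or 6 hex digits after the #, in either letter case:

```javascript
const hex = /^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;

for (const c of ["#FFF", "#FF6B6B", "#ff6b6b", "#FFFF", "FF6B6B", "#GGG"]) {
  console.log(c, hex.test(c));
}
// #FFF true, #FF6B6B true, #ff6b6b true
// #FFFF false (4 digits), FF6B6B false (no #), #GGG false (not hex)
```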

Phone number (US)

/^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/

Handles formats like (555) 123-4567, 555-123-4567, and 555.123.4567.
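Running the pattern on sample numbers confirms the listed formats. Because \(? and \)? are independent, it also accepts an unbalanced parenthesis such as "(555-123-4567", which may or may not matter for your input.

```javascript
const usPhone = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

const samples = [
  "(555) 123-4567", // accepted
  "555-123-4567",   // accepted
  "555.123.4567",   // accepted
  "5551234567",     // accepted: separators are optional
  "555-1234",       // rejected: too few digits
  "(555-123-4567",  // accepted: parentheses are not checked for balance
];
for (const p of samples) {
  console.log(p, usPhone.test(p));
}
```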

IP address (IPv4)

/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/
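Each octet alternative covers 250-255, 200-249, and 0-199, so values above 255 are rejected. Testing a few illustrative addresses also shows that the [01]? branch lets leading zeros such as "01" through.

```javascript
const ipv4 = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

const addresses = [
  "192.168.1.1",     // accepted
  "255.255.255.255", // accepted
  "0.0.0.0",         // accepted
  "256.1.1.1",       // rejected: 256 is out of range
  "1.2.3",           // rejected: only three octets
  "01.2.3.4",        // accepted: leading zero allowed by [01]?
];
for (const ip of addresses) {
  console.log(ip, ipv4.test(ip));
}
```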

HTML tag content

/<(\w+)>(.*?)<\/\1>/

The \1 backreference matches whatever the first group captured, ensuring the opening and closing tags match. Test all of these patterns in the Regex Tester to see them work in real time.
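In JavaScript, the effect of \1 is easy to check: a matching pair of tags yields the tag name and content, while a mismatched closing tag produces no match at all. Inputs are illustrative.

```javascript
const tag = /<(\w+)>(.*?)<\/\1>/;

const ok = "<em>hi</em>".match(tag);
console.log(ok[1], ok[2]); // em hi

// \1 must repeat the exact text of group 1, so </strong> cannot close <em>.
const mismatched = "<em>hi</strong>".match(tag);
console.log(mismatched); // null
```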

Common Pitfalls

Greedy matching is the most frequent source of regex bugs. If your pattern matches more text than expected, try adding ? after the quantifier to make it lazy.

Forgetting to escape special characters is another common mistake. If your pattern includes dots, brackets, parentheses, or dollar signs as literal characters, make sure they're preceded by a backslash.
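When the literal text comes from user input, escaping by hand is not an option. A widely used approach is a small helper that backslash-escapes every metacharacter before passing the string to the RegExp constructor. escapeRegExp below is a helper defined here for illustration, not a built-in.

```javascript
// Hypothetical helper (not built in): prefix every regex metacharacter with "\".
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const price = "$9.99";
const pattern = new RegExp(escapeRegExp(price));

console.log(pattern.source);                  // \$9\.99
console.log(pattern.test("Now only $9.99!")); // true
console.log(pattern.test("Now only $9X99!")); // false: the dot is now literal
```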

Regex for HTML parsing is tempting but fragile. Regular expressions can't reliably handle nested structures, self-closing tags, or attributes with quotes. For anything beyond simple extraction, use a proper HTML parser instead.

Performance can be a concern with complex patterns on large inputs. Patterns with nested quantifiers like (a+)+ can cause catastrophic backtracking, where the engine tries exponentially many paths. Avoid nesting quantifiers that can match the same characters.
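Often the nested quantifier can simply be flattened. In this sketch, /^(a+)+$/ and /^a+$/ accept the same strings, but on a long run of "a"s followed by a character that breaks the match, a backtracking engine such as JavaScript's tries exponentially many ways to split the run with the first pattern, while the second fails after one linear pass. Only short inputs are used here so the risky pattern stays fast.

```javascript
// Nested quantifier: a run of "a"s can be split between the inner and outer + in many ways.
const risky = /^(a+)+$/;
// Same accepted strings, but only one way to match each run.
const safe = /^a+$/;

// Keep inputs tiny: long non-matching inputs can hang the risky pattern.
for (const s of ["a", "aaaa", "aaab", ""]) {
  console.log(JSON.stringify(s), risky.test(s), safe.test(s));
}
// "a" true true, "aaaa" true true, "aaab" false false, "" false false
```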

Test your regex patterns

Use the Regex Tester to write, test, and debug regular expressions against sample text with real-time highlighting of matches.

Derek Giordano
Written by the creator of Ultimate Design Tools. BA in Business Marketing.