Regex Complexity Scorer (Backtracking Risk, Lookahead Depth)
Most regex patterns are well-behaved: they run in roughly linear time on any input, fail fast on non-matches, and never become the bottleneck. A few hide exponential worst-case behavior. The classic example is (a+)+$ applied to a long run of a characters followed by a single b. A backtracking engine takes time exponential in the length of that run, so a pattern that looks like a harmless one-liner becomes a denial-of-service vector when an attacker controls the input. This tool reads the pattern, identifies the structural features that lead to that class of failure, and produces a complexity score so you can spot the risk before the pattern reaches a public input handler.
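To see the blowup on your own machine, the sketch below times Python's standard re module (a backtracking engine) on the nested pattern and on the flattened a+$. It is an illustration, not part of the tool: time_match is a hypothetical helper, absolute timings depend on the machine and Python version, and n is kept small because each extra character roughly doubles the nested pattern's running time.

```python
import re
import time


def time_match(pattern: str, text: str) -> tuple[bool, float]:
    """Return (matched?, elapsed seconds) for a single re.match call."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    matched = compiled.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Kept small on purpose: the nested pattern's time roughly doubles per extra "a".
    for n in (10, 12, 14, 16, 18):
        text = "a" * n + "b"  # can never match: the trailing "b" blocks the $ anchor
        _, nested = time_match(r"(a+)+$", text)
        _, flat = time_match(r"a+$", text)
        print(f"n={n:2d}  (a+)+$: {nested:.5f}s   a+$: {flat:.6f}s")
```

Both patterns reject every string in the loop; only the time they take to do so differs.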
What the Scorer Measures
Five factors contribute to the complexity score: alternation count (each | operator adds a branch), quantifier nesting depth (a quantified group inside another quantified group, the classic ReDoS shape), lookaround depth (lookahead and lookbehind assertions, which can chain), overall pattern length (a soft signal, since longer patterns are harder to reason about), and special constructs (atomic groups, possessive quantifiers, backreferences). Each factor produces a sub-score; the overall score is their weighted sum, mapped to a tier (Low, Moderate, High, or Critical). A High or Critical score does not mean the pattern is broken; it means the pattern warrants a second look.
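The weighted-sum step might look like the following Python sketch. Everything numeric is an assumption for illustration: the Factors fields, the WEIGHTS values, and the choice to count only nesting beyond the first quantifier level are hypothetical, not the tool's published coefficients. Returning the largest contributor alongside the total is one simple way to point at the construct that drove the score.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Raw measurements for one pattern (field names are hypothetical)."""
    alternations: int        # number of | operators
    quantifier_nesting: int  # deepest quantified-inside-quantified chain (1 = a plain a+)
    lookaround_depth: int    # most lookarounds stacked at one position
    length: int              # pattern length in characters
    special_features: int    # atomic groups + possessive quantifiers + backreferences


# Illustrative weights only; they are not the tool's real coefficients.
WEIGHTS = {
    "alternation": 3.0,
    "nesting": 85.0,  # one extra nesting level alone lands in Critical (81+)
    "lookaround": 15.0,
    "length": 0.1,
    "special": 5.0,
}


def sub_scores(f: Factors) -> dict[str, float]:
    """One sub-score per factor, before weighting."""
    return {
        "alternation": f.alternations,
        "nesting": max(0, f.quantifier_nesting - 1),  # a single quantifier is harmless
        "lookaround": f.lookaround_depth,
        "length": f.length,
        "special": f.special_features,
    }


def complexity_score(f: Factors) -> tuple[float, str]:
    """Weighted sum of the sub-scores, plus the factor that contributed most."""
    contributions = {name: WEIGHTS[name] * value for name, value in sub_scores(f).items()}
    total = round(sum(contributions.values()), 1)
    top_factor = max(contributions, key=contributions.get)
    return total, top_factor
```

For (a+)+$ the factors are no alternations, nesting depth 2, no lookarounds, length 6, and no special features, so complexity_score(Factors(0, 2, 0, 6, 0)) returns (85.6, "nesting").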
Catastrophic Backtracking, Explained
The pathological case is two quantifiers that can both consume the same characters. (a+)+ means "one or more groups, each containing one or more a characters." On a string of N a characters there are 2^(N-1) ways to split the run into groups, and a backtracking engine (which is what JavaScript, Python's re, and PCRE all use by default) will try every split before giving up on a non-match. The fix is almost always to rewrite the pattern: a+ alone matches exactly the same strings (only the capture group is lost) and gives the engine a single way to consume each run, so its work grows polynomially (linearly for a match anchored at the start) rather than exponentially. The scorer flags the (a+)+ shape as a critical risk and points at the specific quantified-inside-quantified construct.
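Both halves of that argument can be checked in a few lines of Python. count_partitions brute-forces the number of ways to split a run of N characters into consecutive groups (it agrees with 2^(N-1)), and max_quantifier_nesting is a hypothetical heuristic for the quantified-inside-quantified shape: it reports how many quantifiers are stacked around any single atom, so a result of 2 or more marks candidates like (a+)+. It skips escapes and character classes but does not model every engine's syntax, and it will not catch other ReDoS shapes such as overlapping alternations like (a|aa)+.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def count_partitions(n: int) -> int:
    """Ways to split a run of n characters into consecutive non-empty groups."""
    if n == 0:
        return 1  # nothing left to split: exactly one way to finish
    # The first group takes k characters; the remaining n - k are split recursively.
    return sum(count_partitions(n - k) for k in range(1, n + 1))


# A {m}, {m,} or {m,n} counted repetition.
_BRACE_QUANT = re.compile(r"\{\d+(?:,\d*)?\}")


def _quantifier_len(pattern: str, i: int) -> int:
    """Length of the quantifier at pattern[i], incl. a lazy '?' or possessive '+'; 0 if none."""
    if i >= len(pattern):
        return 0
    if pattern[i] in "*+?":
        length = 1
    else:
        m = _BRACE_QUANT.match(pattern, i)
        if m is None:
            return 0
        length = m.end() - i
    if i + length < len(pattern) and pattern[i + length] in "?+":
        length += 1
    return length


def _class_end(pattern: str, i: int) -> int:
    """Index just past the ']' closing the character class that opens at pattern[i]."""
    j = i + 1
    if j < len(pattern) and pattern[j] == "^":
        j += 1
    if j < len(pattern) and pattern[j] == "]":  # Python/PCRE: a leading ']' is literal
        j += 1
    while j < len(pattern) and pattern[j] != "]":
        j += 2 if pattern[j] == "\\" else 1
    return j + 1


def max_quantifier_nesting(pattern: str) -> int:
    """Deepest chain of quantifiers wrapped around one atom (heuristic)."""
    stack = [0]  # per open group: deepest quantifier nesting seen inside it
    best = i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "(":
            stack.append(0)
            i += 2 if pattern.startswith("(?", i) else 1  # skip the '?' of (?:, (?=, ...
            continue
        if c == ")":
            depth = stack.pop() if len(stack) > 1 else 0  # the group inherits its body's depth
            i += 1
        elif c == "\\":
            depth, i = 0, i + 2  # an escaped character is a single atom
        elif c == "[":
            depth, i = 0, _class_end(pattern, i)
        else:
            depth, i = 0, i + 1
        q = _quantifier_len(pattern, i)
        if q:  # a quantifier on this atom or group adds one level
            depth += 1
            i += q
        stack[-1] = max(stack[-1], depth)
        best = max(best, depth)
    return best
```

For example, max_quantifier_nesting(r"(a+)+$") returns 2, while the rewritten a+$ returns 1.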
Lookaheads, Lookbehinds, and Their Costs
Lookarounds are zero-width assertions: they check whether something matches without consuming characters. A lookaround costs whatever its inner sub-pattern costs at that position, so (?=\d) is cheap while (?=.*\d) may scan across the rest of the line. Each additional lookaround at the same position adds another such scan, and a lookaround inside a quantified group repeats its scan at every position the group visits, which is where costs start to compound. The scorer counts lookaround depth (the maximum number of stacked lookarounds at any position in the pattern) and weights it heavily. Most patterns need at most one or two lookarounds at any position; patterns with three or more are usually trying to do something that would be cleaner as two sequential matches or as a parser.
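A rough way to measure stacked lookarounds is sketched below in Python. It assumes "stacked" means lookaround groups that follow one another with nothing consuming input in between, so they all test the same position (the familiar password-rule pattern is the typical case); nested lookarounds are measured separately rather than added together. max_stacked_lookarounds and its helpers are hypothetical names, and the scanner ignores flags, comments, and engine-specific syntax.

```python
# Lookaround openers in Python/PCRE/JavaScript syntax.
LOOKAROUND_OPENERS = ("(?=", "(?!", "(?<=", "(?<!")


def _class_end(pattern: str, i: int) -> int:
    """Index just past the ']' closing the character class that opens at pattern[i]."""
    j = i + 1
    if j < len(pattern) and pattern[j] == "^":
        j += 1
    if j < len(pattern) and pattern[j] == "]":  # Python/PCRE: a leading ']' is literal
        j += 1
    while j < len(pattern) and pattern[j] != "]":
        j += 2 if pattern[j] == "\\" else 1
    return j + 1


def _group_end(pattern: str, i: int) -> int:
    """Index just past the ')' matching the '(' at pattern[i]."""
    depth, j = 0, i
    while j < len(pattern):
        c = pattern[j]
        if c == "\\":
            j += 2
            continue
        if c == "[":
            j = _class_end(pattern, j)  # parentheses inside a class are literal
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return j + 1
        j += 1
    return len(pattern)  # unbalanced: treat the rest as the group


def max_stacked_lookarounds(pattern: str) -> int:
    """Longest run of lookarounds that all test the same position (heuristic)."""
    best = run = 0
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "(":
            end = _group_end(pattern, i)
            opener = next((p for p in LOOKAROUND_OPENERS if pattern.startswith(p, i)), "(")
            # Measure runs inside the group body independently.
            best = max(best, max_stacked_lookarounds(pattern[i + len(opener):end - 1]))
            run = run + 1 if opener != "(" else 0  # an ordinary group consumes input
            best = max(best, run)
            i = end
        elif c == "\\":
            run, i = 0, i + 2  # escapes are treated as consuming atoms
        elif c == "[":
            run, i = 0, _class_end(pattern, i)
        else:
            if c not in "^$":  # anchors are zero-width and do not break a run
                run = 0
            i += 1
    return best
```

For instance, max_stacked_lookarounds(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$") returns 3, the point at which splitting the check into separate matches is usually cleaner.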
How to Read the Score
Low (0–20) patterns are low-risk: simple character classes, anchored matches, no nested quantifiers. Moderate (21–50) patterns have some complexity, often from a few alternations or a single lookahead, but no ReDoS shape. High (51–80) patterns have either a deep alternation tree or a borderline nested quantifier; treat them as worth reviewing, especially if the input is user-supplied. Critical (81+) patterns have at least one classic ReDoS shape and should be rewritten or replaced with a non-backtracking engine (RE2, Hyperscan) before they handle untrusted input. The scorer also notes the specific construct that drove the high score, so the rewrite target is obvious from the output.
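The tier boundaries translate directly into a lookup. This Python sketch assumes scores are non-negative and may be fractional, so a value strictly between two listed ranges (say 20.5) falls into the higher tier; tier and NEXT_STEP are illustrative names, not the tool's API.

```python
def tier(score: float) -> str:
    """Map a complexity score to the four tiers (boundaries taken from the text)."""
    if score < 0:
        raise ValueError("score must be non-negative")
    if score <= 20:
        return "Low"
    if score <= 50:
        return "Moderate"
    if score <= 80:
        return "High"
    return "Critical"


# Suggested follow-up for each tier, paraphrasing the guidance in the paragraph above.
NEXT_STEP = {
    "Low": "no action needed",
    "Moderate": "fine as-is; no ReDoS shape detected",
    "High": "review it, especially if the input is user-supplied",
    "Critical": "rewrite it or use a non-backtracking engine (RE2, Hyperscan)",
}
```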
Pair this with the rest of the UDT regex cluster: Regex Tester to confirm a pattern still matches the intended cases after rewriting, Regex Builder for assembling patterns piece-by-piece, our regex 101 guide for a primer on the syntax, and Cron Next Fire for the scheduling side when regexes drive cron job content.
Built by Derek Giordano · Part of Ultimate Design Tools