Regex Cheatsheet & Common Patterns: Email, URL, Phone Validation

Practical regex patterns for everyday validation tasks — and the tricky edge cases that catch most developers.

Regex is one of those tools where 80% of what you need fits on a single page. The other 20% is a thicket of edge cases, language-specific quirks, and patterns that look right but quietly miss valid inputs. This guide gives you the 80% — and warns about the 20%.

The metacharacter quick-reference

Symbol         Meaning
.              Any character except a newline (unless the /s flag is set)
^              Start of string (or of each line, with /m)
$              End of string (or of each line, with /m)
\d  \D         Digit / non-digit
\w  \W         Word character [A-Za-z0-9_] / non-word character
\s  \S         Whitespace / non-whitespace
*              0 or more (greedy)
+              1 or more (greedy)
?              0 or 1; placed after another quantifier, makes it lazy
{n,m}          Between n and m times, inclusive
[abc]          One character: a, b, or c
[^abc]         One character that is not a, b, or c
(...)          Capturing group
(?:...)        Non-capturing group
(?<name>...)   Named capturing group
|              Alternation (either side may match)
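A few rows of the table, checked in JavaScript. The date string and variable names are illustrations, not part of any API: named groups show up on match.groups, the m flag changes what ^ and $ anchor to, and the s (dotAll) flag lets . cross newlines.

```javascript
// Named groups: each (?<name>...) capture is exposed on match.groups.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = "Released on 2024-03-15.".match(datePattern);
console.log(m.groups.year, m.groups.month, m.groups.day); // 2024 03 15

// Anchors: without m, ^ and $ mean start and end of the whole string;
// with m, they also match at every line boundary.
const text = "alpha\nbeta\ngamma";
console.log(/^beta$/.test(text));  // false
console.log(/^beta$/m.test(text)); // true

// The dot refuses to match a newline unless the s (dotAll) flag is set.
console.log(/alpha.beta/.test(text));  // false
console.log(/alpha.beta/s.test(text)); // true
```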

Pattern: Email validation

The classic question. The answer most developers give is technically wrong, but practical:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

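This matches the structure most people expect: name@domain.tld, as in jane.doe@example.com. It rejects foo@bar (no TLD) and foo@bar.c (TLD too short). It also rejects edge cases that are valid per RFC 5322 but rare in practice, such as quoted local parts ("john doe"@example.com) and IP-literal domains (user@[192.0.2.1]), and it accepts some addresses that are invalid, such as john..doe@example.com with its consecutive dots.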

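A regex that implements the full RFC grammar is enormous: the best-known attempt (written for the older RFC 822) is over 6,000 characters long. Don't use one. Instead, apply a permissive pattern at validation time, then confirm the address by sending an email to it. That's the only way to know the address actually receives mail.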
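Here is the permissive check as a minimal JavaScript sketch; looksLikeEmail is a hypothetical helper name and the addresses are made-up samples. The last two calls show both failure directions at once: a valid-but-rare address is rejected and an invalid one gets through, which is why the confirmation email still matters.

```javascript
// The permissive pattern from above; ^ and $ anchor it to the whole string.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: a cheap first-pass check, not proof the address works.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input);
}

console.log(looksLikeEmail("jane.doe@example.com"));   // true
console.log(looksLikeEmail("foo@bar"));                // false: no TLD
console.log(looksLikeEmail("foo@bar.c"));              // false: TLD too short
console.log(looksLikeEmail('"john doe"@example.com')); // false, though valid per RFC 5322
console.log(looksLikeEmail("john..doe@example.com"));  // true, though consecutive dots are invalid
```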

Pattern: URL detection

https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?

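For finding URLs in text (e.g. auto-linking), this is usually enough, but know what it misses: ports (:8080), fragments (#section), and characters such as ~ in paths all end the match early. It also swallows sentence-ending punctuation, so a URL at the end of a sentence is matched with its trailing period. For validating that a string is a well-formed URL, use the URL constructor in JavaScript: new URL(input) throws a TypeError if the input can't be parsed as a URL.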
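Both jobs in JavaScript, as a sketch; findUrls and isHttpUrl are hypothetical helper names. Limiting isHttpUrl to http and https is an assumption about what most forms mean by "a URL", since schemes like javascript: parse without error. In the sample text, the second URL ends a sentence, so its match keeps the trailing dot.

```javascript
// Detection: the pattern from above plus the g flag, which matchAll requires.
const URL_PATTERN = /https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?/g;

function findUrls(text) {
  return [...text.matchAll(URL_PATTERN)].map((match) => match[0]);
}

// Validation: let the URL constructor parse, then restrict the scheme
// (an assumption of this sketch), because "javascript:alert(1)" is a valid URL too.
function isHttpUrl(input) {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // new URL throws a TypeError on unparseable input
  }
}

console.log(findUrls("Docs at https://example.com/docs?page=2 and http://test.org."));
// → ["https://example.com/docs?page=2", "http://test.org."]
console.log(isHttpUrl("https://example.com/path")); // true
console.log(isHttpUrl("javascript:alert(1)"));      // false
console.log(isHttpUrl("not a url"));                // false
```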

Pattern: Phone numbers

Phone formats vary wildly by country. For US numbers in common formats:

^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

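This accepts (555) 123-4567, 555.123.4567 and 5551234567. Because each parenthesis is optional on its own, it also accepts unbalanced input such as (555-123-4567, and it has no room for a leading +1 country code.

For international numbers, don't use regex. Use Google's libphonenumber library: it encodes the rules for every country and is updated regularly as numbering plans change.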
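A JavaScript sketch that puts the pattern next to a stricter variant in which the parentheses must come as a pair; US_PHONE_BALANCED and normalizeUsPhone are hypothetical names. Reducing a number to bare digits after validation is one common design choice, so that different formats of the same number compare equal when stored.

```javascript
// The US pattern from above.
const US_PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical stricter variant: either "(555)" or "555", never a lone parenthesis.
const US_PHONE_BALANCED = /^(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: validate, then keep digits only; null means "rejected".
function normalizeUsPhone(input) {
  return US_PHONE_BALANCED.test(input) ? input.replace(/\D/g, "") : null;
}

for (const s of ["(555) 123-4567", "555.123.4567", "5551234567", "(555-123-4567"]) {
  console.log(s, US_PHONE.test(s), US_PHONE_BALANCED.test(s));
}
// Only "(555-123-4567" differs: the original accepts it, the balanced variant does not.
console.log(normalizeUsPhone("(555) 123-4567")); // 5551234567
```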

Pattern: IPv4 addresses

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

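This is the strict version: it rejects values like 999.999.999.999, though it still allows leading zeros (192.168.001.001), which some parsers interpret as octal. The simpler (?:\d{1,3}\.){3}\d{1,3} matches the structure but accepts invalid octets such as 999.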
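A quick JavaScript comparison of the two patterns; the IPV4_* names are illustrations, and the loose pattern is anchored here so both are tested against the whole string. The last sample shows what the strict pattern still lets through.

```javascript
// Strict: each octet alternative is limited to the range 0-255.
const IPV4_STRICT = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
// Loose: right shape, but any one-to-three digit number counts as an octet.
const IPV4_LOOSE = /^(?:\d{1,3}\.){3}\d{1,3}$/;

const samples = ["192.168.0.1", "255.255.255.255", "256.1.1.1", "999.999.999.999", "192.168.001.001"];
for (const ip of samples) {
  console.log(ip.padEnd(16), "strict:", IPV4_STRICT.test(ip), "loose:", IPV4_LOOSE.test(ip));
}
// 256.1.1.1 and 999.999.999.999: strict false, loose true.
// 192.168.001.001: true for both.
```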

Greedy vs lazy quantifiers

By default, * and + are greedy: they match as much as possible. Adding ? after them (*?, +?) makes them lazy: they match as little as possible.

Classic example: extracting the content of an HTML tag.

Input: <b>hello</b> world <b>again</b>
Greedy <b>.*</b>: one match, running from the first <b> to the last </b> (the whole string)
Lazy <b>.*?</b>: two matches, <b>hello</b> and <b>again</b>
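The same input run through both quantifiers in JavaScript, plus a capture group that keeps only the text between the tags. The variable names are illustrative, and as the "When NOT to use regex" section says, this is for quick one-offs, not a stand-in for an HTML parser.

```javascript
const input = "<b>hello</b> world <b>again</b>";

// Greedy: .* runs to the end of the string, then backs off only as far as the last </b>.
const greedy = [...input.matchAll(/<b>.*<\/b>/g)].map((m) => m[0]);
// Lazy: .*? grows one character at a time and stops at the first </b> it can reach.
const lazy = [...input.matchAll(/<b>.*?<\/b>/g)].map((m) => m[0]);
// A capturing group around the lazy part extracts just the tag content.
const contents = [...input.matchAll(/<b>(.*?)<\/b>/g)].map((m) => m[1]);

console.log(greedy);   // → ["<b>hello</b> world <b>again</b>"]
console.log(lazy);     // → ["<b>hello</b>", "<b>again</b>"]
console.log(contents); // → ["hello", "again"]
```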

Lookarounds

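Lookaheads and lookbehinds let you assert what comes after or before a position without including it in the match. Lookbehind arrived in JavaScript with ES2018 (Safari only added it in version 16.4), and Python's re module requires lookbehinds to be fixed-width.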

  • foo(?=bar) — “foo” only if followed by “bar”
  • foo(?!bar) — “foo” only if NOT followed by “bar”
  • (?<=foo)bar — “bar” only if preceded by “foo”
  • (?<!foo)bar — “bar” only if NOT preceded by “foo”
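Each assertion from the list, tried in JavaScript. mark is a hypothetical helper that wraps whatever was matched in brackets, so you can see that the lookaround text is checked but never consumed. The price example at the end is an illustration of a typical use.

```javascript
// Hypothetical helper: $& in the replacement string is the matched text.
const mark = (text, pattern) => text.replace(pattern, "[$&]");

console.log(mark("foobar foobaz", /foo(?=bar)/g));  // [foo]bar foobaz
console.log(mark("foobar foobaz", /foo(?!bar)/g));  // foobar [foo]baz
console.log(mark("foobar quxbar", /(?<=foo)bar/g)); // foo[bar] quxbar
console.log(mark("foobar quxbar", /(?<!foo)bar/g)); // foobar qux[bar]

// Digits preceded by a literal "$", without the "$" becoming part of the match.
console.log("Total: $42, tax: 7".match(/(?<=\$)\d+/g)); // → ["42"]
```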

When NOT to use regex

  • HTML/XML: use a real parser (DOMParser, BeautifulSoup, jsdom).
  • Strict spec compliance: for things like RFC-strict email validation, regex is the wrong tool.
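  • Recursive structures: JavaScript regex has no recursion, so it cannot match arbitrarily nested parentheses. Use a parser.
  • Performance-critical paths and untrusted input: nested quantifiers such as (a+)+$ can backtrack catastrophically, taking exponential time on a near-miss input like a long run of a's followed by "!" (a ReDoS risk). Test with adversarial input.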
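For nested parentheses, the "parser" can be as small as a depth counter. This JavaScript sketch uses a hypothetical isBalanced helper and handles a single bracket type only; mixed brackets such as ([)] would need a stack.

```javascript
// Hypothetical helper: tracks nesting depth, which works to any depth.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth < 0) return false; // a ")" with nothing open to close
    }
  }
  return depth === 0; // every "(" was eventually closed
}

console.log(isBalanced("f(a, g(b, h(c)))")); // true
console.log(isBalanced("(()"));              // false
console.log(isBalanced(")("));               // false
```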