Regex Cheatsheet & Common Patterns: Email, URL, Phone Validation
Practical regex patterns for everyday validation tasks — and the tricky edge cases that catch most developers.
Regex is one of those tools where 80% of what you need fits on a single page. The other 20% is a thicket of edge cases, language-specific quirks, and patterns that look right but quietly miss valid inputs. This guide gives you the 80% — and warns about the 20%.
The metacharacter quick-reference
| Symbol | Meaning |
|---|---|
| . | Any character except newline (unless the /s flag is set) |
| ^ | Start of string (or line, with /m) |
| $ | End of string (or line, with /m) |
| \d \D | Digit / non-digit |
| \w \W | Word char [A-Za-z0-9_] / non-word |
| \s \S | Whitespace / non-whitespace |
| * | 0 or more (greedy) |
| + | 1 or more (greedy) |
| ? | 0 or 1; after another quantifier, makes it lazy (non-greedy) |
| {n,m} | Between n and m times |
| [abc] | Any of a, b, c |
| [^abc] | None of a, b, c |
| (...) | Capturing group |
| (?<name>...) | Named group |
| \| | Alternation (a\|b matches a or b) |
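Here is a minimal JavaScript sketch of two rows from the table in action: a named group and alternation. The YYYY-MM-DD date format, the color list, and the helper name parseIsoDate are illustrative assumptions. It also uses a non-capturing group (?:...), which isn't in the table but is what makes anchored alternation behave.

```javascript
// Named groups: (?<name>...) makes each capture available on match.groups.
const datePattern = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: returns numeric parts, or null if the shape is wrong.
// It checks shape only, so "2024-13-99" would still pass.
function parseIsoDate(text) {
  const match = datePattern.exec(text);
  if (match === null) return null;
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

// Alternation: the non-capturing group (?:...) keeps ^ and $ applied to every branch.
const colorPattern = /^(?:red|green|blue)$/;

// Without the group, ^ binds only to "red" and $ only to "blue".
const unanchored = /^red|green|blue$/;

console.log(parseIsoDate("2024-03-15"));    // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate("15/03/2024"));    // null
console.log(colorPattern.test("greenish")); // false
console.log(unanchored.test("greenish"));   // true: the bare "green" branch matches anywhere
```

The last line is the classic alternation trap: | has the lowest precedence of any operator, so it splits the whole pattern unless you group it.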
Pattern: Email validation
The classic question. The pattern most developers reach for is not RFC-compliant, but it is practical:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This matches the structure most people expect: name@example.com. It will reject foo@bar (no TLD) and foo@bar.c (TLD too short). It will also reject some addresses that are valid per RFC 5322 but rare in practice (quoted local parts, IP-literal domains).
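A regex that tries to implement the full RFC grammar is notoriously huge: the well-known Perl Mail::RFC822::Address pattern, written for the older RFC 822, is over 6,000 characters long. Don't use one. Instead, validate with a permissive pattern, then verify the address by sending a confirmation email. That's the only way to know the mailbox actually exists and accepts mail.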
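A minimal JavaScript sketch of the validation-time half of that advice, using the pattern above. The helper name looksLikeEmail and the decision to trim surrounding whitespace first are assumptions; sending the confirmation email is out of scope here.

```javascript
// The permissive pattern from the text. It checks shape only;
// it cannot tell you whether the mailbox exists.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper. Trimming is an assumption: users often paste stray spaces.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input.trim());
}

console.log(looksLikeEmail("name@example.com"));       // true
console.log(looksLikeEmail("foo@bar"));                // false (no TLD)
console.log(looksLikeEmail("foo@bar.c"));              // false (TLD too short)
console.log(looksLikeEmail('"john doe"@example.com')); // false (quoted local part)
console.log(looksLikeEmail("user@[192.168.0.1]"));     // false (IP-literal domain)
console.log(looksLikeEmail("a..b@example.com"));       // true (permissive: double dot slips through)
```

The last case shows the trade-off: a permissive pattern lets some malformed addresses through, which is fine because the confirmation email will catch them.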
Pattern: URL detection
https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?
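For finding URLs in text (e.g. auto-linking), this is usually enough, though it stops at a port number, drops #fragments, and can pick up a sentence's trailing period. For validating that a string is a well-formed URL, use the URL constructor in JavaScript: new URL(input) throws a TypeError if the input isn't a valid URL.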
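A sketch of both uses in JavaScript. The helper names findUrls and isHttpUrl are hypothetical, and the protocol check is an added assumption: new URL() on its own also accepts other schemes, such as mailto:, so restricting to http and https is a choice you may or may not want.

```javascript
// The pattern from the text plus the g flag, which matchAll requires.
const URL_IN_TEXT = /https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?/g;

// Hypothetical helper: every URL-looking substring, in order.
function findUrls(text) {
  return Array.from(text.matchAll(URL_IN_TEXT), (m) => m[0]);
}

// Hypothetical helper: let the URL parser judge well-formedness,
// then (assumption) accept only http and https.
function isHttpUrl(input) {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // new URL() throws a TypeError on unparseable input
  }
}

console.log(findUrls("Docs: https://example.com/docs?page=2 or http://test.org."));
// [ 'https://example.com/docs?page=2', 'http://test.org.' ]  (note the trailing period)
console.log(findUrls("Dev server at http://localhost:8080/api"));
// [ 'http://localhost' ]  (the port's colon ends the match)
console.log(isHttpUrl("https://example.com/path"));   // true
console.log(isHttpUrl("not a url"));                  // false
console.log(isHttpUrl("mailto:someone@example.com")); // false (parses, but not http/https)
```

Auto-linkers commonly strip trailing punctuation from each match as a post-processing step rather than making the regex itself more complex.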
Pattern: Phone numbers
Phone formats vary wildly by country. For US numbers in common formats:
^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
For international numbers, don't use regex. Use Google's libphonenumber library — it knows the rules for every country and is constantly updated as numbering plans change.
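A minimal JavaScript sketch for the US case. Because \(? and \)? are independent, the pattern above also accepts a half-parenthesized number like (555-123-4567. The stricter variant and the normalizeUsPhone helper below are assumptions added for illustration, not part of the original pattern.

```javascript
// The pattern from the text: \(? and \)? are independent optional parts.
const US_PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Stricter variant (an added assumption): the area code is either fully
// parenthesized "(555)" or bare "555", never half-parenthesized.
const US_PHONE_STRICT = /^(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: validate, then keep only the 10 digits for storage.
function normalizeUsPhone(input) {
  if (!US_PHONE_STRICT.test(input)) return null;
  return input.replace(/\D/g, "");
}

console.log(US_PHONE.test("(555-123-4567"));        // true (unbalanced, still accepted)
console.log(US_PHONE_STRICT.test("(555-123-4567")); // false
console.log(normalizeUsPhone("(555) 123-4567"));    // 5551234567
console.log(normalizeUsPhone("555.123.4567"));      // 5551234567
console.log(normalizeUsPhone("+1 555 123 4567"));   // null (country code not handled)
```

Storing the bare digits means every accepted format compares equal later, which is usually what you want for lookups.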
Pattern: IPv4 addresses
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
This is the strict version that rejects values like 999.999.999.999. The simpler (?:\d{1,3}\.){3}\d{1,3} matches the structure but accepts invalid octets.
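A quick JavaScript comparison of both patterns. The simple pattern is anchored with ^ and $ here for a fair whole-string check (an assumption; the text gives it unanchored). One gap remains: the strict pattern still accepts leading zeros such as 010.001.0.1, and some parsers (C's inet_aton, for example) read a leading 0 as octal, so you may want to reject those as well.

```javascript
// Strict pattern from the text: each octet limited to 0-255.
const IPV4_STRICT =
  /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// The simpler structural pattern, anchored here for a whole-string check.
const IPV4_SIMPLE = /^(?:\d{1,3}\.){3}\d{1,3}$/;

const cases = [
  "192.168.0.1",     // strict: true,  simple: true
  "256.1.1.1",       // strict: false, simple: true  (octet > 255)
  "999.999.999.999", // strict: false, simple: true
  "010.001.0.1",     // strict: true,  simple: true  (leading zeros slip through)
];

for (const ip of cases) {
  console.log(ip, IPV4_STRICT.test(ip), IPV4_SIMPLE.test(ip));
}
```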
Greedy vs lazy quantifiers
By default, * and + are greedy — they match as much as possible. Adding ? after them (*?, +?) makes them lazy — match as little as possible.
Classic example: extracting the content of an HTML tag.
Input: <b>hello</b> world <b>again</b>
Greedy <b>.*</b>: matches everything from first <b> to last </b>
Lazy <b>.*?</b>: matches each <b>…</b> pair separately
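The same example in JavaScript, as an illustration of quantifier behavior only (for real HTML, a parser is the better tool). The variable names are arbitrary.

```javascript
const input = "<b>hello</b> world <b>again</b>";

// Greedy: .* runs to the end of the string, then backtracks to the LAST </b>.
const greedy = input.match(/<b>.*<\/b>/)[0];

// Lazy: .*? expands one character at a time, stopping at the FIRST </b>.
// The g flag makes match() return every non-overlapping match.
const lazy = input.match(/<b>.*?<\/b>/g);

// A lazy capturing group pulls out just the text between the tags.
const inner = Array.from(input.matchAll(/<b>(.*?)<\/b>/g), (m) => m[1]);

console.log(greedy); // <b>hello</b> world <b>again</b>
console.log(lazy);   // [ '<b>hello</b>', '<b>again</b>' ]
console.log(inner);  // [ 'hello', 'again' ]
```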
Lookarounds
Lookaheads and lookbehinds let you assert what comes before or after a match without including it in the match.
- foo(?=bar): “foo” only if followed by “bar”
- foo(?!bar): “foo” only if NOT followed by “bar”
- (?<=foo)bar: “bar” only if preceded by “foo”
- (?<!foo)bar: “bar” only if NOT preceded by “foo”
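A small JavaScript sketch that runs all four lookarounds over one string and reports the start index of each match. The sample text and the helper matchIndices are made up for illustration. Lookbehind needs an ES2018-or-later engine.

```javascript
const text = "foobar foobaz barfoo bazbar";

// Hypothetical helper: start index of every match, so you can see which occurrence matched.
function matchIndices(pattern, str) {
  return Array.from(str.matchAll(pattern), (m) => m.index);
}

console.log(matchIndices(/foo(?=bar)/g, text));  // [ 0 ]       "foo" in "foobar"
console.log(matchIndices(/foo(?!bar)/g, text));  // [ 7, 17 ]   "foobaz", "barfoo"
console.log(matchIndices(/(?<=foo)bar/g, text)); // [ 3 ]       "bar" in "foobar"
console.log(matchIndices(/(?<!foo)bar/g, text)); // [ 14, 24 ]  "barfoo", "bazbar"

// Lookarounds are assertions: "bar" is checked but not included in the match.
console.log("foobar".match(/foo(?=bar)/)[0]); // foo
```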
When NOT to use regex
- HTML/XML: use a real parser (DOMParser, BeautifulSoup, jsdom).
- Strict spec compliance: for things like RFC-strict email validation, regex is the wrong tool.
- Recursive structures: JavaScript regex has no recursion, so it cannot match arbitrarily nested parentheses (PCRE and Perl add recursion extensions; JavaScript does not). Use a parser.
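- Performance-critical paths: nested quantifiers such as (a+)+$ can backtrack catastrophically, taking exponential time on input like a long run of a's followed by a non-matching character. Test with adversarial input.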
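For nested parentheses, the "use a parser" advice can be as small as a depth counter. This JavaScript sketch (topLevelGroups is a hypothetical helper, not a library function) returns the contents of each top-level group, or null when the parentheses don't balance. It handles any nesting depth, which no JavaScript regex can, and runs in linear time with no backtracking.

```javascript
// Returns the contents of each top-level (...) group, or null if the
// parentheses are unbalanced. A depth counter handles any nesting level.
function topLevelGroups(text) {
  const groups = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      if (depth === 0) start = i + 1; // group content starts after the opener
      depth++;
    } else if (ch === ")") {
      if (depth === 0) return null;   // closing paren with no matching opener
      depth--;
      if (depth === 0) groups.push(text.slice(start, i));
    }
  }
  return depth === 0 ? groups : null; // leftover openers mean unbalanced
}

console.log(topLevelGroups("f(a(b)c) g(d)")); // [ 'a(b)c', 'd' ]
console.log(topLevelGroups("((x)"));          // null
console.log(topLevelGroups("(x))("));         // null
```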