Regex Cheatsheet & Common Patterns: Email, URL, Phone Validation

Regex is one of those tools where 80% of what you need fits on a single page. The other 20% is a thicket of edge cases, language-specific quirks, and patterns that look right but quietly miss valid inputs. This guide gives you the 80% — and warns about the 20%.

The metacharacter quick-reference

Symbol	Meaning
.	Any character (except newline)
^	Start of string (or line, with /m)
$	End of string (or line, with /m)
\d \D	Digit / non-digit
\w \W	Word char [A-Za-z0-9_] / non-word
\s \S	Whitespace / non-whitespace
*	0 or more (greedy)
+	1 or more (greedy)
?	0 or 1, or non-greedy modifier
{n,m}	Between n and m times
[abc]	Any of a, b, c
[^abc]	None of a, b, c
(...)	Capturing group
(?<name>...)	Named group
|	Alternation
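
To see a few of the symbols from the table working together, here is a small sketch in JavaScript. The date format and the sample strings are made up for illustration; they are not from any particular codebase.

```javascript
// Anchors, \d, {n} and named groups combined into one pattern.
const isoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

const match = isoDate.exec("2024-03-15");
// match.groups holds the named captures as strings.
const { year, month, day } = match.groups;
console.log(year, month, day); // 2024 03 15

// With the m flag, ^ and $ match at line boundaries, not only at the
// start and end of the whole string.
const lines = "alpha\nbeta\ngamma";
const wholeLines = lines.match(/^\w+$/gm);
console.log(wholeLines); // [ 'alpha', 'beta', 'gamma' ]

// Without the m flag the same pattern finds nothing, because the
// string as a whole contains newlines that \w cannot match.
const withoutFlag = lines.match(/^\w+$/g);
console.log(withoutFlag); // null
```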

Pattern: Email validation

The classic question. The answer most developers give is wrong, but practical:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

This matches the structure most people expect, something like user@example.com. It will reject foo@bar (no TLD) and an address such as user@example.c (TLD too short). It will also reject forms that are valid per RFC 5322 but rare in practice (quoted local parts, IP-literal domains), which is why the pattern is technically wrong but good enough for most sign-up forms.

The best-known regex that attempts full RFC compliance (written for RFC 822, the predecessor of RFC 5322) is over 6,000 characters long. Don't use it. Instead, accept a permissive pattern at validation time, then verify the address by sending an email. That's the only way to know it really works.
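
As a minimal sketch of the "permissive check first" step, the pattern above can be wrapped in a helper like this. The name looksLikeEmail and the decision to trim whitespace are assumptions for illustration; the confirmation email is still a separate step that no regex replaces.

```javascript
// The permissive pattern from this section.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: a first-pass filter only. A true result means
// "plausible enough to send a confirmation email to", nothing more.
function looksLikeEmail(input) {
  // Trimming is an assumption: form input often carries stray spaces.
  return EMAIL_PATTERN.test(input.trim());
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("foo@bar"));          // false (no TLD)
console.log(looksLikeEmail("user@example.c"));   // false (TLD too short)
```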

Pattern: URL detection

https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?

For finding URLs in text (e.g. auto-linking), this is plenty. For validating that a string is a well-formed URL, use the URL constructor in JavaScript: new URL(input) throws if the input isn't a valid URL.
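
The two jobs can be sketched side by side. The helper names findUrls and isWellFormedUrl are hypothetical; the only built-in relied on is the URL constructor mentioned above.

```javascript
// Detection: the pattern from this section, with the g flag so that
// every occurrence in the text is returned.
const URL_PATTERN = /https?:\/\/[\w.-]+(?:\/[\w\-./?%&=]*)?/g;

function findUrls(text) {
  // String.prototype.match returns null when nothing matches.
  return text.match(URL_PATTERN) ?? [];
}

// Validation: let the URL constructor do the parsing.
function isWellFormedUrl(input) {
  try {
    new URL(input); // throws a TypeError on invalid input
    return true;
  } catch {
    return false;
  }
}

const found = findUrls("See https://example.com/docs?page=2 and http://test.org for details");
console.log(found); // [ 'https://example.com/docs?page=2', 'http://test.org' ]
console.log(isWellFormedUrl("https://example.com/path")); // true
console.log(isWellFormedUrl("not a url"));                // false
```

One quirk worth knowing: the host part of the pattern allows dots, so a URL that ends a sentence will pick up the trailing period. Trim trailing punctuation afterwards if that matters for your auto-linker.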

Pattern: Phone numbers

Phone formats vary wildly by country. For US numbers in common formats:

^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

For international numbers, don't use regex. Use Google's libphonenumber library — it knows the rules for every country and is constantly updated as numbering plans change.
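
For the US-only case, here is a rough sketch of validating and then normalizing a number. It assumes the US pattern is meant to allow optional parentheses around the area code, written as \(? and \)?. The helper name normalizeUsPhone and the digits-only output format are illustrative choices.

```javascript
// US pattern with optional, escaped parentheses around the area code.
const US_PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Hypothetical helper: returns the ten digits, or null if the input
// does not look like a US number in one of the common formats.
function normalizeUsPhone(input) {
  if (!US_PHONE.test(input)) return null;
  return input.replace(/\D/g, ""); // strip everything that is not a digit
}

console.log(normalizeUsPhone("(555) 123-4567")); // 5551234567
console.log(normalizeUsPhone("555.123.4567"));   // 5551234567
console.log(normalizeUsPhone("555-1234"));       // null
```

Because each parenthesis is optional on its own, the pattern also accepts unbalanced input such as "(555 123-4567". That is usually harmless for a form field, but it is one more reason to store the normalized digits rather than the raw string.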

Pattern: IPv4 addresses

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

This is the strict version that rejects values like 999.999.999.999. The simpler (?:\d{1,3}\.){3}\d{1,3} matches the structure but accepts invalid octets.
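
A quick comparison of the two patterns, as a sketch. The simpler pattern is anchored with ^ and $ here so both are tested against the whole string; that anchoring is an addition for the sake of a fair comparison.

```javascript
// Strict: each octet must be in the range 0-255.
const STRICT_IPV4 = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
// Loose: four groups of one to three digits (anchors added here).
const LOOSE_IPV4 = /^(?:\d{1,3}\.){3}\d{1,3}$/;

for (const candidate of ["192.168.0.1", "256.1.1.1", "999.999.999.999", "1.2.3"]) {
  console.log(candidate, STRICT_IPV4.test(candidate), LOOSE_IPV4.test(candidate));
}
// 192.168.0.1 true true
// 256.1.1.1 false true
// 999.999.999.999 false true
// 1.2.3 false false
```

Note that the strict pattern still accepts leading zeros, so "01.02.03.004" passes. If your consumer treats leading zeros as octal, or rejects them, tighten the octet alternatives accordingly.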

Greedy vs lazy quantifiers

By default, * and + are greedy — they match as much as possible. Adding ? after them (*?, +?) makes them lazy — match as little as possible.

Classic example: extracting the content of an HTML tag.

Input: <b>hello</b> world <b>again</b>
Greedy <b>.*</b>: matches everything from the first <b> to the last </b>
Lazy <b>.*?</b>: matches each <b>…</b> pair separately
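
The difference is easy to check in a console. This sketch assumes the input wraps "hello" and "again" in a tag; the choice of a b tag is purely illustrative.

```javascript
const html = "<b>hello</b> world <b>again</b>";

// Greedy: .* runs to the end of the string, then backs up to the
// last </b> it can find, so the whole string is one match.
const greedy = html.match(/<b>.*<\/b>/g);
console.log(greedy); // [ '<b>hello</b> world <b>again</b>' ]

// Lazy: .*? stops at the first </b> that lets the match succeed,
// so each tag pair becomes its own match.
const lazy = html.match(/<b>.*?<\/b>/g);
console.log(lazy); // [ '<b>hello</b>', '<b>again</b>' ]
```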

Lookarounds

Lookaheads and lookbehinds let you assert what comes before or after a match without including it in the match.

foo(?=bar) — “foo” only if followed by “bar”
foo(?!bar) — “foo” only if NOT followed by “bar”
(?<=foo)bar — “bar” only if preceded by “foo”
(?<!foo)bar — “bar” only if NOT preceded by “foo”
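
Here is a small sketch that runs all four forms and reports where each one matches. The helper indexesOf and the sample strings are made up for this example.

```javascript
// Hypothetical helper: start index of every match of a global pattern.
function indexesOf(pattern, text) {
  return [...text.matchAll(pattern)].map((m) => m.index);
}

const ahead = "foobar foobaz";   // "foo" at index 0 and index 7
console.log(indexesOf(/foo(?=bar)/g, ahead)); // [ 0 ]
console.log(indexesOf(/foo(?!bar)/g, ahead)); // [ 7 ]

const behind = "foobar bazbar";  // "bar" at index 3 and index 10
console.log(indexesOf(/(?<=foo)bar/g, behind)); // [ 3 ]
console.log(indexesOf(/(?<!foo)bar/g, behind)); // [ 10 ]

// The asserted text is not part of the match: only the digits are
// returned here, not the dollar sign that must precede them.
const amount = "Total: $42".match(/(?<=\$)\d+/)[0];
console.log(amount); // 42
```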

When NOT to use regex

HTML/XML: use a real parser (DOMParser, BeautifulSoup, jsdom).
Strict spec compliance: for things like RFC-strict email validation, regex is the wrong tool.
Recursive structures: JavaScript regex has no recursion, so it cannot match parentheses nested to arbitrary depth. Use a parser.
Performance-critical paths: a complex regex can backtrack catastrophically. Test with adversarial input.
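
For the nested-parentheses case, "use a parser" can be as small as a depth counter. This is a minimal sketch, assuming only round parentheses matter and everything else can be ignored; the function name parensBalanced is hypothetical.

```javascript
// Tracks nesting depth in a single pass over the input.
function parensBalanced(input) {
  let depth = 0;
  for (const ch of input) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      // A closing paren with nothing open can never be fixed later.
      if (depth < 0) return false;
    }
  }
  // Balanced only if every opened paren was closed.
  return depth === 0;
}

console.log(parensBalanced("(a (b) c)")); // true
console.log(parensBalanced("((a)"));      // false
console.log(parensBalanced(")("));        // false
```

The loop is linear in the input length and has no backtracking, so it also sidesteps the performance concern in the last bullet.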