Regular Expressions in C#

Special characters: ^ $ \ . * + ? ( ) [ ] { } |

The characters ^ and $ are called anchors:

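  • ^ matches the beginning of the string (with RegexOptions.Multiline, also the beginning of each line)
  • $ matches the end of the string, i.e. the position after the last character or just before a final newline (with RegexOptions.Multiline, also the end of each line)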
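
The anchor behaviour can be checked with a short sketch. It assumes .NET 6 or later so that top-level statements compile; the sample text and variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first line\nsecond line";

// Default: ^ anchors to the start of the whole string.
bool startsWithFirst = Regex.IsMatch(text, @"^first");    // true
bool startsWithSecond = Regex.IsMatch(text, @"^second");  // false: "second" is not at the start of the string

// RegexOptions.Multiline: ^ and $ also match at the start and end of each line.
bool lineStartsWithSecond = Regex.IsMatch(text, @"^second", RegexOptions.Multiline); // true
int linesEndingWithLine = Regex.Matches(text, @"line$", RegexOptions.Multiline).Count; // 2

Console.WriteLine($"{startsWithFirst} {startsWithSecond} {lineStartsWithSecond} {linesEndingWithLine}");
```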

Parts of a regex can be repeated:

  • * matches the preceding part zero or more times; * is equivalent to {0,}
  • + matches the preceding part one or more times; + is equivalent to {1,}
  • ? matches the preceding part zero or one time; ? is equivalent to {0,1}

{…} represents a bounded repeat:

  • a{n} matches 'a' repeated exactly n times
  • a{n,} matches 'a' repeated n or more times
  • a{n,m} matches 'a' repeated between n and m times inclusive
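
A small sketch of the repeat operators listed above, assuming .NET 6 or later (top-level statements). Every pattern is anchored with ^ and $ so that the whole input must match; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

bool star    = Regex.IsMatch("b",     @"^a*b$");    // true: zero 'a's are allowed
bool plus    = Regex.IsMatch("b",     @"^a+b$");    // false: at least one 'a' is required
bool opt     = Regex.IsMatch("aab",   @"^a?b$");    // false: at most one 'a' is allowed
bool exact   = Regex.IsMatch("aaa",   @"^a{3}$");   // true: exactly three 'a's
bool atLeast = Regex.IsMatch("aaaaa", @"^a{2,}$");  // true: two or more 'a's
bool range   = Regex.IsMatch("aaaaa", @"^a{2,4}$"); // false: five 'a's exceed the upper bound

Console.WriteLine($"{star} {plus} {opt} {exact} {atLeast} {range}");
```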

Other simple patterns:

  • . - any single character except the newline character
  • \s - any whitespace character
  • \S - any character that isn't a whitespace
  • \b - a word boundary
  • \B - any position that isn't a word boundary
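
The simple patterns above could be exercised roughly as follows. This is a sketch assuming .NET 6 or later (top-level statements); the sample strings and variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "cat concat cats";

int wholeWord = Regex.Matches(text, @"\bcat\b").Count; // 1: only the standalone "cat"
int wordStart = Regex.Matches(text, @"\bcat").Count;   // 2: "cat" and the start of "cats"
int inside    = Regex.Matches(text, @"\Bcat").Count;   // 1: the "cat" inside "concat"

// . stops at a newline, so only the first line is matched.
string firstLine = Regex.Match("ab\ncd", @".+").Value;        // "ab"

// \S+ skips the leading whitespace and takes the first run of non-whitespace.
string firstToken = Regex.Match("  hello world", @"\S+").Value; // "hello"

// \s+ treats any run of spaces or tabs as one separator.
string[] tokens = Regex.Split("a  b\tc", @"\s+");             // "a", "b", "c"

Console.WriteLine($"{wholeWord} {wordStart} {inside} {firstLine} {firstToken} {string.Join(",", tokens)}");
```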

The repeats above are greedy: they match as many repetitions as possible. To make a repeat non-greedy (lazy), so that it matches as few repetitions as possible, add ? after it: *?, +?, ??, {…}?.
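
The difference between a greedy and a lazy repeat shows up as soon as the input contains more than one possible end point. A minimal sketch, assuming .NET 6 or later (top-level statements) and an illustrative input string:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>bold</b>";

// Greedy: .+ runs to the last '>' in the input.
string greedy = Regex.Match(html, @"<.+>").Value;  // "<b>bold</b>"

// Lazy: .+? stops at the first '>' that completes the match.
string lazy = Regex.Match(html, @"<.+?>").Value;   // "<b>"

Console.WriteLine($"{greedy} | {lazy}");
```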

RegEx             Matching strings
a*b               b, ab, aab, aaab, etc.
a+b               ab, aab, aaab, etc.
a?b               b, ab
do(es)?           do, does
o{2}              oo
o{2,}             oo, ooo, oooo, etc.
o{1,3}            o, oo, ooo
[adg]             'a' or 'd' or 'g'
[a-z]             any character from 'a' to 'z'
B[iu]rma          Birma or Burma
.*                any number of characters other than newline
\w*               any number of word characters (letters, digits, underscore)
[^1-6]            any character except the digits from 1 to 6
"[^"\r\n]*"       any string enclosed in double quotes on a single line
\b(in|out)\b      the word 'in' or 'out'
\bxxx\b.*\byyy\b  the word 'xxx' followed later on the same line by the word 'yyy'
\ba\w*\b          words that start with the letter 'a'
\b\w{5,6}\b       five- or six-letter words
\b\d{4,5}\b       4- or 5-digit number
^\w*              the first word in a line or in the text
^test             the string 'test' at the start of a line or of the text
^51|^52           the string '51' or '52' at the start of a line or of the text
^a{3,4}$          the string 'aaa' or 'aaaa' if it is the entire line or text
^test$            the string 'test' if it is the entire line or text
test$             the string 'test' at the end of a line or of the text
[.?!]             the punctuation at the end of a sentence; '.' and '?' lose their special meanings inside [ ]
[\d]{1,7}         a number of 1 to 7 digits
(\d+|)            a number or empty
(\d+|\**|)        a number or asterisks or empty
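
A few rows of the table can be tried out directly. The sketch below assumes .NET 6 or later (top-level statements); the input strings and variable names are illustrative, and ^ and $ are added to some patterns so that the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

bool burma    = Regex.IsMatch("Burma", @"^B[iu]rma$"); // true: 'u' is in the class [iu]
bool barma    = Regex.IsMatch("Barma", @"^B[iu]rma$"); // false: 'a' is not in the class
bool notOneToSix = Regex.IsMatch("7", @"^[^1-6]$");    // true: 7 is outside the range 1-6

// Alternation inside a group, limited to whole words by \b.
string word = Regex.Match("step in, step out", @"\b(in|out)\b").Value; // "in"

// Five- or six-letter words: apple, banana, cherry (kiwi is too short).
int fiveOrSix = Regex.Matches("apple banana kiwi cherry", @"\b\w{5,6}\b").Count; // 3

Console.WriteLine($"{burma} {barma} {notOneToSix} {word} {fiveOrSix}");
```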

Simple examples:

using System.Text.RegularExpressions;
// Check if all characters are numeric.
Regex reg = new Regex(@"^\d+$");
bool b1 = reg.Match("3451").Success; // true
bool b2 = reg.Match("34a1").Success; // false
// Check if a string has letters a,A,b,B.
bool b3 = Regex.IsMatch("acd", "a|b", RegexOptions.IgnoreCase); // true
bool b4 = Regex.IsMatch("eBd", "a|b", RegexOptions.IgnoreCase); // true
bool b5 = Regex.IsMatch("efg", "a|b", RegexOptions.IgnoreCase); // false
// Modify a string.
string s1 = Regex.Replace("aib", "a|b", "X"); // XiX

Example: Extract email addresses from a string:

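using System.Collections.Generic;

public List<string> GetEmails(string str)
{
    const string Pattern = @"[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}";
    List<string> emails = new List<string>();
    Regex reg = new Regex(Pattern, RegexOptions.IgnoreCase);
    MatchCollection matches = reg.Matches(str);
    foreach (Match m in matches)
        emails.Add(m.Value); // collect each matched address
    return emails;
}

// Hypothetical sample input with placeholder addresses.
string emailsStr = "john@example.com, mary@example.org";
List<string> emails = GetEmails(emailsStr); // emails = { "john@example.com", "mary@example.org" }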

Example: Use named groups to match patterns:

string str = @"Leon's GDI+ GDI++ +GDU Mike'ss Kale's Chloe''s";
Regex re = new Regex(@"((?<GroupPlusSign>\w+\+)|(?<GroupApostrophe>\w+'s))(?=(\s|$))", RegexOptions.None);
MatchCollection matches = re.Matches(str); // 3 matches: Leon's GDI+ Kale's
int i = 1;
foreach (Match m in matches)
{
    Console.WriteLine($"Match #{i}: " +
        $"GroupPlusSign={m.Groups["GroupPlusSign"].Value}, " +
        $"GroupApostrophe={m.Groups["GroupApostrophe"].Value}");
    i++;
}

Output:

Match #1: GroupPlusSign=, GroupApostrophe=Leon's
Match #2: GroupPlusSign=GDI+, GroupApostrophe=
Match #3: GroupPlusSign=, GroupApostrophe=Kale's

Common Patterns

Name                                     Pattern  Examples / Comments
Email-1                                  \w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3}
Email-2                                  [\w\.-]+(\+[\w-]*)?@([\w-]+\.)+[\w-]+
Phone-1                                  ([+]|)(([0-9]+)([-\s]|))*  1-222-345 345-564321, 33211, 34-23-67
Phone-2                                  ([+]|)([0-9]|)(\s|\-|)[\d]{3}(\s|\-|)[\d]{3}(\s|\-|)[\d]{4}  +1-111-222-3333, 453 678 9900
Phone-3                                  ^([\d]{3}|[\d]{3}-[\d]{1,3}|[\d]{3}-[\d]{1,3}-[\d]{1,4})$
Phone-4                                  [\d]{3}[-\s]?[\d]{3}[-\s]?[\d]{4}  905-234-3422
Phone-5                                  (1|\+1)?[- .]?(\([0-9]\d{2}\)|[0-9]\d{2})[- .]?\d{3}[- .]?\d{4}
Canadian postal code                     [a-zA-Z][0-9][a-zA-Z](\s|\-|)[0-9][a-zA-Z][0-9]
Money                                    ^\$?\d{1,3}((,?\d{3})*(\.\d{2})?|(\.?\d{3})*(,\d{2})?)$
Domain name                              [a-z0-9][-a-z0-9]*(\.[-a-z0-9]+)*\.[a-z]{2,6}
IP address-1                             \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
IP address-2                             ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)  Note: a regex cannot compare numbers arithmetically, so the 0-255 range is spelled out digit by digit
IP address-3                             (\d{1,3}\.){3}\d{1,3}  allows numbers greater than 255
Number [0-255]                           ^(?:(?:25[0-5]|2[0-4]\d|[01]\d\d|\d?\d))  non-capturing groups; append $ to validate the whole string
Number [0-255]                           ^(25[0-5]|2[0-4]\d|[01]\d\d|\d?\d)  no optimization (capturing group); append $ to validate the whole string
An identifier in a programming language  [A-Za-z_][A-Za-z0-9_]*
C-style hexadecimal number               0[xX][A-Fa-f0-9]+
A sequence of digits                     ^\d+$  mandatory, i.e. at least one digit is required
A sequence of digits                     ^\d*$  optional, i.e. an empty string is allowed
Padding spaces                           ^\s+|\s+$  leading or trailing whitespace
An HTML tag                              <[A-Za-z][A-Za-z0-9]*>  an opening tag without attributes
A generic tag                            <[^>]+>  greedy '+' and a negated character class
A generic tag                            <.+?>  slower: '+' is lazy instead of greedy
A number between 1000 and 9999           \b[1-9][0-9]{3}\b
A number between 100 and 99999           \b[1-9][0-9]{2,4}\b
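
Most patterns in the table are unanchored, so they find a match anywhere in the input. To validate a whole string, one option is to wrap the pattern in ^(?: … )$. The sketch below does this for the "IP address-2" and "Canadian postal code" rows. The wrapping, the constant names, and the sample inputs are assumptions made for illustration and are not part of the table; .NET 6 or later is assumed for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Table patterns wrapped in ^(?: ... )$ so that the entire input must match.
const string IpPattern =
    @"^(?:((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?))$";
const string PostalPattern =
    @"^(?:[a-zA-Z][0-9][a-zA-Z](\s|\-|)[0-9][a-zA-Z][0-9])$";

bool ipOk       = Regex.IsMatch("192.168.0.1", IpPattern);  // true
bool ipTooLarge = Regex.IsMatch("256.1.1.1", IpPattern);    // false: 256 is out of range
bool postalOk   = Regex.IsMatch("K1A 0B1", PostalPattern);  // true
bool postalShort = Regex.IsMatch("K1A0B", PostalPattern);   // false: the last digit is missing

Console.WriteLine($"{ipOk} {ipTooLarge} {postalOk} {postalShort}");
```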



URL (port and IP allowed)

General Info

Regular expressions can be used for string-related operations such as:

  • Validation: Check if an input string is well-formed.
  • Parsing: Extract information from an input string.
  • Transformation: Find substrings that match a pattern and replace them with a new substring.
  • Iteration: Find all occurrences of a pattern.
  • Tokenization: Split a string into substrings.
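
Each operation in the list maps onto a method of the Regex class. The following is a rough sketch, assuming .NET 6 or later (top-level statements); the patterns, inputs, and variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Validation: is the input a well-formed 4-digit number?
bool valid = Regex.IsMatch("2018", @"^\d{4}$"); // true

// Parsing: extract the parts of a date with named groups.
Match date = Regex.Match("Last modified: 2018/12/03",
    @"(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})");
string year = date.Groups["year"].Value;   // "2018"
string month = date.Groups["month"].Value; // "12"

// Transformation: collapse each run of whitespace into a single space.
string collapsed = Regex.Replace("a   b \t c", @"\s+", " "); // "a b c"

// Iteration: visit every occurrence of a pattern.
int count = 0;
foreach (Match m in Regex.Matches("x1 y22 z333", @"\d+"))
    count++; // three matches: 1, 22, 333

// Tokenization: split on commas with optional surrounding whitespace.
string[] parts = Regex.Split("red, green ,blue", @"\s*,\s*"); // "red", "green", "blue"

Console.WriteLine($"{valid} {year}-{month} '{collapsed}' {count} {string.Join("|", parts)}");
```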
notes/csharp/regex.txt · Last modified: 2018/12/03 by leszek