Special characters: ^ $ \ . * + ? ( ) [ ] { } |
The characters ^ and $ are called anchors:
Parts of a regex can be repeated:
{…} represents a bounded repeat:
Other simple patterns:
The repeats above are greedy: they match as many characters as possible while still allowing the overall pattern to match. To make a repeat non-greedy (lazy), append ? to it, i.e. *?, +?, ??, {…}?.
RegEx | Matching strings |
---|---|
a*b | a, ab, aab, aaab, etc. |
a+b | ab, aab, aaab, etc. |
a?b | b, ab |
do(es)? | do, does |
o{2} | oo |
o{2,} | oo, ooo, oooo, etc. |
o{1,3} | o, oo, ooo |
[adg] | 'a' or 'd' or 'g' |
[a-z] | any character from 'a' to 'z' |
B[iu]rma | Birma or Burma |
.* | any number of characters other than newline |
\w* | any number of word characters (letters, digits and the underscore) |
[^1-6] | any character except the digits from 1 to 6 |
"[^"\r\n]*" | any string enclosed in double quotes on a single line |
\b(in|out)\b | a word 'in' or 'out' |
\bxxx\b.*\byyy\b | the word 'xxx' followed, later on the same line, by the word 'yyy' |
\ba\w*\b | words that start with the letter 'a' |
\b\w{5,6}\b | five- or six-letter words |
\b\d{4,5}\b | 4- or 5-digit number |
^\w* | the first word in a line or in the text |
^test | the string 'test' if it is the first string in a line or in the text |
^51|^52 | the strings '51' or '52' if they are the first strings in a line or in the text |
^a{3,4}$ | the strings 'aaa' or 'aaaa' if they are the only strings in a line or in the text |
^test$ | the string 'test' if it's the only string in a line or in the text |
test$ | the string 'test' if a line or the text ends with it |
[.?!] | the punctuation at the end of a sentence; '.' and '?' lose their special meanings inside a character class |
[\d]{1,7} | a number with 1 to 7 digits |
(\d+|) | a number or empty |
(\d+|\**|) | a number or asterisks or empty |
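A few rows of the table above can be checked directly with Regex.IsMatch, Regex.Match and Regex.Matches. This is only a minimal sketch: the input strings are made up for illustration, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Bounded repeat with anchors: only 'aaa' or 'aaaa' as the whole input.
bool threeAs = Regex.IsMatch("aaa", @"^a{3,4}$");
bool fiveAs = Regex.IsMatch("aaaaa", @"^a{3,4}$");

// Optional group: matches 'do' or 'does' (the longer form wins when present).
string doMatch = Regex.Match("she does", @"do(es)?").Value;

// Word boundaries: only five- or six-letter words are matched.
MatchCollection words = Regex.Matches("apple kiwi banana cherries", @"\b\w{5,6}\b");

// Inside a character class, '.' and '?' are literal characters.
int marks = Regex.Matches("Really? Yes. Go!", @"[.?!]").Count;

Console.WriteLine($"{threeAs} {fiveAs} {doMatch} {words.Count} {marks}");
// True False does 2 3
```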
Simple examples:
    using System.Text.RegularExpressions;
    ...
    // Check if all characters are numeric.
    Regex reg = new Regex(@"^\d+$");
    bool b1 = reg.Match("3451").Success; // true
    bool b2 = reg.Match("34a1").Success; // false

    // Check if a string has letters a,A,b,B.
    bool b3 = Regex.IsMatch("acd", "a|b", RegexOptions.IgnoreCase); // true
    bool b4 = Regex.IsMatch("eBd", "a|b", RegexOptions.IgnoreCase); // true
    bool b5 = Regex.IsMatch("efg", "a|b", RegexOptions.IgnoreCase); // false

    // Modify a string.
    string s1 = Regex.Replace("aib", "a|b", "X"); // XiX
Example: Extract email addresses from a string:
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    ...
    public List<string> GetEmails(string str)
    {
        const string Pattern = @"[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}";
        List<string> emails = new List<string>();

        // IgnoreCase lets the upper-case classes match lower-case letters too.
        Regex reg = new Regex(Pattern, RegexOptions.IgnoreCase);
        MatchCollection matches = reg.Matches(str);
        foreach (Match m in matches)
            emails.Add(m.Value);

        return emails;
    }
    ...
    string emailsStr = "leon@micro.com john@ Mel@kata.ca Phil@@lego.com";
    List<string> emails = GetEmails(emailsStr);
    // emails = { "leon@micro.com", "Mel@kata.ca" }
Example: Use named groups to match patterns:
    string str = @"Leon's GDI+ GDI++ +GDU Mike'ss Kale's Chloe''s";
    Regex re = new Regex(@"((?<GroupPlusSign>\w+\+)|(?<GroupApostrophe>\w+'s))(?=(\s|$))", RegexOptions.None);
    MatchCollection matches = re.Matches(str); // 3 matches: Leon's GDI+ Kale's
    int i = 1;
    foreach (Match m in matches)
    {
        Console.WriteLine($"Match #{i}: " +
            $"GroupPlusSign={m.Groups["GroupPlusSign"].Value}, " +
            $"GroupApostrophe={m.Groups["GroupApostrophe"].Value}");
        ++i;
    }
Output:
    Match #1: GroupPlusSign=, GroupApostrophe=Leon's
    Match #2: GroupPlusSign=GDI+, GroupApostrophe=
    Match #3: GroupPlusSign=, GroupApostrophe=Kale's
Example: “hello out there, how are you”
Pattern | Description | Matching substring |
---|---|---|
h.*o | find an 'h', followed by any number of arbitrary characters (even if they are 'o'), followed by the last possible 'o' | “hello out there, how are yo” |
Make * ungreedy:
Pattern | Description | Matching substring |
---|---|---|
h.*?o | find an 'h', followed by as few arbitrary characters as possible, followed by the first occurrence of 'o' | “hello” |
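The difference between the two patterns can be reproduced with Regex.Match on the example sentence. A minimal sketch, written as top-level statements (C# 9 or later); the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "hello out there, how are you";

// Greedy: .* first runs to the end of the line, then backtracks to the last 'o'.
string greedy = Regex.Match(text, "h.*o").Value;

// Lazy: .*? grows one character at a time and stops at the first 'o'.
string lazy = Regex.Match(text, "h.*?o").Value;

Console.WriteLine(greedy); // hello out there, how are yo
Console.WriteLine(lazy);   // hello
```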
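Capturing groups store the text they match so that it can be reused later, either inside the pattern through a backreference (\1, \k<name>) or afterwards in code or in a replacement string ($1).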
Pattern | Description |
---|---|
(exp) | match exp and capture it in an automatically numbered group |
(?<name>exp) | match exp and capture it in a named group |
(?:exp) | match exp, but do not capture it |
\b(\w+)\b\s*\b\1\b | match repeated words; uses an automatically numbered group #1 (\w+) |
\b(?<Word>\w+)\b\s*\k<Word>\b | match repeated words; uses a named group 'Word' |
(\w+)\s*=\s*(.*?)\s*$ | name=value pairs; name is in $1, value is in $2 (note: we make * ungreedy by using *?) |
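The backreference patterns in the table can be tried out as follows. This is a sketch under assumptions: the sample strings are invented, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Numbered group: \1 must repeat exactly what group 1 captured.
Match dup = Regex.Match("this is is a test", @"\b(\w+)\b\s*\b\1\b");

// Named group: \k<Word> refers to the group by name instead of by number.
Match named = Regex.Match("it was was late", @"\b(?<Word>\w+)\b\s*\k<Word>\b");

// In a replacement string, $1 inserts the text captured by group 1.
string collapsed = Regex.Replace("this is is a test", @"\b(\w+)\b\s*\b\1\b", "$1");

// name=value: the lazy (.*?) leaves the trailing spaces to the final \s*.
Match pair = Regex.Match("timeout = 30   ", @"(\w+)\s*=\s*(.*?)\s*$");

Console.WriteLine(dup.Value);                  // is is
Console.WriteLine(named.Groups["Word"].Value); // was
Console.WriteLine(collapsed);                  // this is a test
Console.WriteLine($"{pair.Groups[1].Value}={pair.Groups[2].Value}"); // timeout=30
```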
Example: “Today is Monday the 18th.”
Pattern | Description |
---|---|
[0-9]+th | '18th' is matched |
(?:[0-9]+)th | '18th' is matched (avoiding capturing with the ?: operator) |
([0-9]+)th | '18th' is matched and '18' is captured in $1 |
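The three variants match the same text and differ only in what ends up in Match.Groups. A short sketch (top-level statements, C# 9 or later; the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Today is Monday the 18th.";

Match plain = Regex.Match(text, @"[0-9]+th");
Match nonCapturing = Regex.Match(text, @"(?:[0-9]+)th");
Match capturing = Regex.Match(text, @"([0-9]+)th");

// Groups[0] is always the whole match, so Count is 1 when nothing else is captured.
Console.WriteLine($"{plain.Value} {plain.Groups.Count}");               // 18th 1
Console.WriteLine($"{nonCapturing.Value} {nonCapturing.Groups.Count}"); // 18th 1
Console.WriteLine($"{capturing.Value} {capturing.Groups.Count} {capturing.Groups[1].Value}"); // 18th 2 18
```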
Named groups - .NET syntax:
    string str = @"leon's GDI+ GDI++ +GDU Hello9's hi'ss ho''s";
    Regex re = new Regex(@"((?<GroupPlusSign>\w+\+)|(?<GroupApostropheS>\w+'s))(?=(\s|$))", RegexOptions.None);
    MatchCollection matches = re.Matches(str);
    foreach (Match m in matches)
    {
        // For each match, only one of the two named groups is non-empty.
        string word1 = m.Groups["GroupPlusSign"].Value;    // matches GDI+
        string word2 = m.Groups["GroupApostropheS"].Value; // matches leon's and Hello9's
    }
Name | Pattern | Examples / Comments |
---|---|---|
Email-1 | \w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3} | |
Email-2 | [\w\.-]+(\+[\w-]*)?@([\w-]+\.)+[\w-]+ | |
Phone-1 | ([+]|)(([0-9]+)([-|\s]|))* | 1-222-345 345-564321, 33211, 34-23-67 |
Phone-2 | ([+]|)([0-9]|)(\s|\-|)[\d]{3}(\s|\-|)[\d]{3}(\s|\-|)[\d]{4} | +1-111-222-3333, 453 678 9900 |
Phone-3 | /^([\d]{3}|[\d]{3}-[\d]{1,3}|[\d]{3}-[\d]{1,3}-[\d]{1,4})$/ | |
Phone-4 | [\d]{3}[-|\s]?[\d]{3}[-|\s]?[\d]{4} | 905-234-3422 |
Phone-5 | (1|\+1)?[- .]?(\([0-9]\d{2}\)|[0-9]\d{2})[- .]?\d{3}[- .]?\d{4} | |
Canadian postal code | [a-zA-Z][0-9][a-zA-Z](\s|\-|)[0-9][a-zA-Z][0-9] | |
Money | ^\$?\d{1,3}((,?\d{3})*(\.\d{2})?|(\.?\d{3})*(,\d{2})?)$ | |
Domain name | [a-z0-9][-a-z0-9]*(\.[-a-z0-9]+)*\.[a-z]{2,6} | |
IP address-1 | \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b | |
IP address-2 | ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?) | Note: enforcing N<256 arithmetically is not possible with RegExp |
IP address-3 | (\d{1,3}\.){3}\d{1,3} | allows numbers to be greater than 255 |
Number [0-255] | ^(?:(?:25[0-5]|2[0-4]\d|[01]\d\d|\d?\d)) | |
Number [0-255] | ^(25[0-5]|2[0-4]\d|[01]\d\d|\d?\d) | no optimization |
An identifier in a programming language | [A-Za-z_][A-Za-z0-9_]* | |
C-style hexadecimal number | 0[xX][A-Fa-f0-9]+ | |
A sequence of digits | ^\d+$ | mandatory |
A sequence of digits | ^\d*$ | optional i.e. an empty string is allowed |
Padding spaces | ^\s+|\s+$ | |
An HTML tag | <[A-Za-z][A-Za-z0-9]*> | |
A generic tag | <[^>]+> | greedy '+' and a negated character class |
A generic tag | <.+?> | slower - '+' is lazy instead of greedy |
A number between 1000 and 9999 | \b[1-9][0-9]{3}\b | |
A number between 100 and 99999 | \b[1-9][0-9]{2,4}\b |
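The IP address rows trade pattern complexity for strictness. The sketch below compares IP address-1 with IP address-3 and shows one possible alternative: keep the loose pattern and check the range arithmetically in code. The helper name IsIPv4 and the added ^…$ anchors are assumptions made for this example, not part of the table; the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// IP address-1 from the table: the pattern itself limits each octet to 0-255.
const string Strict = @"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b";

// IP address-3 from the table, anchored here (an addition) so the whole input must match.
const string Loose = @"^(\d{1,3}\.){3}\d{1,3}$";

// Hypothetical helper: loose shape check by regex, range check in code.
bool IsIPv4(string s)
{
    if (!Regex.IsMatch(s, Loose)) return false;
    foreach (string part in s.Split('.'))
    {
        if (!int.TryParse(part, out int n) || n > 255) return false;
    }
    return true;
}

Console.WriteLine(Regex.IsMatch("192.168.1.254", Strict)); // True
Console.WriteLine(Regex.IsMatch("256.1.1.1", Strict));     // False
Console.WriteLine(Regex.IsMatch("256.1.1.1", Loose));      // True
Console.WriteLine(IsIPv4("256.1.1.1"));                    // False
Console.WriteLine(IsIPv4("192.168.1.254"));                // True
```

The two-step check keeps the pattern short and readable, at the cost of splitting the validation between the regex and the surrounding code.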
URL (IP):
^(http\://|https\://|ftp\://|)((([a-z_0-9\-]+)+(([\:]?)+([a-z_0-9\-]+))?)(\@+)?)?(((((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5])))\.(((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5])))\.(((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5])))\.(((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5]))))|((([a-z0-9\-])+\.)+([a-z]{2}\.[a-z]{2}|[a-z]{2,4})))(([\:])(([1-9]{1}[0-9]{1,3})|([1-5]{1}[0-9]{2,4})|(6[0-5]{2}[0-3][0-6])))?$
URL (port and IP allowed)
^(((ht|f)tp(s?))\://)?((([a-zA-Z0-9_\-]{2,}\.)+[a-zA-Z]{2,})|((?:(?:25[0-5]|2[0-4]\d|[01]\d\d|\d?\d)(?(\.?\d)\.)){4}))(:[a-zA-Z0-9]+)?(/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?$
Date:
^((((((0?[13578])|(1[02]))\-((0?[1-9])|([1-2][0-9])|(3[01])))|(((0?[469])|(11))\-((0?[1-9])|([1-2][0-9])|(30)))|(0?2\-((0?[1-9])|([1-2][0-9]))))\-\d{2}(([02468][048])|([13579][26])))|(((((0?[13578])|(1[02]))\-((0?[1-9])|([1-2][0-9])|(3[01])))|(((0?[469])|(11))\-((0?[1-9])|([1-2][0-9])|(30)))|(0?2\-((0?[1-9])|(1[0-9])|(2[0-8]))))\-\d{2}(([02468][1235679])|([13579][01345789]))))$
Time:
^(\s(((0?[1-9])|(1[0-9])|(2[0-3])|(0)|(00))\:([0-5][0-9])(|(\s)|(\:([0-5][0-9])))))?$
Regular expressions can be used for string-related operations such as: