jim.shamlin.com

Regular Expressions

Here's a quick reference to regular expressions ... by no means comprehensive, just enough to jog my memory (or supplement it).


Basics

$string =~ m/abc/

Searches for the pattern "abc" within the specified string and returns true if it is found, without altering the original string. The m operator is optional when the delimiters are slashes: matching is assumed unless another operator is specified.
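
A minimal sketch of the match operator, with and without the m. The sample string and variable names here are made up for illustration:

```perl
use strict;
use warnings;

my $string = "xxabcxx";   # hypothetical sample value

# The match returns true when the pattern occurs anywhere in the string.
my $with_m    = ($string =~ m/abc/) ? 1 : 0;   # 1
my $without_m = ($string =~ /abc/)  ? 1 : 0;   # 1, same test with 'm' omitted
my $no_match  = ($string =~ /abd/)  ? 1 : 0;   # 0, "abd" does not occur

print "with m: $with_m, without m: $without_m, no match: $no_match\n";
print "string is still: $string\n";
```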

$string =~ s/ab/cd/

Substitutes 'cd' for the first 'ab' found in the specified string (add the g option to replace every occurrence).

$string =~ tr/ab/cd/

Replaces every 'a' with 'c' and every 'b' with 'd' in the specified string. The two lists are plain characters matched up by position, with no brackets or commas.
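
A small sketch of substitution next to transliteration, with made-up sample strings. Note that tr takes two plain lists of characters paired by position; brackets or commas inside them would be treated as ordinary characters to translate:

```perl
use strict;
use warnings;

my $first_only = "ab ab ab";
$first_only =~ s/ab/cd/;      # replaces only the first 'ab'  -> "cd ab ab"

my $every = "ab ab ab";
$every =~ s/ab/cd/g;          # g replaces every 'ab'          -> "cd cd cd"

my $translit = "abba cab";
my $count = ($translit =~ tr/ab/cd/);   # a->c, b->d, one character at a time
                                        # $translit is now "cddc ccd", $count is 6

print "$first_only | $every | $translit ($count characters changed)\n";
```

The value returned by tr is the number of characters it translated, which is occasionally handy for counting.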


Options

The options are specified after the last '/' in the expression:

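i    Ignore case ('a' will match 'a' or 'A')
s    Treat the string as a single line: the wildcard (.) also matches line breaks
m    Treat the string as multiple lines: ^ and $ match at the start and end of each line
x    Ignore any white space within the pattern (lets a long pattern be spaced out for readability)
g    Match all occurrences (matching and substitution only; not available for transliteration)
o    Compile the pattern only once, even if it contains variables (matching only the first occurrence is simply the default when g is absent)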
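
Here is a rough sketch of what the more common options do in practice. The sample text is invented, and ^ (start of line or string) is used only to make the m option visible:

```perl
use strict;
use warnings;

my $text = "Cat\ncat\nCAT";   # hypothetical three-line sample

# i ignores case; g in list context returns every match.
my @cats = ($text =~ /cat/ig);            # ("Cat", "cat", "CAT")

# m lets ^ match at the start of each line, not just the whole string.
my @with_m    = ($text =~ /^c/mg);        # ("c"), from the second line
my @without_m = ($text =~ /^c/g);         # (), the string starts with "C"

# s lets the wildcard match a line break.
my $dot_plain = ("a\nb" =~ /a.b/)  ? 1 : 0;   # 0
my $dot_s     = ("a\nb" =~ /a.b/s) ? 1 : 0;   # 1

# x ignores white space inside the pattern.
my $spaced_x     = ("abc" =~ / a b c /x) ? 1 : 0;   # 1
my $spaced_plain = ("abc" =~ / a b c /)  ? 1 : 0;   # 0, the spaces are literal

print scalar(@cats), " cats, ", scalar(@with_m), " line(s) starting with c\n";
```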

Wildcard

A period (.) is a wildcard that matches any single character except a line break (unless the s option is used). To search for an actual period, escape it with a backslash (\.).
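
A quick sketch of the difference between the wildcard and an escaped period, using made-up strings:

```perl
use strict;
use warnings;

my $wildcard = ("a-c" =~ /a.c/)  ? 1 : 0;   # 1, the period matches the hyphen
my $escaped  = ("a-c" =~ /a\.c/) ? 1 : 0;   # 0, \. only matches a real period
my $literal  = ("a.c" =~ /a\.c/) ? 1 : 0;   # 1

print "wildcard: $wildcard, escaped: $escaped, literal: $literal\n";
```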


Alternation

A pipe (|) can be used to specify alternate patterns:

$string =~ m/Tom|Dick|Harry/;

This will return true if the expression finds any of those three names in the string.
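
A short sketch of alternation against a few invented strings. One thing worth remembering is that the names are found anywhere in the string, so "Tomato" counts as a hit for "Tom":

```perl
use strict;
use warnings;

# Hypothetical sample inputs
my @inputs = ("Call Harry", "Dick is here", "Sally", "Tomato soup");

# 1 for each string containing any of the three names, 0 otherwise.
my @found = map { /Tom|Dick|Harry/ ? 1 : 0 } @inputs;

for my $i (0 .. $#inputs) {
    print "$inputs[$i]: $found[$i]\n";
}
```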


Multipliers

a+      One or more instances of 'a'
a*      Zero or more instances of 'a'
a?      Zero or one instance of 'a'
a{3}    Exactly three instances of 'a'
a{3,}   Three or more instances of 'a'
a{3,5}  Between three and five instances of 'a'
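
To see the multipliers at work, this sketch wraps the 'a' between a 'c' and a 't' so that the count actually matters (on its own, a{3} would happily match inside a longer run of a's). The test words are made up:

```perl
use strict;
use warnings;

my $plus_none   = ("ct"       =~ /ca+t/)     ? 1 : 0;   # 0, needs at least one 'a'
my $star_none   = ("ct"       =~ /ca*t/)     ? 1 : 0;   # 1, zero is allowed
my $opt_one     = ("cat"      =~ /ca?t/)     ? 1 : 0;   # 1
my $opt_two     = ("caat"     =~ /ca?t/)     ? 1 : 0;   # 0, two is too many
my $exact_three = ("caaat"    =~ /ca{3}t/)   ? 1 : 0;   # 1
my $exact_two   = ("caat"     =~ /ca{3}t/)   ? 1 : 0;   # 0
my $min_three   = ("caaaaaat" =~ /ca{3,}t/)  ? 1 : 0;   # 1, six is "three or more"
my $range_six   = ("caaaaaat" =~ /ca{3,5}t/) ? 1 : 0;   # 0, six is outside 3 to 5

print "$plus_none $star_none $opt_one $opt_two ",
      "$exact_three $exact_two $min_three $range_six\n";
```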

Sets of Characters

[abc]   Either 'a' or 'b' or 'c'
[^abc]  Any character that is not 'a', 'b', or 'c'
[a-z]   Any lowercase letter
[A-Z]   Any uppercase letter
[A-F]   Any uppercase letter from A to F
[0-9]   Any digit
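
A brief sketch of character sets, including a combined set for uppercase hexadecimal. The ^ and $ anchors (start and end of string) and the sample values are additions for the sake of the example:

```perl
use strict;
use warnings;

my $in_set     = ("b"    =~ /[abc]/)        ? 1 : 0;   # 1
my $negated    = ("xyz"  =~ /[^abc]/)       ? 1 : 0;   # 1, 'x' is not a, b, or c
my $all_in_set = ("abc"  =~ /[^abc]/)       ? 1 : 0;   # 0, no other character present
my $hex_ok     = ("1A3F" =~ /^[0-9A-F]+$/)  ? 1 : 0;   # 1, ranges can share one set
my $hex_bad    = ("1G"   =~ /^[0-9A-F]+$/)  ? 1 : 0;   # 0, 'G' is outside A-F

print "$in_set $negated $all_in_set $hex_ok $hex_bad\n";
```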

Memory

Parentheses are used for memory: each parenthesized group stores the text it matched in the variables $1, $2, $3, etc., numbered in the order the opening parentheses appear. For example:

$string =~ m/([0-9])/;

After a successful match, the digit that was matched will be stored in a variable named $1.
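
As a sketch with more than one group, here is a made-up date string pulled apart into three pieces. It is worth checking that the match succeeded before reading $1 and friends, since a failed match leaves them holding whatever an earlier match stored:

```perl
use strict;
use warnings;

my $date = "2024-03-15";   # hypothetical sample value
my ($year, $month, $day);

# Only trust $1, $2, $3 inside the branch where the match succeeded.
if ($date =~ /([0-9]{4})-([0-9]{2})-([0-9]{2})/) {
    ($year, $month, $day) = ($1, $2, $3);
}

# In list context, a match returns the captured groups directly.
my ($y2, $m2, $d2) = $date =~ /([0-9]{4})-([0-9]{2})-([0-9]{2})/;

print "year $year, month $month, day $day\n";
```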


Metacharacters

There are numerous "metacharacters" that stand for other characters (or sets of characters) or for positions in the string:

\    "Escapes" the next character - such that \. searches for a period (not a wildcard) and \[ searches for an open bracket
\A   Indicates the beginning of the string
\Z   Indicates the end of the string (or the position just before a final newline)
\s   Any white-space character - space, tab, newline, return, or form feed
\t   A tab character
\n   A newline character
\r   A return character
\f   A form-feed character
\d   Any "digit" character - 0 through 9
\D   Any "non-digit" character - opposite of \d
\w   Any "word" character - letter, digit, or underscore
\W   Any "non-word" character - opposite of \w

There are many other metacharacters - for finding backspaces, "bell" characters, vertical tabs, and other such things. These are just the most useful for Web/CGI work.
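
A sketch that pokes at a few of these with invented values. It shows that \s accepts a tab, that \d and \w do not accept a hyphen, and that \Z tolerates one trailing newline (the stricter \z, not listed above, does not):

```perl
use strict;
use warnings;

my $s_tab     = ("a\tb"     =~ /a\sb/)     ? 1 : 0;   # 1, a tab counts as white space
my $d_digits  = ("42"       =~ /\A\d+\Z/)  ? 1 : 0;   # 1
my $d_hyphen  = ("4-2"      =~ /\A\d+\Z/)  ? 1 : 0;   # 0, a hyphen is not a digit
my $w_under   = ("foo_bar1" =~ /\A\w+\Z/)  ? 1 : 0;   # 1, underscore is a word character
my $w_hyphen  = ("foo-bar"  =~ /\A\w+\Z/)  ? 1 : 0;   # 0, a hyphen is not
my $Z_newline = ("42\n"     =~ /\A\d+\Z/)  ? 1 : 0;   # 1, \Z allows a final newline
my $z_newline = ("42\n"     =~ /\A\d+\z/)  ? 1 : 0;   # 0, \z is the very end only

print "$s_tab $d_digits $d_hyphen $w_under $w_hyphen $Z_newline $z_newline\n";
```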


And finally

This is online for my own use and reference, but feel free to snag it if you think it would be useful. It's a trifle and I don't expect to be credited or compensated in any way ... but nor does it come with any sort of guarantee.