Regular expressions

Regular expressions (regex for short) are a powerful tool for searching and manipulating text. They are implemented in many programming languages and are among the major strengths of scripting languages such as Perl and Python, as well as of Unix-based operating systems. They support plain searches as well as search-and-replace tasks.

Most people have already used something very close to a regular expression without knowing it, through the search facilities of their operating system. A simple example is the asterisk *, used as a wildcard when searching for specific file types. Strictly speaking, this file-name wildcard is a simplified relative of regular expression syntax, but it illustrates the central idea of describing text by a pattern. Such a search relies on the fact that the file ending (also called the extension) marks the file type: example.xml is a file of type xml named example. A search for all xml-files on your machine could look like this:

*.xml

A search for all xml-files whose name starts with the letter e could look like this:

e*.xml

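The asterisk is used in many operating systems and other software programs as a simple shortcut for 'any sequence of characters'. In regular expression syntax proper, the asterisk has a slightly different meaning, 'zero or more repetitions of the preceding element', so 'any sequence of characters' is written as .* (a dot, which stands for any single character, followed by the asterisk).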
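
As a rough illustration, the two file searches above might be reproduced in Python as follows. This is only a sketch: the list of file names is invented for the example, and the choice of Python's standard fnmatch and re modules is an assumption, not something this page prescribes. It shows each search written once as a wildcard pattern and once as a regular expression, plus a small search-and-replace.

```python
import fnmatch
import re

# Hypothetical file names, invented for this illustration.
filenames = ["example.xml", "notes.txt", "essay.xml", "data.xml", "e.xml"]

# Wildcard style: * stands for any sequence of characters.
all_xml = [name for name in filenames if fnmatch.fnmatchcase(name, "*.xml")]
e_xml = [name for name in filenames if fnmatch.fnmatchcase(name, "e*.xml")]

# Regular expression style: . is any single character, * repeats the
# preceding element zero or more times, and \. is a literal dot.
all_xml_re = [name for name in filenames if re.fullmatch(r".*\.xml", name)]
e_xml_re = [name for name in filenames if re.fullmatch(r"e.*\.xml", name)]

# Search and replace: swap the .xml ending for .bak ($ anchors the end).
renamed = re.sub(r"\.xml$", ".bak", "example.xml")

print(all_xml)     # files ending in .xml
print(e_xml)       # .xml files whose name starts with e
print(all_xml_re)  # same selection, via a regular expression
print(renamed)
```

Both styles select the same files here. The regular expression versions are longer, but the same notation extends to patterns that a plain wildcard cannot express.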

Resources for using regular expressions:

Jeffrey Friedl's "Mastering Regular Expressions"

This page provides access to some of our own guidelines and cheat sheets for using regular expressions. However, if you are really interested in the nuts and bolts of this fascinating topic, there is an excellent O'Reilly book by Jeffrey Friedl entitled Mastering Regular Expressions that has taught me more about regexes than I ever thought there was to know.

Please be sure to also check out his personal website regex.info.