Dispersion Design

Escaping Characters in Perl

2012-05-22

Introduction

I had been programming with Perl for many years before I actually took the time to understand what the rules are for escaping characters. The rules differ for 'single quoted strings', "double quoted strings", /regular expressions/ and [character classes]. This article will explain the escaping rules for each case.

Summary

Here is a summary of all the rules:

Usage                    Characters Needing Escaping

Single Quoted Strings    '    - apostrophe/single quote
                         \    - backslash

Double Quoted Strings    "    - double quote
                         $    - dollar
                         @    - at symbol
                         \    - backslash

Regular Expressions      ^    - caret
                         $    - dollar
                         .    - period/full stop
                         ?    - question mark
                         *    - asterisk
                         +    - plus
                         [    - left square bracket
                         (,)  - parentheses/round brackets
                         |    - pipe/vertical bar
                         \    - backslash
                         When they can be confused with a range:
                         {,}  - curly brackets
                         When entered directly into regex:
                         @    - at symbol
                         /    - forward slash

Character Classes        [,]  - square brackets
                         \    - backslash
                         Unless the first or last character in class:
                         -    - dash or hyphen
                         If the first character in class:
                         ^    - caret

Single Quoted Strings

Single Quotes (')

Within a single-quoted string, single quote characters must be escaped with a backslash (\').

my $str = 'You can\'t do this without a backslash.';

Result:

You can't do this without a backslash.

Backslashes (\)

Escaping a backslash with another backslash is allowed but usually optional. Two consecutive backslashes are always interpreted as a single backslash. A backslash must be escaped when it is the last character in the string, because otherwise it would escape the closing quote.

my $str = 'Escaping a \ is optional (\\), except at the end \\';

Result:

Escaping a \ is optional (\), except at the end \

No other characters need to be escaped within a single quoted string.
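
The rules above can be checked with a short script. This is a minimal sketch; the variable names are made up for illustration and are not part of the original article.

```perl
use strict;
use warnings;

# Only \' and \\ are treated specially inside single quotes.
my $newline = 'a\nb';              # backslash and n stay as two literal characters
my $dollar  = 'cost: $5 @ home';   # $ and @ are not interpolated
my $quote   = 'it\'s';             # \' becomes '
my $slash   = 'c:\\temp\\';        # each \\ becomes a single \

print length($newline), "\n";   # 4 (a, \, n, b)
print "$dollar\n";              # cost: $5 @ home
print "$quote\n";               # it's
print "$slash\n";               # c:\temp\
```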

Single Quoted Summary

Rule                       Characters   Example        Result
Single quotes              '            'Didn\'t'      Didn't
Backslashes (sometimes)    \            'c:\test\\'    c:\test\

Double Quoted Strings

Double-quoted strings behave quite differently from single-quoted strings in Perl. This may come as a surprise to someone familiar with other programming languages, such as JavaScript.

Double Quotes (")

To enter a double quote within a double-quoted string, you must escape it with a backslash:

my $str = "He said, \"Not a chance!\"";

Result:

He said, "Not a chance!"

Scalar and Array Variables ($, @)

Double quoted strings allow variable interpolation, so the '$' and '@' characters must be escaped so that they are not confused with variables:

my $cost = "25 items \@ \$1.90 each, 62% profit.";

Result:

25 items @ $1.90 each, 62% profit.
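
To see why the escaping matters, it may help to compare an escaped string with an unescaped one. The sketch below assumes two illustrative variables, $price and @items, that do not appear in the original article.

```perl
use strict;
use warnings;

my $price = '1.90';
my @items = ('apples', 'pears');

# Without backslashes, $price and @items are replaced by their values.
my $interpolated = "Price: $price for @items";

# With backslashes, the $ and @ characters are kept literally.
my $escaped = "Price: \$price for \@items";

print "$interpolated\n";   # Price: 1.90 for apples pears
print "$escaped\n";        # Price: $price for @items
```

An interpolated array is joined with single spaces by default, which is why the two items appear separated by a space.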

Backslashes (\)

Double quoted strings also allow special characters to be inserted using escape sequences that begin with a backslash, such as \n for a newline or \t for a tab. A literal backslash must therefore always be escaped, so that it is not interpreted as the start of an escape sequence.

my $str = "Create a single \\ with two backslashes";

Result:

Create a single \ with two backslashes
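
The difference between an escape sequence and an escaped backslash shows up in the string length. A small sketch, using illustrative variable names:

```perl
use strict;
use warnings;

my $tab     = "a\tb";     # \t is one tab character
my $literal = "a\\tb";    # \\ is one backslash, followed by a plain t
my $hex     = "\x{41}";   # code point 0x41 is the letter A

print length($tab), "\n";       # 3
print length($literal), "\n";   # 4
print "$hex\n";                 # A
```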

Double Quoted Summary

Rule                  Characters   Example          Result
Double quotes         "            "\"OK?\""        "OK?"
Variable characters   $, @         "\$1.90 each"    $1.90 each
Backslashes           \            "c:\\files\\"    c:\files\

Regular Expressions

Escaping rules become more difficult with regular expressions because there are more characters that are used for special purposes. In total, there are eleven characters that must be escaped within a regular expression. The following table lists each character, along with its non-escaped use.

#       Character   Non-Escaped Use
1       ^           Start of string or line
2       $           End of string or line
3       .           Any single character
4       ?           Zero or one occurrence of previous character
5       *           Zero or more occurrences of previous character
6       +           One or more occurrences of previous character
7       [           Bracketed character class
8, 9    (, )        Group items
10      |           Alternation (OR operator)
11      \           Escape next character

The following characters must be escaped only when they can be confused with their special use case:

Character   Non-Escaped Use    Example
{, }        Range quantifier   {2,5}

Finally, when entering a Perl regular expression directly, you must escape any characters that would be interpreted as variables or the end of expression.

Character   Non-Escaped Use
$, @        Scalar or array variable
/           End of regex
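
As a quick check of the eleven characters listed earlier in this section, the sketch below prefixes each one with a backslash and confirms that the resulting pattern matches the character literally. It also compares the result with Perl's built-in quotemeta function, which escapes every non-word character. The variable names are illustrative only.

```perl
use strict;
use warnings;

# The eleven characters that must be escaped in a regular expression.
my @meta = ('^', '$', '.', '?', '*', '+', '[', '(', ')', '|', '\\');

my $failures = 0;
for my $char (@meta) {
    my $manual = '\\' . $char;       # a backslash followed by the character
    my $auto   = quotemeta($char);   # built-in escaping of non-word characters

    $failures++ unless $char =~ /\A$manual\z/;   # escaped form matches literally
    $failures++ unless $manual eq $auto;         # same result as quotemeta
}

print "failures: $failures\n";   # failures: 0
```

When the text to match comes from a variable, quotemeta (or \Q...\E inside the pattern) is usually less error-prone than escaping by hand.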

Forward-Slash Example (/)

The following example shows how to remove an HTML tag containing a forward slash. The forward slash must be escaped because it is entered directly into the regular expression.

my $str = 'Some text</p>';
$str =~ s/<\/p>//;

We could avoid having to escape the forward slash if we defined the regular expression as a variable first. This is because forward slashes do not need to be escaped in either single quoted or double quoted strings and only need to be escaped in regular expressions when entered directly:

my $str = 'Some text</p>';
my $regex = '</p>';
$str =~ s/$regex//;

Backslash Example (\)

Searching for backslashes can get quite tricky, as the backslashes sometimes need to be escaped twice. For example, if we want to replace all backslashes with forward slashes, both the backslash and the forward slash must be escaped:

$str =~ s/\\/\//g;	# Replace all \ with /

If we pre-defined the search string using a double-quoted string, we would have to double escape the backslash. The same substitution becomes:

my $regex = "\\\\";		# $regex contains \\
$str =~ s/$regex/\//g;		# Replace all \ with /
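
Both forms can be run side by side to confirm they behave the same. This is a sketch using an example Windows-style path that is not taken from the original article.

```perl
use strict;
use warnings;

my $path = 'c:\temp\file.txt';   # single quotes keep the backslashes as-is

# Regex entered directly: one level of escaping.
(my $direct = $path) =~ s/\\/\//g;

# Pattern built in a double-quoted string: two levels of escaping.
my $regex = "\\\\";              # the string holds two characters: \\
(my $indirect = $path) =~ s/$regex/\//g;

print "$direct\n";     # c:/temp/file.txt
print "$indirect\n";   # c:/temp/file.txt
```

The double-quoted string first reduces four backslashes to two, and the regex engine then reads those two as one literal backslash.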

A Complex Example

Replacing $10.00 with £6.32 requires the following:

Condition              Find             Replace
Desired text           $10.00           £6.32
Regex escaped          \$10\.00         £6.32
Single-quoted string   '\$10\.00'       not possible
Double-quoted string   "\\\$10\\.00"    "\x{a3}6.32"

Implementing the double-quoted string option in Perl code:

my $str = 'It costs $10.00';
my $find = "\\\$10\\.00";	# contains \$10\.00
my $replace = "\x{a3}6.32";	# contains £6.32

binmode STDOUT, ":utf8";
print "$str\n";
print "find: $find\n";
print "replace: $replace\n";

$str =~ s/$find/$replace/;

print "$str\n";

Results in the following:

It costs $10.00
find: \$10\.00
replace: £6.32
It costs £6.32

Character Classes

Character classes define a set of characters to be used within a regular expression. They are delimited with square brackets, such as [a-zA-F]. Even though character classes are found within regular expressions, they have different escaping rules than regular expressions.

Character Range (-)

The hyphen (-) character is used to indicate a range of characters. For example, [0-9] means any digit from 0 to 9. If you wish to create a character class that includes a literal hyphen, it should be escaped to avoid it being interpreted as a range.

A hyphen as the first character or last character in a character class does not need to be escaped, as it would not create a valid range. For example:

[0-9]		# Any digit from 0 to 9
[0\-9]		# '0', '-' or '9' characters
[-0-9]		# '-' character or digit from 0 to 9
[0-9-]		# '-' character or digit from 0 to 9
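
The classes above can be tested against a few single-character inputs. This is a minimal sketch; the input list and variable names are chosen purely for illustration.

```perl
use strict;
use warnings;

my @inputs = ('5', '-', '0', 'x');

# Collect the inputs that each character class matches.
my @range   = grep { /\A[0-9]\z/ }  @inputs;   # digits only
my @escaped = grep { /\A[0\-9]\z/ } @inputs;   # '0', '-' or '9'
my @leading = grep { /\A[-0-9]\z/ } @inputs;   # '-' or any digit

print "@range\n";     # 5 0
print "@escaped\n";   # - 0
print "@leading\n";   # 5 - 0
```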

Set Inversion or Negation (^)

A caret (^) character at the beginning of a character class is used to invert the set. For example, [^0-9] means any non-digit character. If you wish to match a literal caret and it is the first character in the class, it must be escaped.

A caret anywhere other than the first character in a character class does not need to be escaped.

[^0-9]		# Any non-digit
[\^0-9]		# '^' character or any digit
[0-9^]		# '^' character or any digit
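
A short sketch, with illustrative inputs, shows how the position of the caret changes what the class matches:

```perl
use strict;
use warnings;

my @inputs = ('7', '^', 'a');

my @negated  = grep { /\A[^0-9]\z/ }  @inputs;   # anything that is not a digit
my @escaped  = grep { /\A[\^0-9]\z/ } @inputs;   # '^' or any digit
my @trailing = grep { /\A[0-9^]\z/ }  @inputs;   # '^' or any digit

print "@negated\n";    # ^ a
print "@escaped\n";    # 7 ^
print "@trailing\n";   # 7 ^
```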

Character Class Delimiters ([, ])

The character class delimiters themselves should be escaped:

[\[\]]		# Match any '[' or ']' character

The Escape Character (\)

The backslash escape character should be escaped:

[\\]		# Match any '\' character
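
The two rules can be combined in a single class. The sketch below, using a made-up sample string, removes every square bracket and backslash and counts the brackets:

```perl
use strict;
use warnings;

my $text = 'a[1]\b';   # single quotes: a, [, 1, ], \, b

# Remove every '[', ']' and '\' character.
(my $stripped = $text) =~ s/[\[\]\\]//g;

# Count the square brackets only.
my $count = () = $text =~ /[\[\]]/g;

print "$stripped\n";   # a1b
print "$count\n";      # 2
```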

Character Class Summary

The following table shows the characters that need to be escaped within a character class and their non-escaped usage:

Character   Non-Escaped Use
-           Character range, unless located at the beginning or end of the set
^           Invert or negate set (only if found at the beginning of set)
[, ]        Start or end of character class
\           Escape character