Dispersion Design


Percent Encoding of Characters

2012-05-26

Introduction

Percent encoding is a method of representing characters that are otherwise prohibited in a string. Each prohibited character is replaced by an escape sequence, so the string can carry characters that it could not normally contain.

Percent encoding is most often seen in URLs (URIs) and the most commonly encoded character is a space. URLs are not allowed to contain the space character (ASCII character number 32, which is 0x20 in hexadecimal notation), so a space character gets written as '%20'. For example:

http://www.dispersiondesign.com/path containing spaces/

would be encoded as:

http://www.dispersiondesign.com/path%20containing%20spaces/

The following table shows some characters and their percent encoded equivalent:

Character   ASCII value   ASCII value (in hex)   Percent Encoded
(space)     32            0x20                   %20
%           37            0x25                   %25
&           38            0x26                   %26
,           44            0x2C                   %2C
.           46            0x2E                   %2E
?           63            0x3F                   %3F
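
The percent encoded column above can be reproduced with a few lines of JavaScript. This is only a minimal sketch: the helper name percentEncodeChar is hypothetical, and it assumes a single character whose code is below 256.

```javascript
// Hypothetical helper: percent-encode a single character.
// Assumes the character code fits in one byte (below 256).
function percentEncodeChar(ch) {
	var hex = ch.charCodeAt(0).toString(16).toUpperCase();
	// Pad to two hex digits so that, for example, 0xA becomes "0A"
	return '%' + (hex.length < 2 ? '0' + hex : hex);
}

// Print each character, its ASCII value and its percent encoded form
var chars = [' ', '%', '&', ',', '.', '?'];
for (var i = 0; i < chars.length; i++) {
	var c = chars[i];
	console.log(JSON.stringify(c), c.charCodeAt(0), percentEncodeChar(c));
}
```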

Let’s see how to encode and decode percent encoded strings.

Decoding (Unescaping) Percent Encoding

In programming languages that support regular expressions, such as Perl, PHP and JavaScript, decoding a percent encoded string is a simple substitution operation. First we need a regular expression that locates valid percent encoded character sequences. In URLs, a percent encoded sequence starts with a '%' (percent) character, followed by exactly two characters that can be 0-9, a-f or A-F. In regular expression syntax, we can find two consecutive characters that are 0-9, a-f or A-F with:

[0-9a-fA-F]{2}

Finding these characters with a preceding '%' character is then simply:

%([0-9a-fA-F]{2})
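
To see which sequences this pattern accepts, here is a small JavaScript sketch. The function name findEncoded and the sample input are made up for illustration.

```javascript
// Hypothetical helper: list every valid percent encoded sequence in a string.
function findEncoded(str) {
	// match() returns null when nothing is found, so fall back to an empty array
	return str.match(/%([0-9a-fA-F]{2})/g) || [];
}

// '%zz' and '%2i' are skipped because they are not '%' plus two hex digits
console.log(findEncoded('path%20containing%2Cvalid%zzand%2invalid'));
// [ '%20', '%2C' ]
```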

The percent encoded value is a hexadecimal value, so it needs to be converted to a decimal value. In Perl, this is accomplished using the hex() function:

my $decimal = hex($1);

Then, the resulting decimal value needs to be converted to a character. The function in Perl for this is chr():

my $character = chr($decimal);

Putting this together, the unescaping (decoding) of percent encoding can be performed in Perl with a single line of code:

$str =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;

JavaScript Solution

In JavaScript, the same thing can be performed with the parseInt() and String.fromCharCode() functions:

var regex = /%([0-9a-fA-F]{2})/g;
str = str.replace(regex, function (match, p1) {
	// p1 holds the two hex digits captured by the parentheses
	return String.fromCharCode(parseInt(p1, 16));
});

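However, JavaScript has a built-in function called unescape() that can perform the same task (it has since been deprecated in favour of decodeURIComponent(), which also interprets the decoded bytes as UTF-8):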

str = unescape(str);
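
As a rough check on the regular expression approach, the following sketch wraps it in a function and runs it on the example URL from the introduction. The name percentDecode is hypothetical. The sketch also assumes that each escape sequence stands for one character, so multi-byte UTF-8 sequences are not reassembled.

```javascript
// Hypothetical wrapper around the regular expression approach.
// Assumes every %XX sequence represents a single character.
function percentDecode(str) {
	return str.replace(/%([0-9a-fA-F]{2})/g, function (match, p1) {
		// Convert the two hex digits to a number, then to a character
		return String.fromCharCode(parseInt(p1, 16));
	});
}

console.log(percentDecode('http://www.dispersiondesign.com/path%20containing%20spaces/'));
// http://www.dispersiondesign.com/path containing spaces/

// Invalid sequences such as '%zz' are left untouched
console.log(percentDecode('100%25%zz'));
// 100%%zz
```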

Encoding (Escaping) with Percent Encoding

Creating a percent encoded string requires that the invalid characters first be defined. For example, if you wish to encode all characters that are not a-z, A-Z and 0-9, you would need a regular expression like the following:

[^0-9a-zA-Z]

Now, in Perl, these characters can be substituted using ord() to get the decimal ASCII value for the character and sprintf() to get the hexadecimal equivalent:

$str =~ s/([^0-9a-zA-Z])/sprintf("%%%02X", ord($1))/ge;

JavaScript Solution

In JavaScript, the solution can be written:

var regex = /[^0-9a-zA-Z]/g;
str = str.replace(regex, function (ch) {
	var d = ch.charCodeAt(0);
	// Pad to two hex digits; upper case matches the Perl version
	return (d < 16 ? '%0' : '%') + d.toString(16).toUpperCase();
});
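
For a quick usage example, the same substitution can be wrapped in a function. The name percentEncode is hypothetical, and the sketch makes two assumptions: hex digits are upper-cased to match the Perl output, and the input contains only characters with codes below 256 (larger codes would produce more than two hex digits).

```javascript
// Hypothetical wrapper: encode everything except 0-9, a-z and A-Z.
// Assumes all character codes in the input are below 256.
function percentEncode(str) {
	return str.replace(/[^0-9a-zA-Z]/g, function (ch) {
		var d = ch.charCodeAt(0);
		// Pad single-digit hex values with a leading zero
		return (d < 16 ? '%0' : '%') + d.toString(16).toUpperCase();
	});
}

console.log(percentEncode('path containing spaces/'));
// path%20containing%20spaces%2F
```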

JavaScript also has a built-in function called escape() that will percent-encode a string. However, the escape() function does not give you any control over which characters are escaped.
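
One way to keep that control is to pass the set of characters to encode as a parameter. The sketch below is only an illustration: the function name percentEncodeWith, its `unsafe` parameter and the sample path are all hypothetical, and character codes are assumed to be below 256.

```javascript
// Hypothetical helper: `unsafe` is a global regular expression that
// matches the characters to be encoded. Assumes codes below 256.
function percentEncodeWith(str, unsafe) {
	return str.replace(unsafe, function (ch) {
		var hex = ch.charCodeAt(0).toString(16).toUpperCase();
		return '%' + (hex.length < 2 ? '0' + hex : hex);
	});
}

// Encode everything except letters, digits, '/', '.' and '-'
var pathUnsafe = /[^0-9a-zA-Z\/.\-]/g;
console.log(percentEncodeWith('/my path/file name.txt', pathUnsafe));
// /my%20path/file%20name.txt
```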
