Fetching a Query String in Perl
2012-05-25
Introduction
When developing web applications in Perl, it is common to use the CGI module to decode any query string variables that are passed from the browser to the script. There are times, however, when you may wish to avoid the CGI module. As it happens, decoding variables from a query string is quite simple. This article explains how to correctly fetch query string key/value pairs.
Limitations of the CGI Module
There are several reasons why you might want to decode your own query string variables in your script and avoid using the CGI module:
1. The CGI module is bloated
The CGI module is fairly large and is overkill in many scripts.
2. The CGI module does not handle Unicode correctly
By default, the CGI module makes no attempt to interpret the character set that was used by the browser to encode the text. If you are using Unicode and UTF-8 encoding consistently (and you should be), then you will have to remember to use the decode_utf8() function on any data you get from the CGI::param() function. Instead, let's have our script interpret the data as UTF-8 automatically.
3. The CGI module does not differentiate between query string variables and POST variables
Sometimes it is useful to be able to use the content of a query string variable before processing the POST data. For example, you might want to check a verification variable, ensuring that the user has the appropriate permission to send POST data to the script. The CGI module does not differentiate between query string variables and POST variables and decodes both at the same time.
Fetching the Query String
The query string, containing our key/value pairs, is made available to a Perl script through the QUERY_STRING environment variable. We can get the value of this variable in the following way:
my $query_string = $ENV{'QUERY_STRING'};
If the script is run from the command line instead of as a CGI script, it might be useful to pass variables as a command line argument. We can create a special case for this:
my $query_string = '';
if ($ENV{'REQUEST_METHOD'}) {
    $query_string = $ENV{'QUERY_STRING'};
}
elsif ($ARGV[0]) {
    $query_string = $ARGV[0];
}
Splitting Up The Pairs
The query string will normally use the following format:
key1=value1&key2=value2&key3=value3
However, there is another, less common format that we must also be able to handle:
key1=value1;key2=value2;key3=value3
Perl's split function will easily split the query string into separate key/value pairs:
my @pairs = split(/[&;]/, $query_string);
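As a quick illustration of that split, here is a minimal sketch using a made-up query string that mixes both separators. The sample input is an assumption chosen for the example, not something a browser would necessarily send.

```perl
use strict;
use warnings;

# Hypothetical sample input that mixes both separator styles.
my $query_string = 'name=Alice&lang=perl;page=2';

# Split on either '&' or ';' to get one string per key/value pair.
my @pairs = split(/[&;]/, $query_string);

# Print each pair on its own line.
print "$_\n" for @pairs;
```

Each element of @pairs still contains the key, the equals sign and the value as one string.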
Splitting Pairs Into Keys and Values
We now have an array of strings that look like this:
key1=value1
key2=value2
key3=value3
The next task is to step through the array and split up the key/value pairs:
foreach (@pairs) {
    my($key, $value) = split(/=/, $_, 2);
    # more processing needed here...
}
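The third argument to split matters here: limiting the result to two fields means only the first equals sign separates key from value. The following sketch shows this with two made-up pairs (both are assumptions for illustration): one whose value itself contains an equals sign, and one that has no equals sign at all.

```perl
use strict;
use warnings;

# Hypothetical pairs: a value that contains '=', and a bare key with no '='.
my @pairs = ('expr=a=b', 'debug');

my @results;
foreach my $pair (@pairs) {
    # The limit of 2 stops split after the first '='.
    my ($key, $value) = split(/=/, $pair, 2);
    push @results, [$key, $value];

    # A bare key leaves $value undefined, so guard before printing it.
    printf "%s => %s\n", $key, defined $value ? "'$value'" : 'undef';
}
```

The first pair yields the key expr with the value a=b; the second yields the key debug with an undefined value, which is why later code has to check for definedness before touching the value.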
At this point, we have the raw key and value strings that need a little more processing.
Decoding Key/Value Strings
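Each key and value will be URL-encoded, using the application/x-www-form-urlencoded format that browsers apply to form data. Several steps need to be performed to decode the strings.
Firstly, any pluses (+) should be converted back to spaces. This must happen before percent-decoding, so that a literal plus sent as %2B is not mistaken for a space.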
$key =~ tr/+/ /;
$value =~ tr/+/ /;
Secondly, any 'percent-encoded' bytes need to be converted back to their non-encoded equivalent.
$key =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
$value =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
Now it is time to interpret the character set of the resulting strings. We will assume that it is encoded as UTF-8, and will decode it as such:
use Encode;
$key = Encode::decode_utf8($key);
$value = Encode::decode_utf8($value);
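To see what the three decoding steps do to a single string, here is a minimal sketch run on a made-up value. The sample string is an assumption for illustration: it is "café au lait" with the é sent as the two UTF-8 bytes C3 A9.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical encoded value: "café au lait", with é sent as bytes C3 A9.
my $value = 'caf%C3%A9+au+lait';

# Step 1: pluses back to spaces.
$value =~ tr/+/ /;

# Step 2: percent-escapes back to raw bytes.
$value =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
my $byte_length = length($value);    # é is still two separate bytes here

# Step 3: interpret the bytes as UTF-8, giving a string of characters.
$value = Encode::decode_utf8($value);
my $char_length = length($value);    # é is now a single character

print "bytes: $byte_length, characters: $char_length\n";
```

The string is 13 bytes long before decode_utf8() and 12 characters long afterwards, which is the difference between treating the data as raw bytes and treating it as text.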
We can then put the resulting values into an associative array (a hash):
$param{$key} = $value;
Putting It Together
Putting all these steps into a single function results in:
use Encode;

sub fetch_cgi_variables {
    # Take the query string from the environment (CGI) or from the
    # first command line argument.
    my $query_string = '';
    if ($ENV{'REQUEST_METHOD'}) {
        $query_string = $ENV{'QUERY_STRING'};
    }
    elsif ($ARGV[0]) {
        $query_string = $ARGV[0];
    }

    my %param;
    my @pairs = split(/[&;]/, $query_string);
    foreach (@pairs) {
        my($key, $value) = split(/=/, $_, 2);

        # Skip empty pairs, such as the one in the middle of "a=1&&b=2".
        next if !defined $key;

        $key =~ tr/+/ /;
        $key =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
        $key = Encode::decode_utf8($key);

        # Skip pairs that have a value but no key, such as "=value".
        next if ($key eq '');

        # A key with no "=" has no value; leave it undefined.
        if (defined $value) {
            $value =~ tr/+/ /;
            $value =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
            $value = Encode::decode_utf8($value);
        }

        $param{$key} = $value;
    }
    return \%param;
}
A careful reader will notice a few additional statements in this finished code: they skip any pair whose key is undefined or empty, and they avoid warnings when a value is undefined.
Understanding Character Sets for Form Data
Unfortunately, browsers do not always encode form data as UTF-8. Instead, they will encode the form data using the same character set as the document that contains the form. In other words, if your HTML is encoded as UTF-8 (and specifically indicates this), then your form data will also be encoded as UTF-8.
The moral of this story is: USE UTF-8 CONSISTENTLY. Make sure your HTML pages are encoded using UTF-8 and make sure that your server indicates this in the HTTP header:
Content-Type: text/html; charset=UTF-8
Additionally, it is good practice to set the character set in the HTML head as well, using the http-equiv meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Following these steps will ensure that data does not get garbled due to incorrect character sets when transmitting from the browser to the script.
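On the Perl side, a script that generates the page itself can send the header and the meta tag together. The following is a minimal sketch, assuming a plain CGI script that prints its own headers; the page content and the form field name q are made up for illustration.

```perl
use strict;
use warnings;

# Encode everything printed to STDOUT as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

# HTTP header announcing the character set, followed by the blank
# line that ends the header section.
my $header = "Content-Type: text/html; charset=UTF-8\r\n\r\n";

# Minimal page carrying the matching meta tag. "\x{e9}" is the
# character é, used here only as sample non-ASCII text.
my $body = <<"HTML";
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Caf\x{e9}</title>
</head>
<body>
<form method="get" action="">
<input type="text" name="q" />
<input type="submit" />
</form>
</body>
</html>
HTML

print $header, $body;
```

Because the page is declared and delivered as UTF-8, a browser submitting this form should send its data as UTF-8 too, which is the assumption the decoding code relies on.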