Dispersion Design


Parsing Times From Strings

Written: 2012-04-11

Introduction

Times can be written in either 12-hour or 24-hour format, and the seconds value is optional. In 12-hour format the minutes value can also be omitted. The table below shows a variety of formats in which times are commonly written.

Description                    Example
12-hour format                 11:34:12 pm
24-hour format                 18:02:49
Omitted seconds                1:30 pm
Omitted minutes and seconds    8 am
Leading zero for hours value   08:00
Using 'h' as a separator       19h15
Alternative am/pm format       8:10a

It is sometimes necessary to search through a text string and identify a time, regardless of the format in which it is written.

Finding Valid Number Values

First we would like to write regular expressions to detect valid hours, minutes and seconds values. Hours written in 12-hour format can have one digit (1-9) or two digits (01-12), and must be between 1 and 12:

# 1-9, 01-09, 10-12
my $h12_ex = '0?[1-9]|1[0-2]';
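The article's larger patterns always wrap this expression in a capturing group, which matters: if the hours pattern is tested on its own, the alternation must be grouped before anchoring, otherwise `^` and `$` bind only to the first and last alternatives. A minimal JavaScript sketch of a whole-string check (the helper name `isHour12` is hypothetical):

```javascript
// Pattern from the article: 1-9, 01-09, 10-12
const h12_ex = '0?[1-9]|1[0-2]';

// Grouped and anchored: the whole string must be a valid 12-hour hours value.
const h12Whole = new RegExp('^(?:' + h12_ex + ')$');

// Ungrouped, for comparison: this parses as ^0?[1-9] OR 1[0-2]$,
// so each anchor applies to only one alternative.
const h12Broken = new RegExp('^' + h12_ex + '$');

// Hypothetical helper: true if str is exactly a valid 12-hour hours value.
function isHour12(str) {
  return h12Whole.test(str);
}

console.log(['1', '09', '12', '0', '13', '9x'].map(isHour12));
// [ true, true, true, false, false, false ]
console.log(h12Broken.test('9x')); // true: only the first alternative is anchored
```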

Hours written in 24-hour format can also be one digit or two digits, but must be between 0 and 23:

# 0-9, 00-09, 10-23
my $h24_ex = '0?[0-9]|1[0-9]|2[0-3]';
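One way to convince yourself that both hours patterns accept exactly the intended strings is to enumerate every one- and two-digit string and check each accepted value against its numeric range. A short JavaScript sketch of that check, using a hypothetical helper `acceptedHours`:

```javascript
const h12_ex = '0?[1-9]|1[0-2]';        // 1-9, 01-09, 10-12
const h24_ex = '0?[0-9]|1[0-9]|2[0-3]'; // 0-9, 00-09, 10-23

// Hypothetical helper: every string "0".."9" and "00".."99" that the
// pattern matches in full.
function acceptedHours(pattern) {
  const whole = new RegExp('^(?:' + pattern + ')$');
  const candidates = [];
  for (let n = 0; n < 10; n++) candidates.push(String(n));
  for (let n = 0; n < 100; n++) candidates.push(String(n).padStart(2, '0'));
  return candidates.filter((c) => whole.test(c));
}

const h12 = acceptedHours(h12_ex);
const h24 = acceptedHours(h24_ex);

// Each accepted string should fall inside the expected numeric range.
console.log(h12.length, h12.every((c) => +c >= 1 && +c <= 12)); // 21 true
console.log(h24.length, h24.every((c) => +c >= 0 && +c <= 23)); // 34 true
```

The counts follow from the comments on the patterns: 9 one-digit plus 12 two-digit values for 12-hour hours, and 10 one-digit plus 24 two-digit values for 24-hour hours.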

Minutes will always be two digits and can be between 00 and 59:

# 00-59
my $min_ex = '[0-5][0-9]';

Seconds, if present, will have the same format as the minutes.
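Because seconds reuse the minutes pattern, a single check covers both. A minimal JavaScript sketch showing that single-digit values and values of 60 or more are rejected:

```javascript
const min_ex = '[0-5][0-9]'; // 00-59, always two digits

// The same anchored pattern validates both minutes and seconds.
const minWhole = new RegExp('^(?:' + min_ex + ')$');

for (const value of ['00', '07', '59', '7', '60']) {
  console.log(value, minWhole.test(value));
}
// 00 true
// 07 true
// 59 true
// 7 false
// 60 false
```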

24-hour Time Format

A 24-hour time has a required hours value, a required minutes value and an optional seconds value. The separator between the hours and minutes can be ':' or 'h'. The separator between the minutes and seconds must be ':'.

my $t24_ex = "($h24_ex)[h:]($min_ex)(?::($min_ex))?";
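To see which parts of a 24-hour time each capturing group picks up, the same expression can be anchored and run against a few inputs. A JavaScript sketch (the anchoring is added here for illustration; the article embeds the pattern in a longer search expression instead):

```javascript
const h24_ex = '0?[0-9]|1[0-9]|2[0-3]'; // 0-9, 00-09, 10-23
const min_ex = '[0-5][0-9]';            // 00-59
const t24_ex = '(' + h24_ex + ')[h:](' + min_ex + ')(?::(' + min_ex + '))?';

// Anchored so the whole string must be a 24-hour time.
const t24Whole = new RegExp('^' + t24_ex + '$');

for (const text of ['18:02:49', '08:00', '19h15', '24:00', '7:5']) {
  const m = t24Whole.exec(text);
  // Prints [hours, minutes, seconds]; seconds is undefined when omitted,
  // and null is printed for strings that are not valid 24-hour times.
  console.log(text, m ? m.slice(1) : null);
}
```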

12-hour Time Format

12-hour time format is a little more complicated than 24-hour format. A 12-hour time has a required hours value and an am/pm indicator, which can be written in full ('am', 'pm') or as a single letter ('a', 'p'). Minutes and seconds are optional. The separator between hours and minutes can be ':' or 'h', and the separator between minutes and seconds must be ':'. A separator between the time and the am/pm indicator is optional and can be any run of hyphens or white space:

my $t12_ex = "($h12_ex)(?:[h:]($min_ex)(?::($min_ex))?)?[-\\s]*([ap])m?";
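A JavaScript sketch of the 12-hour expression, anchored here purely for illustration, shows which groups are filled for each of the 12-hour examples in the table and that an hours value of 13 is rejected:

```javascript
const h12_ex = '0?[1-9]|1[0-2]';
const min_ex = '[0-5][0-9]';
const t12_ex = '(' + h12_ex + ')(?:[h:](' + min_ex + ')(?::(' + min_ex +
  '))?)?[-\\s]*([ap])m?';

const t12Whole = new RegExp('^' + t12_ex + '$');

for (const text of ['11:34:12 pm', '1:30 pm', '8 am', '8:10a', '7-pm', '13:00 pm']) {
  const m = t12Whole.exec(text);
  // Groups: [hours, minutes, seconds, 'a' or 'p']; omitted parts are undefined.
  console.log(text, m ? m.slice(1) : null);
}
```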

Time Spacing Characters

When searching for a time within a string, the time does not always occur at the beginning or end of the string. If the time is written within a larger string, we need to make sure that it is delimited on both sides by a valid character. Without this check, the algorithm would decide that the string "need 4 amplifiers" contains "4 am" and interpret it as a time. Valid delimiters are a hyphen (-), white space (\s), period (.), comma (,), apostrophe (') and double quote ("). If necessary, the set of delimiters could easily be expanded, depending on the application.

my $t_delim_ex = '[-\s.,\'"]';
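The effect of the delimiter check can be seen by comparing an unguarded search with the guarded search expression used later in the article. A JavaScript sketch using the "need 4 amplifiers" example:

```javascript
const h12_ex = '0?[1-9]|1[0-2]';
const min_ex = '[0-5][0-9]';
const t12_ex = '(' + h12_ex + ')(?:[h:](' + min_ex + ')(?::(' + min_ex +
  '))?)?[-\\s]*([ap])m?';
const t_delim_ex = '[-\\s.,\'"]';

// Unguarded: finds a 12-hour time anywhere in the text.
const unguarded = new RegExp(t12_ex);

// Guarded: the time must be preceded by the start of the string or a
// delimiter, and followed by a delimiter or the end of the string.
const guarded = new RegExp('^(?:.*?' + t_delim_ex + ')??' + t12_ex +
  '(?:' + t_delim_ex + '.*)?$');

for (const text of ['need 4 amplifiers', 'meet at 4 am.']) {
  console.log(text, '|', unguarded.test(text), guarded.test(text));
}
// need 4 amplifiers | true false
// meet at 4 am. | true true
```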

Performing the Search

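When searching for a time within a string, we first check for a 12-hour formatted time. This check has to come first, because the 24-hour pattern would also match a time such as "5:12 pm", treating " pm" as trailing text and losing the am/pm indicator. If a 12-hour formatted time is found, the hours value is immediately converted to 24-hour representation (12 am becomes hour 0 and 12 pm stays hour 12).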

if($str =~ /^(?:.*?$t_delim_ex)??$t12_ex(?:$t_delim_ex.*)?$/)
{
	$h = $1;
	$m = $2;
	$s = $3;

	my $am_pm = $4;

	if(!defined $m){ $m = 0; }
	if(!defined $s){ $s = 0; }
	if($h >= 12)
	{
		$h = 0;
	}
	if($am_pm eq 'p')
	{
		$h += 12;
	}
}
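Resetting 12 to 0 before adding 12 for pm is what handles the two edge cases of the 12-hour clock: midnight (12 am) and noon (12 pm). A small JavaScript sketch of just that conversion step, using a hypothetical helper `to24Hour`, tabulated over the interesting cases:

```javascript
// Hypothetical helper mirroring the conversion in the 12-hour branch above.
// h is the captured hours string (1-12); amPm is 'a' or 'p'.
function to24Hour(h, amPm) {
  let hour = parseInt(h, 10);
  if (hour >= 12) {
    hour = 0;     // 12 am is midnight (0); 12 pm becomes 12 below
  }
  if (amPm === 'p') {
    hour += 12;   // afternoon and evening hours
  }
  return hour;
}

for (const [h, amPm] of [['12', 'a'], ['1', 'a'], ['11', 'a'],
                         ['12', 'p'], ['1', 'p'], ['11', 'p']]) {
  console.log(h + amPm + 'm ->', to24Hour(h, amPm));
}
// 12am -> 0
// 1am -> 1
// 11am -> 11
// 12pm -> 12
// 1pm -> 13
// 11pm -> 23
```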

If a 12-hour formatted time is not found, we check for a 24-hour formatted time:

elsif($str =~ /^(?:.*?$t_delim_ex)??$t24_ex(?:$t_delim_ex.*)?$/)
{
	$h = $1;
	$m = $2;
	$s = $3;

	if(!defined $s){ $s = 0; }
}
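The order of the two checks matters because the 24-hour search expression also matches a 12-hour time like "5:12 pm": the delimiter rule lets " pm" pass as trailing text. A JavaScript sketch that runs both delimited expressions against the same text (the `search` wrapper is a hypothetical helper):

```javascript
const h12_ex = '0?[1-9]|1[0-2]';
const h24_ex = '0?[0-9]|1[0-9]|2[0-3]';
const min_ex = '[0-5][0-9]';
const t12_ex = '(' + h12_ex + ')(?:[h:](' + min_ex + ')(?::(' + min_ex +
  '))?)?[-\\s]*([ap])m?';
const t24_ex = '(' + h24_ex + ')[h:](' + min_ex + ')(?::(' + min_ex + '))?';
const t_delim_ex = '[-\\s.,\'"]';

// Hypothetical helper: wrap a time pattern with the delimiter checks.
function search(timeEx) {
  return new RegExp('^(?:.*?' + t_delim_ex + ')??' + timeEx +
    '(?:' + t_delim_ex + '.*)?$');
}

const text = 'dinner at 5:12 pm';
const as24 = search(t24_ex).exec(text);
const as12 = search(t12_ex).exec(text);

// The 24-hour pattern matches too, but the pm is lost, so the hour
// would stay 5 instead of becoming 17.
console.log(as24.slice(1, 4)); // [ '5', '12', undefined ]
console.log(as12.slice(1, 5)); // [ '5', '12', undefined, 'p' ]
```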

Putting It Together

The complete Perl code for finding a time within a string is now:

use strict;
use warnings;

my $str = 'Lunch starts at 12:30 pm.';   # text to search (example input)
my ($h, $m, $s);

my $h12_ex = '0?[1-9]|1[0-2]';          # 1-9, 01-09, 10-12
my $h24_ex = '0?[0-9]|1[0-9]|2[0-3]';   # 0-9, 00-09, 10-23
my $min_ex = '[0-5][0-9]';              # 00-59
my $t12_ex = "($h12_ex)(?:[h:]($min_ex)(?::($min_ex))?)?[-\\s]*([ap])m?";
my $t24_ex = "($h24_ex)[h:]($min_ex)(?::($min_ex))?";
my $t_delim_ex = '[-\s.,\'"]';

# 12-hour time (11a, 5:12 pm, etc)
if($str =~ /^(?:.*?$t_delim_ex)??$t12_ex(?:$t_delim_ex.*)?$/)
{
	$h = $1;
	$m = $2;
	$s = $3;

	my $am_pm = $4;

	if(!defined $m){ $m = 0; }
	if(!defined $s){ $s = 0; }
	if($h >= 12)
	{
		$h = 0;
	}
	if($am_pm eq 'p')
	{
		$h += 12;
	}
}
# 24-hour time (13:45, 09h15, etc)
elsif($str =~ /^(?:.*?$t_delim_ex)??$t24_ex(?:$t_delim_ex.*)?$/)
{
	$h = $1;
	$m = $2;
	$s = $3;

	if(!defined $s){ $s = 0; }
}

Other Programming Languages

It should be straightforward to convert this algorithm to other programming languages, provided the language supports regular expressions. The equivalent algorithm in JavaScript would be written:

var h, m, s;

var h12_ex = '0?[1-9]|1[0-2]';          // 1-9, 01-09, 10-12
var h24_ex = '0?[0-9]|1[0-9]|2[0-3]';   // 0-9, 00-09, 10-23
var min_ex = '[0-5][0-9]';              // 00-59
var t12_ex = '(' + h12_ex + ')(?:[h:](' + min_ex + ')(?::(' +
	min_ex + '))?)?[-\\s]*([ap])m?';
var t24_ex = '(' + h24_ex + ')[h:](' + min_ex + ')(?::(' +
	min_ex + '))?';
var t_delim_ex = '[-\\s.,\'"]';

// 12-hour time (11a, 5:12 pm, etc)
var regex = new RegExp('^(?:.*?' + t_delim_ex + ')??' + t12_ex +
			'(?:' + t_delim_ex + '.*)?$');
var match = regex.exec(str);
if (match) {
	h = parseInt(match[1], 10);
	m = match[2];
	s = match[3];

	var am_pm = match[4];

	m = m ? parseInt(m, 10) : 0;
	s = s ? parseInt(s, 10) : 0;
	if (h >= 12) {
		h = 0;
	}
	if (am_pm === 'p') {
		h += 12;
	}
} else {
	// 24-hour time (13:45, 09h15, etc)
	regex = new RegExp('^(?:.*?' + t_delim_ex + ')??' +
		t24_ex + '(?:' + t_delim_ex + '.*)?$');
	match = regex.exec(str);
	if (match) {
		h = parseInt(match[1], 10);
		m = parseInt(match[2], 10);
		s = match[3];

		s = s ? parseInt(s, 10) : 0;
	}
}
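The JavaScript above works on global variables and rebuilds both expressions every time it runs. A minimal sketch of the same algorithm packaged as a reusable function, assuming a hypothetical name `parseTime` that returns an object with numeric `h`, `m` and `s` fields (or `null` when no time is found), exercised on the examples from the format table:

```javascript
const h12_ex = '0?[1-9]|1[0-2]';          // 1-9, 01-09, 10-12
const h24_ex = '0?[0-9]|1[0-9]|2[0-3]';   // 0-9, 00-09, 10-23
const min_ex = '[0-5][0-9]';              // 00-59
const t12_ex = '(' + h12_ex + ')(?:[h:](' + min_ex + ')(?::(' + min_ex +
  '))?)?[-\\s]*([ap])m?';
const t24_ex = '(' + h24_ex + ')[h:](' + min_ex + ')(?::(' + min_ex + '))?';
const t_delim_ex = '[-\\s.,\'"]';

// Compile the delimited search expressions once instead of on every call.
const t12_search = new RegExp('^(?:.*?' + t_delim_ex + ')??' + t12_ex +
  '(?:' + t_delim_ex + '.*)?$');
const t24_search = new RegExp('^(?:.*?' + t_delim_ex + ')??' + t24_ex +
  '(?:' + t_delim_ex + '.*)?$');

// Converts an optional captured group to a number, defaulting to 0.
function toNum(group) {
  return group ? parseInt(group, 10) : 0;
}

// Hypothetical helper: returns {h, m, s} in 24-hour form, or null.
function parseTime(str) {
  let match = t12_search.exec(str);       // 12-hour format is checked first
  if (match) {
    let h = parseInt(match[1], 10);
    if (h >= 12) h = 0;                   // 12 am -> 0; 12 pm -> 12 after the next line
    if (match[4] === 'p') h += 12;
    return { h: h, m: toNum(match[2]), s: toNum(match[3]) };
  }
  match = t24_search.exec(str);
  if (match) {
    return { h: parseInt(match[1], 10), m: toNum(match[2]), s: toNum(match[3]) };
  }
  return null;
}

for (const text of ['11:34:12 pm', '18:02:49', '1:30 pm', '8 am',
                    '08:00', '19h15', '8:10a']) {
  console.log(text, JSON.stringify(parseTime(text)));
}
```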
