Regular Expressions in Perl 5.10

Published on 2007.12.24 at 01:36:28

Tags: Perl, Perl 5, 5.10, regular expressions


The regular expression engine of Perl 5.10 has many new features. Here I point out some of them.

Named captures

Suppose I want to match a phone number and save its parts in a hash.

One way to do it is:

    if ($str =~ /^(\d+)-(\d+)-(\d+)$/) {
        $num{country} = $1;
        $num{area}    = $2;
        $num{phone}   = $3;
    }

The new way is:

    if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
        %num = %+;
    }

Starting with 5.10 we can name the capturing parentheses, and the strings they match will be in the %+ hash, with the names of the groups as the keys.
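
As a minimal sketch of this, the script below wraps the match in a helper and copies %+ into an ordinary hash right away, because %+ only reflects the last successful match. The helper name parse_phone and the sample number are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;

# parse_phone is a hypothetical helper name, not something Perl provides.
# It returns a hash reference holding the named captures, or undef
# (an empty list in list context) when the string does not match.
sub parse_phone {
    my ($str) = @_;
    if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
        my %num = %+;    # copy now: the next successful match overwrites %+
        return \%num;
    }
    return;
}

my $num = parse_phone('1-555-1234567');    # sample number, made up
if ($num) {
    for my $key (sort keys %$num) {
        say $key, ' = ', $num->{$key};
    }
}
```

Returning a copy instead of relying on %+ later keeps the result safe from any other regex that runs in between.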

Not only that, but we can also use these names instead of the \1, \2 backreferences by writing \k<name>, as in the following example:

    /(?<letters>[a-z]+)-(?<digits>\d+)-\k<letters>-\k<digits>/
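
A small sketch of how this pattern behaves on a few strings. The helper name repeats_pair, the ^ and $ anchors, and the sample strings are additions for illustration and not part of the pattern above.

```perl
use strict;
use warnings;
use 5.010;

# repeats_pair is a hypothetical helper name. It returns 1 when the string
# has the form letters-digits-letters-digits with both parts repeated
# exactly, and 0 otherwise. The anchors are an assumption added here so
# that the whole string has to match.
sub repeats_pair {
    my ($str) = @_;
    return $str =~ /^(?<letters>[a-z]+)-(?<digits>\d+)-\k<letters>-\k<digits>$/ ? 1 : 0;
}

# Sample strings, made up for illustration.
for my $str ('abc-42-abc-42', 'abc-42-xyz-42', 'abc-42-abc-43') {
    say $str, ': ', (repeats_pair($str) ? 'match' : 'no match');
}
```

Only the first string matches, because \k<letters> and \k<digits> must repeat exactly what the named groups captured.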

Using names makes it much clearer what each pair of parentheses matches, and eliminates the bugs created when we add or remove a pair and thereby change the numbering.

For example in this regex:

    /(.)(.)\2\1/

If I want to add a repetition to it I would start writing

    /((.)(.)\2\1){2}/

but this is incorrect: the new outer pair of parentheses is now buffer 1 and the two inner ones have become 2 and 3, so I need to change the numbers of the buffers:

    /((.)(.)\3\2){2}/

Using named buffers, even if the names are just single letters, solves this problem, because the references do not change when another pair of parentheses is wrapped around them:

    /(?<c>(?<a>.)(?<b>.)\k<b>\k<a>){2}/
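
To check that a named form behaves like the renumbered one, the sketch below runs both against the same inputs. It uses \k<b>\k<a> to mirror the original \2\1 order; the anchors, the sample strings and the helper name check are additions for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Numbered version: the outer group is 1, so the inner ones are 2 and 3.
my $numbered = qr/^((.)(.)\3\2){2}$/;

# Named version: the references stay the same no matter how many
# groups are wrapped around them.
my $named = qr/^((?<a>.)(?<b>.)\k<b>\k<a>){2}$/;

# check is a hypothetical helper: 1 if $str matches $re, 0 otherwise.
sub check {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

# Sample strings, made up for illustration.
for my $str ('abbaabba', 'abbacddc', 'abbaabab') {
    say $str, ': numbered=', check($numbered, $str), ' named=', check($named, $str);
}
```

Both patterns should agree on every input; the difference is that only the numbered one has to be edited again if the grouping changes.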
