Regular Expressions in Perl 5.10

Perl 5.10 adds many new features to the regular expression engine. Here I point out some of them.

Named captures

Suppose I want to match a phone number and save its parts in a hash.

One way to do it is:


  if ($str =~ /^(\d+)-(\d+)-(\d+)$/) {
      $num{country} = $1;
      $num{area}    = $2;
      $num{phone}   = $3;
  }

The new way is:


  if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
      %num = %+;
  }

Starting with 5.10 we can name the capturing parentheses, and the strings they match are available in the %+ hash, with the names of the parentheses as the keys.
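To see the %+ hash in action, here is a minimal, self-contained sketch. The sample string in $str is made up for illustration; only the pattern comes from the example above.

```perl
use 5.010;      # named captures require Perl 5.10 or later
use strict;
use warnings;

# Hypothetical sample input in the country-area-phone format.
my $str = '972-2-1234567';
my %num;

if ($str =~ /^(?<country>\d+)-(?<area>\d+)-(?<phone>\d+)$/) {
    # Copy %+ right away: it always reflects the last successful match,
    # so a later match would replace its contents.
    %num = %+;
}

# Individual values can also be read directly as $+{country} and so on.
printf "%-8s %s\n", $_, $num{$_} for sort keys %num;
```

Copying %+ into a regular hash, as the article does, keeps the values around after other regexes have run.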

Not only that, but we can also use these names instead of the \1, \2 matching buffers by writing \k<name>, as in the following example:


  /(?<letters>[a-z]+)-(?<digits>\d+)-\k<letters>-\k<digits>/
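A short sketch of how the \k backreferences behave: the pattern only matches when the second pair repeats exactly what the named groups captured. The sample strings are invented for this illustration.

```perl
use 5.010;
use strict;
use warnings;

my $re = qr/(?<letters>[a-z]+)-(?<digits>\d+)-\k<letters>-\k<digits>/;

# Made-up samples: only the first repeats both the letters and the digits.
my @samples = ('abc-12-abc-12', 'abc-12-abd-12', 'abc-12-abc-13');

# Keep the strings where both backreferences found their earlier capture.
my @matched = grep { $_ =~ $re } @samples;

print "matched: $_\n" for @matched;
```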

Using names makes it much clearer what each pair of parentheses is matching, and eliminates the bugs created when we add or remove a pair and thereby change the numbering.

For example in this regex:


  /(.)(.)\2\1/

If I want to add a repetition to it, I would start by writing


  /((.)(.)\2\1){2}/

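but this is incorrect. The pattern still compiles, so there is no syntax error to warn me, but the new outer parentheses have become buffer 1 and the two inner pairs are now buffers 2 and 3. I need to change the numbers of the backreferences: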


  /((.)(.)\3\2){2}/

Using named buffers, even if the names are just single letters, solves this problem, because the names stay the same no matter how many parentheses are added around them:


  /(?<c>(?<a>.)(?<b>.)\k<b>\k<a>){2}/
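As a quick check, the sketch below runs the renumbered pattern and a named equivalent side by side. The named form that mirrors \3\2 refers to <b> first and then <a>. The ^ and $ anchors and the test strings are additions made for this illustration so that each result is unambiguous.

```perl
use 5.010;
use strict;
use warnings;

# Numbered form: the outer group is 1, so the inner groups are 2 and 3.
my $numbered = qr/^((.)(.)\3\2){2}$/;

# Named form: the backreferences do not change when groups are added.
my $named    = qr/^(?<c>(?<a>.)(?<b>.)\k<b>\k<a>){2}$/;

my %result;
for my $s ('abbacddc', 'abbaabba', 'abbaabab') {
    my $n = $s =~ $numbered ? 1 : 0;
    my $m = $s =~ $named    ? 1 : 0;
    $result{$s} = [$n, $m];
    print "$s numbered=$n named=$m\n";
}
```

Both patterns accept two consecutive four-character palindromes such as abba followed by cddc, and both reject the last string, whose second half is not a palindrome.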