Sorting arrays in Perl

In this episode of the Perl tutorial we are going to see how to sort an array of strings in Perl.

Perl has a built-in function called sort that can, not surprisingly, sort an array. In its simplest form, you just give it an array and it returns the elements of that array in sorted order: @sorted = sort @original.


  #!/usr/bin/perl
  use strict;
  use warnings;
  use 5.010;

  use Data::Dumper qw(Dumper);

  my @words = qw(foo bar zorg moo);

  say Dumper \@words;

  my @sorted_words = sort @words;

  say Dumper \@sorted_words;

The above example will print


  $VAR1 = [
          'foo',
          'bar',
          'zorg',
          'moo'
        ];

  $VAR1 = [
          'bar',
          'foo',
          'moo',
          'zorg'
        ];

This is the simplest case, but it is not always what you want. For example, what happens if some of the words start with an upper-case letter?


  my @words = qw(foo bar Zorg moo);

The result in @sorted_words is


  $VAR1 = [
          'Zorg',
          'bar',
          'foo',
          'moo'
        ];

As you can see, the word that starts with an upper-case letter came first. That's because sort by default orders strings according to the ASCII table, in which all the upper-case letters are located before the lower-case letters.
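The position of a character in the ASCII table can be checked with the built-in ord function, which returns the numeric code of the first character of its argument. A minimal sketch (the variable names are only for illustration):

```perl
use strict;
use warnings;
use 5.010;

# ord returns the character code of the first character of its argument
my $upper_z = ord('Z');   # 90 in ASCII
my $lower_b = ord('b');   # 98 in ASCII

say "Z is $upper_z, b is $lower_b";

# 'Z' has the smaller code, so the default sort puts 'Zorg' before 'bar'
my @sorted = sort qw(foo bar Zorg moo);
say "@sorted";   # Zorg bar foo moo
```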

Comparison function

The way sort works in Perl is that it repeatedly compares pairs of elements of the original array. For each pair it puts one value into the variable $a and the other into the variable $b, and then calls a "comparison function". That "comparison function" returns -1 if the content of $a should come first (be on the left), 1 if the content of $b should come first, or 0 if it does not matter because the two values are the same.

By default you don't see this comparison function and it compares the values according to the ASCII table, but if you want you can write it explicitly.


  my @sorted_words = sort { $a cmp $b } @words;

(This is the same as without the block.) Here you can see that by default perl uses cmp in the comparison function. That's because cmp does exactly what we need here. It compares the values on its two sides as strings and returns -1 if the left argument is "less than" the right argument, 1 if the left argument is "greater than" the right argument, and 0 if they are the same.
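The return values of cmp can be checked by printing them. The sketch below also moves the comparison into a named subroutine, which sort accepts in place of a block; by_string is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;
use 5.010;

# cmp compares its operands as strings
my $less    = 'bar' cmp 'foo';   # -1: 'bar' sorts before 'foo'
my $greater = 'foo' cmp 'bar';   #  1: 'foo' sorts after 'bar'
my $equal   = 'foo' cmp 'foo';   #  0: the strings are identical

say "$less $greater $equal";     # -1 1 0

# The comparison can also live in a named subroutine (by_string is a
# hypothetical name); sort sets $a and $b before each call.
sub by_string { $a cmp $b }

my @words  = qw(foo bar zorg moo);
my @sorted = sort by_string @words;
say "@sorted";                   # bar foo moo zorg
```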

Sorting in alphabetic order

If you want to disregard the case in the strings - what is usually called alphabetic order - you can do so as given in the next example:


  my @sorted_words = sort { lc($a) cmp lc($b) } @words;

Here, for the sake of comparison, we call the lc function that returns the lower case version of its argument. Then cmp compares those lower case versions and decides which of the original strings must go first and which second.

The result is


  $VAR1 = [
          'bar',
          'foo',
          'moo',
          'Zorg'
        ];
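One case the lc comparison does not decide is two words that differ only in case, such as "Foo" and "foo": their lower-case versions are equal, so the comparison returns 0. If a fixed order is wanted there, one possible approach (a sketch, not the only option) is to fall back to a plain cmp whenever the case-insensitive comparison returns 0. The extra word 'Foo' is added here purely for illustration.

```perl
use strict;
use warnings;
use 5.010;

my @words = qw(foo Foo bar Zorg moo);

# Compare case-insensitively first; if that returns 0 (a tie),
# "or" evaluates the second comparison on the original strings.
my @sorted_words = sort { lc($a) cmp lc($b) or $a cmp $b } @words;

say "@sorted_words";   # bar Foo foo moo Zorg
```

With this fallback the upper-case variant of a tied pair comes first, because upper-case letters precede lower-case letters in the ASCII table.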

Sorting numbers in Perl

If we take an array of numbers and sort them with the default sorting,


  my @numbers = (14, 3, 12, 2, 23);
  my @sorted_numbers = sort @numbers;
  say Dumper \@sorted_numbers;

the result is probably not what we are expecting:


  $VAR1 = [
          12,
          14,
          2,
          23,
          3
        ];

If you think about it, of course, this is not surprising. When the comparison function sees 12 and 3 it compares them as strings. That means comparing the first character of each string: "1" to "3". "1" is ahead of "3" in the ASCII table and thus the string "12" will come before the string "3".

Perl does not magically understand that you want to order these values as numbers.

No problem though, as we can write a comparison function that will compare the two values as numbers. For that we use the <=> operator (also called the spaceship operator), which compares its two operands as numbers and returns -1, 0 or 1 depending on whether the left one is smaller than, equal to, or greater than the right one.


  my @sorted_numbers = sort { $a <=> $b } @numbers;

Results in:


  $VAR1 = [
          2,
          3,
          12,
          14,
          23
        ];
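To see the difference between the two operators side by side, the following sketch applies both cmp and <=> to the values 12 and 3, and then sorts the same list both ways. The variable names are made up for this example.

```perl
use strict;
use warnings;
use 5.010;

my $as_strings = 12 cmp 3;     # -1: as strings, "1" comes before "3"
my $as_numbers = 12 <=> 3;     #  1: as numbers, 12 is greater than 3
say "$as_strings $as_numbers"; # -1 1

my @numbers   = (14, 3, 12, 2, 23);
my @by_string = sort { $a cmp $b } @numbers;   # same as the default sort
my @by_number = sort { $a <=> $b } @numbers;

say "@by_string";   # 12 14 2 23 3
say "@by_number";   # 2 3 12 14 23
```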

Exercise

Given a file where each row is a number, create another file that will have the same numbers but in sorted order.

input.txt


   3
   52
   17
   7

output.txt should be


   3
   7
   17
   52


Exercise 2

input.txt


   A1
   B1
   A27
   A3

Each string has a single letter at the beginning and then a number.

Sort them first based on the letter and among values with the same leading letter sort them according to the numbers so the above values would be in the following order:

output.txt


  A1
  A3
  A27
  B1


Perl tutorial and video course

For further articles see the Beginner Perl Maven tutorial book and video course.

