In this part of the Perl tutorial we are going to see how to make sure we only have distinct values in an array.
Perl 5 does not have a built-in function to filter out duplicate values from an array, but there are several solutions to the problem. Depending on your point of view, probably the simplest way is to use the uniq function of the List::MoreUtils module:
use List::MoreUtils qw(uniq);
my @words = qw(foo bar baz foo zorg baz);
my @unique_words = uniq @words;
The full example is this:
use strict;
use warnings;
use 5.010;
use List::MoreUtils qw(uniq);
use Data::Dumper qw(Dumper);
my @words = qw(foo bar baz foo zorg baz);
my @unique_words = uniq @words;
say Dumper \@unique_words;
The result is:
$VAR1 = [
'foo',
'bar',
'baz',
'zorg'
];
For added fun, the same module also provides the distinct function, which is just an alias of uniq. List::MoreUtils is not part of the standard Perl distribution, so to use it you'll have to install it from CPAN, which I'll explain in another blog post.
Home made uniq
If you cannot install the above module for whatever reason, or you think the overhead of loading it is too big, there is a very short expression that does the same:
my @unique = do { my %seen; grep { !$seen{$_}++ } @data };
Of course, this can look cryptic to someone who has not seen it before, so it is recommended to define your own uniq subroutine and use that in the rest of the code:
use strict;
use warnings;
use 5.010;
use Data::Dumper qw(Dumper);
my @words = qw(foo bar baz foo zorg baz);
my @unique = uniq( @words );
say Dumper \@unique;
sub uniq {
my %seen;
return grep { !$seen{$_}++ } @_;
}
Home made uniq explained
I can't just throw this example here and leave it like that; I'd better explain it. Let's start with the easy part:
my @unique;
my %seen;
foreach my $value (@words) {
if (! $seen{$value}) {
push @unique, $value;
$seen{$value} = 1;
}
}
Here we use a regular foreach loop to go over the values in the original array, one by one. We use a helper hash called %seen. The nice thing about hashes is that their keys are unique. We start with an empty hash, so when we encounter the first "foo", $seen{"foo"} does not exist and thus its value is undef, which is false, meaning we have not seen this value yet. We push it to the end of the new @unique array, where we collect the distinct values. We also set $seen{"foo"} to 1. The exact value does not matter; it just needs to be true. The next time we encounter the same string, that key already exists in %seen and its value is true, so the if condition fails and we don't push the duplicate onto the resulting array.
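To follow how %seen fills up, here is a small sketch that runs the loop above on the same @words data. The say statements and the else branch are my additions for illustration only; they are not part of the original loop.

```perl
use strict;
use warnings;
use 5.010;

my @words = qw(foo bar baz foo zorg baz);
my @unique;
my %seen;
foreach my $value (@words) {
    if (! $seen{$value}) {
        # first time we meet this value: keep it and remember it
        push @unique, $value;
        $seen{$value} = 1;
        say "new:       $value";
    } else {
        # key already in %seen with a true value: skip the duplicate
        say "duplicate: $value";
    }
}
say "unique: @unique";
```

This should print "new:" for foo, bar, baz and zorg, "duplicate:" for the second foo and the second baz, and finally "unique: foo bar baz zorg", with the values in the order of their first appearance.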
Shortening the home made unique function
First of all, we replace the assignment $seen{$value} = 1; by the post-increment operator $seen{$value}++. This does not change the behavior of the previous solution, as any positive number is evaluated as TRUE, but it allows us to set the "seen flag" within the if condition itself. It is important that this is a post-increment: $seen{$value}++ returns the value the entry had before the increment, so the boolean test sees the old value. The first time we encounter a value, ! $seen{$value}++ is TRUE, and every subsequent time it is FALSE.
my @unique;
my %seen;
foreach my $value (@words) {
if (! $seen{$value}++ ) {
push @unique, $value;
}
}
This is shorter but we can do even better.
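The post-increment trick is easy to check in isolation. The sketch below tests the expression twice for the same key. The variable names $old, $first and $second are only illustrative. Note that, as perlop documents, post-incrementing an undef value returns 0, which is just as false as undef.

```perl
use strict;
use warnings;
use 5.010;

my %seen;

# Post-increment returns the value BEFORE the increment.
# For a missing hash entry that is 0 (perlop: post-increment of undef returns 0).
my $old = $seen{foo}++;
say "returned: $old";          # returned: 0
say "stored:   $seen{foo}";    # stored:   1

# Negated, this is the same test the loop above uses.
my $first  = ! $seen{bar}++;   # bar not seen yet  -> TRUE
my $second = ! $seen{bar}++;   # bar already seen  -> FALSE
say "first:  ", ($first  ? "TRUE" : "FALSE");
say "second: ", ($second ? "TRUE" : "FALSE");
say "times bar was tested: $seen{bar}";   # 2
```

As a side effect, after the loop %seen holds how many times each value occurred, not just whether it was seen.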
Filtering duplicate values using grep
The grep function in Perl is a generalized form of the well-known grep command of Unix. It is basically a filter. You provide a list on the right-hand side and an expression in the block. grep takes the values of the list one by one, sets $_ (the default scalar variable of Perl) to the current value, and then executes the block. If the block evaluates to TRUE, the value is passed through to the result. If the block evaluates to FALSE, the current value is filtered out. That's how we arrive at this expression:
my %seen;
my @unique = grep { !$seen{$_}++ } @words;
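To see that the uniq expression is just an ordinary grep filter, compare it with a plain one. In this sketch the regex filter and the @b_words name are illustrative additions. The second grep is the expression above, run on the article's sample data.

```perl
use strict;
use warnings;
use 5.010;

my @words = qw(foo bar baz foo zorg baz);

# A plain filter: keep the words that start with "b" (the regex matches $_).
my @b_words = grep { /^b/ } @words;
say "@b_words";                    # bar baz baz

# Same mechanism, but the block also updates %seen,
# so only the first occurrence of each value passes.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;
say "@unique";                     # foo bar baz zorg
say "foo seen $seen{foo} times";   # foo seen 2 times
```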
Wrapping it in do
The last little step is to wrap the above two statements either in a do block, so that %seen is only visible inside the block,
my @unique = do { my %seen; grep { !$seen{$_}++ } @words };
or, better yet, in a function with an expressive name:
sub uniq {
my %seen;
return grep { !$seen{$_}++ } @_;
}
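Because my %seen is declared inside the subroutine, every call starts with a fresh, empty hash. The sketch below checks this and also shows that the do block gives the same result. The calls with @first and @second are illustrative. The last line relies on standard Perl behavior: return passes the caller's context to grep, and grep in scalar context returns the number of elements that passed.

```perl
use strict;
use warnings;
use 5.010;

sub uniq {
    my %seen;                          # a new, empty hash on every call
    return grep { !$seen{$_}++ } @_;
}

my @first  = uniq(qw(foo bar foo));
my @second = uniq(qw(foo baz foo));    # foo is kept again: %seen was reset
say "@first";                          # foo bar
say "@second";                         # foo baz

# The do-block form; this %seen exists only inside the block.
my @words  = qw(foo bar baz foo zorg baz);
my @unique = do { my %seen; grep { !$seen{$_}++ } @words };
say "@unique";                         # foo bar baz zorg

# In scalar context this uniq returns the number of distinct values.
say scalar uniq(@words);               # 4
```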
Exercise
Given the following file, print out the unique values:
input.txt:
foo
Bar
bar
first
second
Foo
foo
another
foo
Expected output:
foo
Bar
bar
first
second
Foo
another
Exercise 2
This time, filter out duplicates regardless of case.
Expected output:
foo
Bar
first
second
another
Perl tutorial and video course
For further articles see the Beginner Perl Maven tutorial book and video course.