2012/04/06

use Perl; Guide to references: Part 1

Understanding references and their subtleties is one of the more difficult parts of Perl to fully wrap one's head around. However, once a blossoming developer fully understands them, a whole new level of capability and power opens up to exploit and explore.

I often see newer programmers struggle with the concept of references on the Perl help sites I frequent. Some still have a ways to go, but many are at the stage where perhaps one more tutorial may push them over the edge and give them that 'Ahhhh' moment of clarity. My moment of clarity came when I read Randal Schwartz's "Learning Perl Objects, References & Modules" book for something like the 8th time. Even once the concept of references is understood, though, the syntax and use cases can remain confusing for quite some time, especially in Perl, because There Is More Than One Way To Do It.

This tutorial is the first in a five-part series. This part focuses on the basics, preparing you for the more complex uses covered in the following four parts. I've created a cheat sheet that summarizes what you'll learn in this document.

  • Part 1 - The basics (this document)
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures
  • Part 4 - Code references
  • Part 5 - Concepts put to use

I will stick with a single consistent syntax throughout the series, and will refrain from using one-line shortcuts and other simplification techniques in loops and other structures, in the hope of keeping confusion to a minimum. Part one assumes that you have a very good understanding of the Perl variable types, when they are needed, and how they are used. Some exposure to references may also prove helpful, but shouldn't be required.

If you find anything in this document that could use improvement or further clarity, or if you have any questions, please feel free to leave feedback in the comments section below, or send me an email.

THE BASICS

A reference in Perl is nothing more than a scalar variable that, instead of containing a usable value, 'points' to a different variable. When you perform an action on a reference, you are actually performing the action on the variable that the reference points to. A Perl reference is similar to a shortcut to a file or program on your computer. When you double click the shortcut, the shortcut itself doesn't open; the file that the shortcut points to does.

We'll start with arrays, and I'll get right into the code.

We'll define an array as normal, and then print out its contents.

use feature 'say';    # say() requires Perl 5.10 or later

my @array = ( 1, 2, 3 );

for my $elem ( @array ){
    say $elem;
}

Prepending a backslash to the array is how we take a reference to it, which we then assign to a scalar. The scalar $aref is now a reference that points to @array.

my $aref = \@array;

At this point, if you tried to print out the contents of $aref, you would get the type of thing being pointed to (ARRAY) followed by its memory location. You know you have a reference if you ever try to print a scalar and you get output like the following:

ARRAY(0x9bfa8c8)
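
If you are ever unsure whether a scalar holds a reference, Perl's built-in ref function can tell you. It returns the type of thing being pointed to as a string, or an empty string if the scalar is not a reference at all. Here is a minimal sketch; the variable names are only for illustration:

```perl
use strict;
use warnings;
use feature 'say';

my @array = ( 1, 2, 3 );
my $aref  = \@array;
my $plain = 42;

# ref() returns the type of the item a reference points to
my $aref_type  = ref $aref;     # 'ARRAY'
my $plain_type = ref $plain;    # '' (empty string, not a reference)

say "aref points to: $aref_type";

if ( $plain_type eq '' ){
    say "plain is not a reference";
}
```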

Before we can use the array the reference points to, we must dereference the reference. To gain access to the array and use it as normal, we use the array dereference operator @{}. Put the array reference inside the dereference braces and we can use the reference just as if it were the array itself:

for my $elem ( @{ $aref } ){
    say $elem;
}
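
The dereferenced form can be used anywhere the array itself could be, including with functions that modify the array. As a small sketch (the value 4 is arbitrary), here push adds an element through the reference, and the original array grows:

```perl
use strict;
use warnings;
use feature 'say';

my @array = ( 1, 2, 3 );
my $aref  = \@array;

# push a new element onto the array through the reference
push @{ $aref }, 4;

# the original array has grown, because the reference points to it
my $count = scalar @array;
say "elements: $count";

for my $elem ( @{ $aref } ){
    say $elem;
}
```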

The standard way of assigning an individual array element to a scalar:

my $x  = $array[0];

To access individual elements of the array through the reference, we use a different dereference operator, the arrow (->):

my $y = $aref->[1];

Assign a string to the second element of the array in traditional fashion:

$array[1]  = "assigning to array element 2";

To do the same thing through an array reference, we dereference it the same way we did when we were taking an element from the array through the reference:

$aref->[1] = "assigning to array element 2";

You just learnt how to take a reference to an array (by prepending the array with a backslash), how to dereference the entire array reference by inserting the reference within the dereference block @{}, and how to dereference individual elements of the array through the reference with the -> dereference operator. That is all there is to it. Hashes are extremely similar. Let's look at them now.

Create and initialize a normal hash, and iterate over its contents:

my %hash = ( a => 1, b => 2, c => 3 );

while ( my ( $key, $value ) = each %hash ){

    say "key: $key, value: $value";
}

Take a reference to the hash, and assign it to a scalar variable:

my $href = \%hash;

Now we'll iterate over the hash through the reference. To access the hash, we must dereference it just like we did the array reference above. The dereference operator for a hash reference is %{}. Again, just wrap the reference within its dereferencing block:

while ( my ( $key, $value ) = each %{ $href } ){

    say "key: $key, value: $value";
}

Access an individual hash value:

my $x = $hash{ a };

Access an individual hash value through the reference. The dereference operator for accessing individual elements of a hash through a reference is the same one we used for an array (->).

my $y = $href->{ a };
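
Keep in mind that each hands back the key/value pairs in no particular order. If a predictable order matters, one option is to sort the keys and then look up each value through the reference. A minimal sketch, where @sorted_keys is just an illustrative name:

```perl
use strict;
use warnings;
use feature 'say';

my %hash = ( a => 1, b => 2, c => 3 );
my $href = \%hash;

# sort the keys so the output order is the same on every run
my @sorted_keys = sort keys %{ $href };

for my $key ( @sorted_keys ){
    my $value = $href->{ $key };
    say "key: $key, value: $value";
}
```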

Assign a value to hash key 'a':

$hash{ a }  = "assigning to hash key a";

Assign a value to hash key 'a' through the reference:

$href->{ a } = "assigning to hash key a";
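
The same arrow syntax works with Perl's other hash built-ins. As a short sketch, here is how exists and delete might be used through the reference; the key 'b' is chosen arbitrarily:

```perl
use strict;
use warnings;
use feature 'say';

my %hash = ( a => 1, b => 2, c => 3 );
my $href = \%hash;

# check for a key through the reference
if ( exists $href->{ b } ){
    say "key b exists";
}

# delete the key through the reference
delete $href->{ b };

# the original hash has lost the key
my $remaining = scalar keys %hash;
say "keys remaining: $remaining";
```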

Those are the basics of taking a reference to something, and then dereferencing the reference to access the data it points to.

When we operate on a reference, we are essentially operating on the item being pointed to directly. Here is an example that shows, in action, how operating directly on the item has the same effect as operating on the item through the reference.

my @b = ( 1, 2, 3 );
my $aref = \@b;

# assign a new value to $b[0] through the reference

$aref->[0] = 99;

# print the array

for my $elem ( @b ){
    say $elem;
}

Output:

99
2
3

As you can see, the following two lines are equivalent:

$b[0] = 99;
$aref->[0] = 99;
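
By contrast, assigning one array to another makes a separate copy, and changes to the copy never reach the original. The sketch below (with @copy as an illustrative name) puts the two side by side:

```perl
use strict;
use warnings;
use feature 'say';

my @b    = ( 1, 2, 3 );
my @copy = @b;     # a separate array holding copies of the values
my $aref = \@b;    # a reference to the original array

$copy[0]   = 'changed in copy';
$aref->[1] = 'changed through reference';

# only the change made through the reference shows up in @b
say "b:    @b";
say "copy: @copy";
```

Output:

```
b:    1 changed through reference 3
copy: changed in copy 2 3
```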

CHEAT SHEET

Here's a little cheat sheet for review before we move on to the next part in the series.

my @a = ( 1, 2, 3 );
my %h = ( a => 1, b => 2, c => 3 );

# take a reference to the array
my $aref = \@a;

# take a reference to the hash
my $href = \%h;

# access the entire array through its reference
my $elem_count = scalar @{ $aref };

# access the entire hash through its reference
my $keys_count = keys %{ $href };

# get a single element through the array reference
my $element = $aref->[0];

# get a single value through the hash reference
my $value = $href->{ a };

# assign to a single array element through its reference
$aref->[0] = 1;

# assign a value to a single hash key through its ref
$href->{ a } = 1;

This concludes Part 1 of our Guide to Perl references. My goal was not to compete with all the other reference guides available, but instead to complement them, with the hope that perhaps I may have said something in such a way that it helps further even one person's understanding. Next episode, we'll learn about using references as subroutine parameters.

Update: An astute reader sent me an email after noticing that this tutorial does not mention scalar references at all. This was a design choice. I didn't feel they justified the extra space needed to explain them, as they are very rarely used. They do exist though :) Thanks Asbjørn Thegler for the kind email!

7 comments:

  1. Very nice Steve, I will recommend to my friends.

  2. I wish I would have had this concise and clear tutorial a few months ago! It's great.

  3. Thank you for the very kind comments Anonymous!

  4. Is "my $x = $array[0];" supposed to be @array instead?

  5. And is $array[1] = "assigning to array element 2"; supposed to also be "@array"? This is getting really confusing

  6. Anonymous: No on both counts :)

    When accessing individual items of a list variable (array/hash), you must use the scalar sigil. Only when using the entire list do you use its list sigil, such as when you are initially assigning to it. Just remember: any time you are accessing a single item, no matter what it is, you must use $.
