2012/04/07

use Perl; Guide to references: Part 4

This is part four in my five part series on Perl references. In this post, we will be discussing code references (coderef, or just cref), some of the benefits they provide, and some interesting use cases, including closures and dispatch tables. If you haven't already, you may want to review the other parts in the series:

  • Part 1 - The basics
  • Part 2 - References as subroutine parameters
  • Part 3 - Nested data structures
  • Part 4 - Code references (this document)
  • Part 5 - Concepts put to use

As with all of the other parts in the series, I request that you leave corrections, criticisms, improvements, additions, questions and requests for further clarity in the comments section below, or in an email.

CODE REFERENCES

A code reference in Perl is no different than any of the other references we've discussed in the previous episodes, but instead of pointing to a data variable, the ref points to a subroutine. You take a reference to a subroutine the same way you take a reference to anything else:

sub hello {
    say "Hello, world!";
}

my $cref = \&hello;

The & sigil represents a sub, and it is needed when we take the reference. As with taking the ref, using the ref is the same as before as well. We must use the -> deref operator to access the item the reference points to.

# use an aref
$aref->[ 0 ];

# use an href
$href->{ a };

# use a cref
$cref->();

We can also assign an anonymous sub to a cref in cases where we don't necessarily have to define the function with a name:

my $cref = sub { say "Hello, world!"; }

Now that we have that out of the way, lets move on to some practical and interesting uses for code references.

CLOSURES

The most common type of closure is a sub that returns a reference to an inner sub. They are often used in Object Oriented Programming (OOP) (which is outside the scope of this tutorial) to keep state data. State data is data that persists after the program has exited the scope in which the data was defined. I can explain it better with some code:

sub persist {

    my $count = 0;
    return sub { say $count++; }
}

my $count_cref = persist();

$count_cref->();
$count_cref->();
$count_cref->();

First, we define a subroutine named persist(). Inside that sub we define a lexical variable $count (a lexical variable is one that can not be seen outside the scope of the block it is declared in. In this case, nothing outside of persist() can see the $count variable). After defining $count, we create an anonymous sub that prints the result of $count, and then adds one to it. We then call persist(), assigning its return value to $count_cref. The return of persist is a reference to the anonymous subroutine.

Because $count_cref points to the inner anonymous sub returned from persist() and not to persist() itself, the $count variable is never reset, and the sub that $count_cref points to will always keep its own version of $count, incremented each time the anon sub is executed through the reference.

To show how the $count variable retains its value as long as $count_cref is alive, here is the output from the above code snip:

0
1
2

Closures aren't only handy for OOP. We can use the same persist() sub to create multiple counters.

sub persist {

    my $count = 0;
    return sub { $count++; }
}

my $count_a_cref = persist();
my $count_b_cref = persist();
my $count_c_cref = persist();

say "Count A: " . $count_a_cref->();
say "Count A: " . $count_a_cref->();
say "Count B: " . $count_b_cref->();
say "Count B: " . $count_b_cref->();
say "Count B: " . $count_b_cref->();
say "Count C: " . $count_c_cref->();

Output:

Count A: 0
Count A: 1
Count B: 0
Count B: 1
Count B: 2
Count C: 0

Calls to the individual cref do not affect the state variables of the other cref state variables.

Here's an example that shows a more practical case where closures with state variables could be useful. If you're thinking that globals would do the trick here, you're right; that isn't the point of this tutorial though ;). I'm sticking with simple here. In my fifth and final installment, we'll write something far more realistic that brings all aspects of the series together.

sub write_line {

    my $count = 0;
    return sub { return ++$count; }
}

# call the function twice, each time receiving
# a separate anonymous sub, along with separate
# state variables

my $steve_lines = write_line();
my $sarah_lines = write_line();

# steve writes two lines of code

my $steve_total;
$steve_total = $steve_lines->();
$steve_total = $steve_lines->();

# sarah writes one

my $sarah_total;
$sarah_total = $sarah_lines->();

say "Steve wrote $steve_total lines of code";
say "Sarah wrote $sarah_total lines of code";

Output:

Steve wrote 2 lines of code
Sarah wrote 1 lines of code

As an aside: In the preceeding example, I had to declare the sub prior to using it. To understand why and how to get around that, see my Purpose and practical use of Perl's named blocks post.

Closures aren't limited to being returned from outer subs though. Any function that can return an inner anonymous sub that can contain its own lexical data can be used to create a closure. Here's an example:

my %h;
for my $color ( qw(red green blue) ){
   $h{$color} = sub { say $color };
}

$h{ blue }->();
$h{ red }->();
$h{ green }->();

In that example, we iterate over three colours. For each colour, we set a hash key as the colour and set that key's value as an anonymous sub that when called, prints the colour. The following three lines execute the closures. Although we could have, we didn't use any lexical data to keep track of anything. Also note that this code auto generated a dispatch table, which we are going to learn about next.

DISPATCH TABLES

Dispatch tables are hashes who's key's values are references to subroutines. It is like a table of contents that allows you to execute code through the hash keys.

my %dt = (
            hello => sub { say "Hello, world!"; },
            add   => \&add,
        );

# call the functions

my $more = $dt{ add }->( 5, 5 );
$dt{ hello }->();

sub add {

    my $x = shift;
    my $y = shift;
    return ( $x + $y );
}

First we define the dispatch table hash. The first key has a value of an anonymous sub. The second key contains a cref that points to the add() sub. This shows how short, one-line type subs can be housed within the dispatch table. The add sub has been defined to take two parameters. When we call the add sub, you can see how we call the sub through the hash key, which executes the sub through the cref it contains as its value. We then insert the parameters as normal.

An example of the benefits of a dispatch table is a menu system, where a user must select from a range of options. You give the user a list of options to select from, and in your dispatch table, you name your keys as the options you provided the user with. Each option to the user is dropped directly into the key field of the hash, and the subsequent subroutine runs.

In the following example, we have three operations the user can perform where the hash value is a cref to an external sub. The fifth option, exit, is short and simple, so we create it as an anonymous sub within the table itself. A couple sanity checks to ensure the input is legal, and running the correct operation is as simple as putting the users input into the dispatch table.

Here is a fully working menu program based on the concept of a dispatch table. I've kept it as simple and as basic as possible for clarity.

#!/usr/bin/perl

use warnings;
use strict;
use 5.10.0;

my %dt = (
            add      => \&add,
            subtract => \&subtract,
            multiply => \&multiply,
            'exit'   => sub { say "\nGoodbye!\n"; exit; },
        );
    
while (1) {

    system( "clear" );
    
    print "Please enter either add, subtract, multiply or exit: ";
    chomp ( my $operation = <STDIN> );

    # exit if told to

    $dt{ $operation }->() if $operation eq 'exit';

    # exit if illegal param

    if ( ! exists $dt{ $operation } ){
        say "\nIllegal input... exiting\n";
        exit;
    }
    
    print "Type in your first number: ";
    chomp ( my $x = <STDIN> );

    print "Type in your second number: ";
    chomp ( my $y = <STDIN> );

    # run the command selected by the user

    my $result = $dt{ $operation }->( $x, $y );

    say "\nPerforming $operation on $x and $y = $result\n";

    print "\nPress ENTER to continue...\n";
    <STDIN>;
    
}
sub add {
    my ( $x, $y ) = @_;
    return $x + $y;
}
sub subtract {
    my ( $x, $y ) = @_;
    return $x - $y;
}
sub multiply {
    my ( $x, $y ) = @_;
    return $x * $y;
}

Hopefully that simplistic example was enough to at least give you an idea of what dispatch tables could be capable of.

That's it for this episode, thanks for reading. In my next and last post in the series, we'll bring everything together in a single program that utilizes most of the concepts of what we have learnt throughout.

Update: Thanks to maximum-solo for pointing out that I had limited my definition of closures to only a single use case, and for the example code that returns closures from a for loop.

Update: Thanks to Jay Scott for sending typographical and grammatical corrections, and for numerous logical code description fixes.

2 comments:

  1. I don't agree with your "closure" definition. You are right but the closures are not limited to the case you describe.

    A closure is an anonymous function that use non-local variables of that function.

    For example, the following code generates 3 closures:

    my %h;
    for my $color (qw(red green blue))
    {
    $h{$color} = sub { say $color };
    }

    ReplyDelete
  2. You're absolutely right. The wording I've used makes it seem that closures are limited to being a sub returned from another sub. I will correct my wording and add your example. Thanks!

    ReplyDelete