stevieb's tech musings: ironman

In the 10 years I've been programming Perl off and on, I've heard a fair amount about Perl 6. There are those who love it, and those who dislike (fear?) it. For me, I had always wanted to look further into it but never found the time. Don't get me wrong, I absolutely love Perl 5, and will likely be using it until we see the day that it fades into the same level of obscurity that some of my code resembles.

Over the last couple of weeks, I've been constantly tempted to follow the Perl6 link in moritz's PerlMonks signature. Yesterday I broke down and decided to see what colour I wanted my bikeshed. Here are a few of the really interesting differences I've found so far.

In this post, I'll touch on strict, sigils, how variables are objects and have methods, types, and a bit on control structures. In a couple of following posts, I'll describe the basics of other changes, and then get into more advanced aspects of the new language. In my last post on the subject, once I'm comfortable enough to convert as much as possible from 5 to 6, I'll include the code of one of my short Perl 5 modules translated into Perl6.

STRICT

Out of the box, the first really nice feature is that strict is enabled by default.

% cat no_strict.pl

#!/home/steve/perl6/perl6
say $hello;

Output:

% ./no_strict.pl

===SORRY!===
Variable $hello is not declared
at ./no_strict.pl:2

SIGILS

In Perl6, variables retain their sigils regardless of what operation you perform on them. To access an element of an array or the value of a hash in Perl 5, you have to use the scalar sigil to signify that you intend to access a single value:

Perl 5 way:

my @a = qw( 1 2 3 );
say $a[0];

my %h = ( key => 'value' );
say $h{ key };

But in Perl6:

my @a = 1, 2, 3;
say @a[0];

my %h = 'key' => 'value';
say %h{ key };

Output:

===SORRY!===
CHECK FAILED:
Undefined routine '&key' called (line 7)

Uh oh! What happened? The array portion of the code is fine, but the hash lookup broke. In Perl6, hash keys inside curly braces are not automatically quoted the way they are in Perl 5. Instead of retrieving the value, Perl6 treats key as a call to the sub key() and expects it to return the name of the key to use. The proper way to access a hash value by key is:

# the old faithful

say %h{ 'key' };

# or the new auto-quote syntax

say %h< key >;
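
For contrast, here is a small Perl 5 sketch of the auto-quoting behaviour that Perl6 dropped. The sub key() and the extra 'other' hash entry are made up purely for illustration. In Perl 5 a bareword inside the braces is treated as a string, so the sub is only called when you explicitly ask for it with parens:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sub that happens to share its name with a hash key.
sub key { return 'other' }

my %h = ( key => 'value', other => 'surprise' );

# Perl 5 auto-quotes the bareword, so this looks up the string 'key'.
my $quoted = $h{ key };

# Adding parens forces a call to key(), which returns 'other'.
my $called = $h{ key() };

say $quoted;    # value
say $called;    # surprise
```

Perl6 effectively always takes the second route inside curly braces, which is why the bareword version above failed.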

VARIABLES ARE OBJECTS (and have methods)

Here are a few examples of the new variable object methods in action, alongside their corresponding Perl 5 syntax (much of which still works in Perl6). I'll show a few examples of arrays first, then hashes. Also worth noting is the lack of parens around the array elements in the definition. Surrounding the elements in parens is still valid, but qw() with parens is gone; word quoting is now written < 1 2 3 >.

Variable methods, arrays

my @a = 2, 3, 1;

# number of array elements

say @a.elems;
say scalar @a;

# sort array

say @a.sort;
say sort @a;

# map array

say @a.map({ $_ + 10 });
say map { $_ + 10, ' ' } @a;

# or even

say @a.sort.map({ $_ + 10 });
say map { $_ + 10, ' ' } sort @a;

I found an interesting difference while building those code examples. In Perl 5:

perl -E 'my @a=qw( 1 2 3 ); my $x=@a; say $x'
3

...but in Perl6:

perl6 -e 'my @a=1,2,3; my $x=@a; say $x'
1 2 3

However, using the array in numeric comparisons evaluates the array as its number of elements:

perl6 -e 'my @a=1,2,3; say "ok" if @a == 3'
ok

Variable methods, hashes and their Perl 5 syntax counterparts

my %h = z => 26, b => 5, c => 2, a => 9;

say %h.keys;
say $_ for keys %h;
# could also be written as:
say keys %h; # but the spacing is different in 5

say %h.values;
say $_ for values %h;

say %h.keys.sort;
say $_ for sort keys %h;

Note: Most of the variable object methods also still act as functions, so the following are equivalent:

say %h.keys;
say keys %h;

EVERYTHING IS AN OBJECT, AND HAS A TYPE (and can optionally be constrained)

To give an extremely clear example of how everything is an object and has a type, before I get further into how types are handy, I'll use some syntax that I tried and was surprised to find worked. The WHAT method, when called on something, tells you its type.

# calling methods on literals w00t! :)

say 25.WHAT;
say 'string'.WHAT;
say (1,2,3).WHAT;

Output:

Int()
Str()
Parcel()

We can do simple type checking:

my $quote = "I am liking Perl6";

if $quote ~~ Str {
    say "it's a string";
}

Note the lack of parens again, this time in the if condition. More on this shortly. For now, just know that parens can still be used, but there are gotchas, so it is recommended that you leave them out.

Constraining variables to certain types is also easy.

# define $x as an Int
my Int $x = 5;

# try to assign it a string
$x = "Hello, world!";

Output:

Type check failed in assignment to '$x'; expected 'Int' but got 'Str'
  in block  at ./types.pl:15

Types have an inheritance hierarchy; for example, an Int is a kind of Numeric. I am not too familiar with the hierarchy yet, and I'll update this post as I learn more.

CONTROL STRUCTURES

I briefly touched on using parens with the if statement above. Take this example:

my $x = 5;
if($x < 10) {
    # do stuff
}

In Perl6, an opening paren that sits directly against the keyword with no whitespace, as in the example above, tells the parser to try to call a function named 'if'. This is true for all of the control structures (if, while, for etc). If you leave at least one whitespace character between the keyword and the opening paren, things will work as normal. However, to protect against mistakes, it is advised you omit the parens entirely. Here are some interesting changes:

In Perl 5, for the most part, we'd use named lexicals in our for loops like this:

for my $elem ( @a ){ say $elem; }

In Perl6, to avoid use of $_, we use a "pointy block":

for @a -> $elem { say $elem; }

Because I've been testing each code snippet before pasting it into this document, I of course checked that my Perl 5 for() example was written correctly, lest my eyes miss something. Run against perl6, the typical Perl 5 for structure above gave me this output:

===SORRY!===
This appears to be Perl 5 code
at ./control.pl:15

So it looks like the pointy block is the way forward. Another note about for: it is now only used for iterating over lists. Perl6 moved the C-style for loop into a separate loop construct.

It is now possible to use more than one loop variable:

for @a -> $first, $second, $third { 
    say "$first, $second, $third: I'm greedy on each iteration!"; 
}

Or iterate over a hash without a while/each:

for %h.kv -> $k, $v {
    say "$k :: $v"
}
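
For reference, the Perl 5 while/each idiom that the .kv loop replaces looks roughly like this. It is a minimal sketch using the same hash contents as earlier; the @pairs array is an illustrative helper that only exists so the result can be sorted and inspected, since hash order is not guaranteed:

```perl
use strict;
use warnings;
use feature 'say';

my %h = ( z => 26, b => 5, c => 2, a => 9 );

# Collect "key :: value" strings; @pairs is a made-up helper array.
my @pairs;
while ( my ( $k, $v ) = each %h ) {
    push @pairs, "$k :: $v";
}

# Hash order is not defined, so sort before printing.
say for sort @pairs;
```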

While I was throwing out the use of the kv() method, one of my readers who opted to remain Anonymous pointed out a fantastic feature that I had missed. kv() can be used against arrays as well as hashes. When used against an array, the key is the index number of the array, and the value is the contents of the element. How many times have you sighed at the fact that you have to declare an iteration scalar prior to a for() loop, and then waste another line increasing it upon each loop? No more! The first potential use case that came to me after reading Anon's comment was using the index of the array and the element to create a hash:

my @a = 'a', 'b', 'c';
my %h;

for @a.kv -> $index, $elem {
    %h{ $index } = $elem;
}

There are so many cases I can think of where we can benefit from not having to define "$i = 0;" and then add a second line, "$i++". Two lines saved. If you are like me, you dislike using variables for temporary assignments.
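
For comparison, the Perl 5 bookkeeping described here looks something like the following sketch. The variable names simply mirror the Perl6 example; the counter $i is the temporary that the kv() version makes unnecessary:

```perl
use strict;
use warnings;
use feature 'say';

my @a = ( 'a', 'b', 'c' );
my %h;

# The counter has to be declared before the loop...
my $i = 0;
for my $elem (@a) {
    $h{$i} = $elem;
    # ...and bumped by hand on every pass.
    $i++;
}

say "$_ => $h{$_}" for sort keys %h;
```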

Thanks for reading. I hope you enjoyed my little beginning venture into the world of Perl6.

For the most part, the resource I'm using to base my tests and code on can be found here.

20120405

Update 20120406: Thanks to Daniel Ruoso for clarifying what really happens when a bareword is used as a hash key.

Update 20120410: Thanks to a kind Anonymous reader who pointed out that I missed that kv() could be called against an array, and the details of its implementation. They also provided sample code to describe a use-case, which I used as a baseline to create my array kv() example. Cheers!

This post attempts to explain the subtleties of Perl's five named blocks. You'll learn which phase of operation each one runs in, the order of execution, and why each may be needed, with code examples.

The sample code assumes perl version 5.10 or higher, with say enabled (for example via "use 5.010;" at the top of each script).

Perl's five named blocks (in order of execution) are BEGIN, UNITCHECK, CHECK, INIT and END. We'll begin with BEGIN :)

BEGIN: These blocks are executed during compilation, as soon as the definition of the block is complete.

Making sure that lexical state data sharing an outer block with a subroutine is initialized before the subroutine is first called is a perfect example of where a BEGIN block makes sense. Here is what happens if the sub uses its state data before the data is defined in normal program flow (hint: it is reset when the enclosing block is finally executed):

Code:

persist();
{
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0
0
1

By using the BEGIN block, the inner code is defined during compile, so it is available and ready to use before runtime even starts. We can now safely call persist() as many times as we like regardless of the layout of the code, and the state variable will never be reset.

Code:

persist();
BEGIN {
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0
1
2

NOTE: INIT blocks will perform the same task as the BEGIN block in this case, but INIT is performed at the beginning of runtime as opposed to during compile time. Although INIT could be used here, it is more common to see BEGIN used. BEGIN is only *mandatory* when you need to execute code prior to runtime starting, eg. before any other files or modules are imported. See the INIT section below for an example case where INIT *must* be used.

END: This block will execute as late as possible, after the rest of the program has finished running, even if the program is exiting because of die(). For instance, if I need the program to write to a log file whether the program fails or not, I could use an END block to ensure this happens.

Code:

say "Doing work";

other_work();

# write that we've finished (never reached, because other_work() dies)

write_log();

sub other_work {
    say "Doing other work";
    die() if 1; # fatal error!
}

# named write_log() because write() is a Perl built-in
sub write_log {
    # '>>' appends, so earlier log entries are kept
    open my $fh, '>>', 'file.log' or die "Can't open file: $!";
    say $fh "Program run at " . time();
}

END {
    # inside END, $? holds the status the program is about to exit with
    open my $fh, '>>', 'file.log' or die "Can't open file: $!";
    say $fh "Program exited with status $? at " . time();
}

Because the program terminates via die() before write_log() is called, the "Program run at" line is never written, so from that line alone we can't tell whether the program ran today or not. Since we need a record that the program ran even when it exits prematurely, we use an END block to ensure this. END blocks run on a normal exit and after die(), but not if the program is killed by an uncaught signal or replaces itself via exec().

INIT/CHECK/UNITCHECK: Like BEGIN and END, these run a block of code automatically, but they are executed during different phases, and in different orders.

UNITCHECK: is executed (in reverse order of definition) right after the compilation unit that defines it has been compiled. The main program file and each file loaded with use() or require() is its own compilation unit. I suppose this would be used if one needed to change the environment in steps to set things up before the next file is loaded. I've never seen it used.

CHECK blocks run in reverse order immediately after all of the code (both use()d code and main code) is compiled. I have read that CHECK blocks are used specifically by people writing and working on the insides of compilers, but don't quote me.

INIT runs code after compilation but before the execution of the code, so realistically, it would be the choice to run what I have up in my BEGIN example above, because I didn't need the code in that example during compilation. However, most coders are more familiar with seeing the use of BEGIN blocks.
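
To see the phases in sequence, here is a minimal sketch that records the name of each block as it fires. The @order array is a made-up helper, and the blocks are deliberately written out of order to show that their position in the file does not decide when they run:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical package array used only to record the sequence.
our @order;

# Ordinary runtime code.
push @order, 'main';

END       { push @order, 'END'; say join ' ', @order }
INIT      { push @order, 'INIT' }
CHECK     { push @order, 'CHECK' }
UNITCHECK { push @order, 'UNITCHECK' }
BEGIN     { push @order, 'BEGIN' }
```

Run as a script (not loaded at runtime with require or eval), this should print "BEGIN UNITCHECK CHECK INIT main END".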

There are however distinct situations where INIT must be used instead of BEGIN. If the code within the BEGIN block calls a sub that has not been compiled yet when the BEGIN block runs (i.e. one defined further down the file), compilation will fail. e.g.:

BEGIN {
    my $store = init_store();
    sub persist {
        say $store++;
    }
}

sub init_store { 0; }

Output:

Undefined subroutine &main::init_store called at ./init.pl line 10.
BEGIN failed--compilation aborted at ./init.pl line 14.

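The state data cannot be initialized within the BEGIN block, because the BEGIN block runs as soon as the compiler reaches its closing brace, and at that point init_store(), which is defined further down the file, has not been compiled yet. Remember that BEGIN blocks are executed during compilation, prior to the program running. An INIT block must be used in this case, because by the time it runs the whole file has been compiled.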
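
A minimal sketch of the INIT version follows. It is the same layout as the failing example with BEGIN swapped for INIT; the only other change, made purely so the result is easy to inspect, is that persist() returns the value instead of printing it:

```perl
use strict;
use warnings;
use feature 'say';

# INIT runs after the whole file is compiled, so init_store() exists by then.
INIT {
    my $store = init_store();
    sub persist {
        return $store++;
    }
}

sub init_store { return 0; }

say persist();    # 0
say persist();    # 1
```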

See perldoc perlmod

Thanks to JavaFan for the INIT section code example.

stevieb's tech musings

2012/04/05

use Perl6; A few very welcome changes in Perl5++

2012/04/03

use Perl; Purpose and practical use of the built-in named blocks