2012/04/03

use Perl; Purpose and practical use of the built-in named blocks

This post attempts to explain the subtleties of Perl's five named blocks. You'll learn during what phase of operation each one operates at, the order of execution, and the reason they may be needed, including code examples.

The sample code assumes perl version 5.10 or higher.

Perl's five named blocks (in order of execution) are BEGIN, UNITCHECK, CHECK, INIT and END. We'll begin with BEGIN :)

BEGIN: These blocks are executed during compilation, as soon as the definition of the block is complete.

Ensuring lexical state data that shares an outer block with a subroutine is a perfect example of where a BEGIN block makes sense. Here is an example of what happens if the sub uses its state data prior to the data being defined in normal program flow. (hint: it is reset when it is redefined):

Code:

persist();
{
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0 0 1

By using the BEGIN block, the inner code is defined during compile, so it is available and ready to use before runtime even starts. We can now safely call persist() as many times as we like regardless of the layout of the code, and the state variable will never be reset.

Code:

persist();
BEGIN {
    my $store = 0;
    sub persist {
        say $store++;
    }
}
persist();
persist();

Output:

./begin.pl
0 1 2

NOTE: INIT blocks will perform the same task as the BEGIN block in this case, but INIT is performed at the beginning of runtime as opposed to during compile time. Although INIT could be used here, it is more common to see BEGIN used. BEGIN is only *mandatory* when you need to execute code prior to runtime starting, eg. before any other files or modules are imported. See the INIT section below for an example case where INIT *must* be used.

END: This block will execute after all code in the calling stack has finished. For instance, if I need the program to write to a log file no matter if the program fails or not, I could use an END block to ensure this happens.

Code:


say "Doing work";

other_work();

# write that we've finished

write();

sub other_work {
    say "Doing other work";
    die() if 1; #fatal error!
}
sub write {
    open my $fh, '+>', 'file.log' or die "Can't open file: $!";
    say $fh "Program run at " . time();
}

END{
    open my $fh, '+>', 'file.log' or die "Can't open file: $!";
    say $fh "Program failed at " . time();
}

Because the program terminates via die() before the write() function is called, the log file is not updated, therefore we don't know if the program ran today or not. Since we need to know that the program ran regardless of whether it exited prematurely, we'd use an END block to ensure this. END blocks are executed no matter how or why the program terminates.

INIT/CHECK/UNITCHECK: Perform the same tasks as BEGIN or END, but are executed during different phases, and in different orders.

UNITCHECK: is executed during compile (in reverse order) after the successful compilation of each file loaded with a use() statement. I suppose this would be used if one needed to change the environment in steps to set things up before the next file is loaded. I've never seen it used.

CHECK blocks run in reverse order immediately after all of the code (both use()d code and main code) is compiled. I have read that CHECK blocks are used specifically by people writing and working on the insides of compilers, but don't quote me.

INIT runs code after compilation but before the execution of the code, so realistically, it would be the choice to run what I have up in my BEGIN example above, because I didn't need the code in that example during compilation. However, most coders are more familiar with seeing the use of BEGIN blocks.

There are however distinct situations where INIT must be used instead of BEGIN. If the code within the BEGIN block calls code that will not be defined at compile time (ie. outside of any other BEGIN blocks), compilation will fail. eg:

BEGIN {
    my $store = init_store();
    sub persist {
        say $store++;
    }
}

sub init_store { 0; }

Output:

Undefined subroutine &main::init_store called at ./init.pl line 10.
BEGIN failed--compilation aborted at ./init.pl line 14.

The state data can not be defined within the BEGIN block, because the init_store() sub is not known about until runtime. Remember that BEGIN blocks are executed during compilation, prior to the program running. An INIT block must be used in this case.

See perldoc perlmod

Thanks to JavaFan for the INIT section code example.

No comments:

Post a Comment