Perl Programming Style Guide
Brent S.A. Cowgill, B.A. Sc.
Adopting a consistent style of programming can enhance comprehensibility of programs, reduce bugs and ease maintenance. In addition, it reduces the number of decisions to make when writing a particular block of code.
This guide is a statement of my personal programming style. For the most part, I follow these guidelines whenever I develop code. Many of the concepts in this style guide were borrowed from the CPAN perl style guide.
This was done for efficiency, as I discovered that many of the style guidelines I follow are similar to the ones espoused there. I do have my own differences, however.
This style guide is motivated by the following guiding principles:
Standard naming conventions make it easier to pick names for variables and functions, leaving the mind free to solve the more challenging problem of developing the best algorithm.
PACKAGE NAMES
Package::Name -- Mixed case, capitalize word boundaries. No underscores.
Perl informally reserves lowercase module names for "pragma" modules like integer and strict. Other modules should begin with a capital letter and use mixed case, but probably without underscores due to limitations in primitive file systems' representations of module names as files that must fit into a few sparse bytes.
VARIABLE NAMES
$thing otherwise use the name of the base class or data type if nothing is more appropriate.
$num, $min, $max, $cols, $idx (index), $pos, $ofs (offset), $fh (file handle)
$idx, $x_pos, $y_ofs are more readable and can be searched for better than $i, $x, $y
$page_size not $size_of_page
$html_tag
$width_cm
$global, %Global -- All globals must be prefixed by global (for scalars) or Global (for arrays).
$Package::_our_private -- Package variables used internally by a class should be prefixed with _our_.
$lexical_name -- Lexical variables used in functions and methods appear with no special prefix.
$_protected_lexical -- Use _ to prefix a privately scoped lexical used in a closure. see example
$CONST_NAME -- Constants are written all in upper case with _ between words.
$VISIBLE preferred over $NOT_VISIBLE
use constant TABLE_COLS => 42
$scalar_name -- Scalars are written all in lower case with _ between words.
$raArray $rhHash $rcCode $rObject -- References are written in mixed case with a prefix indicating the type of the referent. Each word boundary is capitalized.
$bytes_max -- scalar maximum bytes allowed.
$page_size -- physical size of a page.
$global_debug -- a globally scoped debug flag.
$idx -- a loop index.
$rhConfig -- a Reference to a Hash of CONFIG values.
$rhGlobalConfig -- a Reference to a globally scoped Hash of Config values.
$raItems -- a Reference to an Array of Items.
$rcSort -- a Reference to a Code block to use to SORT.
$rPage -- a Reference to an object blessed as a Page.
$self -- special case - reference to self object.
$aPage -- an alternative for an object reference.
$theSystem -- an alternative for a singleton or monadic object reference.
$_our_rSystem -- in a singleton class, the internal package variable holding the one and only instance of the System class.
@ArrayName %HashName -- Regular and associative arrays are written in mixed case, capitalizing word boundaries.
%MapX_Y
@Fields -- array of fields.
%Files -- files in a directory.
%MapField_Idx -- map of field name to index.
%_our_Properties -- internal class-wide Property settings.
METHOD NAMES
-- internal class-wide Property settings.METHOD NAMES
Init()
function_name()
-- Non-object procedures should be in lower case with _ on word boundaries.$rObj->MethodName()
-- Class and instance methods of classes should be in mixed case with capitals on word boundaries.$rObj->property()
-- Property get/set methods for classes should be all in lower case with _
on word boundaries.$rObj->SetLayout({ ... })
-- Property get/set methods which get/set large structures can be named GetX
or SetX
.is_empty() $rObj->IsEmpty()
-- Boolean methods which check something should be named IsX()
, HasX()
, CanX()
, is_x()
, has_x()
, or can_x()
$rObj->_Private()
-- Internal methods for classes should be prefixed by _.$rObj->OnSave() $rObj->DoCommit()
-- Methods which are meant to be implemented by derived classes should be named OnX()
, HandleX()
or DoX()
.compare_files(...)
-- a procedure to compare two files.$rRect->width()
-- width property get/set for an object.$rPage->IsLandscape()
-- check if the page is in landscape mode.$rDate->FormatString(...)
-- format a date object as a string.$self->_Init()
-- internal initialization method.OnStartTag()
-- inheritable method called when a start tag is seen.DoSave()
-- inheritable method called when object should save itself.HASH KEY NAMES
HASH_KEY_NAME
-- Always use uppercase names for hash keys, so they can be used unquoted with
little chance of clashing with future builtin names in perl._
to indicate a key that is private or computed
and not meant to be inserted by client code.Each programmer will, of course, have his or her own preferences in regards to formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.
Unless the code in the block is only one line, in which case it can all be on the same line.
Braces on their own lines make it easy to hop up and down the file from block to block with the parenthesis match feature of most good editors.
Mark the closing brace of if, while, foreach, for, BEGIN, END and sub blocks with a comment indicating the type of block, especially when several are nested.
    sub function
    {
        if ($true) { print; } # a small one line block
        if (...)
        {
            while (1)
            {
            } # while
        } # if
        else
        {
            ...
        } # else
    } # function()
    mkdir($dir_temp, 0700) or die "$0: can't mkdir $dir_temp: $!";
    chdir($dir_temp)       or die "$0: can't chdir $dir_temp: $!";
    mkdir('tmp', 0777)     or die "$0: can't mkdir $dir_temp/tmp: $!";
tr [abc] [xyz];
    # validate function parameters
    die "error" unless defined($_[0]);
    die "error" unless ref($_[0]) eq 'HASH';

    # reverse the hash so that the values map to the keys
    local $_;
    my %ReversedMap = map { ($_[0]->{$_}, $_) } keys(%{$_[0]});
(ie @{} ?: ->[] map {} grep {} sort {} slices)
You never know when you're going to add another line to the block or need to add some debugging code to the block. When you do, you'll most likely forget to add the semicolon, causing a syntax error (= frustration). Code should always be written so you can insert/delete similar lines without causing syntax errors.
This makes it easy to insert or move logic around in the expression without having to adjust the end of the line above. It's easier to see the start of the line and fix it, than to see the end of the line.
    if ($this > 12
        && $this < 42
        || $this == 18)
    {
        ...
    } # if

    $message = "a long message a long message a long message "
        . "a long message a long message a long message "
        . "a long message a long message a long message ";
grep, sort, map, print.
use
    $x_val = function();
instead of
    $x_val = function;
or
    $x_val = &function;
A function called with a leading & and no parentheses sees the caller's @_ and could behave differently when copied and pasted into another subroutine. see example
Parentheses also make it easy to hop around the line with the parenthesis match key in your editor.
    return print reverse sort by_num values %Map;        # not nice
    return print(reverse(sort by_num (values(%Map))));   # much better
Consider the mental welfare of the person who has to maintain the code after you, and who will probably put parentheses in the wrong place.
    #==========================================================================
    # Major Division (public/private,etc)
    #==========================================================================

    #--------------------------------------------------------------------------
    # Minor Division (function groups)
    #--------------------------------------------------------------------------
Jumping from section to section is easy: search for #=== or #--- (the width of the block is such that the 'End' key puts the cursor in column 76).
A suggested list of Major Divisions for scripts: Module Header, Command Line Parsing, Main Body of Script, Functions
Major Divisions for Classes: Module Header, Module Initialization and Cleanup, Constructor and Destructor, Public Methods, Private Methods
Use the use warnings pragma at all times. Don't use the -w switch on the #! line as it won't be honoured when running in Windows. You may turn warnings off explicitly for particular portions of code with no warnings (or via the $^W variable when running under -w) if you must. You should also always run under use strict or know the reason why not. The use sigtrap and even use diagnostics pragmas may also prove useful.
Write unit tests with the Test module. Code the unit tests before you write the function, then as you implement the function the unit tests begin to succeed and you can ensure you have handled all possible boundary conditions for your function.
Use local $_ whenever you use a foreach, s///, tr/// or other operation using $_, or better yet always bind to a variable. If your function modifies $_ and is called from within a foreach which in turn uses $_, you will destroy the caller's array as a side effect unless you localise $_
see example
Especially if the number is used in more than one place in the code. Use a real constant or just a variable name. In practice, numbers are forbidden to be used in calculations, always use a constant. The only exceptions are 0 and 1. Using named constants in function calls serves to document the function call.
    my ($X, $Y, $WIDTH, $WRAP) = (10, 10, 60, 1);
    print_text($X, $Y, $WIDTH, "message", !$WRAP);
is self documenting as compared to
    print_text(10, 10, 60, "message", 0);
No one enjoys looking at a complicated expression full of numbers, trying to figure out what number means what.
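The same idea can be sketched with real constants from the core constant pragma. This is only an illustration: the constant names and values are assumptions, and print_text() is a hypothetical stand-in (the guide does not define it) that merely formats its arguments so the call site can be read.

```perl
use strict;
use warnings;

# Named constants document each argument at the call site.
# Names and values are illustrative assumptions.
use constant {
    TEXT_X     => 10,
    TEXT_Y     => 10,
    TEXT_WIDTH => 60,
    NO_WRAP    => 0,
};

# Hypothetical stand-in for the print_text() used in the guide.
sub print_text
{
    my ($x, $y, $width, $message, $wrap) = @_;
    return sprintf("(%d,%d) width=%d wrap=%d: %s",
        $x, $y, $width, $wrap, $message);
} # print_text()

print print_text(TEXT_X, TEXT_Y, TEXT_WIDTH, "message", NO_WRAP), "\n";
```

Unlike the scalar variables above, constants created this way cannot be reassigned by accident.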
IO::File or by using an anonymous glob:
    my $fh = do { local *FH; *FH; };  # runs under -w with no warnings
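On Perl 5.6 and later, open can also create the handle in an undefined lexical, which avoids the glob trick altogether. A minimal sketch, assuming a hypothetical helper named read_lines() and a temporary file created only for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# read_lines() is a hypothetical helper, not part of the guide.
sub read_lines
{
    my ($file_name) = @_;

    # three-argument open stores the handle in the lexical $fh
    open(my $fh, '<', $file_name)
        or die "$0: can't open $file_name: $!";
    my @Lines = <$fh>;
    close($fh) or die "$0: can't close $file_name: $!";

    chomp(@Lines);
    return @Lines;
} # read_lines()

# demonstration with a temporary file that is removed at exit
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "first\nsecond\n";
close($tmp_fh) or die "$0: can't close $tmp_name: $!";

print join(',', read_lines($tmp_name)), "\n";
```

Because $fh is an ordinary lexical, it cannot clash with a handle of the same name elsewhere in the program.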
Use the croak function to report errors in parameters passed into functions and die for more serious errors when using those parameters during normal operation. Use Carp::Assert
Avoid $SIG{__DIE__} = \&error; see why
    # die message: name of function, args, system error
    opendir($dh, $dir) or die( (caller(0))[3] . ": $dir: $!");
open($fh, $file_name) || die "$0: Can't open $file_name: $!";
is better than
die "Can't open $foo: $!" unless open(FOO,$foo);
because the second way hides the main point of the statement in a modifier.
But consider also
use Fatal qw(open);
open($fh, $file_name);
On the other hand
print "Starting analysis\n" if $verbose;
is better than
$verbose && print "Starting analysis\n";
because the main point isn't whether the user typed -v or not.
Similarly, just because an operator lets you assume default arguments doesn't mean that you have to make use of the defaults. The defaults are there for lazy systems programmers writing one-shot programs. If you want your program to be readable, consider supplying the argument.
    LINE:
    for (;;) {
        statements;
        last LINE if $foo;
        next LINE if /^#/;
        statements;
    }
Avoid using grep() (or map()) or `backticks` in a void context, that is, when you just throw away their return values. Those functions all have return values, so use them. Otherwise use a foreach() loop or the system() function instead.
For portability, when using features that may not be implemented on every machine, test the construct in an eval { } to see if it fails. If you know what version or patchlevel a particular feature was implemented, you can test $] ($PERL_VERSION in English) to see if it will be there. The Config module will also let you interrogate values determined by the Configure program when Perl was installed.
If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make it look a little less like line noise. Don't use slash as a delimiter when your regexp has slashes or backslashes.
Use here documents instead of repeated print() statements.
Where it makes sense, remove leading whitespace from your here documents so they don't break up the code so much. see example
use strict and use warnings in effect.
push() or pop() method.
    { # begin a scope to limit access to lexical
        my @_Stack;
        sub push { push(@_Stack, $_[0]); };
        sub pop { pop(@_Stack); };
    } # end of lexical scope
    f(qw(apple orange));

    sub func
    {
        my $x = shift;
        print "x: " . ($x || 'undef') . "\n";
    }

    sub f
    {
        func;     # x: undef
        &func;    # x: apple -- weird
        func();   # x: undef
        &func();  # x: undef
    }
    SWITCH:
    {
        local $_; # just in case you muck with it
        if (/^abc/) { ...; last SWITCH; }
        elsif (/^def/) { ...; last SWITCH; }
        else { die "never"; }
    } # SWITCH

    SWITCH:
    {
        local $_; # just in case you muck with it
        /^abc/ && do { ...; last SWITCH; };
        /^def/ && do { ...; last SWITCH; };
        die "never";
    } # SWITCH

    foreach ($some_long_name_or_expression)
    {
        /abc/ and do { ...; last; };
        /def/ and do { ...; last; };
        die "never";
    } # foreach

    for ($some_long_name_or_expression)
    {
        $value = /red/   ? 0xff0000 :
                 /green/ ? 0x00ff00 :
                 /blue/  ? 0x0000ff :
                           0x000000 ; # black if fail
    } # for
    use Data::Dumper;

    my @Array = qw(THIS IS A TEST);
    foreach (@Array)
    {
        ff();
    }

    sub ff
    {
        while (<STDIN>)
        {
            # @Array is now dead
        }
    }

    print Dumper \@Array;
To fix this, ff() should use local $_ before the while loop, or assign to a lexical with while (my $line = <STDIN>)
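The effect and the lexical fix can be seen side by side in the following sketch. It is an illustration under stated assumptions: the in-memory string $input stands in for STDIN so the example is self-contained, and the names ff_unsafe() and ff_safe() are hypothetical.

```perl
use strict;
use warnings;

# In-memory input stands in for STDIN (an assumption for this sketch).
my $input = "line 1\nline 2\n";

sub ff_unsafe
{
    open(my $fh, '<', \$input) or die "can't open in-memory file: $!";
    while (<$fh>)   # assigns to the global $_, which the caller has aliased
    {
    } # while
    close($fh);
} # ff_unsafe()

sub ff_safe
{
    open(my $fh, '<', \$input) or die "can't open in-memory file: $!";
    while (my $line = <$fh>)   # lexical variable, caller's $_ is untouched
    {
    } # while
    close($fh);
} # ff_safe()

my @Unsafe = qw(THIS IS A TEST);
foreach (@Unsafe) { ff_unsafe(); }

my @Safe = qw(THIS IS A TEST);
foreach (@Safe) { ff_safe(); }

print "unsafe: ", join(' ', map { defined($_) ? $_ : 'undef' } @Unsafe), "\n";
print "safe:   ", join(' ', @Safe), "\n";
```

After the first loop every element of @Unsafe is undefined, because the final read at end of file stores undef in $_, which foreach has aliased to the current array element. @Safe keeps its four words.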
If you use $SIG{__DIE__} = \&error you will have to do:
    eval
    {
        local $SIG{'__DIE__'};
        $SIG{'__DIE__'} = '' if $SIG{'__DIE__'};
        ...
    };
whenever you wish to trap an exception. It's not needed as you can always rewrite your main program as:
    sub main
    {
        eval
        {
            ... main program
        };
        error($@) if ($@);
    } # main()
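A runnable sketch of this pattern follows. The body of error() is an assumption (here it only records the message in @Errors so the result can be inspected), and the die inside the eval stands in for a real main program. The sketch tests the return value of eval rather than $@, a common variation that does not depend on $@ surviving until the check.

```perl
use strict;
use warnings;

my @Errors;   # collected so the sketch has something to inspect

# error() is the handler name used in the guide; this body is an assumption.
sub error
{
    my ($message) = @_;
    chomp($message);
    push(@Errors, $message);
    return;
} # error()

sub main
{
    my $ok = eval
    {
        die "config file missing\n";   # stands in for the main program
        1;                             # reached only if nothing died
    };
    error($@) if (!$ok);
    return $ok ? 0 : 1;
} # main()

my $exit_code = main();
print "exit code: $exit_code, errors: @Errors\n";
```

No $SIG{__DIE__} handler is installed, so any eval elsewhere in the program traps exceptions normally.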
    print dequote(<<"EOF");
        | Attention criminal slacker, we have yet
        | to receive payment for our legal services.
        |
        | Love and kisses
        |
    EOF

    print dequote(<<'FOO');
        Attention, dropsied weasel, we are
        launching our team of legal beagles
        straight for your scrofulous crotch.

            xx oo
    FOO
The following dequote() function handles all these cases. It expects to be called with a here document as its argument. It looks to see whether each line begins with a common substring, and if so, strips that off. Otherwise, it takes the amount of leading white space found on the first line and removes that much off each subsequent line.
    sub dequote
    {
        local $_ = shift;
        my ($white, $leader); # common white space and common leading string

        # Look at first and second line, to find a common prefix string
        if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/)
        {
            ($white, $leader) = ($2, quotemeta($1));
        } # if
        else
        {
            # no common prefix, strip all leading whitespace
            ($white, $leader) = (/^(\s+)/, '');
        } # else

        # Now manipulate each line of the string by stripping the leadin
        s/^\s*?$leader(?:$white)?//gm;
        return $_;
    } # dequote()