NAME

Perl Programming Style Guide

AUTHOR

Brent S.A. Cowgill, B.A. Sc.


DESCRIPTION

Adopting a consistent style of programming can enhance comprehensibility of programs, reduce bugs and ease maintenance. In addition, it reduces the number of decisions to make when writing a particular block of code.

This guide is a statement of my personal programming style. For the most part, I follow these guidelines whenever I develop code. Many of the concepts in this style guide were borrowed from the CPAN Perl style guide.

This was done for efficiency, as I discovered that many of the style guidelines I follow are similar to the ones espoused there. I do, however, have my own differences.


GUIDING PRINCIPLES

This style guide is motivated by the following guiding principles:

  1. The code doesn't work until it looks like it works.
  2. The code doesn't work until you've tested that it works.
  3. Anything complex belongs in a separate function or class. Every class/function does one thing only.
  4. Logic appears once and only once. Anything repeated belongs in a separate constant, variable, function or class.
  5. The brain can keep track of roughly seven things at once.
  6. If there's a small change in specifications, that should in turn generate only a small change in the code.
  7. Minimize constructs which will cause syntax errors or bugs when more is added.
  8. Copy and paste breeds bugs.
  9. Anything hacked together is just a prototype. Study it and then design the real thing.
  10. If a procedure is complex, do it every day, not once a month. Then when you need that skill in a crisis it will be easy and you won't worry about screwing up.

NAME CONVENTIONS

Standard naming conventions make it easier to pick names for variables and functions, leaving the mind free to solve the more challenging problem of developing the best algorithm.

PACKAGE NAMES

  1. Package::Name -- Mixed case, capitalize word boundaries. No underscores.
  2. Perl informally reserves lowercase module names for "pragma" modules like integer and strict. Other modules should begin with a capital letter and use mixed case, but probably without underscores due to limitations in primitive file systems' representations of module names as files that must fit into a few sparse bytes.

VARIABLE NAMES

  1. Name describes what it contains.
  2. Consider using a thesaurus while you program. It helps to get the right name for something.
  3. Polymorphic variables with no known base class can be called $thing; otherwise use the name of the base class or data type if nothing is more appropriate.
  4. Avoid abbreviations, especially those formed by dropping vowels.
  5. Allowed abbreviations: $num, $min, $max, $cols, $idx (index), $pos, $ofs (offset), $fh (file handle).
  6. No single letter variable names are allowed. $idx, $x_pos, $y_ofs are more readable and easier to search for than $i, $x, $y.
  7. Compound names should contain the noun or thing it represents, then the adjective describing it, e.g. $page_size not $size_of_page.
  8. It is OK to abbreviate when using long compound names. But if the abbreviation is obscure, put a glossary in the module description giving the full name, e.g. $html_tag.
  9. Include units in numerical quantities where appropriate, e.g. $width_cm.
  10. Name similar things consistently. This makes searches more productive.
  11. Long distinctive names without a common prefix are preferred: a good editor has auto-completion, so you can type one or two letters and hit the auto-complete key to get the rest of the long variable name.
  12. SCOPE
    1. $global, %Global -- All globals must be prefixed by global (for scalars) or Global (for arrays and hashes).
    2. Global variables are to be avoided in all serious code. However, in one-off scripts, they can be used. Consider using a single global hash though, as this is much easier to examine in a debugger than adding many variables to the watch window.
    3. Avoid package variables which are meant to be used by client code; use an accessor method instead.
    4. $Package::_our_private -- Package variables used internally by a class should be prefixed with _our_.
    5. $lexical_name -- Lexical variables used in functions and methods appear with no special prefix.
    6. $_protected_lexical -- Use _ to prefix a privately scoped lexical used in a closure. See example.
  13. CONSTANTS
    1. $CONST_NAME -- Constants are written all in upper case with _ between words.
    2. $VISIBLE preferred over $NOT_VISIBLE
    3. Alternatively, use the constant pragma: use constant TABLE_COLS => 42;
  14. SCALARS
    1. $scalar_name -- Scalars are written all in lower case with _ between words.
    2. $raArray $rhHash $rcCode $rObject -- References are written in mixed case with a prefix indicating the type of the referent. Each word boundary is capitalized.
  15. ARRAYS
    1. @ArrayName %HashName -- Regular and associative arrays are written in mixed case, capitalizing word boundaries.
    2. In many cases, the name of an associative array can be of the form %MapX_Y.
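
As a rough illustration, the variable naming rules above might look like this in practice. This is only a sketch: every name in it (TABLE_COLS aside) is hypothetical and chosen to show the pattern, including the _ prefix for a lexical that is private to a closure.

```perl
use strict;
use warnings;

# All names here are hypothetical; only the naming pattern matters.
use constant TABLE_COLS => 42;             # constant via the constant pragma
my $PAGE_WIDTH_CM = 21;                    # constant-style scalar, with units

my @ColumnNames = ('id', 'title');         # array: mixed case
my %MapExt_Type = (TXT => 'text/plain');   # hash of the form %MapX_Y
my $raColumns   = \@ColumnNames;           # reference to an array
my $rhTypes     = \%MapExt_Type;           # reference to a hash

# $_last_id is a privately scoped lexical, visible only to the closure.
my $rcNextId;
{
   my $_last_id = 0;
   $rcNextId = sub { return ++$_last_id; };
}

my $first_id  = $rcNextId->();
my $second_id = $rcNextId->();
print "ids: $first_id, $second_id; columns: ", scalar(@{$raColumns}), "\n";
```

Because $_last_id goes out of scope at the closing brace, the only way to reach the counter afterwards is through $rcNextId.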

METHOD NAMES

  1. The name of the method should describe what it does in verb noun format.
  2. Avoid abbreviations. Exception: Init()
  3. function_name() -- Non-object procedures should be in lower case with _ on word boundaries.
  4. $rObj->MethodName() -- Class and instance methods of classes should be in mixed case with capitals on word boundaries.
  5. $rObj->property() -- Property get/set methods for classes should be all in lower case with _ on word boundaries.
  6. $rObj->SetLayout({ ... }) -- Property get/set methods which get/set large structures can be named GetX or SetX.
  7. is_empty() $rObj->IsEmpty() -- Boolean methods which check something should be named IsX(), HasX(), CanX(), is_x(), has_x(), or can_x()
  8. $rObj->_Private() -- Internal methods for classes should be prefixed by _.
  9. $rObj->OnSave() $rObj->DoCommit() -- Methods which are meant to be implemented by derived classes should be named OnX(), HandleX() or DoX().

HASH KEY NAMES

  1. HASH_KEY_NAME -- Always use uppercase names for hash keys, so they can be used unquoted with little chance of clashing with future builtin names in Perl.
  2. Prefix hash key name with _ to indicate a key that is private or computed and not meant to be inserted by client code.
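
The method and hash key conventions can be seen together in a small class. The sketch below assumes a hypothetical class Text::Box; the constructor name new follows common Perl practice and is not a rule stated in this guide.

```perl
use strict;
use warnings;

package Text::Box;                         # hypothetical class name

# Constructor ('new' is an assumption, see the note above).
sub new
{
   my ($class) = @_;
   my $rhSelf = {
      WIDTH_CM => 0,                       # public key: upper case, unquoted
      _LINES   => [],                      # private key: leading underscore
   };
   return bless($rhSelf, $class);
} # new()

# Property get/set method: all lower case.
sub width_cm
{
   my ($rSelf, $width_cm) = @_;
   $rSelf->{WIDTH_CM} = $width_cm if defined($width_cm);
   return $rSelf->{WIDTH_CM};
} # width_cm()

# Ordinary method: verb noun, mixed case.
sub AddLine
{
   my ($rSelf, $line) = @_;
   push(@{ $rSelf->{_LINES} }, $line);
   return $rSelf->_CountLines();
} # AddLine()

# Boolean method: IsX().
sub IsEmpty
{
   my ($rSelf) = @_;
   return $rSelf->_CountLines() == 0;
} # IsEmpty()

# Internal method: prefixed by _.
sub _CountLines
{
   my ($rSelf) = @_;
   return scalar(@{ $rSelf->{_LINES} });
} # _CountLines()

package main;

my $rBox = Text::Box->new();
$rBox->width_cm(12);
my $was_empty  = $rBox->IsEmpty();
my $line_count = $rBox->AddLine('hello');
print "width_cm=", $rBox->width_cm(), " lines=$line_count\n";
```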

CODE LAYOUT

Each programmer will, of course, have his or her own preferences regarding formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.

STYLE

  1. Opening brace on new line aligned with left of keyword.
  2. If the code in the block is only one line, the whole block can go on the same line.

    Braces on their own lines make it easy to hop up and down the file from block to block with the bracket-matching feature of most good editors.

  3. Closing brace of a multiline block lines up with left of keyword opening the block.
  4. Uncuddled elses.
  5. Mark closing braces of if, while, foreach, for, BEGIN, END and sub blocks with a comment indicating the type of block, especially when several are nested.

    sub function
    {
       if ($true) { print; }  # a small one line block
       if (...)
       {
          while (1)
          {
          } # while
       } # if
       else
       {
          ...
       } # else
    } # function()

SPACING

  1. 3-column indent. No tabs.
  2. Space around most operators.
  3. Space around a "complex" subscript (inside brackets or braces).
  4. Blank lines between chunks that do different things.
  5. No space before the semicolon.
  6. No space between function name and its opening parenthesis.
  7. Space after every comma and semicolon.
  8. No space around ->
  9. Space before/after first/last parenthesis matching on current line, for inline arrays and expressions.
  10. Space on either side of curly braces for inline blocks.
  11. Line up corresponding things vertically, especially if it'd be too long to fit on one line anyway.

    mkdir($dir_temp, 0700) or die "$0: can't mkdir $dir_temp: $!";
    chdir($dir_temp)       or die "$0: can't chdir $dir_temp: $!";
    mkdir('tmp',     0777) or die "$0: can't mkdir $dir_temp/tmp: $!";

  12. Line up your transliterations when it makes sense:

    tr [abc]
       [xyz];

COMMENTS

  1. Wrap comments and code at ~75 columns wide
  2. If commenting down the right side, start at column 35
  3. A space after the # on a single line comment
  4. Put a comment above a chunk, giving an overview of what the chunk does. Line the comment up with the left of the first line in the chunk.
  5. Comments should not just repeat the logic of the code, but explain it on a higher level. There are two ways to do this. Before coding, write a string of comments describing what needs to be done, then add the code after each comment line. After writing a complex block of code, summarize what the block accomplishes.

    # validate function parameters
    die "error" unless defined($_[0]);
    die "error" unless ref($_[0]) eq 'HASH';

    # reverse the hash so that the values map to the keys
    local $_;
    my %ReversedMap = map { ($_[0]->{$_}, $_) } keys(%{$_[0]});

  6. If you are coding to a published standard or grabbing a complicated algorithm from a book, put a comment in the code mentioning the URL(s) or book and pages used. Maintenance developers do not always know where you got the original code from and may benefit from reading the source.
  7. Always put a comment in code that has been written in an obscure way because of speed, bug workarounds, major design assumptions, backwards compatibility, or a penchant for one-liners. As a rule of thumb, add a comment if your line of code has more than 5 operators/functions or complex constructs (e.g. @{} ?: ->[] map {} grep {} sort {} slices).

PUNCTUATION

  1. Semicolon always present, even in "short" one-line block.
  2. You never know when you're going to add another line to the block or need to add some debugging code to the block. When you do, you'll most likely forget to add the semicolon, causing a syntax error (= frustration). Code should always be written so you can insert/delete similar lines without causing syntax errors.
  3. Long lines broken before an operator or after a comma.
  4. This makes it easy to insert or move logic around in the expression without having to adjust the end of the line above. It's easier to see the start of the line and fix it, than to see the end of the line.

    if ($this > 12
        && $this < 42
        || $this == 18)
    {
       ...
    } # if

    $message = "a long message a long message a long message "
               . "a long message a long message a long message "
               . "a long message a long message a long message ";
    

  5. When creating a hash, put one item per line and end each line with a comma, so lines can be inserted and moved around without having to keep track of a last line which has no comma.
  6. Never omit parentheses when calling functions and builtins. Exceptions are grep, sort, map, print.
  7. Use $x_val = function(); instead of $x_val = function; or $x_val = &function;

    A function called as &function; with no parentheses receives the caller's @_ and could behave differently when copied and pasted into another subroutine. See example.

    Parentheses also make it easy to hop around the line with the parenthesis match key in your editor.

    return print reverse sort by_num values %Map;      # not nice
    return print(reverse(sort by_num (values(%Map)))); # much better
      

    Consider the mental welfare of the person who has to maintain the code after you, and who will probably put parentheses in the wrong place.
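
The @_ pitfall of the &function; call form can be demonstrated with a short sketch. The subroutine names count_args and caller_with_args are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Reports how many arguments it was given.
sub count_args
{
   return scalar(@_);
} # count_args()

sub caller_with_args
{
   my $explicit_count = count_args();    # explicit empty argument list
   my $implicit_count = &count_args;     # no parentheses: shares this sub's @_
   return ($explicit_count, $implicit_count);
} # caller_with_args()

my ($explicit_count, $implicit_count) = caller_with_args('a', 'b', 'c');
print "explicit=$explicit_count implicit=$implicit_count\n";
```

The call with parentheses sees no arguments, while the &count_args; form sees the three arguments that were passed to caller_with_args(), so the same line gives a different result depending on where it is pasted.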

STRUCTURE

  1. Perl has no switch statement, but you can emulate one with any number of idioms. See example.
  2. Organize your functions logically. Higher level functions at the top. Lower level and private functions at the bottom. Group related functions together. Begin with constructor/destructor, property accessor/setter, and finally mutators, behavioural overrides.
  3. Break your sections up with comment blocks:

    #==========================================================================
    # Major Division   (public/private,etc)
    #==========================================================================

    #--------------------------------------------------------------------------
    # Minor Division   (function groups)
    #--------------------------------------------------------------------------

    Jumping from section to section is easy: search for #=== or #--- (the width of the block is such that the 'End' key puts the cursor in column 76).

    A suggested list of Major Divisions for scripts: Module Header, Command Line Parsing, Main Body of Script, Functions

    Major Divisions for Classes: Module Header, Module Initialization and Cleanup, Constructor and Destructor, Public Methods, Private Methods
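
One of the idioms for emulating a switch statement is a dispatch table: a hash that maps each case to a code reference. The sketch below is only one possible approach, and the command names and handlers in it are hypothetical.

```perl
use strict;
use warnings;

# Each hash entry plays the part of one "case" (hypothetical commands).
my %MapCommand_Handler = (
   START => sub { return 'starting'; },
   STOP  => sub { return 'stopping'; },
);

sub run_command
{
   my ($command) = @_;
   my $rcHandler = $MapCommand_Handler{$command}
      || sub { return "unknown command: $command"; };   # the default case
   return $rcHandler->();
} # run_command()

print run_command('START'), "\n";
print run_command('PAUSE'), "\n";
```

Adding a new case means adding one line to the hash, which fits the principle that a small change in specifications should produce only a small change in the code.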


PROGRAMMING IN GENERAL

  1. The most important thing is to run your programs under the use warnings pragma at all times. Prefer it to the -w switch on the #! line: -w is global, whereas use warnings is lexically scoped. You may turn warnings off explicitly for particular portions of code with no warnings (or via the $^W variable when running under -w) if you must. You should also always run under use strict or know the reason why not. The use sigtrap and even use diagnostics pragmas may also prove useful.
  2. Test everything using Perl's Test module. Code the unit tests before you write the function; then, as you implement the function, the unit tests begin to succeed and you can check that you have handled all the boundary conditions for your function.
  3. Use local $_ whenever you use a foreach, s///, tr/// or other operation using $_ or, better yet, always bind to a named variable. If your function modifies $_ and is called from within a foreach which in turn uses $_, you will destroy the caller's array as a side effect unless you localise $_. See example.
  4. Don't use hard coded numbers or short strings. Define a constant, especially if the value is used in more than one place in the code. Use a real constant or just a variable name. In practice, never use literal numbers in calculations; always use a constant. The only exceptions are 0 and 1. Using named constants in function calls also serves to document the function call.

    my ($X, $Y, $WIDTH, $WRAP) = (10, 10, 60, 1);
    print_text($X, $Y, $WIDTH, "message", !$WRAP);

    is self documenting as compared to

    print_text(10, 10, 60, "message", 0);

    No one enjoys looking at a complicated expression full of numbers, trying to figure out what number means what.

  5. Think about reusability. Why waste brainpower on a one-shot when you might want to do something like it again? Keep functions short, having at most 7-15 lines of actual code (not counting comments). This tends to increase code reuse.
  6. Localize all file handles by using IO::File or by using an anonymous glob:

    my $fh = do { local *FH; *FH; }; # runs under -w with no warnings

  7. Use Carp's croak function to report errors in parameters passed into functions, and die for more serious errors when using those parameters during normal operation. Consider Carp::Assert for assertions.
  8. Don't use $SIG{__DIE__} = \&error (see why).
  9. Always check the return codes of system calls. Good error messages should go to STDERR, include which program caused the problem, what the failed system call and arguments were, and (VERY IMPORTANT) should contain the standard system error message for what went wrong. Here's a simple but sufficient example:

                                # name of function   args  system error
    opendir($dh, $dir)     or die( (caller(0))[3] . ": $dir: $!");

  10. Just because you CAN do something a particular way doesn't mean that you SHOULD do it that way. Perl is designed to give you several ways to do anything, so consider picking the most readable one. For instance

    open($fh, $file_name) || die "$0: Can't open $file_name: $!";

    is better than

    die "$0: Can't open $file_name: $!" unless open($fh, $file_name);

    because the second way hides the main point of the statement in a modifier.

    But consider also

    use Fatal qw(open);
    open($fh, $file_name);

    On the other hand

    print "Starting analysis\n" if $verbose;

    is better than

    $verbose && print "Starting analysis\n";

    because the main point isn't whether the user typed -v or not.

    Similarly, just because an operator lets you assume default arguments doesn't mean that you have to make use of the defaults. The defaults are there for lazy systems programmers writing one-shot programs. If you want your program to be readable, consider supplying the argument.

  11. Don't go through silly contortions to exit a loop at the top or the bottom, when Perl provides the last operator so you can exit in the middle. Just "outdent" it a little to make it more visible:

    LINE:
        for (;;) {
            statements;
          last LINE if $foo;
          next LINE if /^#/;
            statements;
        }

  12. Don't be afraid to use loop labels--they're there to enhance readability as well as to allow multilevel loop breaks. See the previous example.
  13. Avoid using grep() (or map()) or `backticks` in a void context, that is, when you just throw away their return values. Those functions all have return values, so use them. Otherwise use a foreach() loop or the system() function instead.
  14. For portability, when using features that may not be implemented on every machine, test the construct in an eval { } to see if it fails. If you know what version or patchlevel a particular feature was implemented, you can test $] ($PERL_VERSION in English) to see if it will be there. The Config module will also let you interrogate values determined by the Configure program when Perl was installed.
  15. If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make it look a little less like line noise. Don't use slash as a delimiter when your regexp has slashes or backslashes.
  16. Use here documents instead of repeated print() statements.
  17. Where it makes sense, remove leading whitespace from your here documents so they don't break up the code so much. See example.
  18. Consider generalizing your code. Consider writing a module or object class. Consider making your code run cleanly with use strict and use warnings in effect.
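
The advice about localising $_ can be illustrated with a small sketch. The two functions below are hypothetical and differ only in whether $_ is localised before it is modified.

```perl
use strict;
use warnings;

# Overwrites the global $_, and so clobbers whatever the caller aliased to it.
sub trim_careless
{
   $_ = shift;
   s/^\s+|\s+$//g;
   return $_;
} # trim_careless()

# Same logic, but $_ is localised first, so the caller's $_ is restored.
sub trim_careful
{
   local $_ = shift;
   s/^\s+|\s+$//g;
   return $_;
} # trim_careful()

my @Careless = ('alpha', 'beta');
foreach (@Careless) { trim_careless('  padded  '); }

my @Careful = ('alpha', 'beta');
foreach (@Careful) { trim_careful('  padded  '); }

print "careless: @Careless\n";
print "careful:  @Careful\n";
```

Inside each foreach, $_ is an alias for the current array element, so the careless version overwrites the caller's data while the careful version leaves it alone. Writing the loops as foreach my $name (...) would avoid the problem on the caller's side as well.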

EXAMPLES
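
Removing leading whitespace from a here document can be done in several ways. The sketch below is one possibility, not a definitive implementation: the helper unindent() and the script name report are hypothetical. It strips from every line the indentation found on the first line, so the here document can be indented to match the surrounding code.

```perl
use strict;
use warnings;

# Strip the first line's leading whitespace from every line of the text.
sub unindent
{
   my ($text) = @_;
   my ($indent) = $text =~ /^(\s*)/;
   $text =~ s/^$indent//gm;
   return $text;
} # unindent()

my $program_name = 'report';               # hypothetical script name
my $usage_text = unindent(<<"END_USAGE");
      Usage: $program_name [options] file
        -v   verbose output
END_USAGE

print $usage_text;
```

Relative indentation inside the here document is kept, so the option line stays indented two columns further than the usage line.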


REFERENCES

  1. CPAN Perl Style Guide: http://www.perl.com/CPAN-local/doc/manual/html/pod/perlstyle.html