PHP Tutorial

The purpose of this tutorial is to give enough of a broad grounding in the use of PHP that you can understand examples, write your own simple scripts, and most importantly, learn to use the reference manual. A little familiarity with programming concepts is required, but I explain what variables are, so the level should be simple enough for anyone comfortable using a PC.

Those who know Perl already can skip great chunks of this tutorial - you'll be surprised how similar the two languages are. Read everything up to the Syntax section, then you can probably skip to the examples. It may be to your benefit to read the section on arrays though since they are very different from Perl arrays.

Updated: October 2002 for PHP 4.2.x (just in time for PHP 4.3 ;-)

What is it?
How does it compare to ASP? Perl?
Notes for Perl coders
Handy resources
- The PHP Manual (online/chm)
- DevShed
Syntax
- Where to put PHP code
- Statements
- Variables
- Constants
- Strings
- Operators
- Type casting
- Arrays
- Loops
- Conditionals
- Functions
- Returning values
Examples with walkthroughs
- Web page counter
- Browser-specific code
- Redirection
- Session management to persist user data securely
Database access
- Concepts: result pointers
- ODBC: DSNs (user/file/system)
Writing secure code

What is it?

PHP is an embedded web-based scripting language similar to ASP, though with a syntax that is closer to Perl, C and (to some extent) Java than VBScript, which is the commonly used language in ASP.

Embedded means that PHP code is put inside normal HTML to provide dynamic content. The advantages of this include:

Graphic designers can design the look and feel of a site without having to know how to code. Then coders can add the dynamic stuff without having to know much about good HTML design. Changing the look & feel of a site can be done without much change in the code.
Rapid application development -- the HTML of a site can be thrown out quickly to build a rapid prototype, and then the code can be added to this framework.
Placing PHP code on an HTML form allows you to use the same page for filling out the form and for correcting validation errors (like a missing phone number entry), so there's no need to look after two separate pages.

PHP's similarity to Perl makes it easy to learn; the differences in the syntax are mostly to tidy it up. Those familiar with Perl (but not too rabidly fond of it) will find PHP refreshingly tidy in comparison, while the more avid Perl fans will find that they can do almost anything in PHP that Perl could do - in fact, in some places it's a lot easier. String manipulation in PHP is much simpler than in Perl.

How does it compare to ASP? Perl?

PHP has the same context as ASP - that is, it's embedded inside HTML. Perl on the other hand has a shell context - it was designed for use on the command line, not inside a web page.
ASP's native implementation is Windows only, using the IIS server. Some UNIX implementations also exist, but their feature sets aren't as complete, and at the time of writing, none were free. Conversely, PHP is available for well over a dozen platforms and web servers; in fact, any server that supports the CGI standard can run PHP, and several optimised server-specific PHP modules also exist and are supplied with the standard package.
Both PHP and Perl are modular - functionality can be added and removed by including modules. In Perl, modules must be loaded by the script or specified on the command line, but in PHP modules can be either dynamically loaded (in the script) or specified in the PHP.ini file to load automatically when PHP starts up.
In tests, PHP's performance came out marginally ahead of ASP, though to be honest, there's not much in it. Both are approximately 3-4 times faster than Cold Fusion, however.
PHP, like Perl, is open source. You can download the code, compile it, modify it, basically do whatever you want.

Notes for Perl coders

Skip this bit if you don't know Perl.

The major changes that most Perl coders find (in my experience at least) when using PHP are:

In PHP there is only one kind of array, and it's the hash (or named array), and you should treat every array like it's a hash (i.e., in a loop over an array, step through each key rather than going through the numbers on a list)
In PHP, variables, lists, and hashes are ALL signified by a $ at the front of the name. In Perl, depending on the context, either $, @, or % is required.
Filehandles are replaced by file pointers, and the file pointers are stored in a variable. This is different from Perl because in Perl, filehandles have their own namespace. So where in Perl one would do

open(MYFILE, ">newfile.txt") or die("Couldn't open file to write");

in PHP, this is

if (! $myfile = fopen("newfile.txt", "w")) die("Couldn't open file to write");
There is no regular expression operator in PHP - regular expression matching, substitution, etc. are all standard functions. This tidies up the syntax no end, in the author's opinion. However, on most PHP builds, PCRE is built in which supplies Perl-compatible regular expression functions for some seriously funky wizardry if you're that perverted a programmer that you feel you need to make your code illegible ;-)
In Perl, the default return value of a function (if one is not specifically given) is the result of the last statement in the function. In PHP, a function that finishes without a return statement hands back NULL, so nothing useful comes back by accident. In all cases, it's wise to use an explicit return in a function.
Only one item can be returned from a function, though that one item can be a list. This is discussed later on.

Handy resources

The PHP Manual: Downloadable versions
Online annotated version
The compiled help format is in the new windows HTML help style, which is extremely easy to search, browse and use. It's utterly indispensable.
The online version of the manual has some advantages however; most importantly, the user comments. Here people have added notes with examples, caveats and other useful comments. If you're stuck with a new function, you could do a lot worse than checking the online manual out.

Syntax

Where to put PHP code.

When the PHP program runs through your script, it looks for either <?php or <? then runs everything from there until the next ?>

You can configure PHP to run code between <% and %> tags too (ASP) though on IIS this probably isn't the wisest of ideas.

You need to tell your web server to run PHP files through the PHP program, too. This depends on which web server you have, though the PHP package has good instructions for most of the popular types. You should put your PHP code in files with the extension .php and configure your webserver to run .php files through the PHP program, though you can also force PHP to handle all .html files. The advantage to setting up .php as an extension is that files with no PHP code won't go through the PHP program, so you'll get better server performance.

Statements

All statements end with a ;

Whitespace is ignored, so you can stack several commands on one line, as long as you end every statement with a ;

Single line comments begin with // and finish at the end of the line

Multiple line comments begin with /* and end with */

You can group operations in brackets ( ) to make them more readable to humans. In fact it's highly recommended anyway so you know exactly what's going to happen when you run the code.
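As a rough sketch, here are those rules in one place. The variable names are made up for illustration.

```php
<?php
// A single-line comment runs to the end of the line.

/* A multi-line comment
   can span several lines. */

$price = 4; $quantity = 3;   // two statements stacked on one line

// Brackets make the order explicit: multiply first, then add.
$total = ($price * $quantity) + 2;

echo $total;   // prints 14
```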

Variables

In PHP as in most computer programming languages, data can be stored in named areas of memory called variables. Variables can contain numbers, text, even binary data, as well as arrays (lists). They're also used to store references to files and database query results.

Arrays are special, so I'll go into those separately, but for the most part using variables is simple. All variable names begin with a $ followed by a letter or an underscore ( _ ). After that you can use letters, numbers and underscores, but not spaces. Variable names are case sensitive - beware! $FOO is not the same as $foo.

  $foo = 3;
  $myline = "Hello world";

Unlike some other languages, you don't have to declare your variables before you use them, though for security reasons, it's recommended that you declare some of them (for example, session variables). I'll cover this in greater detail later.

Constants

Constants are much like variables, except you can't change them. You can declare them at the beginning of a script and use them instead of any fixed number that you can't be sure will never change. PHP sets some constants for you, like TRUE, FALSE and NULL.

To create a constant, you use the define() function. Constants are named just like variables but without the leading $, and they can store simple values (numbers, strings, TRUE or FALSE) but not an array.

  define("MY_CONSTANT", "Hello World");
  echo MY_CONSTANT; // prints "Hello World"

A lot of mathematical constants are also preset for you when PHP starts up, such as M_PI for pi. Read the "Mathematical functions" section of the PHP manual for the complete list.

Strings

Strings are text, pure and simple. There are two ways of writing strings: inside single quotes or inside double quotes. Using "double-quotes" allows you to use special characters like \n for a newline. The backslash character marks the next character as special, though the only characters you can backslash in 'single-quoted' strings are \' (a single quote) and \\ (a backslash).

Some other special characters (for double-quoted strings)

\t	TAB
\n	newline
\r	carriage-return. In DOS text-files, a new line is `\r\n`, though in practice in Windows you can get away with UNIX format, which is just `\n`
\$	$
\'	' (in single-quoted strings only; inside double quotes a plain ' needs no backslash)
\"	"

The most important difference between single and double-quoted strings is that double-quoted strings can contain variable names:

$foo = 'This doesn\'t work'; // Oh yes it does!
$foo = "Matt's second line also works";

$age = 18;
echo 'Bob is $age'; // prints "Bob is $age"
echo "Bob is $age"; // prints "Bob is 18"

Check out the manual for more string stuff -- this is generally enough to get along with in daily use.
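A small sketch of the difference, using strlen() to count the characters actually stored. The variable names are only for illustration.

```php
<?php
$name = "Bob";

$single = 'Hello $name\n';   // nothing is replaced: the $ and the \n stay as typed
$double = "Hello $name\n";   // $name is filled in and \n becomes one newline character

echo strlen($single), "\n";  // prints 13
echo strlen($double), "\n";  // prints 10
```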

Operators

There are some operators that work directly on variables to save you time and effort in programming. For example, ++ and -- work to add and remove 1 from contents of the variable given:

  $foo = 3;
  $foo = $foo + 1; // The 'old' way of doing this. $foo is now 4
  $foo++; // $foo is now 5
  ++$foo; // $foo is now 6

The difference between ++$foo and $foo++ is the point at which the number is added. If you're using $foo in a statement more complicated than those above, you'll find that:

  $foo = 3;
  $bob = $foo++;  // $bob is 3, $foo is 4.
  $foo = ++$bob;  // $bob and $foo are both 4

In the first line, we set $foo to 3. The following line reads in plain English as "Set $bob to the value of $foo, then add one to $foo", whereas the last line reads "Add one to $bob and set $foo to the new value". In general, if you have to think about what's going to happen, then you should write your code in a more simple way -- think of the poor git who's going to replace you! You're excused if:

you're not coding for money
you won't be replaced
you remember why you do everything that you do and never have to check your own notes.

If you want to add/subtract more than one, use the += and -= operators. For strings you can use the .= operator.

   $foo = "Fred is ";
   $foo .= "Bob's friend"; // $foo now contains "Fred is Bob's friend"

   $foo = 1;
   $bob = 3;
  
   $foo += 2;  // $foo is now 3
   $foo += $bob; // $foo is now 6

The usual operators

Use + to add, - to subtract, / to divide, * to multiply, and % to get the modulus (remainder). These make sense with numbers. They don't with words. Use . to join multiple items (like pieces of text):

   echo "You have " . $items . " items.";

Comparison operators

Use comparison operators in tests like if, while and for loops to test a condition. These operators will compare the items on either side and will return true or false. Again, most of these only work meaningfully with numbers.

   $a == $b    Returns true if the contents of $a match those of $b, false otherwise.
   $a < $b     True if $a is less than $b
   $a > $b     True if $a is greater than $b
   $a <= $b    True if $a is less than, or equal to $b
   $a >= $b    True if $a is greater than, or equal to $b

Logical operators

As above, these operators return true or false depending on the outcome of the test they perform on the two sides of the operator. Each side is treated as a boolean value, either true or false. In PHP, 0, 0.0, the empty string "", the string "0", an empty array and NULL are FALSE; everything else (including negative numbers) is TRUE.

&&	AND	Returns true if both sides of the operator evaluate to true. However, if the left hand side isn't TRUE, PHP won't bother testing the right hand side.
&& !	AND NOT	PHP has no separate AND NOT operator; combine the two instead. $a && !$b returns true if $a is TRUE and $b is FALSE.
||	OR	Returns true if the left side or the right side is true. Note that PHP won't bother checking the right hand side if the left hand side is true.
!	NOT	This is one-sided only. ! $a will be TRUE if $a is FALSE.
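
If you're ever unsure which way a value will go in a test, one quick check is to cast it to a boolean and print the result. The two lists here are just sample values chosen for illustration.

```php
<?php
// Each of these values counts as FALSE in a test...
$falsy = array(0, 0.0, "", "0", NULL, array());

// ...while these all count as TRUE, including the negative number.
$truthy = array(1, -1, "hello", "0.0", array(0));

foreach ($falsy as $value) {
    var_dump((bool) $value);   // bool(false) every time
}
foreach ($truthy as $value) {
    var_dump((bool) $value);   // bool(true) every time
}
```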

Because PHP (Like C and Perl) "short-circuits" these operators when it can, you can use them as control structures:

($fp = fopen("filename.txt", 'r')) || die("Couldn't open file for read");

PHP runs the left-hand side first, ($fp = fopen("filename.txt", 'r')). If fopen() opens the file, the file pointer is stored in $fp and this side of the line evaluates to TRUE. || means "one side or the other (or both) must be true", so PHP doesn't bother running the right-hand side: it already knows that one side is true, which is all the operator requires. Now, if the file-open function had failed, the left-hand side would've returned FALSE, so PHP would have to check the right-hand side to see if that's true. The right-hand side runs the die() function, which quits PHP with an error message, in this case "Couldn't open file for read". In short, PHP quits if it can't open the file.

Type casting

Like Perl, variables are cast automatically into the right types (integer, floating point, string, etc.) though you can force them if you want. Just put the required type in front of the variable in brackets

  $mynum = 3.01;
  $intnum = (int) $mynum;

Check this out in the PHP manual if you want to know more about typecasting. In general it's not necessary since there are functions for dealing with (for example) number formatting.
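A minimal sketch of both kinds of conversion, forced and automatic. The values are arbitrary.

```php
<?php
$mynum  = 3.99;
$intnum = (int) $mynum;       // 3: the fraction is dropped, not rounded
$text   = (string) $intnum;   // "3": now a piece of text
$sum    = "10" + 5;           // 15: the numeric string is converted for you

var_dump($intnum, $text, $sum);
```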

Arrays

Arrays in PHP are pretty much different from any other implementation the author has seen in other languages. The concept of an array is simple - it's a list of items:

  $my_array = array( "bob", "fred", "barney", "thelma" );

Accessing the array is done as a whole ($my_array) or by array item. Items are numbered from 0 at the beginning, so $my_array[0] is "bob" and $my_array[3] is "thelma"

So far, this should be familiar to many coders. You don't have to declare arrays in PHP, or their length - the size of an array will adapt to meet the contents. Array items can be strings, numbers, even other arrays:

  $my_array[4] = array("billy-anne", "billy-bob", "billy-sue");

A useful function to learn here is the print_r() function to show the structure and contents of any variable. You can use it to help you visualise how this data is arranged. You can also run PHP from the command-line if you have the php-cli program in your PHP installation package (from version 4.2.x onward). To run a script from the command line, use:

  php-cli -q file.php

The -q switch stops the HTTP headers from showing on the command-line, which keeps your screen tidy.

Try saving the following lines to a file then running it through PHP.

<?
   $my_array = array( "bob", "fred", "barney", "thelma" );
   $my_array[4] = array("billy-anne", "billy-bob", "billy-sue");
    
   print_r($my_array);
   
?>

Now, the powerful and unique feature of PHP arrays (and also a big stumbling block for many) is that PHP arrays aren't just numbered, they're named. For good coding, you shouldn't assume that your arrays are numbered, or even numbered in order! This means that a traditional for( $i = 0; $i < count($my_array); $i++) loop won't necessarily work.

Named arrays are assigned slightly differently, but not much - you can use this format for numbered arrays too:

   $this_user = array( "firstname" => "Bob",
                       "surname"   => "Smith",
                       "age"       => 24 );
   $this_user['firstname'] = "Robert";

Hopefully you can see from this example why named arrays are useful. Later on in the database section we show how you can retrieve rows from a database query and store each row as a named array. Identifying in your HTML code what $row[4] is can be annoying and time-consuming, because you have to go back and find the SQL statement and count to the 5th field... etc. But if you see $row["age"] you know exactly what part of the data is being shown.
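To see why counting from 0 can't be relied on, here is a small sketch with gaps in the numbering and one named item. It uses the foreach loop, which is covered under Loops further on; the array contents are made up.

```php
<?php
$my_array = array( 3 => "bob", 7 => "fred", "pet" => "dino" );

// count() says 3, but there is nothing at positions 0, 1 or 2,
// so a for loop counting up from 0 would miss every item.
$found = array();
foreach ($my_array as $key => $value) {
    $found[] = "$key=$value";
}

echo implode(", ", $found);   // prints "3=bob, 7=fred, pet=dino"
```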

(Un)fortunately it doesn't end there with arrays - you can treat them like a stack and push things onto, or pop them off of the top, like some programmers would with assembler or forth. PHP also lets you shift and unshift items to/from the bottom of the stack too. And because you can store arrays in arrays, you can simulate trees. If you're not a full programmer yet and this means nothing to you.. good! Less work for me :-)
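For the curious, the four stack-style functions look like this in use. This is only a sketch; the names in the array are arbitrary.

```php
<?php
$stack = array("bob", "fred");

array_push($stack, "barney");      // add to the end:      bob, fred, barney
$last  = array_pop($stack);        // take from the end:   "barney"
array_unshift($stack, "thelma");   // add to the start:    thelma, bob, fred
$first = array_shift($stack);      // take from the start: "thelma"

print_r($stack);   // back to bob, fred
```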

Read the manual's function reference - it has a whole section of array functions that let you shuffle, reverse, sort, splice, count, merge, and filter arrays. You can subtract one array from another, or build an array of the unique values of another array.. in short, you can do a helluva lot with arrays. They're one of the most powerful features of PHP.

I'll cover some simple array functions (like looping over the contents) and ask you to play with the manual a little.

Control Structures

Loops

for

syntax: for ( expr1 ; expr2; expr3 ) { // code }

example: for ( $i = 0; $i < 10; $i++ ) { echo $i; }

// prints 0123456789

description: This is a C-like for-loop. It runs expr1 once, and once only at the start of the loop. On every iteration of the loop, PHP checks expr2 -- if it's TRUE, the contents of the { } block are run, otherwise it stops. At the end of every loop, expr3 is run.

foreach

syntax: foreach ($array as $array_item_value) { // code }

or: foreach ($array as $array_item_name => $array_item_value) { // code }

description: PHP didn't have a foreach function until version 4 because of the way PHP arrays are different from normal arrays. There's a similar way of doing this in PHP 3 which I'll show later because it's what I'm used to doing -- if you have to maintain anyone else's code, you'll need to recognise it too. Basically what this loop does is step through every item in the array called $array, and puts the value of that item into $array_item_value. If you're looking at the second example, the name/number of that item is also stored in $array_item_name . Using this loop you can perform the same action on every item in an array (like printing it to the screen, or inserting it into a database, or building an HTML table)

The PHP3 alternative for going through each item in an array was:

   while (list($array_item_name,$array_item_value) = each($array)) {
      // code 
   }

You'll probably see a lot of that if you're maintaining someone else's code.
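As an illustration of the name => value form, this sketch turns a named array into the rows of an HTML table. The array and the markup are invented for the example.

```php
<?php
$this_user = array( "firstname" => "Bob",
                    "surname"   => "Smith",
                    "age"       => 24 );

$rows = "";
foreach ($this_user as $field => $value) {
    // One table row per array item: the name on the left, the value on the right.
    $rows .= "<tr><td>$field</td><td>$value</td></tr>\n";
}

echo "<table>\n$rows</table>\n";
```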

while

syntax: while ( expr ) { // code }

example: $i = 0; while ( $i < 10 ) { print $i; $i++; }

// prints 0123456789

description: This evaluates expr before each pass; while it's TRUE, the code is run and then expr is tested again.

If the expr is FALSE before the first loop starts, the loop doesn't run at all.

do..while

syntax: do { // code } while ( expr );

example: $i = 0; do { print $i++; } while ( $i < 10 );

// prints 0123456789

description: Almost exactly the same as a while loop except the expression is tested at the end, so the loop is guaranteed to run at least once.

Conditionals

if

syntax: if ( expr ) { // code }

example: if ( 1 == 2 ) { echo "Mathematics is a LIE!"; }

description: Only runs the code if the expression is TRUE, else does nothing.

if else

syntax:

    if ( expr ) { 
      // code 
    } else {
      // code 
    }

example:

    if ( 10 < 20 ) {
      echo "10 is less than 20.. phew!";
    } else {
      echo "Ye cannae change the laws o' maths, Jim!";
    }

description: As with the if statement, if the expression in brackets is TRUE then the code in the first set of curly braces is run, but in this statement if the expression is false, then the code in the curly braces after the 'else' statement is run.

If you live in the same dimension as me, then the example above should never talk like Scotty.

if .. elseif .. else

syntax:

   if ( expr1 ) { 
     // code1
   } elseif ( expr2) {
     // code2
   } else {
     // code3
   }

example:

   if ( $a == 1 ) { 

     echo "\$a is 1!";

   } elseif ( $a == 2 ) {

     echo "\$a is 2!";

   } else {

     echo "\$a is something else!";
   }

description: As with the if-else statement above, PHP starts at the top and tests expr1. If it evaluates TRUE, then code1 is run, then PHP leaves the block. If expr1 is false, then expr2 is tested and so on. If no if/elseif expressions are true, then PHP runs the else block. You can have as many elseif blocks as you like, and the end else is optional.

switch

syntax:

   switch ( expr ) {

     case result1:
      // code
     break;

     case result2: 
      // code
     break;

     default:
      // code
   }

example:

   switch ( $i ) {

     case 1:
       echo "\$i is 1";
     break;

     case 2: 
       echo "\$i is 2";
     break;

     default:
       echo "Dunno what \$i is";
   }

description: The switch statement is quite close to the if-elseif-else statement. expr is evaluated, then PHP goes through each case until it finds a matching value, then runs all the code in the switch block from there until the end (including all cases below it). Sometimes that's useful; the rest of the time, use a 'break' statement to leave the switch block when you've run the right case, like in the example above. If no cases match and there is a default case, PHP runs it. The default case is optional, and by convention it goes at the end of the statement.

With a switch, the expression has to be something simple like an integer, or a floating point number, or a string. Objects and arrays don't work well in switch-blocks.
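Here is one case where running on into the next case is useful: two values that should be handled the same way. This is only a sketch, and the day names are made up.

```php
<?php
$day = "sat";

switch ( $day ) {

  case "sat":          // no break here, so "sat" carries on into "sun"
  case "sun":
    $kind = "weekend";
  break;

  case "fri":
    $kind = "nearly there";
  break;

  default:
    $kind = "weekday";
}

echo $kind;   // prints "weekend"
```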

Functions

Functions are extremely powerful, and should be familiar to anyone who knows another programming language. PHP has MANY functions built in, or available through modules you can load at run-time or in the PHP.ini .. and you can make your own. To call a function, you just type its name in your code. Functions have their own variable space (called a name-space) that lasts as long as the function is running, then it's shut down. You can import variables from outside a function by declaring them global within the function, though if you unset a variable inside a function, it won't be unset outside.

The simplest way of declaring a function is this:

   function my_function() {
     // code

   }

And to run it:

   my_function();

Put re-usable code inside functions to save yourself time and effort. You can store many useful functions together in one file and include() or require() them in other scripts so that you never have to repeat your code.

If you want your functions to return values, use the 'return' statement. You can only return one thing, but that one thing can be an array so this isn't much of a limitation (you've seen that arrays can store just about everything anyway).

The most useful functions take arguments - options, as it were - and return values. In PHP3, one had to name all the arguments that a function required, and supply defaults for the optional values. In PHP4, we can now use the simple function declaration above and check for arguments. This lets us be a lot more flexible. Still, sometimes the old ways are best -- if you declare the function arguments in the old style, it's very easy to see what is required to make a function run.

   function my_func($a, $b) {
   
     $output = $a * $b;
     return $output;
   
   }

Here we have created my_func to require two arguments. Then it multiplies them together and returns the result. $output never exists outside the function, by the way, and if you call my_func() again, it won't remember what it was last time. Functions that remember their state are called generators, and you see mention of them in languages like Python. Using global and static variables lets you simulate that kind of behaviour with PHP functions, but that's something to play with on a rainy day, right?
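For that rainy day, a minimal sketch of a function that remembers a value between calls using a static variable. The function name is hypothetical.

```php
<?php
function next_ticket() {
    static $ticket = 0;   // set to 0 once only, then kept between calls
    $ticket++;
    return $ticket;
}

echo next_ticket();   // prints 1
echo next_ticket();   // prints 2
echo next_ticket();   // prints 3
```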

Now, to call my_func and catch the answer:

   $result = my_func(10, 20);

Or how about:

   echo my_func(10, 20);

Or even:

   if ( $result = my_func(10,20) ) {
     echo "My_Func returned a value of: $result";
   } else {
     echo "My_func returned zero or nothing.";
   }
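
Going back to the two styles of argument handling mentioned earlier, this sketch shows one of each: declared arguments with a default value, and a function that simply inspects whatever it was given using func_get_args(). Both function names are invented for the example.

```php
<?php
// Declared arguments; the second is optional because it has a default.
function greet($name, $greeting = "Hello") {
    return "$greeting, $name!";
}

// No declared arguments: look at whatever was passed in.
function add_all() {
    $total = 0;
    foreach (func_get_args() as $number) {
        $total += $number;
    }
    return $total;
}

echo greet("Bob"), "\n";             // prints "Hello, Bob!"
echo greet("Bob", "Goodbye"), "\n";  // prints "Goodbye, Bob!"
echo add_all(1, 2, 3, 4), "\n";      // prints 10
```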

Returning values

You've seen above that you can simply type

   return $output;

But if you want to return more than one thing, you can use the array() function to build an array to return:

  return array($item1, $item2, $item3);

or:

  return array("result" => $item1,
               "error" => 0 );

Capture your function's return value like above, then treat it like any other array. You could do:

   $output = complex_func(10, "bob", 28.73);

   if ($output['error']) {
     echo "There was an error!";
   } else {
     echo "Returned output: " . $output['result'];
   }
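
To make that pattern concrete, here is a sketch of a function written to return a result and an error flag together. safe_divide() is a hypothetical stand-in, not a built-in PHP function.

```php
<?php
// Hypothetical example: divide two numbers, and report failure through
// the "error" item instead of stopping the script.
function safe_divide($top, $bottom) {
    if ($bottom == 0) {
        return array("result" => NULL, "error" => 1);
    }
    return array("result" => $top / $bottom, "error" => 0);
}

$output = safe_divide(10, 4);

if ($output['error']) {
    echo "There was an error!";
} else {
    echo "Returned output: " . $output['result'];   // prints "Returned output: 2.5"
}
```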

That's it! That's as much grounding on the basics as I can give. It should be enough for you to understand these examples -- except perhaps the database code, which is going to need some new concepts introduced. Work through, run them, play with them a bit.. you should find them useful, and they'll show you how to step through arrays, how to access files, etc. Read the comments.. there's more comment than code below, but it's there for a reason :-)

Examples with walkthroughs

Web page counter

A simple application and one you can put in any page. For example, you could drop a single include() line into any PHP page that you want a hit-counter on, or tell your web server to include this script at the bottom of every page (IIS and Apache at least can do this).

This is a text counter, not one with images.. I hate those!

<?
  /* Start of the script. If there's a counter file, read it and get the current
   * number of hits from it. Then we add one, display the number, and rewrite the 
   * file. */

  if ( is_file("counter.txt") ) {

    /* The file exists, so read it. The file() function reads a whole file
     * into memory. With a counter file, which is tiny, there's no problem 
     * with this. The file in memory is stored as an array with one line 
     * per array item, so array[0] is the first line, etc. We only need the 
     * first line */

    $file = file("counter.txt");

    $count = rtrim($file[0]); 

    /* So $count now contains the first line, minus any whitespace and/or 
     * newlines at the end. This means $count should be just a number. 
     * Lucky for us, PHP's clever enough to turn useless text into 0 as a 
     * number, so we can just add 1 to $count and save that, whatever was
     * in there before! */

    $count++;
  } else {
    // There was no counter file, so we'll start from scratch.
    $count = 1;
  }

  /* Now write the count back to a file, and display the number */

  /* First, open the file to write .. */

  if ($file_pointer = fopen("counter.txt", "w") ) {

     /* Write the counter to the file we just opened */     

     fwrite($file_pointer, $count);

     /* Close the file */
     fclose($file_pointer);

  } else {

     /* $file_pointer wasn't set, so fopen() failed, 
      * so we can't save the counter. Say so! */
 
     echo "Couldn't save counter!";

  }

  // Finally, show the counter.
  echo "This page has been visited $count times";
?>

Browser-specific code

Sometimes when you're writing HTML you'll find yourself in a situation where the code you want to use will only work on one browser (and we all know which browser that is). Not only won't it work on other browsers, but one specific browser (and we all know which that is) crashes and burns, or won't show the page at all. In this case you can detect the browser with PHP and only show the dodgy code on IE, or show something else on Netscape.

Whenever a browser asks a webserver for a page, it presents some information to the server (like which page it wants, and what browser it is.. ) and PHP turns this information into variables when it loads. Check out the $_SERVER['HTTP_USER_AGENT'] variable; it contains the name and version of the browser. The problematic browser, especially regarding Cascading Style Sheet bugs, is Netscape 4, between 4.1 and 4.8. We don't include 4.0 because that's what Internet Explorer pretends to be.

So now, in your HTML code, or in your style-sheet if you keep them separate:

   <style type="text/css">
   <!--
   <?  // Browser check: Netscape 4.x can't deal with borders on CSS elements
   
     if (!eregi('^Mozilla/4\.[1-8]', $_SERVER['HTTP_USER_AGENT'])) {
        $css_border = "border: 1px #FFFFFF solid;";
     } else {
        $css_border = "";
     }

   ?>
   SELECT  {
           background:  #002F54;
           color: #FFFFFF;
           font-family: Tahoma, Verdana, Arial, Helvetica, sans-serif;
           font-size: 10pt;
           <?=$css_border?>
   }
   --></STYLE>

It's the "border: 1px #FFFFFF solid;" that makes Netscape heave.. for whatever reason. So on Netscape 4.x, we don't print it (it doesn't work anyway, right?) The eregi() function tests a regular expression, which is a powerful pattern-matching algorithm. There's no way I have time to teach you those now, but there's another tutorial on my site that introduces you to Perl regular expressions. They're very similar.
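
One caution if you're on a much newer PHP than this tutorial covers: eregi() was removed in PHP 7. The same test can be written with preg_match() from the PCRE functions mentioned earlier. The sketch below assumes that swap; the helper name and the sample User-Agent strings are invented for illustration.

```php
<?php
// Sample User-Agent strings, made up for this example.
$netscape = "Mozilla/4.7 [en] (WinNT; I)";
$explorer = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";

// Hypothetical helper: TRUE for Netscape 4.1 to 4.8, FALSE otherwise.
function is_netscape4($user_agent) {
    // The i after the closing / makes the match case-insensitive, as eregi() was.
    return preg_match('/^Mozilla\/4\.[1-8]/i', $user_agent) === 1;
}

var_dump(is_netscape4($netscape));   // bool(true)
var_dump(is_netscape4($explorer));   // bool(false)
```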

Redirection

Redirection is sending someone from one page to another. Or maybe reloading the current page. With PHP, as long as no data has been sent to the browser yet (any HTML, any echo commands, etc) you can send an additional HTTP header to the browser that tells it to go somewhere else:

<?
     if (!headers_sent()) {
       header("Location:  http://www.lazycat.org/tutorials.php");
       exit;
     }
?>

The exit statement is important, because you don't want to bother running the rest of the script when no one's going to see the output. The headers_sent() function returns TRUE if the headers have already been sent - if they have and you try to use the header() function, PHP throws a wobbly and prints errors to the screen and stuff. Very unprofessional.

Session management to persist user data securely

This is a trickier concept than most of the stuff we've covered above, so I'm going to go into some background first. The protocol that we use today on the world wide web is HTTP. This much you probably know. It's a state-less protocol, which you probably didn't. What this means is that when someone requests a page, the page is sent and the connection is closed. End of story to the webserver. But for you, the application writer, you want some way to identify a single visitor through their visit, because they're not just getting one page.. they're getting a dozen as they browse, and maybe they typed in a password on that first page and don't want to have to log in to every page as they go through your site.

Netscape saw that this was an issue and their answer was the "magic cookie". A magic cookie is a little piece of text that a server gives the browser with their page. The cookie is stored on the browser and it has certain instructions with it, like how long it's supposed to last, and which servers it should give the cookie to. Then whenever the browser asks for a new page, it gives the cookie to the server as part of the request. So by giving data in a cookie to someone, then the webserver (and the application) can maintain variables across connections.

Now the problem with cookies is that people can read them, and they can change them, and they can make them up completely because they're on the browser and bad people have browsers just like good people do. So what session management does is that it keeps all the data, all the variables on the server, where they're much safer than on some guy's hard drive, and links the data to browsers with a unique number, a number that's very hard to guess. So now when a browser asks for a page, and gives its cookie, which has a long number in it, PHP can load the data in the session file with that number, and retrieve all the variables saved in it.

Which means that if your visitor logs in, you can save a $_SESSION['logged_in'] variable in the session file and every time you load a page, you start the session and see if $_SESSION['logged_in'] is set. Which means you don't have to make him log in on every page, and you can be sure that the user didn't fake a login by changing the cookie file.

You need to start a session before any data is sent.. similarly, you need to register session variables before any data is sent. Data means HTML.. your page. So put this code right at the top of the page, with no white space before it.

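 <?
     session_start();

     // First visit: the counter doesn't exist yet, so start it at zero
     if (!isset($_SESSION['count'])) {
       $_SESSION['count'] = 0;
     }
     $_SESSION['count']++;
   
     echo "You have loaded this page " . $_SESSION['count'] . " times!";
 ?>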
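The login flag described earlier works the same way as the counter. This is only a sketch: log_in() and is_logged_in() are names I've invented, and 'logged_in' and 'username' are just example session keys.

```php
<?php
// Hypothetical helpers; 'logged_in' and 'username' are made-up session keys.
// They take the session array as a parameter so they can be tried out
// without a browser; in a real page you'd pass $_SESSION after session_start().

// Mark the visitor as logged in and remember who they are.
function log_in(&$session, $username) {
    $session['logged_in'] = true;
    $session['username'] = $username;
}

// TRUE only if the flag exists and is exactly TRUE.
function is_logged_in($session) {
    return isset($session['logged_in']) && $session['logged_in'] === true;
}
```

On a protected page you would call session_start() first, then check is_logged_in($_SESSION) and send the visitor away if it returns FALSE.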

The default lifetime of a PHP session cookie is 0, which means the browser throws it away when it closes. Until then, you can browse to other sites and come back to this one, and the session will still exist (as long as the server hasn't cleaned up the session file in the meantime).

Look up the "Session Management Functions" in the PHP manual -- you can even write your own handler to save sessions, storing data in a database or in a different format.

Remember, the thing that catches nearly everyone out when writing session code is getting everything done before the headers are sent. If you want to be sure this never happens, and don't mind a little performance hit, you can switch on output buffering in php.ini, which makes PHP wait until it has finished drawing the entire page before sending it, so you can send headers anywhere in the script without worrying about errors.
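If you can't change php.ini, you can get much the same effect inside a single script with the output buffering functions. A minimal sketch, assuming PHP 4.3 or later for ob_get_clean(); capture_output() is a name I've made up for the example.

```php
<?php
// Hypothetical helper: runs $callback with buffering switched on and returns
// whatever it printed instead of sending it to the browser.
function capture_output($callback) {
    ob_start();                 // start holding output in memory
    call_user_func($callback);  // anything echoed in here is buffered
    return ob_get_clean();      // hand back the buffer and stop buffering
}
```

Because nothing has reached the browser while the buffer is open, you can still call header() or session_start() after the page content has been built, and echo the captured text afterwards.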

Database access

PHP has native connections to about a dozen kinds of database, each with its own set of functions for preparing, performing and interpreting queries, though you should make your life easier and learn the ODBC functions, or use the PEAR DB or PEAR MDB modules instead - then you have a single set of functions that works really well on any database, and with ODBC all you need is a driver for that database. PEAR is the PHP equivalent of Perl's CPAN - a central repository of code modules that can be downloaded and updated via a command-line tool. Check out the PEAR website to see about installing it - DB is part of the default install, and anyone who's had enough of PHP's database-specific quirks will find it refreshingly consistent. MDB is newer and so supports fewer databases, but has many more features. Using either of these modules requires you to know how to create and modify objects though, and that's for another tutorial. For now I'll show the "old method", which is the ODBC functions.

In PHP, to get data from a database, you create a connection to the database and tie it to a variable. Then every command you want to send to the database uses that link identifier (meaning you can connect to multiple databases in the same script, see?)

For an ODBC connection, you have to create a system DSN using the ODBC administrator -- you'll find that in the Control Panel on Windows (under Administrative Tools on Win2k). Once you've created the DSN, then you can use the odbc_connect() or odbc_pconnect() functions. The difference between them is that the latter is persistent - if PHP is in memory for more than one script, it will keep pconnects open after the first script exits, and then if another script requests a connection using the same username/password and database, it'll recycle the connection. This can save a lot of time if you're contacting a remote database. On the CGI version of PHP though, ALL connections are closed when PHP quits at the end of the script so there's no difference.

ODBC Database connection:

   $link_id = odbc_connect("DSN", "username", "password");

   $link_id = odbc_pconnect("DSN", "username", "password");

or more safely, if your page NEEDS a database connection:

   if (! ($link_id = odbc_pconnect("DSN", "username", "password")) ) {
     die("Couldn't connect to database. Reload the page");
   }

This basically quits if a connection fails. You could go a step up and show some HTML that waits a few seconds and reloads the page.
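Here's one way the "wait and reload" idea could look. It's only a sketch; retry_page_html() is a made-up name, and the wording of the message is up to you.

```php
<?php
// Hypothetical helper: returns a small HTML page that reloads itself after
// $seconds seconds, using a meta refresh tag.
function retry_page_html($seconds) {
    $seconds = (int) $seconds;   // make sure only a number ends up in the HTML
    return "<html><head>"
         . "<meta http-equiv=\"refresh\" content=\"" . $seconds . "\">"
         . "</head><body>"
         . "Couldn't connect to the database. Trying again in " . $seconds . " seconds..."
         . "</body></html>";
}
```

You'd pass the result to die() in place of the plain message, for example die(retry_page_html(5)).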

Concepts: result pointers

I'm going to assume you know SQL since you wouldn't want to be connecting to a database if you didn't. There are SQL tutorials all over the place, and it's very simple (to start with, anyway :) NOTE: A lot of tutorials I've seen preach in MySQL, and tell you how to do stuff like create and alter databases in PHP, which is stunningly pointless -- the only things that should be pissing around with your DB design/structure are administration and design tools, or the MySQL interface. These functions are only useful for people writing one of the above; skip those bits.

When you have an open database connection, you need to run a query and catch the result pointer. PHP runs the query for you and creates a place in memory to store the result. You can use a whole bunch of functions to move around in, fetch and display and otherwise mess with the result by using this pointer, just like the database link identifier. The most useful functions are

odbc_exec()

Run a query

odbc_fetch_into()

Get a row from the query result, and store it in an array.

odbc_result_all()

Prints an HTML Table with the entire database result - really handy for testing, but not really pretty enough for production use.

Look up the others in the manual under "Unified ODBC Functions", there's some funky stuff in there.

This little example creates an HTML table based on the result of the query given. You can put any query you like in there instead - the code can display any number of rows and columns. It's a custom version of odbc_result_all(), in fact.

   if ($result = odbc_exec($link_id, "SELECT username, password FROM mytable ORDER BY username") ) {
     /* $result was set, so the query worked. Beware that the result might 
      * actually be empty - a working query doesn't have to return anything,
      * so don't assume anything =^.^=  */
   
     // Start an HTML table  
     echo "<TABLE border=1 cellspacing=1>\n";
     
     // Build a header row (with the field names)
     echo "<TR>";
     
     for ($field_num = 1; $field_num <= odbc_num_fields($result); $field_num++) {
        echo "<TH bgcolor=silver>" . odbc_field_name($result, $field_num) . "</TH>";
     }
     echo "</TR>\n";
   
     /* Now show the data in the result.
      * odbc_fetch_into() builds an array with every field in the row,
      * and join turns an array into a string by joining every item in 
      * the array with a set string. The first and last items in an 
      * array, of course, don't have the joining string attached, which 
      * is why the line below echos "<TR><TD>" and "</TD></TR>" to start
      * and end the HTML table row. odbc_fetch_into() returns false when
      * there are no more rows to show, so it's good for a while loop, 
      * which stops when there are no more rows. */
     
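     // Since PHP 4.2.0 the array comes second and the optional row number
     // comes last; leaving the row number out fetches the next row each time.
     $row = array();
     while ( odbc_fetch_into($result, $row) ) {
        echo "<TR><TD>" . join("</TD><TD>", $row) . "</TD></TR>\n";
     }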
   
     // End the table
     echo "</TABLE>";
   
   } else {
     echo "Database error";
   }
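One thing the join() line above doesn't do is escape the data, so a field containing < or & could break the table. A small sketch of a safer row builder; row_to_html() is a name I've made up, and it's meant as a drop-in for the echo line inside the while loop.

```php
<?php
// Hypothetical helper: turns one row (an array of field values) into an HTML
// table row, escaping each value first so that characters like < and &
// show up as text instead of being treated as HTML.
function row_to_html($row) {
    $cells = array();
    foreach ($row as $value) {
        $cells[] = htmlspecialchars((string) $value);
    }
    return "<TR><TD>" . join("</TD><TD>", $cells) . "</TD></TR>\n";
}
```

Inside the loop you would then write echo row_to_html($row); instead of building the string by hand.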

Writing secure code before PHP 4.2.x

PHP is a honking fat security hole, apparently. Of course so's a webserver. Imagine something that runs as the system administrator, and lets ANYONE read files from your hard drive, and run scripts without them ever needing to log in. That's a bruiser, and PHP doesn't make much of a difference after that, unless you write bad code that people can mess around with.

Bad code is a particular danger with session variables. Say you've got a variable called $logged_in and you save it in a session. Every time the page loads and you load the session, PHP restores the value of $logged_in and then you check to see if it's 1 or 0.

A strength, and a potential issue, of PHP is that you can type stuff in a URL that gets turned into variables when PHP loads. This is how PHP handles GET forms (and POST forms, though it's a tiny bit harder to spoof those) - someone could request http://www.yoursite.net/index.php?logged_in=1&username=admin

And $logged_in would be set to 1, and $username would be 'admin'

If $logged_in hasn't been registered in the session yet, someone could do that and pretend they've logged into your site. This supposes that they know your variable names, of course.. but nothing's stopping them from guessing. To get around this, set your variables at the start of your code before you load the session, then malicious users can't "pre-load" those variables.

   <?
     $logged_in = 0;
   
     session_start();
   
     if (!$logged_in) {
       echo "You aren't logged in, go away!";
     } else {
       include("secret.html");
     }
   ?>

Let's say that you have a form on your page, and you use the contents of that form to build an SQL query. Someone could quite easily save the form to their PC, mess around with it and then submit it, sending data that runs SQL of their choosing on the end of your query. It's easy enough to do, so you need to check for this kind of thing.

The easiest way around this is never to use data from a web form inside database queries, or file-open calls, or system calls. Sometimes you have to. Fortunately, PHP has something called 'magic quotes' enabled by default - it'll add backslashes to quotes and other unsafe characters from web forms. Treat it as a safety net rather than a cure, though.

But really, I can't rub it in enough -- CHECK USER INPUT. Whatever comes from outside can potentially be faked, sometimes by error, sometimes maliciously. Don't trust anything.
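As a small example of checking input, here's a sketch that only accepts usernames matching a strict pattern before they go anywhere near a query. The function name, the allowed characters and the length limit are all assumptions I've picked for illustration.

```php
<?php
// Hypothetical check: a username may only contain letters, digits and
// underscores, and must be 1 to 32 characters long. Both the rule and the
// length limit are made up for this example.
function is_valid_username($name) {
    // The D modifier makes $ match only at the very end of the string,
    // so a trailing newline doesn't sneak through.
    return is_string($name) && preg_match('/^[A-Za-z0-9_]{1,32}$/D', $name) === 1;
}
```

The idea is to say what IS allowed rather than trying to list everything that isn't; anything that fails the check gets rejected before you build the SQL.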

Some ways to guard against malicious input (and you can't check too much):

If it's a file/folder-name, disallow the . character (ESPECIALLY at the start, and especially ../ ), and the / at the beginning of the string. This stops people going back one or more folders (i.e. ../../../../../../../etc/passwd :-)
If it's a filename, don't let the user choose the full name - add the file extension yourself. Or use a naming rule where you can and check it with a regular expression.
If your form allows file uploads, then use the is_uploaded_file() and/or move_uploaded_file() functions to handle them because they make sure that no one is spoofing the filename and stealing a file from your server.
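The filename rules above can be sketched as a single function. safe_filename() is a name I've invented, and the allowed characters and 64-character limit are just one possible naming rule.

```php
<?php
// Hypothetical helper: builds a filename from user input, or returns false
// if the input looks unsafe. Only letters, digits, underscores and hyphens
// are allowed, so dots and slashes (and with them ../) are rejected outright.
// The extension is chosen by the script, never by the user.
function safe_filename($name, $extension) {
    if (!is_string($name) || preg_match('/^[A-Za-z0-9_-]{1,64}$/D', $name) !== 1) {
        return false;
    }
    return $name . '.' . $extension;
}
```

Check the return value against FALSE with === before opening anything, and refuse the request if the name didn't pass.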

If you think some of these are nasty/unlikely.. It can be done, it has been done, and it will be done again. Don't imagine for a second that you're immune!

Security changes in PHP 4.2.x

These methods have been around for ages, but weren't enforced. Up until PHP 4.2.x, the default behaviour of PHP on starting up was to turn all input from the GET, POST, COOKIE and SESSION environments into variables. So if you had a form input on a page called "myvar", then when PHP loaded up the script for that request, it would set $myvar to the content of the input field. The problem with this is that sometimes you don't want people to be setting variables in your script - what if you forget to initialise a variable like the login status and someone sets it for themselves, as described above? Many people who dislike PHP have this as their primary argument. Well, in PHP 4.2.x the default has changed: the register_globals setting is now off.

Now, form, cookie and session variables are all stored in their own arrays, and not set as global variables when PHP starts up. In PHP 4.1.x and earlier you could already access form inputs in the $HTTP_POST_VARS['var'], $HTTP_GET_VARS['var'], and $HTTP_SESSION_VARS['item'] arrays, but it was pretty rare for people to do so. Now it's forced, but to help us beleaguered programmers out, starting from PHP 4.1.x these variables also have short versions, the "superglobals", which you don't have to declare inside functions and which tell you exactly where your data is coming from: $_SESSION, $_FILES, $_COOKIE, $_SERVER, $_ENV, $_POST, and $_GET. Also, if you use the $_SESSION array, new variables are registered automatically, so you don't have to worry about session_register(). You still need to call session_start() first, though.
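To show what "superglobal" means in practice, here's a short sketch of a function that reads from $_GET without any global statement. current_page() and the 'page' parameter are made up for the example.

```php
<?php
// Hypothetical helper: reads a page number from the query string.
// $_GET is a superglobal, so no "global" statement is needed inside the function.
function current_page() {
    if (isset($_GET['page']) && is_string($_GET['page']) && ctype_digit($_GET['page'])) {
        return (int) $_GET['page'];
    }
    return 1;   // default when the parameter is missing or not a plain number
}
```

Note that it also checks the value before using it: knowing the data came from the URL is exactly the reason not to trust it.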

If you're writing scripts that have to run on PHP 4.0.6 as well as 4.2.x, then you can still use the $HTTP_x_VARS variables, but performance enhancements, PEAR, and security fixes should combine to be a pretty good excuse to upgrade, and using the new superglobals ensures that you know where your data is coming from and that you can access it from anywhere in your scripts, even inside functions and objects. Don't let this give you a false sense of security though - always check and validate any data that comes from outside the script; files, form inputs, even environment variables. You can never be too safe.

Conclusion

I hope I've given enough of a theoretical grounding on PHP that you can write (and have written) some short scripts. Now it's time to play with the manual - look at the functions there, put them into your scripts, play around. If all else fails, well, you have the examples above which are generally quite useful. I'm working on an Object Oriented PHP tutorial for people who've found this tutorial easy enough and have gotten a little practice in, so keep an eye out for it if you thought this was easy. Object orientation in PHP is about to take a big leap forward with the new Zend Engine 2, which is due in PHP 5; if you're familiar with other OO languages then you'll find the new PHP version brings in some of your favourite features to make it a better OO language than it ever has been before. If you don't know what I'm talking about... well, read the tutorial when it's done.
