Installation Guide for UNIX/LINUX Platform
1. STANDARDS
1. The Webalizer Home Directory should be:
/html/webalizer
2. The Web Hit Report Home Directory should be:
/html/webalizer/name_of_website
or
/location_of_document_root/report
3. The Web Hit Report URL is:
http://www.name_of_website.com/report
2. GUIDELINES
The Webalizer installation procedures:
#cd /
#mkdir html
#cd /html
#mkdir webalizer
Create the Web Hit Report Home Directory, then copy msfree.gif and webalizer.gif from the Webalizer Home Directory to the Web Hit Report Home Directory.
#cd /html/webalizer or #cd /location_of_document_root
#mkdir name_of_website or #mkdir report
#cd name_of_website or #cd report
#cp /html/webalizer/msfree.gif .
#cp /html/webalizer/webalizer.gif .
NOTE: name_of_website is the name of the site (example: for www.yahoo.com, name_of_website is yahoo).
location_of_document_root is the document root of the Web Server (example: /opt/apache-1.3.12/htdocs).
#cd /html/webalizer
#cp webalizer /usr/bin
#cp webalizer /usr/local/bin
#cd /html/webalizer
#vi webalizer.conf
Entries you need to change in the configuration file:
OutputDir Type the Web Hit Report Home Directory.
/html/webalizer/name_of_website
or
/location_of_document_root/report
HostName Type the Web Site’s URL.
LogFile Type the Web Server’s log file path appended by the log filename.
/web_server_logfile_location/access_logfile_name
NOTE: For Multiple Web Sites, repeat procedure 4 for each Web Site: change the OutputDir,
HostName, and LogFile values for that Web Site and save the file as webalizer$.conf
(where $ represents the number of the Web Site: 1, 2, 3, ...).
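The per-site copies described in the NOTE can also be produced from the shell. The following is a minimal sketch, not part of Webalizer: the function name make_site_conf and its arguments are hypothetical, and it assumes the base webalizer.conf already contains uncommented OutputDir, HostName, and LogFile lines and that the values contain no "|" or "&" characters.

```shell
#!/bin/sh
# make_site_conf BASE_CONF N OUTPUT_DIR HOST_NAME LOG_FILE
# Hypothetical helper: writes webalizerN.conf next to BASE_CONF with the
# three per-site entries replaced, and prints the name of the new file.
make_site_conf() {
    base=$1
    n=$2
    outdir=$3
    host=$4
    logfile=$5
    target="$(dirname "$base")/webalizer$n.conf"
    # '|' is used as the sed delimiter because the values contain '/'.
    sed -e "s|^OutputDir.*|OutputDir $outdir|" \
        -e "s|^HostName.*|HostName $host|" \
        -e "s|^LogFile.*|LogFile $logfile|" \
        "$base" > "$target"
    echo "$target"
}
```

Example call for the first Web Site (all values are placeholders):
make_site_conf /html/webalizer/webalizer.conf 1 /html/webalizer/name_of_website www.name_of_website.com /web_server_logfile_location/access_logfile_name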
5. Create a Virtual Directory or Additional Document Directory for the Web Hit Report URL page.
5.1 For Netscape Enterprise Server or iPlanet Web Server.
5.1.1 Access the Administration Server page.
Using UNIX/LINUX Shell:
#cd /web_server_admin_directory
#./start-admin or #./start
* At user prompt: Type the Administrator Name and Password.
* Click “Ok”.
Using a Web Browser:
* At the web browser’s “URL Address” box, type the Administration URL.
e.g. - http://www.name_of_website.com:admin_port_number
* At user prompt: Type the Administrator Name and Password.
* Click “Ok”.
5.1.2 Access the Web Site Directory from the Administration Server page.
* Click “Servers” from the top menu.
* At the “Select a Server”, select the Web Site Name.
* Click “Manage”.
5.1.3 Create Additional Document Directory.
* Click “Content Management” from the top menu bar.
* Click “Additional Document Directory” from the side menu bar.
* At the “URL prefix” box, type report.
* At the “Map to directory” box, type the Web Hit Report Home Directory.
* Click “Ok”.
* Click “Save and Apply”.
* Click “Ok”.
NOTE: If you are using /location_of_document_root/report as the Web Hit Report Home
Directory, skip procedure 5.1.3.
5.1.4 Add Document Preferences.
* By default, the Netscape Enterprise Server and iPlanet Web Server automatically
display index.html as the default index file.
* Exit the Administration Server page.
5.1.5 Establish Security (Not Recommended).
NOTE: This can be done only if the Web Server is connected to an LDAP Server.
* Access the Administration Server page (see procedure 5.1.1).
* Access the Web Site Directory from the Administration Server page (see procedure 5.1.2).
* Click “Preferences” from the top menu bar.
* Click “Restrict Access” from the side menu bar.
* At the “A. Pick a resource” select “Browse”.
* Click “Options”.
* At the “List from” type:
/html/webalizer/name_of_website
or
/location_of_document_root/report
* Click “Ok”.
* At the “Current directory” select “(choose entire directory)”.
* At the “A. Pick a resource” select “Edit Access Control”.
* Check the “Access control is on” check box.
* At the first line, select “Deny”.
* At the “Allow/Deny” menu, select “Allow”.
* Click “Update”.
* At the first line, select “anyone”.
* At the “Users/Groups” menu, select “Only the following people”.
* At the “Group” box, type the group name the user belongs to.
* At the “User” box, type the username.
* At the “Prompt for authentication” box, type “Access Denied”.
* At the “Authentication methods”, select “Basic”.
* At the “Authentication database”, select “Default”.
* Click “Update”.
* At the first line, select “all”.
* At the “Access rights” menu, select “Only the following rights”.
* Clear the “Write” and “Delete” check boxes.
* Check the “Read”, “Execute”, “List”, and “Info” check boxes.
* Click “Submit”.
* Click “Save and Apply”.
* Click “Ok”.
* Exit the Administration Server page.
NOTE: For Multiple Web Sites, repeat procedure 5.1.2/5.1.3/5.1.4/5.1.5 for
each Web Site.
5.2 For Apache HTTP Server.
5.2.1 Establish Security (Not Recommended – remove the AuthUserFile line).
* At the UNIX/LINUX Shell, type the following commands.
#cd /
#htpasswd -c /html/webalizer/name_of_website/.htaccess username
or
#htpasswd -c /location_of_document_root/report/.htaccess username
* At the password prompt: type a password.
* At the password reentry prompt: type the password again.
5.2.2 Edit httpd.conf or access.conf from the /apache_http_server_directory/conf directory.
* If you are using /html/webalizer/name_of_website as the Web Hit Report Home Directory.
#cd /apache_http_server_directory/conf
#vi httpd.conf or #vi access.conf
Entries to be added:
Alias /report/ "/html/webalizer/name_of_website/"
<Directory "/html/webalizer/name_of_website">
AllowOverride None
AuthName "Webalizer Report"
AuthType Basic
AuthUserFile /html/webalizer/name_of_website/.htaccess
require valid-user
</Directory>
* If you are using /location_of_document_root/report as the Web Hit Report Home
Directory, no need to create an Alias.
#cd /apache_http_server_directory/conf
#vi httpd.conf or #vi access.conf
Entries to be added:
<Directory "/location_of_document_root/report">
AuthName "Webalizer Report"
AuthType Basic
AuthUserFile /location_of_document_root/report/.htaccess
require valid-user
</Directory>
NOTE: For Multiple Web Sites, repeat procedure 5.2.1/5.2.2 for each Web Site.
5.2.3 Load the new configuration (Restart the Apache HTTP Server).
* To restart the Apache HTTP Server in UNIX.
#kill -HUP `cat /web_server_http_daemon_location/httpd.pid`
* To restart the Apache HTTP Server in LINUX.
#kill -SIGHUP `cat /web_server_http_daemon_location/httpd.pid`
NOTE: /web_server_http_daemon_location stands for the default location of the httpd.pid
file created by the Apache HTTP Server during installation.
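The restart command above fails with an unhelpful message if httpd.pid is missing or stale. A minimal sketch of a safer wrapper follows; the function name reload_httpd is hypothetical and the pid file path is whatever your installation uses.

```shell
#!/bin/sh
# reload_httpd PIDFILE
# Hypothetical helper: sends SIGHUP to the process whose id is stored in
# PIDFILE, after checking that the file is readable and the process exists.
reload_httpd() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "reload_httpd: cannot read $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 only tests that the process exists; it sends no signal.
    kill -0 "$pid" 2>/dev/null || { echo "reload_httpd: no process $pid" >&2; return 1; }
    kill -HUP "$pid"
}
```

Example call: reload_httpd /web_server_http_daemon_location/httpd.pid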
6.1 For UNIX/LINUX running Netscape Enterprise Server, iPlanet Web Server, or Apache HTTP Server.
6.1.1 Create a new crontab file or edit an existing crontab file for a user.
Type the following command at the UNIX/LINUX Shell to create or edit the crontab file
for a user.
#crontab -e username
Entries to be added to the crontab file.
* * * * * /html/webalizer/webalizer -c /html/webalizer/webalizer.conf
6.1.2 STAR representation.
First STAR from left represents the Minute (0-59).
Second STAR from left represents the Hour (0-23).
Third STAR from left represents the Day of Month (1-31).
Fourth STAR from left represents the Month (1-12).
Fifth STAR from left represents the Day of Week (0-6); 0 represents Sunday.
A STAR in place of any of the date/time fields means ALL.
NOTE: For Multiple Web Sites, add another line and replace webalizer.conf
with webalizer$.conf (where $ represents the number 1, 2, 3 ……).
6.1.3 Examples.
To run a task every ten minutes, every hour, every day.
0,10,20,30,40,50 * * * *
To run a task every hour, every day.
0 * * * *
To run a task at twelve midnight every day.
0 0 * * *
To run a task every two hours, every day.
0 0,2,4,6,8,10,12,14,16,18,20,22 * * *
To run a task at twelve midnight on the first day of every month.
0 0 1 * *
To run a task at twelve midnight on the seventh day of February.
0 0 7 2 *
To run a task at twelve midnight, Monday to Friday.
0 0 * * 1-5
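The comma-separated lists in the examples above are tedious to type by hand. As a small sketch (the function name cron_list is hypothetical), the list for "every STEP units below LIMIT" can be generated in the shell:

```shell
#!/bin/sh
# cron_list STEP LIMIT
# Hypothetical helper: prints 0,STEP,2*STEP,... for all values below LIMIT.
# Use LIMIT 60 for the Minute field and LIMIT 24 for the Hour field.
cron_list() {
    step=$1
    limit=$2
    # A step below 1 would never reach the limit, so reject it.
    [ "$step" -ge 1 ] || { echo "cron_list: STEP must be 1 or more" >&2; return 1; }
    value=0
    list=""
    while [ "$value" -lt "$limit" ]; do
        list="${list:+$list,}$value"
        value=$((value + step))
    done
    echo "$list"
}
```

For example, cron_list 10 60 produces the Minute field of the first example and cron_list 2 24 produces the Hour field of the "every two hours" example.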
6.2 If you are running a script that automates Web Log Rotation and saves the old log
file under another filename and directory.
* Change the value of LogFile in webalizer.conf in procedure 4 to point to the path and
filename of the old log file.
* Determine the time when the automatic Web Log Rotation runs, then change the crontab
entry in procedure 6.1.1 so that webalizer runs five minutes after the automatic Web
Log Rotation.
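As an illustration of the second step, the sketch below prints the crontab entry for a daily run a few minutes after the rotation time. It assumes the rotation runs once a day; the function name cron_line_after_rotation is hypothetical, and HOUR and MINUTE must be given without leading zeros.

```shell
#!/bin/sh
# cron_line_after_rotation HOUR MINUTE [DELAY_MINUTES]
# Hypothetical helper: prints a daily crontab entry that runs webalizer
# DELAY_MINUTES (default 5) after the rotation time HOUR:MINUTE.
cron_line_after_rotation() {
    hour=$1
    minute=$2
    delay=${3:-5}
    # Work in minutes since midnight and wrap around at 24 hours.
    total=$(( (hour * 60 + minute + delay) % 1440 ))
    echo "$((total % 60)) $((total / 60)) * * * /html/webalizer/webalizer -c /html/webalizer/webalizer.conf"
}
```

For a rotation at twelve midnight, cron_line_after_rotation 0 0 prints an entry that starts with "5 0 * * *".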
6.3 If you are using Load Balancing for your Web Server (two or more servers for one Web Server).
Example: In this configuration, the customer is using two servers to host its Web
Server.
* At the first server (FIRST_SERVER_NAME), create the work directories for the logfile
synchronization.
#cd /html
#mkdir logfile
#mkdir oldlog
* Create a file named rotate1a in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate1a
Enter the following lines:
#!/bin/sh
# move files to archive directory
# mv /web_server_logfile_location/error_logfile_name /html/logfile
mv /web_server_logfile_location/access_logfile_name /html/logfile
# restart web server (APACHE-UNIX)
kill -HUP `cat /web_server_http_daemon_location/httpd.pid`
# restart web server (APACHE-LINUX)
kill -SIGHUP `cat /web_server_http_daemon_location/httpd.pid`
NOTE: Keep only the restart command for your platform. No need to restart the Netscape Enterprise Server and iPlanet Web Server.
* Create a file named rotate1b in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate1b
Enter the following lines:
#!/bin/sh
# define backup names
# OLD_ERROR=/html/oldlog/error.`date +%y%m%d-%H%M%S`
OLD_ACCESS=/html/oldlog/access.`date +%y%m%d-%H%M%S`
# move files to backup directory
# mv /html/logfile/error_logfile_name `echo $OLD_ERROR`
mv /html/logfile/access_logfile_name `echo $OLD_ACCESS`
# compress the backup files
# /bin/gzip $OLD_ACCESS
# /bin/gzip $OLD_ERROR
* Create a file named rotate1c in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate1c
Enter the following lines:
#!/bin/sh
# run the webalizer
webalizer -c /html/webalizer/webalizer.conf
# remove files from archive directory
# rm /html/logfile/error_logfile_name
rm /html/logfile/access_logfile_name
* Create a file named .netrc in the user home directory.
#cd /user_home_directory
#vi .netrc
Enter the following lines:
machine SECOND_SERVER_NAME
login username
password password
macdef init
cd /html/logfile
lcd /html/logfile
append access_logfile_name access_logfile_name
bye
<blank>
NOTE: The <blank> means enter a blank line.
* Protect your .netrc file.
#chmod go-r .netrc
* Create a cron job.
#crontab -e username
Enter the following lines:
0 0 * * * /html/webalizer/rotate1a
10 0 * * * ftp SECOND_SERVER_NAME
15 0 * * * /html/webalizer/rotate1b
25 0 * * * /html/webalizer/rotate1c
* At the second server (SECOND_SERVER_NAME), create the work directories for the logfile
synchronization.
#cd /html
#mkdir logfile
#mkdir oldlog
* Create a file named rotate2a in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate2a
Enter the following lines:
#!/bin/sh
# move files to archive directory
# mv /web_server_logfile_location/error_logfile_name /html/logfile
mv /web_server_logfile_location/access_logfile_name /html/logfile
# restart web server (APACHE-UNIX)
kill -HUP `cat /web_server_http_daemon_location/httpd.pid`
# restart web server (APACHE-LINUX)
kill -SIGHUP `cat /web_server_http_daemon_location/httpd.pid`
NOTE: Keep only the restart command for your platform. No need to restart the Netscape Enterprise Server and iPlanet Web Server.
* Create a file named rotate2b in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate2b
Enter the following lines:
#!/bin/sh
# define backup names
# OLD_ERROR=/html/oldlog/error.`date +%y%m%d-%H%M%S`
OLD_ACCESS=/html/oldlog/access.`date +%y%m%d-%H%M%S`
# copy files to backup directory
# cp /html/logfile/error_logfile_name `echo $OLD_ERROR`
cp /html/logfile/access_logfile_name `echo $OLD_ACCESS`
# compress the backup files
# /bin/gzip $OLD_ACCESS
# /bin/gzip $OLD_ERROR
* Create a file named rotate2c in the /html/webalizer directory.
#cd /html/webalizer
#vi rotate2c
Enter the following lines:
#!/bin/sh
# run the webalizer
webalizer -c /html/webalizer/webalizer.conf
# remove files from archive directory
# rm /html/logfile/error_logfile_name
rm /html/logfile/access_logfile_name
* Create a file named .netrc in the user home directory.
#cd /user_home_directory
#vi .netrc
Enter the following lines:
machine FIRST_SERVER_NAME
login username
password password
macdef init
cd /html/logfile
lcd /html/logfile
put access_logfile_name
bye
<blank>
NOTE: The <blank> means enter a blank line.
* Protect your .netrc file.
#chmod go-r .netrc
* Create a cron job.
#crontab -e username
Enter the following lines:
0 0 * * * /html/webalizer/rotate2a
5 0 * * * /html/webalizer/rotate2b
20 0 * * * ftp FIRST_SERVER_NAME
25 0 * * * /html/webalizer/rotate2c
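* Log File Synchronization Process (times taken from the cron jobs on both servers).
00:00 Both servers: rotate1a / rotate2a move the access log to /html/logfile and restart the Web Server.
00:05 Second server: rotate2b copies its log to /html/oldlog.
00:10 First server: ftp appends its log to the log on the second server.
00:15 First server: rotate1b moves its log to /html/oldlog.
00:20 Second server: ftp puts the combined log on the first server.
00:25 Both servers: rotate1c / rotate2c run webalizer and remove the log from /html/logfile.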
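The backup step of rotate1b silently does nothing useful when the log has not arrived yet. The sketch below shows one way to write the same move with a check; the function name archive_log is hypothetical, and the time stamp format is the one used in the rotate scripts.

```shell
#!/bin/sh
# archive_log SRC_FILE BACKUP_DIR PREFIX
# Hypothetical helper: moves SRC_FILE to BACKUP_DIR/PREFIX.YYMMDD-HHMMSS
# and prints the new name. Fails if SRC_FILE does not exist.
archive_log() {
    src=$1
    backup_dir=$2
    prefix=$3
    [ -f "$src" ] || { echo "archive_log: $src not found" >&2; return 1; }
    dest="$backup_dir/$prefix.$(date +%y%m%d-%H%M%S)"
    mv "$src" "$dest" && echo "$dest"
}
```

Example call on the first server: archive_log /html/logfile/access_logfile_name /html/oldlog access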
6.4 Network Time Protocol Daemon (NTPD) configuration.
NOTE: Installing NTPD is important in synchronizing files for multiple servers.
Example 1: For Local Network Time Synchronizations of NTP Servers and Clients without
connection to the Internet using higher stratum numbers for servers as reference time.
* Create a work directory for NTP installation.
#cd /
#mkdir ntpfiles
* Download the NTP package from the Internet for the appropriate machines.
SUN Solaris - http://www.sunfreeware.com/
Red Hat Linux - http://www.redhat.com/
NOTE: You can download NTP for all UNIX/LINUX platforms from
http://www.eecis.udel.edu/~ntp
* Extract the file that you downloaded.
a. If you downloaded a GZIP file (e.g. ntp-4.0.72j-machine-platform.gz)
#cd /ntpfiles
#gunzip ntp-4.0.72j-machine-platform.gz
b. If you downloaded a TAR file (e.g. ntp-4.0.72j-machine-platform.tar)
#cd /ntpfiles
#tar -xvf ntp-4.0.72j-machine-platform.tar
* Add the package.
#cd /ntpfiles
#pkgadd -d ntp-4.0.72j-machine-platform
* Edit the ntp.conf file (for version 4 it is usually located at /usr/local/doc/ntp/scripts/support/conf).
a. For Server.
#cd /ntp_configuration_file_location
#vi ntp.conf
Add the following lines:
server 127.127.1.0 # Local clock
fudge 127.127.1.0 stratum 13 # Not disciplined
b. For Client.
#cd /ntp_configuration_file_location
#vi ntp.conf
Add the following lines:
server server_ip_address
driftfile /ntp_configuration_file_location/ntp.drift
* On both Server and Client, run the NTP Daemon (for version 4 it is usually
located at /usr/local/bin). Run it on the NTP Server first.
#cd /ntp_daemon_location
#./ntpd -c /ntp_configuration_file_location/ntp.conf
* To check the time synchronization of the NTP Server and clients, run the NTP
Query program.
#ntpq -p
On the NTP Server you should see a line similar to this.
remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)   13 l   47   64  377     0.00    0.000    0.94
On NTP Clients you should see a line similar to this.
remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*REMOTE_NAME     LOCAL(0)   14 u  601 1024  377     0.42    0.119   14.86
* On both Server and Client, create a script file so that the system runs the NTP
Daemon when it restarts. Make sure the file name starts with a capital ‘S’.
#cd /etc/rc2.d
#vi S168ntp
Add the following lines:
/ntp_daemon_location/ntpd -c /ntp_configuration_file_location/ntp.conf
Example 2: For Local Network Time Synchronizations of NTP Servers and Clients with
connection to the Internet using Stratum 1 and 2 NTP Servers as reference time.
* Create a work directory for NTP installation.
#cd /
#mkdir ntpfiles
* Download the NTP package from the Internet for the appropriate machines.
SUN Solaris - http://www.sunfreeware.com/
Red Hat Linux - http://www.redhat.com/
NOTE: You can download NTP for all UNIX/LINUX platforms from
http://www.eecis.udel.edu/~ntp
* Extract the file that you downloaded.
a. If you downloaded a GZIP file (e.g. ntp-4.0.72j-machine-platform.gz)
#cd /ntpfiles
#gunzip ntp-4.0.72j-machine-platform.gz
b. If you downloaded a TAR file (e.g. ntp-4.0.72j-machine-platform.tar)
#cd /ntpfiles
#tar -xvf ntp-4.0.72j-machine-platform.tar
* Add the package.
#cd /ntpfiles
#pkgadd -d ntp-4.0.72j-machine-platform
* Edit the ntp.conf file (for version 4 it is usually located at /usr/local/doc/ntp/scripts/support/conf).
a. For Server.
#cd /ntp_configuration_file_location
#vi ntp.conf
Add the following lines:
server 127.127.1.0 # Local clock
fudge 127.127.1.0 stratum 13 # Not disciplined
driftfile /ntp_configuration_file_location/ntp.drift
server ntp_server1 # Stratum 1 Internet NTP Server
server ntp_server2 # Stratum 2 Internet NTP Server
NOTE: To see the list of Stratum 1 and 2 Internet NTP Server:
http://www.ece.udel.edu/~mills/ntp/clock1.htm
http://www.ece.udel.edu/~mills/ntp/clock2.htm
b. For Client.
#cd /ntp_configuration_file_location
#vi ntp.conf
Add the following lines:
server server_ip_address
driftfile /ntp_configuration_file_location/ntp.drift
* On both Server and Client, run the NTP Daemon (for version 4 it is usually
located at /usr/local/bin). Run it on the NTP Server first.
#cd /ntp_daemon_location
#./ntpd -c /ntp_configuration_file_location/ntp.conf
* To check the time synchronization of the NTP Server and clients, run the
NTP Query program.
#ntpq -p
On the NTP Server you should see lines similar to this.
remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*LOCAL(0)        LOCAL(0)   13 l   47   64  377     0.00    0.000    0.94
*ntp_server1     .PPS.       1 u   45   64  377    1.306   -0.019   0.043
*ntp_server2     .PPS.       2 u   36   64  377    1.306   -0.019   0.043
On NTP Clients you should see a line similar to this.
remote           refid      st t when poll reach   delay   offset    disp
==============================================================================
*REMOTE_NAME     LOCAL(0)   14 u  601 1024  377     0.42    0.119   14.86
* On both Server and Client, create a script file so that the system runs the
NTP Daemon when it restarts. Make sure the file name starts with a capital ‘S’.
#cd /etc/rc2.d
#vi S168ntp
Add the following lines:
/ntp_daemon_location/ntpd -c /ntp_configuration_file_location/ntp.conf
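The ntpq check in both examples can be scripted. In the ntpq output, the line that starts with "*" is the peer the daemon is currently synchronized to. The sketch below extracts that peer's name from the output; the function name selected_peer is hypothetical.

```shell
#!/bin/sh
# selected_peer
# Hypothetical helper: reads `ntpq -p` output on standard input and prints
# the remote name of the peer marked with '*'. Returns 1 if no peer is
# marked, which means the daemon is not synchronized yet.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); found = 1; exit }
         END   { if (!found) exit 1 }'
}
```

Example call: ntpq -p | selected_peer || echo "not synchronized yet"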
Test the installation by running the scripts and browsing the results using a web browser.
* Type the following lines at the UNIX/LINUX Shell and press the “Enter” key.
#cd /
#webalizer -c /html/webalizer/webalizer.conf
* If you are using Load Balancing, run the scripts in the order given by the Log File
Synchronization Process (see procedure 6.3).
* At the web browser’s “URL Address” box, type the Web Hit Report URL and press the
“Enter” key.
3. RECORDS
The yearly (index) report shows statistics for a 12 Month period, and links to each month. The monthly report has detailed statistics for that month with additional links to any URL's and referrers found. The various totals shown are explained below.
Hits: Any request made to the server which is logged is considered a 'hit'. The requests can be for anything: html pages, graphic images, audio files, cgi scripts, etc. Each valid line in the server log is counted as a hit. This number represents the total number of requests that were made to the server during the specified report period.
Files: Some requests made to the server require that the server then send something back to the requesting client, such as an html page or graphic image. When this happens, it is considered a 'file' and the files total is incremented. The relationship between 'hits' and 'files' can be thought of as 'incoming requests' and 'outgoing responses'.
Pages: Pages are, well, pages! Generally, any HTML document, or anything that generates an HTML document, would be considered a page. This does not include the other stuff that goes into a document, such as graphic images, audio clips, etc. This number represents the number of 'pages' requested only, and does not include the other 'stuff' that is in the page. What actually constitutes a 'page' can vary from server to server. The default action is to treat anything with the extension '.htm', '.html' or '.cgi' as a page. A lot of sites will probably define other extensions, such as '.phtml', '.php3' and '.pl' as pages as well. Some people consider this number as the number of 'pure' hits... I'm not sure if I totally agree with that viewpoint. Some other programs (and people :) refer to this as 'Pageviews'.
Visits: Whenever a request is made to the server from a given IP address (site), the amount of time since the previous request from that address (if any) is calculated. If the time difference is greater than a pre-configured 'visit timeout' value, or the address has never made a request before, the request is considered a 'new visit', and this total is incremented (both for the site and for the IP address). The default timeout value is 30 minutes (can be changed), so if a user visits your site at 1:00 in the afternoon and then returns at 3:00, two visits would be registered. Note: in the 'Top Sites' table, the visits total should be discounted on 'Grouped' records, and thought of as the "Minimum number of visits" that came from that grouping instead. Note: Visits only occur on PageType requests, that is, for any request whose URL is one of the 'page' types defined with the PageType option. Due to the limitations of the HTTP protocol, log rotations and other factors, this number should not be taken as absolutely accurate; rather, it should be considered a pretty close "guess".
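The visit rule can be illustrated with a short sketch. It is a simplification, not Webalizer's own code: the function name count_visits is hypothetical, the input is assumed to be one "ip_address time_in_seconds" pair per line in time order, and the PageType restriction mentioned above is ignored.

```shell
#!/bin/sh
# count_visits [TIMEOUT_SECONDS]
# Hypothetical helper: reads "ip_address time_in_seconds" lines on standard
# input and prints the number of visits. The default timeout is 1800
# seconds (30 minutes).
count_visits() {
    awk -v timeout="${1:-1800}" '
        # A request starts a new visit if the address is new or has been
        # idle for longer than the timeout.
        !($1 in last) || $2 - last[$1] > timeout { visits++ }
        { last[$1] = $2 }
        END { print visits + 0 }'
}
```

With the example from the text, requests at 1:00 (46800 seconds after midnight) and 3:00 (54000 seconds) from the same address are two hours apart, so the sketch counts two visits.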
Sites: Each request made to the server comes from a unique 'site', which can be referenced by a name or, ultimately, an IP address. The 'sites' number shows how many unique IP addresses made requests to the server during the reporting time period. This DOES NOT mean the number of unique individual users (real people) that visited, which is impossible to determine using just logs and the HTTP protocol (however, this number might be about as close as you will get).
KBytes: The KBytes (kilobytes) value shows the amount of data, in KB, which was sent out by the server during the specified reporting period. This value is generated directly from the log file, so it is up to the web server to produce accurate numbers in the logs (some web servers do stupid things when it comes to reporting the number of bytes). In general, this should be a fairly accurate representation of the amount of outgoing traffic the server had, regardless of the web server's reporting quirks. (Note: A kilobyte is 1024 bytes, not 1000 bytes.)
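The totals described above can be approximated directly from a log file, which is useful as a rough cross-check of a report. The sketch below is not Webalizer's own logic. It assumes Common Log Format lines whose request has exactly three parts (method, URL, protocol), so that the status code is field 9 and the byte count is field 10; it counts every line as a hit, and it treats a 'file' as a response with status code 200, which is an assumption. The function name log_totals is hypothetical.

```shell
#!/bin/sh
# log_totals
# Hypothetical helper: reads Common Log Format lines on standard input and
# prints rough hits, files, sites, and KBytes totals.
log_totals() {
    awk '
        { hits++; sites[$1] = 1 }
        # Assumption: a response with status 200 counts as a file.
        $9 == 200 { files++ }
        # The byte count is "-" when nothing was sent; skip those lines.
        $10 ~ /^[0-9]+$/ { bytes += $10 }
        END {
            n = 0
            for (s in sites) n++
            printf "hits=%d files=%d sites=%d kbytes=%.2f\n", hits, files, n, bytes / 1024
        }'
}
```

Example call: log_totals < /web_server_logfile_location/access_logfile_name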
Top Entry and Exit Pages: The Top Entry and Exit Pages give rough estimates of which URLs are used to enter your site, and which pages are viewed last. Because of limitations in the HTTP protocol, log rotations, etc., these numbers should be considered a good "rough guess" of the actual numbers; they will, however, give a good indication of the overall trend in where users come into, and exit, your site.
The files produced (default names) are:
index.html - Main summary page (extension may be changed).
usage.png - Yearly graph displayed on the main index page.
usage_YYYYMM.html - Monthly summary page (extension may be changed).
usage_YYYYMM.png - Monthly usage graph for specified month/year.
daily_usage_YYYYMM.png - Daily usage graph for specified month/year.
hourly_usage_YYYYMM.png - Hourly usage graph for specified month/year.
site_YYYYMM.html - All sites listing (if enabled).
url_YYYYMM.html - All url’s listing (if enabled).
ref_YYYYMM.html - All referrers listing (if enabled).
agent_YYYYMM.html - All user agents listing (if enabled).
search_YYYYMM.html - All search strings listing (if enabled).
webalizer.hist - Previous month history (may be changed).
webalizer.current - Incremental Data (may be changed).
site_YYYYMM.tab - tab delimited sites file.
url_YYYYMM.tab - tab delimited urls file.
ref_YYYYMM.tab - tab delimited referrers file.
agent_YYYYMM.tab - tab delimited user agents file.
user_YYYYMM.tab - tab delimited usernames file.
search_YYYYMM.tab - tab delimited search string file.
4. REFERENCES