Name: gmodstat
Version: 0.2.1, November 6th 2001
Author: gj_armitage@yahoo.com

Copyright (c) 2001, Grenville Armitage

1. Summary:

 gmodstat supports post-analysis of QuakeIII Arena server
 logfiles to extract things like playing time trends, histograms
 of ping times as perceived by clients, the domains from which
 different clients connect, and the percentage of clients who
 may be playing from home and using NAT boxes.

 Servers must be running the gmod1.0 server mod (or something
 equivalent) to generate the additional logfile entries
 required by gmodstat.

 gmodstat was initially developed in a Win32 environment under
 MS Visual C++, but most subsequent development has occurred
 under FreeBSD using KDevelop 1.4.  Installation instructions are
 included in section 2 for compiling gmodstat under *nix-like
 environments and Win32/MS Visual C++.

 gmodstat is released under the GNU General Public License,
 Version 2, 1991.

 gmodstat compiles 'out of the box' under FreeBSD4.3
 and Win32/MS Visual C++ environments. It requires no
 special libraries, and should compile under other *nix
 environments.

 The rest of this file contains:

     Section 2:  Installation
     Section 3:  Initial startup
     Section 4:  The basic definition of a client
     Section 5:  IP address to domain name mappings
     Section 6:  NAT and Home User estimations
     Section 7:  Ping histograms
     Section 8:  Played time histograms and charts
     Section 9:  General game statistics
     Section 10: Other stuff
     Section 11: Conclusion
     Appendix A: New logfile tokens
     Appendix B: Release summary


2. Installation under *nix or Win32

2.1 Under FreeBSD/*nix environments

 The current development environment for gmodstat is
 FreeBSD4.3 with KDevelop 1.4, a free, X11-based C/C++
 development tool. KDevelop automagically adds tools to
 create an appropriate makefile, with which you
 can generate a running executable. (I have not verified
 whether gmodstat can or cannot be compiled under anything
 other than FreeBSD4.3 or Win32, but I'd be interested in
 hearing experiences.)  The following installation steps
 apply to *nix environments.

 The basic distribution is a gzipped tarfile named
 gmodstat-0.2.1.tar.gz, which creates a subdirectory
 ./gmodstat-0.2.1 when gunzipped/untar'ed. Once the
 tarfile is unpacked, perform the following steps:

 > cd ./gmodstat-0.2.1
 > ./configure
 > cd gmodstat
 > make

 "./configure" will spend a minute or so inspecting
 your system, compiler settings, etc and generating
 appropriate makefiles. Once this has completed successfully,
 you move into the source subdirectory and run "make"
 to actually compile gmodstat.

 You can then either copy gmodstat to somewhere more
 convenient in your path, or use "make install" to
 automatically copy gmodstat into /usr/local/bin.
 (An alternate installation location can be specified
 during the configuration stage. If you wish to install
 into /<dir>/bin then execute "./configure --prefix=/<dir>/"
 instead of "./configure" before compiling. "make install"
 will then copy gmodstat to /<dir>/bin/gmodstat.)

 Executing "make clean" in ./gmodstat-0.2.1/gmodstat will
 subsequently remove all intermediate object files.

 KDevelop 1.4's gmodstat.kdevprj file is also supplied,
 in case it helps you do further development of gmodstat.

2.2 Under Win32/MS Visual C++

 I've supplied sample Visual C++ 6.0 project/workspace files
 ./gmodstat-0.2.1/gmodstat.dsp and ./gmodstat-0.2.1/gmodstat.dsw

 If you have Visual C++, you should be able to use WinZip
 (or similar) to unpack/untar the gmodstat distribution,
 then go into the ./gmodstat-0.2.1 folder and double-click
 on gmodstat.dsw to start Visual C++. Tell Visual C++
 to "build" and a Win32 version of gmodstat should be built.
 (At least, it worked for me on a Windows 2000 system.
 No promises it'll work on every Win32 platform, although
 I imagine it should.)

 The gmodstat executable (in ./gmodstat-0.2.1/Debug) must be
 run from a console window (or from within Visual C++).

 Where I discovered differences between Visual C++ 6.0
 in a Win32 environment and KDevelop 1.4/gcc in a FreeBSD4.3
 environment, I've used conditional compilation directives.
 The flag WIN32 should be set for Win32-compatible code, and
 unset for FreeBSD4.3 (or equivalent) environments.

 You will need to specifically link against ws2_32.lib (add
 under Project->Settings->Linker if you're using
 MS Visual C++) for Win32 (to bring in inet_ntoa()
 functions).


3. Starting gmodstat

 gmodstat is primarily controlled by options specified in a
 configuration file. By default, the file is ./gmodstatconf.txt.
 The file ./conf-example.txt contains brief commentary on
 a range of configuration options.

 Start gmodstat with:

   gmodstat

   (to use ./gmodstatconf.txt as config file)

 or

   gmodstat -c <configfile>

   (where <configfile> is the filename of your specific configuration
   file.)

 A certain amount of run-time status information is printed to
 stdout, with the main record of gmodstat's activities logged to
 the file "logout.txt". A number of auxiliary output files may
 also be generated, depending on the set of options specified in
 the configuration file.

 gmodstat can analyse a single QuakeIII server logfile, or
 a series of logfiles representing a server that has been
 running over a long period of time. (Ideally the server has
 been stopped and restarted every six days, resulting in
 the sequence of logfiles. After 6.9 days the server's
 timestamping loses some accuracy, and in my experience the
 server itself often gets flakey.)

 The next section briefly defines what gmodstat considers
 to be a 'client', section 5 discusses how to create an
 initial ipaddress to domain name mapping file, section 6
 covers the generation of home user and NAT penetration
 analysis, while section 7 discusses how to generate ping
 histograms. Section 8 mentions how to generate played
 time histograms and cumulative played time plots.


4. Definition of a 'client'

 QuakeIII players are uniquely identified by their playername (an
 arbitrary ASCII string) and their IP address. However, because of
 the wide deployment of dynamic address assignment techniques by
 many ISPs, the same player may appear with many different (but
 related) IP addresses over many different appearances. In order
 to more accurately associate these instances as the same person,
 gmodstat uses the following algorithm to determine a client:

   Take the player's IP address, in dotted-quad form "w.x.y.z"
   Resolve this address into a domain name
   Take the non-host part of that domain name and call it the domain suffix
   A client is defined by the tuple (playername, domain suffix)

 For example, consider playername GUEST playing twice, with a
 different IP address each time that resolved to random123.dsl.myisp.com
 and otherpop.dsl.myisp.com - gmodstat would consider this to be the
 same client, since the domain suffix of ".dsl.myisp.com" is common
 in both cases.
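
 The client definition above can be sketched in a few lines of Python.
 This is only an illustration, not gmodstat's own code: the function
 names domain_suffix() and client_key() are hypothetical, and "non-host
 part" is assumed to mean everything after the first label of the
 resolved name.

```python
def domain_suffix(hostname):
    """Drop the first (host) label and keep the rest, with its leading dot.

    Assumption: the "non-host part" is everything after the first label.
    A name with no dot yields an empty suffix.
    """
    _host, dot, rest = hostname.partition(".")
    return dot + rest


def client_key(playername, hostname):
    """A client is the pair (playername, domain suffix)."""
    return (playername, domain_suffix(hostname))


first = client_key("GUEST", "random123.dsl.myisp.com")
second = client_key("GUEST", "otherpop.dsl.myisp.com")
print(first == second)  # both appearances map to the same client
```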

 The logic here is that ISPs often dynamically assign addresses from
 related pools of IP addresses associated with the access points
 through which customers connect. Commonality of the domain suffix
 is a reasonable guesstimate that this represents the same human
 player simply being assigned a different dynamic IP address.

 My experience while developing gmodstat is that, while not perfectly
 accurate, the above algorithm is far better than counting each
 unique (playername, IP address) tuple as a distinct client. As a
 further optimization, un-resolved IP addresses are mapped to 'fake'
 domain suffixes inside gmodstat.

 When you specify 'faked_suffix_range n' in the config file:
     n = 4, fake suffixes are ".w-x-y-z.unresolved".
     n = 3, fake suffixes are ".x-y-z.unresolved".
     n = 2, fake suffixes are ".y-z.unresolved".
     n = 1, fake suffixes are ".z.unresolved".

 Where a client is being dynamically assigned IP addresses from a
 common address pool, these faked suffixes increase the chances
 we'll correctly recognize multiple unresolvable IP addresses as
 representing the same client.
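
 As a rough illustration, the table of fake suffixes above could be
 produced as follows. The sketch follows the table literally (keeping
 the last n octets); fake_suffix() is a hypothetical name, not a
 gmodstat function.

```python
def fake_suffix(ip, n):
    """Build a fake domain suffix from the last n octets of a dotted quad.

    Follows the 'faked_suffix_range n' table literally; this is an
    illustration, not gmodstat's implementation.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not 1 <= n <= 4:
        raise ValueError("expected a dotted quad and 1 <= n <= 4")
    return "." + "-".join(octets[-n:]) + ".unresolved"


for n in (4, 3, 2, 1):
    print(n, fake_suffix("10.20.30.40", n))
```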

 In order to avoid DNS lookups every time gmodstat is run, gmodstat
 can create a local file of ipaddr->domainname mappings for later
 re-use. This is discussed further in the next section.

 The configuration file option 'clientnames' causes gmodstat to
 dump the list of seen clients in descending order of total playing
 time. This list is dumped to file ./clients-logout.txt


5. Creating an initial ipaddress to domain name mapping file

 The first thing I recommend is generating a local copy of
 all the ipaddr->domainname mappings relevant to your logfile(s).
 Create a config file with, at minimum, the following entries:

    sourcefile <logfile>
    resolve_missing
    dump_dns_maps

 (where <logfile> is the logfile you are analysing)

 Start gmodstat, and it will begin walking through the supplied
 server logfile performing DNS lookups on every IP address
 found. Once this is complete, it will dump all the discovered
 ipaddr->domainname mappings to the file ./ipnames-logout.txt.

 Note that this initial process may take many minutes, as not
 all IP addresses have registered domain names. gmodstat
 currently sits idle when a DNS lookup stalls waiting to
 time out.

 Now, on all subsequent runs of gmodstat add the following to
 the config file:

    hosts_file <hostsfile>

    (where <hostsfile> is a local copy of ./ipnames-logout.txt)

 With the 'hosts_file' option, gmodstat pre-loads its internal
 ipaddr->domainname mapping cache from the named file. This
 then avoids the lengthy process of performing DNS lookups each
 time you re-run gmodstat on the same logfile(s).

 You can have all of 'hosts_file', 'resolve_missing' and
 'dump_dns_maps' specified concurrently in the config
 file - gmodstat will then use mappings from the local
 file when available, look up any new IP addresses it might
 find in the DNS, and then dump the newly updated total list
 of seen ipaddr->domainname mappings to ./ipnames-logout.txt
 at the end of its run.
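
 The combined behaviour (use cached mappings first, resolve only what
 is missing) can be sketched as below. This is an assumption-laden
 illustration: reverse_lookup() and resolve_with_cache() are
 hypothetical names, the cache is a plain in-memory dict, and the
 on-disk format of ./ipnames-logout.txt is not modelled here.

```python
import socket


def reverse_lookup(ip):
    """Return the domain name for ip, or None if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None


def resolve_with_cache(ips, cache, lookup=reverse_lookup):
    """Fill cache (ip -> name) only for addresses not already present."""
    for ip in ips:
        if ip not in cache:
            cache[ip] = lookup(ip)
    return cache


# Usage with a stand-in resolver, so no real DNS traffic is generated.
cache = {"192.0.2.1": "a.example.com"}
resolve_with_cache(["192.0.2.1", "192.0.2.2"], cache,
                   lookup=lambda ip: "host-" + ip + ".example")
print(sorted(cache))
```

 Passing the resolver in as a parameter keeps the caching logic easy to
 exercise without waiting on DNS timeouts.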

 If you have multiple logfiles they can be handled in one
 run by replacing 'sourcefile' with 'filelist nnn' where
 nnn is a text file containing each logfile's name one
 per line.


6. Analysing Home users and NAT penetration

 gmodstat can be used to estimate the use of NAT (network
 address translation) functionality across the Internet by
 looking for evidence of NAT in the client traffic.

 NAT is typically embedded in home routers and gateways, and
 sometimes in gateway routers of small ISPs. The tell-tale
 sign of NAT is where the UDP or TCP port numbers get modified
 from 'expected' values to unusual values in transit. QuakeIII
 uses well-known UDP port numbers - un-modified QuakeIII clients
 almost invariably use UDP port 27960 as the source port in the
 packets they send to the server. Detecting NAT is as simple as
 detecting clients who connect from a source UDP port other
 than 27960.

 gmod1.0 causes a server to log each player's source IP address
 and source UDP port number when they connect to the server.
 This information is used by gmodstat to estimate the % of NAT
 penetration in the player community.
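
 A minimal sketch of that port test, assuming the "w.x.y.z:pp" form
 that gmod1.0 writes in ClientConnect entries (see Appendix A.1). The
 names split_endpoint() and looks_like_nat() are made up for this
 example.

```python
Q3_DEFAULT_PORT = 27960  # source port used by un-modified QuakeIII clients


def split_endpoint(endpoint):
    """Split 'w.x.y.z:pp' into (ip, port)."""
    ip, _colon, port = endpoint.rpartition(":")
    return ip, int(port)


def looks_like_nat(endpoint):
    """Treat any source port other than 27960 as evidence of NAT."""
    _ip, port = split_endpoint(endpoint)
    return port != Q3_DEFAULT_PORT


print(looks_like_nat("192.0.2.7:27960"))
print(looks_like_nat("192.0.2.7:1045"))
```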

 To do a simple NAT estimation, build a configuration file like
 this:

    sourcefile <logfile>
    hosts_file <hostsfile>
    clients_NAT_range 64
    range_increment 8
    humanreadable

 (where <logfile> is the server log we're analysing and <hostsfile>
 is the cache of resolved ipaddr->domainname mappings created per
 the discussion in section 5.)

 'clients_NAT_range 64' specifies that NAT estimation should be
 performed for a range of sets of clients, where each set is made up
 from the clients who played for more than N minutes, where
 0 <= N <= 64. 'range_increment' says increase N in steps of 8.
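
 To make the range/increment idea concrete, here is a small sketch of
 the calculation for N = 0, 8, 16, ... 64. The input format (a list of
 (minutes played, NAT detected) pairs) and the name nat_percentages()
 are assumptions made for this example; the layout of
 ./client_stats.txt is not reproduced.

```python
def nat_percentages(clients, nat_range=64, increment=8):
    """Map each threshold N to the % of NAT clients who played > N minutes.

    clients is a list of (minutes_played, is_nat) pairs. A threshold
    with no qualifying clients maps to None.
    """
    results = {}
    for n in range(0, nat_range + 1, increment):
        subset = [is_nat for minutes, is_nat in clients if minutes > n]
        results[n] = 100.0 * sum(subset) / len(subset) if subset else None
    return results


sample = [(5, True), (20, False), (70, True)]
for n, pct in nat_percentages(sample).items():
    print(n, pct)
```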

 The output is dumped to ./client_stats.txt, and will be in verbose
 ASCII form because 'humanreadable' was set.

 gmodstat also tries to calculate how many clients are playing
 from "home" by counting how many clients' domain names fall under
 domains believed to represent "home users". You can set this list
 to be whatever you want, and select it with the 'homedomain_file'
 config option. Finally, gmodstat can also calculate the number of
 home users as a percentage of a specific subset of domains,
 which you can specify with the 'valid_domains' config option.

 See ./conf-example.txt for more details on these options.


7. Analysing client ping distributions

 Servers running gmod1.0 or later will generate ping sample
 histograms every few tens of seconds, reflecting hundreds or
 thousands of server-estimated ping samples. (By default gmod1.0
 logs a new histogram every 2000 client frames, sampling the
 server's internal ping estimate each frame.)

 The configuration file option 'clientnames' causes gmodstat to
 dump the list of seen clients in descending order of total playing
 time. This list is dumped to file ./clients-logout.txt

7.1 Specific Clients

 To dump aggregate histograms of a specific client's ping samples,
 use the following config file options:

    sourcefile 
    hosts_file 
    single_client_phisto    
	
 This will cause every sampled histogram to be dumped in ASCII
 format to disk for the specified client. The
 ASCII file is "SCPH-.txt" and will have a format
 suitable for passing to xgraph, with each histo's timestamp as
 its title line.

 To create a series of per-game histograms, add the 'do_phisto_pergame'
 config option. The xgraph title line for each histo will be the
 game's starttime (in seconds since 1/1/1970).

 To create an aggregate histogram over all games played by the specified
 client, add the 'do_phisto_total' config option.

 Use only one of 'do_phisto_pergame' or 'do_phisto_total' at a time.

 As an alternative to specifying a particular client, you can have
 histograms generated for every client who played for more than a
 certain number of minutes (measured over all games) with:

    topN_client_phistos nnn

 Clients who played more than nnn minutes will have their ping histos
 dumped to disk in individual files named "SCPH-.txt"

 Although gmod1.0 uses buckets one millisecond wide, the aggregate
 histos generated under 'do_phisto_pergame' and 'do_phisto_total'
 modes can have larger buckets. Use:

    ping_histo_range nnn zzz

 to set the bucket width to nnn milliseconds, and a maximum ping
 value of zzz milliseconds.
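
 The effect of a wider bucket can be sketched as follows. This is an
 illustration only: rebin() is a hypothetical name, the input is
 assumed to be a dict of 1ms-wide counts, and samples at or above the
 maximum ping value are simply ignored here (how gmodstat itself treats
 them is not documented in this README).

```python
def rebin(samples_per_ms, width, max_ping):
    """Merge 1ms-wide counts into buckets 'width' ms wide.

    Bucket i covers pings from i*width up to (i+1)*width - 1.
    Assumption: samples at or above max_ping are dropped.
    """
    n_buckets = -(-max_ping // width)  # ceiling division
    buckets = [0] * n_buckets
    for ping_ms, count in samples_per_ms.items():
        if 0 <= ping_ms < max_ping:
            buckets[ping_ms // width] += count
    return buckets


# Roughly what 'ping_histo_range 10 200' would ask for.
print(rebin({10: 2, 19: 1, 20: 4, 205: 7}, 10, 200))
```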

 See ./conf-example.txt for more details on these options.


7.2 Overall client ping distributions

 gmodstat can also create aggregate histograms of median ping times
 seen by players in every game, sorted and filtered by source
 domain (rather than time played or player name). Use:

    graph_ping_histo

 This generates a single histogram in "./PH0-pinghisto.txt" (with
 bucket size set by 'ping_histo_range' as described earlier).
 A cumulative distribution of the median ping times is stored
 in "./CPH0-pinghisto.txt".
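
 For readers unfamiliar with the two output forms, the sketch below
 shows one way to take a median from a bucketed histogram and to turn
 a histogram into a cumulative distribution. The function names are
 invented for this example, and the exact method gmodstat uses to
 pick a median is not stated here.

```python
def histogram_median(buckets):
    """Return the index of the bucket holding the (lower) median sample."""
    total = sum(buckets)
    if total == 0:
        return None
    target = (total + 1) // 2  # rank of the lower median sample
    running = 0
    for index, count in enumerate(buckets):
        running += count
        if running >= target:
            return index


def cumulative_fractions(buckets):
    """Running fraction of samples at or below each bucket."""
    total = sum(buckets)
    running = 0
    fractions = []
    for count in buckets:
        running += count
        fractions.append(running / total if total else 0.0)
    return fractions


print(histogram_median([0, 3, 1, 0, 4]))
print(cumulative_fractions([1, 1, 2]))
```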

 Add the following option to restrict the histogram to only
 those clients who fall under certain domains:

    graph_include_regions

 (the allowed domains are specified by the 'homedomain_file'
 configuration option.)

 If 'graph_many' option is also specified, the histograms are
 generated for each specified domain rather than for the union
 of specified domains. In this case, the output files are
 named "./PHxxxx-pinghisto.txt" where "xxxx" is a domain suffix.
 The cumulative distributions will be in "./CPHxxxx-pinghisto.txt".

 See ./conf-example.txt for more details on these options.


8. Analysing played time

 The configuration option 'graph_ptime_histo' will cause gmodstat
 to create a histogram of played time versus hour of the week,
 breaking the week up into 168 hours. The output will be dumped
 to "./PT0-ptimehisto.txt". This can be useful in seeing playing
 trends that have weekly cycles (although you really need to have
 logfiles covering many weeks before this histogram starts to
 show clear trends). Day 0 is Sunday local time, 0.999 is midnight
 on Sunday, 1.999 is midnight Monday, etc.

 If the config option 'hour_of_day' is also specified, the histogram
 becomes a time-of-day histogram of total played time during any
 given 30 minute period over a 24 hour period (where hour 0 to 0.99
 is the first hour after midnight local time).
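
 The two bucketing schemes can be illustrated as below, with Sunday as
 day 0 and 168 one-hour buckets per week, or 48 half-hour buckets per
 day. hour_of_week() and half_hour_of_day() are hypothetical helper
 names; the timestamps are assumed to already be in local time.

```python
from datetime import datetime


def hour_of_week(when):
    """Bucket 0..167, where bucket 0 is the first hour of Sunday."""
    day = (when.weekday() + 1) % 7  # Python has Monday=0; shift to Sunday=0
    return day * 24 + when.hour


def half_hour_of_day(when):
    """Bucket 0..47, where bucket 0 is 00:00 to 00:29."""
    return when.hour * 2 + (1 if when.minute >= 30 else 0)


played = datetime(2001, 11, 6, 13, 45)  # a Tuesday afternoon
print(hour_of_week(played), half_hour_of_day(played))
```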

 If 'graph_include_regions' is specified, only the playing time of
 clients who fall under the specified domains will be counted. If
 'graph_many' is specified, separate histograms will be created for
 clients falling under each domain, with the outputs dumped to
 "./PTxxxx-ptimehisto.txt" (where "xxxx" is a domain suffix).

 Note that both graph_ptime_histo and graph_ping_histo are modified
 by the same graph_include_regions and graph_many options.

 gmodstat can also dump the cumulative played time, which can
 reveal long term playing trends, popular days/weeks, or
 server downtimes:

    cumulative_gametime

 The total played time across all games seen in the logfile(s)
 is dumped to "cumulativetime.txt" as a list of XY pairs (X is
 calendar time, Y is cumulative time in days). By default,
 X is hours of the week (0 is 12am Sunday morning, 6.99 is midnight
 Saturday night, etc). Adding the 'day_of_year' option causes
 X axis to become day of the year (0 is Jan 1st).


9. General game statistics

 gmodstat can also provide a summary of the games seen in the
 logfile(s), the players present during each game, and the
 kills/deaths of each player. Use the option 'gamestats' to
 start dumping per-game information. Use the option 'playerstats'
 to list each player's stats per game.

 See ./conf-example.txt for more details on these options.


10. Other stuff

 A variety of other configuration options are listed in
 ./conf-example.txt that haven't been covered in this README.

 Specifying 'minimum_itemratio 1.0' is a good idea, so that
 gmodstat ignores players who appeared in a game and didn't
 manage to pick up more than one item per minute of played
 time. Such players are basically idle, and don't deserve
 to skew our ping and NAT estimations.

 In addition, the "UnnamedPlayer" is QuakeIII's default
 playername for clients who haven't properly configured their
 client software. By default gmodstat ignores them.
 Use 'include_unnamedplayer' config option to include
 UnnamedPlayer statistics.

 gmodstat assumes it can extract the start time of a given
 logfile from the logfile itself (gmod1.0 adds a "BaseTime:"
 token to logfiles it generates). However, the timestamp
 is relative to the server's local time. Thus, when comparing
 played time histograms, etc, from servers in different timezones
 you need to inform gmodstat of an appropriate offset relative
 to your local timezone.

    base_time_offset nn

 adjusts the logfile's own notion of its start time by nn hours
 (forward if positive, backward if negative). For example, use
 'base_time_offset -8' to adjust a UK-based server's timestamps
 to Californian time.
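
 As an illustration of the adjustment, the sketch below parses a
 BaseTime string in the "ddmmyy-hhmm-0" format described in Appendix
 A.3 and shifts it by a number of hours. parse_basetime() is a
 hypothetical name, the trailing "-0" field is ignored because its
 meaning is not documented here, and two-digit years are interpreted
 by Python's own %y rule.

```python
from datetime import datetime, timedelta


def parse_basetime(token, offset_hours=0):
    """Parse 'ddmmyy-hhmm-0' and shift it by offset_hours."""
    date_part, time_part = token.split("-")[:2]  # drop the trailing field
    start = datetime.strptime(date_part + "-" + time_part, "%d%m%y-%H%M")
    return start + timedelta(hours=offset_hours)


# A server started at 14:30 UK time, viewed from California (-8 hours).
print(parse_basetime("061101-1430-0"))
print(parse_basetime("061101-1430-0", offset_hours=-8))
```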


11. Bugs, things TODO, Conclusions

 Naturally, this README file is not complete. Indeed, it is
 woefully inadequate in describing the output file formats
 of the various configuration options described here.

 The ultimate source of information is, of course, the
 source code. Unfortunately gmodstat has developed organically
 over the past year, so the code itself isn't always as clean
 and logical as I'd like. It is also still evolving, so you'll
 probably find routines and structures in there that have no
 apparent current or future use. Hopefully things will be cleaner
 in later releases. Enjoy!


Appendix A: New logfile tokens

 gmodstat assumes there are a number of new tokens in the
 QuakeIII server's logfiles, and one modified token. The
 new ones are "ModVersion:", "BaseTime:", and "CPhisto2:".
 The modified token is "ClientConnect:". These tokens
 are supplied as part of the gmod1.0 (or later) server mod.

A.1 ClientConnect

 ClientConnect is an existing token issued by the server when
 a new client has been detected (is in the Connecting state)
 but hasn't yet started playing. The new syntax is

   ClientConnect <clientID> <identity>

 where <clientID> is the small integer used by the server to
 uniquely identify clients during a game, and <identity>
 is one of:

    "w.x.y.z:pp"   Client is from IP addr w.x.y.z, UDP port pp
    "seen"         Client was seen in previous game, same ipaddr:port
    "bot"          This is a bot, no network identity

A.2 ModVersion

 Should be the first entry in the logfile, appears only once per
 logfile. The second parameter is a unique string identifying the
 version of gmod (in this case "gja1.0" identifies gmod 1.0)

A.3 BaseTime

 Should be the first or second entry in the logfile, appears only once
 per logfile. The second parameter is a unique string identifying the
 local time at which the server was started. Format of the string is
 "ddmmyy-hhmm-0" to represent the date dd/mm/yy at time hhmm hours.

A.4 CPhisto2

 This token is the primary method for collecting ping data. Each line
 is of the form:

    CPhisto2: ID Low Hi lerr herr tdelta <string>

 where:

    ID       clientID
    Low      the lowest bucket in this interval (ms)
    Hi       the highest bucket in this interval (ms)
    lerr     number of ping samples = 0ms (weird but possible)
    herr     number of ping samples > 998ms (mostly 999ms)
    tdelta   number of milliseconds since last histogram
    string   the histogram, encoded in printable ASCII

 gmod1.0 and 1.1 default to generating a new CPhisto2 line for each
 client every 2000 packets from the client to the server. CPhisto2
 lines are also generated at the end of each game for every client,
 or when a client disconnects, if the client has sent at least 50
 packets since the last CPhisto2 issued for that client.

 gmod1.0 and 1.1 use slightly different encodings for <string>, but in
 either case it is always less than 1024 characters long.

 Under gmod1.0 <string> is:

   Repeated "XY" pairs of ASCII characters, or "+nn%" indicating
   the previous bucket's value is repeated in the next nn
   buckets (mostly used for suppressing adjacent buckets
   with value of zero when there's bi/multi-modal distribution
   of ping values).

   The "XY" pairs use base64, with X being
   the 64s column and Y being the 1s column. The ASCII encoding
   adds 32 (code for " ") to the base64 digit. This way each
   bucket can count up to 4095 using just two ASCII characters.

 Under gmod1.1 <string> is:

   Repeated "XY" pairs of ASCII characters, or "znn%" indicating
   the previous bucket's value is repeated in the next nn
   buckets (mostly used for suppressing adjacent buckets
   with value of zero when there's bi/multi-modal distribution
   of ping values).

   The "XY" pairs use base64, with X being
   the 64s column and Y being the 1s column. The ASCII encoding
   adds 33 (code for "!") to the base64 digit. This way each
   bucket can count up to 4095 using just two ASCII characters.
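
 A decoder for the gmod1.1 form can be sketched as below. It is not
 taken from the gmodstat source: decode_histo() is a hypothetical
 name, and the repeat count nn is assumed to be written in decimal.
 The gmod1.0 form is not handled, since "+" is itself a valid digit
 character when the offset is 32.

```python
def decode_histo(encoded, offset=33, run_marker="z"):
    """Decode a gmod1.1-style histogram string into a list of bucket counts.

    Assumption: in "znn%" the repeat count nn is a decimal number.
    """
    buckets = []
    i = 0
    while i < len(encoded):
        if encoded[i] == run_marker:
            end = encoded.index("%", i)
            repeat = int(encoded[i + 1:end])
            if not buckets:
                raise ValueError("repeat marker with no previous bucket")
            buckets.extend([buckets[-1]] * repeat)
            i = end + 1
        else:
            if i + 1 >= len(encoded):
                raise ValueError("dangling character at end of string")
            high = ord(encoded[i]) - offset      # the 64s column
            low = ord(encoded[i + 1]) - offset   # the 1s column
            if not (0 <= high < 64 and 0 <= low < 64):
                raise ValueError("character outside the digit range")
            buckets.append(high * 64 + low)
            i += 2
    return buckets


# "##" is 2*64 + 2, "!!" is 0, "z2%" repeats that 0 twice more.
print(decode_histo('##!!z2%!"'))
```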

 The total number of samples represented by a CPhisto2 line
 can be calculated simply by summing the values in every
 histogram bucket. The total number of client frames that
 were seen since the previous CPhisto2 line can be calculated
 from the total samples in the histo + lerr + herr.

 Given knowledge of the total number of frames since the
 previous CPhisto2, and the time since the previous CPhisto2
 (given by the tdelta field) you can calculate the average
 client frame rate.
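
 The arithmetic described above amounts to the following sketch
 (frame_rate() is a hypothetical name; tdelta is in milliseconds as
 listed in the field table).

```python
def frame_rate(buckets, lerr, herr, tdelta_ms):
    """Average client frames per second over one CPhisto2 interval."""
    frames = sum(buckets) + lerr + herr
    if tdelta_ms <= 0:
        return None  # no elapsed time, so no meaningful rate
    return frames * 1000.0 / tdelta_ms


# 2000 frames seen over 20 seconds.
print(frame_rate([1000, 990], lerr=5, herr=5, tdelta_ms=20000))
```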


A.5 CPhistoErr

 This is a variant on CPhisto2, and only occurs when the server
 could not compress <string> to under 1024 characters for some reason.
 There's not much gmodstat can do about such entries, and they mean
 that the ping samples of the last 2000 frames must have been fairly
 evenly and widely spread out. Such lines are of the form:

    CPhistoErr: ID Low Hi lerr herr tdelta histo-too-long

 where the parameters are as for CPhisto2, and the text "histo-too-long"
 replaces the compressed ASCII histogram.


Appendix B: Release Summary

 Releases to date:

  0.2.1
  11/6/01
    - Fixed malloc() bug in NAT estimation routines.
    - Noted that cumulative_time Y-axis represents days
      rather than hours.

  0.2
  10/28/01
    - Fixed bug in the 'graph_ping_histo' routine (median ping
      values would be erroneously scaled by 1/N where N is the
      ping histogram bucket size set by ping_histo_range).
    - Clarified documentation for graph_ping_histo: per-game median
      pings are only calculated for games in which the client
      generated three or more "CPhisto2" log entries.

  0.1
  9/28/01 (First release)

gj_armitage@yahoo.com
















