Name: gmodstat
Version: 0.2.1, November 6th 2001
Author: gj_armitage@yahoo.com
Copyright (c) 2001, Grenville Armitage
1. Summary:
gmodstat supports post-analysis of QuakeIII Arena server
logfiles to extract things like playing time trends, histograms
of ping times as perceived by clients, the domains from which
different clients connect, and the percentage of clients who
may be playing from home and using NAT boxes.
Servers must be running the gmod1.0 server mod (or something
equivalent) to generate the additional logfile entries
required by gmodstat.
gmodstat was initially developed in a Win32 environment under
MS Visual C++, but most subsequent development has occurred
under FreeBSD using KDevelop 1.4. Installation instructions are
included in section 2 for compiling gmodstat under *nix-like
environments and Win32/MS Visual C++.
gmodstat is released under the GNU General Public License,
Version 2, 1991.
gmodstat compiles 'out of the box' under FreeBSD4.3
and Win32/MS Visual C++ environments. It requires no
special libraries, and should compile under other *nix
environments.
The rest of this file contains:
Section 2: Installation
Section 3: Initial startup
Section 4: The basic definition of a client
Section 5: IP address to domain name mappings
Section 6: NAT and Home User estimations
Section 7: Ping histograms
Section 8: Played time histograms and charts
Section 9: General game statistics
Section 10: Other stuff
Section 11: Conclusion
Appendix A: New logfile tokens
Appendix B: Release summary
2. Installation under *nix or Win32
2.1 Under FreeBSD/*nix environments
The current development environment for gmodstat is
FreeBSD4.3 with KDevelop 1.4, a free, X11-based C/C++
development tool. KDevelop automagically adds tools to
create an appropriate makefile, with which you
can generate a running executable. (I have not verified
whether gmodstat can or cannot be compiled under anything
other than FreeBSD4.3 or Win32, but I'd be interested in
hearing experiences.) The following installation steps
apply to *nix environments.
The basic distribution is a gzipped tarfile named
gmodstat-0.2.1.tar.gz, which creates a subdirectory
./gmodstat-0.2.1 when gunzipped/untar'ed. Once the
tarfile is unpacked, perform the following steps:
> cd ./gmodstat-0.2.1
> ./configure
> cd gmodstat
> make
"./configure" will spend a minute or so inspecting
your system, compiler settings, etc and generating
appropriate makefiles. Once this has completed successfully,
you move into the source subdirectory and run "make"
to actually compile gmodstat.
You can then either copy gmodstat to somewhere more
convenient in your path, or use "make install" to
automatically copy gmodstat into /usr/local/bin.
(An alternate installation location can be specified
during the configuration stage. If you wish to install
into PREFIX/bin then execute "./configure --prefix=PREFIX"
instead of "./configure" before compiling. "make install"
will then copy gmodstat to PREFIX/bin/gmodstat.)
Executing "make clean" in ./gmodstat-0.2.1/gmodstat will
subsequently remove all intermediate object files.
KDevelop 1.4's gmodstat.kdevprj file is also supplied,
in case it helps you do further development of gmodstat.
2.2 Under Win32/MS Visual C++
I've supplied sample Visual C++ 6.0 project/workspace files
./gmodstat-0.2.1/gmodstat.dsp and ./gmodstat-0.2.1/gmodstat.dsw.
If you have Visual C++, you should be able to use WinZip
(or similar) to unpack/untar the gmodstat distribution,
then go into the ./gmodstat-0.2.1 folder and double-click
on gmodstat.dsw to start Visual C++. Tell Visual C++
to "build" and a Win32 version of gmodstat should be built.
(At least, it worked for me on a Windows 2000 system.
No promises it'll work on every Win32 platform, although
I imagine it should.)
The gmodstat executable (in ./gmodstat-0.2.1/Debug) must be
run from a console window (or from within Visual C++).
Where I discovered differences between Visual C++ 6.0
in a Win32 environment and KDevelop 1.4/gcc in a FreeBSD4.3
environment, I've used conditional compilation directives.
The flag WIN32 should be set for Win32-compatible code, and
unset for FreeBSD4.3 (or equivalent) environments.
You will need to specifically link against ws2_32.lib (add
under Project->Settings->Linker if you're using
MS Visual C++) for Win32 (to bring in inet_ntoa()
functions).
3. Starting gmodstat
gmodstat is primarily controlled by options specified in a
configuration file. By default, the file is ./gmodstatconf.txt.
The file ./conf-example.txt contains brief commentary on
a range of configuration options.
Start gmodstat with:
gmodstat
(to use ./gmodstatconf.txt as config file)
or
gmodstat -c nnn
(where nnn is the filename of your specific configuration
file.)
A certain amount of run-time status information is printed to
stdout, with the main record of gmodstat's activities logged to
the file "logout.txt". A number of auxiliary output files may
also be generated, depending on the set of options specified in
the configuration file.
gmodstat can analyse a single QuakeIII server logfile, or
a series of logfiles representing a server that has been
running over a long period of time. (Ideally the server has
been stopped and restarted every six days, resulting in
the sequence of logfiles. After 6.9 days the server's
timestamping loses some accuracy, and in my experience the
server itself often gets flakey.)
The next section briefly defines what gmodstat considers
to be a 'client', section 5 discusses how to create an
initial ipaddress to domain name mapping file, section 6
covers the generation of home user and NAT penetration
analysis, while section 7 discusses how to generate ping
histograms. Section 8 mentions how to generate played
time histograms and cumulative played time plots.
4. Definition of a 'client'
QuakeIII players are uniquely identified by their playername (an
arbitrary ASCII string) and their IP address. However, because of
the wide deployment of dynamic address assignment techniques by
many ISPs, the same player may appear with many different (but
related) IP addresses over many different appearances. In order
to more accurately associate these instances as the same person,
gmodstat uses the following algorithm to determine a client:
Take the player's IP address, in dotted-quad form "w.x.y.z".
Resolve this address into a domain name.
Take the non-host part of the domain name and call it the domain suffix.
A client is defined by the tuple (playername, domain suffix).
For example, consider playername GUEST playing twice, with a
different IP address each time that resolved to random123.dsl.myisp.com
and otherpop.dsl.myisp.com - gmodstat would consider this to be the
same client, since the domain suffix of ".dsl.myisp.com" is common
in both cases.
The logic here is that ISPs often dynamically assign addresses from
related pools of IP addresses associated with the access points
through which customers connect. Commonality of the domain suffix
is a reasonable guesstimate that this represents the same human
player simply being assigned a different dynamic IP address.
My experience while developing gmodstat is that, while not perfectly
accurate, the above algorithm is far better than counting each
unique (playername, IP address) tuple as a distinct client. As a further
optimization, un-resolved IP addresses are mapped to 'fake' domain
suffixes inside gmodstat.
When you specify 'faked_suffix_range n' in the config file:
n = 4, fake suffixes are ".w-x-y-z.unresolved".
n = 3, fake suffixes are ".x-y-z.unresolved".
n = 2, fake suffixes are ".y-z.unresolved".
n = 1, fake suffixes are ".z.unresolved".
Where a client is being dynamically assigned IP addresses from a
common address pool, these faked suffixes increase the chances
we'll correctly recognize multiple unresolvable IP addresses as
representing the same client.
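
The client definition above can be sketched in a few lines. This is
an illustration in Python, not gmodstat's own source; the function
names are hypothetical, and "non-host part" is assumed to mean
everything after the first label of the resolved name, as in the
GUEST example.

```python
def domain_suffix(hostname):
    """Drop the leading host label: 'a.dsl.myisp.com' -> '.dsl.myisp.com'."""
    host, sep, rest = hostname.partition(".")
    return sep + rest if sep else hostname


def faked_suffix(ip, n):
    """Build a fake suffix from the last n octets of an unresolved address."""
    octets = ip.split(".")
    return "." + "-".join(octets[-n:]) + ".unresolved"


def client_key(playername, ip, hostname=None, faked_suffix_range=4):
    """Return the (playername, suffix) tuple that identifies a client."""
    if hostname:
        return (playername, domain_suffix(hostname))
    # Unresolved address: fall back to a faked suffix (hypothetical helper).
    return (playername, faked_suffix(ip, faked_suffix_range))
```

With this sketch, GUEST connecting from random123.dsl.myisp.com and
from otherpop.dsl.myisp.com yields the same key both times.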
In order to avoid DNS lookups every time gmodstat is run, gmodstat
can create a local file of ipaddr->domainname mappings for later
re-use. This is discussed further in the next section.
The configuration file option 'clientnames' causes gmodstat to
dump the list of seen clients in descending order of total playing
time. This list is dumped to file ./clients-logout.txt
5. Creating an initial ipaddress to domain name mapping file
The first thing I recommend is generating a local copy of
all the ipaddr->domainname mappings relevant to your logfile(s).
Create a config file with, at minimum, the following entries:
sourcefile nnn
resolve_missing
dump_dns_maps
(where nnn is the logfile you are analysing)
Start gmodstat, and it will begin walking through the supplied
server logfile performing DNS lookups on every IP address
found. Once this is complete, it will dump all the discovered
ipaddr->domainname mappings to the file ./ipnames-logout.txt.
Note that this initial process may take many minutes, as not
all IP addresses have registered domain names. gmodstat
currently sits idle when a DNS lookup stalls waiting to
time out.
Now, on all subsequent runs of gmodstat add the following to
the config file:
hosts_file nnn
(where nnn is a local copy of ./ipnames-logout.txt)
With the 'hosts_file' option, gmodstat pre-loads its internal
ipaddr->domainname mapping cache from the named file. This
then avoids the lengthy process of performing DNS lookups each
time you re-run gmodstat on the same logfile(s).
You can have all of 'hosts_file', 'resolve_missing' and
'dump_dns_maps' specified concurrently in the config
file - gmodstat will then use mappings from the local
file when available, lookup the DNS for any new IP addresses
it might find, and then dump the newly updated total list
of seen ipaddr->domainname mappings to ./ipnames-logout.txt
at the end of its run.
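
The combined behaviour (use cached mappings first, look up only what
is missing) can be sketched as follows. This is a Python illustration
rather than gmodstat's code; the function names are hypothetical, and
the on-disk format of ./ipnames-logout.txt is deliberately left out
because this README does not describe it.

```python
import socket


def reverse_lookup(ip):
    """Return the host name for ip, or None when it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def resolve_all(ips, cache, lookup=reverse_lookup):
    """Fill cache (ip -> name or None) for every address not already present."""
    for ip in ips:
        if ip not in cache:
            cache[ip] = lookup(ip)  # only new addresses cost a DNS query
    return cache
```

Passing the lookup function as a parameter keeps the caching logic
testable without touching the network.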
If you have multiple logfiles they can be handled in one
run by replacing 'sourcefile' with 'filelist nnn' where
nnn is a text file containing each logfile's name one
per line.
6. Analysing Home users and NAT penetration
gmodstat can be used to estimate the use of NAT (network
address translation) functionality across the Internet by
looking for evidence of NAT in the client traffic.
NAT is typically embedded in home routers and gateways, and
sometimes in gateway routers of small ISPs. The tell-tale
sign of NAT is where the UDP or TCP port numbers get modified
from 'expected' values to unusual values in transit. QuakeIII
uses well-known UDP port numbers - un-modified QuakeIII clients
almost invariably use UDP port 27960 as the source port in the
packets they send to the server. Detecting NAT is as simple as
detecting clients who connect from a source UDP port other
than 27960.
gmod1.0 causes a server to log each player's source IP address
and source UDP port number when they connect to the server.
This information is used by gmodstat to estimate the % of NAT
penetration in the player community.
To do a simple NAT estimation, build a configuration file like
this:
sourcefile nnn
hosts_file zzz
clients_NAT_range 64
range_increment 8
humanreadable
(where nnn is the server log we're analysing and zzz
is the cache of resolved ipaddr->domainname mappings created per
the discussion in section 5.)
'clients_NAT_range 64' specifies that NAT estimation should be
performed for a range of sets of clients, where each set is made up
from the clients who played for more than N minutes, where
0 <= N <= 64. 'range_increment' says increase N in steps of 8.
The output is dumped to ./client_stats.txt, and will be in verbose
ASCII form because 'humanreadable' was set.
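
As a rough sketch of the estimation described above (Python, with
hypothetical names; the real output format of ./client_stats.txt is
not reproduced here), each threshold N selects the clients who played
for more than N minutes and reports the share whose source UDP port
differs from 27960:

```python
Q3_DEFAULT_PORT = 27960


def nat_estimates(clients, nat_range=64, increment=8):
    """clients: iterable of (minutes_played, source_udp_port) pairs.

    Returns one (N, clients_counted, percent_behind_nat) row per threshold N.
    """
    clients = list(clients)
    rows = []
    for n in range(0, nat_range + 1, increment):
        ports = [port for minutes, port in clients if minutes > n]
        natted = sum(1 for port in ports if port != Q3_DEFAULT_PORT)
        percent = 100.0 * natted / len(ports) if ports else 0.0
        rows.append((n, len(ports), percent))
    return rows
```

With 'clients_NAT_range 64' and 'range_increment 8' this produces nine
rows, for N = 0, 8, 16, ... 64.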
gmodstat also tries to calculate how many clients are playing
from "home" by counting how many clients' domain names fall under
domains believed to represent "home users". You can set this list
to be whatever you want, and select it with the 'homedomain_file'
config option. Finally, gmodstat can also calculate the number of
home users as a percentage of a specific subset of domains,
which you can specify with the 'valid_domains' config option.
See ./conf-example.txt for more details on these options.
7. Analysing client ping distributions
Servers running gmod1.0 or later will generate ping sample
histograms every few tens of seconds, reflecting hundreds or
thousands of server-estimated ping samples. (By default gmod1.0
logs a new histogram every 2000 client frames, sampling the
server's internal ping estimate each frame.)
The configuration file option 'clientnames' causes gmodstat to
dump the list of seen clients in descending order of total playing
time. This list is dumped to file ./clients-logout.txt
7.1 Specific Clients
To dump aggregate histograms of a specific client's ping samples,
use the following config file options:
sourcefile nnn
hosts_file zzz
single_client_phisto ccc
This will cause every sampled histogram to be dumped in ASCII
format to disk for the client ccc (nnn and zzz being the logfile
and hosts file, as in section 5). The ASCII file is
"SCPH-ccc.txt" and will have a format
suitable for passing to xgraph, with each histo's timestamp as
each title line.
To create a series of per-game histograms, add the 'do_phisto_pergame'
config option. The xgraph title line for each histo will be the
game's starttime (in seconds since 1/1/1970).
To create an aggregate histogram over all games played by the specified
client, add the 'do_phisto_total' config option.
Use only one of 'do_phisto_pergame' or 'do_phisto_total' at a time.
As an alternative to specifying a particular client, you can have
histograms generated for every client who played for more than a
certain number of minutes (measured over all games) with:
topN_client_phistos nnn
Clients who played more than nnn minutes will have their ping histos
dumped to disk in individual files named "SCPH-ccc.txt", where ccc
identifies the client.
Although gmod1.0 uses buckets one millisecond wide, the aggregate
histos generated under 'do_phisto_pergame' and 'do_phisto_total'
modes can have larger buckets. Use:
ping_histo_range nnn zzz
to set the bucket width to nnn milliseconds, and a maximum ping
value of zzz milliseconds.
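
To illustrate what a wider bucket means, here is a small Python
sketch that folds a 1 ms histogram into buckets of a chosen width.
The function name is hypothetical, and the handling of samples above
the maximum (counted in the last bucket) is an assumption; this README
does not say what gmodstat does with them.

```python
def rebucket(samples_per_ms, width, max_ping):
    """Fold a 1 ms histogram (index = ping in ms) into wider buckets.

    Assumption: samples above max_ping are counted in the last bucket.
    """
    wide = [0] * (max_ping // width + 1)
    for ping_ms, count in enumerate(samples_per_ms):
        index = min(ping_ms, max_ping) // width
        wide[index] += count
    return wide
```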
See ./conf-example.txt for more details on these options.
7.2 Overall client ping distributions
gmodstat can also create aggregate histograms of median ping times
seen by players in every game, sorted and filtered by source
domain (rather than time played or player name). Use:
graph_ping_histo
This generates a single histogram in "./PH0-pinghisto.txt" (with
bucket size set by 'ping_histo_range' as described earlier).
A cumulative distribution of the median ping times is stored
in "./CPH0-pinghisto.txt".
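
For readers unfamiliar with the terms, the following Python sketch
shows one way to take a median from a bucketed histogram and to turn
a histogram into a cumulative distribution. It is illustrative only:
the names are hypothetical, and gmodstat's exact median calculation
is not documented in this README.

```python
from itertools import accumulate


def histogram_median(buckets, width=1):
    """Return the lower edge (ms) of the bucket holding the median sample."""
    total = sum(buckets)
    if total == 0:
        return None
    running = 0
    for index, count in enumerate(buckets):
        running += count
        if 2 * running >= total:
            return index * width


def cumulative_fraction(buckets):
    """Return the running fraction of samples at or below each bucket."""
    total = sum(buckets)
    return [c / total for c in accumulate(buckets)] if total else []
```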
Add the following option to restrict the histogram to only
those clients who fall under certain domains:
graph_include_regions
(the allowed domains are specified by the 'homedomain_file'
configuration option.)
If the 'graph_many' option is also specified, the histograms are
generated for each specified domain rather than for the union
of specified domains. In this case, the output files are
named "./PHxxxx-pinghisto.txt" where "xxxx" is a domain suffix.
The cumulative distributions will be in "./CPHxxxx-pinghisto.txt".
See ./conf-example.txt for more details on these options.
8. Analysing played time
The configuration option 'graph_ptime_histo' will cause gmodstat
to create a histogram of played time versus hour of the week,
breaking the week up into 168 hours. The output will be dumped
to "./PT0-ptimehisto.txt". This can be useful in seeing playing
trends that have weekly cycles (although you really need to have
logfiles covering many weeks before this histogram starts to
show clear trends). Day 0 is Sunday local time, 0.999 is midnight
on Sunday, 1.999 is midnight Monday, etc.
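
The 168-bucket layout can be sketched as below (Python, hypothetical
names). The sketch credits a whole interval of play to the hour in
which it started, which is a simplification and not necessarily how
gmodstat splits time across hour boundaries.

```python
from datetime import datetime


def hour_of_week(when):
    """Return 0..167, counting hours from midnight at the start of Sunday."""
    day = (when.weekday() + 1) % 7  # datetime has Monday=0; shift so Sunday=0
    return day * 24 + when.hour


def add_played_time(histogram, when, minutes):
    """Credit minutes of play to the hour-of-week bucket containing when."""
    histogram[hour_of_week(when)] += minutes
```

Dividing the bucket index by 24 gives the fractional day value used
on the X axis (for example, bucket 36 is day 1.5, midday Monday).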
If the config option 'hour_of_day' is also specified, the histogram
becomes a time-of-day histogram of total played time during any
given 30 minute period over a 24 hour period (where hour 0 to 0.99
is the first hour after midnight local time).
If 'graph_include_regions' is specified, only the playing time of
clients who fall under the specified domains will be counted. If
'graph_many' is specified, separate histograms will be created for
clients falling under each domain, with the outputs dumped to
"./PTxxxx-ptimehisto.txt" (where "xxxx" is a domain suffix).
Note that both graph_ptime_histo and graph_ping_histo are modified
by the same graph_include_regions and graph_many options.
gmodstat can also dump the cumulative played time, which can
reveal long term playing trends, popular days/weeks, or
server downtimes:
cumulative_gametime
The total played time across all games seen in the logfile(s)
is dumped to "cumulativetime.txt" as a list of XY pairs (X is
calendar time, Y is cumulative time in days). By default,
X is hours of the week (0 is 12am Sunday morning, 6.99 is midnight
Saturday night, etc). Adding the 'day_of_year' option causes
X axis to become day of the year (0 is Jan 1st).
9. General game statistics
gmodstat can also provide a summary of the games seen in the
logfile(s), the players present during each game, and the
kills/deaths of each player. Use the option 'gamestats' to
start dumping per-game information. Use the option 'playerstats'
to list each player's stats per game.
See ./conf-example.txt for more details on these options.
10. Other stuff
A variety of other configuration options are listed in
./conf-example.txt that haven't been covered in this README.
Specifying 'minimum_itemratio 1.0' is a good idea, so that
gmodstat ignores players who appeared in a game and didn't
manage to pick up more than one item per minute of played
time. Such players are basically idle, and don't deserve
to skew our ping and NAT estimations.
In addition, the "UnnamedPlayer" is QuakeIII's default
playername for clients who haven't properly configured their
client software. By default gmodstat ignores them.
Use the 'include_unnamedplayer' config option to include
UnnamedPlayer statistics.
gmodstat assumes it can extract the start time of a given
logfile from the logfile itself (gmod1.0 adds a "BaseTime:"
token to logfiles it generates). However, the timestamp
is relative to the server's local time. Thus, when comparing
played time histograms, etc, from servers in different timezones
you need to inform gmodstat of an appropriate offset relative
to your local timezone.
base_time_offset nn
adjusts the logfile's own notion of its start time by nn hours
(forward if positive, backward if negative). For example, use
'base_time_offset -8' to adjust a UK-based server's timestamps
to Californian time.
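
The adjustment itself is a signed shift in whole hours. A minimal
Python illustration (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def apply_base_time_offset(base_time, offset_hours):
    """Shift a logfile's start time by a signed number of hours."""
    return base_time + timedelta(hours=offset_hours)
```

For instance, a UK server log starting at 20:00 local time maps to
12:00 on the same day with an offset of -8.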
11. Bugs, things TODO, Conclusions
Naturally, this README file is not complete. Indeed, it is
woefully inadequate in describing the output file formats
produced by the various configuration options described here.
The ultimate source of information is, of course, the
source code. Unfortunately gmodstat has developed organically
over the past year, so the code itself isn't always as clean
and logical as I'd like. It is also still evolving, so you'll
probably find routines and structures in there that have no
apparent current or future use. Hopefully things will be cleaner
in later releases. Enjoy!
Appendix A: New logfile tokens
gmodstat assumes there are a number of new tokens in the
QuakeIII server's logfiles, and one modified token. The
new ones are "ModVersion:", "BaseTime:", and "CPhisto2:".
The modified token is "ClientConnect:". These tokens
are supplied as part of the gmod1.0 (or later) server mod.
A.1 ClientConnect
ClientConnect is an existing token issued by the server when
a new client has been detected (is in the Connecting state)
but hasn't yet started playing. The new syntax is
ClientConnect: ID addr
where ID is the small integer used by the server to
uniquely identify clients during a game, and addr
is one of:
"w.x.y.z:pp"  Client is from IP addr w.x.y.z, UDP port pp
"seen"        Client was seen in previous game, same ipaddr:port
"bot"         This is a bot, no network identity
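
A parser for the three address forms listed above might look like
this. It is a Python sketch with a hypothetical function name; it
handles only the address field and assumes the field has already
been split out of the log line.

```python
def parse_client_address(field):
    """Classify the address field of a gmod ClientConnect entry.

    Returns ("bot", None, None), ("seen", None, None) or ("addr", ip, port).
    """
    if field in ("bot", "seen"):
        return (field, None, None)
    ip, _, port = field.rpartition(":")
    return ("addr", ip, int(port))
```

The port returned here is the value compared against 27960 in the
NAT estimation of section 6.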
A.2 ModVersion
Should be the first entry in the logfile, appears only once per
logfile. The second parameter is a unique string identifying the
version of gmod (in this case "gja1.0" identifies gmod 1.0).
A.3 BaseTime
Should be the first or second entry in the logfile, appears only once
per logfile. The second parameter is a unique string identifying the
local time at which the server was started. Format of the string is
"ddmmyy-hhmm-0" to represent the date dd/mm/yy at time hhmm hours.
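
The string can be turned into a timestamp as sketched below (Python,
hypothetical function name). The meaning of the trailing "-0" field
is not described here, so the sketch simply ignores it, and the
two-digit year is interpreted by the library's default rule.

```python
from datetime import datetime


def parse_base_time(value):
    """Parse a "ddmmyy-hhmm-0" BaseTime string into a naive local datetime."""
    date_part, time_part = value.split("-")[:2]
    return datetime.strptime(date_part + "-" + time_part, "%d%m%y-%H%M")
```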
A.4 CPhisto2
This token is the primary method for collecting ping data. Each line
is of the form:
CPhisto2: ID Low Hi lerr herr tdelta string
where:
ID      clientID
Low     the lowest bucket in this interval (ms)
Hi      the highest bucket in this interval (ms)
lerr    number of ping samples = 0ms (weird but possible)
herr    number of ping samples > 998ms (mostly 999ms)
tdelta  number of milliseconds since last histogram
string  the histogram, encoded in printable ASCII
gmod1.0 and 1.1 default to generating a new CPhisto2 line for each
client every 2000 packets from the client to the server. CPhisto2
lines are also generated at the end of each game for every client,
or when a client disconnects, if the client has sent at least 50
packets since the last CPhisto2 issued for that client.
gmod1.0 and 1.1 use slightly different encodings for the histogram
string, but in either case it is always less than 1024 characters long.
Under gmod1.0 the string is:
Repeated "XY" pairs of ASCII characters, or "+nn%" indicating
the previous bucket's value is repeated in the next nn
buckets (mostly used for suppressing adjacent buckets
with value of zero when there's bi/multi-modal distribution
of ping values).
The "XY" pairs use base64, with X being
the 64s column and Y being the 1s column. The ASCII encoding
adds 32 (code for " ") to the base64 digit. This way each
bucket can count up to 4095 using just two ASCII characters.
Under gmod1.1 the string is:
Repeated "XY" pairs of ASCII characters, or "znn%" indicating
the previous bucket's value is repeated in the next nn
buckets (mostly used for suppressing adjacent buckets
with value of zero when there's bi/multi-modal distribution
of ping values).
The "XY" pairs use base64, with X being
the 64s column and Y being the 1s column. The ASCII encoding
adds 33 (code for "!") to the base64 digit. This way each
bucket can count up to 4095 using just two ASCII characters.
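
A decoder for the gmod1.1 form can be sketched as follows. This is a
Python illustration, not gmodstat's parser: the function name is
hypothetical, "nn" is assumed to be a decimal count, and a run marker
at the very start of the string is assumed to repeat a value of zero.

```python
def decode_histogram(encoded, offset=33, marker="z"):
    """Expand a gmod1.1-style histogram string into a list of bucket counts."""
    buckets = []
    i = 0
    while i < len(encoded):
        if encoded[i] == marker:
            # "znn%": repeat the previous bucket's value nn more times.
            end = encoded.index("%", i)
            count = int(encoded[i + 1:end])
            previous = buckets[-1] if buckets else 0
            buckets.extend([previous] * count)
            i = end + 1
        else:
            # "XY": X is the 64s digit, Y the 1s digit, each shifted by offset.
            high = ord(encoded[i]) - offset
            low = ord(encoded[i + 1]) - offset
            buckets.append(64 * high + low)
            i += 2
    return buckets
```

Bucket k of the result corresponds to a ping of Low + k milliseconds.
The gmod1.0 form is not handled, since with an offset of 32 the "+"
marker is itself a valid digit character and this README does not say
how the two cases are told apart.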
The total number of samples represented by a CPhisto2 line
can be calculated simply by summing the values in every
histogram bucket. The total number of client frames that
were seen since the previous CPhisto2 line can be calculated
from the total samples in the histo + lerr + herr.
Given knowledge of the total number of frames since the
previous CPhisto2, and the time since the previous CPhisto2
(given by the tdelta field) you can calculate the average
client frame rate.
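
The arithmetic is simple enough to show directly. A Python sketch
with a hypothetical function name:

```python
def client_frame_rate(buckets, lerr, herr, tdelta_ms):
    """Return average client frames per second for one CPhisto2 line."""
    if tdelta_ms <= 0:
        return None  # no elapsed time, so no meaningful rate
    frames = sum(buckets) + lerr + herr
    return frames / (tdelta_ms / 1000.0)
```

For example, 1990 in-range samples plus 4 low and 6 high error
samples over a tdelta of 40000 ms gives 2000 frames in 40 seconds,
or 50 frames per second.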
A.5 CPhistoErr
This is a variant on CPhisto2, and only occurs when the server
could not compress the histogram to under 1024 characters for some reason.
There's not much gmodstat can do about such entries, and they mean
that the ping samples of the last 2000 frames must have been fairly
evenly and widely spread out. Such lines are of the form:
CPhistoErr: ID Low Hi lowerrs hierrs tdelta histo-too-long
where the parameters are as for CPhisto2, and the text "histo-too-long"
replaces the compressed ASCII histogram.
Appendix B: Release Summary
Releases to date:
0.2.1
11/6/01
- Fixed malloc() bug in NAT estimation routines.
- Noted that cumulative_time Y-axis represents days
rather than hours
0.2
10/28/01
- Fixed bug in the 'graph_ping_histo' routine (median ping
values would be erroneously scaled by 1/N where N is the
ping histogram bucket size set by ping_histo_range).
- Clarified documentation for graph_ping_histo: Per-game median
pings only calculated for games in which the client
generated three or more "CPhisto2" log entries.
0.1
9/28/01 (First release)
gj_armitage@yahoo.com (geocities.com/gj_armitage)