PGN Standard: Section 20 - 21


20: Binary representation (PGC)

*** This section is under development.

The binary coded version of PGN is PGC (PGN Game Coding). PGC is a binary representation standard of PGN data designed for the dual goals of storage efficiency and program I/O. A file containing PGC data should have a name with a suffix of ".pgc".

Unlike PGN text files, which may have locale dependent representations for newlines, PGC files contain data that does not vary with the local processing environment. This means that PGC files may be transferred among systems using general binary file methods.

PGC files should be used only when the use of PGN is impractical due to time and space resource constraints. As the general level of processing capabilities increases, the need for PGC over PGN will decrease. Therefore, implementors are encouraged not to use PGC as the default representation because it is much more difficult than PGN to understand without proper software.

PGC data is composed of a sequence of PGC records. Each record is composed of a sequence of one or more bytes. The first byte is the PGC record marker and it specifies the interpretation of the remaining portion of the record. This remaining portion is composed of zero or more PGC record items. Item types include move sequences, move sets, and character strings.

20.1: Bytes, words, and doublewords

At the lowest level, PGC binary data is organized as bytes, words (two contiguous bytes), and doublewords (four contiguous bytes). All eight bits of a byte are used. Longwords (eight contiguous bytes) are not used. Integer values are stored using two's complement representation. Integers may be signed or unsigned depending on context. Multibyte integers are stored in little-endian format with the least significant byte appearing first.

A one byte integer item is called "int-1". A two byte integer item is called "int-2". A four byte integer item is called "int-4".
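
The integer items above can be illustrated with a minimal Python sketch built on the standard "struct" module. The helper names pack_int and unpack_int are hypothetical and are not part of this standard; the sketch only assumes the little-endian, two's complement layout described in this subsection.

```python
import struct

# Little-endian struct formats for the three PGC integer widths.
# The table and function names are illustrative, not defined by the standard.
_UNSIGNED = {1: "<B", 2: "<H", 4: "<I"}
_SIGNED = {1: "<b", 2: "<h", 4: "<i"}


def pack_int(value, size, signed=False):
    """Encode value as an int-1, int-2, or int-4 item (size is 1, 2, or 4)."""
    fmt = (_SIGNED if signed else _UNSIGNED)[size]
    return struct.pack(fmt, value)  # raises struct.error if value does not fit


def unpack_int(data, size, signed=False, offset=0):
    """Decode an int-1, int-2, or int-4 item starting at offset."""
    fmt = (_SIGNED if signed else _UNSIGNED)[size]
    return struct.unpack_from(fmt, data, offset)[0]
```

For instance, the int-2 value 0x1234 is stored as the byte 0x34 followed by the byte 0x12.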

Characters are stored as bytes using the ISO 8859/1 Latin-1 (ECMA-94) code set. There is no provision for other character sets or representations.

20.2: Move ordinals

A chess move is represented using a move ordinal. This is a single unsigned byte quantity with values from zero to 255. A move ordinal is interpreted as an index into the list of legal moves from the current position. This list is constructed by generating the legal moves from the current position, assigning SAN ASCII strings to each move, and then sorting these strings in ascending order. Note that a seven bit ordinal, as used by some inferior representation systems, is insufficient as there are some positions that have more than 128 moves available.

Examples: From the initial position, there are twenty legal moves. Move ordinal 0 corresponds to the SAN move string "Na3", move ordinal 1 corresponds to "Nc3", move ordinal 4 corresponds to "a3", and move ordinal 19 corresponds to "h4".
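
The ordinal assignment for the initial position can be reproduced with a short Python sketch. Move generation is outside the scope of this section, so the twenty SAN strings are written out by hand here; the function names are hypothetical. A real implementation would obtain the SAN strings from a legal move generator for the current position.

```python
def initial_position_san_moves():
    """Return the twenty legal SAN moves from the standard initial position."""
    # Each of the eight pawns may advance one or two squares.
    pawn_moves = [f"{file}{rank}" for file in "abcdefgh" for rank in "34"]
    # Each knight has two squares available.
    knight_moves = ["Na3", "Nc3", "Nf3", "Nh3"]
    return pawn_moves + knight_moves


def ordinal_table(san_moves):
    """Sort SAN strings in ascending ASCII order; a move's index is its ordinal."""
    return sorted(san_moves)


table = ordinal_table(initial_position_san_moves())
```

Because uppercase letters sort before lowercase letters in ASCII, the four knight moves receive ordinals 0 to 3 and the pawn moves follow.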

Moves can be organized into sequences and sets. A move sequence is an ordered list of moves that are played, one after another from first to last. A move set is a list of moves that are all playable from the current position.

Move sequence data is represented using a length header followed by move ordinal data. The length header is an unsigned integer that may be a byte or a word. The integer gives the number, possibly zero, of following move ordinal bytes. Most move sequences can be represented using just a byte header; these are called "mvseq-1" items. Move sequence data using a word header are called "mvseq-2" items.

Move set data is represented using a length header followed by move ordinal data. The length header is an unsigned integer that is a byte. The integer gives the number, possibly zero, of following move ordinal bytes. All move sets are represented using just a byte header; these are called "mvset-1" items. (Note the implied restriction that a move set can only have a maximum of 255 of the possible 256 ordinals present at one time.)
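
As an illustration of the mvseq-1, mvseq-2, and mvset-1 layouts, the following Python sketch encodes and decodes them. The function names are hypothetical; the only assumptions are the length headers and byte order given in this section.

```python
import struct

_HEADER = {1: "<B", 2: "<H"}  # byte header for mvseq-1, word header for mvseq-2


def encode_mvseq(ordinals, header_size=1):
    """Encode a move sequence as mvseq-1 (header_size=1) or mvseq-2 (header_size=2)."""
    body = bytes(ordinals)  # raises ValueError if an ordinal is outside 0..255
    # struct.error is raised if the count does not fit in the chosen header.
    return struct.pack(_HEADER[header_size], len(body)) + body


def encode_mvset(ordinals):
    """Encode a move set as mvset-1: a byte count followed by the ordinals."""
    body = bytes(ordinals)
    if len(body) > 255:
        raise ValueError("a move set can hold at most 255 ordinals")
    return bytes([len(body)]) + body


def decode_mvseq(data, header_size=1, offset=0):
    """Return (list of ordinals, offset of the byte after the item)."""
    (count,) = struct.unpack_from(_HEADER[header_size], data, offset)
    start = offset + header_size
    end = start + count
    if end > len(data):
        raise ValueError("move sequence is truncated")
    return list(data[start:end]), end
```

Since mvset-1 has the same layout as mvseq-1, decode_mvseq with header_size=1 also reads a move set.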

20.3: String data

PGC string data is represented using a length header followed by bytes of character data. The length header is an unsigned integer that may be a byte, a word, or a doubleword. The integer gives the number, possibly zero, of following character bytes. Most strings can be represented using just a byte header; these are called "string-1" items. String data using a word header are called "string-2" items and string data using a doubleword header are called "string-4" items. No special ASCII NUL termination byte is required for PGC storage of a string as the length is explicitly given in the item header.
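
A minimal Python sketch of the three string item layouts follows. The function names are hypothetical. The sketch assumes Latin-1 character data as required by section 20.1 and rejects text that cannot be represented in that code set.

```python
import struct

_HEADER = {1: "<B", 2: "<H", 4: "<I"}  # string-1, string-2, string-4 headers


def encode_string(text, header_size=1):
    """Encode text as a string-1, string-2, or string-4 item."""
    body = text.encode("latin-1")  # UnicodeEncodeError for non Latin-1 text
    # struct.error is raised if the length does not fit in the chosen header.
    return struct.pack(_HEADER[header_size], len(body)) + body


def decode_string(data, header_size=1, offset=0):
    """Return (decoded text, offset of the byte after the item)."""
    (length,) = struct.unpack_from(_HEADER[header_size], data, offset)
    start = offset + header_size
    end = start + length
    if end > len(data):
        raise ValueError("string item is truncated")
    return data[start:end].decode("latin-1"), end
```

Note that the header counts bytes, which for Latin-1 equals the number of characters, and that no NUL terminator is written.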

20.4: Marker codes

PGC marker codes are given in hexadecimal format. PGC marker code zero (marker 0x00) is the "noop" marker and carries no meaning. Each additional marker code defined appears in its own subsection below.

20.4.1: Marker 0x01: reduced export format single game

Marker 0x01 is used to indicate a single complete game in reduced export format. This refers to a game that has only the Seven Tag Roster data, played moves, and no annotations or comments. This record type is used as an alternative to the general game data begin/end record pairs described below. The general marker pair (0x05/0x06) is used to help represent game data that can't be adequately represented in reduced export format. There are eight items that follow marker 0x01 to form the "reduced export format single game" record. In order, these are:

1) string-1 (Event tag value)

2) string-1 (Site tag value)

3) string-1 (Date tag value)

4) string-1 (Round tag value)

5) string-1 (White tag value)

6) string-1 (Black tag value)

7) string-1 (Result tag value)

8) mvseq-2 (played moves)

20.4.2: Marker 0x02: tag pair

Marker 0x02 is used to indicate a single tag pair. There are two items that follow marker 0x02 to form the "tag pair" record; in order these are:

1) string-1 (tag pair name)

2) string-1 (tag pair value)
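
A tag pair record can be written and read back as in this Python sketch. Both function names are hypothetical, and the decoder assumes it is handed exactly one complete record.

```python
def encode_tag_pair(name, value):
    """Build a marker 0x02 record from a tag name and a tag value."""
    record = bytearray([0x02])
    for text in (name, value):
        body = text.encode("latin-1")
        if len(body) > 255:
            raise ValueError("string-1 items hold at most 255 bytes")
        record.append(len(body))  # string-1 length header
        record += body
    return bytes(record)


def decode_tag_pair(record):
    """Return (name, value) from a single marker 0x02 record."""
    if record[0] != 0x02:
        raise ValueError("not a tag pair record")
    name_end = 2 + record[1]
    value_start = name_end + 1
    value_end = value_start + record[name_end]
    name = record[2:name_end].decode("latin-1")
    value = record[value_start:value_end].decode("latin-1")
    return name, value
```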

20.4.3: Marker 0x03: short move sequence

Marker 0x03 is used to indicate a short move sequence. There is one item that follows marker 0x03 to form the "short move sequence" record; this is:

1) mvseq-1 (played moves)

20.4.4: Marker 0x04: long move sequence

Marker 0x04 is used to indicate a long move sequence. There is one item that follows marker 0x04 to form the "long move sequence" record; this is:

1) mvseq-2 (played moves)
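
One possible way for a writer to choose between the short and long forms is to pick marker 0x03 whenever the move count fits in a byte. This selection rule is an assumption of the sketch below, not a requirement stated in this section, and the function name is hypothetical.

```python
import struct


def encode_move_sequence_record(ordinals):
    """Build a marker 0x03 record if the count fits in a byte, else marker 0x04."""
    moves = bytes(ordinals)  # one byte per move ordinal
    if len(moves) <= 0xFF:
        return b"\x03" + struct.pack("<B", len(moves)) + moves  # mvseq-1
    if len(moves) <= 0xFFFF:
        return b"\x04" + struct.pack("<H", len(moves)) + moves  # mvseq-2
    raise ValueError("too many moves for a single move sequence record")
```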

20.4.5: Marker 0x05: general game data begin

Marker 0x05 is used to indicate the beginning of data for a game. It has no associated items; it is a complete record by itself. Instead, it marks the beginning of PGC records used to describe a game. All records up to the corresponding "general game data end" record are considered to be part of the same game. (PGC record type 0x01, "reduced export format single game", is not permitted to appear within a general game begin/end record pair. The general game construct is to be used as an alternative to record type 0x01 in those cases where the latter is too restrictive to contain the data for a game.)

20.4.6: Marker 0x06: general game data end

Marker 0x06 is used to indicate the end of data for a game. It has no associated items; it is a complete record by itself. Instead, it marks the end of PGC records used to describe a game. All records after the corresponding (and earlier appearing) "general game data begin" record are considered to be part of the same game.
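
As an illustration of the begin/end pair, the following Python sketch wraps tag pair records and one short move sequence record between markers 0x05 and 0x06. The ordering of records inside the pair (tags first, then moves) is an assumption made for the example, and the function name is hypothetical.

```python
import struct


def _string1(text):
    """Encode text as a string-1 item (byte length header, Latin-1 data)."""
    body = text.encode("latin-1")
    return struct.pack("<B", len(body)) + body


def encode_general_game(tags, ordinals):
    """Wrap tag pair records and one short move sequence in a 0x05/0x06 pair.

    tags is a list of (name, value) pairs; ordinals holds at most 255 moves.
    """
    out = bytearray([0x05])  # general game data begin
    for name, value in tags:
        out += b"\x02" + _string1(name) + _string1(value)  # tag pair record
    moves = bytes(ordinals)
    out += b"\x03" + struct.pack("<B", len(moves)) + moves  # short move sequence
    out.append(0x06)  # general game data end
    return bytes(out)
```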

20.4.7: Marker 0x07: simple-nag

Marker 0x07 is used to indicate the presence of a simple NAG (Numeric Annotation Glyph). This is an annotation marker that has only a short type identification and no operands. There is one item that follows marker 0x07 to form the "simple-nag" record; this is:

1) int-1 (unsigned NAG value, from 0 to 255)
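
A simple-nag record is always two bytes long, as this small Python sketch shows. The function name is hypothetical.

```python
def encode_simple_nag(nag):
    """Build a marker 0x07 record carrying one unsigned NAG value."""
    if not 0 <= nag <= 255:
        raise ValueError("NAG value must be in the range 0 to 255")
    return bytes([0x07, nag])
```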

20.4.8: Marker 0x08: rav-begin

Marker 0x08 is used to indicate the beginning of an RAV (Recursive Annotation Variation). It has no associated items; it is a complete record by itself. Instead, it marks the beginning of PGC records used to describe a recursive annotation. It is considered an opening bracket for a later rav-end record; the recursive annotation is completely described between the bracket pair. The rav-begin/data/rav-end structures can be nested.

20.4.9: Marker 0x09: rav-end

Marker 0x09 is used to indicate the end of an RAV (Recursive Annotation Variation). It has no associated items; it is a complete record by itself. Instead, it marks the end of PGC records used to describe a recursive annotation. It is considered a closing bracket for an earlier rav-begin record; the recursive annotation is completely described between the bracket pair. The rav-begin/data/rav-end structures can be nested.
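
Because rav-begin and rav-end records act as nested brackets, a reader can verify them with a simple depth counter. The Python sketch below works on a list of record markers that has already been extracted from the data; the function name is hypothetical.

```python
RAV_BEGIN = 0x08
RAV_END = 0x09


def max_rav_depth(markers):
    """Check that rav-begin/rav-end markers balance; return the deepest nesting."""
    depth = 0
    deepest = 0
    for index, marker in enumerate(markers):
        if marker == RAV_BEGIN:
            depth += 1
            deepest = max(deepest, depth)
        elif marker == RAV_END:
            if depth == 0:
                raise ValueError(f"rav-end without rav-begin at record {index}")
            depth -= 1
    if depth:
        raise ValueError(f"{depth} rav-begin record(s) never closed")
    return deepest
```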

20.4.10: Marker 0x0a: escape-string

Marker 0x0a is used to indicate the presence of an escape string. This is a string represented by the use of the percent sign ("%") escape mechanism in PGN. The data that is escaped is the sequence of characters immediately following the percent sign up to but not including the terminating newline. As is the case with the PGN percent sign escape, the use of a PGC escape-string record is limited to use for non-archival data. There is one item that follows marker 0x0a to form the "escape-string" record; this is the string data being escaped:

1) string-2 (escaped string data)

21: E-mail correspondence usage

*** This section is under development.




Last updated: Tuesday, 4. June 1996
Copyright Manfred Rosenboom marochess@oocities.com
