GEDCOM
(GEnealogical Data COMmunication)
A Simplified Explanation by Bob Henly.
Computer programs for storing, sorting and displaying family information are very useful, but we soon find the need to exchange information with other researchers, and then we hit problems: they are using a different program, and the files produced by the two programs are mutually incompatible.
There are several ways of exporting data from one program for importation to another, e.g. Comma Separated Values (CSV) format, but these lack one vital element for family history data: they have no means of conveying relationships. This is where GEDCOM comes on the scene. GEDCOM is a protocol for transferring data files, developed by the Mormon Church in Utah, which addresses just these problems. It is designed to recognize that there are several different classes of record, e.g. Individual, Marriage, Baptism, Notes, etc., and that there are relationships between records, e.g. two individuals are related by a marriage and they in turn are related to their parents and their children.
This leads to a multi-level protocol (or set of rules) for assembling this data into a file which can be exported from one program and imported into another without the first knowing anything about the second, i.e. the definition of GEDCOM is independent of the programs which use it.
A complete specification of GEDCOM is available to software designers from:
The Family History Department,
GEDCOM Coordinator 3T,
50 East North Temple St.,
Salt Lake City,
Utah 84150
or by e-mail: gedcom@gedcom.org. This is a mailing list to which one subscribes in the usual way.
Or you can download a copy by anonymous FTP.
Because GEDCOM is a multi-level, structured protocol, it enables designers to use as much or as little of it as they wish. For example, you could use it in a program to extract, say, just names and baptism dates. All the remaining data could be ignored or, as many designers do, be consigned to a notes file.
The structure of a GEDCOM file is of the form:
Level | Description | Example |
0 | Record class (or type) | Individual, Family, etc. This line indicates the start of a new record. Levels below this top level are used for increasing levels of detail. |
1 | Record subclass | Birth, Christening, etc. |
2 | Data for the event at level 1 | Date, Place, etc. |
3 | Supplementary data | Continuation lines and other supplementary data relating to the previous level. |
Although this seems overly complicated at first, the structure is necessary to make the protocol truly application independent.
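To make the level structure concrete, here is a minimal sketch in Python of how a program might split a single GEDCOM line into its parts. It is not taken from the GEDCOM specification or from any particular program: the names LINE_RE and parse_line are assumptions made for illustration, and the pattern covers only the simple line shapes shown in this article (a level number, an optional @...@ identifier, a tag and an optional value).

```python
import re

# Assumed line shape: LEVEL [@XREF@] TAG [VALUE]
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    # xref is None unless the line carries an @...@ identifier before the tag
    return int(level), xref, tag, value or ""


print(parse_line("0 @I65@ INDI"))     # (0, '@I65@', 'INDI', '')
print(parse_line("2 DATE ABT 1825"))  # (2, None, 'DATE', 'ABT 1825')
```

A program that only cares about some tags can call such a function on every line and simply skip the tags it does not recognise.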
In any GEDCOM file a record may contain only one line at level 0 but as many lines as necessary at the other levels.
The file commences with a header record, e.g.:
0 HEAD
1 SOUR Pedigree
1 DEST PAF
This is a HEADing record which defines the SOURce of the data, in this case the program PEDIGREE, and the DESTination, which is PAF. The latest version of GEDCOM will accommodate other source information records not included here, such as identification of the ownership of the data.
Now to an actual GEDCOM file. It is pure text, so it can be examined with a simple text editor such as Windows Notepad.
The following are fragments from a GEDCOM file for my wife's WOOLF family of Exeter.
This first record is an INDIvidual record and it is at the highest level (0) in the hierarchy. It is identified by the cross-reference identifier @I65@, which precedes the INDI tag.
Every line starts with a number which indicates its level. In the example we have:
first, the level 1 lines which refer to the individual Michael WOOLF: his gender, his name, references to his families and finally his BIRTh.
The BIRT line is followed by level 2 lines which have the details of his BIRTh, i.e. DATE, PLACe and NOTE.
Then we go back to level 1 to define a DEATh record, which is followed by its level 2 data. Observe here that the NOTE field ends with a continuation mark (+) and is followed by a level 3 CONTinuation line (for programs which do not truncate lines),
and so on:
0 @I65@ INDI
1 SEX M
1 NAME Michael/WOOLF/
1 FAMC @F320@
1 FAMS @F19@
1 FAMS @F336@
1 BIRT
2 DATE ABT 1825
2 PLAC Exeter, Devon.
2 NOTE Date derived from Death Certificate
1 DEAT
2 DATE 02 Aug 1903
2 PLAC Exeter, Devon.
2 NOTE GRO Cert. Aged 78, Army Pensioner. Informant: A Woolf, widow of deceased living +
3 CONT at 51, Sanford St. Exeter
1 BURI
2 DATE ABT 1903
1 OCCU
2 TITL Commercial Traveller.
2 PLAC Exeter, Devon
2 FROM ABT 1878
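A program reading this record will usually want the NOTE text as one piece. The following Python sketch folds each CONT line into the line it continues. It rests on assumptions drawn only from this sample file: that a trailing + is the exporting program's continuation mark rather than data, and that the pieces can be joined with a single space. The GEDCOM specification itself treats CONT as starting a new line of text, so a stricter reader might join with a newline instead. The function name fold_continuations is hypothetical.

```python
def fold_continuations(lines):
    """Merge each CONT line into the value of the line it continues."""
    folded = []
    for raw in lines:
        line = raw.strip()
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[1] == "CONT" and folded:
            text = parts[2] if len(parts) > 2 else ""
            previous = folded[-1].rstrip()
            # Assumption: a trailing "+" is a continuation mark, not part of the note
            if previous.endswith("+"):
                previous = previous[:-1].rstrip()
            folded[-1] = f"{previous} {text}".rstrip()
        else:
            folded.append(line)
    return folded


note = [
    "2 NOTE GRO Cert. Aged 78, Army Pensioner. Informant: A Woolf, widow of deceased living +",
    "3 CONT at 51, Sanford St. Exeter",
    "1 BURI",
]
for line in fold_continuations(note):
    print(line)
```

The two NOTE lines come out as a single level 2 line ending "widow of deceased living at 51, Sanford St. Exeter", and the BURI line is left untouched.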
The next example is another INDIvidual record, this time for Michael's wife.
0 @I66@ INDI
1 SEX F
1 NAME Mary Anne/CROCOMBE/
1 FAMS @F19@
1 BIRT
2 DATE 04 Apr 1827
2 PLAC Exeter, Devon
2 NOTE Age from marriage cert and date from dau's diary.
1 CHR
2 DATE 21 Apr 1822
2 PLAC PRs. St. Pauls, Exeter. Devon
1 DEAT
2 DATE 23 Mar 1877
2 PLAC Exeter, Devon
2 NOTE From dau's diary
1 BURI
2 DATE ABT 1877
2 PLAC Exeter, Devon
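As mentioned earlier, a program may use as little of GEDCOM as it wishes. The following is a minimal Python sketch, not part of the GEDCOM specification, which reads only names and christening dates and ignores everything else. The function name names_and_christenings and the layout of its result are assumptions made for illustration; the sample lines are an abridged copy of the record above.

```python
def names_and_christenings(lines):
    """Return {xref: {"name": ..., "christened": ...}} for each INDI record."""
    people = {}
    current = None   # identifier of the INDI record being read, if any
    in_chr = False   # True while we are inside a level 1 CHR event
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # "0 @I66@ INDI": here the second field is the identifier
            current = tag if value == "INDI" else None
            in_chr = False
            if current is not None:
                people[current] = {"name": None, "christened": None}
        elif current is None:
            continue  # inside a record we are not interested in
        elif level == 1:
            in_chr = tag == "CHR"
            if tag == "NAME":
                people[current]["name"] = value.replace("/", " ").strip()
        elif level == 2 and in_chr and tag == "DATE":
            people[current]["christened"] = value
    return people


SAMPLE = """\
0 @I66@ INDI
1 SEX F
1 NAME Mary Anne/CROCOMBE/
1 BIRT
2 DATE 04 Apr 1827
1 CHR
2 DATE 21 Apr 1822
2 PLAC PRs. St. Pauls, Exeter. Devon
1 DEAT
2 DATE 23 Mar 1877
""".splitlines()

print(names_and_christenings(SAMPLE))
```

The birth and death dates are skipped because a DATE line is only kept while the most recent level 1 line was CHR.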
Now the pace quickens! Here we have a FAMily record, indicated by the FAM tag. Both the above INDI records had references to @F19@, and this is record @F19@. [The symbol @ is used to distinguish these identifiers from numbers used for other purposes.]
It is the marriage record for Michael WOOLF (I65) and Mary Anne CROCOMBE (I66).
It also refers to the children of the marriage, i.e. @I68@ and @I69@. It then goes on to detail the marriage:
1 MARRiage
2 DATE 02 Feb 1854
2 PLACe Exeter, Devon
2 NOTE
.. again with continuations.
Note that the children of this marriage are indicated at level 1 by references to their own INDI records.
0 @F19@ FAM
1 HUSB @I65@
1 WIFE @I66@
1 CHIL @I69@
1 CHIL @I68@
1 MARR
2 DATE 02 Feb 1854
2 PLAC Exeter, Devon
2 NOTE At St. Sidwell's Church, Exeter. Groom bat, aged 28 Bandsman 3rd Light Dragoons, +
3 CONT res. at Cavalry Barracks, St. Davids; Father Isaac Woolf Shopkeeper. Bride +
3 CONT spinster aged 27, of 19 Paris St., Exeter. Father Philip Crocombe Tailor. Both +
3 CONT Signed;
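The links between the FAM record and the INDI records can be followed mechanically. The Python sketch below indexes records by their @...@ identifier and then resolves the HUSB, WIFE and CHIL references of a family to names. All function names (index_records, display_name, describe_family) are hypothetical, and the sample is an abridged copy of the records above; because the children's INDI records are not shown in this article, they fall back to their identifiers.

```python
def index_records(lines):
    """Group lines into records: {xref: [(level, tag, value), ...]}."""
    records = {}
    current = None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        level = int(parts[0])
        rest = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # Only level 0 lines with an @...@ identifier can be linked to
            current = parts[1] if parts[1].startswith("@") else None
            if current:
                records[current] = [(0, rest, "")]
        elif current:
            records[current].append((level, parts[1], rest))
    return records


def display_name(records, xref):
    """Return the NAME of an individual, or the identifier if no record is present."""
    for level, tag, value in records.get(xref, []):
        if level == 1 and tag == "NAME":
            return value.replace("/", " ").strip()
    return xref


def describe_family(records, fam_xref):
    """Resolve the husband, wife and children of one FAM record to names."""
    family = {"HUSB": None, "WIFE": None, "CHIL": []}
    for level, tag, value in records[fam_xref]:
        if level != 1:
            continue
        if tag in ("HUSB", "WIFE"):
            family[tag] = display_name(records, value)
        elif tag == "CHIL":
            family["CHIL"].append(display_name(records, value))
    return family


FRAGMENT = """\
0 @I65@ INDI
1 NAME Michael/WOOLF/
1 FAMS @F19@
0 @I66@ INDI
1 NAME Mary Anne/CROCOMBE/
1 FAMS @F19@
0 @F19@ FAM
1 HUSB @I65@
1 WIFE @I66@
1 CHIL @I69@
1 CHIL @I68@
""".splitlines()

print(describe_family(index_records(FRAGMENT), "@F19@"))
```

This is the feature that a flat format such as CSV lacks: each reference is just an identifier, and the relationship is recovered by looking that identifier up among the level 0 records.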
In fact, Michael went on to a second marriage, and the record for this marriage, referenced in his INDI record as @F336@, details a further four children.
The specification defines tags for data at various levels, and the above has dealt with only a small part, just to give the flavour of it. The protocol is extremely flexible and could be adapted for other classes of data in activities unrelated to family history.
GEDCOM has become a very robust protocol and is very widely used. However, not all family history programs implement the latest version, and many do not implement it fully. This accounts for one of its weaknesses, which lies not in the protocol but in the programs. Another problem, which rests mainly with program developers, is the question of updating.
You cannot simply import a GEDCOM file into an existing family database. It will probably result in a lot of duplicate records and a lot of work trying to decide what to keep and what to discard. Always import GEDCOM files into a new, empty database and then use cut and paste to transfer information.
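One way to gauge the problem before transferring anything is to look for people who appear in both sets of data. The Python sketch below assumes each set has already been reduced to a dictionary mapping an identifier to a (name, birth date) pair. The function name possible_duplicates and the matching rule (same name and same birth date text, ignoring case and spacing) are illustrative assumptions only; a real program would need more careful matching.

```python
def possible_duplicates(existing, incoming):
    """Pair up people whose normalised name and birth date agree in both sets."""

    def key(person):
        name, birth = person
        # Drop the surname slashes, collapse spacing and ignore case
        clean_name = " ".join(name.replace("/", " ").lower().split())
        return clean_name, birth.lower().strip()

    seen = {}
    for xref, person in existing.items():
        seen.setdefault(key(person), []).append(xref)

    matches = []
    for xref, person in incoming.items():
        for other in seen.get(key(person), []):
            matches.append((other, xref))
    return matches


existing = {"@I65@": ("Michael/WOOLF/", "ABT 1825")}
incoming = {
    "@I1@": ("Michael /Woolf/", "abt 1825"),
    "@I2@": ("Mary Anne/CROCOMBE/", "04 Apr 1827"),
}
print(possible_duplicates(existing, incoming))  # [('@I65@', '@I1@')]
```

Each pair reported is only a candidate for a human to check, which is the same decision about what to keep and what to discard described above.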
In summary, GEDCOM is a protocol which defines a sequential data file in which individual records are presented together with their logical inter-linking. This linking now extends beyond the genealogical links and has provision for linking records for cross-referencing purposes.
Finally, if you want to write some programs to manipulate GEDCOM files, the Family History Department mentioned above makes available a very useful suite of routines, written in C, for use by developers.