GEDCOM (GEnealogical Data COMmunication)
A Simplified Explanation by
Bob Henly.

 

Contents

Introduction
GEDCOM File Structure
Summary


Introduction

Computer programs for storing, sorting and displaying family information are very useful, but we soon find we need to exchange information with other researchers, and then we hit problems: they use a different program, and the files produced by each program are incompatible with one another.

There are several ways of exporting data from one program for import into another, e.g. Comma-Separated Values (CSV) format, but these lack one vital element for family history data: they have no means of conveying relationships. This is where GEDCOM comes on the scene. GEDCOM is a data exchange protocol developed by the Mormon Church (the Church of Jesus Christ of Latter-day Saints) in Utah, and it addresses exactly these problems. It recognizes that there are several different kinds of record, e.g. individuals, families (marriages), events such as baptisms, and notes, and that there are relationships between records: two individuals are related by a marriage, and they in turn are related to their parents and their children. This leads to a multi-level protocol (or set of rules) for assembling the data into a file that can be exported from one program and imported into another without the first knowing anything about the second; that is, the definition of GEDCOM is independent of the programs that use it.
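To see how GEDCOM carries the relationships that a CSV file cannot, here is a minimal Python sketch that writes two spouses and their children as INDI records and links them all through a single FAM record. The function name write_family and the generated labels @I1@, @I2@ and @F1@ are purely illustrative, and a complete export would also need a HEAD record at the start and a TRLR (trailer) record at the end, which are left out here.

```python
def write_family(husband, wife, children):
    """Return GEDCOM lines for two spouses, their children and the FAM record
    linking them. Each person is a (given names, surname) pair.
    The labels @I1@, @I2@, ... and @F1@ are made up for this example."""
    people = [husband, wife] + list(children)
    ids = [f"@I{n}@" for n in range(1, len(people) + 1)]
    fam_id = "@F1@"
    lines = []
    for i, (given, surname) in enumerate(people):
        lines.append(f"0 {ids[i]} INDI")
        # The surname is written between slashes, as in the NAME lines of this article.
        lines.append(f"1 NAME {given} /{surname}/")
        # Spouses point to the family they formed (FAMS);
        # children point to the family they were born into (FAMC).
        link = "FAMS" if i < 2 else "FAMC"
        lines.append(f"1 {link} {fam_id}")
    # The family record points back at each of its members.
    lines.append(f"0 {fam_id} FAM")
    lines.append(f"1 HUSB {ids[0]}")
    lines.append(f"1 WIFE {ids[1]}")
    for child_id in ids[2:]:
        lines.append(f"1 CHIL {child_id}")
    return lines
```

Each link is written from both ends (FAMS or FAMC on the individual, HUSB, WIFE or CHIL on the family), which is the same pattern the WOOLF fragments later in this article follow.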

A complete specification of GEDCOM is available to software designers from:

The Family History Department,
GEDCOM Coordinator 3T
50, East North Temple St.,
Salt Lake City,
Utah 84150

or by e-mail: gedcom@gedcom.org (this is a mailing list to which one subscribes in the usual way).
Alternatively, you can download a copy by anonymous FTP.

Because GEDCOM is a multi-level, structured protocol, designers can use as much or as little of it as they wish. For example, a program could use it to extract just names and baptism dates. All the remaining data could be ignored or, as many designers do, be confined to a notes file.
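As an illustration of using only part of the protocol, here is a minimal Python sketch that keeps each individual's name and baptism date and silently skips everything else. It relies on the level numbers explained in the File Structure section below, and it assumes a baptism is recorded under either the BAPM or the CHR (christening) tag; the function name is hypothetical.

```python
def names_and_baptisms(gedcom_text):
    """Collect each individual's NAME and baptism/christening DATE,
    ignoring every other tag in the file."""
    results = {}
    current = None   # label of the INDI record being read, e.g. "@I65@"
    event = None     # the level-1 tag that the following level-2 lines belong to
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue                      # blank or malformed line
        level = int(parts[0])
        if level == 0:
            # "0 @I65@ INDI" starts an individual; any other record ends it.
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
            if current:
                results[current] = {"name": None, "baptised": None}
            continue
        if current is None:
            continue                      # inside a record we are not interested in
        tag = parts[1]
        value = parts[2] if len(parts) == 3 else ""
        if level == 1:
            event = tag
            if tag == "NAME":
                results[current]["name"] = value
        elif level == 2 and tag == "DATE" and event in ("BAPM", "CHR"):
            results[current]["baptised"] = value
    return results
```

Everything the program does not recognise simply falls through the loop, which is what lets a program implement only the parts of GEDCOM it needs.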

GEDCOM File Structure

The structure of a GEDCOM file is of the form:

Level   Description                      Example
0       Record class (or type)           Individual, Family, etc.
1       Record subclass                  Birth, Christening, etc.
2       Data for the event at level 1    Date, Place, etc.
3       Supplementary data               Continuation lines and other data relating to the line at the previous level

A level 0 line indicates the start of a new record; the levels below it are used for increasing levels of detail.

Although this may at first seem overly complicated, the structure is necessary to make the protocol truly application-independent.

In a GEDCOM file, each record contains exactly one line at level 0 (its first line) but as many lines as necessary at the other levels.
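The level numbers are all a reading program needs to rebuild this hierarchy. One way to do it, sketched below in Python (this is not the Family History Department's own code, and the names Node, parse_line and parse_records are hypothetical), is to start a new record at every level 0 line and keep a stack holding the most recent line seen at each level, so that each new line is attached to the line one level above it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    xref: Optional[str] = None            # e.g. "@I65@" on a level-0 line
    children: List["Node"] = field(default_factory=list)


def parse_line(line: str) -> Node:
    """Split 'LEVEL [@LABEL@] TAG [VALUE]' into its parts."""
    parts = line.strip().split(" ", 1)
    level = int(parts[0])
    rest = parts[1] if len(parts) > 1 else ""
    xref = None
    if rest.startswith("@"):
        xref, _, rest = rest.partition(" ")
    tag, _, value = rest.partition(" ")
    return Node(level, tag, value, xref)


def parse_records(text: str) -> List[Node]:
    """Return the level-0 records, each with its subordinate lines nested."""
    records: List[Node] = []
    stack: List[Node] = []                # stack[i] is the open node at level i
    for line in text.splitlines():
        if not line.strip():
            continue
        node = parse_line(line)
        if node.level == 0:
            records.append(node)          # a level-0 line always starts a new record
        else:
            # The parent is the most recent line one level up.
            stack[node.level - 1].children.append(node)
        del stack[node.level:]            # close any deeper levels still open
        stack.append(node)
    return records
```

The sketch assumes well-formed input, in which the first line is at level 0 and each line is at most one level deeper than the line before it; anything else makes the stack lookup fail.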

A GEDCOM file begins with a header record, e.g.:
0 HEAD
1 SOUR Pedigree
1 DEST PAF

This is a HEADer record, which identifies the SOURce of the data (in this case the program Pedigree) and its DESTination (here PAF, Personal Ancestral File). The latest version of GEDCOM can accommodate other source information not included here, such as identification of the ownership of the data.
Now to an actual GEDCOM file. It is plain text, so it can be examined with a simple text editor such as Windows Notepad.
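A receiving program can read the header first to learn where the data came from. A minimal Python sketch, assuming the header is the first record in the file and uses level 1 SOUR and DEST lines like those above (read_header is a hypothetical name):

```python
def read_header(text):
    """Return the level-1 SOUR and DEST values from the HEAD record."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    if not lines or lines[0] != "0 HEAD":
        raise ValueError("a GEDCOM file should begin with a HEAD record")
    info = {}
    for line in lines[1:]:
        level, _, rest = line.partition(" ")
        if level == "0":
            break                         # the next record has started
        tag, _, value = rest.partition(" ")
        if level == "1" and tag in ("SOUR", "DEST"):
            info[tag] = value
    return info
```

For the header shown above this returns {"SOUR": "Pedigree", "DEST": "PAF"}.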

The following are fragments from a GEDCOM file for my wife’s WOOLF family of Exeter.
This first record is an INDIvidual record, and it starts at the highest level (0) in the hierarchy. It is identified by the cross-reference label @I65@.

Each line starts with a number that indicates its level. In the example we have:
all the level 1 lines, which refer to the individual Michael WOOLF: his sex, his name, references to the family he was born into (FAMC) and the families he formed by marriage (FAMS), and finally his BIRTh;
these are followed by level 2 lines giving the details of his BIRTh, i.e. DATE, PLACe and NOTE;
then back to level 1 to define a DEATh event,
which is followed by level 2 data. Observe here that the NOTE line ends with a continuation mark + and is followed by a level 3 CONTinuation line carrying the rest of the text (GEDCOM limits the length of a line, so long text is split this way), and so on:

0 @I65@ INDI
1 SEX M
1 NAME Michael/WOOLF/
1 FAMC @F320@
1 FAMS @F19@
1 FAMS @F336@
1 BIRT
2 DATE ABT 1825
2 PLAC Exeter, Devon.
2 NOTE Date derived from Death Certificate
1 DEAT
2 DATE 02 Aug 1903
2 PLAC Exeter, Devon.
2 NOTE GRO Cert. Aged 78, Army Pensioner. Informant: A Woolf, widow of deceased living +
3 CONT at 51, Sanford St. Exeter
1 BURI
2 DATE ABT 1903
1 OCCU
2 TITL Commercial Traveller.
2 PLAC Exeter, Devon
2 FROM ABT 1878
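When a program reads the DEATh NOTE above, it has to join the level 3 CONT line back onto the level 2 NOTE. In GEDCOM 5.5, CONT means the text continues on a new line, while a related tag, CONC, continues the same line with no break. A minimal Python sketch (full_text is a hypothetical name; it expects the starting line to have the form LEVEL TAG VALUE, with no cross-reference label, and it leaves the text, including any + mark, exactly as written):

```python
def full_text(lines, start):
    """Join the value on lines[start] with the CONT/CONC lines nested
    directly beneath it. `lines` is a list of raw GEDCOM lines."""
    level, _, rest = lines[start].strip().partition(" ")
    _, _, text = rest.partition(" ")          # drop the tag, keep the value
    child_level = str(int(level) + 1)
    for line in lines[start + 1:]:
        lvl, _, rest = line.strip().partition(" ")
        if lvl != child_level:
            break                             # we have left the continuation block
        tag, _, value = rest.partition(" ")
        if tag == "CONT":
            text += "\n" + value              # CONT: the text starts a new line
        elif tag == "CONC":
            text += value                     # CONC: joined with no line break
        else:
            break
    return text
```

Called on the NOTE line of the DEATh event, it returns the informant's details followed, on a second line, by "at 51, Sanford St. Exeter".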

The next example is another INDIvidual record, this time for Michael’s wife.
0 @I66@ INDI
1 SEX F
1 NAME Mary Anne/CROCOMBE/
1 FAMS @F19@
1 BIRT
2 DATE 04 Apr 1827
2 PLAC Exeter, Devon
2 NOTE Age from marriage cert and date from dau's diary.
1 CHR
2 DATE 21 Apr 1822
2 PLAC PRs. St. Pauls, Exeter. Devon
1 DEAT
2 DATE 23 Mar 1877
2 PLAC Exeter, Devon
2 NOTE From dau's diary
1 BURI
2 DATE ABT 1877
2 PLAC Exeter, Devon
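Both records store names with the surname between slashes, and dates either exactly (04 Apr 1827) or approximately (ABT 1825). A small Python sketch of pulling these values apart; it handles only the forms that appear in these fragments, not the specification's full date grammar (which also has qualifiers such as BEF, AFT and EST, and date ranges), and the function names are hypothetical.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def split_name(value):
    """'Mary Anne/CROCOMBE/' -> ('Mary Anne', 'CROCOMBE')."""
    given, _, rest = value.partition("/")
    surname = rest.split("/", 1)[0]
    return given.strip(), surname.strip()


def parse_date(value):
    """Return (qualifier, year, month, day); month and day may be None.
    Only the 'ABT' (about) qualifier used in these records is recognised."""
    parts = value.upper().split()
    qualifier = None
    if parts and parts[0] == "ABT":
        qualifier, parts = "ABT", parts[1:]
    if len(parts) == 3:                       # e.g. 04 APR 1827
        day = int(parts[0])
        month = MONTHS.index(parts[1]) + 1    # ValueError for an unknown month
        year = int(parts[2])
        date(year, month, day)                # ValueError for an impossible date
        return qualifier, year, month, day
    if len(parts) == 1:                       # e.g. 1825
        return qualifier, int(parts[0]), None, None
    raise ValueError(f"unsupported date form: {value!r}")
```

Keeping the qualifier separate means an approximate date such as ABT 1825 is not silently treated as exact.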

Now the pace quickens! Here we have a FAMily record, indicated by the FAM tag. Both of the INDI records above contain references to @F19@, and this is record @F19@. (The @ symbols distinguish these cross-reference labels from numbers used for other purposes.)
It is the family record for the marriage of Michael WOOLF (@I65@) and Mary Anne CROCOMBE (@I66@).
It also refers to the children of the marriage, i.e. @I68@ and @I69@. It then goes on to detail the marriage:
1 MARRiage
2 DATE 02 Feb 1854
2 PLACe Exeter, Devon
2 NOTE … again with continuations
Note that the children of this marriage are listed at level 1 as CHIL lines, each pointing to the child's own INDI record.

0 @F19@ FAM
1 HUSB @I65@
1 WIFE @I66@
1 CHIL @I69@
1 CHIL @I68@
1 MARR
2 DATE 02 Feb 1854
2 PLAC Exeter, Devon
2 NOTE At St. Sidwell's Church, Exeter.Groom bat, aged 28 Bandsman 3rd Light Dragoons, +
3 CONT res. at Cavalry Barracks, St. Davids; Father Isaac Woolf Shopkeeper. Bride +
3 CONT spinster aged 27, of 19 Paris St., Exeter. Father Philip Crocombe Tailor. Both +
3 CONT Signed;
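Following these links is how a program reassembles the family. A minimal Python sketch, assuming the whole file fits in memory (index_records and describe_family are hypothetical names): it first indexes every record by its cross-reference label and then follows the HUSB, WIFE and CHIL pointers of a FAM record to the individuals' names.

```python
def index_records(text):
    """Map each cross-reference label (e.g. '@F19@') to the lines of its
    record, stored as (level, tag, value) tuples."""
    records, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level, _, rest = line.partition(" ")
        if level == "0":
            label, _, tag = rest.partition(" ")
            current = label if label.startswith("@") else None   # skip HEAD, TRLR
            if current:
                records[current] = [(0, tag, "")]
            continue
        if current:
            tag, _, value = rest.partition(" ")
            records[current].append((int(level), tag, value))
    return records


def describe_family(records, fam_id):
    """Follow the HUSB, WIFE and CHIL pointers of a FAM record to names."""
    def name_of(ref):
        for level, tag, value in records.get(ref, []):
            if level == 1 and tag == "NAME":
                return " ".join(value.replace("/", " ").split())
        return ref          # record not present in this file: show its label
    roles = {"HUSB": "husband", "WIFE": "wife", "CHIL": "child"}
    return [(roles[tag], name_of(value))
            for level, tag, value in records[fam_id]
            if level == 1 and tag in roles]
```

Run over the three fragments above, describe_family(records, "@F19@") names Michael WOOLF and Mary Anne CROCOMBE; the children come back as their labels @I69@ and @I68@, because their INDI records are not among the extracts.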

In fact, Michael married a second time, and the record for that marriage, referenced in his INDI record as @F336@, details a further four children.

The specification defines tags for data at various levels, and the examples above deal with only a small part of it, just to give the flavour. The protocol is extremely flexible and could be adapted for other classes of data in activities unrelated to family history.

GEDCOM has become a robust and very widely used protocol. However, not all family history programs implement the latest version, and many do not implement it fully. This accounts for one of its weaknesses, which lies not in the protocol but in the programs. Another problem, which rests mainly with program developers, is the question of updating an existing database.

You cannot simply import a GEDCOM file into an existing family database: the result will probably be a lot of duplicate records and a lot of work deciding what to keep and what to discard. Always import GEDCOM files into a new, empty database and then use cut and paste to transfer information.
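Part of the work of deciding what to keep can be helped along by flagging likely duplicates for a person to review. The Python sketch below uses an illustrative heuristic that is not part of GEDCOM: two individuals are treated as possible duplicates when their normalised names and birth dates match exactly. The dictionary layout with 'id', 'name' and 'birth' keys, and the function names, are hypothetical.

```python
from collections import defaultdict


def normalise(name):
    """Drop the surname slashes and case so that 'Michael/WOOLF/'
    and 'Michael /Woolf/' compare equal."""
    return " ".join(name.replace("/", " ").lower().split())


def possible_duplicates(existing, incoming):
    """Pair up records from two databases whose name and birth date match.
    Each record is a dict with 'id', 'name' and (optionally) 'birth' keys."""
    by_key = defaultdict(list)
    for person in existing:
        by_key[(normalise(person["name"]), person.get("birth"))].append(person["id"])
    pairs = []
    for person in incoming:
        key = (normalise(person["name"]), person.get("birth"))
        for old_id in by_key.get(key, []):
            pairs.append((old_id, person["id"]))
    return pairs
```

The pairs it returns are only candidates: a differently spelt name or a slightly different date will slip through, so the final decision about what to keep still rests with the researcher.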

Summary

In summary, GEDCOM is a protocol that defines a sequential data file in which individual records are presented together with their logical inter-links. This linking now extends beyond genealogical relationships and includes provision for linking records for cross-referencing purposes.
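Because the links are simply labels such as @F19@ appearing as line values, a program can check that every link points to a record that is actually present in the file. A minimal Python sketch, assuming that any line value consisting entirely of an @...@ label is a pointer, which holds for the tags used in this article (dangling_pointers is a hypothetical name):

```python
import re

POINTER = re.compile(r"^@[^@]+@$")


def dangling_pointers(text):
    """Return (line_number, pointer) for every pointer with no matching record."""
    defined, used = set(), []
    for number, raw in enumerate(text.splitlines(), start=1):
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0" and POINTER.match(parts[1]):
            defined.add(parts[1])             # '0 @F19@ FAM' defines @F19@
        elif len(parts) == 3 and POINTER.match(parts[2]):
            used.append((number, parts[2]))   # '1 HUSB @I65@' refers to @I65@
    # Checked only after the whole file is read, so forward references are fine.
    return [(n, p) for n, p in used if p not in defined]
```

Run over the three fragments in this article joined together, it would flag @F320@, @F336@, @I69@ and @I68@, simply because those records were left out of the extracts.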

Finally, if you want to write programs to manipulate GEDCOM files, the Family History Department mentioned above makes available a very useful suite of routines, written in C, for use by developers.
