A preliminary investigation into mapping Irish and Scottish surnames using Y chromosome markers

 

John McEwan

 

26th April 2005

(updated 19th June 2005, Kincade/Kincaid altered to Britonic, Fitzpatrick to Irish Celtic)

(updated 23rd July 2005, to include details about surname distribution in 1881)

 

Method

Data was extracted from Ysearch (http://www.ysearch.org/) for a number of Irish, Scottish and English families. Families were selected on the basis of:

a)     having a reasonable number of samples

b)     having a surname of Irish or Scottish origin (several outgroup surnames, such as Schmidt and Johns, were also selected)

c)     being of geographic interest

d)     being historically related to the Ewing/McEwen surname

Note that several “tricks” were used to extract the data in a consistent format and to combine it in an Excel file.

 

The data collected (752 individuals) was then edited to remove duplicate entries. Only samples with the core 25 FTDNA markers were retained, and samples with additional markers were edited down to those core markers. DYS389ii was converted to a normal marker by subtracting the repeat length of DYS389i from it. This was followed by minor editing to insert a missing value where a single genotype was missing in the core 25 markers. The resulting file was then visually scanned, and any entry within a surname group whose listed ancestor did not obviously belong to that family was deleted (e.g. ancestor White listed in the Black surname group). The resulting file had 498 entries and 39 surnames. The largest surname group was Stewart with 65 entries, and the smallest groups were McLaren, McEwen, Fitzpatrick, Fennessy and Donoghue, with 2 entries each.
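The DYS389 conversion and the trimming to core markers can be sketched in Python as follows. This is only an illustration of the editing steps described above, not the procedure actually used (the original work was done by hand in Excel). The record layout, the function names and the `max_missing` parameter are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

# A haplotype maps marker name -> repeat count (None = missing genotype).
Haplotype = Dict[str, Optional[int]]


def normalise_dys389(hap: Haplotype) -> Haplotype:
    """Return a copy in which DYS389ii holds DYS389ii minus DYS389i.

    As reported, DYS389ii includes the DYS389i repeats, so subtracting
    them leaves a marker that can be treated like any other.
    """
    out = dict(hap)
    first, second = hap.get("DYS389i"), hap.get("DYS389ii")
    if first is None or second is None:
        out["DYS389ii"] = None  # cannot convert without both values
    else:
        out["DYS389ii"] = second - first
    return out


def trim_to_core(
    records: List[Haplotype],
    core_markers: Sequence[str],
    max_missing: int = 1,  # assumption: tolerate one missing genotype
) -> List[Haplotype]:
    """Keep only the core markers; drop records missing too many of them."""
    kept = []
    for rec in records:
        trimmed = {m: rec.get(m) for m in core_markers}
        missing = sum(value is None for value in trimmed.values())
        if missing <= max_missing:
            kept.append(normalise_dys389(trimmed))
    return kept
```

For example, a sample reported as DYS389i = 13 and DYS389ii = 29 would be stored with DYS389ii = 16 after conversion.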

 

Surnames were classified by likely origin, based on the origins commonly recorded historically. Note that most surnames became fixed and inherited in a patrilineal fashion in the period 1000-1350 AD. In addition, putative surname origin was investigated independently by examining the geographical distribution of each surname in 1881. The assumption is that people with the same surname typically cluster around the origin of the surname. The distribution used for each surname (both graphed and tabulated) can be viewed here.
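One simple way to turn the clustering assumption into a calculation is to compare how common a surname is relative to the local population in each region, and to take the region with the highest rate as the putative origin. The sketch below illustrates that idea only. It is not necessarily how the 1881 distributions were evaluated here, and the function names and input layout are hypothetical.

```python
from typing import Dict


def relative_frequency(
    surname_counts: Dict[str, int], population: Dict[str, int]
) -> Dict[str, float]:
    """Bearers of the surname per 1000 inhabitants, for each region."""
    rates = {}
    for region, count in surname_counts.items():
        total = population.get(region, 0)
        if total > 0:  # skip regions with no population figure
            rates[region] = 1000.0 * count / total
    return rates


def likely_origin(
    surname_counts: Dict[str, int], population: Dict[str, int]
) -> str:
    """Region where the surname is most concentrated, not most numerous."""
    rates = relative_frequency(surname_counts, population)
    if not rates:
        raise ValueError("no region has both a surname count and a population")
    return max(rates, key=rates.get)
```

Using a rate rather than a raw count matters because large cities attract migrants: a surname may have its greatest number of bearers in a city while still being most concentrated in the rural area where it arose.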

 

A number of methods of summarizing the data were investigated, and finally the program Populations (http://www.pge.cnrs-gif.fr/bioinfo/populations/index.php?lang=en) was used. The graphical data produced was then visualized using TreeView32 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). Initially, simple options were chosen. The graph presented below is an unrooted tree. The algorithm chosen was Cavalli-Sforza and Edwards' (1967) chord distance method, with the tree built by neighbor joining. In practice, considerable investigation is needed to choose the best tree method, and bootstrapping is also required to test the stability of the tree (I suspect the small family groups are very unstable; they are included for interest only).
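To make the distance measure concrete, the sketch below computes a chord distance between two surname groups from their allele frequencies at each marker. It uses the form in which the distance is commonly quoted, (2/π)·sqrt(2·(1 − Σ sqrt(x·y))) averaged over markers, where x and y are the frequencies of the same allele in the two groups. Whether Populations applies exactly this scaling is an assumption, and the function names and data layout are hypothetical. The neighbor-joining step that turns the distance matrix into a tree is not shown.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence

Haplotype = Dict[str, Optional[int]]


def allele_frequencies(values: Iterable[Optional[int]]) -> Dict[int, float]:
    """Frequency of each observed allele, ignoring missing genotypes."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def chord_distance(
    group_a: List[Haplotype], group_b: List[Haplotype], markers: Sequence[str]
) -> float:
    """Chord distance between two groups, averaged over the usable markers."""
    total = 0.0
    used = 0
    for marker in markers:
        freq_a = allele_frequencies(h.get(marker) for h in group_a)
        freq_b = allele_frequencies(h.get(marker) for h in group_b)
        if not freq_a or not freq_b:
            continue  # marker missing in a whole group: skip it
        shared = freq_a.keys() & freq_b.keys()
        overlap = sum(math.sqrt(freq_a[a] * freq_b[a]) for a in shared)
        overlap = min(overlap, 1.0)  # guard against rounding above 1
        total += math.sqrt(2.0 * (1.0 - overlap))
        used += 1
    if used == 0:
        raise ValueError("no marker is typed in both groups")
    return (2.0 / math.pi) * total / used
```

Under this form, two groups with identical allele frequencies are at distance 0, and two groups sharing no alleles at any marker are at the maximum of 2·sqrt(2)/π. With only two entries in a group, each allele frequency can only be 0, 0.5 or 1, which is one reason to expect the small family groups to sit unstably in the tree.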

 

Results

The resulting graph is shown on the next page. Notice the relatively clean separation of groups based on the historical origin of the name. The Irish DalRiada Celtic group appears distinct from a group that largely originates from the western Lowlands of Scotland and is probably of Britonic type. The remaining surnames, in the upper quadrant, appear to have more distant and diverse origins, although historically most originate from the eastern Highlands or Lowland Scotland.

 

Summary

This simple analysis provided sufficient information to suggest that a more detailed analysis could be undertaken, based around clustering haplotypes and then using surname as a marker of historic location. This is explored in subsequent analyses.