A preliminary investigation
at mapping Irish and Scottish surnames using Y chromosome markers
John McEwan
26th April 2005
(updated 19th June 2005, Kincade/Kincaid altered to Britonic,
Fitzpatrick to Irish Celtic)
(updated 23rd July 2005, to include details about surname
distribution in 1881)
Method
Data was extracted from Ysearch (http://www.ysearch.org/)
for a number of Irish, Scottish and English families. Families were selected on
the basis of:
a) having a reasonable number of samples
b) having a surname of Irish or Scottish origin (several outgroup
surnames, such as Schmidt and Johns, were also selected)
c) being of geographic interest
d) being historically related to the Ewing/McEwen surname
Note that several “tricks” were used to extract the data in a consistent format
and to combine it in an Excel file.
The data collected (752 individuals) was then edited to remove
duplicate entries. Only samples with the core 25 FTDNA markers were
retained, and samples with additional markers were edited down to those core
markers. DYS389ii was converted to a normal marker by subtracting the repeat
length of DYS389i from it. This was followed by minor editing to insert a
missing value where a single genotype was missing in the core 25 markers. The
resulting file was then visually scanned, and any entry whose listed ancestor
was not obviously related to the family of its surname group was deleted
(e.g. ancestor White listed in the Black surname group). The final file had 498
entries and 39 surnames. The largest surname group was Stewart with 65 entries,
and the smallest were McLaren, McEwen, Fitzpatrick, Fennessy and Donoghue, with
2 entries each.
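The marker clean-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker list is a small subset used as a stand-in for the core 25 markers, the dictionary layout of a sample is hypothetical (the Ysearch export format is not described here), and the function names are invented for the example.

```python
# Illustrative subset only; the actual analysis used the core 25 FTDNA markers.
CORE_MARKERS = ("DYS393", "DYS390", "DYS389i", "DYS389ii")


def normalise_haplotype(raw, core_markers=CORE_MARKERS):
    """Trim a sample to the core markers and convert DYS389ii.

    `raw` is a hypothetical dict of marker name -> repeat count.
    Markers absent from the sample are recorded as None (missing value).
    """
    hap = {marker: raw.get(marker) for marker in core_markers}
    if "DYS389ii" in hap:
        first, second = hap.get("DYS389i"), hap["DYS389ii"]
        # DYS389ii includes the DYS389i repeats, so subtract them
        # to obtain an independent marker.
        if first is not None and second is not None:
            hap["DYS389ii"] = second - first
        else:
            hap["DYS389ii"] = None
    return hap


def is_usable(hap, max_missing=1):
    """Keep a sample only if at most one core genotype is missing."""
    return sum(value is None for value in hap.values()) <= max_missing


sample = {"DYS393": 13, "DYS390": 24, "DYS389i": 13, "DYS389ii": 29, "DYS464a": 15}
cleaned = normalise_haplotype(sample)
print(cleaned, is_usable(cleaned))
```

Extra markers (DYS464a in the example) are dropped, and a sample missing more than one core genotype would be rejected by `is_usable`.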
Surnames were classified by likely origin, based on the origins commonly
recorded in historical sources. Note that most surnames became fixed and
inherited in a patrilineal fashion in the period 1000-1350 AD. Putative
surname origin was also investigated independently by examining geographical surname
distribution in 1881. The assumption is that people with the same surname
typically cluster around the origin of the surname. The distribution for each
surname used (both graphed and tabulated) can be viewed here.
A number of methods of summarizing the data were investigated and
finally the program Populations (http://www.pge.cnrs-gif.fr/bioinfo/populations/index.php?lang=en) was used. The graphical data produced
was then visualized using TreeView32 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
Initially simple options were chosen. The graph presented below is an unrooted
tree. The algorithm chosen was Cavalli-Sforza and Edwards' (1967) chord distance
method using neighbor joining. In practice considerable investigation is needed
to choose the best tree method and bootstrapping is also required to test the
tree stability (I suspect the small family groups are very unstable and are
included for interest only).
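For readers unfamiliar with the chord distance, the sketch below computes one commonly cited multi-locus form of it between two surname groups: for each marker, cos(theta) is the sum over alleles of sqrt(x * y), where x and y are the allele frequencies in the two groups, and the distance is (2 / (pi * r)) times the sum over the r markers of sqrt(2 * (1 - cos(theta))). This is an assumption about the formula, not a description of what the Populations program does internally, and the function names and data layout are hypothetical.

```python
import math
from collections import Counter


def allele_frequencies(values):
    """Relative frequency of each allele at one marker, ignoring None (missing)."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    return {allele: n / len(observed) for allele, n in counts.items()}


def chord_distance(group_a, group_b):
    """Chord distance between two groups of haplotypes.

    Each group is a list of equal-length tuples of repeat counts, one tuple
    per individual. Assumes every marker is observed at least once per group.
    """
    n_loci = len(group_a[0])
    total = 0.0
    for j in range(n_loci):
        freq_a = allele_frequencies([h[j] for h in group_a])
        freq_b = allele_frequencies([h[j] for h in group_b])
        shared = freq_a.keys() & freq_b.keys()
        cos_theta = sum(math.sqrt(freq_a[a] * freq_b[a]) for a in shared)
        # Guard against tiny negative values caused by rounding.
        total += math.sqrt(2.0 * max(0.0, 1.0 - cos_theta))
    return 2.0 * total / (math.pi * n_loci)


# Toy two-marker haplotypes, invented for illustration only.
group_one = [(13, 24), (13, 25)]
group_two = [(14, 23), (14, 23)]
print(chord_distance(group_one, group_one))
print(chord_distance(group_one, group_two))
```

A matrix of such pairwise distances between surname groups is the kind of input a neighbor-joining tree is built from. Groups with only two entries give very coarse frequency estimates, which is consistent with the caution above about small family groups.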
Results
The resulting graph is shown on the next page. Notice the relatively
clean separation of groups based on historical origin of the name. The Irish
DalRiada Celtic group appears distinct from a second group, which largely
originates from the Western lowlands of Scotland and is probably of Britonic
type. The balance of the surnames in the upper quadrant appears to have more
distant and diverse origins, although historically most originate from the
Eastern Highlands or Lowland Scotland.
Summary
This simple analysis
provided sufficient information to suggest that a more detailed analysis could
be undertaken, based around clustering haplotypes and then using surname as a
marker of historic location. This is explored in subsequent analyses.