The following link describes the phase 1 (25 STR) data set and analysis
where surname is used as a proxy for geographic location/origin. It has been
updated to correct several misassigned surnames. A detailed examination of
the geographic distribution of surnames has also been added and compared with commonly
accepted historical origins.
Cluster analysis of DNA haplotypes
by surname group
The same data set was then analyzed using the same clustering method and
software, but by individual rather than group. This approach circumvents “haplotype diversity” within surnames due to multiple
origins of the same surname and “non-paternity” events, but discards
information about origin. Haplogroup assignments were then made using Whit Athey's haplogroup estimator (https://home.comcast.net/~whitathey/hapest.htm)
on samples of surnames (edges and centers of obvious clusters), and each cluster
was then assigned to a haplogroup.
Perhaps surprisingly, the clusters produced by this method agreed extremely
well with Whit’s predictions. The R1b
haplogroup was then broken into “tentative” subgroups
by visual inspection, and summary statistics were calculated. These included modal
values for each haplogroup and a summary of the nominal and
percentage incidence of haplogroups by putative
surname geographic origin. Several associated comments have also been made. At
this stage statistical tests of the robustness of the classification have not
been undertaken. However, it appears that some clusters should probably be
collapsed and/or the analysis expanded to 37 markers. Detailed statistical
analysis has been deferred because the surname classification needs further
investigation, and the data set needs to include more Irish, Dal Riadic and “British”
surnames.
GIF file of cluster phylogram
and manually assigned haplogroups
Modal values of Y STR repeats
for assigned haplogroups
Nominal and percentage incidence of haplogroups by putative origin of surname
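As an illustration of how modal values like those summarized above can be derived, here is a minimal Python sketch. It is not the software used for this analysis; the function name `modal_haplotype` and the 4-marker sample data are hypothetical and chosen only to show the idea of taking the most common repeat count at each marker.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common repeat count at each marker position."""
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    modal = []
    # zip(*haplotypes) walks the data marker by marker (column by column).
    for column in zip(*haplotypes):
        # most_common(1) returns the single most frequent value; on a tie
        # the value that appears first in the input is kept.
        modal.append(Counter(column).most_common(1)[0][0])
    return modal


# Hypothetical 4-marker haplotypes for one cluster (not real data).
cluster = [
    [13, 24, 14, 11],
    [13, 24, 14, 10],
    [13, 25, 14, 11],
]
print(modal_haplotype(cluster))
```

A real 25 STR data set would simply have 25 values per individual; ties between equally common repeat counts would need a deliberate rule rather than the first-seen default used here.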
Results to date for R1b
If a cluster diagram is plotted as an unrooted
tree, the R1b haplogroup
shows a distinctive star topology reflecting its recent origin and rapid
increase in population size. The phylogram supplied
above has been “unrolled” from the origin, which is close to the modal R1b value,
and branch lengths indicate divergence. This layout was chosen because it
displays the surnames clearly and eases cluster coding.
The process cleanly identified an R1b
subgroup labeled R1ba, which has
been previously described by David Wilson (http://home.earthlink.net/~wilsondna/DYS392=14%20Summary.htm).
In this analysis it differs from the modal values of the R1b haplogroup at 6 out of the 25 markers and seems to be
present at high frequency in Irish and Dal Riadic Scots surnames and at a significant, but lower, frequency in Brythonic surnames.
The next most divergent group is R1bi
with 5 differences. This putative subgroup of R1b is primarily expressed in Brythonic
surnames and could be called a Campbell/Strathclyde
signature. This group is now dubbed “Scot
R1b variety” following Ken Nordtvedt, who has done an
extensive analysis of this grouping; see http://archiver.rootsweb.com/th/read/GENEALOGY-DNA/2005-07/1120443313
The R1bf, R1bg and R1bl subgroups, with 2, 4 and 0 differences from the modal values of R1b respectively, are biased toward Irish and Dal Riadic
surnames.
The R1bj subgroup, with 3
differences from the R1b modal (and
rather similar to R1bi), appears
to be expressed preferentially in surnames of Pict, Anglo-Saxon or Norse
origin.
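The "differences from the modal values" quoted for each subgroup can be read as the number of markers at which a subgroup's modal haplotype departs from the R1b modal. The following Python sketch shows that count under this assumption; the function name `markers_differing` and the 6-marker values are hypothetical and are not the real R1b or subgroup modals.

```python
def markers_differing(haplotype, modal):
    """Count marker positions where the repeat count differs from the modal value."""
    if len(haplotype) != len(modal):
        raise ValueError("haplotype and modal must cover the same markers")
    return sum(1 for h, m in zip(haplotype, modal) if h != m)


# Hypothetical 6-marker values (not the real R1b or subgroup modals).
r1b_modal = [13, 24, 14, 11, 11, 14]
subgroup_modal = [13, 25, 14, 11, 11, 13]
print(markers_differing(subgroup_modal, r1b_modal))
```

This counts differing markers only; it does not weight a two-repeat difference more heavily than a one-repeat difference, which a stepwise genetic distance would.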
Statistics on the origin of
R1ba and R1bi subclades
If we take the two currently accepted subclades
R1ba and R1bi, we can conduct a chi-squared test of whether the putative
origin of the surnames differs between the two groups. To do this, surnames of putative
Norse, Anglo-Saxon and Pict
origin are discarded because of insufficient numbers; the results are shown in Table 1. The results strongly
suggest that these subclades differ in their
geographic distribution, with R1bi
very rare among Irish surnames.
Table 1. Incidence of individuals in R1b subclades based on assigned origin of surname

|      | Irish | Dal Riadic | Briton |
| R1ba |    10 |         11 |     44 |
| R1bi |     0 |          3 |     55 |

Chi-squared = 15.445 (2 degrees
of freedom)
Probability = 0.00044
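
The test on Table 1 can be re-run with a few lines of Python. This is a sketch using only the standard library, not the software originally used; the function name `chi_squared` is hypothetical. It computes Pearson's chi-squared statistic from the observed counts, and relies on the fact that for exactly 2 degrees of freedom the upper-tail probability is exp(-x/2).

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if subclade and surname origin were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Counts from Table 1: columns are Irish, Dal Riadic, Briton.
table1 = [
    [10, 11, 44],  # R1ba
    [0, 3, 55],    # R1bi
]
stat, dof = chi_squared(table1)
# Valid only because dof == 2: the chi-squared survival function is exp(-x/2).
p_value = math.exp(-stat / 2)
print(f"chi-squared = {stat:.3f}, dof = {dof}, p = {p_value:.5f}")
```

Run as written, this should agree with the chi-squared value and probability reported with Table 1 to the precision given there. For tables with other degrees of freedom the exp(-x/2) shortcut does not apply and a statistics library would be needed.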
Summary
The intention is that these results will spur additional analysis of the R1b subgroups by others. The results suggest
that R1b subgroups can be defined
from Y STRs given sufficient individuals, close
attention to surname and ancestor origin, and perhaps genotyping at 37 STRs. The results from the 25 STR analysis, using a limited surname sampling, suggest that surnames
are a reasonable proxy for geographic origin and that there may also be
significant differences in the geographic frequency of their associated haplotype subgroups. However, these “subclades”
of R1b are by no means unique to a “Goidelic/Irish” or
“Brythonic/Briton” Celtic origin. Their origins are
explored in more depth in later analyses.