Phase 1: 25 STR analysis of selected families

The following link describes the phase 1 (25 STR) data set and analysis, in which surname is used as a proxy for geographic location/origin. It has been updated to correct several misassigned surnames. A detailed examination of the geographic distribution of the surnames has also been added and compared with their commonly accepted historical origins.

Cluster analysis of DNA haplotypes by surname group

 

The same data set was then analyzed using the same clustering method and software, but by individual rather than by surname group. This approach avoids the “haplotype diversity” within a surname that arises from multiple independent origins of the same surname and from “non-paternity” events, but it discards the information about origin. Haplogroups were then assigned using Whit Athey's Haplogroup Estimator (https://home.comcast.net/~whitathey/hapest.htm): samples of surnames from the edges and centers of obvious clusters were run through the estimator, and each cluster was assigned to a haplogroup accordingly.
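The page does not name the clustering method or software, so the following is only a sketch of clustering by individual, assuming a simple stepwise distance (the sum of absolute repeat-count differences) and average (UPGMA-style) linkage. The haplotypes, `step_distance` and `average_linkage` are hypothetical illustrations, and marker-specific conventions such as multi-copy markers are ignored.

```python
from itertools import combinations

# Hypothetical toy haplotypes: repeat counts at the same ordered set of markers.
# A real run would use all 25 markers for every individual in the data set.
HAPLOTYPES = {
    "A": (13, 24, 14, 11, 11, 14),
    "B": (13, 24, 14, 11, 11, 15),
    "C": (13, 24, 14, 10, 11, 14),
    "D": (14, 23, 15, 10, 12, 14),
    "E": (14, 23, 15, 10, 12, 15),
}


def step_distance(h1, h2):
    """Sum of absolute repeat-count differences across markers."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def average_linkage(haplotypes):
    """Agglomerative clustering with average (UPGMA-style) linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance), where a
    merged cluster is named by joining its parts with "+".
    """
    clusters = {name: [name] for name in haplotypes}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            # Mean of all individual-to-individual distances between the two clusters.
            d = sum(step_distance(haplotypes[x], haplotypes[y]) for x, y in pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[f"{a}+{b}"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


merges = average_linkage(HAPLOTYPES)
for a, b, d in merges:
    print(f"merge {a} with {b} at distance {d:.2f}")
```

With these toy values, A and B merge first, then D and E, then C joins A+B, leaving two well-separated clusters that would appear as distinct branches on a phylogram.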

 

Perhaps surprisingly, the clusters produced by this method agreed extremely well with Whit Athey's predictions. The R1b haplogroup was then broken into “tentative” subgroups by visual inspection, and summary statistics were calculated. These included modal values for each haplogroup and the nominal and percentage incidence of each haplogroup by putative geographic origin of the surname. Several associated comments have also been made. At this stage, no statistical tests of the robustness of the classification have been undertaken. However, it is clear that some clusters probably should be collapsed and/or the analysis expanded to 37 markers. Detailed statistical analysis has been deferred because the surname classification needs further investigation and the data set needs to include more Irish, Dal Riadic and “British” surnames.
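The summary statistics described above can be sketched as follows, assuming each individual is recorded with a haplogroup label, a putative surname origin and a haplotype. The records below are hypothetical, and `modal_haplotype` and `incidence_table` are illustrative names rather than part of any tool used in this analysis.

```python
from collections import Counter, defaultdict

# Hypothetical records (haplogroup label, putative surname origin, haplotype);
# haplotypes are truncated to four markers for brevity.
RECORDS = [
    ("R1ba", "Irish", (13, 24, 14, 11)),
    ("R1ba", "Irish", (13, 24, 14, 11)),
    ("R1ba", "Briton", (13, 25, 14, 11)),
    ("R1bi", "Briton", (13, 24, 15, 10)),
    ("R1bi", "Briton", (13, 24, 15, 11)),
    ("R1bi", "Dal Riadic", (13, 24, 15, 10)),
]


def modal_haplotype(haplotypes):
    """Most common repeat count at each marker (ties go to the value seen first)."""
    return tuple(Counter(values).most_common(1)[0][0] for values in zip(*haplotypes))


def incidence_table(records):
    """Map origin -> haplogroup -> (count, percentage of individuals of that origin)."""
    counts = defaultdict(Counter)
    for group, origin, _ in records:
        counts[origin][group] += 1
    table = {}
    for origin, groups in counts.items():
        total = sum(groups.values())
        table[origin] = {group: (n, 100.0 * n / total) for group, n in groups.items()}
    return table


haplotypes_by_group = defaultdict(list)
for group, _, haplotype in RECORDS:
    haplotypes_by_group[group].append(haplotype)

modal_values = {group: modal_haplotype(h) for group, h in haplotypes_by_group.items()}
incidence = incidence_table(RECORDS)
print(modal_values)
print(incidence)
```

Percentages here are taken within each origin (row percentages); taking them within each haplogroup instead is an equally valid choice and only changes which total is used as the denominator.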

 

GIF file of cluster phylogram and manually assigned haplogroups

Modal values of Y STR repeats for assigned haplogroups

 

Nominal and percentage incidence of haplogroups by putative origin of surname

 

 

Results to date for R1b

When the cluster diagram is plotted as an unrooted tree, the R1b haplogroup shows a distinctive star topology, reflecting its recent origin and rapid increase in population size. The phylogram supplied above has been “unrolled” from an origin close to the R1b modal haplotype, with branch lengths indicating divergence. This layout was chosen because it displays the surnames clearly and makes the clusters easier to code.

The process cleanly identified an R1b subgroup, labeled R1ba, which has previously been described by David Wilson (http://home.earthlink.net/~wilsondna/DYS392=14%20Summary.htm). In this analysis it differs from the R1b modal values at 6 of the 25 markers, and it appears to be present at high frequency in Irish and Dal Riadic Scots surnames and at a significant, but lower, frequency in Britonic surnames.
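Counting the markers at which a subgroup's modal haplotype differs from the R1b modal is a simple per-marker comparison; a minimal sketch follows. The marker names are real Y-STR loci, but the values and the `differing_markers` helper are hypothetical and only show the bookkeeping behind a figure such as “6 of the 25 markers”.

```python
# Hypothetical modal values at a handful of real marker names, for illustration
# only; these are not the modal values reported in this study.
MARKERS = ["DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b"]
R1B_MODAL = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11, "DYS385b": 14}
SUBGROUP_MODAL = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10, "DYS385a": 11, "DYS385b": 14}


def differing_markers(reference, candidate, markers):
    """Markers at which two modal haplotypes differ, in marker order."""
    return [m for m in markers if reference[m] != candidate[m]]


diffs = differing_markers(R1B_MODAL, SUBGROUP_MODAL, MARKERS)
print(f"differs at {len(diffs)} of {len(MARKERS)} markers: {', '.join(diffs)}")
```

This counts mismatched markers only; it does not weight a two-repeat difference more heavily than a one-repeat difference.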

 

The next most divergent group is R1bi, with 5 differences. This putative subgroup of R1b occurs primarily in Britonic surnames and could be called a Campbell/Strathclyde signature. It has since been dubbed the “Scot R1b variety”, following Ken Nordtvedt, who has analyzed this grouping extensively (see http://archiver.rootsweb.com/th/read/GENEALOGY-DNA/2005-07/1120443313).

The R1bf, R1bg and R1bl subgroups, which differ from the R1b modal values at 2, 4 and 0 markers respectively, show a frequency bias toward Irish and Dal Riadic surnames.

The R1bj subgroup, which differs from the R1b modal at 3 markers and is rather similar to R1bi, appears to occur preferentially in surnames of Pictish, Anglo-Saxon or Norse origin.

Statistics on the origin of R1ba and R1bi subclades

If we take the two currently accepted subclades, R1ba and R1bi, we can use a chi-squared test to ask whether the putative origins of the surnames differ between them. For this test, surnames of putative Norse, Anglo-Saxon and Pictish origin were excluded because their numbers were too small; the resulting counts are shown in Table 1 below. The results strongly suggest that the two subclades differ in their geographic distribution, with R1bi very rare in Ireland.

Table 1. Incidence of individuals in R1b subclades by assigned origin of surname

Subclade   Irish   Dal Riadic   Briton
R1ba          10           11       44
R1bi           0            3       55

Chi-squared = 15.445 (2 degrees of freedom)

Probability = 0.00044
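The reported statistic can be checked directly from Table 1. This sketch computes Pearson's chi-squared test of independence by hand; for 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The function names are illustrative; for tables of other sizes, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts.

```python
import math

# Observed counts from Table 1 (rows: R1ba, R1bi; columns: Irish, Dal Riadic, Briton).
OBSERVED = [
    [10, 11, 44],
    [0, 3, 55],
]


def chi_squared(table):
    """Pearson chi-squared statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


def p_value_2df(stat):
    """Upper-tail probability of a chi-squared variable with 2 df: exp(-x/2)."""
    return math.exp(-stat / 2)


stat, dof, expected = chi_squared(OBSERVED)
assert dof == 2  # the closed-form p-value only holds for 2 degrees of freedom
p = p_value_2df(stat)
print(f"chi-squared = {stat:.3f} ({dof} df), p = {p:.5f}")
```

This prints `chi-squared = 15.445 (2 df), p = 0.00044`, matching the values above. Inspecting `expected` also shows that the R1bi/Irish cell has an expected count of about 4.7, just under the common rule-of-thumb minimum of 5 for the chi-squared approximation.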

 

Summary

The intention is that these results will spur additional analysis of the R1b subgroups by others. The results suggest that R1b subgroups can be defined from Y-STRs given sufficient individuals, close attention to surname and ancestral origin, and perhaps genotyping of 37 STRs. The results from the 25 STR analysis, using a limited surname sample, suggest that surnames are a reasonable proxy for geographic origin and that there may also be significant differences in the geographic frequency of their associated haplotype subgroups. However, these “subclades” of R1b are by no means unique to a “Goidelic/Irish” or “Brythonic/Briton” Celtic origin. Their origins are explored in more depth in later analyses.