Monday 6 June 2016

The Origins of R1b-GF2 (Part 2) - expanding the group

While doing the Genetic Distance (GD) analysis on the three new members for group R1b-GF2, it became apparent that some of the project members who are currently in the Ungrouped section appear as distant matches to these new members. This sparked a re-examination of Ungrouped members to see if any of them could be moved out of the Ungrouped section and into R1b-GF2, particularly in view of the fact that we have identified the terminal SNP FGC20561 among several of the members of the group. For reference, the Predicted SNP Progression for this genetic family is as follows:
R … > M269 > L150 > L23 > L51 > L151 > P311 > P312 > L21 > DF13 > ZZ10 > Z253 > S847 > S844 > S856 > S845 > S846 > Z17685 > FGC20561

First I repeated the GD analysis for each of the 16 current members of R1b-GF2 who have 37 marker results, and identified the Ungrouped members who turned up interspersed among their fellow R1b-GF2 matches.

Then I looked at each of these Ungrouped members in turn, assessing the terminal SNPs of their own matches to see if there was any hint of their terminal SNP being FGC20561 or somewhere near it.

Here is what I found.

Genetic Distance Analysis of existing R1b-GF2 members

No Ungrouped members appeared in the GD analyses performed at the 111 and 67 marker levels. But at the 37 marker level, GD analysis suggested that the following Ungrouped members might belong in R1b-GF2. 

The list below includes the kit number of each individual currently in R1b-GF2 (a & b) followed by the kit numbers of those Ungrouped members who are close to or among each individual's R1b-GF2 matches (in order of closeness). Each Ungrouped member's kit number appears in bold the first time it occurs. Please note that for added security only the last 4 digits of each kit number are shown.

R1b-GF2 members ... Ungrouped members that appear among / near their other R1b-GF2 matches
  • 9744 ... 7344 9133 0126 1164 8478 8451 
  • 2693 ... 9133 7344 0126 1160  
  • 7250 ... 7344 9133 0126 8478 8451 1491 3273  
  • 8902 ... 7344 9133 0126 1055 8478  
  • 4990 ... 0126 7344 9133 2356 3459  
  • 6145 ... 7344 9133 0126 1160 8451 2356 3459  
  • 7657 ... 7344 9133 8478 0126  
  • 5650 ... 9133 7344 0126 8478  
  • 6772 ... 7344 0126 9133  
  • 3181 ... 9133 7344 0126 8478  
  • 1867 ... 7344 9133 0126 3273 1491 8451 8478 9094 0754 0861  
  • 5271 ... 7344 9133 0126 3273 8478 8451 1491  
  • 7960 ... 7344 9133 0126 3273 8478 8451 1491  
  • 3189 ... 7344 9133 0126  
  • 1784 ... 7344 0126 8478 9133 8451 (see diagram below as an example of this exercise)  
  • 7099 ... 7344 9133 0126  

This GD analysis identified 15 potential candidates (in bold) for membership of R1b-GF2. 

Example: GD analysis (at 37 markers) for member PSF-1784 ... this reveals that 5 Ungrouped members (orange dots) turn up among his R1b-GF2 matches

How often do we see them?

Some of the potential candidates appeared in the R1b-GF2 match lists on multiple occasions, some only once. Those who appeared many times are perhaps more likely to be real candidates for R1b-GF2 membership. Here is a list of the 15 candidates and the frequency with which they appeared in the R1b-GF2 match lists. (Note: again, only the last 4 digits of their kit numbers are shown).

7344   x16
9133   x16
0126   x16
1164   x1
8478   x10
8451   x7
1160   x2
1491   x4
3273   x4
1055   x1
2356   x2
3459   x2
9094   x1
0754   x1
0861   x1
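
The tally above can be reproduced directly from the match lists. Here is a minimal sketch in Python; the variable names are illustrative only and are not part of any FTDNA tool. The data is simply the list of 16 R1b-GF2 members and their Ungrouped matches transcribed from earlier in this post.

```python
from collections import Counter

# Ungrouped kits (last 4 digits) listed against each R1b-GF2 member,
# transcribed from the match list earlier in this post.
match_lists = {
    "9744": "7344 9133 0126 1164 8478 8451",
    "2693": "9133 7344 0126 1160",
    "7250": "7344 9133 0126 8478 8451 1491 3273",
    "8902": "7344 9133 0126 1055 8478",
    "4990": "0126 7344 9133 2356 3459",
    "6145": "7344 9133 0126 1160 8451 2356 3459",
    "7657": "7344 9133 8478 0126",
    "5650": "9133 7344 0126 8478",
    "6772": "7344 0126 9133",
    "3181": "9133 7344 0126 8478",
    "1867": "7344 9133 0126 3273 1491 8451 8478 9094 0754 0861",
    "5271": "7344 9133 0126 3273 8478 8451 1491",
    "7960": "7344 9133 0126 3273 8478 8451 1491",
    "3189": "7344 9133 0126",
    "1784": "7344 0126 8478 9133 8451",
    "7099": "7344 9133 0126",
}

# Count how many of the 16 match lists each Ungrouped kit appears in.
counts = Counter(kit for kits in match_lists.values() for kit in kits.split())

# Print the candidates from most to least frequent.
for kit, n in counts.most_common():
    print(f"{kit}   x{n}")
```

Keeping the kit numbers as strings preserves leading zeros (e.g. 0126), which would be lost if they were stored as integers.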

The next step was to see if there was any evidence that the terminal SNP of each of these 15 candidates was at or near FGC20561. Those candidates that "passed the test" are indicated in green; those that did not are in red.

Assessment of each potential candidate for evidence of FGC20561

7344, Farrell
Matches 16 members of R1b-GF2, including the other 3 Ungrouped members (red arrows) who are now considered probable/possible members of R1b-GF2.
GD to R1b-GF2 members = 4-10/37

Example: GD analysis (at 37 markers) of Ungrouped member 7344 - his closest matches are consistently in GF2
SNP analysis
At 37 markers, the following SNPs appear among his matches: Z253 x2, S846 x1
At 25 markers, Z253 x13, S856 x1, S846 x2, FGC20561 x4, FGC20562 x1, FGC20563 x1 ... plus some less frequent convergent subclade SNPs (Z29706, Z225, U152, S7015, FGC5494, DF23, DF13, DF103, BY3495)

So there is a strong indication that his SNP progression is similar to that of other members of R1b-GF2. And so he is now included in R1b-GF2.

9133, Farrell
Matches 16 members of R1b-GF2.
GD to R1b-GF2 members = 4-11/37
SNP analysis
At 37 markers, Z253 x1
At 25 markers, Z253 x14, S845 x2, S846 x2, S856 x2, Z17685, FGC20561 x4, FGC20562 x2, FGC20563 x7, plus some other single SNPs (A306, A600, DF103, L1308, L193, U152, Z145, Z198)

So again, the strong signal of a terminal SNP somewhere at or near FGC20561 warrants inclusion of this individual in R1b-GF2.

0126, Farrell
Matches 16 members of R1b-GF2.
GD to R1b-GF2 members = 6-11/37
SNP analysis
At 37 markers, no matches
At 25 markers, S11601 x1 ... this is a very distant SNP so currently there is insufficient evidence to move this individual into R1b-GF2. However, single SNP testing of FGC20561 may confirm that he belongs in R1b-GF2, which I suspect he does, given that he appears as a match (albeit distantly) to ALL 16 members of R1b-GF2. For those who appear as matches to R1b-GF2 members less frequently, a more general SNP Pack (such as the M343 Backbone Panel) might be more advisable.

1164, Farrell
Matches 1 member of R1b-GF2.
GD to R1b-GF2 members = 12-19/37
SNP analysis
At 67 markers, no matches.
At 37 markers, no matches.
At 25 markers, no matches. So there is insufficient evidence to move him into R1b-GF2. Testing with the M343 Backbone Panel would provide some further direction.

8478, Farley
Matches 10 members of R1b-GF2.
GD to R1b-GF2 members = 9-15/37
SNP analysis
At 37 markers, no matches.
At 25 markers, 3 matches (all R1b-GF2).  But no SNP suggestions.
At 12 markers (I only went down to this level because of the low number of matches at the previous level), there are only 21 matches but 15 of them are in the Farrell project. And there are suggestive SNPs ... Z253 x1, FGC20561 x3 ... but also FGC11134 x1 and FGC34047.

The balance of the evidence suggests that this member belongs in R1b-GF2 but a single SNP test for FGC20561 would be helpful to confirm it.

8451, Farley
Matches 7 members of R1b-GF2.
GD to R1b-GF2 members = 10-16/37
SNP analysis
At 111, 67, & 37 markers, no matches.
At 25 markers, 70 matches, but no evidence of SNPs close to FGC20561, not even Z253.

Therefore this individual should not be moved into R1b-GF2 and an M343 Backbone Panel might be the best next step.

1160, Farrell
Matches 2 members of R1b-GF2.
GD to R1b-GF2 members = 11-17/37
SNP analysis
At 37 markers, 1 match.
At 25 markers, 24 matches, but no evidence of SNPs close to FGC20561. Therefore this individual also remains Ungrouped and an M343 Backbone Panel might be the best next step.

1491, Carroll
Matches 4 members of R1b-GF2.
GD to R1b-GF2 members = 10-18/37
SNP analysis - this individual has done the Big Y test & his terminal SNP is Z16277. This is on a completely different part of the haplotree (i.e. below L21 > DF13 > DF21 > Z3000) and therefore this individual is not closely related to those in R1b-GF2.

3273, Largent
Matches 4 members of R1b-GF2.
GD to R1b-GF2 members = 10-17/37
SNP analysis
At 37 markers, 12 matches. No evidence of FGC20561.
At 25 markers, 17 matches. Again no evidence. So he remains ungrouped.

1055, Farr
Matches 1 member of R1b-GF2.
GD to R1b-GF2 members = 11-15/37
SNP analysis
At 37 markers, 17 matches with no evidence of FGC20561.
At 25 markers, 349 matches but again no evidence. He remains ungrouped.

2356, Farrell
Matches 2 members of R1b-GF2.
GD to R1b-GF2 members = 10-15/37
SNP analysis: this member is M222 positive - a completely different branch. He remains Ungrouped.

3459, Farrar
Matches 2 members of R1b-GF2.
GD to R1b-GF2 members = 12-19/37
SNP analysis: this member has a terminal SNP of S7370 which is on a completely different branch (DF21). He remains Ungrouped.

9094, O Fearghall
Matches 1 member of R1b-GF2.
GD to R1b-GF2 members = 13-19/37
SNP analysis: his terminal SNP is likely to lie below DF21 (a different branch). He remains ungrouped.

0754, Farrall 
Matches 1 member of R1b-GF2.
GD to R1b-GF2 members = 11-18/37
SNP analysis: his terminal SNP is L48 (a different branch). He remains ungrouped.

0861, Farrell
Matches 1 member of R1b-GF2.
GD to R1b-GF2 members = 12-16/37
SNP analysis: no evidence of being close to FGC20561. He remains ungrouped.


Discussion

Reviewing the evidence

So at the end of that analysis, we can be reasonably confident in adding 3 new members to R1b-GF2 (namely 7344, 9133, 8478) and I have provisionally added a fourth member also (0126), even though it would be better to have confirmation that he does indeed test positive for FGC20561. In fact, it would be useful if all four of these new members did the single SNP test to confirm they are positive for FGC20561 as predicted.

All the evidence points to their inclusion but how much of it can be considered to be independent pieces of evidence and how much of it may be influenced by circular arguments? There are three main pieces of evidence:
  1. Genetic Distance - does the individual fall within the recommended thresholds for declaring a match? The closest GDs of the four new members are 4/37, 4/37, 6/37 and 9/37. Traditionally the first two would be declared matches, the third would be considered a possibility, and the last would be dismissed as likely to be unrelated within a genealogical timeframe.
  2. Number of times a specific Ungrouped member appears "close to" R1b-GF2 members in the GD analyses: it is striking how often the new members appear among the closest matches of the 16 members of R1b-GF2 - three of the new members appear all 16 times, the fourth appears 10 times.
  3. Analysis of terminal SNPs among an individual's matches: although both of the first two measures rely on GD as their basis, this third measure is much more independent of the other two, so the risk of a "circular argument" is reduced. There is reasonably strong evidence of the terminal SNP being at or near FGC20561 in 3 of the 4 new members. The fourth one would have to do the single SNP test to confirm this.
There are two other pieces of evidence we could examine to see if there is internal consistency in the evidence to support the inclusion of these 4 new members in R1b-GF2, and these are: the TiP24 Score and rare marker values. These concepts are discussed in a previous post here - Criteria for allocating members to specific Genetic Families.

There are no rare marker values among the R1b-GF2 group (see previous post) and so this particular "Marker of Potential Relatedness" is of no use in this instance. Here are the TiP24 Scores (at 37 markers) for each of the 4 new members (compared to their closest R1b-GF2 match):
  • 7344 (Farrell) ... 95.30%
  • 9133 (Farrell) ... 94.65%
  • 8478 (Farley) ... 70.57%
  • 0126 (Farrell) ... 80.65%

Thus, two members show a strong signal (>90%) and two show a weaker signal. And depending on where you draw the threshold for inclusion, these members are either out or in. If it is 60%, then they are all in. If it is 80% then only 3 of them are in.

However, it is always best if you compare other potential members to the member whose haplotype is closest to the modal haplotype for the group. Here are the TiP24 Scores (at 37 markers) for each of the 4 new members compared to member BRF-7960 whose genetic signature is closest to the modal haplotype for R1b-GF2a (GD=1):
  • 7344 (Farrell) ... 79.73%
  • 9133 (Farrell) ... 76.13%
  • 8478 (Farley) ... 55.90%
  • 0126 (Farrell) ... 52.64%

These TiP24 Scores are much less convincing than the previous ones and only 2 of the 4 would qualify for inclusion, and only at the lower threshold value of 60%.
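
To make the effect of the threshold explicit, here is a small sketch in Python that applies the 60% and 80% cut-offs to both sets of TiP24 Scores. The function name is illustrative, and treating a score exactly on the threshold as "in" is an assumption on my part (none of the scores here sits exactly on a threshold, so it does not change the result).

```python
# TiP24 Scores (%) at 37 markers, copied from the two lists in this post.
tip24_vs_closest = {"7344": 95.30, "9133": 94.65, "8478": 70.57, "0126": 80.65}
tip24_vs_modal = {"7344": 79.73, "9133": 76.13, "8478": 55.90, "0126": 52.64}


def members_in(scores, threshold):
    """Return the kits whose score meets or exceeds the threshold (in %).

    Assumption: a score equal to the threshold counts as "in".
    """
    return [kit for kit, score in scores.items() if score >= threshold]


comparisons = [
    ("closest R1b-GF2 match", tip24_vs_closest),
    ("BRF-7960", tip24_vs_modal),
]
for label, scores in comparisons:
    for threshold in (60, 80):
        kits = members_in(scores, threshold)
        print(f"vs {label}, threshold {threshold}%: {len(kits)} in {kits}")
```

Run against the scores above, this gives 4 and 3 members "in" for the closest-match comparison (at 60% and 80% respectively), but only 2 and 0 when comparing against BRF-7960.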

A further major caveat is that the TiP24 Score relies heavily on Genetic Distance (albeit accounting for variable mutation rates), so it could be argued that there is a degree of circularity in using this measure and that it is not truly independent. Nevertheless, the TiP24 Score adds some further credibility to including at least 2 of these 4 members.

Ancestral Origins

Let's turn now to a different question. Does the addition of these new members add anything to our exploration of the origins of this group? Here is the MDKA information for each of these new members:
  • 7344 (Farrell) ... William Farrell Tipperary Ire 1844
  • 9133 (Farrell) ... Bernard Farrell 1828 - ? arrived in US in 1850
  • 8478 (Farley) ... none given
  • 0126 (Farrell) ... James Farrell b.3/20/1842 Restigouche Co NB Canada
Only one of the new additions points to Ireland as the country of origin, and specifically County Tipperary. Thus, within R1b-GF2, there are two members who cite Ireland and specifically Tipperary as the ancestral origin of their MDKA. All other members (who have included MDKA information) cite US origins for their MDKA. This again highlights the need for all members to include accurate MDKA information in order to facilitate this analysis, especially birth location (even if it is an educated guess).

So (currently) Tipperary is the leading candidate for the ancestral homeland of R1b-GF2 but this is only based on information from two members and so has to be taken with a large grain of salt ... for now. In the next article we will be looking at other evidence.

Amalgamation

Because each of the analyses of the terminal SNPs of the matches of the individual members points to the terminal SNP for this group being FGC20561, there is no longer any major justification for the group being split into the "core group" R1b-GF2a and the more "peripheral group" R1b-GF2b. This split was a useful exercise in the early days of the project when we wanted to separate people we were very confident in grouping together from people we were less confident about.

As a result of the new analyses, the two sub-groups will be amalgamated into just one larger group, R1b-GF2. This is further justified by the fact that the modal haplotypes for each of the sub-groups are virtually identical. The only differences between the Modal Haplotypes (up to 67 markers) for R1b-GF2a and R1b-GF2b are as follows:
  • DYS456: 17 in 2a, 18 in 2b
  • CDYa: 38 in 2a, 39 in 2b
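
As a rough illustration of how close the two modal haplotypes are, the sketch below (in Python) counts the differences between them. It uses a simple stepwise count, i.e. the sum of the absolute differences in repeat values, which is an assumption for illustration and not necessarily how FTDNA calculates Genetic Distance. Only the two differing markers are included, since all other markers up to 67 are identical.

```python
# Only the two markers on which the modal haplotypes differ (up to 67
# markers) are listed; all other markers are identical per the post.
modal_2a = {"DYS456": 17, "CDYa": 38}
modal_2b = {"DYS456": 18, "CDYa": 39}


def stepwise_distance(a, b):
    """Sum of absolute repeat-count differences over shared markers.

    Assumption: a simple stepwise model, used here for illustration only;
    it is not FTDNA's own Genetic Distance calculation.
    """
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


# Markers whose values differ between the two modal haplotypes.
differing = sorted(m for m in modal_2a if modal_2a[m] != modal_2b[m])
print(differing, stepwise_distance(modal_2a, modal_2b))
```

Under this simple model the two modal haplotypes are only 2 steps apart across 67 markers, which is consistent with treating them as a single group.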

What's Next?

Further evidence to justify this amalgamation of the two groups could be obtained by everybody within the group testing for the terminal SNP FGC20561. This could be done via a single SNP test ($39) or via the Z253 SNP Pack ($119) or via the Big Y test ($575).

It would be good if several people from the group did the Big Y test as this would not only confirm if FGC20561 was the terminal SNP for the group, but could also possibly identify further downstream SNPs which are specific to the Farrell surname. To this end I am asking for volunteers to do the Big Y test when the next sale at FTDNA is announced. The current price is $575 but in the last sale this came down to $460 so I would propose to wait until then to buy this test. If anyone is interested please leave a comment below. It may be that several members would like to group together to raise the money for a test.



The old look of R1b-GF2a & 2b


The new look of R1b-GF2

Maurice Gleeson
May 2016







2 comments:

  1. So I would have to retest my Dad, correct?

    1. Hi Leigh, yes, he should do the Z253 SNP Pack at $119 (see my email of 1 Apr 2016). FTDNA can do this on the sample they already have for your Dad. This is the preferred option for everyone in R1b-GF2 as it is in between the cheapest option (a single SNP test for $39 - which may turn out NOT to be positive in some people) and the Big Y test (which is $575, reduced to about $460 in some of the FTDNA Sales).
