Monday 2 February 2015

Reasons for grouping into Genetic Families - R1b

- the rationale for allocating members to the Genetic Families of R1b

The majority of project members belong to Haplogroup R1b: 25 members in genetic families plus 52 members in the Ungrouped category, giving 77 of the 88 members who have been Y-DNA-tested (88%). This is as expected, as Farrell and most of its variants are Western European (and more specifically Irish or British) in origin, where up to 90% of the population is likely to be R1b. However, this haplogroup is also the most challenging in terms of trying to separate out its members into distinct genetic families.

The first step in identifying genetic families within this group was to identify those members who are exact or very close matches to each other. These very close matches formed the seed or nidus around which the rest of the genetic family was built.

For those who are interested (and for future reference), the process I employed for doing this was as follows:
    • I started by copying the existing Y-DNA results to a spreadsheet 
    • Then I sorted the results by each individual marker in turn up to 37 markers (omitting the multi-copy markers) 
    • Then I deleted those who had only tested 12 markers 
    • Then copied the resultant tabulated data to the section below the original tabulated data (so that there were two tables with exactly the same data, one below the other) 
    • In the lower table, starting with the first marker value in the SECOND row of data, I made this equal to the value of the same marker (i.e. same column) in the FIRST row in the table ABOVE minus the value of the same marker in the SECOND row in the table ABOVE (e.g. =G102-G103 was entered in cell G166). 
    • I copied this formula into every cell in the entire table below the first row. 
    • In short, I was subtracting each marker value from the one above it. This revealed the genetic distance (GD) for each individual marker compared to that marker's value belonging to the person immediately above it in the table. A string of zeroes indicated an exact match between a given member and the person above him. These identical matches formed the basis (or nidus) for a new genetic family. 
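
For readers who prefer code to spreadsheets, the subtraction step above can be sketched in Python. This is only an illustration of the idea, not the spreadsheet itself: the kit labels and marker values are invented, and the function names (adjacent_differences, exact_matches) are hypothetical.

```python
# Invented example data: (kit label, marker values), already sorted as in the spreadsheet.
rows = [
    ("kit_A", [13, 24, 14, 11]),
    ("kit_B", [13, 24, 14, 11]),
    ("kit_C", [13, 25, 14, 10]),
]


def adjacent_differences(rows):
    """For every row after the first, subtract its values from the row above.

    Mirrors the spreadsheet formula (value above minus value below),
    one marker (column) at a time.
    """
    result = []
    for (_, above), (kit, current) in zip(rows, rows[1:]):
        diffs = [a - c for a, c in zip(above, current)]
        result.append((kit, diffs))
    return result


def exact_matches(rows):
    """Return the kits whose difference row is a string of zeroes."""
    return [
        kit
        for kit, diffs in adjacent_differences(rows)
        if all(d == 0 for d in diffs)
    ]


diffs = adjacent_differences(rows)
matches = exact_matches(rows)
```

Here kit_B is flagged as an exact match to the kit above it, while kit_C shows a non-zero difference at two markers. Note that this only compares each person with the one immediately above, so the sort order matters just as it does in the spreadsheet.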

Four new genetic families were initially identified using this technique.

The modal haplotype [1] for each of these groups was determined and the member whose haplotype was closest to the modal was used as a basis for the identification of other “close” matches, which were subsequently added to the group.
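
As a rough sketch of how a modal haplotype could be worked out by hand, the Python below takes the most frequent value at each marker and then finds the member closest to that modal. The data are invented, the function names are hypothetical, and the distance used is a simple sum of absolute differences, which is an assumption and may not match how FTDNA counts genetic distance on every marker.

```python
from collections import Counter

# Invented example data: kit label -> marker values (same markers, same order).
haplotypes = {
    "kit_A": [13, 24, 14, 11],
    "kit_B": [13, 24, 14, 10],
    "kit_C": [13, 25, 14, 11],
}


def modal_haplotype(haplotypes):
    """Most frequently occurring value at each marker across the group.

    If two values are equally common, the one seen first is used
    (an arbitrary tie-break chosen for this sketch).
    """
    columns = zip(*haplotypes.values())
    return [Counter(column).most_common(1)[0][0] for column in columns]


def step_distance(a, b):
    """Assumed distance: sum of absolute differences, marker by marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


modal = modal_haplotype(haplotypes)
closest = min(haplotypes, key=lambda kit: step_distance(haplotypes[kit], modal))
```

In this toy example the modal is identical to kit_A, so kit_A would be the member used as the reference when looking for other close matches.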

Initially, members were grouped on the basis of GD alone, with some help in borderline cases from the TiP24 score. Thereafter, other criteria (e.g. same surname variant, same MDKA, same MDKA locations) were assessed to support the allocation of each individual to a particular group. Lastly, the data was assessed for the presence of “rare” marker values and SNP data was examined to make sure that the terminal SNPs in each group were consistent with the allocation of all members to that group.

As a double-check, a Genetic Distance matrix was generated using Dean McGee’s Y-DNA Comparison Utility (FTDNA Mode) and the results were examined to ensure that the same genetic families that were generated by the methodology above were also detected by the McGee utility. The results can be viewed by clicking here (to be inserted later). No unexpected findings were revealed and McGee’s GD Matrix supported the identification of the 8 distinct genetic families previously identified. [2]
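
To give a feel for what a Genetic Distance matrix contains, here is a minimal Python sketch that compares every member with every other member. It is not the McGee utility: the data are invented, the names are hypothetical, and the distance is a plain sum of absolute differences that makes no attempt to reproduce whatever special handling the utility's FTDNA Mode applies to particular markers.

```python
from itertools import combinations

# Invented example data: kit label -> marker values (same markers, same order).
haplotypes = {
    "kit_A": [13, 24, 14, 11],
    "kit_B": [13, 24, 14, 10],
    "kit_C": [13, 25, 14, 11],
}


def step_distance(a, b):
    """Assumed distance: sum of absolute differences, marker by marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


def gd_matrix(haplotypes):
    """Symmetric table of pairwise distances, with zeroes on the diagonal."""
    kits = list(haplotypes)
    matrix = {a: {b: 0 for b in kits} for a in kits}
    for a, b in combinations(kits, 2):
        distance = step_distance(haplotypes[a], haplotypes[b])
        matrix[a][b] = distance
        matrix[b][a] = distance
    return matrix


matrix = gd_matrix(haplotypes)
pairwise = [matrix[a][b] for a, b in combinations(haplotypes, 2)]
smallest_gd, largest_gd = min(pairwise), max(pairwise)
```

The smallest and largest pairwise values are the same kind of figures quoted for each genetic family below (for example "the largest GD between any two group members").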

R1b-Genetic Family 1 (R1b-GF1)
  • The Modal Haplotype at 37 markers (MH37) is represented by 116740 (an exact match to the MH37) 
  • All members of this group differ by a GD of either 1 or 2/37 from the MH37, and the largest GD between any two group members is 3/37. 
  • Member 32988 has only tested to 25 markers and the GD from the MH37 is 1 in this case, with a TiP24 score (at 25 markers) of 97.5%, supporting his inclusion in this group. 
  • Possible “rare” marker values: none obvious 
  • The terminal SNPs for the group members are consistent, with U198 (subclade R1b1a2a1a1c2a) being downstream of Z381 (subclade R1b1a2a1a1c) - see http://www.isogg.org/tree/ISOGG_HapgrpR.html
  • Following the allocation of 6 members to this group on the basis of their genetic data alone, it became clear that all but one shared the surname Farley, further supporting their allocation to this group. The remaining member (surname Ambrose) is highly likely to have an NPE (Non-Paternity Event) somewhere along his direct male line. In other words, his father’s father’s father’s line will eventually go back to an ancestor called Farley. 
  • MDKA information is incomplete and should be updated, but two members have Virginia (US) as a common location. 

R1b-Genetic Family 2a (R1b-GF2a) 
  • The MH37 is represented by both 67960 and 95271 (both exact matches) 
  • All 4 members of this group are exact or very close matches, and the largest GD between any two group members is 1/37. 
  • Member 78131 has only tested out to 25 markers but has a GD of 0/25 and so is included in this group. 
  • Possible “rare” marker values: none obvious 
  • Terminal SNPs are all M269 (and therefore consistent). 
  • Following allocation on the basis of DNA results alone, it became clear that all 4 members shared exactly the same surname (Ferrell), thus supporting their allocation to the same group. 
  • MDKA information is missing for 3 of the 4 members and needs updating. 
R1b-Genetic Family 2b (R1b-GF2b) 
  • R1b-GF2b was originally part of R1b-GF2a and consisted of people who were more distant matches to the MH37 of GF2a (and subsequently matches of those matches). I later split this group into 2a and 2b to separate the outliers from the core members of the group. 
  • Member 262693 was included despite a GD of 5 from the MH37 of GF2a, because he had a Farrell variant (Farley) and a TiP24 score of 76%. [3] This member is the only one in the group to have tested to 111 markers. If either of the two members who exactly match the GF2a MH37 test out to 111 markers this will help clarify whether or not member 262693 should belong in the group. 
  • Of note, 262693 shares many marker values with the previous member (155650) and they may eventually form their own genetic family. Member 155650 is likely to have an NPE in his direct male line, given the different surname (Kelley), but is included in this group because the TiP24 score vs GF2a MH37 is 96% (GD 4/37). Alternatively, the common ancestor could have lived prior to the common usage of surnames. It would be interesting to see if this family had a history of a possible name change along their direct male line.
  • Several additional members were added following a review of their GD from member 262693 and the GF2a MH37: 
    • 166772, Farrell 
      • GD 7/67, and TiP24 of 96% cf 262693
      • GD 6/37, and TiP24 of 83% cf GF2a MH37 (67960) 
    • 123181, Farrell 
      • GD 5/67, and TiP24 of 97% cf 262693 
      • GD 5/37, and TiP24 of 91% cf GF2a MH37 (67960) 
    • 237657, Ferrell 
      • GD 8/67, and TiP24 of 82% cf 262693 
      • GD 5/37, and TiP24 of 78% cf GF2a MH37 (67960) – this participant actually falls just below the threshold for inclusion but has been included for the moment 
    • 204990, Farrell 
      • GD 9/67, and TiP24 of 80% cf 262693 
      • GD 6/37, and TiP24 of 93% cf GF2a MH37 (67960) 
  • It must be borne in mind that the above additional members may be examples of convergence (given the relatively greater distance from 262693 and GF2a MH37) 
  • The group members are very widely dispersed around the MH37 for this group and in fact none of the members are an exact match for the MH, lending support to the possibility that they have been wrongly grouped together. The minimum GD between any two members is 1/37 and the maximum GD is 9/37, further emphasising that they are not a tight-knit group and some or all members may eventually be reallocated to another group. 
  • Possible “rare” marker values: none obvious 
  • The terminal SNPs for the group members are consistent, with P312 (subclade R1b1a2a1a2) being downstream of both L23 (subclade R1b1a2a) and M269 (subclade R1b1a2) - see http://www.isogg.org/tree/ISOGG_HapgrpR.html
  • This subgroup does not have a dominant surname: Farrell (x3), Ferrell (x1), Farley (x1), Kelley (x1) 


R1b-Genetic Family 3 (R1b-GF3)
  • The modal haplotype is represented by member 176224 (an exact match) 
  • The other Ferrel member (91040) differs from the MH37 by a GD of 1 
  • The third member of the group (319357) has the surname Burk and also differs by a GD of 1 from the MH, with a TiP24 score of 99% so it is likely that this person has an NPE in their ancestral line. 
  • Possible “rare” marker values: 
    • DYS449 is 26 (occurs in only 1.7% of the general population, and in 0% of R1b – see 21st table – so this is very rare) 
  • The terminal SNPs for the group members are consistent, with L21 (subclade R1b1a2a1a2c) being downstream of P312 (subclade R1b1a2a1a2) - see http://www.isogg.org/tree/ISOGG_HapgrpR.html
  • In fact, all 3 members appear to list the same MDKA (or at least the same MDKA location), which supports the idea that this third member has an NPE somewhere on his father’s father’s father’s line. 

R1b-Genetic Family 4 (R1b-GF4) 
  • The modal haplotype is represented by member 307389 (an exact match) 
  • The members differ from the MH37 by a GD of 2 or 3/37 and from each other by a maximum GD of 5/37. The TiP24 score for the two most distant members (167989 and 108691) is 92%. 
  • Member 108691 bears the surname Vance and is probably an NPE (GD from MH37 = 2) or is connected to the group via a common ancestor who existed prior to the common usage of surnames. 
  • All 4 members in this group have tested out to 67 markers so more comprehensive comparisons can be made. At this 67-marker level, the members differ from the MH67 by a GD of 1 to 4/67, and from each other by a maximum of 6/67, with the TiP24 score (at 67 markers) for the most distant individuals being 97%. 
  • Possible “rare” marker values: none obvious 
  • The terminal SNPs for the group members are consistent, with L21 (subclade R1b1a2a1a2c) being downstream of both M269 (subclade R1b1a2) and M173 (subclade R1) - see http://www.isogg.org/tree/ISOGG_HapgrpR.html
  • No surname variant predominates in this group (Ferrell, Farris, Farrell, & Vance). This may change as more people upgrade their results to 37 markers and as new members join the project and are allocated to groups. 

Following this initial allocation of project members to new genetic families, singletons [4] (with at least 25 markers tested and with a Farrell surname or variant) were assessed for closeness to any of the existing genetic families above or to any particular individuals. This process revealed one final genetic family (R1b-GF5) as detailed below. 


R1b-Genetic Family 5 (R1b-GF5)
  • There are only 2 members in this group. 
  • The modal haplotype is represented by member 20515 (a GD of 1/37) 
  • The other member (297075) differs from the MH37 by a GD of 3/37, and they differ from each other by a GD of 4/37. 
  • The TiP24 score between these 2 members is 98%, supporting their allocation within the same group. 
  • Possible “rare” marker values: 
    • DYS448 is 18 (occurs in only 8.2% of the general population, and 16% of R1b so this is reasonably rare) 
  • Both members have the same terminal SNP, namely L159 (subclade R1b1a2a1a2c1e1) – see http://www.isogg.org/tree/ISOGG_HapgrpR.html
  • It is quite conceivable that the surnames in this group (Farley & Farrelly) are variants of each other. 
  • Both members give their country of ancestral origin as Ireland but the MDKA information needs updating with specific locations. 

Next time ... in about 3 weeks - I'm going on vacation to Trinidad for Carnival (smiley face) ... we'll be looking at those members who remain Ungrouped and what can be done to get them out of there and into a genetic family!




[1] The modal haplotype is a fabricated haplotype. It consists of the most frequently occurring value for each individual marker among the haplotypes of the members of a given group. It can be seen in the Colorized View of the Y-DNA Results below the minimum and maximum haplotypes (indicated by MIN and MAX respectively). It is likely to be an exact match or very close match to the haplotype of the common ancestor for that particular group.

[2] To use the McGee Utility, I exported the Y-DNA results as an .xml file (click on Export to Spreadsheet), then opened Excel, clicked Open File and chose the .xml file. I deleted columns 2-5 so that only the kit numbers in the first column remained, followed by the values for each of the individual markers, and then broke the multi-copy markers out into separate columns (using the Text to Columns function). I then unchecked the markers after DYS565 (which is the last one that corresponds with FTDNA’s panel), copied the entire table (without the marker headings), pasted it into the box indicated, and clicked Execute. You can watch it on a YouTube video by clicking here.

[3] The threshold for inclusion of a member based on TiP24 scores varies from project to project. Some Administrators use a threshold of 60%, others 80%. I initially used a 60% threshold for same surname members, and a 95% threshold for non-same-surname members (i.e. possible NPEs).
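
The two thresholds I started with can be written as a simple rule. The Python below is only a sketch: the function name is hypothetical, and treating a score exactly on the threshold as a pass is an assumption. In practice the rule was a starting point, and borderline members were sometimes included anyway.

```python
def meets_tip_threshold(tip_score, same_surname,
                        same_surname_threshold=60.0, other_threshold=95.0):
    """Check a TiP24 score (in percent) against the initial thresholds.

    same_surname is True for Farrell or a recognised variant, and False
    for a different surname (i.e. a possible NPE). A score exactly on the
    threshold counts as meeting it, which is an assumption of this sketch.
    """
    threshold = same_surname_threshold if same_surname else other_threshold
    return tip_score >= threshold


variant_at_76 = meets_tip_threshold(76, same_surname=True)
other_at_96 = meets_tip_threshold(96, same_surname=False)
other_at_92 = meets_tip_threshold(92, same_surname=False)
```

With these defaults, a same-surname member at 76% and a different-surname member at 96% both pass, while a different-surname member at 92% does not.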

[4] i.e. project members who had not yet been allocated and who remained in the Ungrouped category



