Tuesday, 20 January 2015

Criteria for allocating members to specific Genetic Families

Genetic Families are simply groups of people whose genetic signatures are very similar to each other, suggesting a probable common ancestor within the last several hundred years. Other indicators (such as a similar surname and other genealogical data) help corroborate this allocation. It may be possible (using traditional genealogical research and perhaps further DNA testing) to identify who this common ancestor is.

The Farrell DNA project currently has 88 members who have taken a Y-DNA test, of whom 31 have recently been allocated to new Genetic Families. The names of the new Genetic Families start with the broad Haplogroup (Hg) name (or Haplogroup subgroup name) followed by a specific number e.g. R1a - Genetic Family 1 (shortened to R1a-GF1).

Who currently remains Ungrouped?

The DNA project currently has 57 members who remain ungrouped. Of these, 16 members have only tested out to 12 markers. All of these members have been left in the “Ungrouped” category as testing to only 12 markers does not provide enough information to reliably allocate these members to a specific genetic family. In addition, 5 members have only tested to 25 markers and most of these remain unallocated to a specific genetic family. Those ungrouped members who have only tested to 12 or 25 markers will need to upgrade to 37 markers before allocation can be reliably attempted.

Singletons (those with no close matches) have been placed in the Ungrouped category. This applies to the one member belonging to Haplogroup (Hg) E, the 2 members in Hg G, 3 members in Hg I, and 52 members in Hg R. For ease of reference, the Ungrouped category is further subdivided into “Ungrouped” (containing anyone who belongs to Hg R) and “Ungrouped (non-R)” which contains anyone in any of the other haplogroups.

The number of people in this large Ungrouped category will gradually fall as more people join the project and Ungrouped members can be paired up with a close match. This is why it is so important for those who have only tested out to 12 or 25 markers to upgrade their test to 37 markers (the Y-DNA-37 test). To upgrade, just follow the instructions on the Welcome page here … How to Upgrade

Non-Farrell members (i.e. those who do not have a Farrell surname or variant) are usually left in the Ungrouped category unless there is a very close match with a Farrell member, in which case it is quite possible that there has been an NPE (non-paternity event) somewhere along the direct male line since about 1300 (when surnames began to be commonly used). Alternatively, this could be a chance finding (e.g. an example of convergence), or the two people may be related via a common ancestor who lived before the common usage of surnames (i.e. prior to 1300).

Criteria for Allocation

Allocation of a particular member to a specific genetic family is based on the presence of some or all of the following criteria. These can be considered markers or indicators of a possible close connection, and the more criteria that are present, the more likely it is that there is a real relationship between the members of that family within a genealogical timeframe (i.e. since the common usage of surnames, or about the last 700 years or so). These criteria consist of both traditional genealogical indicators and genetic indicators:
  1. The member has the surname Farrell or one of its putative variants
  2. The Genetic Distance (GD) between two members indicates a close or very close relationship e.g. 0-2 at 37 markers (the member's haplotype which is closest to the group modal haplotype is used as the main comparator)
  3. The TiP24 score [1] is >80% when compared against the group modal haplotype (useful for more distant matches e.g. GD = 3 or 4 at 37 markers) [2]
  4. The presence of "rare" marker values among group members [3]
  5. The results of SNP testing (if performed) are consistent among the members of the particular group (i.e. there is no evidence that some are on separate branches of the Y-Haplotree)
  6. The same surname variant is present / predominant in the particular group - see R1b-GF1 (Farley) and R1b-GF2a (Ferrell)
  7. The same MDKA is present in the particular group (e.g. see R1b-GF3)
  8. The same MDKA location is present in the particular group
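The two genetic criteria (2 and 3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the project: Genetic Distance is computed here with a simplified step-wise model (the sum of absolute differences at each marker), which ignores the special handling that multi-copy markers and null values need; the TiP24 score cannot be calculated here and is simply passed in as a number taken from the TiP Report; and combining criteria 2 and 3 into a single pass/fail rule is my own simplification. All function names and the three-marker toy haplotypes are hypothetical.

```python
from statistics import mode

# Hypothetical representation: a haplotype is {marker name: repeat count}.

def genetic_distance(a, b):
    """Simplified step-wise GD: sum of absolute differences over shared markers."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())

def modal_haplotype(haplotypes):
    """Most common value at each marker that every group member has tested."""
    shared = set.intersection(*(set(h) for h in haplotypes))
    return {m: mode([h[m] for h in haplotypes]) for m in shared}

def passes_genetic_screen(gd_at_37, tip24=None):
    """Assumed rule: GD 0-2 passes; GD 3-4 passes only if TiP24 is above 80%."""
    if gd_at_37 <= 2:
        return True
    if gd_at_37 <= 4 and tip24 is not None:
        return tip24 > 0.80
    return False

# Toy data with three markers only (a real comparison would use 37).
group = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
]
candidate = {"DYS393": 13, "DYS390": 25, "DYS19": 15}

modal = modal_haplotype(group)
gd = genetic_distance(candidate, modal)
print(gd, passes_genetic_screen(gd))
```

The genealogical criteria (surname, MDKA, MDKA location) are deliberately left out of the sketch, since they are weighed by judgement rather than by a fixed threshold.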

You will see from the above criteria how important it is to include information about your MDKA (Most Distant Known Ancestor, a.k.a. Earliest Known Ancestor), including birth & death locations. This information can be used as corroborative evidence to support the allocation of members to a specific genetic family. As described in the previous blog post, the format for entering this information should be as follows:
John Farrell b1862 Longford d1926 New York
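For anyone who wants to check their own entry against this format, here is a minimal Python sketch that splits such a line into its parts. The regular expression and the field names are my own assumptions based on the single example above (a name, then "b" plus a four-digit year and birth place, then "d" plus a four-digit year and death place); entries with missing or approximate dates would not match.

```python
import re

# Assumed pattern, derived only from the example format shown above.
MDKA_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+"
    r"b(?P<birth_year>\d{4})\s+(?P<birth_place>.+?)\s+"
    r"d(?P<death_year>\d{4})\s+(?P<death_place>.+)$"
)

def parse_mdka(entry):
    """Return the parts of an MDKA entry as a dict, or None if it does not fit."""
    match = MDKA_PATTERN.match(entry.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["birth_year"] = int(parts["birth_year"])
    parts["death_year"] = int(parts["death_year"])
    return parts

print(parse_mdka("John Farrell b1862 Longford d1926 New York"))
```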

If you have specific questions about these criteria or any other aspect of the project, please email me or post them below in the Comments section, or post them on the Farrell Clan Facebook page. I will answer all questions either individually or devote a blog post to each one (so that everyone can benefit from the answer).

Next we’ll take a closer look at each of the new Genetic Families, one by one.

Maurice Gleeson
20 Jan 2015

[1] The TiP24 score is the value obtained from the TiP Report at 24 generations with the following settings: 1) comparison set to the 37-marker level; 2) default settings (i.e. they do not share a common ancestor more recently than 1 generation ago; display every 4 generations). In this situation, the TiP Report is not being used to estimate the time to most recent common ancestor (TMRCA) but rather as a more accurate estimate of relative closeness than GD (Genetic Distance) alone. This is because GD does not take into account the variable mutation rates of markers whereas the TiP Report does. This technique was developed by James Irvine and is used in his Clan Irwin Surname DNA Study (https://www.familytreedna.com/public/irwin).

[2] The TiP24 score was set deliberately high for this initial phase of allocation in order to ensure that members in each group are highly likely to be closely related to each other and to minimize any risk (that there might be) of convergence. The TiP score might be relaxed at a later stage or in certain circumstances to allow more “outlying” members to enter each GF, or these might be allocated to a “b” version of the family, as has been done for R1b-GF2.

[3] "Rare" marker values can be considered to be values that are shared by only a very small percentage of people in that particular Haplogroup (or Haplogroup subdivision), and can be used to "define" a particular genetic family. The presence of these so-called "rare" values in a specific individual almost automatically indicates to which genetic family that individual belongs, frequently without any need to look at his other marker values. Kelly Wheaton is able to predict membership of her Wheaton Group B on the basis of the first 5 markers, as 3 of them are rare. This is a fine example of a very unusual situation where just testing to 12 markers is sufficient for allocation of a member to a particular genetic family - if your surname is Wheaton and you have the 3 rare marker values then you are 99.99% likely to belong to Wheaton Group B. The frequency of marker values in the general population can be accessed via the Sorenson Molecular Genealogy Foundation database and the YHRD database, whereas the marker value frequencies in some of the most common haplogroups (E3a, E3b, G, I, J2, R1a, and R1b) can be accessed via Leo Little’s Y-STR Allele Frequency tables. The threshold for qualifying as rare may be quite arbitrary - Roberta Estes uses a cut-off of <25% for rare, and <6% for very rare, whereas Robert Casey uses weighted values to estimate the rarity of individual marker values and entire haplotypes.
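As a rough illustration of the simple cut-off approach, the Python sketch below labels a marker value using the <25% and <6% thresholds quoted above. The function names, the "common" label for everything else, and the toy sample are my own assumptions; in practice the frequencies would come from one of the databases or tables mentioned, and this does not attempt the weighted approach.

```python
from collections import Counter

def value_frequency(observed_values, value):
    """Fraction of a sample (e.g. one haplogroup) that carries a given marker value."""
    if not observed_values:
        raise ValueError("need at least one observed value")
    return Counter(observed_values)[value] / len(observed_values)

def classify_rarity(frequency, rare_below=0.25, very_rare_below=0.06):
    """Label a frequency using the quoted cut-offs (<6% very rare, <25% rare)."""
    if frequency < very_rare_below:
        return "very rare"
    if frequency < rare_below:
        return "rare"
    return "common"  # label assumed here for anything at or above 25%

# Toy sample: values at one marker for 20 hypothetical men in a haplogroup.
sample = [13] * 16 + [14] * 3 + [12]
for value in (13, 14, 12):
    print(value, classify_rarity(value_frequency(sample, value)))
```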
