Tuesday, 2 February 2016

The Origins of R1b-GF3

This is the first of a number of blog posts looking specifically at each of the "genetic families" identified so far in the project, and what the DNA of the members tells us about where each genetic family came from.

When we talk of our "Ancestral Origins" we have to keep in mind that this means different things at different times. For example, for those of us living in the US, our family may have lived here for several generations, but before that one line of the family may have originated from Ireland (for example) in the early 1800's, and we may have lived there for many centuries before that. And then when we get back to 500 AD, that particular ancestral line may have been part of a band of Vikings from Denmark, and going back even further, we may have emerged from Russia, 25,000 years ago, and Africa 50,000 years ago. So the concept of "origins" is relative to the time period in question. For most people doing genealogical research, the concept of "origins" stretches back about 500 years but not much further than that.

In previous blog posts, we have explored the possible origins of the Farrell surname based on what has been written in surname dictionaries (see here for the Farrell surname, and here for its potential variants). We know that Farrell is found in Ireland primarily, but there are many potential variants in both Ireland and the UK. Whether or not all these names are genetically related is the reason why we started the Farrell DNA Project in the first place. 

Our first group ... R1b-GF3
There are currently 8 Genetic Families in the project - two families belong to Haplogroup I1, one family belongs to Haplogroup R1a, and five families belong to Haplogroup R1b. Many project members who currently find themselves in the Ungrouped category are likely to be paired off with new members to form new genetic families as time goes on. In this way, the number of genetic families identified is likely to expand.

In this blog post, we are just looking at the family called "R1b-Genetic Family 3" (R1b-GF3 for short). We will look at the other genetic families during the course of the next several months. Below is a screenshot of the three members in R1b-GF3, and their Y-DNA results (out to 37 markers).

The 3 members appear to have the same MDKA (Most Distant Known Ancestor)
The first 37 markers show only two mutations altogether (1 pink, 1 purple)

Indicators of Potential Relatedness
I have discussed in a previous blog post the rationale for grouping the three members of this group together. To recap, the first and foremost reason is that the Genetic Distance (GD) between any two members is very small. In fact, the genetic signature of the three members is virtually identical. There are only 1 or 2 mutational differences between the three members.
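For readers who like to see the idea spelled out, here is a minimal sketch of how a Genetic Distance can be counted between two sets of marker values. It uses a simple step-wise count (add up how far apart each marker is), which is an assumption on my part; the testing company's own calculation may treat some markers differently. The marker values below are made up for illustration and are not the members' real results (only the value of 26 for DYS449 comes from this post).

```python
# Hypothetical marker values for illustration only. These are NOT the
# members' real results; only DYS449 = 26 is taken from the post.
member_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS449": 26}
member_b = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10, "DYS449": 26}
member_c = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11, "DYS449": 26}


def genetic_distance(x, y):
    """Sum of absolute differences over the markers both people have tested."""
    shared_markers = x.keys() & y.keys()
    return sum(abs(x[m] - y[m]) for m in shared_markers)


# Pairwise distances between the three (hypothetical) members.
pairs = {
    "A-B": genetic_distance(member_a, member_b),
    "A-C": genetic_distance(member_a, member_c),
    "B-C": genetic_distance(member_b, member_c),
}
print(pairs)
```

In this toy example two of the members each carry one mutation, so each is 1 step from the third member and 2 steps from each other.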

After grouping them based on their GD, it became apparent that all three members seemed to have the same MDKA (Most Distant Known Ancestor) - one Gabriel Ferrel, who lived in Botetourt County, Virginia (about 1741-1803). This was very encouraging because they were initially grouped together on the basis of the similarity of their DNA profiles (their genetic signatures) and it only became apparent afterwards that they had a similar MDKA. Thus the genealogical evidence lends support to the original genetic evidence, further bolstering the contention that they are all closely related to each other. 

Another "post-grouping discovery" was that they all share a rare marker value of 26 for marker DYS449. This too supports the suggestion of a "close relationship" (or at least one within a genealogical timeframe, or in other words, within the last 1000 years). You can read more about these Indicators of Potential Relatedness for R1b-GF3 in my previous blog post here.

Origins based on Traditional Genealogy
Member 176224 of R1b-GF3 was kind enough to write a short summary of what is known about their MDKA, Gabriel Ferrel:
Gabriel Ferrel is our first known American ancestor, born c. 1741-43 (this is a guess). He may or may not be our first immigrant ancestor. In theory, his family came from Longford, Ireland. Gabriel was a soldier in the French and Indian War enlisting in Lunenburg County, Virginia. He and wife Anne established a small plantation on Goose Creek in Botetourt County, Virginia, where they raised wheat, flax, and tobacco, and were holders of slaves (noted in 1804 will). Gabriel and Anna Haynes Ferrel raised four children that we know of: Stephen, Mildred, Abner, and Elizabeth. Gabriel was in Bedford County, Virginia prior to moving to Botetourt.

The Virginia plantation was sold in the early 1900’s by descendants of Gabriel Ferrel to purchase Medical Hall Mansion, in Maryland. Built in the 1700’s, Medical Hall was the home of Dr. John Archer, the first medical student to graduate in Philadelphia, and in the American colonies. William Haynes family website: http://www.haynesfamily.com/ . Gabriel's will: http://www.gabrielferrel.citymax.com/wills.html
This information tells us that there is family lore of origins in Longford.

Origins based on the Most Distant Known Ancestors (MDKA)
As mentioned above, all three members have the same MDKA (Most Distant Known Ancestor) - Gabriel Ferrel, from Botetourt County, Virginia (about 1741-1803). So in this particular instance the MDKA for each individual is the same as the MRCA for the entire group (MRCA being the Most Recent Common Ancestor). This is fortuitous and gives us a place of origin for the members of this genetic family - Botetourt County, Virginia in the mid-1700's. 

Usually we are not lucky enough to be able to identify a common point of origin for group members. In fact, none of the other genetic families in the project have a clearly defined ancestral homeland. So this particular group is lucky in this respect, and anyone who matches this group may also have roots going back to Virginia in the mid-1700's.  

We can also use the DNA data to give us an idea of when (but not where) the MRCA lived. The small Genetic Distance  (GD) between members suggests a common ancestor some time in the past several hundred years. Estimates of when he lived (using the GD based on the first 37 markers) are in the region of 3-6 generations ago, with a range (90% Confidence Interval, using approximate 5% and 95% limits) of 1 to 14 generations ago. Assuming 30 years between generations, this translates into 90-180 years prior to their average date of birth, but with a wide range of 30-420 years ago. And if we assume the three members were born about 1950 (on average), then the common ancestor's year of birth was about 1815 (range 1530-1920). Upgrading to 67 or 111 markers might give a better estimate of when the common ancestor was born (but within a still fairly broad range). However, because we already know the MRCA (Gabriel Ferrel) and we know his year of birth (1741 or 1743), we can say that the estimated year of birth (1815) is out by about 74 years ... but still within the estimated range (1530-1920).
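The arithmetic in the paragraph above can be checked with a short sketch. The 30 years per generation and the average birth year of 1950 are the assumptions stated in the text; the variable and function names are my own.

```python
YEARS_PER_GENERATION = 30   # assumption stated in the post
AVERAGE_BIRTH_YEAR = 1950   # assumed average birth year of the three members
KNOWN_MRCA_BIRTH_YEAR = 1741  # earlier of the two possible birth years for Gabriel Ferrel


def birth_year(generations_ago):
    """Convert a number of generations into an estimated year of birth."""
    return AVERAGE_BIRTH_YEAR - generations_ago * YEARS_PER_GENERATION


# Most likely window: 3 to 6 generations ago.
likely_window = (birth_year(6), birth_year(3))
# Single "best guess": the middle of that window.
best_guess = sum(likely_window) // 2
# Wide window (approximate 90% range): 1 to 14 generations ago.
wide_window = (birth_year(14), birth_year(1))
# How far the best guess is from the known year of birth.
error_years = best_guess - KNOWN_MRCA_BIRTH_YEAR

print(likely_window, best_guess, wide_window, error_years)
```

Using 1743 instead of 1741 as the known year of birth would make the estimate 72 years out rather than 74.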

The information so far tells us that we have origins for the group in mid-1700's Virginia. But can we find out anything about where they came from before that?

The answer is yes. There are several clues that we can extract from the data. And the first place to look for clues is in their matches.

Origins based on the Matches' Surnames & MDKA Birth Locations
The 3 members of R1b-GF3 have 13, 15 & 29 matches respectively at 37 markers. We can get a clue to the origins of this group based on the surnames of the people that they match and the birth location of their matches' MDKAs. 

Here are the surnames they match and the origins of each match's MDKA ... where recorded. Most people have not recorded a location for their MDKA, which is very unfortunate, as this limits the usefulness of the comparison. Sometimes the MDKA is not known, but other times it is known but simply not recorded. If more people recorded the location of their MDKA's birth (or death) this would make this type of analysis much more informative. (Note that the number of separate individuals matched is only 1 in each case, apart from Lynch and McConnell, where there are 3 and 2 matches respectively):

Gillen: Leitrim, Ireland
Henderson: Baltimore, US
Knight: Virginia, US
Lynch (x3): Westmeath, Ireland
McConnell (x2)
McHugh: Sligo, Ireland
O'Kelley: Meath, Ireland
Vaughn: Virginia, US

Many of their matches' surnames are distinctly Irish (10 names in 13 individuals, marked in green).  Others are distinctly English (10 names in 10 individuals, in blue), or Scottish (2 names in 2 individuals, in orange). What we are hoping for is a preponderance of a particular "nationality" among the surnames' origins, but we do not see this in this particular example. So based on the above analysis, the origins of R1b-GF3 could be equally Irish or English (with Scottish as an unlikely outsider).

We could repeat the exercise for matches at the 67 marker level and the 111 marker level. Matches at these higher levels of resolution might be more "exact" and perhaps we should rely on them more. Only member 176224 has tested above 37 markers. He has tested to 111 markers. He has 43 matches at 67 markers and 5 at 111 markers (Bailey, Kearns, Lynch, McConnell, & Meade - 3 Irish, 2 English).  So even at the higher level of testing, the likely origins of R1b-GF3 remain elusive.
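To make the comparison a little more concrete, the tallies quoted above can be turned into percentages. This is only a sketch using the counts given in the text (the dictionary and function names are my own), and with numbers this small the percentages should not be over-interpreted.

```python
# Counts taken from the text above.
names_at_37 = {"Irish": 10, "English": 10, "Scottish": 2}        # distinct surnames
individuals_at_37 = {"Irish": 13, "English": 10, "Scottish": 2}  # separate people matched
individuals_at_111 = {"Irish": 3, "English": 2}                  # member 176224 only


def shares(counts):
    """Return each category as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {origin: round(100 * n / total) for origin, n in counts.items()}


print(shares(names_at_37))
print(shares(individuals_at_37))
print(shares(individuals_at_111))
```

Counting by surname gives an even split between Irish and English, while counting by individual tips slightly towards Irish, which is why the picture remains unclear.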

Origins based on Surname Distribution Maps
We can identify the origins of different surnames (and their relative frequency) at the excellent PublicProfiler website. This site also gives you surname distribution maps (keep on clicking on the maps to get to country and regional levels). Other sources of surname distribution maps include the Irish Times Ancestors website and the Forbears website.  Also, Howard Mathieson runs the excellent Facebook group Surname Distribution Maps which is a great place to get information and ask for help.

We could do a Surname Mapping exercise and generate a surname distribution map for each of the 26 surnames in the list (based on data from the 1850's for Ireland, and 1881 or 1901 for the UK). This would tell us where each of the surnames was most prevalent in the mid- to late-1800's and this area of highest prevalence might correspond with the origins of each surname. Then by looking at the degree of overlap among the various maps, we might be able to narrow down the possible ancestral homeland for our R1b-GF3 group to a much more localised area, an area where there is the greatest overlap among the 26 surnames in our sample.

There would be no guarantee that this actually is the "place of origin" of the surname or the "ancestral homeland" for the members of R1b-GF3, but it might be. The trouble with our ancestors is that they tended to move around, especially after the Industrial Revolution of the mid-1800's. So the highest point of concentration of a particular surname in the 1800's does not always correspond with its point of origin (which was 500-800 years previously). But, it might do. In fact, based on the laws of probability, it might be the "most likely" location for the "ancestral homeland". At the very least, it is a good place to start looking for further clues. There are other confounders and biases likely to be present (including Convergence and NPEs, such as secret adoptions or infidelities) which are likely to have skewed the data in some way, so any interpretation must be taken with a grain of salt.
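The overlap idea described above could be sketched roughly as follows: for each surname, note the regions where it is most concentrated, then count how many surnames share each region. The surname and region labels here are placeholders invented purely for illustration; they are not real distribution data.

```python
from collections import Counter

# Placeholder data: both the surname labels and the region labels are
# invented for illustration and do not come from any real map.
top_regions = {
    "surname_1": {"region_A", "region_B"},
    "surname_2": {"region_B", "region_C"},
    "surname_3": {"region_A", "region_B"},
    "surname_4": {"region_D"},
}

# Count how many surnames list each region among their areas of highest prevalence.
overlap = Counter(
    region for regions in top_regions.values() for region in regions
)

# Regions ranked from most shared to least shared.
ranked = overlap.most_common()
print(ranked)
```

The region at the top of the ranking is the candidate area of greatest overlap, subject to all the caveats already mentioned.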

Surname Distribution Maps - where's the overlap?

So here are the distribution maps for 4 of the 26 surnames in the list. Do you see a pattern? Where is the highest degree of overlap? Is it in north-west Ireland? What about Lancashire (or is this simply an area of high Irish immigration to England)? So are we looking at an origin for R1b-GF3 that is Native Irish, Ulster Scots, or Northern English? At this stage it is still too difficult to tell from the data we have reviewed thus far. And repeating this exercise with all 26 surnames may not make it any more clear. Let's come back to this later (if need be).

Another source of clues is your Ancestral Origins and Haplogroup Origins pages which you can access from your FTDNA Homepage (see below). These tell you which countries your matches have put down as their "Country of Origin". These can reveal a preponderance of one particular "nationality" among your matches. For R1b-GF3, there seems to be a slight preponderance of Irish origins but nothing extremely convincing.

Origins based on the Terminal SNP

Haplogroups & the Tree of Humankind
The next clue is to look at the terminal SNP for the group as this determines to which haplogroup and subclade group members are assigned. You will find your own terminal SNP in the fourth column on the DNA Results page for the project (under the heading Haplogroup). FTDNA assigns everyone to a specific Haplogroup (usually based on your STR markers, but occasionally with some confirmatory SNP marker testing). A Haplogroup is simply a group of people with a broadly similar genetic signature.  Most people in Western Europe belong to  R1b, most people in China are O, and most people in Africa are A, B or E1b1a.

One of the great advantages of DNA is that we can use it as a marker to map the evolution of different species of animal, including our own species. In effect, we can use it to build a family tree of all humankind - the human evolutionary tree. In recent years, we have started to characterise the finer, more downstream branches of this tree (the subclades) and as we do so, we are moving forward in evolutionary time to the extent that we are now beginning to identify sub-branches of the tree that arose within a genealogical timeframe (i.e. within the last 1000 years). Eventually we should be able to identify markers for (some) individual genetic lineages, such as R1b-GF3, and determine who their nearest genetic neighbours are. This will also allow us to confirm whether or not the genetic data is consistent with the information in the Ancient Genealogies (of Ireland and Scotland, for example).

Below is a map showing the evolution of the major Y-DNA Haplogroups since the appearance of Anatomically Modern Humans in Africa about 250,000 years ago. You can see that R1b emerged about 25,000 years ago, somewhere near modern-day western Russia. 


Haplogroup R1b is the old terminology (but it is still used). R1b is now usually written as R-M269 as this is the new terminology: the Haplogroup letter - R in this case - followed by the "terminal SNP" i.e. the SNP marker that defines the most downstream branch identified so far in that particular individual. So the fact that most members in the Farrell DNA Project belong to Haplogroup R1b suggests that their ancestors came from Western Europe. What a big surprise. We kind of knew that already. But this is merely a starting point. What we are really interested in is which sub-branch of R1b individuals belong to. And to answer that question we can examine the terminal SNPs of the members' matches.

The Terminal SNP of Matches
With the advent of more extensive SNP marker testing, it is often possible to "guess" your own SNP before undertaking SNP testing.  Simply look at the terminal SNPs of your matches and that should give you a clue. Below are the first 15 matches for one of the members of R1b-GF3, sorted by Y-DNA Haplogroup in order to show the terminal SNPs of the matches. The most frequent terminal SNP is R-M269, but this is a very upstream SNP. The second most common SNP is FGC11134 and this is very much more downstream. This information suggests that it is highly likely that FGC11134 is the terminal SNP for the group R1b-GF3. Thanks to other people testing we can predict the terminal SNP for R1b-GF3. And in fact this turned out to be true because member 176224 did the Big Y test and the results came back positive for FGC11134.

Matches sorted by Y-DNA Haplogroup to show the terminal SNPs

As an aside, there are several ways to undergo SNP marker testing:
  1. you can test for individual markers (either with FTDNA for $39 or YSEQ for $17.50)
  2. you can test for a panel of SNP markers, such as ...
    • the R1b-M343 Backbone SNP Pack (139 SNP markers for $99)
    • or a more specific Deep Clade Panel (the Haplogroup Project Admins will advise on this; the price is usually about $120)
  3. you can test your entire Y-chromosome using an NGS (Next Generation Sequencing) test, such as FTDNA's Big Y test (which tests over 50,000 SNP markers for $575)
In general, the Haplogroup Project Administrator will advise on the best test to do.

The Importance of Haplogroup Projects
Another way of determining your SNP (especially if you have no matches) is to join the appropriate Haplogroup project. Everyone is encouraged to join any Haplogroup projects that are relevant to them. It's free, it's easy to do, and you are helping the advancement of science by doing so. You may also directly benefit from what these Haplogroup projects find, and the Administrators can provide you with useful information on what additional DNA testing would benefit you personally (and such targeted testing helps you avoid spending money on unnecessary tests). In essence, the Project Administrators will look at your Y-DNA data (your STR marker values) and compare yours to everyone else within the group. They will then assign you to a subgroup within the larger group. This subgroup (probably) represents a more downstream sub-branch of the human evolutionary tree. Some members of this subgroup may already have undergone SNP marker testing and it is likely that you would also test positive for the same SNP markers that they are positive for.
As a starting point, anyone who is assigned by FTDNA to Haplogroup R1b (a.k.a. R-M343 or R-M269) should join the R1b and Subclades project. The Project Administrators will in turn add you to additional projects that they think may be of benefit to you (or you may join such projects yourself). Such projects usually represent downstream branches of Haplogroup R1b, which are beautifully illustrated in the diagram below from Mike Walsh, one of the Administrators of the R1b and Subclades project.

Unfortunately none of the members of R1b-GF3 have joined the R1b and Subclades project yet (hint, hint), but two members have joined the Ireland Y-DNA project. This project has over 6600 members.  The administrators have grouped member 319357 into the subgroup R1b-L21 (a sub-branch of the larger group) and the other member (176224) into subgroup R1b-FGC11134 (a much more downstream sub-branch of R1b, represented by the orange group to the left of the midline and half-way down in the diagram above). In fact, this latter member has been assigned the terminal SNP FGC11134 as a result of taking the Big Y test (more on that below). Before the advent of SNP testing, Haplogroup projects were very useful (and still are) for predicting what terminal SNP you are likely to test positive for.

Participant 176224 is also a member of the FGC11134 project but it seems that not all his relevant matches have joined this project and therefore comparisons are limited. 176224 is grouped with a Kearns (kit 237483) and a Cairns (N116392) - one an Irish name, the other Scottish.

Placing the Terminal SNP on the Tree of Humankind
The old terminology for the FGC11134 sub-branch (or sub-clade) is the relatively unwieldy name R1b1a1a2a1a2c1c. The new terminology is simply R-FGC11134. The progression of the main SNP markers from the start of the R1b branch (and hence the major branching points along the tree) is as follows:
R-M343 > L389 > L752 > M269 > L150 > L23 > L51 > L11 > P311 > P312 > L21 > DF13 > FGC11134
You can view the relative position of all these SNP markers on various websites including the FTDNA Haplotree (click on your "Haplotree & SNPs" button on your FTDNA Homepage), the ISOGG Y-DNA Haplogroup Tree, the YFULL Haplotree, and several others.  As these are largely experimental trees, with the branching pattern adjusted as new data comes in, there will be some discrepancies between the trees (but these will eventually be ironed out).
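As a small illustration of what "upstream" and "downstream" mean, the progression of SNP markers listed above can be treated as an ordered list. This sketch only covers that single line of descent (it is not a full tree), and the function names are my own.

```python
# The single line of descent listed in the post, from most upstream to most downstream.
PATH = ["M343", "L389", "L752", "M269", "L150", "L23", "L51", "L11",
        "P311", "P312", "L21", "DF13", "FGC11134"]


def is_upstream(a, b):
    """True if SNP a sits above SNP b on this line of descent."""
    return PATH.index(a) < PATH.index(b)


def steps_between(a, b):
    """Number of listed branching points from SNP a down to SNP b."""
    return PATH.index(b) - PATH.index(a)


# L21 (the subgroup of member 319357) is upstream of FGC11134 (member 176224),
# so the two subgroup assignments are consistent with each other.
print(is_upstream("L21", "FGC11134"))
print(steps_between("L21", "FGC11134"))
print(steps_between("M269", "FGC11134"))
```

This is why a result of R-M269 or R-L21 does not contradict a result of R-FGC11134: it simply means less downstream testing has been done.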

The YFULL Experimental Tree gives estimated timelines for the emergence of each SNP marker, and for FGC11134, they reckon it was formed about 4300 years ago. Here are the sub-branches that lie below FGC11134 but for now members of our R1b-GF3 group have not tested positive for any of these more downstream SNP markers.

Alex Williamson has done some incredible analysis of thousands of Y-DNA results from Haplogroup R1b and has generated a fantastic website called The Big Tree. Anyone joining the R1b and Subclades project can get added to the tree if they have sufficiently downstream SNP markers tested (usually via the Big Y).  Member 176224 of R1b-GF3 has been added to the tree and appears here. I also include a screenshot of the relevant section of the tree below.

Interestingly, Alex has postulated several additional terminal SNPs (i.e. downstream of FGC11134), specifically ZZ44 and several SNPs that do not yet have a name, so their location and bases are used instead, namely 19567651-T-C and 7716880-A-G. Also, Alex's SNP analysis suggests that member 176224 is closest to two people called Fitzpatrick, both of whom have Irish origins. I believe that both of these people appear as matches to member 176224 at the 67 marker level (with a GD of 7/67) but it is not possible to confirm this without knowing the matches' kit numbers.

Nevertheless, the evidence seems to point more toward an Irish origin.

Conclusions so far

  • The dominant surname variant in R1b-GF3 is Ferrel or Ferrell. We would assume from this surname variant that the origins of R1b-GF3 could be either Irish, English, or Scottish.
  • Family lore suggests that the origins are Irish.
  • The 3 members share the same MDKA - Gabriel Ferrel, 1741-1803, Virginia, USA.
  • The group's matches have surnames that are predominantly Irish (10) or English (10). The MDKA data from their matches is deficient and thus unhelpful.
  • A Surname Mapping exercise may be helpful, but a cursory review does not suggest a clearly-defined area of surname overlap.
  • All members should join the R1b and Subclades project and any additional downstream subclade projects of relevance, in particular the FGC11134 project.
  • FTDNA defines the terminal SNP for the group as FGC11134 which (according to the YFULL Tree) was formed 4300 years before the present.
  • Alex Williamson defines the terminal SNP as 7716880-A-G. This terminal SNP is shared with two people called Fitzpatrick.
  • The balance of evidence suggests that it is slightly more likely that the origins for this particular genetic family (R1b-GF3) are Irish. But where in Ireland remains to be discovered. It may be that a surname mapping exercise would be worthwhile as this may give further clues. Alternatively, as more people join the project, the origins of R1b-GF3 should become clearer.
