Chromosomes - a closer look
We have 46 chromosomes, arranged in 23 pairs. Each pair consists of 2 copies, one of which you got from your mother, the other from your father. So for example, you have one paternal chromosome 14 and one maternal chromosome 14. Before you were conceived, your father made a copy of each of his 46 chromosomes but only passed on one copy from each pair to you. Similarly, your mother made copies of all her 46 chromosomes but only passed on to you one copy from each pair. In this way the 23 chromosomes you got from your father combined with the 23 from your mother to bring your chromosome count back up to the usual 46.
The 23rd pair is also known as the sex chromosomes. There are two types of sex chromosome - an X and a Y. At conception, if two X chromosomes combine, a female child is produced (XX). If an X and a Y chromosome combine, a male child is produced (XY). Women (XX) only have an X chromosome to pass on to their offspring, whereas men (XY) can pass on either an X or a Y to their offspring. Therefore the man's contribution decides the sex of the child. Women do not have a Y chromosome and so cannot take a Y-DNA test.
Thus the Y chromosome is only passed on from father to son. This is why it is perfect for tracing the father's father's father's line and is the main type of DNA used for surname studies. Be aware though that it only assesses this single ancestral line: if you go back 10 generations, it represents only 1 of your 1,024 ancestors in that generation (about 0.1% of your ancestors at that particular level).
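The arithmetic behind that "1 in 1024" figure can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes no ancestor appears twice in your tree.

```python
def ancestors_at_generation(generations: int) -> int:
    """Number of ancestors in a given generation back (2 parents, 4 grandparents, ...)."""
    return 2 ** generations


def y_line_share(generations: int) -> float:
    """Fraction of that generation covered by the Y-DNA line (always exactly one ancestor)."""
    return 1 / ancestors_at_generation(generations)


for g in (1, 2, 5, 10):
    # Print the generation, the number of ancestors, and the Y-line share as a percentage
    print(g, ancestors_at_generation(g), f"{y_line_share(g):.2%}")
```

The share halves with every generation you go back, which is why the Y-DNA line is a very narrow (but very deep) slice of your ancestry.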
Each of our 46 chromosomes consists of a long double-stranded helix of DNA. If we untwisted it, it would look like a very long ladder, or a railway track running from New York to Los Angeles. It's huge. If you untwisted all 46 chromosomes from a single cell and laid them end to end, they would stretch for 2-3 metres (6-10 feet). All the untwisted DNA from the human body would stretch to the moon and back thousands of times.
All along the "ladder" are the nucleotide bases, which bind the two strands of the helix to each other. The bases are called A, T, C, and G, after the first letters in their respective names - Adenine, Thymine, Cytosine, & Guanine. A only ever binds with T, C only ever binds with G. You can remember this by thinking that the straight-sided letters (A and T) only bind to each other, and the curved letters (C and G) only bind to each other. Each base pair effectively forms a rung in the ladder.
Because A only ever binds with T, and C only ever binds with G, if we know the sequence of bases on one strand of the helix, we can automatically tell what bases are on the other strand. Therefore, the sequence of bases along the DNA is only ever written as a single line of letters (e.g. ATCCGAATTGG). The sequence is read from what is called the 5' (5 prime) end of the DNA molecule toward the 3' end, like reading from left to right.
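Here is a minimal sketch of that pairing rule in Python, using the example sequence from the text. The names PAIRS, complement and reverse_complement are just illustrative. Because the two strands run in opposite directions, the partner strand read from its own 5' end is the complement written backwards.

```python
# Pairing rule: A binds with T, C binds with G
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base partner of a strand, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand as it would be read from its own 5' end."""
    return complement(strand)[::-1]


sequence = "ATCCGAATTGG"
print(complement(sequence))          # TAGGCTTAACC
print(reverse_complement(sequence))  # CCAATTCGGAT
```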
In each pair of chromosomes, the two copies (maternal and paternal) are virtually identical to each other in terms of size, shape, etc. The exception is the sex chromosome pair, X and Y ... the X chromosome is roughly 3 times bigger than the Y chromosome.
Although the two chromosomes in a pair are virtually identical, there are subtle differences in the nucleotide bases that run along their length. These variations in the bases are called mutations. The specific locations where they occur are referred to as DNA "markers", and because each marker sits at a known position along the chromosome it can be given a particular name (e.g. DYS390 or Z255). People who share the same mutation may have inherited it from a shared Common Ancestor, and this is why DNA can be so helpful for genealogy.
A note on terminology: Y-DNA refers to the Y chromosome. Autosomal DNA refers to all the chromosomes EXCEPT the last pair (Pair 23, the sex chromosomes, X and Y - all the other chromosomes are called autosomes, hence autosomal DNA). Mitochondrial DNA refers to the DNA found in mitochondria (the "batteries" that power each cell). For a more detailed introduction to the three types of DNA test and how they are applied in genealogy, watch this YouTube video here.
The different types of DNA marker
There are two types of DNA marker - STR markers and SNP markers.
STR stands for Short Tandem Repeat and the key word here is "repeat". An STR marker is a short sequence of bases repeated several times in a row (e.g. CATCATCATCAT). In this example, the sequence is CAT and the repeat value of the marker is 4. When the DNA is being copied before being passed on to any offspring, there are occasional mistakes made in the copying process. So for example, a copying mistake in the CAT sequence above might result in 3 repeats instead of 4, and so the value of that marker may shift from 4 in the parent to 3 in the offspring. This may be the first mistake to be made in this particular marker for many generations, and so not only will the male child differ from his father, grandfather, and great grandfather, but also from his brothers and his male cousins on the direct paternal line, who will all have a value of 4 for this particular marker.
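To make the idea of a "repeat value" concrete, here is a small Python sketch that counts the longest run of back-to-back copies of a motif. It is only an illustration (testing labs work very differently), and the function name and the flanking bases GG and TT are invented for the example.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Longest run of back-to-back copies of `motif` found anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward one motif at a time while the motif repeats
        while sequence[pos:pos + len(motif)] == motif:
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Hypothetical stretch of DNA: flanking bases around the CAT repeats
parent = "GG" + "CAT" * 4 + "TT"
child = "GG" + "CAT" * 3 + "TT"   # one repeat lost in copying

print(count_tandem_repeats(parent, "CAT"))  # 4
print(count_tandem_repeats(child, "CAT"))   # 3
```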
The second type of DNA marker is the SNP marker, which stands for Single Nucleotide Polymorphism. The key idea here is "substitution" - a single base at a specific location changes from what it normally is to a different base (e.g. an A changes to a C or a T or a G). Whereas an STR marker involves several bases in a row, an SNP marker only involves the substitution of a single base.
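A single-base substitution is easy to picture in code. The sketch below compares two sequences of equal length, position by position, and reports where they differ. The function name and both sequences are made up for illustration, and positions are counted from 1 at the left-hand end.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference base, sample base) for every single-base difference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ATCCGAATTGG"
sample = "ATCCGCATTGG"   # the A at position 6 has been replaced by a C

for position, old, new in find_snps(reference, sample):
    print(f"{position} ({old}>{new})")   # 6 (A>C)
```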
Kelly Wheaton has written some excellent blog posts about DNA markers on the Y chromosome. You can read them by clicking here - STR markers & SNP markers.
There are some very important characteristics of STR and SNP markers which are key to understanding how they are applied in surname studies:
- Mutations in STR markers are written as the value of the marker (e.g. 12) whereas mutations in SNP markers are given names (e.g. Z255) or are written as the location on the chromosome followed by the change that occurred in the bases there. For example, 17349992 (G>A) indicates that a G has been replaced by an A at position 17349992.
- The mutation rate of STR markers varies from marker to marker. Some mutate relatively quickly (e.g. 1 mutation every 5 generations) whilst others mutate very slowly (e.g. 1 mutation every 500 generations). Mutations in slow-mutating markers are very good for studying human migration, whereas mutations in fast-mutating markers can be very useful for genealogy research (in the last 500 years or so).
- A big problem with STR markers is that they can mutate back as well as forward. So for example an STR marker may have a value of 4 which changes to a 3 and then back to a 4. The first mutation (4 to 3) may have occurred 1000 years ago, and the second one (3 back to 4) may have occurred 300 years ago. The trouble is that the Back Mutation masks the fact that there was a significant mutation 1000 years ago and this may result in people with the 4 value being assigned to the wrong branch of the human evolutionary tree and hence the wrong family tree!
- Another problem with STR markers is the Parallel Mutation. This happens when two very separate branches of the same family experience the same mutation "in parallel", giving the impression that the two branches are more closely related than they actually are.
- A further problem with STR markers is that it is very difficult to identify a Back Mutation or a Parallel Mutation, and as a result we don't know how often they occur. We suspect that they happen fairly frequently; a marker value may mutate back perhaps as often as it mutates forward. We really don't know. But such "hidden" mutations may seriously confound our interpretation of the data and may result in people being placed on the wrong branches of the human evolutionary tree.
- Convergence is the name given to the situation when Back Mutations and Parallel Mutations on STR markers result in people appearing to be more closely related to each other than they actually are. This is a big problem when comparing people at 12 markers, but less of a problem when comparing at higher numbers of markers (e.g. 37, 67, or 111). However, even at 67 markers significant Convergence has been detected.
- On the other hand, SNP markers mutate much more slowly. And because there are so many of them, Back Mutations and Parallel Mutations are extremely rare (and easily spotted). For this reason, when using DNA markers to place people on the human evolutionary tree, SNP markers trump STR markers, i.e. more weight is given to SNP markers than to STR markers.
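The Back Mutation and Parallel Mutation problems in the list above can be made concrete with a small Python sketch. Each list holds the value of one STR marker at successive points along a line of descent. All the values and names here are invented for illustration.

```python
def observed_change(history):
    """Difference between the oldest and the most recent value (all we can see today)."""
    return abs(history[-1] - history[0])


def actual_mutations(history):
    """Number of steps along the line where the value really changed."""
    return sum(1 for before, after in zip(history, history[1:]) if before != after)


# Back Mutation: the marker goes 4 -> 3 -> 4 along a single line
back_mutation_line = [4, 3, 4]
print(observed_change(back_mutation_line))   # 0  (looks as if nothing happened)
print(actual_mutations(back_mutation_line))  # 2  (but two mutations occurred)

# Parallel Mutation: two separate branches both go from 4 to 3 independently
branch_a = [4, 4, 3]
branch_b = [4, 3, 3]
print(branch_a[-1] == branch_b[-1])                            # True (they match today)
print(actual_mutations(branch_a) + actual_mutations(branch_b)) # 2 separate mutations
```

In both cases the values we see today hide part of the real mutation history, which is exactly why these events are so hard to detect.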
Y-DNA, Population Migration, & the Human Evolutionary Tree
Because the Y chromosome is passed on virtually unchanged from father to son, and because mutations in the DNA markers along the Y chromosome happen relatively infrequently, it is also an extremely useful tool for studying the last great human migration out of the African Motherland (about 50,000 years ago) that ultimately led to the populating of the entire planet. There is an excellent interactive animation of human migration here, including the various ice ages and the catastrophic eruption of the Mount Toba volcano that almost destroyed Mankind.
Population geneticists have been studying the evolution of mutations on the human Y chromosome (and on mitochondrial DNA) for many years and have developed an evolutionary tree based on these mutations (called the Haplotree). They refer to each of the major branches of the tree as Haplogroups and have named them after the letters of the alphabet (e.g. Haplogroup R, or its subgroup Haplogroup R1b). You can think of a Haplogroup as a group of people with a broadly similar genetic signature.
As modern humans moved around Africa and then moved out of Africa and spread to different places around the world, the humans who moved to Europe developed a totally different set of mutations to those humans who moved to India or Australia (for example). Thus certain haplogroups are found more commonly in Europe (e.g. R1b, I2b) than in India (e.g. H, L) or Australia (e.g. C, T).
Furthermore, genetic genealogy is a very young science, and more markers are being discovered all the time (thanks to novel tests like the Big Y test from FTDNA). As a result, scientists are still discovering finer and finer sub-branches of the human evolutionary tree, and we are approaching the point where we will discover the finer branching patterns associated with individual surnames (such as those in the Farrell DNA Project).
The old nomenclature for the various branches of the tree used a long string of letters (e.g. R1b1a2a1a2c1e) but this has been superseded by a system that simply puts the main Haplogroup letter followed by the "terminal SNP" (e.g. R-Z255). You can still see both terminologies in use on the ISOGG tree.
The terminal SNP refers to the SNP marker that currently occurs at the end of a branch. The word "currently" is important because as new SNP markers are discovered the current terminal SNP marker is likely to be replaced with a new one, and we will continue to move further and further down the finer branches of the tree until we identify SNP markers that are specific for your own family branch and even single individuals.
This will eventually allow us to reconstruct family trees based on DNA marker mutations. These are sometimes called phylogenetic trees, sometimes cladograms or phylograms, but my favourite is Mutation History Trees because it sounds similar to Family History Trees. The difference between the two is that Family History Trees are constructed using named individuals, whereas Mutation History Trees use DNA markers. It should be possible to superimpose one upon the other and in this way we can look "beyond the Brick Wall" of individual pedigrees and see where different family branches are likely to connect. This in turn will help focus further documentary research.
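The naming scheme and the idea of a "terminal SNP" described above can be sketched in Python. The tiny tree below is entirely hypothetical: SNP_A to SNP_E are placeholder names, not real markers, and the function names are made up. Each SNP points to the SNP directly above it on the branch.

```python
# Hypothetical branch of a haplotree: each SNP maps to its parent SNP
PARENT = {
    "SNP_A": None,      # top of this hypothetical branch
    "SNP_B": "SNP_A",
    "SNP_C": "SNP_B",
    "SNP_D": "SNP_B",
}


def depth(snp: str) -> int:
    """How many steps this SNP sits below the top of the branch."""
    steps = 0
    while PARENT[snp] is not None:
        snp = PARENT[snp]
        steps += 1
    return steps


def terminal_snp(positive_snps) -> str:
    """The deepest SNP on the current tree for which the person tested positive."""
    known = [snp for snp in positive_snps if snp in PARENT]
    if not known:
        raise ValueError("none of these SNPs are on the tree")
    return max(known, key=depth)


def short_name(haplogroup_letter: str, positive_snps) -> str:
    """Haplogroup letter followed by the terminal SNP, e.g. R-Z255 in the text."""
    return f"{haplogroup_letter}-{terminal_snp(positive_snps)}"


results = ["SNP_A", "SNP_B", "SNP_C", "SNP_E"]
print(short_name("R", results))   # R-SNP_C (SNP_E is not on the tree yet)

# A new SNP is discovered further down the branch, so the terminal SNP moves
PARENT["SNP_E"] = "SNP_C"
print(short_name("R", results))   # R-SNP_E
```

The same test results give a different label once the tree grows, which is why the word "currently" matters so much.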
There are various groups working on the human evolutionary tree and each has produced its own version of the haplotree:
- The YCC Haplotree is produced by the Y-Chromosome Consortium. This is an academic effort and it is frequently out of date, being surpassed by the ISOGG tree which is updated much more frequently and harnesses the continuous output of genetic genealogists working on Haplogroup Projects. The most recent update of the YCC tree is from March 2015 but the tree itself is not user-friendly.
- The ISOGG tree is the result of the efforts of ISOGG (the International Society of Genetic Genealogy), which co-ordinates the analysis and interpretation of the findings from various Haplogroup Projects and as a result has developed a much larger tree than the YCC Tree. It too quickly becomes outdated as the pace of new SNP marker discovery advances and further sub-branches are discovered. Project members in group R1b-GF3 can click here and search (Cmd+F or Ctrl+F) for FGC11134 to see where this particular sub-branch sits on the main Haplogroup R branch.
- Several of the commercial companies have developed their own haplotrees which at times may be more advanced than the ISOGG tree, and at times less advanced:
  - FTDNA tree - this can be accessed from the Haplotree & SNPs page of your personal FTDNA webpage
  - YFULL Experimental Tree - YFULL is a company that offers SNP testing and will interpret the results of SNP testing carried out by other companies. This tree is relatively easy to navigate but again requires use of the Find function (Cmd+F or Ctrl+F).
  - FGC tree - like YFULL, FGC (Full Genomes Corporation) also offers SNP testing and interpretation. The visual presentation of the tree is not easy to navigate.
- Haplogroup Project Administrators work at the coal face of scientific discovery in relation to the finer branches of their own particular haplogroup project. Many such projects have developed a tree for their own small corner of the wider human evolutionary tree and update the draft tree periodically as new member results come in to the project. Usually you have to sign up to the project to access these updates. It is important to appreciate the pivotal role that Haplogroup Project Administrators are playing in the ongoing discovery of the finer branches of the tree. Surname Project Admins will work closely with Haplogroup Project Admins to advise their project members regarding which tests to take next and why.
- Alex Williamson's "Big Tree" is a tree that specifically focuses on the Haplogroup R-P312 branch of the human evolutionary tree. Alex has done incredible work placing newly discovered SNP markers in their best estimated position on the tree, and most importantly for us, creating a visual representation that is easy to navigate and makes the current state of the tree so much more understandable. The members of R1b-GF3 feature here too, in the FGC11134 subsection (see diagram below). There are two interesting features to Alex's tree:
  - if you click on the name of any individual, an analysis of their unique genetic signature comes up. Here is the analysis for member N74958 showing his position on the tree, his unique mutations, and his putative haplotype progression (i.e. the estimated progression of his mutations from previous ancestors).
  - the Overlay STR Feature allows you to compare the results for all STR markers (one by one) across the whole group. Here it is for DYS439.
|The FGC11134 subsection of the Big Tree with R1b-GF3 member 176224|
You may have to read this several times before a lot of the information sinks in but stick with it - it's worth it! Knowing the basics behind the science of Y-DNA and how it can be applied will help you understand a lot of the discussion about SNP testing and Big Y results that will follow in subsequent posts.