Family Tree Reports
Background
Family trees are bread and butter genealogy. They are easy to create with graph methods because they are graphs. There are many perspective that genealogists use, some of which are outlined here.
Pedigree
Data Requirements
The report uses the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
The report creates a pedigree with the specified number of generations
The Neo4j Query
match path = (p:Person{RN:1})-[r:father|mother*0..5]->(a:Person) return path
Enter the RN of the Person desired and replace the 5 with the number of generations you'd like to return.
Pedigree Completeness
Data Requirements
The report uses the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
This report assesses the completeness of the pedigree at each generation for a specified person in your GEDCOM. If you or your DNA match have an incomplete pedigree then multiple common ancestors, if they exist, may not be exposed in analytics. You should be cautious in assigning a common ancestor when pedigree data is limited.
The Neo4j Query
match p=(n:Person{RN:1})-[:father|mother*..99]->(x)
with reduce(n=0,q in nodes(p)|n+1) as gen
RETURN gen, count(*) as Observed,2^(gen-1) as Expected,count(*)/(2^(gen-1)) as Ratio
order by gen
Change the RN to the desired person in your GEDCOM. The query counts the number of ancestors in each generation. Count(*) is an aggregation function and, in this case, it is by gen (the generation). The query uses the Neo4j Cypher reduce which iterates each generation and adds 1 for each iteration; this is a counter. It then uses this count to compute the expected number of ancestors, which is doubled each generation. Finally it computes the ratio or completeness of the pedigree at that generation.
Pedigree Collapse: Coefficient of Relationship
Data Requirements
The report uses the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
The coefficient of relationship is a measure of intermarriage within a family. The calculation compares two individuals in the same family tree and the number and positions in the family tree of their shared ancestors.
The Neo4j Query
return gen.rel.compute_cor(1,600) as COR
This function returns the coefficient of relationship.
return gen.rel.shared_DNA(1,341)
This function returns a detailed report with the propositi (2 persons), and a row each for the common ancestors, relationship through them, coefficient of relationship, actual and expected shared cm. The expected cm is the mean value from the Shared CM Project.
Sorted Pedigree with Ahnentafel
Data Requirements
The report uses the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
The report generates a pedigree with ancestors sorted in Ahnentafel order.
The Neo4j Query
MATCH (n:Person{RN:1})
match p=(n)-[:father|mother*0..99]->(x)
with x.fullname + ' [' + x.RN + '] (' + left(x.BD,4) + '-' + left(x.DD,4) + ')' as Name,
length(p) as gen,
reduce(srt2 ='', q IN nodes(p)| srt2 + replace(replace(q.sex,'M','A'),'F','B')) AS sortOrder,
'1' + reduce(srt ='', q IN nodes(p)|srt + case when q.sex='M' then '0' else '1' end ) AS Anh
with Name,gen,sortOrder,'1' + right(Anh,size(Anh)-2) as Ahnen
return Name,gen,sortOrder,Ahnen,gen.rel.ahnentafel(Ahnen) as Ahnentafel
order by gen,sortOrder
This query illustrates several concepts. The fist two lines traverse the pedigree graph, similar to queries above. The Name is a concatenation of the Person node properties. It then uses the reduce function to create two sortable variables. Both are based on the Person sex property. The reduce function produces a concantenated string (A's and B's) or bitstring (1's and 0's). Because of the order they are added they allow a sort that puts the returned list in pedigree / ahnentafel order. The last feature is the use of the user defined function gen.rel.ahnentafel which converts the Ahnen bitstring (a base 2 number) to a more human friendly base 10 traditional Ahnentafel number. This Ahnentafel function is used elsewhere in reporting.
Before we leave Ahnentafels, let's see what else is in the toolkit. The Ahnentafel is a concatenated bitstring that is, in numeric terms, a base-2 number. When we prepare some reports we that involve paths to a common ancestor, it takes a lot of text to get the fullnames into a report. So, it is more compact and easier to read if we use Ahnentafel numbers. It's qquite simple. Let's take one of you maternal 5 great grandfathers. As you traverse the path to him you collect an Ahnentafel of 11001110 which is 206 in Base-10. The Ahnentafel path is computed by sequentially truncating the bigstring as follows:
11001110 = 206
1100111 = 103
110011 = 51
11001 = 25
1100 = 12
110 = 6
11 = 3
1 = You
We have this user defined function
return gen.rel.ahn_path('11001110')
which returns the list [1, 3, 6, 12, 25, 51, 103, 206], sufficiently and unambiguously defining the path.
Descendant tree from specified ancestor limited to DNA Testers
Data Requirements
The report uses the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
The report generates a tree from a specified ancestor to descendants who were DNA testers.
The Neo4j Query
match (n:Person)-[z:Gedcom_DNA]->(m)
with collect(m.RN) + collect(n.RN) as DM
optional match path=(p:Person{RN:53})<-[:father|mother*0..99]-(q:Person) where q.RN in DM
with p,path,collect(last(nodes(path))) as cEnds
optional match (q:Person)-[r:Gedcom_DNA]->(s:Person) where q in cEnds
with r,p,[m in cEnds|m.fullname + gen.neo4jlib.RNwithBrackets(m.RN)] as E,[n in nodes(path)|n.fullname + gen.neo4jlib.RNwithBrackets(n.RN) + ' (' + left(n.BD,4) + '-' + left(n.DD,4) + ') ' ] as N
return distinct p.fullname + gen.neo4jlib.RNwithBrackets(p.RN) as MRCA, E as Descendant_DNA_Tester,size(N) as generations, N as Path_to_Descendant_Tester
Replace the RN (now 53) with the record number of the desired ancestor. This query utilizes the "last" function to pull out the last node in a path.
return gen.dna.ancestor_descendants_with_dna_test(53)
Patrilineal tree
Data Requirements
These report uses the relationships between Person nodes and the father relationship.
Purpose of the Report
There are three reports with differing perspective on the inheritance of the Y-chromosome. While there are small portions of the Y-chromosome with homologs on the X-chromosome subject to segregation, genealogists general concern themselves with the non-segregating portions that pass strictly father to son. The first report lists the patrilineal ancestral line of the specified person. The second lists the patrilineal descendants of a specified ancestor. The third lists all the patrilineal relatives of the specified man.
Patrilineal Direct Ancestors
MATCH (n:Person{RN:1})
match p=(n)-[:father*0..99]->(x)
with x.fullname + ' [' + x.RN + '] (' + left(x.BD,4) + '-' + left(x.DD,4) + ')' as Name,
length(p) as gen,
reduce(srt2 ='', q IN nodes(p)| srt2 + replace(replace(q.sex,'M','A'),'F','B')) AS sortOrder,
'1' + reduce(srt ='', q IN nodes(p)|srt + case when q.sex='M' then '0' else '1' end ) AS Anh
with Name,gen,sortOrder,'1' + right(Anh,size(Anh)-2) as Ahnen
return Name,gen,sortOrder,Ahnen,gen.rel.ahnentafel(Ahnen) as Ahnentafel
order by gen,sortOrder
This query is the same as the prevous one except it is traversing only the patrilineal line and ignores the matrilineal line.
Patrilineal Direct Descendants
MATCH (n:Person{RN:26493})
match p=(n)<-[:father*0..99]-(x)
with x.fullname + ' [' + x.RN + '] (' + left(x.BD,4) + '-' + left(x.DD,4) + ')' as Name,
length(p) as gen
with Name,gen
return Name,gen
order by gen,Name
This query starts with the specified ancestor and traverses the graph to all nodes (men) in patrilineal descendant line.
All Patrilineal Relatives
MATCH (n:Person{RN:5})
match p=(n)-[:father*0..99]->(patriarch:Person)
with patriarch
match p=(x:Person)<-[:father*0..99]-(patriarch:Person)
with x.fullname + ' [' + x.RN + '] (' + left(x.BD,4) + '-' + left(x.DD,4) + ')' as Name, length(p) as gen
with Name,gen
return Name,gen
order by gen,Name
When genealogist do patrilineal genetic research they want to test men on various branches of the family tree. This query finds them. It starts with any man and traverses the graph to the patriarch (most distant known patrilineal ancestor) and then traverses down to all his patrilineal descendants.
These patrilineal queries discover all the men in their perspective, but they do not order then in a hierarchical manner. To organize them in hierarchical order we need a more sophistocated query.
Matrilineal tree
The matrilineal tree queries are exactly the same as the patrilineal queries except we change 2 letters: father becomes mother and patrilineal becomes matrilineal.
X-Linked tree
Data Requirements
These reports use the relationships between Person nodes and the father and mother relationships.
X-linked tree are fun in graph queries! The X-chromosome is transmitted by fathers only to daughters. Thus, there is no father to son transmission. The X-chromosome has crossovers in women and segments segregate. Thus, over the generations there is a diminished portion of the X-chromosome from female ancestors. With more distant female ancestors there may be no inherited X-chromosome segments.
X-chromosome inheritance by descendants
match p=(n:Person{RN:32})<-[:father|mother*..99]-(m)
with m, reduce(status ='', q IN nodes(p)| q.sex + status ) AS c,
reduce(srt2 ='', q IN nodes(p)| q.RN + '|' + srt2) AS PathOrder,reduce(n =0, q IN nodes(p)| n + 1) as generations where c=replace(c,'MM','')
with distinct m.fullname + ' [' + m.RN + ']' as Fullname, c as Path, PathOrder, generations, '1' + replace(replace(c,'M','1'),'F','0') as Ahnen
return Fullname as Descendant,Path,gen.rel.ahnentafel(left(Ahnen,size(Ahnen)-2) +'1') as Ancestor_Ahnentafel, PathOrder,generations
order by generations,Ancestor_Ahnentafel,PathOrder
The query starts with an ancestor and traverses their descendancy tree. It collects sex (M or F) as it tranverses the family tree and filters out any bitstring containing MM which represents a father to son transmission. The Ahnentafel reported is the position of the ancestor in the descendant's family tree.
gen.rel.X_chromosome_inheritance_from_ancestor(32)
This PlugIn function creates an Excel worksheet with the X-linked descendants using the query above.
X-chromosome direct ancestors
MATCH (n:Person{RN:1})
match p=(n)-[:father|mother*0..99]->(x)
with x.fullname + ' [' + x.RN + '] (' + left(x.BD,4) + '-' + left(x.DD,4) + ')' as Name,
length(p) as gen,
reduce(srt2 ='', q IN nodes(p)| srt2 + q.sex) AS c,
'1' + reduce(srt ='', q IN nodes(p)|srt + case when q.sex='M' then '0' else '1' end ) AS Anh
where c=replace(c,'MM','')
with Name,gen,'1' + right(Anh,size(Anh)-2) as Ahnen
return Name,gen,Ahnen,gen.rel.ahnentafel(Ahnen) as Ahnentafel
order by Ahnentafel
The query starts with a propositus and traverses their tree to direct ancestors. It collects sex (M or F) as it tranverses the family tree and filters out any bitstring containing MM which represents a father to son transmission. The Ahnentafel reported is the position of the ancestor in the propositus' family tree.
All relatives potentially sharing the X-chromosome
match p=(p1:Person{RN:1})-[:father|mother*0..99]->(x)
with x,reduce(srt2 ='', q IN nodes(p)| srt2 + q.sex) AS c
where c=replace(c,'MM','')
with collect (x.RN) as RNs
match m=(p2:Person)<-[:father|mother*0..99]-(y) where p2.RN in RNs
with y, reduce(srt3 ='', s IN nodes(m)| srt3 + s.sex) AS cc
with y where cc=replace(cc,'MM','')
match sp=shortestpath ((y)-[:father|mother*0..99]-(z:Person{RN:1}) )
with y,sp,reduce(ss='', t in nodes(sp)| t.RN + '>' + ss) as Path
return distinct y.fullname as Name, y.RN as RN, length(sp) as genetic_distance,Path order by genetic_distance,Path
The query starts with a propositus and traverses their tree to direct ancestors and then traverses back down the tree of all these ancestors. It collects sex (M or F) as it tranverses the family tree in either direction and filters out any bitstring containing MM which represents a father to son transmission. The genetic distance is the number of hops in the family tree to the relative. Note that the propositus RN appears twice in the query.
return gen.rel.X_chromosome_all_relatives(210)
Deteriming whether two person might share an X-chromosome segment is determined by traversing the family tree between them seeking a path that avoids father to son edges. This function takes record numbers (RN) of two individuals and returns the genetic distance and the RNs of those in the familty tree path.
match p=(p1:Person{RN:1})-[:father|mother*0..99]->(x) with x,reduce(srt2 ='', q IN nodes(p)| srt2 + q.sex) AS c where c=replace(c,'MM','') with collect (x.RN) as RNs match m=(p2:Person)<-[:father|mother*0..99]-(y:Person{RN:600}) where p2.RN in RNs with y, reduce(srt3 ='', s IN nodes(m)| srt3 + s.sex) AS cc with y where cc=replace(cc,'MM','') match sp=shortestpath ((y)-[:father|mother*0..99]-(z:Person{RN:1}) ) with y,sp,reduce(ss='', t in nodes(sp)| t.RN + '>' + ss) as Path return distinct y.fullname as Name, y.RN as RN, length(sp) as genetic_distance,Path order by genetic_distance,Path
This UDF uses to above an returns only the shortest possible genetic distance between two persons in the family tree. It returns zero if there is no path. This is used during the loading of data to create the x_gen_dist property in the match_by_segment relationship. This reveals that some reported shared X-segments are not tenable because there is not path. This can be used to simplify shared X analytics.
return gen.dna.x_chr_min_genetic_distance(1,600)
End of Your Lines
Data Requirements
This reports use the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
This report lists your most distant known ancestor in each line.
The Neo4j Query
MATCH p=(n:Person{RN:1})-[:father|mother*..99]->(x:Person) where (x.uid is null or x.uid=0)
with x.RN as RN,x.fullname as Name,length(p) as generation,reduce(srt2 ='', q IN nodes(p)| srt2 + replace(replace(q.sex,'M','A'),'F','B')) AS sortOrder,'1' + reduce(srt ='', q IN nodes(p)|srt + case when q.sex='M' then '0' else '1' end ) AS Ahn
with RN, Name,generation,sortOrder, Ahn, gen.rel.ahnentafel(Ahn) as Ahnentafel
return RN, Name,generation,sortOrder, Ahn,Ahnentafel
order by Ahnentafel
The query traverses you graph until it reaches an ancestor with no union for his parents.
Double Cousins and Other Relationships
Data Requirements
These reports use the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
Double cousins are more common that you might suspect. This query finds them and reports them out as pairs.
The Neo4j Query
Match (p:Person), (q:Person)
match path=(p:Person)-[:father|mother*..2]->(CA)<-[:father|mother*..2]-(q:Person)
with p.RN as RN1,p.fullname as Name1,q.RN as RN2,q.fullname as Name2, count(*) as MRCA_Ct
where MRCA_Ct>3 and p.fullname=replace(p.fullname,'MRCA','') and q.fullname=replace(q.fullname,'MRCA','')
return distinct RN1,Name1,RN2,Name2,MRCA_Ct order by RN1
The query traverses you graph until it reaches granparents shared by two persons. But it limits the returned rows to pairs who share four grandparents, the defining feature of double cousins. You can modify the query to find those who share 3 grandparents (1½ cousins).
There are other queries that specifically return persons related in a specific way. Note how the queries constrain the results. Here are some of the poosibilities.
Siblings:
match p=(n:Person{RN:1})-[:father|mother]->(m) match (m)<-[:father|mother*..1]-(s) return distinct n.RN,s.RN,s.fullname
or, the set up a more generic approach using the length of the path:
match p=(n:Person{RN:1})-[:father|mother*..2]->(m) where length(p)=1
match q=(m)<-[:father|mother*..1]-(s) where length(q)=1 return distinct n.RN,s.RN,s.fullname
By varying the length and depth of the path we reach other relatives.
Aunts and uncles
match p=(n:Person{RN:1})-[:father|mother*..2]->(m) where length(p)=2
match q=(m)<-[:father|mother*..1]-(s) where length(q)=1 return distinct n.RN,s.RN,s.fullname
Cousins
match p=(n:Person{RN:1})-[:father|mother*..3]->(m) where length(p)=2
match q=(m)<-[:father|mother*..3]-(s) where length(q)=2 return distinct n.RN,s.RN,s.fullname
1st and 2nd cousins
match p=(n:Person{RN:1})-[:father|mother*..4]->(m) where length(p)=3
match q=(m)<-[:father|mother*..4]-(s) where length(q)=3 return distinct n.RN,s.RN,s.fullname
We can then bring in those related by marriage, such as
brothers and sisters in law
match p=(n:Person{RN:1})-[:father|mother*..2]->(m) where length(p)=1 match q=(m)<-[:father|mother*..1]-(s) where length(q)=1 match (s)<-[:spouse]-(t) return distinct n.RN,s.RN,s.fullname,t.RN,t.fullname
Shortest Path between two Persons
Data Requirements
These reports use the relationships between Person nodes and the father and mother relationships.
Purpose of the Report
This reports traverses relationships of any type and is useful in finding those related by marriage or, when data enables, friends, associates and neighbos (FAN groups). The latter is not yet incorporated into Graphs for Genealogists.
The Neo4j Query
Match p=shortestPath((a:Person{RN:1})-[:father|mother|spouse*..99]-(b:Person{RN:4433}))
unwind(nodes(p)) as q
return q.RN as RN,q.surname + ', ' + q.first_name as Name,q.BD as BD,q.DD as DD,q.uid as UID
This query uses the shortestpath methods to travese between two individuals.