Pedigrees and Relatedness
Prerequisites: STAT 311; some programming experience preferred
Description: We will explore statistical theory and methodology as they apply to the study of (human) heredity. The overarching themes of the readings are (1) computing measures of relatedness (kinship and inbreeding) and conditional trait (disease) risk from known family trees and (2) estimating relatedness from dense SNP or whole-genome sequence data. Readings will follow UW emeritus professor Elizabeth Thompson’s monograph “Statistical Inference from Genetic Data on Pedigrees” (SIGDP). We will cover conditional probabilities, likelihood models, Hardy-Weinberg equilibrium, the expectation-maximization algorithm for inferring allele frequencies for the ABO blood group, Wright’s path counting formula, and identity by descent. During meetings we will work through practice exercises; in one or two meetings we will go through brief hands-on labs using current research software.
This site contains assigned readings and exercises. Accompanying videos, scripts, and data files will be shared with mentees through Google Drive. Readings and exercises are to be completed during the week they are assigned and discussed the following week. Some assignments will be formally submitted to Canvas for course credit.
Students/mentees:
Before Term: Genetics
- Pigeonetics
- Genetics Home Reference
- SIGDP: Chapter 1 (pp. 1-9)
- Play Pigeonetics, and then read some of the other articles on pigeons at the website provided. Come to meetup ready to share 2-3 factoids about pigeons.
- Use the Genetics Home Reference to look up 1 or 2 traits or genetic diseases. Come to meetup ready to share 2-3 factoids about the trait(s) you researched.
- From E.A. Thompson’s SIGDP, translate Mendel’s first and second laws to common language (page 3) or state the laws as a biologist would. Do this without referencing Wikipedia or other sources.
Week 1: Kinship
- SIGDP 3.1, 3.2
- Path Counting Formula: pedigree-kinship-#.mp4 videos, StackExchange
- Cousin-ness: one, two
- We will practice the terminology of cousin-ness. See articles on cousin-ness to answer these questions. It may be helpful to draw family trees.
- My cousin has two children: Jack and Caroline. How are they related to me?
- My grandfather has a sister Florence. How is Florence related to me?
- My father has a female cousin, and she has a child Chasten. How is Chasten related to me?
- Solve the problems on the final slide of the video pedigree-kinship-4.mp4.
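To check hand calculations from the path counting formula, it can help to compute kinship with the standard recursive definition instead. The following is a minimal sketch, not the method used in the videos: the pedigree, the individual names, and the function `kinship` are all hypothetical, and the pedigree dictionary is assumed to list parents before their children.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> (father, mother); founders have (None, None).
# Assumption: entries are listed so that parents appear before their children.
PEDIGREE = {
    "GF": (None, None),
    "GM": (None, None),
    "P1": ("GF", "GM"),   # P1 and P2 are full siblings
    "P2": ("GF", "GM"),
    "S1": (None, None),   # unrelated spouses
    "S2": (None, None),
    "C1": ("P1", "S1"),   # C1 and C2 are first cousins
    "C2": ("P2", "S2"),
}
ORDER = {name: i for i, name in enumerate(PEDIGREE)}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient: probability that a random allele from a
    and a random allele from b are identical by descent."""
    if a is None or b is None:
        return 0.0  # a missing parent (founder's parent) contributes nothing
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse on the individual listed later, so we never step "through" a descendant.
    if ORDER[a] < ORDER[b]:
        a, b = b, a
    father, mother = PEDIGREE[a]
    return 0.5 * (kinship(father, b) + kinship(mother, b))


print(kinship("P1", "P2"))  # full siblings
print(kinship("C1", "C2"))  # first cousins
```

The recursion and the path counting formula should agree on any pedigree, so a mismatch usually points to a missed or double-counted path.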
Week 2: Inbreeding
- SIGDP 3.5
- ibd-states-#.mp4 videos
- Jacquard coefficients
- Inbreeding depression
- Twitter thread on Cleopatra
- British Royals
- Compute the kinship between Lady Louise Windsor and Prince George of Cambridge.
- Compute the kinship of Archie Harrison and Edward, Earl of Wessex.
- Compute the kinship of Lena Elizabeth and Charles, Prince of Wales.
- Cleopatra lineage
- Compute the inbreeding coefficient of Cleopatra III.
- Compute the inbreeding coefficient of Cleopatra IV.
- Compute the inbreeding coefficient of Berenice III.
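As a reminder for the inbreeding exercises, the inbreeding coefficient of an individual equals the kinship coefficient of its parents, and Wright's path counting formula expresses it as a sum over common ancestors. The notation below is an assumption chosen for illustration and may differ from SIGDP: X is the individual, P and Q are its parents, A ranges over common ancestors of P and Q, and n is the number of individuals on a path from P to Q through A (counting P, A, and Q) in which no individual appears twice.

```latex
% Wright's path counting formula (illustrative notation)
F_X \;=\; \phi_{PQ} \;=\; \sum_{A} \; \sum_{\text{paths via } A}
      \left(\tfrac{1}{2}\right)^{n} \left(1 + F_A\right)

% Worked example: X is the offspring of full siblings P and Q,
% whose parents A_1 and A_2 are unrelated and not inbred.
% Each path P - A_i - Q contains n = 3 individuals.
F_X \;=\; \left(\tfrac{1}{2}\right)^{3}(1 + 0) + \left(\tfrac{1}{2}\right)^{3}(1 + 0) \;=\; \tfrac{1}{4}
```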
Week 3: Genotype Probabilities
- Hardy-Weinberg Equilibrium
- SIGDP 2.3
- Khan Academy
- Genotype Probabilities (Conditional on Relationships)
- SIGDP 3.6
- geno-probs-#.mp4 videos
- Eugenics
- Renaming the Fisher Lectureship: one, two
- An etymology of regression
- Understanding our eugenic past
- Exercises
- Hardy-Weinberg Equilibrium
- Take practice quiz at linked Khan Academy lesson.
- For the ABO blood types, the frequencies of alleles A, B, and O are 3/10, 1/10, and 6/10, respectively. Compute the population genotype probabilities for AA, AO, AB, BB, BO, and OO.
- Genotype Probabilities (Conditional on Relationships)
- Individual B has phenylketonuria, a rare recessive condition, for which the allele frequency is 1/100. Compute the probability that B's relative C has phenylketonuria if (a) C is a child of B or (b) C is a nephew of B.
- Errata: Review the arguments made on page 38 of SIGDP for the "Impossible region" in Figure 3.2
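The two calculations practiced in these exercises can be sketched in a few lines. This is a minimal illustration under stated assumptions: Hardy-Weinberg equilibrium, a fully penetrant recessive trait, and non-inbred relatives whose probabilities of sharing 0, 1, or 2 alleles identical by descent are (k0, k1, k2). The function names and the allele frequencies used here are hypothetical and deliberately differ from the exercise values.

```python
from itertools import combinations_with_replacement


def hwe_genotype_probs(freqs):
    """Unordered genotype probabilities under Hardy-Weinberg equilibrium.

    freqs: dict mapping allele name -> population frequency (should sum to 1).
    Homozygotes get p_i^2, heterozygotes get 2 * p_i * p_j.
    """
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            probs[a + b] = freqs[a] ** 2
        else:
            probs[a + b] = 2 * freqs[a] * freqs[b]
    return probs


def recessive_risk_given_affected_relative(q, k0, k1, k2):
    """P(relative is affected | index individual is affected).

    q: frequency of the recessive allele.
    Sharing 2 alleles ibd: relative has the same genotype (probability 1).
    Sharing 1 allele ibd: the other allele must be recessive (probability q).
    Sharing 0 alleles ibd: population risk (probability q^2).
    """
    return k0 * q**2 + k1 * q + k2 * 1.0


# Hypothetical three-allele locus.
probs = hwe_genotype_probs({"A": 0.2, "B": 0.1, "O": 0.7})
print(probs)

# Hypothetical allele frequency 0.05; full siblings have (k0, k1, k2) = (1/4, 1/2, 1/4).
print(recessive_risk_given_affected_relative(0.05, 0.25, 0.5, 0.25))
```

Other relationships only change the triple (k0, k1, k2); the conditioning argument stays the same.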
- Readings
- SIGDP 2.1, 2.4, 2.5
- Rachel Ferina's write-up and slides (top of page)
- Expectation-Maximization algorithm for gene counting (gene-counting-#.mp4 videos)
- STAT 516 and BIOST 550 lecture notes (link)
- Expectation-Maximization algorithm for mixture modeling (video)
- Exercises
- Implement in R (or your choice of program) the EM algorithm for Rachel Ferina's project.
- Use Google Scholar to find another example where EM is used in statistical genetics.
- Jot down any questions you have about the EM algorithm.
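For orientation, here is a minimal Python sketch of gene counting (the EM algorithm) for ABO allele frequencies from phenotype counts. It is not the solution to the project exercise, and the function names, starting values, and phenotype counts are hypothetical. The E-step splits the A and B phenotype counts into expected homozygote and heterozygote counts; the M-step counts alleles.

```python
def _expected_homozygotes(n_pheno, hom_prob, het_prob):
    """Expected number of homozygotes among n_pheno individuals with a
    phenotype that is either homozygous or heterozygous with O."""
    total = hom_prob + het_prob
    return n_pheno * hom_prob / total if total > 0 else 0.0


def abo_em(n_a, n_b, n_ab, n_o, tol=1e-12, max_iter=10_000):
    """Estimate ABO allele frequencies (p, q, r) for alleles (A, B, O)."""
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0  # arbitrary interior starting point
    for _ in range(max_iter):
        # E-step: expected genotype counts given current frequencies.
        n_aa = _expected_homozygotes(n_a, p * p, 2 * p * r)
        n_ao = n_a - n_aa
        n_bb = _expected_homozygotes(n_b, q * q, 2 * q * r)
        n_bo = n_b - n_bb
        # M-step: count alleles among the 2n gene copies.
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (2 * n_o + n_ao + n_bo) / (2 * n)
        change = abs(p_new - p) + abs(q_new - q) + abs(r_new - r)
        p, q, r = p_new, q_new, r_new
        if change < tol:
            break
    return p, q, r


# Hypothetical phenotype counts for types A, B, AB, and O (not real data).
p, q, r = abo_em(n_a=320, n_b=150, n_ab=40, n_o=490)
print(round(p, 4), round(q, 4), round(r, 4))
```

The hypothetical counts were chosen to match Hardy-Weinberg expectations exactly for allele frequencies 0.2, 0.1, and 0.7 in a sample of 1000, so the estimates should settle near those values.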
- Readings
- Multiway ibd
- SIGDP 3.4
- Watch videos multiway-ibd-#.mp4.
- ibd segments
- Watch videos ibd-segs-#.mp4.
- Wikipedia article
- ISOGG wiki
- Ancestry and Phylogeny
- "What is ancestry?"
- Khan Academy on phylogenetic trees
- "What is a tree sequence?"
- Exercises
- Multiway ibd
- Draw the ibd graph for 12 13 34 with genotypes Aa aa Aa.
- Draw the ibd graph for 11 22 33 with genotypes AA aa AA.
- Why is the ibd pattern 11 22 33 for genotypes AA aa Aa impossible?
- Is the ibd pattern 12 34 56 possible for genotypes AA aa AA?
- Write the formula for the probability of genotypes Aa aa Aa given the ibd pattern 12 13 34.
- List three possible genotype combinations for the ibd pattern 12 13 34. (There are in fact many possible.) (a) Draw ibd graphs for your chosen three genotype combinations. (b) Compute probabilities for these three genotype combinations conditional on the ibd pattern.
- ibd segments
- Based on pairwise ibd segments (pink is 1 ibd and violet is 2 ibd), infer the relationship in ibd-segs-rel-1.png.
- Infer the relationship in ibd-segs-rel-2.png.
- Guess at the relationship in ibd-segs-rel-3.png.
- Guess at the relationship in ibd-segs-rel-4.png.
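The multiway ibd exercises can be checked by brute force. The sketch below assumes that each distinct ibd label is an independent draw from the population allele frequencies and that genotypes are unordered; it enumerates every assignment of alleles to labels and sums the probabilities of those consistent with the observed genotypes. The function name and the input format (space-separated strings, one character per label or allele) are hypothetical conventions borrowed from how the exercises write patterns.

```python
from itertools import product


def genotype_prob_given_ibd(pattern, genotypes, freqs):
    """P(unordered genotypes | ibd pattern).

    pattern:   e.g. "12 13" (individual 1 carries labels 1 and 2, and so on)
    genotypes: e.g. "Aa Aa" (one two-character genotype per individual)
    freqs:     dict mapping single-character allele -> frequency
    """
    pairs = [tuple(token) for token in pattern.split()]
    genos = genotypes.split()
    labels = sorted({label for pair in pairs for label in pair})
    total = 0.0
    # Try every way of assigning an allele to each distinct ibd label.
    for assignment in product(freqs, repeat=len(labels)):
        allele_of = dict(zip(labels, assignment))
        consistent = all(
            sorted((allele_of[l1], allele_of[l2])) == sorted(geno)
            for (l1, l2), geno in zip(pairs, genos)
        )
        if consistent:
            prob = 1.0
            for allele in assignment:
                prob *= freqs[allele]
            total += prob
    return total


freqs = {"A": 0.25, "a": 0.75}  # hypothetical allele frequencies
print(genotype_prob_given_ibd("12 13", "Aa Aa", freqs))  # two individuals sharing one label
print(genotype_prob_given_ibd("11", "Aa", freqs))        # one label cannot carry two alleles
```

Enumeration grows exponentially in the number of labels, which is fine for the handful of labels in these exercises.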
- Readings
- "Benchmarking Relatedness ..." paper
- hap-ibd paper
- IBDkin paper
- Reading Scholarly Articles
- "How to (seriously) read a scientific paper"
- Exercises
- Papers
- For the Ramstetter paper, summarize Table 2 and Figure 1 in two sentences each.
- Write down three ideas you found to be interesting in the hap-ibd paper.
- Describe the methodology of IBDkin in three sentences.
- Downloads
- RStudio
- Java Runtime Environment
- BEAGLE
- PLINK
- wget (a compiled copy is on Dropbox)
- hap-ibd
- Readings
- Videos in this order: igsr-sdtemple.mp4, phasing-sdtemple.mp4, hapibd-sdtemple.mp4, ibdkin-sdtemple.mp4, plink-ibd-sdtemple.mp4
- VCF file format
- MAP file format
- Centimorgan (Wikipedia)
- Reference genome (Wikipedia)
- International HapMap Project (Wikipedia)
- Exercises
- Describe the YRI and CEU populations.
- Download files under subfolders 'phasing-example' and 'relationships-example' in this week's Dropbox folder.
- Install 7-zip to unzip .gz compressed files. (link)
- Exercises
- Begin final project.
- See additional materials below related to your final project.
- Write and send questions about final project to mentor.
- Exercises
- Work on final project.
- Write and send questions about final project to mentor.
- Exercises
- Make and practice final presentation.
- Exercises
- Review Dropbox materials on hidden Markov models (HMMs).
- Write and send questions about HMM to mentor.
- Resources
- Sections I, II(A-B) of the HMM tutorial by Rabiner
- Chapter 3 (mainly 3.1-2) of Durbin's "Biological sequence analysis"
- Methods (until Simulated Data subsection), Appendix A of the Beagle phasing paper
- Slides on genetic distance and HMMs for ibd
- Chapters on Markov Chains and Poisson Processes from Durrett's "Essentials of Stochastic Processes"
Week 4: Expectation-Maximization Algorithm
Week 5: Identity-by-Descent
Week 6: Research Articles
Week 7: Computer Lab
This week we will conduct a lab (in person or on Zoom). In the lab, we will use PLINK, BEAGLE, hap-ibd, and IBDkin to infer relationships from genetic data.
Week 8: Project
Week 9: Project
Week 10: Presentation
Additional Materials: Phasing and HMMs