Genetics
PART 1 From “Mendel” to “Molecules”
Birth of Genetics
- Genotype determines phenotype; phenotype is used to deduce genotype
- Trait/Character, Variant, Mutants
Single-Gene Inheritance
The Law of Segregation
- Alleles: Genes (described as “particles” by Mendel) exist in different versions called alleles (e.g., an allele for purple flowers and an allele for white flowers).
- Separation: A diploid organism carries two alleles of each gene, but each gamete carries only one.
- Random Assortment into Gametes: The separation is random; each allele has an equal chance of ending up in a given gamete.
- Dominance and Recessiveness: If the two alleles an organism inherits are different (heterozygous), one allele (dominant) may mask the expression of the other (recessive). The recessive trait appears only if an individual inherits two copies of the recessive allele.
- From a modern perspective, we know that in a diploid organism (with two sets of chromosomes), the two alleles for a gene are located on a pair of homologous chromosomes. During meiosis, when gametes are produced, these homologous chromosomes separate. Consequently, each gamete ends up with only one chromosome from the pair, and therefore only one allele for the gene.
Allele
- Null allele: no function
- Hypomorphic allele: reduced function
- Hypermorphic allele: increased function (e.g., overexpression)
- Null and hypomorphic alleles are usually recessive
Molecular Mechanisms for Dominance
- Haploinsufficiency: the mutant allele encodes an inactive protein (or is a null allele), and cells need more of the protein than one wild-type copy can supply (e.g., >1/2 of the wild-type level)
- “Spoiler” (dominant-negative) alleles: the mutant protein interferes with the normal protein (e.g., in a dimer)
- Gain-of-function alleles: the mutant protein has a new activity that causes the phenotype (e.g., a kinase that no longer responds to negative regulation)
Sex-Linked Single-gene Inheritance
- Reciprocal crosses give different results
- Sex-linked genes are not necessarily related to sex determination
- Example: eye color in Drosophila
Inheritance Patterns
- Autosomal Recessive: Often inbreeding is implicated.
- Autosomal Dominant: Affected cases appear in every generation. The first affected individual in a pedigree may carry a de novo mutation.
- X-linked recessive disorders
- X-linked dominant disorders
- Y-linked disorders
Two-Gene Inheritance
Law of Independent Assortment
- During gamete formation, the alleles of one gene segregate into gametes independently of the alleles of another gene, assuming the genes are located on different chromosomes (or are far apart on the same chromosome).
Working with Independent Assortment
- Predicting the progeny ratio: the Product Rule (independent events: multiply probabilities) & the Sum Rule (mutually exclusive events: add probabilities)
- How many progeny are needed? If one genotype appears with probability 1/256, can I find at least one target progeny among 256 in total, with 95% confidence?
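A minimal Python sketch of both rules and the sample-size question. It assumes the 1/256 genotype is aabbccdd from an AaBbCcDd × AaBbCcDd cross (four unlinked genes, 1/4 each); the notes do not say which cross is meant. The chance of at least one target among n progeny is $1-(1-p)^n$, so n is the smallest integer with $1-(1-p)^n \ge 0.95$:

```python
import math
from fractions import Fraction

def product_rule(probs):
    """Independent events: multiply their probabilities."""
    p = Fraction(1)
    for q in probs:
        p *= q
    return p

def min_progeny(p, confidence=0.95):
    """Smallest n such that P(at least one target) = 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Sum rule: mutually exclusive outcomes add, e.g. P(AA or aa) from Aa x Aa.
p_homozygous = Fraction(1, 4) + Fraction(1, 4)

# Product rule: aabbccdd from AaBbCcDd x AaBbCcDd (assumed four unlinked genes).
p_target = product_rule([Fraction(1, 4)] * 4)

p = float(p_target)
print(p_homozygous, p_target)        # 1/2 1/256
print(round(1 - (1 - p) ** 256, 2))  # 0.63: 256 progeny are not enough
print(min_progeny(p))                # 766
```

So 256 progeny give only about a 63% chance of recovering the genotype; roughly 766 are needed for 95%.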

- Does the real data fit the expectation based on a hypothesis? Chi-square test: $\chi ^2 = \sum \frac{(O-E)^2}{E}$, with degrees of freedom = number of classes − 1
- Synthesizing pure lines with combined traits
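As a sketch of the chi-square test, it can be applied to Mendel's classic F2 dihybrid counts (315 round-yellow, 108 round-green, 101 wrinkled-yellow, 32 wrinkled-green) against a 9:3:3:1 hypothesis. The critical value 7.815 (α = 0.05, df = 3) is taken from a standard chi-square table rather than computed:

```python
def chi_square(observed, ratio):
    """Pearson chi-square statistic: sum((O - E)^2 / E)."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Mendel's F2 dihybrid counts: round-yellow, round-green, wrinkled-yellow, wrinkled-green
observed = [315, 108, 101, 32]
chi2 = chi_square(observed, [9, 3, 3, 1])
df = len(observed) - 1            # 3 degrees of freedom
CRITICAL_0_05_DF3 = 7.815         # chi-square table value for alpha = 0.05, df = 3
print(round(chi2, 3), chi2 < CRITICAL_0_05_DF3)   # 0.47 True
```

A χ² of about 0.47 is far below 7.815, so the deviation from 9:3:3:1 is consistent with chance.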

Genetic Interaction
- Genes almost never act alone; they act together in pathways
- Types of Interaction:
- Negative Interaction: sick + sick = sicker, or death
- Positive Interaction: Masking (one mutation masks the phenotype of the other, i.e., epistasis) or Suppression (one mutation reduces the defect of the other or restores it to normal)
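Masking shows up as a modified F2 ratio. This sketch enumerates an AaBb × AaBb cross under a hypothetical recessive-epistasis rule in which bb masks whatever gene A does; the phenotype labels are illustrative:

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Gametes of a dihybrid such as 'AaBb': one allele per gene, all equally likely."""
    return [a + b for a, b in product(genotype[0:2], genotype[2:4])]

def phenotype(genotype):
    """Hypothetical masking rule: bb hides the effect of gene A."""
    has_A = "A" in genotype[0:2]
    has_B = "B" in genotype[2:4]
    if not has_B:
        return "bb (masks gene A)"
    return "A_B_" if has_A else "aaB_"

f2 = Counter()
for g1, g2 in product(gametes("AaBb"), repeat=2):
    zygote = g1[0] + g2[0] + g1[1] + g2[1]   # two gene-A alleles, then two gene-B alleles
    f2[phenotype(zygote)] += 1
print(dict(f2))   # 9 A_B_ : 3 aaB_ : 4 bb out of 16
```

Without interaction the same 16 combinations would give 9:3:3:1; masking merges two classes into a single "4".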

Recombination
- Morgan: chromosomes are the carriers of inheritance; genes are the units of inheritance and of recombination; genes are arranged linearly on chromosomes
- Chiasmata (points where homologous chromosomes cross over):
- DNA double-strand breaks (DSBs) are generated throughout the genome (mostly by a dedicated enzyme, Spo11; rarely by nonspecific DNA damage or replication stress)
- The probability of a crossover between two loci roughly scales with the physical distance between them
- There are specific “hot spots” or “cold spots” for recombination in genome
- At least one chiasma per pair of homologous chromosomes is required for proper segregation, often more are present
- Genetic mapping:
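- 1 m.u. = 1% RF (recombination frequency)
- Gives relative order and distance, not absolute physical distance
- For distant loci, measured RF is smaller than the true distance, because double (or any even number of) crossovers between them restore the parental combination and are scored as non-recombinant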
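The double-crossover effect can be seen in a three-point cross. This sketch uses hypothetical progeny counts (gene order A-B-C assumed) and compares the summed interval distances with a two-point A-C estimate that cannot see double crossovers:

```python
def map_distances(parental, sco_ab, sco_bc, dco):
    """Map distances (m.u. = % recombinant progeny) from a three-point cross, order A-B-C.

    parental: non-recombinant progeny; sco_ab / sco_bc: single crossovers in the
    A-B / B-C interval; dco: double crossovers (one in each interval).
    """
    total = parental + sco_ab + sco_bc + dco
    ab = 100 * (sco_ab + dco) / total
    bc = 100 * (sco_bc + dco) / total
    # Scoring only A and C: a double crossover restores the parental A-C combination,
    # so double crossovers are counted as non-recombinant.
    ac_two_point = 100 * (sco_ab + sco_bc) / total
    return ab, bc, ac_two_point

ab, bc, ac = map_distances(parental=800, sco_ab=90, sco_bc=100, dco=10)  # hypothetical counts
print(ab, bc, ab + bc, ac)   # 10.0 11.0 21.0 19.0
```

The two-point estimate (19 m.u.) falls short of the summed distance (21 m.u.) by exactly twice the double-crossover frequency.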

Molecular Basis of Genetics
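Discovery of DNA as the Genetic Material
- Griffith (1928), injecting mice with Streptococcus pneumoniae:
- S strain -> mice die;
- R strain -> mice live;
- heat-killed S strain -> mice live;
- R strain + heat-killed S strain -> mice die (R cells were transformed into S cells)
- Avery (1944): S-strain extract was treated to destroy one component at a time; only destroying DNA abolished the transformation of R cells into S cells.
- Hershey-Chase (1952): T2 phage with $^{35}$S-labeled protein or $^{32}$P-labeled DNA; only the $^{32}$P label entered the infected bacteria, so DNA is the genetic material.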

- Chargaff’s Rule: A=T, C=G
- Franklin (X-ray diffraction), Watson and Crick (1953): double helix
Prokaryote/Phage (virus) Genetics
Bacterial Conjugation
- The F factor transfers from the donor (F+) to the recipient (F-) via “rolling circle” replication
- In F+ × F- mating, only the F factor is transferred
- Integration of the F factor into the genome converts an F+ cell into an Hfr cell (integration is spontaneous and occurs at different sites in different Hfr strains)
- An Hfr cell transfers part (rarely all) of its genome to an F- cell via conjugation

- Imprecise excision of an integrated F factor generates F′ cells: the F′ plasmid carries adjacent chromosomal genes, creating partial diploids useful for dominance tests, complementation tests, genetic-interaction studies, etc.

Lac Operon
- The lac operon is transcribed only in the presence of an inducer (lactose, via allolactose, or IPTG)

- PaJaMo experiment (Pardee, Jacob, Monod):
- Regulation acts at the DNA (transcriptional) level
- LacI acts as a repressor
- O is cis-acting; LacI is trans-acting
- Diauxic (diphasic) growth: cells use glucose first, pause in a lag phase, then use lactose
- Catabolite repression: when glucose is high, cAMP is low, so CAP cannot bind the lac promoter and little mRNA is made. As glucose runs out, cAMP rises and CAP-cAMP activates transcription of lac mRNA.
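The cis/trans logic of partial diploids and the glucose/lactose logic above can be written as a small Boolean sketch. It is a simplification under stated assumptions: only lacI+/lacI-, lacO+/lacO^c and lacZ+/lacZ- alleles are modeled (no superrepressor alleles, no basal expression), and the function names are hypothetical:

```python
def lacz_active(copies, inducer):
    """Is functional beta-galactosidase made in a partial diploid (merodiploid)?

    copies: list of (lacI, lacO, lacZ) alleles, one tuple per DNA molecule
    (chromosome or F'). lacI '+' makes a diffusible repressor; lacO 'c' is an
    operator-constitutive mutation that the repressor cannot bind.
    """
    repressor_present = any(i == "+" for i, _, _ in copies)   # trans: acts on every copy
    for _, o, z in copies:
        transcribed = o == "c" or inducer or not repressor_present  # cis: own operator only
        if transcribed and z == "+":
            return True
    return False

def lac_transcription(lactose, glucose):
    """Catabolite repression: lactose lifts LacI repression; low glucose -> high cAMP -> CAP-cAMP activates."""
    if not lactose:
        return "off"      # LacI bound to the operator
    return "low" if glucose else "high"

# lacI+ is trans-acting: it represses lacZ+ on the other molecule.
print(lacz_active([("+", "+", "-"), ("-", "+", "+")], inducer=False))  # False
# lacO^c is cis-acting: it only derepresses the lacZ on its own molecule.
print(lacz_active([("+", "c", "-"), ("+", "+", "+")], inducer=False))  # False
print(lacz_active([("+", "c", "+"), ("+", "+", "-")], inducer=False))  # True
print(lac_transcription(lactose=True, glucose=False))                   # high
```

The two lacO^c cases differ only in which molecule carries lacZ+, which is exactly the cis test.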

$\lambda$ Bacteriophage
- Lytic or lysogenic cycle
- $\lambda$ repressor (cI) -> lysogenic; Cro -> lytic

Chromosome
Nucleosome
- Histones: an octamer (two each of H2A, H2B, H3, H4) wrapped by ~147 bp of DNA
- Nucleosomes are not evenly distributed along the chromosome; nucleosome-free regions (NFRs) are easily accessible
- Histone modifications: acetylation (writer: HAT; eraser: HDAC), methylation, ubiquitylation, phosphorylation, etc.
- Histone code: different combinations of histone modifications create distinct binding sites that are read by transcriptional coregulators, thereby switching transcription on or off
- Chromatin remodeling: besides modifying histones in situ, nucleosomes within chromatin can be remodeled by sliding, ejection, or replacement

- ChIP: detects protein-DNA interactions. Cross-link; fragment the chromatin; pull down with an antibody; reverse the cross-links; purify the DNA for analysis (e.g., qPCR or sequencing)
- Epigenetic Inheritance:
- a change in phenotype that is heritable but does not involve DNA mutation
- cells with the same genotype could have different phenotypes that persist for many generations
- e.g., X-inactivation: the Y chromosome has lost most of the genes it once shared with the X, so males (XY) and females (XX) have different dosages of most X-linked genes. Human females randomly inactivate one X in somatic cells (Barr body); Drosophila males upregulate X-linked genes ~2-fold. This is dosage compensation.
Centromere
- The centromere guides the assembly of the kinetochore for microtubule binding
- Centromeres are specified by unique nucleosomes containing CENP-A (a histone H3 variant)
Telomere
- A special DNA + protein structure (T-loop) that protects the ends of chromosomes. Without this protection, the ends would look like double-strand breaks, a major cause of genome instability
- Telomerase, a specialized reverse transcriptase that carries its own RNA template, is dedicated to telomere replication

- Telomere shortening and aging:
- Stem cells have active telomerase and possess full-length telomeres
- Somatic cells inactivate telomerase, so telomeres shorten with each round of cell division, eventually leading to cellular aging
Karyotype
- the number and the morphological features of chromosomes in the cell nuclei of a species
- Chromosome mutations: Translocation, Inversion, Deletion, Missing chromosome, Duplication, Extra chromosome

- Copy number change of chromosome:
- Euploidy (change in the number of complete chromosome sets), e.g., by bypassing chromosome segregation or nuclear division (2n -> 4n): monoploid (recessive mutations masked in the diploid are exposed); triploid (problems with homolog pairing in meiosis); tetraploid
- Aneuploidy (gain or loss of individual chromosomes), e.g., monosomy, trisomy
- Karyotype changes occur frequently during evolution
[!NOTE]
Why is aneuploidy particularly detrimental?
Gene-dosage effect:
Compared with diploids (euploids), aneuploids have some genes at lower (or higher) dosage, out of balance with the rest of the genome. Polyploids maintain gene balance despite higher dosage.
Thanks to sex-chromosome dosage compensation (X-inactivation), XO and XXY are viable, although with defects. Trisomy 21 is also viable, because chromosome 21 is the smallest human chromosome.
- Balancer chromosome: carries multiple inversions, so when paired with the corresponding normal chromosome, no viable crossover products are recovered. A specific combination of alleles on the chromosome can therefore always be preserved.

PART 2-1 Omics
Why is genomics important?
- Before genomics, we usually observed the phenotype first, then found the gene by genetic mapping. With genomics, we can delete or mutate genes directly and see what happens to the phenotype.
- Comparative genomics.
- Makes it possible to study genomic elements other than coding genes (non-coding DNA).
- Biodiversity and evolution: it’s very cool to know the entirety of the genome of an organism and ponder the meaning of life.
- Precision Medicine
What is a genome?
- The genome is the complete set of DNA (or RNA, for RNA viruses) information of an organism.
- The genome contains all the information necessary for an organism to reproduce.
- The genome is organized and highly packed, with high-level interactions between its different parts.
How do we sequence a genome?
Sanger sequencing – 1st generation

NGS – 2nd generation: the actual reading step is similar to that of Sanger sequencing, but on a massively parallel scale

Nanopore – 3rd generation: reads up to millions of base pairs long, but with more errors
How to Assemble a Whole Genome?
- Contigs -> paired-end reads (to order and orient contigs) -> scaffolds
- Lander-Waterman Equation:
- $C=\frac{L \times N}{G}$
- C: coverage
- L: read length (bp)
- N: number of reads
- G: haploid genome length
- Example: sequencing an E. coli genome with a paired-end run where L = 300, N = 3,113,148 and G = 4,578,159 gives an estimated coverage $C = \frac{300 \times 3{,}113{,}148}{4{,}578{,}159} \approx 204$
- Poisson Distribution:
- $P(X=x)=\frac{\lambda ^x e^{-\lambda}}{x!}$
- x: Number of times a genomic position is sequenced
- $\lambda$: sequencing coverage C
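- To cover at least 99% of the genome at least once, what average depth is needed? The uncovered fraction is $P(0)=e^{-\lambda}$, so we require $P(0) \le 1\% = 0.01$, which gives $\lambda \ge -\ln(0.01) \approx 4.6$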
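Both calculations above fit in a few lines. This sketch assumes the idealized Lander-Waterman/Poisson model (reads placed uniformly at random); real coverage is less even, so practical targets are much higher:

```python
import math

def coverage(read_len, n_reads, genome_len):
    """Lander-Waterman expected coverage C = L * N / G."""
    return read_len * n_reads / genome_len

def fraction_uncovered(c):
    """Poisson P(X = 0) = e^(-C): expected fraction of bases never sequenced."""
    return math.exp(-c)

def min_coverage(target_fraction_covered):
    """Smallest C with 1 - e^(-C) >= target, i.e. C >= -ln(1 - target)."""
    return -math.log(1 - target_fraction_covered)

c = coverage(300, 3_113_148, 4_578_159)   # E. coli example above
print(round(c))                            # 204
print(round(min_coverage(0.99), 2))        # 4.61
```

At C ≈ 204 the expected uncovered fraction $e^{-204}$ is effectively zero, far beyond the 99% target.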
The Application of Genomics
- Phylogenetics
- Comparative genomics: copy number variations (CNV)
- Genomic surveillance
Functional Omics
- ATAC-seq: Tn5 transposase cuts accessible DNA and inserts sequencing adapters -> sequencing; open chromatin shows up as high peaks
- Hi-C :detects chromatin interactions
- Transcriptomics
- Proteomics: mass spectrometry (e.g., LC-MS); can also identify protein post-translational modifications
- Metabolomics
PART 3-1 EvoDevo
- Evolution: survival of the fittest vs. Evodevo: arrival of the fittest
- EvoDevo: compare the genes acting in a specific window of development across species to infer how that process evolved
What is evolution?
- Change of heritable traits of biological populations over successive generations
Why do we study evolution?
- Reconstruct the history of genes, genomes, and species
- Explain the cause of the change, learn the general principle and exceptions of the tremendous biodiversity
- Offer new ideas to other biology disciplines
What fuels evolution? Mutations!
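- Genome Duplications (Polyploidy): evidence from HOX gene clusters
- Chromosome fusion/fission/translocation/inversion/deletion: e.g., cri-du-chat syndrome
- Gene Duplication: e.g., via transposons
- Substitutions & Indels: sickle-cell anemia (base substitution), phenylketonuria (3-base deletion), Tay-Sachs disease (1-base insertion, severe frameshift)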
- Alternative Splicing
- RNA-editing
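To see why a 1-base indel is usually more damaging than a substitution or a 3-base indel, this sketch splits a made-up coding sequence (hypothetical, not a real gene) into codons before and after each kind of mutation:

```python
def codons(seq):
    """Split a coding sequence into complete codons, reading frame starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

ref = "ATGGCTGAACTGAAA"                    # hypothetical 5-codon sequence
substitution = ref[:7] + "T" + ref[8:]     # one base changed: codon 3 GAA -> GTA
in_frame_del = ref[:6] + ref[9:]           # 3-base deletion removes codon 3 only
frameshift = ref[:6] + "C" + ref[6:]       # 1-base insertion shifts every later codon

for name, s in [("ref", ref), ("substitution", substitution),
                ("3-bp deletion", in_frame_del), ("1-bp insertion", frameshift)]:
    print(f"{name:15} {' '.join(codons(s))}")
```

The substitution changes one codon, the 3-bp deletion removes one codon and leaves the rest intact, and the 1-bp insertion changes every downstream codon.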
How does evolution work?
- Natural selection acts on a POPULATION (not an individual) with variations
- Variation -> Fitter ones survive -> Change of allele frequency
- Sexual Selection may work in the opposite direction of natural selection
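The chain "variation -> fitter ones survive -> change of allele frequency" can be made concrete with a textbook one-locus haploid selection model. The starting frequency and the 10% fitness advantage are hypothetical, and the model ignores drift, mutation, and migration:

```python
def next_freq(p, w_A, w_a):
    """One generation of haploid selection: p' = p * w_A / (p * w_A + q * w_a)."""
    q = 1 - p
    return p * w_A / (p * w_A + q * w_a)

p = 0.01                         # hypothetical starting frequency of the fitter allele A
history = [p]
for _ in range(100):
    p = next_freq(p, w_A=1.1, w_a=1.0)   # hypothetical 10% fitness advantage
    history.append(p)
print([round(x, 3) for x in history[::25]])   # frequency every 25 generations
```

In this model the odds p/q are multiplied by w_A/w_a every generation, so a rare advantageous allele first spreads slowly, then rapidly, then approaches fixation.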
What is evodevo?
- Deep homology: Sharing of the genetic regulatory apparatus that is used to build morphologically and phylogenetically disparate animal features (HOX gene and sex-determination gene DMRT1)
- Toolkit genes: genes or regulatory elements that are deeply conserved and control the development of body parts and the body plan (e.g., Pax6 in eye development)
- Modularity: development proceeds through a series of discrete, interacting modules: morphogenetic fields, signaling pathways, imaginal discs, cell lineages
- Heterotopy(location), Heterochrony(time), Heterometry(amount), Heterotypy(kind)
- Hourglass model: the mid-embryonic stage is the period of highest conservation

[!IMPORTANT]
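How do development and evolution differ? How are they connected?
Difference: linear thinking vs. tree thinking
Connection: EvoDevo (evolutionary developmental biology)
Development provides material for evolutionary studies (deep homology, toolkit genes, heterochrony and heterotopy; developmental processes themselves constrain the possible directions of evolution)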
PART 3-2 Evolution of Sex
What is sex?
- Exchange of genetic material between individuals
- Benefit of sex:
- DNA repair
- Sex produces genetic variation that can be exposed to selection
- Sex can make selection process more effective
- Disadvantages of sex:
- 2-fold cost of sex
- Producing the males, finding the mate, male-male competition
- Predation risk
- Disease spread
- Recombination load: break the combination of beneficial alleles
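The "2-fold cost of sex" is an arithmetic argument. This sketch assumes, hypothetically, that sexual and asexual females have the same number of offspring with equal survival; the only difference is that half of a sexual female's brood are sons:

```python
def asexual_share(generations, start_share=0.5):
    """Fraction of breeding females that are asexual after some generations.

    Assumes equal brood size and survival for both kinds of females;
    sexual broods are half sons, asexual broods are all daughters.
    """
    sexual, asexual = 1 - start_share, start_share
    for _ in range(generations):
        sexual *= 0.5        # only half of a sexual female's offspring are daughters
        asexual *= 1.0       # every offspring of an asexual female is a daughter
        total = sexual + asexual
        sexual, asexual = sexual / total, asexual / total
    return asexual

for g in (0, 1, 5, 10):
    print(g, round(asexual_share(g), 4))
```

The ratio of asexual to sexual females doubles every generation (share = 2^g / (1 + 2^g) from a 50:50 start), which is why sex needs large compensating benefits.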
How is sex determined?
- Sex chromosome systems: XY (most mammals, Drosophila), ZW (snakes, birds, butterflies, silkworm) and UV (algae)
- Environmental sex determination: e.g., turtles, crocodiles
- ==Masters change, slaves remain==: whatever the sex chromosome system, the downstream Dsx/Dmrt1 gene controls sex (ON: male, OFF: female, or via different isoforms)
- Examples:
- Drosophila: X:A ≤ 0.5 -> male; via Sxl alternative splicing
- Silkworm: ZW -> female
- Human: Sry, Sox9, Wnt4
- Red-eared slider turtle: high temperature -> female
How do sex chromosomes evolve?
[!IMPORTANT]
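Degeneration of the Y chromosome
- A sex-determining gene arises on an autosome, and the Y forms;
- Sexually antagonistic genes (beneficial in one sex but harmful in the other) accumulate on the Y;
- To keep sexually antagonistic alleles from moving by recombination onto the other chromosome and harming the other sex, ==recombination suppression== evolves in part of the Y; that region loses the ability to “purge” deleterious mutations through recombination;
- The non-recombining region accumulates mutations (transposons, nonsense mutations, deletions, insertions, repetitive sequences), and many genes are lost or become defective;
- Dosage compensation of the X chromosome begins to appear;
- The Y keeps degenerating and may eventually disappear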
- The complex Y-linked sequences
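- Dosage-sensitive genes: degenerate very slowly; their loss is very likely lethal
- Palindromic genes: multi-copy genes on the Y that slow degeneration, because one copy can repair mutations in another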

How does dosage compensation evolve?
- Drosophila: 2-fold upregulation in male
- Nematode (C. elegans): ~50% dampening of X-linked expression in XX hermaphrodites
- Mammal: one X inactivated in females (Barr body)
- Evolving within 1 million years
What are the remaining questions?
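- How do turnovers of sex chromosome systems happen? After the Y is completely lost, the original sex chromosomes are replaced by new ones, establishing a new sex-determination mechanism (related: sexual conflict)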
- Why do some sex chromosomes never degenerate?
- How do species determine their sex? The sex-determination mechanisms of many species are still unknown
PART 3-3 Are “Junk DNA” Junk?
Transposable Elements (TEs)
- Class I: Retrotransposons, copy and paste
- Class II: DNA Transposons, cut/peel and paste

- The Deleterious Effects:
- DNA damage and gene disruption
- Genomic Rearrangement
- Gene Silencing (TEs affect chromatin accessibility; heterochromatin spreading)
- Change Splicing Pattern (new splice sites)
- de novo TE insertions and human disease:
- Exon disruption: Haemophilia
- Change of alternative splicing: Dent’s Disease
How do TEs contribute to genome organization?
- Act as genes: e.g., flamenco (piRNA, lncRNA)
- Form centromeres and telomeres
- Participate in 3D Genome Organization:
- TEs demarcate TAD boundaries
- Insertion of HERV-H retrotransposon creates a new TAD border
How do TEs regulate gene expression?
- Providing TF binding sites: e.g., a human prolactin enhancer is derived from a DNA transposon
- Providing insulator/boundary elements
- Change Splicing Pattern
- Transposase-TF fusion: create new TF
- Tissue- or cell-specific expression patterns
- Tissue- or stage-specific histone modifications

[!IMPORTANT]
How do transposons differ from protein-coding genes?
PART 3-4 Mechanisms of Transcriptional Regulation
Histone code and chromatin state
Chromatin architecture
- Important technology: Hi-C. Cross-link -> digestion -> biotin labeling of ends -> proximity (loop) ligation -> reverse cross-links -> sequencing adaptors -> PCR -> sequencing
- Nucleosome
- TAD: A topologically associating domain is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside the TAD
- Chromatin compartments
- A-compartments: nuclear interior, actively transcribed genes
- B-compartments: nuclear periphery, inactive genes
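The TAD definition above (more contacts inside a domain than across it) suggests how boundaries can be found in a Hi-C matrix. This is a simplified, insulation-score-style sketch on a hypothetical toy matrix; the notes do not specify a boundary-calling method, and real pipelines add normalization and other steps:

```python
def toy_contact_matrix(tad_sizes, within=10.0, between=1.0, diagonal=20.0):
    """Hypothetical Hi-C-like matrix: bins in the same TAD contact each other more often."""
    labels = [t for t, size in enumerate(tad_sizes) for _ in range(size)]
    n = len(labels)
    return [[diagonal if i == j else within if labels[i] == labels[j] else between
             for j in range(n)] for i in range(n)]

def insulation(matrix, w):
    """Mean contact across the edge between bin i and bin i+1, over a w x w window.

    Low values mean few contacts cross that point: a candidate TAD boundary.
    """
    n = len(matrix)
    scores = {}
    for i in range(w - 1, n - w):
        window = [matrix[a][b]
                  for a in range(i - w + 1, i + 1)
                  for b in range(i + 1, i + w + 1)]
        scores[i] = sum(window) / len(window)
    return scores

m = toy_contact_matrix([3, 3])        # two TADs: bins 0-2 and 3-5
scores = insulation(m, w=2)
print(scores)                          # {1: 5.5, 2: 1.0, 3: 5.5}
print(min(scores, key=scores.get))     # 2 -> boundary between bins 2 and 3
```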

Genome Folding
- Loop extrusion: loops are produced by progressive extrusion of chromatin by a loop-extrusion factor (such as cohesin), forming TADs

- Self-interactions/phase separation: forming A/B compartments
- attraction of heterochromatin to the nuclear lamina
- preferential attraction of similar chromatin to each other
- higher levels of chromatin mobility in active chromatin
- transcription-related clustering of euchromatin.
TADs and Transcription
Enhancers and promoters can interact through chromatin looping, with TAD facilitating this interaction.

PART 3-5 Human Evolution
Human origin and migration history
- Single Origin Hypothesis: out of Africa, displacing other archaic hominins
- Multiregional: humans spread out of Africa much earlier and interbred with other archaic hominins (gene flow)
- Neanderthals
- Denisovans
Human unique traits
- Intelligence and Language: non-coding DNA, FOXP2
- High Altitude Adaptation: EPAS1
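PART 4 Human Genetic Diseases
Composition of the Human Genome
- Single-copy DNA: the vast majority of protein-coding genes
- Repetitive DNA:
- Clustered repeats (satellite DNA): tandem repeats of short sequences; different satellite DNAs have different repeat units
- Interspersed repeats: short interspersed nuclear elements (SINEs) such as the Alu family, ~300 bp; long interspersed nuclear elements (LINEs), up to ~6 kb
- Segmental duplications: highly similar copies located far apart on the chromosomes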
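Chromosomal Disorders
- Abnormal chromosome segregation: aneuploidy, uniparental disomy
- Recurrent chromosomal syndromes: caused by recombination between segmental duplications
- Idiopathic chromosome aberrations: autosomal deletions (e.g., cri-du-chat syndrome)
- Sex chromosome aberrations: Turner syndrome (XO), Klinefelter syndrome (XXY)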
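Single-Gene Disorders
- Autosomal dominant: affected individuals are usually heterozygous; each child of an affected parent has a high (1/2) risk; appears in every generation
- Autosomal recessive: disease requires homozygosity; heterozygous carriers are unaffected; more frequent with consanguineous marriage
- X-linked dominant: no male-to-male transmission
- X-linked recessive: usually seen only in males, or affected males far outnumber affected females; females are mostly carriers (hemophilia A, color blindness)
- Pseudoautosomal inheritance: the gene lies in a pseudoautosomal region of the sex chromosomes; the pattern resembles autosomal dominant inheritance
- Unstable trinucleotide repeat expansion: Huntington disease and other neurodegenerative diseases
- mtDNA maternal inheritance: an affected father with an unaffected mother does not transmit the disease; mutant mtDNA may be lost in offspring; homoplasmy vs. heteroplasmy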
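The recurrence risks behind the autosomal and X-linked patterns above can be enumerated with a Punnett-square sketch. The crosses and allele symbols are illustrative choices (H/h for the X-linked gene), not cases from the notes:

```python
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Equally likely children: one allele from each parent (Punnett square)."""
    return list(product(parent1, parent2))

def fraction(children, condition):
    """Share of children that satisfy a condition."""
    hits = sum(1 for c in children if condition(c))
    return Fraction(hits, len(children))

# Autosomal dominant: affected heterozygote (Aa) x unaffected (aa); 'A' is the disease allele.
ad = cross("Aa", "aa")
print(fraction(ad, lambda c: "A" in c))              # 1/2 affected

# Autosomal recessive: carrier x carrier; affected only if homozygous aa.
ar = cross("Aa", "Aa")
print(fraction(ar, lambda c: c == ("a", "a")))       # 1/4 affected

# X-linked recessive: carrier mother (X^H X^h) x unaffected father (X^H Y).
xlr = cross(["XH", "Xh"], ["XH", "Y"])
sons = [c for c in xlr if "Y" in c]
daughters = [c for c in xlr if "Y" not in c]
print(fraction(sons, lambda c: "Xh" in c))           # 1/2 of sons affected
print(fraction(daughters, lambda c: "Xh" in c))      # 1/2 of daughters are carriers
```

In the X-linked case no daughter is affected, which is why affected males far outnumber affected females.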
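Methods for Analyzing Human Genetic Diseases
Pedigree-Based Linkage Analysis
- Uses pedigrees to trace how a disease is transmitted among family members, and tests for co-inheritance of the disease with a particular genomic region or even a specific mutation
- Recombination: loci very close together are completely linked; loci very far apart assort independently
- Linkage disequilibrium (LD): alleles at two or more loci occur together on the same chromosome more often than expected by chance, indicating that the loci are physically very close
- LD block: a cluster of sites whose alleles are in strong LD; blocks differ among populations, and the boundaries between LD blocks usually coincide with meiotic recombination hot spots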
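"More often than expected by chance" is usually quantified with the coefficient D = p(AB) − p(A)p(B) and its normalized form r², standard statistics that the notes do not name. A sketch with hypothetical two-locus haplotype counts:

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """D = p(AB) - p(A) * p(B) from two-locus haplotype counts, plus r^2."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B          # > 0: A and B co-occur more often than chance
    r2 = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r2

d, r2 = linkage_disequilibrium(40, 10, 10, 40)   # hypothetical haplotype counts
print(round(d, 3), round(r2, 3))                 # 0.15 0.36
```

With equal counts of all four haplotypes, D and r² would both be 0 (linkage equilibrium).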
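$LOD = \log_{10}\frac{L(\theta)}{L(\theta = 0.5)}$
- Evaluate LOD for $\theta$ from 0 to 0.5; if the maximum LOD exceeds 3, the $\theta$ at that maximum is taken as the recombination fraction
Example: in Family 1, one F3 individual with genotype AD is unaffected, so a recombination must have occurred; assuming $\theta=0.1$ gives LOD = 1.09. Repeating the calculation for several other families and summing their LOD scores, the total exceeds 3 at $\theta=0.06$, so $\theta=0.06$.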
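A minimal LOD sketch, assuming phase-known, fully informative meioses so that $L(\theta)=\theta^R(1-\theta)^{N-R}$ for R recombinants out of N meioses. The Family 1 pedigree is not reproduced here; as an assumption only, 1 recombinant among 8 such meioses is one configuration that gives LOD(0.1) ≈ 1.09. The extra families are hypothetical:

```python
import math

def lod(theta, recombinants, meioses):
    """LOD for phase-known meioses: log10[theta^R * (1 - theta)^(N - R) / 0.5^N]."""
    r, n = recombinants, meioses
    return (r * math.log10(theta)
            + (n - r) * math.log10(1 - theta)
            - n * math.log10(0.5))

def total_lod(theta, families):
    """Independent families: LOD scores add."""
    return sum(lod(theta, r, n) for r, n in families)

family1 = (1, 8)   # assumption: 1 recombinant among 8 informative meioses
print(round(lod(0.1, *family1), 2))        # 1.09

grid = [i / 1000 for i in range(1, 501)]   # theta = 0.001 ... 0.5
best = max(grid, key=lambda t: lod(t, *family1))
print(best, round(lod(best, *family1), 2)) # 0.125 1.1 (max at R/N, still below 3)

families = [family1, (0, 6), (1, 12)]      # hypothetical additional families
best_all = max(grid, key=lambda t: total_lod(t, families))
print(best_all, total_lod(best_all, families) > 3)
```

A single small family rarely reaches LOD 3; summing over families is what pushes the evidence past the threshold.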



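Population-Based Genome-Wide Association Studies (GWAS)
- Compare a cohort of affected individuals with unaffected controls from the same population to find alleles whose frequency is increased or decreased in the affected cohort
- Suited to complex, multifactorial diseases that do not follow Mendelian inheritance; helps find variants with small effects on complex traits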

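- Case-control study: $OR=\frac{a/b}{c/d}$ (OR ≠ 1 indicates association with the marker), where a and b are carriers of the marker allele with and without the disease, and c and d are non-carriers with and without the disease
- Cross-sectional or cohort study: $RR=\frac{a/(a+b)}{c/(c+d)}$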
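The two formulas side by side, using a hypothetical 2×2 table (rows: marker-allele carriers / non-carriers; columns: affected / unaffected), as a sketch:

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d), equivalently ad / bc."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

# hypothetical counts: a, b = carriers affected / unaffected; c, d = non-carriers affected / unaffected
a, b, c, d = 30, 70, 10, 90
print(round(odds_ratio(a, b, c, d), 2), round(relative_risk(a, b, c, d), 2))  # 3.86 3.0
```

OR and RR diverge when the disease is common in the sample; for rare diseases OR approximates RR, which is why case-control studies, where RR cannot be estimated directly, report OR.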
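Limitations:
- Spurious associations caused by population stratification: the population contains several subpopulations (e.g., by ancestry or religion), and individuals mate mainly within their own subpopulation
- A lenient significance threshold easily yields false positives: GWAS generally requires $P<5\times 10^{-8}$
- Often cannot identify the causal gene directly; it only finds genetic markers in linkage disequilibrium with the susceptibility factor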
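The $5\times 10^{-8}$ threshold is commonly explained as a Bonferroni correction. This sketch assumes roughly one million independent common-variant tests across the genome, an approximation rather than an exact count:

```python
alpha = 0.05                       # desired genome-wide false-positive rate
n_independent_tests = 1_000_000    # assumed number of independent tests genome-wide

threshold = alpha / n_independent_tests
expected_false_hits_uncorrected = alpha * n_independent_tests
print(f"{threshold:.0e}")                      # 5e-08
print(round(expected_false_hits_uncorrected))  # 50000 false hits expected at P < 0.05
```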
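Individual Whole-Genome Sequencing Analysis
- Suited to rare Mendelian diseases for which there is not enough pedigree material for linkage analysis
- The data usually need filtering:
- Keep protein-coding sequence: whole-exome sequencing
- Discard common variants (more frequent than expected for a rare disease)
- Keep nonsense, frameshift, and highly conserved splice-site mutations; discard synonymous or intronic mutations
- Keep mutations consistent with the likely mode of inheritance
- Limitations:
- Cannot analyze methylation disorders or uniparental disomy disorders
- Cannot interpret variants in non-coding and regulatory regions
- Cannot detect variants in highly repetitive sequences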
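The filtering steps above can be sketched as a small pipeline. Everything here is hypothetical: the variant records, field names, and the 1% frequency cutoff are illustrative and not from any specific annotation tool, the input is assumed to be exome variants already, and compound heterozygotes are ignored:

```python
# Hypothetical variant records; field names are illustrative only.
variants = [
    {"gene": "G1", "consequence": "frameshift",  "pop_af": 0.0001, "genotype": "hom_alt"},
    {"gene": "G2", "consequence": "synonymous",  "pop_af": 0.0002, "genotype": "hom_alt"},
    {"gene": "G3", "consequence": "nonsense",    "pop_af": 0.12,   "genotype": "hom_alt"},
    {"gene": "G4", "consequence": "splice_site", "pop_af": 0.0,    "genotype": "het"},
]

DAMAGING = {"nonsense", "frameshift", "splice_site"}
MAX_AF = 0.01   # assumed cutoff: common variants are unlikely to cause a rare disease

def candidates(variants, recessive=True):
    kept = []
    for v in variants:
        if v["consequence"] not in DAMAGING:
            continue   # drop synonymous / intronic changes
        if v["pop_af"] > MAX_AF:
            continue   # drop common variants
        if recessive and v["genotype"] != "hom_alt":
            continue   # recessive model (e.g. consanguineous family): require homozygosity
        kept.append(v["gene"])
    return kept

print(candidates(variants))                    # ['G1']
print(candidates(variants, recessive=False))   # ['G1', 'G4']
```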
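PART 4 Review Questions
How can genetic diseases in infants be screened?
- Preconception screening: whole-exome sequencing of both partners to check whether they carry pathogenic genes
- Embryo screening: check whether the embryo is homozygous for a pathogenic gene
- Newborn screening: check whether disease traits are present
How can genetic diseases be treated?
Alleles
- Genes at the same position (locus) on homologous chromosomes that control different forms of the same trait
- Transmission: laws of segregation and independent assortment (if unlinked)
- When do alleles separate, with and without recombination? Without a crossover between the gene and the centromere, at anaphase I; with such a crossover, at anaphase II
The Human Genome
- Types of variation: at the gene/sequence level (single-nucleotide polymorphisms, indels, copy number) and at the chromosome level (euploidy changes, aneuploidy, deletions)
- Molecular markers: SNPs, indels, microsatellites
- Whole-exome sequencing (why is it so widely used?) and the basic principle of whole-genome sequencing (how do the two differ?)
Pedigrees
- Possible modes of inheritance: dominant, recessive, X-linked
- Strategies for finding disease genes in consanguineous pedigrees
- Linkage analysis, association analysis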