L19 Gene Regulation

shaunmahony
March 23, 2022

BMMB 554 L19 Gene Regulation

Transcript

  1. Today’s learning objectives
    • Understand how transcription factors are important for gene regulation.
    • Learn how to represent the binding preference of a transcription factor.
    • Learn how to find potential binding sites in DNA sequences.
  2. Regulation of genes/proteins can be affected by many mechanisms
    • Chromatin structure
    • Epigenetic modifications / histone marks
    • General transcriptional machinery
    • Sequence-specific transcription factors
    • Activators & repressors
    • Non-coding RNA activity
    • Post-transcriptional regulators
    • Alternate splicing
    • miRNAs, siRNAs, etc.
    • Post-translational modifications
    • Protein degradation
  3. Regulation of genes/proteins can be affected by many mechanisms
    • Chromatin structure / nucleosome positions
    • Epigenetic modifications / histone marks
    • General transcriptional machinery
    • Sequence-specific transcription factors
    • Activators & repressors
    • Non-coding RNA activity
    • Post-transcriptional regulators
    • Alternate splicing
    • miRNAs, siRNAs, etc.
    • Post-translational modifications
    • Protein degradation
    Slide annotation: Assayable by ChIP-seq
  4. TFs can promote transcriptional initiation
    Distal enhancers can promote pre-initiation complex formation and transcriptional initiation via Mediator.
    Figure source: Molecular Biology of the Cell (Garland Science)
  5. TFs can promote transcriptional elongation
    Promoter-proximal pausing occurs at most Pol II genes. Activators can promote the release of the paused polymerase to enable active elongation.
    Figure source: Adelman & Lis, Nature Reviews Genetics (2012)
  6. TFs promote chromatin remodeling
    Chromatin remodeling complexes:
    • Histone acetyltransferases (HATs) / deacetylases
    • Methyltransferases / demethylases
    • ATP-dependent chromatin remodelers
    • Mediator, cohesin
    • GTFs, Pol II
    Figure source: Molecular Biology of the Cell (Garland Science)
  7. Transcription factors are key to cellular identity
    Figure: TF combinations associated with conversions between cell types.
    • Cell types shown: pluripotent, fibroblast, motor neuron, dopa+ neuron, cardiomyocyte, cortical neuron, exocrine, β-cell, myoblast
    • TF sets shown: Ascl1, Brn2, Myt1l (shown twice); Ngn2, Isl1, Lhx3; Ascl1, Nurr1, Lmx1a; Gata4, Mef2c, Tbx5; Ngn3, Pdx1, Mafa; MyoD; Oct4, Sox2, Klf4, c-Myc
  8. Transcription factors are key to cellular identity
    Figure: TF combinations associated with conversions between cell types, with oncogenesis added.
    • Cell types shown: pluripotent, fibroblast, motor neuron, dopa+ neuron, cardiomyocyte, cortical neuron, exocrine, β-cell, myoblast
    • TF sets shown: Ascl1, Brn2, Myt1l (shown twice); Ngn2, Isl1, Lhx3; Ascl1, Nurr1, Lmx1a; Gata4, Mef2c, Tbx5; Ngn3, Pdx1, Mafa; MyoD; Oct4, Sox2, Klf4, c-Myc
    • Oncogenesis: oncogenes (myc), tumor suppressors (p53)
  9. Transcription factor binding sites
    • Short: typically 6-20 bp long.
    • Degenerate: TFs have favorite binding sequences but don’t require a perfect match to bind.
    Figure labels ("Likes to bind:"): CCGGAA, TGACCT..TGACCT, ATTA, CACGTG
  10. We can represent a TF’s binding preference as a multiple alignment
    TF: NFκB
    Favorite sequence: GGGAATTTCC
    Known NFκB binding sites:
      GGGAATTTCC
      GGGAATTTCC
      GGGAATTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGAAATTCC
      GGGACTTCCC
      GGGGATTTCC
      GGGGATTTCC
      GGGGATTTCC
      GGGATTTTCC
      GGGGATTCCC
      GGGATTTCCC
      GGGAATTCAC
      GGGGCTTTCC
      GGGGCTTTCC
      GGGAAGTCCC
    Consensus sequence: GGGRNWTYCC
    Weight matrix: (shown as a figure)
    Definition: A motif is a pattern of conservation in a sequence alignment.
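    A consensus string can be tested against candidate sites by expanding each IUPAC code into the bases it permits. The sketch below is illustrative only: the names `IUPAC` and `matches_consensus` are hypothetical and do not come from the slides. It shows that a consensus is a yes/no test, so some of the known sites listed on this slide fail it, which is one motivation for the weight matrix representation.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(site, consensus):
    """Return True if every base of `site` is allowed by the IUPAC code at that position."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, consensus))


if __name__ == "__main__":
    consensus = "GGGRNWTYCC"  # consensus sequence from the slide
    for site in ["GGGAATTTCC", "GGGGATTCCC", "GGGAAGTCCC", "GGGAATTCAC"]:
        print(site, matches_consensus(site, consensus))
```

    In this sketch GGGAAGTCCC is rejected because G is not allowed by W at position 6, and GGGAATTCAC is rejected because A is not C at position 9, even though both appear in the list of known sites.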
  11. Position weight matrices
    Definition: A position weight matrix (PWM) is a matrix of log-likelihood values that gives a weighted match to fixed-length strings. Also known as a position-specific scoring matrix (PSSM).

    Aligned binding sites (20 sequences):
      GGGAATTTCC
      GGGAATTTCC
      GGGAATTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGACTTTCC
      GGGAAATTCC
      GGGACTTCCC
      GGGGATTTCC
      GGGGATTTCC
      GGGGATTTCC
      GGGATTTTCC
      GGGGATTCCC
      GGGATTTCCC
      GGGAATTCAC
      GGGGCTTTCC

    Count matrix:
            1   2   3   4   5   6   7   8   9  10
      A     0   0   0  15   9   1   0   0   1   0
      C     0   0   0   0   9   0   0   4  19  20
      G    20  20  20   5   0   0   0   0   0   0
      T     0   0   0   0   2  19  20  16   0   0

    Relative frequency matrix (p_{i,j}: probability of observing letter i at position j):
              1     2     3     4     5     6     7     8     9    10
      A    0.00  0.00  0.00  0.75  0.45  0.05  0.00  0.00  0.05  0.00
      C    0.00  0.00  0.00  0.00  0.45  0.00  0.00  0.20  0.95  1.00
      G    1.00  1.00  1.00  0.25  0.00  0.00  0.00  0.00  0.00  0.00
      T    0.00  0.00  0.00  0.00  0.10  0.95  1.00  0.80  0.00  0.00
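    The step from aligned sites to a count matrix and then a relative frequency matrix can be sketched in a few lines. This is a minimal illustration, not the author's code: the names `SITES`, `count_matrix`, and `frequency_matrix` are hypothetical. With the 20 sites listed on this slide, every column of the count matrix sums to 20.

```python
# The 20 aligned NFkB binding sites listed on this slide.
SITES = (
    ["GGGAATTTCC"] * 3
    + ["GGGACTTTCC"] * 7
    + ["GGGAAATTCC", "GGGACTTCCC"]
    + ["GGGGATTTCC"] * 3
    + ["GGGATTTTCC", "GGGGATTCCC", "GGGATTTCCC", "GGGAATTCAC", "GGGGCTTTCC"]
)


def count_matrix(sites):
    """Count how often each base occurs at each alignment position."""
    length = len(sites[0])
    counts = {base: [0] * length for base in "ACGT"}
    for site in sites:
        for j, base in enumerate(site):
            counts[base][j] += 1
    return counts


def frequency_matrix(counts):
    """Divide each count by the number of sites to get relative frequencies p[i][j]."""
    n_sites = sum(counts[base][0] for base in "ACGT")
    return {base: [c / n_sites for c in counts[base]] for base in "ACGT"}


if __name__ == "__main__":
    counts = count_matrix(SITES)
    freqs = frequency_matrix(counts)
    for base in "ACGT":
        print(base, counts[base], [f"{p:.2f}" for p in freqs[base]])
```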
  12. Motif logos
    Sequence logo: (shown as a figure)
    Information content: $IC_j = 2 - \left( -\sum_{i=A}^{T} p_{i,j} \log_2 p_{i,j} \right)$
    Letter height in logo: $\mathrm{Height}_{i,j} = p_{i,j} \, IC_j$

    Relative frequency matrix (p_{i,j}: probability of observing letter i at position j):
              1     2     3     4     5     6     7     8     9    10
      A    0.00  0.00  0.00  0.75  0.45  0.05  0.00  0.00  0.05  0.00
      C    0.00  0.00  0.00  0.00  0.45  0.00  0.00  0.20  0.95  1.00
      G    1.00  1.00  1.00  0.25  0.00  0.00  0.00  0.00  0.00  0.00
      T    0.00  0.00  0.00  0.00  0.10  0.95  1.00  0.80  0.00  0.00
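    The information content and letter-height formulas on this slide can be evaluated per column as follows. This is a sketch under one stated assumption: terms with p = 0 are treated as contributing 0 to the sum (the usual convention, since log2(0) is undefined); the slide does not say how zeros are handled. The function names are hypothetical.

```python
import math


def information_content(column):
    """IC_j = 2 - entropy of the column, in bits.

    `column` holds the probabilities of A, C, G, T at one position.
    Assumption: terms with p == 0 are skipped (treated as 0 * log2(0) = 0).
    """
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy


def letter_heights(column):
    """Height_{i,j} = p_{i,j} * IC_j for each letter in the column."""
    ic = information_content(column)
    return [p * ic for p in column]


if __name__ == "__main__":
    # Position 4 of the relative frequency matrix on this slide: A=0.75, G=0.25.
    column_4 = [0.75, 0.00, 0.25, 0.00]
    print(information_content(column_4))
    print(letter_heights(column_4))
```

    A fully conserved column (such as position 1, all G) reaches the maximum of 2 bits, while a uniform column carries 0 bits and would be invisible in the logo.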
  13. Position weight matrices
    Position weight matrix: $m_{i,j} = \log_2 (p_{i,j} / b_i)$, where $b_i$ is the background probability of letter i.

    Position weight matrix:
               1      2      3      4      5      6      7      8      9     10
      A    -4.64  -4.64  -4.64   1.55   0.82  -2.64  -4.64  -4.64  -2.64  -4.64
      C    -4.64  -4.64  -4.64  -4.64   0.82  -4.64  -4.64  -0.40   1.91   1.96
      G     1.96   1.96   1.96   0.00  -4.64  -4.64  -4.64  -4.64  -4.64  -4.64
      T    -4.64  -4.64  -4.64  -4.64  -1.06   1.91   1.96   1.66  -4.64  -4.64

    Relative frequency matrix (p_{i,j}: probability of observing letter i at position j):
              1     2     3     4     5     6     7     8     9    10
      A    0.01  0.01  0.01  0.73  0.44  0.04  0.01  0.01  0.04  0.01
      C    0.01  0.01  0.01  0.01  0.44  0.01  0.01  0.19  0.94  0.97
      G    0.97  0.97  0.97  0.25  0.01  0.01  0.01  0.01  0.01  0.01
      T    0.01  0.01  0.01  0.01  0.12  0.94  0.97  0.79  0.01  0.01

    Background frequencies (b_i: probability of observing letter i):
      A  0.25
      C  0.25
      G  0.25
      T  0.25
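    The log-odds conversion on this slide can be reproduced directly from the relative frequency matrix shown here. Note that this matrix contains no zeros (the smallest entry is 0.01), which suggests some smoothing or pseudocount was applied; that is an inference, since the slide does not state the procedure, so the sketch simply starts from the frequencies as given. The names `FREQS`, `BACKGROUND`, and `log_odds_matrix` are hypothetical.

```python
import math

# Relative frequency matrix as shown on this slide (rows A, C, G, T; columns 1-10).
FREQS = {
    "A": [0.01, 0.01, 0.01, 0.73, 0.44, 0.04, 0.01, 0.01, 0.04, 0.01],
    "C": [0.01, 0.01, 0.01, 0.01, 0.44, 0.01, 0.01, 0.19, 0.94, 0.97],
    "G": [0.97, 0.97, 0.97, 0.25, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    "T": [0.01, 0.01, 0.01, 0.01, 0.12, 0.94, 0.97, 0.79, 0.01, 0.01],
}

# Uniform background frequencies b_i from the slide.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(freqs, background):
    """Compute m[i][j] = log2(p[i][j] / b[i]) for every letter i and position j."""
    return {
        base: [math.log2(p / background[base]) for p in freqs[base]]
        for base in freqs
    }


if __name__ == "__main__":
    pwm = log_odds_matrix(FREQS, BACKGROUND)
    for base in "ACGT":
        print(base, " ".join(f"{m:6.2f}" for m in pwm[base]))
```

    Positive entries mark letters that are enriched relative to background at that position, negative entries mark depleted letters, and 0.00 means the letter is exactly as frequent as background.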
  14. Motif scanning: finding instances of a known motif
    Position weight matrix:
               1      2      3      4      5      6      7      8      9     10
      A    -4.70  -4.70  -4.70   1.55   0.82  -2.12  -4.70  -4.70  -2.12  -4.70
      C    -4.70  -4.70  -4.70  -4.70   0.82  -4.70  -4.70  -0.31   1.88   1.96
      G     1.96   1.96   1.96   0.00  -4.70  -4.70  -4.70  -4.70  -4.70  -4.70
      T    -4.70  -4.70  -4.70  -4.70  -1.24   1.88   1.96   1.64  -4.70  -4.70

    >Test Sequence
    TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT

    The matrix is slid along the sequence one position at a time, and each 10 bp window is scored by summing the matrix entries for its letters:
    • Window TTACGTTTGT (start of the sequence): Score = -27.42
    • Window GGGACTTTCC: Score = 17.57
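    The sliding-window scan on this slide can be sketched as below, using the matrix and test sequence exactly as shown. This is a minimal illustration that scans the forward strand only; practical scanners typically also score the reverse complement and apply a score threshold, neither of which appears on the slide. The names `PWM`, `TEST_SEQUENCE`, and `scan` are hypothetical.

```python
# Position weight matrix from this slide (rows A, C, G, T; columns 1-10).
PWM = {
    "A": [-4.70, -4.70, -4.70, 1.55, 0.82, -2.12, -4.70, -4.70, -2.12, -4.70],
    "C": [-4.70, -4.70, -4.70, -4.70, 0.82, -4.70, -4.70, -0.31, 1.88, 1.96],
    "G": [1.96, 1.96, 1.96, 0.00, -4.70, -4.70, -4.70, -4.70, -4.70, -4.70],
    "T": [-4.70, -4.70, -4.70, -4.70, -1.24, 1.88, 1.96, 1.64, -4.70, -4.70],
}

TEST_SEQUENCE = "TTACGTTTGTCGATTTATGGGACTTTCCTCTTCGTATTTATTAGGCT"


def scan(sequence, pwm):
    """Score every window of motif width; return a list of (start, score) pairs.

    `start` is the 0-based offset of the window in `sequence`.
    """
    width = len(pwm["A"])
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[base][j] for j, base in enumerate(window))
        scores.append((start, score))
    return scores


if __name__ == "__main__":
    scores = scan(TEST_SEQUENCE, PWM)
    best_start, best_score = max(scores, key=lambda pair: pair[1])
    print(f"first window score: {scores[0][1]:.2f}")
    print(f"best window: {TEST_SEQUENCE[best_start:best_start + 10]} "
          f"at offset {best_start}, score {best_score:.2f}")
```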
  15. Motif scanning tools
    • FIMO: http://meme-suite.org/tools/fimo
      • Part of the MEME suite of motif analysis tools.
    • MotifViz: http://biowulf.bu.edu/MotifViz/
      • Various motif scanning tools included.
    • MATCH: Kel, et al. Nucleic Acids Res (2003)
    • TAMO: Gordon, et al. Bioinformatics (2005)
    • MotifScanner/TOUCAN: Aerts, et al. Nucleic Acids Res (2005)
  16. Databases of known motifs
    • Jaspar:
      • http://jaspar.genereg.net
      • Open access.
      • High quality motifs, medium coverage.
    • cis-bp:
      • http://cisbp.ccbr.utoronto.ca
      • Mostly based on in vitro protein binding microarray experiments.
      • Comprehensive: known or predicted motifs for most human TFs.
  17. How do transcription factors recognize their binding sites?
    TF: Isl1
    Binding sites (aligned examples from the figure):
      CTAATTA
      CTAATTA
      TTAAGTA
      CTAATGG
      TTAATTG
      TTAAGGA
      CTAATGA
      GTAATTG
    Sequence that best matches motif:
    • 206,801 exact matches in mouse genome.
    • 2,807 of these are bound by Isl1 in motor neurons.
    In general (vertebrate genomes):
    • Millions of motif instances.
    • Tens of thousands of binding sites.
  18. Most TF binding motif instances are unbound
    • Essentially all TF binding motif occurrences will have no function in a given cell type*.
    • How can we focus on motif instances that are more likely to be bound & functional?
      • Conservation
      • Cis-regulatory modules (i.e. clusters of sites)
      • Measuring TF binding (ChIP-seq)
      • Accessibility (DNaseI hypersensitivity)
      • Chromatin marks (H3K4me1, H3K27ac)
    Slide annotation: Cell type dependent: we would need experimental data
    * Wasserman & Sandelin, Nature Reviews Genetics (2004)
  19. Limitations of phylogenetic footprinting
    • Only ~60% of TF binding sites are directly conserved between human & mouse.
    • Perhaps even fewer TF binding sites are functionally conserved.
    • Phylogenetic shadowing approaches allow conservation analysis of binding sites across more closely related species.
  20. Cis-regulatory modules
    • Idea: motif instances should be clustered at real enhancers.
    • Different combinations of TFs may have different regulatory effects.
    Figure label: Regulatory synergy
  21. Cis-regulatory modules
    • Multiple TF binding sites will form a stronger binding region.
    • Motif instances for cooperating TFs should be located together.
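    One simple way to act on the clustering idea is to group motif hits that lie close together and keep only groups with several hits. The sketch below is an assumption-laden illustration, not a method named in the slides: the function `cluster_hits` and its thresholds `max_gap` and `min_hits` are hypothetical, and the default values are arbitrary.

```python
def cluster_hits(positions, max_gap=100, min_hits=3):
    """Group motif hit positions into candidate cis-regulatory modules.

    Consecutive hits (after sorting) are placed in the same group while the
    distance to the previous hit is at most `max_gap` bp. Groups with at least
    `min_hits` hits are returned as (first_position, last_position, n_hits).
    Both thresholds are illustrative assumptions.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            # Gap too large: close the current group and start a new one.
            if len(current) >= min_hits:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_hits:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


if __name__ == "__main__":
    # Hypothetical hit coordinates on one chromosome (made up for illustration).
    hits = [10, 50, 90, 1000, 5000, 5040, 5100, 5150]
    print(cluster_hits(hits))
```

    Hits from several different TF motifs can be pooled into `positions` before clustering if the goal is to find regions where cooperating TFs have sites close together.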
  22. How do transcription factors recognize their regulatory targets in a given cell type?
    Figure panels (each shown BEFORE and AFTER):
    • Chromatin state dependent accessibility (TF XYZ)
    • Cooperative interactions (TF XYZ)
    • Pioneer binding (TF)
    • No dependence on prior state…
  23. Chromatin structure may determine TF binding locations
    • Regulatory sites are typically located in regions of accessible chromatin.
    • Enhancers and promoters have characteristic histone modifications.
      • H3K4me1 at enhancers
      • H3K4me3 at promoters
    • TFs may interact differently with methylated DNA.
    • Higher-order genome topology may also play a role in TF binding.
    Figure source: Rosa & Shaw, Biology (2013)
  24. Chromatin immunoprecipitation (ChIP)
    1. Crosslink: Use formaldehyde or UV light to covalently crosslink proteins to DNA (i.e. stick everything together).
    2. Lyse: Use a lysis buffer to break down cell membranes and release chromatin.
    3. Shear: Use sonication or nuclease digestion to break the crosslinked chromatin into small fragments.
  25. Chromatin immunoprecipitation (ChIP)
    4. Immunoprecipitate: Use antibodies attached to beads to select for DNA fragments that have a protein of interest attached. Separate out the beads using a magnet for magnetic beads or centrifugation for agarose beads.
    Figure label: magnetic bead with attached antibodies
  26. Chromatin immunoprecipitation (ChIP)
    5. Reverse crosslinks: Incubate at 70°C to reverse cross-linking (i.e. unstick DNA from protein), and separate the DNA.
  27. Chromatin immunoprecipitation (ChIP)
    6. Library preparation: Select DNA fragments that are ~200 bp in length. Amplify the fragments using PCR.
    Figure labels: Size select, PCR amplify
  28. ChIP-seq
    • Directly sequence ChIPed DNA fragments using a high-throughput sequencer.
      • Sequence one or both fragment ends?
      • → FASTQ file
    • Map the sequenced reads back to the genome.
      • Bowtie / BWA
      • → BAM file
  29. Summary
    • We cannot (yet) predict where a given transcription factor will bind in a given cell type.
    • Motif scanning yields too many potential sites.
    • ChIP-seq or other experimental approaches are required.