ISMB2018 reading club: Predicting CTCF-mediated chromatin loops using CTCF-MP

Haruka Ozaki
August 17, 2018

These are the slides for "Predicting CTCF-mediated chromatin loops using CTCF-MP", presented at the ISMB2018 reading club.

Transcript

  1. ISMB2018 reading club. Assigned paper: Predicting CTCF-mediated chromatin loops using CTCF-MP

    Zhang et al., Bioinformatics, 34, 2018, i133–i141. Presenter: Haruka Ozaki (@yuifu), University of Tsukuba
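  2. Chromatin loop formation by CTCF

    Chromatin loop formation: distant sites on chromatin come into close three-dimensional proximity.
    The mechanism of loop formation is important for understanding gene expression regulation and disease.
    Binding sites of the CTCF protein form loops with one another.
    Pairing is not exclusive (three or more sites are possible).
    Figure labels: CTCF binding site; CTCF-mediated chromatin loop. doi: 10.1093/bioinformatics/bty248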
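  3. What determines CTCF-mediated loop formation?

    Not every CTCF binding site forms a loop: 71.3% of CTCF binding sites are involved in loop formation.
    What decides whether a loop forms?
    Loops are thought to form more readily when the two CTCF motifs of a pair are in convergent orientation.
    However, this alone appears to be insufficient.
    Are there sequence features (other than motif orientation)?
    Figure labels: CTCF motif (a pattern characteristic of CTCF binding sequences); CTCF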
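The notion of a convergent motif pair can be made concrete with a small sketch. It assumes each motif is described by a chromosome, a start coordinate, and a strand; the `Motif` record and `pair_orientation` function are hypothetical names introduced here, not part of CTCF-MP.

```python
from typing import NamedTuple


class Motif(NamedTuple):
    """A CTCF motif occurrence (hypothetical record type for this sketch)."""
    chrom: str
    start: int
    strand: str  # "+" or "-"


def pair_orientation(a: Motif, b: Motif) -> str:
    """Classify the relative orientation of two motifs on one chromosome."""
    if a.chrom != b.chrom:
        raise ValueError("motifs must be on the same chromosome")
    # Order the two motifs by genomic position.
    left, right = sorted((a, b), key=lambda m: m.start)
    if left.strand == "+" and right.strand == "-":
        return "convergent"  # the motifs point toward each other
    if left.strand == "-" and right.strand == "+":
        return "divergent"   # the motifs point away from each other
    return "tandem"          # both motifs lie on the same strand


print(pair_orientation(Motif("chr1", 100, "+"), Motif("chr1", 5000, "-")))
```

Only pairs classified as convergent would enter the prediction problem described in the next slide.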
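  4. Problem setting: predict whether a pair of CTCF sites forms a loop

    Use supervised learning to predict whether a loop forms for a convergent pair of "CTCF-bound motifs" (*1).
    Positive examples: pairs in (*1) that form a loop, as measured by ChIA-PET.
    Negative examples: pairs in (*1) that do not form a loop, chosen so that the distance between the pair is similar to that of the positives (same number of samples as the positives).
    Figure labels: sequence; convergent pair of "CTCF-bound motifs"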
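One way to pick negatives whose pair distance resembles that of the positives is a greedy nearest-distance match, sketched below. The slides do not say which matching procedure the paper uses, so the greedy rule, the function name `match_negatives`, and the toy distances are all assumptions for illustration.

```python
def match_negatives(pos_distances, neg_candidates):
    """Pick one negative per positive with the closest pair distance.

    pos_distances:  pair distances (bp) of the looping (positive) pairs.
    neg_candidates: list of (pair_id, distance) for non-looping pairs.
    Greedy matching without replacement; an illustrative choice only.
    """
    remaining = list(neg_candidates)
    chosen = []
    for d in pos_distances:
        if not remaining:
            break  # ran out of candidates
        best = min(remaining, key=lambda c: abs(c[1] - d))
        chosen.append(best)
        remaining.remove(best)  # each negative is used at most once
    return chosen


positives = [10_000, 250_000]
candidates = [("n1", 12_000), ("n2", 900_000), ("n3", 240_000), ("n4", 11_000)]
print(match_negatives(positives, candidates))
```

Matching on distance keeps a classifier from separating the two classes by pair distance alone.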
  5. Proposed method: CTCF-MP

    Prediction with XGBoost.
    Feature design: word2vec, sequence conservation, and others.
    XGBoost: a boosting algorithm that uses decision trees as weak classifiers.
    doi: 10.1093/bioinformatics/bty248
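The idea of boosting with trees as weak learners can be sketched in plain Python. This is not XGBoost itself: it is ordinary gradient boosting with squared-error loss and depth-1 trees (stumps), without XGBoost's regularization or second-order updates. The feature columns and toy values below are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:
        raise ValueError("no valid split: all feature vectors are identical")
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_boosting(xs, ys, n_rounds=20, lr=0.5):
    """Each round fits a stump to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, lr, stumps


def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


# Toy features: [pair distance in kb, conservation score]; label 1 = loop.
xs = [[10, 0.9], [20, 0.8], [300, 0.2], [500, 0.1]]
ys = [1, 1, 0, 0]
model = fit_boosting(xs, ys)
print(predict(model, [15, 0.85]) > 0.5, predict(model, [400, 0.15]) > 0.5)
```

In practice one would call the XGBoost library rather than hand-rolling the loop; the sketch only shows how weak tree learners are added one after another.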
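  6. Continuous bag-of-words (CBOW) in word2vec

    word2vec is a method for learning distributed representations of words (CBOW is one of its models).
    CBOW predicts a word from its surrounding words (context).
    It is a fully connected neural network with a single hidden layer.
    The order of the surrounding words is ignored.
    It yields a vector representation of each word: V distinct words are represented as N-dimensional vectors.
    Figure labels: surrounding words (1-of-V encoding, averaged ignoring order); target word (1-of-V encoding); vector representation (N dimensions)
    http://mccormickml.com/assets/word2vec/Alex_Minnaar_Word2Vec_Tutorial_Part_II_The_Continuous_Bag-of-Words_Model.pdf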
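The CBOW forward pass can be sketched as follows: average the context embeddings into a hidden vector, score every vocabulary word against it, and apply a softmax. The matrix names `w_in` and `w_out` and the toy sizes are assumptions for illustration; training is omitted.

```python
import math
import random


def cbow_forward(context_ids, w_in, w_out):
    """Return P(word | context) for every word in the vocabulary.

    w_in:  V x N input embeddings (one N-dimensional vector per word).
    w_out: V x N output embeddings.
    """
    n = len(w_in[0])
    # Hidden layer: average of the context embeddings (word order is ignored).
    hidden = [sum(w_in[i][d] for i in context_ids) / len(context_ids) for d in range(n)]
    # Score every vocabulary word against the hidden vector.
    scores = [sum(u[d] * hidden[d] for d in range(n)) for u in w_out]
    # Softmax over the V scores (shifted by the maximum for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
V, N = 5, 3  # toy vocabulary size and embedding dimension
w_in = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(V)]
w_out = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(V)]
probs = cbow_forward([0, 2, 4], w_in, w_out)
print(len(probs), abs(sum(probs) - 1.0) < 1e-9)
```

After training, the rows of `w_in` are what is normally used as the N-dimensional word vectors.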
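  7. Applying CBOW to k-mers

    The k-mers contained in the sequence of a CTCF binding site are turned into vectors, and the vectors are averaged (a doc2vec-like idea?).
    Parameters: ±250 bp around the motif; k=6, N=100.
    Figure labels: surrounding k-mers (1-of-V encoding, averaged ignoring order); target k-mer (1-of-V encoding); vector representation (N dimensions)
    doi: 10.1093/bioinformatics/bty248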
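Turning a sequence into one feature vector by averaging k-mer embeddings can be sketched as below. The stride of 1 for k-mer extraction, the function names, and the toy embedding table are assumptions; in the slides the embeddings come from word2vec with k=6 and N=100.

```python
def kmers(seq, k=6):
    """Overlapping k-mers of a DNA sequence (stride 1, an assumption here)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq, embeddings, k=6):
    """Average the embedding vectors of all k-mers present in `embeddings`."""
    vecs = [embeddings[km] for km in kmers(seq, k) if km in embeddings]
    if not vecs:
        raise ValueError("no k-mer of the sequence has an embedding")
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


# Toy 2-dimensional embeddings for k=3, purely for illustration.
toy = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
print(kmers("ACGTA", k=3))
print(sequence_vector("ACGTA", toy, k=3))
```

The averaged vector has the same dimension as a single k-mer embedding, so every binding site yields a fixed-length feature regardless of sequence length.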
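  8. Results

    word2vec features gave higher accuracy than k-mers alone.
    word2vec and the other sequence-related features carry complementary information.
    Sequence-related features carry information complementary to epigenome-related features.
    Figure labels: evolutionary conservation; distance between the pair; whether a motif occurs between the pair
    doi: 10.1093/bioinformatics/bty248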
  9. Results: interpretation

    Running t-SNE on the word2vec features separated positive from negative examples (by eye).
    The word2vec features with high contribution are shared across cell types.
    doi: 10.1093/bioinformatics/bty248
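  10. Impressions

    Why was it accepted at ISMB?
    The problem setting is new. Prediction from functional genomics data became popular with ENCODE, but recently the trend seems to be using machine learning to ask "how much information does the sequence carry?"
    The detailed interpretation and experiments are thorough (hard to reject), e.g. feature analysis, analysis of the relationship between cell-type specificity and prediction accuracy, and accuracy evaluation of models trained on a different cell type.
    Complaints:
    The sequence-related features other than word2vec already reached accuracy close to word2vec on their own (the contribution feels thin).
    In the end the learned result cannot be interpreted (it feels half-finished to someone who works with epigenomes every day).
    Other:
    Could word2vec be used for k-mer frequency-based tasks? (I have not checked whether this already exists.)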
  11.  

  12. References

    word2vec
    - Distributed representations of words and phrases and their compositionality https://arxiv.org/abs/1310.4546
    - Efficient Estimation of Word Representations in Vector Space https://arxiv.org/abs/1301.3781
    - word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method https://arxiv.org/abs/1402.3722
    XGBoost
    - XGBoost: A Scalable Tree Boosting System https://arxiv.org/abs/1603.02754
    Prior work by the authors on word2vec for k-mers
    - Exploiting sequence-based features for predicting enhancer–promoter interactions https://doi.org/10.1093/bioinformatics/btx257
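  13. Appendix: summary

    The paper examines whether features based on DNA sequence can predict whether a convergent pair of CTCF motifs forms a loop.
    Difference from prior work: it identifies sequence features important for loop prediction beyond CTCF motif pair orientation and the epigenome.
    Techniques: k-mers vectorized with word2vec, sequence- and evolution-based features, XGBoost.
    Validation: prediction stayed highly accurate even when training and validation used data from different cell types. word2vec did better than plain k-mers.
    Discussion: evolutionary conservation is useful; CTCF pairs can be predicted from sequence-based features alone.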
  14. Appendix: the authors' prior work

    At last year's ISMB/ECCB 2017, the authors applied k-mer word2vec + XGBoost to a different problem.
    Yang, Y. et al. (2017) Exploiting sequence-based features for predicting enhancer–promoter interactions. Bioinformatics, 33, i252–i260.
    They showed that enhancer-promoter interactions can be predicted from sequence information alone.
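  15. Appendix: explanation of CBOW

    Intuitive understanding of the model: a model that predicts a word from its surrounding words (context).
    Objective function of the model: every score computation requires a sum over all word types in the vocabulary → expensive.
    Objective function of the model (negative sampling): computed once for each (w, c) pair.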
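  16. Appendix: explanation of negative sampling

    An objective function introduced to train the CBOW model efficiently.
    For a given context (surrounding words), a binary classifier is trained to distinguish the true word from noise words sampled from a noise distribution.
    For each word-context set, several (m) fake words are prepared.
    Figure labels: correct word-context pair; incorrect word-context pair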
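The negative sampling objective for one word-context pair can be sketched as a sum of binary logistic losses: one term for the true word and one for each of the m noise words. The vector names are assumptions, and sampling from the noise distribution is left out.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(hidden, true_vec, noise_vecs):
    """Loss for one (word, context) pair with m noise words.

    hidden:     context representation (averaged context embeddings).
    true_vec:   output vector of the true word.
    noise_vecs: output vectors of the m sampled noise words.
    """
    # The true word should score high against the context.
    loss = -math.log(sigmoid(dot(true_vec, hidden)))
    # Each noise word should score low against the context.
    for v in noise_vecs:
        loss -= math.log(sigmoid(-dot(v, hidden)))
    return loss


hidden = [1.0, 0.0]
good = negative_sampling_loss(hidden, [2.0, 0.0], [[-2.0, 0.0], [-2.0, 0.0]])
bad = negative_sampling_loss(hidden, [-2.0, 0.0], [[2.0, 0.0], [2.0, 0.0]])
print(good < bad)
```

Each evaluation touches only 1 + m output vectors instead of all V, which is what makes training cheaper than the full softmax.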