Haruka Ozaki
August 17, 2018

ISMB2018 reading club: Predicting CTCF-mediated chromatin loops using CTCF-MP

These are the slides for "Predicting CTCF-mediated chromatin loops using CTCF-MP", presented at the ISMB2018 reading club.

Transcript

  1. ISMB2018 reading club. Assigned paper: Predicting CTCF-mediated chromatin loops using CTCF-MP. Zhang et al., Bioinformatics, 34, 2018, i133–i141. Presenter: Haruka Ozaki (@yuifu), University of Tsukuba
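  2. Proposed method: CTCF-MP (doi: 10.1093/bioinformatics/bty248). Prediction is done with XGBoost. This talk covers the feature design: word2vec, sequence conservation, and other features. XGBoost: a boosting algorithm that uses decision trees as weak classifiers.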
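  3. Continuous bag-of-words (CBOW) in word2vec. word2vec is a method for learning distributed representations of words (CBOW is one of its models). CBOW predicts a word from its surrounding words (the context). It is a fully connected neural network with a single hidden layer. The order of the surrounding words is ignored. The model yields a vector representation of each word: V distinct words are represented as N-dimensional vectors.

    Diagram labels: surrounding words (1-of-V representation; averaged, ignoring order), target word (1-of-V representation), vector representation (N dimensions). Source: http://mccormickml.com/assets/word2vec/Alex_Minnaar_Word2Vec_Tutorial_Part_II_The_Continuous_Bag-of-Words_Model.pdf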
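  4. Applying CBOW to k-mers. The k-mers contained in the sequence of a CTCF binding site are converted to vectors, and the vectors are averaged (a doc2vec-like idea?). Parameters: ±250 bp around the motif, k=6, N=100.

    Diagram labels: surrounding k-mers (1-of-V representation; averaged, ignoring order), target k-mer (1-of-V representation), vector representation (N dimensions). doi: 10.1093/bioinformatics/bty248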
  5. References

    word2vec
    - Distributed representations of words and phrases and their compositionality https://arxiv.org/abs/1310.4546
    - Efficient Estimation of Word Representations in Vector Space https://arxiv.org/abs/1301.3781
    - word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method https://arxiv.org/abs/1402.3722

    XGBoost
    - XGBoost: A Scalable Tree Boosting System https://arxiv.org/abs/1603.02754

    Previous work by the authors on word2vec for k-mers
    - Exploiting sequence-based features for predicting enhancer–promoter interactions https://doi.org/10.1093/bioinformatics/btx257
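  6. Appendix: Summary

    - Question: can features based on the DNA sequence predict whether a convergent pair of CTCF motifs forms a loop?
    - Difference from previous work: the study identified sequence features that are important for predicting loop formation, beyond the orientation of the CTCF motif pair and epigenomic data.
    - Techniques: vectorizing k-mers with word2vec, sequence-based and evolution-based features, XGBoost.
    - Validation: prediction accuracy stayed high even when training and validation used data from different cell types. Applying word2vec to k-mers performed better in comparison.
    - Discussion: evolutionary conservation is useful, and CTCF pairs can be predicted from sequence-based features alone.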
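  7. Appendix: The authors' previous work. At ISMB/ECCB 2017 the previous year, the authors applied k-mer word2vec + XGBoost to a different problem: Yang, Y. et al. (2017) Exploiting sequence-based features for predicting enhancer–promoter interactions. Bioinformatics, 33, i252–i260. That work showed that enhancer-promoter interactions can be predicted from sequence information alone.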
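
The k-mer featurization described in slides 3 and 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Only k=6 and N=100 come from the slides; everything else is an assumption made for illustration: the function names, the stride-1 overlapping k-mers, the context window of 2, the random placeholder embedding that stands in for trained word2vec weights, and the 501 bp random toy sequence that stands in for a window around a CTCF motif.

```python
import random
from itertools import product

K = 6    # k-mer length reported on the slide
N = 100  # embedding dimension reported on the slide


def kmers(seq, k=K):
    """Return the overlapping k-mers of seq, left to right (stride 1 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cbow_pairs(tokens, window=2):
    """Yield (context, target) pairs, as used to train a CBOW model.

    The context is the list of tokens within `window` positions of the target.
    """
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            yield context, target


def mean_vector(tokens, embedding, dim=N):
    """Average the embedding vectors of tokens, ignoring order and unknown tokens."""
    vecs = [embedding[t] for t in tokens if t in embedding]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Placeholder embedding: random vectors standing in for trained word2vec weights.
rng = random.Random(0)
vocab = ["".join(p) for p in product("ACGT", repeat=K)]  # V = 4**6 = 4096 k-mers
embedding = {kmer: [rng.uniform(-1, 1) for _ in range(N)] for kmer in vocab}

# Hypothetical toy input: a random 501 bp sequence, not real genomic data.
sequence = "".join(rng.choice("ACGT") for _ in range(501))
tokens = kmers(sequence)
features = mean_vector(tokens, embedding)
print(len(tokens), len(features))  # 496 100
```

In a real pipeline the placeholder embedding would be replaced by vectors learned from the (context, target) pairs, and the resulting N-dimensional sequence vector would be one of the inputs passed to the classifier.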