Slide 1

Slide 1 text

ISMB 2018 reading group
Assigned paper: Predicting CTCF-mediated chromatin loops using CTCF-MP. Zhang et al., Bioinformatics, 34, 2018, i133–i141
Presenter: Haruka Ozaki (@yuifu), University of Tsukuba

Slide 2

Slide 2 text

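Chromatin loop formation by CTCF
Chromatin loop formation
- Distant sites on chromatin come into close proximity in 3D space
- The mechanism of loop formation is important for understanding gene expression regulation and disease
Binding sites of the CTCF protein form loops with each other
- Pairing is not exclusive (three or more sites can be involved)
Figure labels: CTCF binding site; chromatin loop formed by CTCF (doi: 10.1093/bioinformatics/bty248)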

Slide 3

Slide 3 text

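What determines CTCF-mediated loop formation?
Not all CTCF binding sites form loops
- 71.3% of CTCF binding sites are involved in loop formation
What determines loop formation?
- Loops are thought to form more readily when the pair of CTCF motifs is in convergent orientation
- However, this alone is (apparently) not sufficient
- Are there sequence features (other than motif orientation)?
Figure labels: CTCF motif (a pattern characteristic of CTCF binding sequences); CTCF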

Slide 4

Slide 4 text

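Problem setting: predict whether a pair of CTCF sites forms a loop
Use supervised learning to predict whether a loop forms for a convergent pair of "motifs bound by CTCF" (*1)
- Positive examples: pairs in (*1) that form a loop
  - Measured by ChIA-PET
- Negative examples: pairs in (*1) that do not form a loop
  - Chosen so that the distance between the pair is similar to that of the positive examples (same number of samples as the positives)
Figure labels: "the sequence ..." (fragment); convergent pair of "motifs bound by CTCF"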
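The distance matching of negative examples can be sketched as follows. This is a minimal illustration, not the authors' procedure: the greedy nearest-distance rule, the function names, and the toy coordinates are all assumptions made for the example.

```python
def pair_distance(pair):
    """Genomic distance between the two motifs of a pair, given as (pos_a, pos_b)."""
    a, b = pair
    return abs(b - a)


def sample_matched_negatives(positives, candidates):
    """For each positive pair, pick the unused non-looping candidate whose
    distance is closest (assumed greedy rule, one negative per positive)."""
    remaining = list(candidates)
    chosen = []
    for pos in positives:
        if not remaining:
            break
        target = pair_distance(pos)
        best = min(remaining, key=lambda c: abs(pair_distance(c) - target))
        chosen.append(best)
        remaining.remove(best)  # sample without replacement
    return chosen


# Toy coordinates (hypothetical): looping pairs and non-looping candidate pairs.
positives = [(100, 5100), (2000, 52000)]
candidates = [(0, 4800), (10, 90000), (300, 49000), (500, 700)]
negatives = sample_matched_negatives(positives, candidates)
```

Matching the distance distribution keeps the classifier from separating the two classes by pair distance alone.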

Slide 5

Slide 5 text

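Proposed method: CTCF-MP
Figure labels (doi: 10.1093/bioinformatics/bty248): prediction with XGBoost; this talk covers feature design (Word2vec, sequence conservation, others)
XGBoost: a type of boosting algorithm that uses decision trees as weak classifiers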
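To make "boosting with decision trees as weak learners" concrete, here is a small pure-Python sketch. It is not XGBoost: it is plain gradient boosting on the logistic loss with depth-1 trees (stumps), without the second-order terms and regularization that XGBoost adds. The function names and toy data are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return the (feature, threshold, left_value, right_value) split
    with the lowest squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:
        raise ValueError("no valid split: all feature values are identical")
    return best[1:]


def raw_score(stumps, row, lr):
    """Sum of the shrunken stump outputs (a log-odds score)."""
    return sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)


def fit_boosted_stumps(X, y, n_rounds=20, lr=0.5):
    """Each round fits a stump to the negative gradient of the logistic loss, y - p."""
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(raw_score(stumps, row, lr)) for row, yi in zip(X, y)]
        stumps.append(fit_stump(X, residuals))
    return stumps


def predict_proba(stumps, row, lr=0.5):
    return sigmoid(raw_score(stumps, row, lr))


# Toy data (hypothetical): two features per pair, label 1 = forms a loop.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
model = fit_boosted_stumps(X, y)
labels = [1 if predict_proba(model, row) > 0.5 else 0 for row in X]
```

Each new stump corrects the errors left by the ensemble so far, which is the core idea shared with XGBoost.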

Slide 6

Slide 6 text

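Continuous bag-of-words (CBOW) in word2vec
word2vec is a method for learning distributed representations of words (CBOW is one of its models)
- A model that predicts a word from its surrounding words (context)
- A fully connected neural network with one hidden layer
- The order of the surrounding words is ignored
Yields vector representations of words
- V word types are represented as N-dimensional vectors
Figure labels: surrounding words (1-of-V representation; averaged, ignoring order); a word (1-of-V representation); vector representation (N dimensions)
http://mccormickml.com/assets/word2vec/Alex_Minnaar_Word2Vec_Tutorial_Part_II_The_Continuous_Bag-of-Words_Model.pdf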
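The CBOW forward pass described on this slide can be sketched in a few lines. This is an untrained toy with randomly initialized weights; the names `W_in`, `W_out`, and `cbow_forward`, and the sizes V=5, N=3, are assumptions chosen for illustration.

```python
import math
import random


def cbow_forward(context_ids, W_in, W_out):
    """Average the context word vectors (order is ignored), then score
    every vocabulary word and normalize with a softmax."""
    n_dim = len(W_in[0])
    hidden = [sum(W_in[i][d] for i in context_ids) / len(context_ids)
              for d in range(n_dim)]
    scores = [sum(h * w for h, w in zip(hidden, out_vec)) for out_vec in W_out]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]


random.seed(0)
V, N = 5, 3  # V word types, N-dimensional vectors
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]

# Context word ids around the target; hidden is the N-dimensional average.
hidden, probs = cbow_forward([0, 2, 3, 4], W_in, W_out)
```

After training, the rows of `W_in` serve as the N-dimensional word vectors.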

Slide 7

Slide 7 text

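Applying CBOW to k-mers
The k-mers contained in the sequence of a CTCF binding site are turned into vectors, and the vectors are averaged (a doc2vec-like idea?)
Parameters
- Uses ±250 bp around the motif
- k=6, N=100
Figure labels (doi: 10.1093/bioinformatics/bty248): surrounding k-mers (1-of-V representation; averaged, ignoring order); a k-mer (1-of-V representation); vector representation (N dimensions)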
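Averaging k-mer vectors into one feature vector per sequence might look like the sketch below. The stride of 1, the skipping of unknown k-mers, and the toy embedding table (k=3, N=2 instead of the slide's k=6, N=100) are assumptions made to keep the example small.

```python
def kmers(seq, k=6):
    """All overlapping k-mers of seq (stride 1 is an assumption here)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq, embeddings, k=6):
    """Mean of the embedding vectors of the k-mers found in seq.
    k-mers missing from the embedding table are skipped."""
    vecs = [embeddings[km] for km in kmers(seq, k) if km in embeddings]
    if not vecs:
        raise ValueError("no k-mer of the sequence has an embedding")
    n_dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(n_dim)]


# Toy embedding table (hypothetical values, k=3 and N=2 for readability).
toy_embeddings = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
vec = sequence_vector("ACGTA", toy_embeddings, k=3)
```

With real embeddings, each ±250 bp window would yield one N-dimensional vector that can be passed to the classifier.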

Slide 8

Slide 8 text

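Results
- word2vec gave higher accuracy than k-mers alone
- word2vec and the other sequence-related features carry complementary information
- Sequence-related features carry information complementary to epigenome-related features
Figure labels (doi: 10.1093/bioinformatics/bty248): evolutionary conservation; distance between the pair; whether a motif occurs between the pair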

Slide 9

Slide 9 text

Results: interpretation
- Running t-SNE on the word2vec features separated positive and negative examples (by eye)
- The word2vec features with high contribution are shared across cell types
Figures: doi: 10.1093/bioinformatics/bty248

Slide 10

Slide 10 text

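Impressions
Why was it accepted at ISMB?
- The problem setting is new
  - Prediction from functional genomics data became popular with ENCODE, but recently it feels like the trend is to use machine learning to ask "how much information does the sequence carry?"
- The detailed interpretation and experiments are thorough (hard to reject)
  - e.g. feature analysis, analysis of the relationship between cell-type specificity and prediction accuracy, accuracy evaluation of models trained on different cell types
Complaints
- The sequence-related features other than word2vec reached accuracy close to word2vec on their own (the contribution feels thin)
- In the end, the learned result cannot be interpreted (half-baked from the viewpoint of someone who works with epigenomes regularly)
Other
- Could word2vec be used for tasks based on k-mer frequency? (I have not checked whether this already exists)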

Slide 11

Slide 11 text

 

Slide 12

Slide 12 text

References
word2vec
- Distributed representations of words and phrases and their compositionality https://arxiv.org/abs/1310.4546
- Efficient Estimation of Word Representations in Vector Space https://arxiv.org/abs/1301.3781
- word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method https://arxiv.org/abs/1402.3722
XGBoost
- XGBoost: A Scalable Tree Boosting System https://arxiv.org/abs/1603.02754
The authors' prior work on word2vec for k-mers
- Exploiting sequence-based features for predicting enhancer–promoter interactions https://doi.org/10.1093/bioinformatics/btx257

Slide 13

Slide 13 text

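Appendix: summary
- Examined whether features based on DNA sequence can predict whether a convergent pair of CTCF motifs forms a loop
- Difference from prior work: identified sequence features important for predicting loop formation, beyond CTCF motif pair orientation and the epigenome
- Techniques: vectorizing k-mers with word2vec, sequence- and evolution-based features, XGBoost
- Validation: prediction was highly accurate even when training and validation used data from different cell types. word2vec did better than k-mers
- Discussion: evolutionary conservation is useful; sequence-based features alone can predict CTCF pairs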

Slide 14

Slide 14 text

Appendix: the authors' prior work
- At last year's ISMB/ECCB 2017, the authors applied k-mer word2vec + XGBoost to a different problem
- Yang, Y. et al. (2017) Exploiting sequence-based features for predicting enhancer–promoter interactions. Bioinformatics, 33, i252–i260.
- They showed that enhancer-promoter interactions can be predicted from sequence information alone

Slide 15

Slide 15 text

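Appendix: CBOW explained
Intuitive understanding of the model
- A model that predicts a word from its surrounding words (context)
Objective function of the model
- Every score computation requires a sum over all word types → expensive
- Once for each (w, c) pair
Objective function of the model (negative sampling)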
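The equations on this slide were images and did not survive extraction. For reference, the standard full-softmax form of the CBOW objective is written out below. The notation is an assumption chosen for illustration and may differ from the slide's: $u_{c_j}$ are the input vectors of the context words, $v_w$ is the output vector of word $w$, and $V$ is the number of word types.

```latex
% Context vector: the average of the context word vectors (order ignored).
\[
  h_c = \frac{1}{|c|} \sum_{j=1}^{|c|} u_{c_j},
  \qquad
  p(w \mid c) =
    \frac{\exp\left(v_w^{\top} h_c\right)}
         {\sum_{w'=1}^{V} \exp\left(v_{w'}^{\top} h_c\right)}
\]
% Training minimizes the negative log-likelihood over observed (w, c) pairs.
\[
  \mathcal{L} = -\sum_{(w, c)} \log p(w \mid c)
\]
```

The denominator sums over all $V$ word types for every $(w, c)$ pair, which is the cost that negative sampling avoids.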

Slide 16

Slide 16 text

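Appendix: negative sampling explained
- An objective function introduced to train the CBOW model efficiently
- For a given context (surrounding words), train a binary classifier to distinguish the true word from noise words sampled from a noise distribution
- For each word-context set, multiple (m) fake words are prepared
Figure labels: correct word-context pair; incorrect word-context pair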
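The negative-sampling loss for a single word-context pair can be sketched as follows. This shows the standard form of the objective with hand-picked toy vectors; the function names, the noise weights, and the choice m=2 are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_noise_words(vocab_ids, noise_weights, m, rng):
    """Draw m noise word ids from the noise distribution. For simplicity the
    true word is not excluded, so it may occasionally be drawn as noise."""
    return rng.choices(vocab_ids, weights=noise_weights, k=m)


def negative_sampling_loss(hidden, true_vec, noise_vecs):
    """-log sigma(v_true . h) - sum_i log sigma(-v_noise_i . h):
    a binary classification of the true word against m noise words."""
    loss = -math.log(sigmoid(dot(true_vec, hidden)))
    for noise_vec in noise_vecs:
        loss -= math.log(sigmoid(-dot(noise_vec, hidden)))
    return loss


rng = random.Random(0)
noise_ids = sample_noise_words([0, 1, 2, 3], [0.4, 0.3, 0.2, 0.1], m=2, rng=rng)

hidden = [1.0, 0.0]  # averaged context vector
# Well-fitted case: true word aligned with the context, noise words opposed.
good_loss = negative_sampling_loss(hidden, [2.0, 0.0], [[-2.0, 0.0], [-1.0, 0.5]])
# Poorly fitted case: the roles are reversed.
bad_loss = negative_sampling_loss(hidden, [-2.0, 0.0], [[2.0, 0.0], [1.0, 0.5]])
```

Only m + 1 dot products are needed per pair, instead of a sum over every word type.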