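• In-house sequencing: the experimental design is what matters most
• Public data: what matters most is how to get hold of information about the experimental design
The hardest part is designing the whole sequencing experiment, both for in-house sequencing and for reusing public sequencing data.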
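• Tools, methods, and papers are now widely available; the era in which analysis was the hard part is ending
• Shortages of computing resources can be solved with public resources and the like
• What matters is a well-designed experiment and a high-quality library
[Diagram labels: "nothing can be done after the fact"; "a technical problem, so it can be worked out"]
Data processing is now just a technical matter; researchers must instead focus on designing the experiment.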
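• Judging the quality of data requires rich metadata, such as the experimental conditions
• The data you need must be found efficiently among a large volume of data
• Large data sets take time to download and decompress, so nobody wants to pick a "dud"
[Diagram labels: on-line, decompression, local]
Using public data requires retrieving detailed metadata in order to control the quality of the sequencing data.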
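• Add read-level information (read length, read count, error rate, etc.)
• Cut download and decompression costs by avoiding "duds"
• Skip QC processing by checking quality in advance
An approach from the database side: improve the data search system by using method descriptions from papers as metadata.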
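The read-level information mentioned on this slide could be computed once on the database side and attached to each entry. The following is a minimal sketch, not the speaker's implementation: the function name `read_stats`, the choice of statistics (read count, mean read length, mean per-base error probability), and the Phred+33 quality encoding are all assumptions made for illustration. It expects unwrapped four-line FASTQ records.

```python
from statistics import mean


def read_stats(fastq_text: str, phred_offset: int = 33) -> dict:
    """Summarize FASTQ text: read count, mean read length, mean error rate.

    Hypothetical helper; assumes 4-line FASTQ records and Phred+33 qualities.
    """
    lines = fastq_text.strip().splitlines()
    lengths = []
    error_probs = []
    # Each record is: header, sequence, '+' separator, quality string.
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        lengths.append(len(seq))
        # A Phred score Q corresponds to an error probability of 10^(-Q/10).
        error_probs.extend(
            10 ** (-(ord(c) - phred_offset) / 10) for c in qual
        )
    return {
        "read_count": len(lengths),
        "mean_read_length": mean(lengths) if lengths else 0,
        "mean_error_rate": mean(error_probs) if error_probs else 0.0,
    }


# Two toy reads with quality 'I' (Phred 40) at every base.
example = "@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n"
stats = read_stats(example)
```

Publishing a summary like this alongside each entry would let a user judge quality before downloading and decompressing the full file.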
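• Prevent users from "continuing to search for something that does not exist"
• Support the wish "if several equivalent data sets exist, I want to use the better one"
• Support automated searching as well
[Diagram labels: on-line, decompression, QC, local]
Goal: "retrieving data that works for one's study from the public database with minimum effort"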
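The search goals on this slide can be illustrated with a small metadata-only filter. This is a sketch under stated assumptions: the `RunRecord` fields, the `pick_best` function, and the ranking rule (lowest error rate first, then most reads) are hypothetical and not taken from any real database API. Returning `None` immediately when nothing matches corresponds to not searching for data that does not exist, and the ranking corresponds to choosing the better of several equivalent entries.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunRecord:
    """Hypothetical metadata record for one sequencing run."""
    accession: str          # placeholder identifier, not a real accession
    organism: str
    strategy: str           # e.g. "RNA-Seq"
    read_count: int
    mean_error_rate: float


def pick_best(
    records: Iterable[RunRecord],
    organism: str,
    strategy: str,
    min_reads: int = 0,
) -> Optional[RunRecord]:
    """Return the best matching run, or None if no run matches."""
    candidates = [
        r for r in records
        if r.organism == organism
        and r.strategy == strategy
        and r.read_count >= min_reads
    ]
    if not candidates:
        return None  # report "no such data" without downloading anything
    # Assumed ranking: lower error rate wins; ties go to the larger run.
    return min(candidates, key=lambda r: (r.mean_error_rate, -r.read_count))


runs = [
    RunRecord("RUN_A", "Homo sapiens", "RNA-Seq", 1_000_000, 0.010),
    RunRecord("RUN_B", "Homo sapiens", "RNA-Seq", 2_000_000, 0.002),
    RunRecord("RUN_C", "Mus musculus", "RNA-Seq", 5_000_000, 0.001),
]
best = pick_best(runs, "Homo sapiens", "RNA-Seq", min_reads=500_000)
```

Because the filter runs on metadata alone, it is cheap enough to automate, and only the selected run needs to be downloaded.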
Good: the amount and variety of data, and its distribution across various public databases. Not so good: the insufficient quality of metadata and the difficulty of linking data to publications.
[Figure: total entries versus entries linked to a publication. Unlabeled panel: 115440 total, 3059 with publication (26.5%). #sample: 194338 total, 31787 with publication (16.4%). #run: 376904 total, 51202 with publication (13.6%).]
Not all published data has an associated paper (or the entry was never updated after the first data submission).
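Give "benefits" to researchers who submit high-quality data!
• At present, metadata quality depends on goodwill
• Submissions need to be tied to evaluation, such as papers being cited or grants being awarded
Improve the DB ecosystem so that submitting high-quality metadata is easy, for example by rewarding researchers whose submissions are highly cited.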
Benefits for the people who offer up their data!
• Goodwill alone has its limits, so incentives to publish high-quality data are needed
Summary: design sequencing projects well so that the data is highly reusable, and create incentives to submit high-quality metadata.