Nagoya_Univ_Class_2018

Yuichi Shiraishi

June 13, 2018

Transcript

  1. [Title slide; Japanese title text illegible]

    [Background images: pages from Kataoka et al., Nature, 2016 and Yoshida et al., Nature, 2011. The legible excerpt from Yoshida et al. follows.]

    [...] with three additional spliceosome-related genes, including U2AF65, SF1 and SRSF1, in a large series of myeloid neoplasms (N = 582) using a high-throughput mutation screen of pooled DNA followed by confirmation/identification of candidate mutations (refs 21 and 22 and Supplementary Methods II). In total, 219 mutations were identified in 209 out of the 582 specimens of myeloid neoplasms through validating 313 provisional positive events in the pooled DNA screen (Supplementary Tables 4 and 5). The mutations among four genes, U2AF35 (N = 37), SRSF2 (N = 56), ZRSR2 (N = 23) and SF3B1 (N = 79), explained most of the mutations with much lower mutational rates for SF3A1 (N = 8), PRPF40B (N = 7), U2AF65 (N = 4) and SF1 (N = 5) (Fig. 2). Mutations of the splicing machinery were highly specific to diseases showing myelodysplastic features, including MDS either with (84.9%) or without (43.9%) increased ring sideroblasts, chronic myelomonocytic leukaemia (CMML) (54.5%), and therapy-related AML or AML with myelodysplasia-related changes (25.8%), but were rare in de novo AML (6.6%) and myeloproliferative neoplasms (MPN) (9.4%) (Fig. 3a). The mutually exclusive pattern of the mutations in these splicing pathway genes was confirmed in this large case series, suggesting a common impact of these mutations on RNA splicing and the pathogenesis of myelodysplasia (Fig. 3b). The frequencies of mutations showed significant differences across disease types.

    Surprisingly, SF3B1 mutations were found in the majority of the cases with MDS characterized by increased ring sideroblasts, that is, refractory anaemia with ring sideroblasts (RARS) (19/23 or 82.6%) and refractory cytopenia with multilineage dysplasia with ≥ 15% ring sideroblasts (RCMD-RS) (38/50 or 76%) with much lower mutation frequencies in other myeloid neoplasms. RARS and RCMD-RS account [excerpt ends]

    [Figure: positions of mutations along the proteins encoded by U2AF35 (21q22.3), ZRSR2 (Xp22.1), PRPF40B (12q13.12), SF3A1 (22q12.2), SRSF2 (17q25.1), SF3B1 (2q33.1), U2AF65 (19q13.42) and SF1 (11q13.1). Yoshida et al., Nature, 2011]
  2. LIFJN • *;55;<U46; C/6+AV – : 22';C20 5V – :

    • , X8YC05V • , XC2V • 1B2B;< U - ?,?B3=;V • *;<A,C,G,T;%-? 9AS$730 T • 2003:&LIFJN":>@ !#.B3) • Genome Reference Consortium: >46(KRGOEHM.B6+ A) –  <QDRGOP38)
  3. [title illegible]

    • [illegible]
      – Silent mutation
      – Missense mutation
      – Nonsense mutation
      – Splicing mutation
    • [illegible]
      – Insertion, Deletion
        • Frameshift indel
        • In-frame indel
      – Structural variation
    [Figure: reference codon 5' ... T T C ... 3' with the substitutions T T T, T C C and A T C; labels: Lys, STOP, Arg, Lys; silent, missense, nonsense]
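The distinction between silent, missense and nonsense substitutions can be made concrete with the standard genetic code. The following is a minimal Python sketch, not part of the original slides; the function name `classify_substitution` is hypothetical, and the example codon AAG (Lys), written on the coding strand, is chosen for illustration rather than taken from the slide's figure layout.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as 'silent', 'missense' or 'nonsense'.

    Codons are given on the coding strand (DNA alphabet).
    Simplification: a stop codon changed into an amino acid (stop-loss)
    is reported as 'missense' here.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    for alt in ("AAA", "AGG", "TAG"):
        print("AAG ->", alt, classify_substitution("AAG", alt))
```

Splicing mutations and indels are not covered by this sketch, since they cannot be classified from a single codon alone.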
  4. [Figure: identical cells with germline DNA accumulate random DNA damage and somatic mutations (passenger mutation, driver mutation, 2nd driver mutation), leading to abnormal proliferation and a cancerous cell]

    • chemicals
    • radiation
    • virus
    • aging
  5. A turning point in cancer research: sequencing the human genome, Dulbecco, Science, 1986
  6. Cancer driver gene

    • Tumor suppressor
      – [illegible; mentions DNA]
      – TP53, RB1, BRCA1, BRCA2
      (Hecht et al., Cancer Treat Rev., 2015)
    • Oncogene
      – [illegible]
      – RAS, EGFR, PIK3CA
      (OpenStax, Biology, OpenStax CNX. May 27, 2016)
  7. [title illegible]

    • 20/20 rule
      – Oncogene
        • 20% [illegible]
      – TSG (tumor suppressor gene)
        • 20% truncating [illegible]
    • Background mutation rate [illegible]
      – [illegible; mentions TTN]
      – GC contents
      – [illegible]
      – replication timing
    • Software
      – MutSig (https://confluence.broadinstitute.org/display/CGATools/MutSig)
      – MuSiC (Dees et al., Genome Research, 2012)
    [Figure: Vogelstein et al., Science, 2013, Fig. 4. Distribution of mutations in two oncogenes (PIK3CA and IDH1) and two tumor suppressor genes (RB1 and VHL)]
  8. Frequent splicing gene mutations in MDS

    • Whole exome sequencing of 29 MDS (myelodysplasia) cases
    • 268 somatic mutations [illegible]
      – [illegible] known MDS cancer driver genes (TP53, NRAS, KRAS, RUNX1)
      – 3 genes (U2AF35, SRSF2, ZRSR2) [illegible] splicing [illegible] (new cancer driver genes!)
    • [illegible; mentions splicing, 600 and 50%]
    Yoshida et al., Nature, 2011
  9. Mutation detection using high-throughput sequencing

    tumor & normal DNAs from the same patient!
    [Figure labels, mostly illegible: exome; 50bp ~ 150bp]
  10. ,<->:>5 • 2;?5=?6079$!$ $'$+  • $2;?5=?6 # – 100bp$=?6:

    * – 079$1-3:  bp – #%079#$*"!#(* #&)"=?6' • Reference genome – %hg38 (GRCh38) • 485./, – DNA: BWA >> $ – RNA: STAR, HISAT2, kalisto"! 100 4 ~ 60 10 9 3×10
  11. 1?2A>A9((MYC ) • WXS#)5’UTR3’UTR&%)4<B /&C +357A(589' ,-$/. *.DE • sequence

    depth: .6;=(04<B!".@B:( • Mean sequence depth)QC'* /.C')WGS#)30 ~ 40x, WXS#) 60 ~ 150x (  WGS WXS sequence depth
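The mean sequence depth quoted on this slide (30 ~ 40x for WGS, 60 ~ 150x for WXS) is commonly approximated as total sequenced bases divided by the size of the target region. Below is a minimal Python sketch of that arithmetic. The read counts and the 50 Mb exome target size are hypothetical numbers chosen for illustration, and the estimate ignores duplicates, unmapped reads and off-target reads.

```python
import math


def mean_depth(num_reads, read_length, target_size):
    """Approximate mean depth: total sequenced bases / target size (bp)."""
    return num_reads * read_length / target_size


def reads_needed(target_depth, read_length, target_size):
    """Number of reads required to reach a given mean depth."""
    return math.ceil(target_depth * target_size / read_length)


if __name__ == "__main__":
    genome_size = 3_000_000_000   # roughly 3 x 10^9 bp for a human genome
    exome_size = 50_000_000       # assumed exome target size (hypothetical)

    # One billion 100 bp reads over the whole genome.
    print(f"WGS depth: {mean_depth(1_000_000_000, 100, genome_size):.1f}x")

    # Reads needed for 100x mean depth over the assumed exome target.
    print("WXS reads for 100x:", reads_needed(100, 100, exome_size))
```

In practice the depth is measured from the aligned BAM file rather than predicted, and it varies widely between regions.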
  12. '!)%)# (MYC) reference genome mismatch (sequence error or true genomic

    variant?) • reference genome reference  • "&*#(*$ mismatch 
  13. +-./*," 1 tumor normal •  ##target sequence"'$)% ( •

    exome&whole genome!+-.#(#$  •   !(00
  14. [title illegible]

    • [illegible; three sub-items, one of which mentions samtools mpileup]

    Per-position base counts in tumor and normal:

                      tumor               normal
    chr  pos  ref    A    C    G    T    A    C    G    T
    1    1    G      0    1   38    0    1    0   31    0
    1    2    A     39    0    1    1   28    0    1    1
    1    3    G      0    1   21   17    1    0   30    0
    1    4    T      2    0    0   41    0    0    0   32
    1    5    C      0   39    0    0    0   29    1    0

    [caption illegible; mentions the reference base G]
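The base-count table on this slide can be scanned for candidate somatic variants by comparing non-reference allele fractions in tumor and normal. The following Python sketch is an illustration only: the function name `candidate_variants` is hypothetical, and the thresholds (at least 10% mismatch in tumor, at most 3% in normal) are assumed readings of the filter values listed on the later Fisher slide.

```python
BASES = "ACGT"

# (chr, pos, ref, tumor counts for A/C/G/T, normal counts for A/C/G/T),
# copied from the table on the slide.
PILEUP = [
    ("1", 1, "G", (0, 1, 38, 0), (1, 0, 31, 0)),
    ("1", 2, "A", (39, 0, 1, 1), (28, 0, 1, 1)),
    ("1", 3, "G", (0, 1, 21, 17), (1, 0, 30, 0)),
    ("1", 4, "T", (2, 0, 0, 41), (0, 0, 0, 32)),
    ("1", 5, "C", (0, 39, 0, 0), (0, 29, 1, 0)),
]


def candidate_variants(rows, min_tumor_vaf=0.10, max_normal_vaf=0.03):
    """Return (chr, pos, ref, alt, tumor_alt, tumor_depth, normal_alt, normal_depth)
    for every non-reference base that is frequent in tumor and rare in normal."""
    candidates = []
    for chrom, pos, ref, tumor, normal in rows:
        tumor_depth, normal_depth = sum(tumor), sum(normal)
        for i, alt in enumerate(BASES):
            if alt == ref:
                continue
            tumor_vaf = tumor[i] / tumor_depth if tumor_depth else 0.0
            normal_vaf = normal[i] / normal_depth if normal_depth else 0.0
            if tumor_vaf >= min_tumor_vaf and normal_vaf <= max_normal_vaf:
                candidates.append(
                    (chrom, pos, ref, alt, tumor[i], tumor_depth, normal[i], normal_depth)
                )
    return candidates


if __name__ == "__main__":
    for hit in candidate_variants(PILEUP):
        print(hit)
```

With the slide's table, only position 3 (G to T, 17 of 39 tumor reads, 0 of 31 normal reads) is reported. A real caller would add a statistical test and further filters on top of this scan.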
  15. 686,  •  – ,1+0.$)%+!9 "!: – 286&'" ,"

    !9" !: – 686, #"%/*(-+!927 45838, "+ :
  16. somatic mutation && • sequence error – Hiseq&17:'0.1? 1% #.-",=

    – ,<%!" '<17:'$+ ,= – 3:219417:#  /,(%< $068:5 ; )sequencing error$ *> tumor normal True somatic mutation!!
  17. somatic mutation22  • ?HBHA247 ,% – 22 $<2 3J

    16+-%& 09 (25% ~ 75%)K – ?HBHA3 1#70!'/5 !K – @D>GIH2 In pure (100%) tumor cells tumor tumor DNA2FIC2"* $; < .!9'/$(:9K In 50%tumor cells =EI/2$68)&09KK
  18. somatic mutation   • sequence depth  – 

    ) "(! '#$%# * • high GC contents region informative (high sequencing depth) region tumor tumor "(! '# &( * non-informative (low sequencing depth) region
  19. Fisher [illegible]

    • mutation call [illegible]
      – depth [illegible; mentions exome sequencing]
      – [illegible; mentions alignment]
      – [illegible]
    • tumor / normal [illegible] Fisher's exact test [illegible]
      – [illegible] 0.01 [illegible]
    • [illegible]
      – Fisher p-value < 0.01
      – Normal, tumor depth > 10
      – Tumor mismatch [illegible] 10%
      – Normal mismatch [illegible] 3%

              ref  variant
    tumor       7        6
    normal     14        1
    p-value = 0.02862

    [Figure: tumor and normal read pileups; marked bases are different from the reference genome]
  20. Fisher [illegible]

    • [illegible]
      – p-value < 0.001 [illegible] Sanger [illegible] validate [illegible]
      – 0.01 < p-value < 0.05 [illegible]
    • sequencing depth, [illegible]
      – [illegible] 0.01 [illegible] 0.001 [illegible]

              ref  alt
    tumor      75   30
    normal     80    1
    p-value = 1.32e-7 (almost surely true!)

              ref  alt
    tumor      20    7
    normal     18    0
    p-value = 0.03132 (low sequencing depth!)

              ref  alt
    tumor      90    5
    normal     70    0
    p-value = 0.07306 (low variant allele frequency!)

    accuracy rate
    P < 0.001         95.7%
    0.001 < p < 0.01  64.3%
    0.01 < p < 0.05   29.0%
  21. [title illegible]

    chr19:19646174 C -> A (p-value=0.024)
              ref  alt
    tumor      27    6
    normal     33    0

    chr19:42867205 A -> C (p-value=0.018)
              ref  alt
    tumor      14    5
    normal     21    0

    [remaining labels illegible]
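The p-values quoted with the 2x2 tables on these Fisher slides appear consistent with a two-sided Fisher's exact test on the read counts. Below is a minimal, standard-library Python sketch that reproduces them. The function names are hypothetical, the two-sided definition used (summing all tables no more probable than the observed one) is the common convention and is an assumption about what the slides used, and the direction of the mismatch thresholds in `passes_filter` (tumor at least 10%, normal at most 3%) is likewise assumed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are tumor and normal; columns are reference and variant read counts.
    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to get x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(col1, row1)
    observed = weight(a)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / total


def passes_filter(t_ref, t_alt, n_ref, n_alt, max_p=0.01, min_depth=10,
                  min_tumor_rate=0.10, max_normal_rate=0.03):
    """Apply the filter values listed on the slide (threshold directions assumed)."""
    t_depth, n_depth = t_ref + t_alt, n_ref + n_alt
    if t_depth <= min_depth or n_depth <= min_depth:
        return False
    if t_alt / t_depth < min_tumor_rate or n_alt / n_depth > max_normal_rate:
        return False
    return fisher_exact_two_sided(t_ref, t_alt, n_ref, n_alt) < max_p


if __name__ == "__main__":
    tables = [(7, 6, 14, 1), (75, 30, 80, 1), (20, 7, 18, 0),
              (90, 5, 70, 0), (27, 6, 33, 0), (14, 5, 21, 0)]
    for t in tables:
        print(t, f"p = {fisher_exact_two_sided(*t):.4g}", passes_filter(*t))
```

Working with integer table counts and dividing only at the end avoids floating-point tie problems when deciding which tables are "as extreme" as the observed one.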
  22. EBCall (Shiraishi et al., NAR, 2013)

    chr19:42867205 A -> C (p-value=0.018)
              ref  alt
    tumor      14    5
    normal     21    0

    • [illegible; mentions control (matched control)]
    • [illegible]
    • [illegible; mentions non-matched control and mismatch]
    • [illegible]
    • EBCall (Shiraishi et al.)
    • [illegible]
  23. structural variation • point mutation, short indel/0 '%!1 .5+#*56 •

    234 &"$1 -6 •  * junction intron(1)%5whole genome sequencing),%6
  24. Soft Clipping • =@9-,4<5?;?8 2C4<5?;?8 2+%# 0/21A#$! )!'. 2'1DB •

    -4<5?;?87@>.soft clipping36:@8!'1D • 75bpE-=@9,&'.Csoft clipping 2#=@93 4<5?;?8"1)(C*- -+-3" 1)D
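Soft clipping is recorded in the CIGAR string of a SAM/BAM alignment as an `S` operation at either end of the read. The following Python sketch shows how soft-clipped lengths could be read off a CIGAR string. The function name `soft_clip_lengths` and the example CIGAR strings for a 75bp read are hypothetical illustrations and are not taken from the slides.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped lengths for a CIGAR string.

    Hard clips (H) may sit outside soft clips, so they are skipped first.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


if __name__ == "__main__":
    # Hypothetical 75bp reads: fully aligned, clipped at the end, clipped at the start.
    for cigar in ("75M", "50M25S", "20S55M"):
        print(cigar, soft_clip_lengths(cigar))
```

Reads whose soft-clipped ends pile up at the same genomic coordinate are one kind of evidence for a breakpoint; the clipped sequence can then be realigned elsewhere to find the partner position.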
  25. [illegible] evidence

    [Figure: tumor and normal tracks showing improper read pairs, soft clipping reads and a sequence depth change]
    • Soft clipping read, improper read pair [illegible] chr2:286238293 [illegible]
    • Sequence depth [illegible]
    • control [illegible]
  26. Structural variation9  • Structural variation97  – 9 A

    • Soft clipping read519 =- • Improper read pair • Sequence depth (.?:@7), )K – (>&GJG'A;3$CEFHBD8GJGA 20<9A – 6*/3<GJG9!", #87>2;8$7 +7+ I9A20<9:$ 8<4-3)7)%%%
  27. ATLG=?KPD-L1H3’UTR=?KSVH$ • T)-%0#(adult T-cell Leukemia)HPRT3(Kataoka et al., Nature Genetics, 2015)G=

    ;DYPD-L1H3’UTR6G1 ESVM • (27%H,EZ • SVH!H @<H +I9E:CBZ • SVH:KG=;DIY7 /G8;PD-L1H$"5>2 JLBZ • ATLFI • HTLV-1ONVQM FAK%0#XUWS.E:KZ • EI*&F Y EI '4 G;Z Kataoka, Shiraishi, Takeda et al., Nature, 2016
  28. TCGA?U`S:QN\`W_O • Pan-Cancer:<?1C,PD-L1?3’UTR? /%CFE .ee • TCGA?10,210?RNA-seqa33/I!b:QN\`W_Od • ATL?U`S:QN\`W_OH #3c

    *>U`S H%=/C'd • QN\`W_O?*>36 – PD-L1?) – PD-L1?$( ? – PD-L1?exon4;3’UTR?G?)? • ;D+-5c"A?: – 8..86B?@cIGV:9YW[J]: &d – 4ELMR`Z, PXZB:0E72TK_^`V39 &d
  29. TCGA#0  • 10,210 TCGA RNA-seq 1HGC&/2 • 0*, ',0*.0

    4 1. /(+ /-0% 2. !"1/(+QC3/+/%5)/%52 3. /(+0$ 4. Genomon2 RNA
  30. TCGA(&.0*/'  • PD-L1 1 SV!#"2 • PD-L1-%*),SV" #"2 •

     • B ./+38%1 $32% Kataoka, Shiraishi, Takeda et al., Nature, 2016
  31. Short summary • PD-L103’UTR0SV0 8%+ – 0!9,2D 0,0! 365$-!7 )'E

    – Structural variation(#,1.";:A>2 . 8'&$-!7 )'E • !9/+DPD-L103’UTR0 !D PD-1 0/*+0?:< @B=B/.45C
  32. Standard Model of Computational Analysis

    [Figure: public data are downloaded over the network to each university's local storage and compute resources, where local data are analyzed with locally developed and publicly available software]
    https://www.genome.gov/multimedia/slides/tcga4/23_davidsen.pdf
  33. -9837!GAI7% • AL?7 ! – TCGA7AL?*#32.5PB (2015, 5N • RNA-seq7bamD:;I2,3&70TB

    – 8/@<KJLC* ''' – FHL>;B7*&6$.(' • 09197=ILE3&TCGA7AL?7 "* M( .*3+5(N' • TCGA47* OO – 7 +( 2,.)& !*3 +5(''''
  34. [title illegible]

    [Figure: Co-located Compute & Data. An API provides data access, security and resource access to core data (TCGA), user data, computational capacity, standard tools and user uploaded tools]
    https://www.genome.gov/multimedia/slides/tcga4/23_davidsen.pdf
  35. Democratize Cancer Genomics!

    • NCI cloud pilot
      – [illegible]
      – [illegible]

    The goals of the NCI Cloud Pilots are to democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.

    Institute for Systems Biology (www.isb-cgc.org): The Institute for Systems Biology (ISB) Cloud provides interactive and programmatic access to data, leveraging many aspects of the Google Cloud Platform. The interactive ISB-CGC web-app allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes or pathways of interest, and share insights with collaborators. For computational users, programmatic interfaces and GCP tools such as BigQuery, Genomics, and Compute Engine allow users to perform complex queries from R or Python scripts, or run Dockerized workflows on sequence data available in cloud storage.

    Seven Bridges Genomics (www.cancergenomicscloud.org): [description cut off in the slide image]

    Broad Institute (www.firecloud.org): [description cut off in the slide image]
  36. Genomon [illegible]

    • Python (2.7.10)
    • Perl (5.14.4)
    • R (3.3.1)
    • bwa (0.7.8)
    • blat (v34)
    • samtools (1.2)
    • Biobambam (0.0.191)
    • PCAP-core (20150511)
    • htslib (1.3)
    • bedtools (2.24.0)
    • GenomonPipeline (2.5.3)
    • GenomonSV (0.4.2rc)
    • GenomonFisher (0.2.0)
    • GenomonMutationFilter (0.2.1)
    • EBFilter (0.2.1)
    • GenomonPostAnalysis (1.4.0)
    • GenomonQC (2.0.1)
    • GenomonExpression (0.3.0)
    • fusionfusion (0.3.0)
    • paplot (0.5.5)
    • sv_utils (0.4.0b2)
    • annot_utils (0.1.0)
    • fusion_utils (0.2.0)
    [illegible; mentions OS]
  37. [title illegible]

    • [illegible; mentions HGC]
    • [illegible] (https://mycloudblog7.wordpress.com)
      • Amazon AWS
      • Google Cloud
      • Microsoft Azure
      • Seven Bridges Genomics
  38. Microsoft Azure [illegible] Genomon2 RNA [illegible] (2016, 9 [illegible])

    • 774 [illegible] (Cancer Cell Line Encyclopedia (CCLE)) RNA-seq [illegible]
    • STAR + fusionfusion (https://github.com/Genomon-Project/fusionfusion)
    • [illegible; mentions 230]
    By [illegible] https://www.microsoft.com/ja-jp/casestudies/imsut.aspx
  39. Extraction Transfer Load (ETL) approach

    [Figure: sequence data 1, 2 and 3 are each processed on a separate VM, producing analytical results 1, 2 and 3]
    1. Virtual Machine (VM) [illegible]
    2. [illegible] VM [illegible]
    3. VM [illegible] docker [illegible]
    4. [illegible] VM [illegible]
    5. VM [illegible]
    • dsub (Google Cloud Platform)
    • Amazon AWS Batch
    • Azure Batch
  40. Extraction Transfer Load (ETL) approach

    $ awsub \
        --tasks ./my-samples.csv \
        --script ./my-workflow.sh \
        --image my/work-image \
        --platform aws \
        --verbose

    • Amazon AWS, Google Cloud [illegible] ETL batch job engine [illegible]
    • SpotInstance, preemptible VM [illegible]
    • https://github.com/otiai10/awsub
    • [illegible]
      • Docker image
      • [illegible]
      • [illegible; mentions TSV]
  41. Successive ETL as a Pipeline (SEaaP)

    • https://github.com/Genomon-Project/genomon_pipeline_cloud
    [Figure: each stage runs on its own set of VMs; fastq 1, 2 and 3 are converted to bam 1, 2 and 3, which are then converted to vcf 1, 2 and 3]
    $ genomon_pipeline_cloud sample.csv bucket param.cfg
  42. Cloud genome analytical workflow

    • Dockstore: https://dockstore.org
    • GA4GH: Containers and Workflows working group
    • Common Workflow Language: http://www.commonwl.org
  43. “bring the analysis to the data” 63 • 9FVECUGLVI@JBUSVM/40>-6)87=33&># •

    GA4CH598('4"LVI9:LVI@0>2,5:7+"0> 1;9$ %@0>-6);<?>6.?4'># • IaaS EQBM 5&=")9TVENSV@ 5*>-6# • LVI9HDPRKAV@$ %83-6# • LVI9)8$!%5*>FGKO@0>-6# Data Bio-sphere; by Benedict Paten
  44. 2D.=+  ,@E4I5CH (SeqPod) • 5I20H6;I8 $ 1. 7?<./,+$GI1F #

    9.HGI=-H6<IF 2. 7?<./,+( 3. 5I20H6;I8+=D:3&=G: @ 4. >:20H=!Amazon2D.=$ A5H'%(J 5. *(" BIF! &) (J
  45. Short summary • 5G32F7;G:.#3B0= !-, %$* ;G:(/38 7(',H;G:51/CF4)I • >G=01/OS'&*(%'

    +9@<01/)?G6AF* (' ,H EG3@DG)51/CF4)I –   .reproducible(", % (
  46. 9<5= • HNGFMI 9%8?@&(7, 88: 4-A?*871/' –  – Indel

    – Structural variation – <0 982)3:O.;67 ,$C4)7)9,PPP • !8+*A!9GLEJB =/DMKL82)3><0<0#", '