
Genomon2 Tutorial

Yuichi Shiraishi
November 19, 2017

Genomon 2 Tutorial held on Oct. 30th, 2017 (in Japanese).


Transcript

  1. Papers by journal: Nature 2, Science 1, N Engl J Med 1, Nature Genetics 8, Blood 6, Nat Commun 2

    [Slide image: first page of Kataoka K*, Shiraishi Y*, Takeda Y*, et al., "Aberrant PD-L1 expression through 3′-UTR disruption in multiple cancers", Nature (LETTER), doi:10.1038/nature18294. Key finding: "Here we show a unique genetic mechanism of immune escape caused by structural variations (SVs) commonly disrupting the 3′ region of the PD-L1 gene."]
  2. Figure: identical cells with germline DNA → random DNA damage and somatic mutation (passenger mutation, driver mutation, 2nd driver mutation) → abnormal proliferation → cancerous cell

    Sources of DNA damage: •  chemicals •  radiation •  virus •  aging
  3. Finding cancer driver genes •  cancer driver gene

    driver gene 1.  somatic 2.  passenger mutation vs. driver mutation 3.  •  Human Genome Project •  Figure: a gene that is recurrently mutated across patients (patient 1, patient 2, patient 3) is a cancer driver gene
  4. •  Read length: ~100bp

    Reference genome –  hg38 (GRCh38) –  hg19 (GRCh37)
    •  Aligners –  DNA: BWA >> novoalign >> … –  RNA: STAR, TopHat2 > MapSplice >> …
    •  Figure labels: 100, 4 ~ 60, 3×10^9 (approximate size of the human genome in bp)
  5. Sequence depth (figure: reads at the MYC locus) •  WXS: 5′UTR, 3′UTR •  sequence depth

    •  Mean sequence depth used for QC: WGS 30 ~ 40x, WXS 60 ~ 150x (figure: WGS vs. WXS sequence depth)
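These depth targets follow from a simple identity: mean depth ≈ (number of reads × read length) / size of the sequenced region. Below is a minimal Python sketch. It assumes ~100bp reads and a human genome of ~3×10^9 bp. The function names are illustrative. In practice, usable depth is lower because duplicates and low-quality reads are removed. For WXS, the capture-target size would replace the genome size.

```python
def mean_depth(num_reads: int, read_length: int, target_size: float) -> float:
    """Expected mean depth: total sequenced bases divided by the size of the target region."""
    return num_reads * read_length / target_size


def reads_needed(depth: float, read_length: int, target_size: float) -> float:
    """Number of reads needed to reach a given mean depth (before duplicate removal)."""
    return depth * target_size / read_length


GENOME_SIZE = 3e9  # approximate size of the human genome in bp

# Reads of 100bp needed for 30x and 40x WGS (a read pair counts as two reads)
for depth in (30, 40):
    print(depth, f"{reads_needed(depth, 100, GENOME_SIZE):.1e}")
```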
  6. Reads aligned to the reference genome (MYC locus)

    •  A mismatch against the reference genome: sequence error or true genomic variant?
  7. Counting bases at each position (samtools mpileup)

    chr pos ref | tumor:   A   C   G   T | normal:   A   C   G   T
    1   1   G   |          0   1  38   0 |           1   0  31   0
    1   2   A   |         39   0   1   1 |          28   0   1   1
    1   3   G   |          0   1  21  17 |           1   0  30   0
    1   4   T   |          2   0   0  41 |           0   0   0  32
    1   5   C   |          0  39   0   0 |           0  29   1   0
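Per-position A/C/G/T counts like these can be derived from the read-base column (5th column) of samtools mpileup output. Below is a minimal Python sketch that assumes the standard mpileup encoding:

- '.' and ',' mark a reference match.
- Letters mark a mismatch (lowercase means reverse strand).
- '^' plus a quality character marks a read start.
- '$' marks a read end.
- '+N' and '-N' mark indels.

The function name `count_bases` is hypothetical, and base-quality filtering is omitted.

```python
import re
from collections import Counter

# An indel is written as +N<bases> (insertion) or -N<bases> (deletion) after a base.
_INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref: str, bases: str) -> Counter:
    """Count A/C/G/T at one position from the read-base column (5th) of samtools mpileup."""
    counts = Counter({b: 0 for b in "ACGT"})
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                 # start of a read: the next character is its mapping quality
            i += 2
            continue
        if c in "+-":                # indel: skip the length and the inserted/deleted bases
            m = _INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":                # matches the reference on the forward / reverse strand
            counts[ref.upper()] += 1
        elif c.upper() in "ACGT":    # mismatch; lowercase letters are reverse-strand reads
            counts[c.upper()] += 1
        # '$' (end of read), '*' (deleted base), 'N', '>' and '<' are not counted
        i += 1
    return counts


counts = count_bases("G", "..,,TTt^]T.$")
print(counts["G"], counts["T"])  # 5 4
```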
  8. Detecting somatic mutations •  Sequence error rate – HiSeq: 0.1 ~ 1%

    •  Figure: tumor vs. normal reads, sequencing error vs. true somatic mutation!!
  9. Allele frequency of somatic mutations •  (25% ~ 75%)

    •  Figure: tumor DNA in pure (100%) tumor cells vs. in 50% tumor cells
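The point of this figure can be made quantitative. For a clonal mutation, the expected variant allele frequency (VAF) is:

purity × mutant copies / (purity × tumor copy number + (1 − purity) × normal copy number).

Below is a minimal sketch. It assumes a clonal mutation and diploid normal cells, and its parameter names are illustrative. For a heterozygous mutation in a diploid tumor, the formula reduces to purity / 2. So 100% tumor cells give a VAF of 50%, and 50% tumor cells give 25%.

```python
def expected_vaf(purity: float, mutant_copies: int = 1,
                 tumor_copy_number: int = 2, normal_copy_number: int = 2) -> float:
    """Expected VAF of a clonal somatic mutation in a mixture of tumor and normal cells."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant_alleles = purity * mutant_copies
    total_alleles = purity * tumor_copy_number + (1.0 - purity) * normal_copy_number
    return mutant_alleles / total_alleles


for purity in (1.0, 0.5):
    print(f"tumor content {purity:.0%}: expected VAF {expected_vaf(purity):.0%}")
```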
  10. Sequence depth and somatic mutation detection •  e.g., high GC content regions (often low depth)

    •  Figure: informative (high sequencing depth) region vs. non-informative (low sequencing depth) region in tumor
  11. Mutation calling with Fisher's exact test •  mutation call –  depth (exome sequencing) –  alignment

    •  Compare tumor and normal with Fisher's exact test (threshold 0.01)
    •  Example filter criteria: –  Fisher p-value < 0.01 –  normal and tumor depth > 10 –  tumor mismatch ratio ≥ 10% –  normal mismatch ratio ≤ 3%
    •  Example (mismatch: base different from the reference genome): tumor ref 7 / variant 6, normal ref 14 / variant 1, p-value = 0.02862
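The test on this slide takes only a few lines to reproduce. Below is a minimal Python sketch of a two-sided Fisher's exact test, which is the same test `scipy.stats.fisher_exact` computes by default. It also includes a filter that uses the example thresholds listed above. A few caveats:

- The function names are hypothetical.
- This is not Genomon's code.
- The ≥/≤ directions of the ratio thresholds are assumptions.

The sketch reproduces p = 0.02862 for the example table.

```python
from math import comb


def fisher_exact_two_sided(tumor_ref: int, tumor_alt: int,
                           normal_ref: int, normal_alt: int) -> float:
    """Two-sided Fisher's exact test on [[tumor_ref, tumor_alt], [normal_ref, normal_alt]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    n_tumor = tumor_ref + tumor_alt
    n_alt = tumor_alt + normal_alt
    total = n_tumor + normal_ref + normal_alt

    def prob(k: int) -> float:
        # Probability that k of the n_alt variant reads fall in the tumor row
        return comb(n_alt, k) * comb(total - n_alt, n_tumor - k) / comb(total, n_tumor)

    p_obs = prob(tumor_alt)
    k_min = max(0, n_tumor - (total - n_alt))
    k_max = min(n_alt, n_tumor)
    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = prob(k)
        if p_k <= p_obs * (1 + 1e-7):  # tolerance so tables tied with the observed one count
            p_value += p_k
    return min(p_value, 1.0)


def passes_filter(tumor_ref, tumor_alt, normal_ref, normal_alt,
                  max_p=0.01, min_depth=10, min_tumor_ratio=0.10, max_normal_ratio=0.03):
    """Example criteria from the slide; the >=/<= directions are assumptions."""
    tumor_depth = tumor_ref + tumor_alt
    normal_depth = normal_ref + normal_alt
    if tumor_depth <= min_depth or normal_depth <= min_depth:
        return False
    if tumor_alt / tumor_depth < min_tumor_ratio:
        return False
    if normal_alt / normal_depth > max_normal_ratio:
        return False
    return fisher_exact_two_sided(tumor_ref, tumor_alt, normal_ref, normal_alt) < max_p


print(round(fisher_exact_two_sided(7, 6, 14, 1), 5))  # 0.02862
```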
  12. Fisher p-value and Sanger validation •  p-value < 0.001: Sanger validation –  0.01 < p-value < 0.05 •  sequencing depth –  0.01, 0.001

    •  tumor ref 75 / alt 30, normal ref 80 / alt 1: p-value = 1.32e-7 (almost surely true!)
    •  tumor ref 20 / alt 7, normal ref 18 / alt 0: p-value = 0.03132 (low sequencing depth!)
    •  tumor ref 90 / alt 5, normal ref 70 / alt 0: p-value = 0.07306 (low variant allele frequency!)
    •  Accuracy rate by p-value: P < 0.001: 95.7%, 0.001 < p < 0.01: 64.3%, 0.01 < p < 0.05: 29.0%
  13. Examples near the threshold

    •  chr19:19646174 C -> A (p-value = 0.024): tumor ref 27 / alt 6, normal ref 33 / alt 0
    •  chr19:42867205 A -> C (p-value = 0.018): tumor ref 14 / alt 5, normal ref 21 / alt 0
  14. Noisy loci •  exon (SureSelect 50M bait) –  11 mismatch

    0.02 4 •  dbSNP131, 1000 Genomes, in-house SNP •  5463
  15. EBCall (Shiraishi et al., NAR, 2013) •  Example: chr19:42867205 A -> C (p-value = 0.018), tumor ref 14 / alt 5, normal ref 21 / alt 0

    •  mismatch: sequence error? •  control (matched control) •  20 non-matched controls •  mismatch 3% •  control 3 •  EBCall (Shiraishi et al.)
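EBCall models the sequencing-error rate at each position from a panel of non-matched controls with a beta-binomial distribution. It then asks how surprising the tumor's mismatch count would be if every mismatch were an error. Below is a heavily simplified sketch of that idea. It is not EBCall's implementation:

- It fits the beta prior by the method of moments instead of EBCall's estimation procedure.
- It ignores strand and base quality.
- The control-panel counts are made-up illustrative numbers.
- The helper names are hypothetical.

The tumor counts come from the chr19:42867205 example (5 alt reads at depth 19).

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, to avoid overflow."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def fit_beta(alt_counts, depths, eps=1e-3):
    """Method-of-moments beta fit to the per-sample mismatch ratios of a control panel.

    Simplification: binomial sampling noise is not separated from the true
    sample-to-sample variation.
    """
    ratios = [a / d for a, d in zip(alt_counts, depths) if d > 0]
    if not ratios:
        raise ValueError("no covered control samples")
    mean = sum(ratios) / len(ratios)
    var = sum((r - mean) ** 2 for r in ratios) / len(ratios)
    mean = min(max(mean, eps), 1 - eps)                  # keep the mean strictly inside (0, 1)
    max_var = mean * (1 - mean)                          # a beta distribution needs var < mean(1 - mean)
    var = min(max(var, max_var / 1001), 0.99 * max_var)  # caps alpha + beta at 1000
    precision = max_var / var - 1                        # alpha + beta
    return mean * precision, (1 - mean) * precision


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def error_pvalue(tumor_alt: int, tumor_depth: int, alpha: float, beta: float) -> float:
    """P(X >= tumor_alt): chance of this many mismatches if they were all sequencing errors."""
    return sum(betabinom_pmf(k, tumor_depth, alpha, beta)
               for k in range(tumor_alt, tumor_depth + 1))


# Illustrative (made-up) control panel at one position: alt reads and depth per sample
alpha, beta = fit_beta([3, 2, 4, 3, 5, 1, 3, 2, 4, 3], [30] * 10)
print(error_pvalue(5, 19, alpha, beta))  # tumor: 5 alt reads out of depth 19
```

With a clean control panel (no mismatches in any control), the same tumor counts give a far smaller p-value than with the noisy panel above. A noisy panel is exactly the situation in which a plain Fisher test against one matched normal can be misleading.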
  16. Mutations with moderate allele frequencies •  sequencing depth –  allele frequency

    •  red: true positives called only by EBCall, 48 •  blue: false positives called only by EBCall, 10 (Shiraishi et al., NAR, 2013)
  17. Evidence for structural variations: improper read pair, soft clipping read, sequence depth change (tumor vs. normal)

    •  Soft clipping read, improper read pair (chr2:286238293) •  Sequence depth •  control
  18. Structural variation •  Evidence for structural variation:

    •  Soft clipping read •  Improper read pair •  Sequence depth
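Soft-clipped reads show up in the CIGAR string of each alignment as S operations at either end. The clip position marks a candidate breakpoint. Below is a minimal Python sketch. It assumes SAM-style 1-based POS values and a hypothetical minimum clip length of 20 bases. It covers only soft clips, not the other two kinds of evidence (improper read pairs and depth changes).

```python
from __future__ import annotations

import re

# One CIGAR operation: a length followed by an operation code
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar: str) -> tuple[int, int]:
    """Return (left, right) soft-clipped lengths of an alignment's CIGAR string."""
    # Hard clips (H) are not part of the stored read sequence, so drop them first
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def clipped_breakpoints(pos: int, cigar: str, min_clip: int = 20) -> list[int]:
    """Candidate breakpoints (1-based) implied by long soft clips of a read aligned at pos."""
    left, right = soft_clip_lengths(cigar)
    # Operations that consume the reference determine where the aligned part ends
    ref_len = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "MDN=X")
    points = []
    if left >= min_clip:
        points.append(pos)                # sequence diverges just before the first aligned base
    if right >= min_clip:
        points.append(pos + ref_len - 1)  # sequence diverges just after the last aligned base
    return points


print(clipped_breakpoints(1000, "70M30S"))  # [1069]
```

Reads from several molecules that clip at the same coordinate are much stronger evidence than a single clipped read. This is why junction-read counts such as `--min_junc_num` appear later in the pipeline.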
  19. bowtie, blat, duplicate, soft clipping, fusion contig (aligned, unaligned)

    •  0 •  Genomon Fusion
  20. MLL4-HBV-MLL4 fusion transcript •  human-HBV fusion transcript (HBV 5′)

    •  HBV integration into MLL4: HBV 5′, 3′, HBV fusion transcript •  2 MLL4-HBV-MLL4 (PCR) •  over-expression, in-frame, loss-of-function (By Dr. Shiraishi)
  21. Genomon DNA pipeline •  Genomon Mutation •  Genomon-SV •  GC

    > genomon_pipeline dna input.csv output_dir genomon.cfg
  22. Genomon Mutation •  Fisher's exact test based detection •  GenomonMutationFilter •  EBFilter •  ad-hoc filters

    •  GenomonMutationFilter: realignment filter, indel filter, breakpoint filter, simple repeat filter •  EBFilter: EBCall with a control panel
    •  Output: {sample}.genomon_mutation.result.txt (Fisher), {sample}.genomon_mutation.result.filt.txt
  23. Realignment filter •  mutation: reference sequence without variant vs. reference sequence with variant

    •  Tumor and matched control reads: reference read pair vs. variant read pair •  count per “read pair” to avoid double counting
  24. Mutation Double Count •  Insert size •  Duplicate

    •  Figure: short insert size (WXS, target-seq): read1 and read2 overlap; adaptor
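When the insert size is shorter than two read lengths (common in WXS and target-seq), read1 and read2 overlap. One DNA fragment can then contribute two reads that show the same variant. Below is a minimal sketch that counts support per fragment, grouping reads by their shared read name, instead of per read. Two parts are assumptions, not necessarily how Genomon resolves these cases:

- the `(query_name, base)` input format;
- the choice to set aside fragments whose mates disagree.

```python
from collections import defaultdict


def count_fragments(observations, alt_base):
    """Count variant support per fragment (read name) instead of per read.

    observations: (query_name, base) pairs for every read covering the position;
    read1 and read2 of an overlapping pair share the same query_name.
    """
    bases_by_fragment = defaultdict(set)
    for query_name, base in observations:
        bases_by_fragment[query_name].add(base)

    counts = {"alt": 0, "other": 0, "discordant": 0}
    for bases in bases_by_fragment.values():
        if len(bases) > 1:
            counts["discordant"] += 1   # mates disagree; left out of both ref and alt support
        elif alt_base in bases:
            counts["alt"] += 1
        else:
            counts["other"] += 1
    return counts


# r1 and r4 are overlapping pairs: 4 reads show T, but only 2 fragments support it consistently
obs = [("r1", "T"), ("r1", "T"), ("r2", "G"), ("r3", "T"), ("r4", "G"), ("r4", "T")]
print(count_fragments(obs, "T"))  # {'alt': 2, 'other': 1, 'discordant': 1}
```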
  25. EBCall control panel •  Control panel •  Depth

    •  Hot spot •  Control Panel •  20 •  10 •  Fisher (Shiraishi et al., NAR, 2013)
  26. DNA (Costello et al., NAR, 2013)

    [Figure: #mutation per sample by mutation signature (signatures 1–3 and 1–4)] •  CpCp[AG] C>A •  Shiraishi et al., PLoS Genetics, 2015; TCGA (KIRC)
  27. Genome Viewer •  mutation –  Genomon2 •  Genome Viewer

    •  “manually curated” false positives •  IGV •  blacklist •  MUC6, FLG
  28. Genomon SV •  90% (Kataoka et al., Nature Genetics, 2015)

    •  ATL •  PD-L1 (Kataoka et al., Nature, 2016) •  mid-size (10bp ~ 1000bp) indel: mid-size deletion, short tandem duplication
  29. TP73 exon skipping •  TP73 exon skip –  TP73 exon 2, 3 skipping

    –  exon 2, exon 3 intragenic deletion: 5 –  splicing motif, 88 bp mid-range deletion: 1 •  Figure: exon 1 / exon 4 structures, 10 kbp long deletion, 88bp deletion (exon 2), samples Miya23 and Miya13 (Kataoka et al., Nature Genetics, 2015)
  30. TCGA •  TCGA •  88 •  mutation (exonic with splicing)

    •  Genomon-SV •  BAP1: 50% (19→30)
  31. Internal Tandem Duplication •  Exon •  AML: FLT3-ITD (about 25%)

    •  Ligand •  ITD •  Genomon-ITDetector –  Kenichi Chiba et al., Bioinformatics, 2015 •  Figure: Ligand (Stirewalt et al., Nature Reviews Genetics, 2002)
  32. FLT3 •  TCGA: 140 AML, FLT3-ITD –  37 FLT3-ITD –  16

    ITD duplication –  ITD 1% !! •  Figure: duplicated region, inserted nucleotides
  33. Two step filtering of SV in Genomon2 •  Step 1: GenomonSV filtering step ({sample}.genomonSV.result.txt)

    –  --min_junc_num 2 --max_control_variant_read_pair 10 --min_overhang_size 30 → 50
    •  Step 2: sv_utils ({sample}.genomonSV.result.filt.txt) –  SV recurrent
    –  --min_tumor_allele_freq 0.07 --max_control_variant_read_pair 1 --control_depth_thres 10 --inversion_size_thres 1000 --remove_simple_repeat --min_overhang_size 100
    •  sv_utils (https://github.com/friend1ws/sv_utils)
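The second step's thresholds can be read as simple per-SV rules. Below is a sketch that applies the option values listed above to a hypothetical SV summary record. It is not sv_utils's actual implementation. The field names are assumptions, and so is the interpretation of each option (for example, that `--inversion_size_thres` removes inversions shorter than the threshold).

```python
from dataclasses import dataclass


@dataclass
class SVRecord:
    """Hypothetical per-SV summary; not the actual GenomonSV output columns."""
    sv_type: str                     # e.g. "deletion", "inversion", "translocation"
    size: int                        # distance between the two breakpoints (bp)
    tumor_variant_reads: int
    tumor_depth: int
    control_variant_read_pairs: int
    control_depth: int
    overhang_size: int
    in_simple_repeat: bool


def keep_sv(sv: SVRecord, min_tumor_allele_freq: float = 0.07,
            max_control_variant_read_pair: int = 1, control_depth_thres: int = 10,
            inversion_size_thres: int = 1000, min_overhang_size: int = 100,
            remove_simple_repeat: bool = True) -> bool:
    """Apply thresholds named after the sv_utils options on the slide (semantics assumed)."""
    if sv.tumor_depth == 0 or sv.tumor_variant_reads / sv.tumor_depth < min_tumor_allele_freq:
        return False
    if sv.control_variant_read_pairs > max_control_variant_read_pair:
        return False  # supported in the control: likely germline or artifact
    if sv.control_depth < control_depth_thres:
        return False  # too little control coverage to rule out a germline SV
    if sv.sv_type == "inversion" and sv.size < inversion_size_thres:
        return False  # short inversions treated as likely artifacts
    if sv.overhang_size < min_overhang_size:
        return False
    if remove_simple_repeat and sv.in_simple_repeat:
        return False
    return True


print(keep_sv(SVRecord("inversion", 500, 12, 60, 0, 40, 120, False)))  # False
```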
  34. --min_junc_num •  number of junction reads consistent with the SV

    •  WGS: 1 or 2, WXS: 2 or 3, Target: 3 ~ 10
  35. SV Control panel •  GenomonSV control panel: false positives

    •  Genomon2 –  GenomonSV control panel –  Sv_filter •  20 •  SNV, indel breakpoint •  barcode •  --max_control_variant_read_pair 1 –  Control panel junction
  36. Overhang size •  SV (overhang region) •  overhang region, reference genome

    •  Overhang region: 75 ~ 150
  37. Mutational signature in cancer genome •  DNA repair defects

    •  TP53 (Pfeifer et al., 2002) –  skin cancer: C>T, CC>TT –  lung cancer (smoking history): C>A •  C>T at NpCpG sites •  selection advantages
  38. Mutational Signature Extraction using NMF

    [Slide excerpt: "Deciphering the Signatures of Mutational Processes"] We can generalize Equation 2 for all K mutation types and G genomes by expressing exposures to mutational processes and mutational catalogs as matrices (Experimental Procedures) […], or this equation can be simplified in a matrix form as: $M \approx P \times E$ (Equation 3).
    Figure 1. Modeling Signatures of Mutational Processes Operative in Cancer Genomes. (A) Simulated example of three mutational processes operative in a single cancer genome. The mutational catalog of the cancer genome is modeled as a linear superposition of the signatures of the three processes and the respective number of mutations contributed by each signature, plus added nonsystematic noise. (B) Simulated example illustrating mutational processes operative in a set of G cancer genomes. The mutational catalogs of these G cancer genomes can be used to decipher the signatures of N mutational processes as well as the number of mutations caused by each of the processes in each of the genomes. The extracted signatures and contributions do not allow an exact reconstruction of the original set, thus resulting in genome-specific reconstruction error.
    1.  96 mutation types –  e.g., TpApG, TpCpG –  4 (previous base) × 4 (original base) × 4 (next base) × 3 (alternate base) ÷ 2 (complementary strand redundancy) 2.  96 pattern 3.  Nonnegative Matrix Factorization
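The factorization $M \approx P \times E$ is usually solved as a nonnegative least-squares problem. One standard approach is Lee and Seung's multiplicative updates for the Frobenius norm. This is a sketch of that general technique, not necessarily the exact procedure behind the slide. The notation is:

- $M$: the $K \times G$ catalog matrix (K = 96 mutation types, G genomes);
- $P$: the $K \times N$ signature matrix;
- $E$: the $N \times G$ exposure matrix;
- $\odot$ and the fraction bars: element-wise operations.

```latex
\begin{aligned}
&\min_{P \ge 0,\; E \ge 0} \; \lVert M - P E \rVert_F^2,
\qquad M \in \mathbb{R}_{\ge 0}^{K \times G},\; P \in \mathbb{R}_{\ge 0}^{K \times N},\; E \in \mathbb{R}_{\ge 0}^{N \times G} \\
&P \leftarrow P \odot \frac{M E^{\top}}{P E E^{\top}}, \qquad
E \leftarrow E \odot \frac{P^{\top} M}{P^{\top} P E}
\end{aligned}
```

Starting from random nonnegative $P$ and $E$, repeating the two updates never increases the objective. The number of signatures $N$ is chosen separately, for example by comparing reconstruction error and stability across candidate values.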
  39. C > T at NpCpG, C > A (smoking)

    Mutation signature based on 96 categorization (http://cancer.sanger.ac.uk/cosmic/signatures)
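The count 4 × 4 × 4 × 3 ÷ 2 = 96 comes from folding each mutation onto the strand where the reference base is a pyrimidine (C or T), as in the COSMIC 96-type classification. Below is a minimal sketch that enumerates the 96 types and maps a trinucleotide context to its type. The helper names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_types() -> list:
    """All 96 types, written 5'[REF>ALT]3' with a pyrimidine (C or T) reference."""
    return [f"{five}[{ref}>{alt}]{three}"
            for ref in "CT"
            for alt in BASES if alt != ref
            for five, three in product(BASES, repeat=2)]


def classify(context: str, alt: str) -> str:
    """Map a trinucleotide context (e.g. 'TCA') and alternate base to one of the 96 types."""
    if context[1] in "AG":  # purine reference: describe it on the complementary strand
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


print(len(mutation_types()), 4 * 4 * 4 * 3 // 2)  # 96 96
print(classify("TGA", "A"))  # T[C>T]A, the same type as classify("TCA", "T")
```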
  40. Probabilistic mutation signature •  Substitution pattern –  carcinogen •  reference base: C or T

    •  Number of parameters: 3071 → 18
    •  Figure (a): APOBEC signature for the independent model (Shiraishi et al., PLoS Genetics, 2015)

    Flanking sequence (probability of each base; position 0 = reference base):
    pos      -2     -1      0      1      2
    A       0.2   0.05      0    0.4    0.3
    C       0.3      0   0.95    0.1    0.1
    G      0.05      0      0    0.1    0.4
    T      0.45   0.95   0.05    0.4    0.2

    Alternate base:
    ref       A      C      G      T
    C       0.1      0    0.4    0.5
    T       0.3    0.4    0.3      0

    Strand specificity: + strand 0.475, − strand 0.525
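Under the independent model, the probability of a mutation pattern is the product of one probability per feature: substitution, each flanking base, and strand. This is why the parameter count drops from 3071 to 18. Below is a sketch that uses the APOBEC parameters read off the tables above. The dictionary layout and function name are illustrative. The substitution probability is split as P(reference base) × P(alternate base | reference base) to match the slide's two tables.

```python
from itertools import product

# Parameters read off the APOBEC example on this slide
FLANK = {  # probability of each base at flanking positions -2, -1, +1, +2
    -2: {"A": 0.2, "C": 0.3, "G": 0.05, "T": 0.45},
    -1: {"A": 0.05, "C": 0.0, "G": 0.0, "T": 0.95},
    1: {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    2: {"A": 0.3, "C": 0.1, "G": 0.4, "T": 0.2},
}
REF = {"C": 0.95, "T": 0.05}
ALT_GIVEN_REF = {"C": {"A": 0.1, "G": 0.4, "T": 0.5},
                 "T": {"A": 0.3, "C": 0.4, "G": 0.3}}
STRAND = {"+": 0.475, "-": 0.525}


def pattern_prob(m2, m1, ref, alt, p1, p2, strand):
    """Independent model: product of the per-feature probabilities of one mutation pattern."""
    return (FLANK[-2][m2] * FLANK[-1][m1] * REF[ref] * ALT_GIVEN_REF[ref][alt]
            * FLANK[1][p1] * FLANK[2][p2] * STRAND[strand])


# Full model: one free probability per category (4^4 flanks x 6 substitutions x 2 strands)
full_model_params = 4 ** 4 * 6 * 2 - 1                # 3071
# Independent model: 3 per flanking position, 5 for the substitution, 1 for the strand
independent_params = 4 * (4 - 1) + (6 - 1) + (2 - 1)  # 18

total = sum(pattern_prob(m2, m1, ref, alt, p1, p2, s)
            for m2, m1, p1, p2 in product("ACGT", repeat=4)
            for ref in REF for alt in ALT_GIVEN_REF[ref] for s in STRAND)
print(full_model_params, independent_params, round(total, 9))  # 3071 18 1.0
```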
  41. •  20/20 rule –  Oncogene: more than 20% of mutations are missense at recurrent positions –  TSG (tumor suppressor gene): more than 20% of mutations are truncating

    •  Background mutation rate –  gene length (e.g., TTN) –  GC content –  replication timing •  Software –  MutSig (https://confluence.broadinstitute.org/display/CGATools/MutSig) –  MuSiC (Dees et al., Genome Research, 2012) •  Vogelstein et al., Science, 2013
    [Figure: Fig. 4 of Vogelstein et al., Science, 2013. Distribution of mutations in two oncogenes (PIK3CA and IDH1) and two tumor suppressor genes (RB1 and VHL).]
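The 20/20 rule reduces to two fractions per gene. Below is a minimal sketch that takes the gene's mutations as `(amino_acid_position, consequence)` pairs. Several parts are assumptions:

- the input format and consequence labels;
- the definition of "recurrent" (the same position hit at least twice);
- the order of the checks.

```python
from collections import Counter

TRUNCATING = {"nonsense", "frameshift", "splice_site"}


def classify_20_20(mutations, threshold=0.20):
    """20/20 rule on (amino_acid_position, consequence) pairs for one gene."""
    n = len(mutations)
    if n == 0:
        return "unclassified"
    missense_at = Counter(pos for pos, kind in mutations if kind == "missense")
    # "Recurrent" is taken here as: the same position is hit at least twice
    recurrent_missense = sum(c for c in missense_at.values() if c >= 2)
    truncating = sum(1 for _, kind in mutations if kind in TRUNCATING)
    if recurrent_missense / n > threshold:   # checked first, by choice, if both exceed 20%
        return "oncogene"
    if truncating / n > threshold:
        return "tumor suppressor gene"
    return "unclassified"


hotspot_gene = [(1047, "missense")] * 5 + [(10, "missense"), (20, "missense"), (30, "synonymous")]
print(classify_20_20(hotspot_gene))  # oncogene
```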
  42. bowtie, blat, duplicate, soft clipping, fusion contig (aligned, unaligned)

    •  0 •  Genomon Fusion
  43. bowtie, blat, duplicate, soft clipping, fusion contig (aligned, unaligned)

    Genomon Fusion •  RNA-seq alignment (STAR, mapsplice2, TopHat2): chimeric read
  44. fusionfusion •  fusion detection from STAR and mapsplice2 chimeric reads •  genomon fusion

    •  TopHat2 •  6 cores, 32G, 1h •  Genomon fusion: 1 fastq •  Figure: chimeric read (mapsplice2, STAR) → fusion → fusion filtering
  45. fusionfusion vs. genomon_fusion [Figure: #fusion by base_length (50, 75, 100) for DU145, K562 and MCF7; comparison: included / not-included]

    •  Genomon-fusion: fusion golden set •  Genomon-fusion: 100bp fusion •  fusionfusion: 50bp •  fusionfusion vs. Genomon-fusion: fusion, splicing variant