2015 GCC Visualization Workshop

Slides from the 2015 Galaxy Community Conference (GCC) Visualization Workshop. Datasets and visualizations from the workshop are available at https://usegalaxy.org/u/jeremy/p/visualization-workshop

Jeremy Goecks

July 06, 2015

Transcript

  1. Galaxy Community Conference 2015: Visualization Workshop • Jeremy Goecks, Aysam Guerler, and Carl Eberhard

  2. Recommended Web Browsers • Chrome will probably work best • Updated Safari/Firefox should work well • Internet Explorer and old versions of Safari/Firefox may have problems

  3. Topics • Visualization history and introduction • Numerical Visualizations • Biological Visualizations • Adding your own visualizations

  4. Why Visualize?

  5. Why Visualize? • Quick check: did it work? • Exploration and hypothesis generation • Sharing/publishing

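  6. Anscombe’s Quartet (http://en.wikipedia.org/wiki/Anscombe's_quartet) • Four datasets that look very different when plotted share the same summary statistics: mean of x = 9 • variance of x = 11 • mean of y = 7.5 • variance of y ≈ 4.125 • correlation = 0.816 • linear regression: y = 0.5x + 3
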
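The slide's numbers can be checked directly. As a minimal sketch, the Python below uses the four published Anscombe datasets (values from the standard table, not from the slide) and computes each statistic by hand; all four come out nearly identical even though their scatter plots look completely different.

```python
"""Recompute the summary statistics shared by Anscombe's quartet."""

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by datasets I-III
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def summarize(xs, ys):
    """Mean/variance of x and y, Pearson r, and least-squares slope/intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": sample_variance(xs),
        "mean_y": my,
        "var_y": sample_variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
```
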
  7. Timeline of Visualization in Galaxy (2005–2015) • 2005: 1st Galaxy paper published • 2008: display applications • Visualization development started • 2011: 1st visualization paper published

  8. Timeline of Visualization in Galaxy • Takeaways: 1. visualization in Galaxy is nascent; 2. you will be working with awesome new features; 3. there may be bugs, so help us fix them!

  9. Workshop Goals • Participants: learn how to visualize your data in Galaxy ✦ biological visualizations ✦ numerical visualizations ✦ what Galaxy is doing under the covers • Instructors: get your feedback on what you like, what you don’t like, and where to go next

  10. Galaxy Visualizations • Visualizations are first-class objects in Galaxy, just like tools • A visualization can be added to Galaxy via a configuration file that specifies: ✦ datasets that can be used ✦ location of visualization code (client-side or on server) • Galaxy handles visualization integration and data management ✦ users can focus on analyzing data ✦ developers can focus on creating visualizations

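To make the idea of a configuration file concrete, here is a hypothetical sketch of how a registry of visualization configs could be matched against a dataset's format. The class, field names, and format lists are illustrative assumptions, not Galaxy's actual configuration schema.

```python
"""Hypothetical sketch: matching a dataset's format to visualization configurations.

The fields echo what the slide says a config specifies (usable datasets and where
the code lives); names and format lists are illustrative, not Galaxy's schema.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualizationConfig:
    name: str
    accepts: frozenset   # dataset formats the visualization can use
    code_location: str   # "client" (runs in the browser) or "server"


REGISTRY = [
    VisualizationConfig("trackster", frozenset({"bed", "gff", "bam", "vcf", "bigwig"}), "client"),
    VisualizationConfig("circster", frozenset({"bed", "bigwig"}), "client"),
    VisualizationConfig("charts", frozenset({"tabular", "csv"}), "client"),
]


def applicable(dataset_format, registry=REGISTRY):
    """Return the names of visualizations whose config accepts dataset_format."""
    return [cfg.name for cfg in registry if dataset_format in cfg.accepts]


print(applicable("bed"))      # ['trackster', 'circster']
print(applicable("tabular"))  # ['charts']
```

Because each visualization only declares what it accepts, adding a new one means adding a config entry; nothing in the matching logic changes.
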
  11. Visualizations are 1st-class Galaxy objects • Can be saved and versioned for reproducibility • Have a human-readable URL for sharing a fully interactive visualization: http://usegalaxy.org/u/jgoecks/v/tumor-mutations • Can embed interactive visualizations in online supplementary materials via Galaxy Pages

  12. Visualization Architecture • Client-server architecture • Lots of moving pieces ✦ prepare/process data on server ✦ send to client ✦ render on client

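The three stages on this slide can be illustrated with a toy pipeline. This is a hedged sketch, not Galaxy's code: `server_prepare` and `client_render` are hypothetical names, the coverage values are made up, and JSON is assumed as the transport format.

```python
"""Minimal sketch of the prepare -> send -> render pipeline (all names hypothetical)."""
import json

# Toy per-base coverage for one chromosome, standing in for a server-side dataset.
COVERAGE = {"chr1": [0, 2, 5, 9, 4, 1, 0, 3, 7, 8, 6, 2]}


def server_prepare(chrom, start, end, max_points):
    """Server side: slice the requested region and reduce it to at most max_points bins."""
    values = COVERAGE[chrom][start:end]
    bin_size = max(1, -(-len(values) // max_points))  # ceiling division
    bins = [max(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]
    return json.dumps({"chrom": chrom, "start": start, "bin_size": bin_size, "data": bins})


def client_render(payload):
    """Client side: decode the JSON payload and draw one text 'bar' per bin."""
    msg = json.loads(payload)
    return [f"{msg['start'] + i * msg['bin_size']:>4} {'#' * v}" for i, v in enumerate(msg["data"])]


for line in client_render(server_prepare("chr1", 0, 12, max_points=4)):
    print(line)
```

Keeping the per-bin maximum rather than the mean is one reasonable choice so that peaks stay visible after reduction; either way, the server sends only as many points as the client needs to draw.
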
  13. Topics • Visualization history and introduction • Numerical Visualizations • Biological Visualizations • Adding your own visualizations

  14. Analysis goal: what similarities and differences can be found in cancer cell lines using exome and transcriptome sequencing?

  15. Sequencing and Analysis • Sequenced the exomes and transcriptomes of 3 pancreatic cancer cell lines ✦ MiaPaCa2, HPAC, and PANC-1 • Datasets available in published history: ✦ exome subset (KRAS, STK11, ERBB2): aligned reads, duplicates removed, read pileup created ✦ transcriptome subset (KRAS, STK11, ERBB2): aligned reads ✦ gene fusions from all cell lines ✦ whole-transcriptome coverage from aligned reads ✦ (gene annotation)

  16. Display Applications

  17. Display Applications • Numbered markers are used throughout the slides to show actions to take

  18. Display Applications

  19. Display Applications • Advantages ✦ use familiar tools ✦ easy to view your data alongside public datasets • Disadvantages ✦ cannot easily share/version visualizations ✦ many more visualizations than display applications in Galaxy ✦ no data processing or visual analysis, only visualization

  20. Trackster: Galaxy’s Genome Browser

  21. Trackster: Galaxy’s Genome Browser • Genome browsers are a foundational genome visualization tool • Trackster is built for the high-throughput sequencing era ✦ very large datasets, numerous simultaneous tracks ✦ maximum flexibility for customization (e.g., rainbow tracks) ✦ 2–3 indices per dataset for fast visualization • Supported formats: BED, GFF/GTF, interval, SAM/BAM, VCF, Wiggle, BigWig, BigBed, BedGraph

  22. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq) • 3. Save visualization • 4. Exit • 5. Reopen visualization

  23. Let’s visualize our data in Trackster • 1. Create visualization

  24. Let’s visualize our data in Trackster • 1. Create visualization

  25. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq)

  26. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq)

  27. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq) • 3. Save visualization

  28. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq) • 3. Save visualization • 4. Exit

  29. Let’s visualize our data in Trackster • 1. Create visualization • 2. Add gene annotation (RefSeq) • 3. Save visualization • 4. Exit • 5. Reopen visualization

  30. Behind the Scenes • Galaxy is indexing datasets for ✦ viewing large genomic regions (coverage plots) ✦ viewing small genomic regions (getting individual data points) ✦ looking up feature names and locations • Indexing is the primary way that big datasets are visualized quickly

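A rough illustration of the first two lookups above, assuming a simple in-memory representation (Galaxy's actual index formats are not described in the slides): a precomputed per-bin summary answers large-region requests, and binary search over sorted feature starts answers small-region requests. The features and function names are hypothetical.

```python
"""Illustrative sketch of two index types a genome browser can use (not Galaxy's code)."""
from bisect import bisect_left
from collections import defaultdict

# Toy features on one chromosome: (start, end, name), half-open coordinates.
FEATURES = sorted([(100, 400, "a"), (250, 900, "b"), (1200, 1500, "c"), (5000, 5600, "d")])
STARTS = [start for start, _, _ in FEATURES]
MAX_LEN = max(end - start for start, end, _ in FEATURES)


def build_coverage_index(features, bin_size):
    """Large regions: precompute how many features overlap each fixed-size bin."""
    counts = defaultdict(int)
    for start, end, _ in features:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            counts[b] += 1
    return dict(counts)


def query_features(start, end):
    """Small regions: return the individual features overlapping [start, end)."""
    # An overlapping feature cannot start before start - MAX_LEN or at/after end,
    # so binary search on the sorted starts narrows the candidates.
    lo = bisect_left(STARTS, start - MAX_LEN)
    hi = bisect_left(STARTS, end)
    return [f for f in FEATURES[lo:hi] if f[1] > start]


print(build_coverage_index(FEATURES, 1000))  # {0: 2, 1: 1, 5: 1}
print(query_features(300, 1300))  # [(100, 400, 'a'), (250, 900, 'b'), (1200, 1500, 'c')]
```
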
  31. Display Modes • Tracks can be displayed in different modes ✦ from coverage to individual features ✦ terminology similar to UCSC’s • Let’s try different modes ✦ this is fast because data is sent from the Galaxy server and rendered in your Web browser

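Expanded display modes stack overlapping features into separate rows. A common generic way to do this is greedy first-fit packing, sketched below; it is not taken from Trackster's source, and the function and feature names are hypothetical.

```python
"""Greedy first-fit row packing, a generic way to lay out overlapping features."""


def pack_rows(features, min_gap=0):
    """Assign each (start, end, name) feature to the first row where it fits.

    Features are processed in start order; a row can take a feature if the row's
    last feature ended at least min_gap bases before the new one starts.
    Returns a dict mapping feature name -> row index.
    """
    row_ends = []  # row_ends[i] = end coordinate of the last feature placed in row i
    rows = {}
    for start, end, name in sorted(features):
        for i, row_end in enumerate(row_ends):
            if row_end + min_gap <= start:
                row_ends[i] = end
                rows[name] = i
                break
        else:  # no existing row had room: open a new one
            row_ends.append(end)
            rows[name] = len(row_ends) - 1
    return rows


genes = [(100, 400, "a"), (250, 900, "b"), (450, 600, "c"), (950, 1000, "d")]
print(pack_rows(genes))  # {'a': 0, 'b': 1, 'c': 0, 'd': 0}
```

A dense mode would instead draw every feature on a single row, trading detail for vertical space.
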
  32. Searching • Can search for named features such as gene annotations ✦ BED, GFF/GTF • Let’s try searching for a gene: ERBB2

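Name search works because feature names are indexed ahead of time. This sketch builds a case-insensitive prefix index from BED columns 1-4; the chromosomes for ERBB2, STK11, and KRAS are real, but the coordinates are placeholders, and the function names are hypothetical.

```python
"""Illustrative sketch of a feature-name index for searching BED annotations."""
from collections import defaultdict

# Toy BED records: real chromosomes for these genes, placeholder coordinates.
BED_TEXT = "chr17\t1000\t5000\tERBB2\nchr19\t2000\t6000\tSTK11\nchr12\t3000\t7000\tKRAS\n"


def build_name_index(bed_text):
    """Map lower-cased name -> [(name, chrom, start, end)] using BED columns 1-4."""
    index = defaultdict(list)
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, name = line.split("\t")[:4]
        index[name.lower()].append((name, chrom, int(start), int(end)))
    return index


def search(index, query):
    """Case-insensitive prefix search over feature names."""
    q = query.lower()
    return [hit for key in sorted(index) if key.startswith(q) for hit in index[key]]


index = build_name_index(BED_TEXT)
print(search(index, "erbb"))  # [('ERBB2', 'chr17', 1000, 5000)]
```
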
  33. Let’s Call Variants • VarScan ✦ Sample names: MiaPaCa2, PANC1, HPAC ✦ Run • Rename output: “Cell line variants”

  34. Let’s Assemble Transcripts • Cufflinks ✦ select transcriptome datasets ✦ run • Rename assembled transcripts for MiaPaCa2: “MiaPaCa2 Assembled Transcripts”

  35. Let’s add data to Trackster • Add exome data for all cell lines and called variants… • …but where is our data?

  36. Let’s add data to Trackster • Add exome data for all cell lines and called variants… • …but where is our data? • Save again

  37. Circster • Interactive Circos plot • Whole-genome view with structural variation

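A Circos-style view places the whole genome on a circle. The sketch below shows one straightforward coordinate mapping, with each chromosome given an arc proportional to its length; the chromosome lengths and gap size are toy values, and this is not Circster's implementation.

```python
"""Sketch of the coordinate mapping behind a Circos-style circular genome view."""
import math

# Toy chromosome lengths (hypothetical values, not a real genome build).
CHROM_LENGTHS = {"chr1": 300, "chr2": 200, "chr3": 100}
GAP = 0.05  # radians of empty space left after each chromosome


def chrom_arcs(lengths, gap=GAP):
    """Give each chromosome an arc (start_angle, end_angle, length) proportional to its length."""
    usable = 2 * math.pi - gap * len(lengths)
    total = sum(lengths.values())
    arcs, angle = {}, 0.0
    for chrom, length in lengths.items():
        span = usable * length / total
        arcs[chrom] = (angle, angle + span, length)
        angle += span + gap
    return arcs


def position_to_xy(chrom, pos, radius, arcs):
    """Convert a genomic position to (x, y) on a circle of the given radius."""
    start, end, length = arcs[chrom]
    theta = start + (end - start) * pos / length
    return radius * math.cos(theta), radius * math.sin(theta)


arcs = chrom_arcs(CHROM_LENGTHS)
# A structural variant or gene fusion can be drawn as a chord between two such points.
print(position_to_xy("chr1", 150, 1.0, arcs), position_to_xy("chr3", 50, 1.0, arcs))
```
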
  38. Let’s view our data in Circster

  39. Let’s view our data in Circster

  40. Let’s view our data in Circster • Double-click or use the trackpad to zoom in • Drag around using the mouse/trackpad • What do we see?

  41. Let’s view our data in Circster • Change min/max by clicking on the labels • What do we see?

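Changing min/max changes how data values map to track height. A minimal sketch of a clamped linear scale; the function name and coverage values are hypothetical.

```python
"""Sketch of a clamped linear scale from data values to track height."""


def scale_to_height(values, vmin, vmax, height):
    """Map values to heights in [0, height], clamping anything outside [vmin, vmax]."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    out = []
    for v in values:
        clamped = min(max(v, vmin), vmax)
        out.append(height * (clamped - vmin) / (vmax - vmin))
    return out


coverage = [0, 5, 20, 80, 400]  # toy coverage values
print(scale_to_height(coverage, 0, 400, 100))  # [0.0, 1.25, 5.0, 20.0, 100.0]
print(scale_to_height(coverage, 0, 40, 100))   # [0.0, 12.5, 50.0, 100.0, 100.0]
```

Lowering the max exaggerates low-coverage regions while saturating the peaks, which is often what makes faint signal visible.
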
  42. Let’s add data to Circster and adjust options • 1. Add transcriptome coverage data

  43. Let’s add data to Circster and adjust options • 1. Add transcriptome coverage data

  44. Let’s add data to Circster and adjust options • 1. Add transcriptome coverage data

  45. Let’s add data to Circster and adjust options • 2. Change arc dataset height

  46. Let’s add data to Circster and adjust options • 2. Change arc dataset height

  47. Let’s add data to Circster and adjust options • 3. Change max for tracks

  48. Let’s add data to Circster and adjust options • 3. Change max for tracks • What do we see?

  49. Let’s add data to Circster and adjust options • Add gene fusions

  50. Let’s add data to Circster and adjust options • Add gene fusions

  51. Let’s add data to Circster and adjust options • 4. Save visualization

  52. Back to Trackster: Rainbow Track for Coverage • 1. Remove gene fusions track • 2. Navigate to ERBB2 gene • 3. Create group • 4. Add transcriptome coverage tracks to group • 5. Create composite track • 6. Adjust max • 7. What do we see?

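A composite track overlays several tracks on a single shared scale, each in its own colour. The sketch below assumes non-negative coverage values and a simple repeating palette; the per-cell-line values are made up and the function name is hypothetical.

```python
"""Sketch of a composite ("rainbow") track: coverage tracks overlaid on one shared scale."""
from itertools import cycle

PALETTE = ["red", "orange", "green", "blue", "purple"]  # illustrative colour choices


def composite(tracks, max_value=None):
    """Return (name, colour, heights) per track; heights fall in [0, 1] for non-negative data.

    One max for every member track keeps the overlaid heights comparable; if
    max_value is None, the largest value across all tracks is used.
    """
    shared_max = max_value if max_value is not None else max(max(v) for v in tracks.values())
    if shared_max <= 0:
        raise ValueError("max must be positive")
    return [
        (name, colour, [min(v, shared_max) / shared_max for v in values])
        for (name, values), colour in zip(tracks.items(), cycle(PALETTE))
    ]


# Toy coverage values for three cell lines over the same three positions.
coverage = {"MiaPaCa2": [2, 10, 4], "HPAC": [1, 20, 5], "PANC-1": [0, 5, 40]}
for layer in composite(coverage):
    print(layer)
for layer in composite(coverage, max_value=20):  # lower max: PANC-1's peak saturates
    print(layer)
```
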
  53. Back to Trackster: Rainbow Track for Coverage • 1. Remove gene fusions track

  54. Back to Trackster: Rainbow Track for Coverage • 2. Navigate to ERBB2 gene

  55. Back to Trackster: Rainbow Track for Coverage • 3. Create group

  56. Back to Trackster: Rainbow Track for Coverage • 4. Add transcriptome coverage tracks to group

  57. Back to Trackster: Rainbow Track for Coverage • 5. Create composite track

  58. Back to Trackster: Rainbow Track for Coverage • 6. Adjust max and height; change name

  59. Add More Data • Add RNA-seq mapped reads, variants, and assembled transcripts • Look at ERBB2 ✦ bookmark • Look at STK11 ✦ bookmark • Look at KRAS ✦ bookmark

  60. Share and Publish

  61. Share and Publish

  62. Share and Publish

  63. Demo: Visual Analysis

  64. Topics • Visualization history and introduction • Numerical Visualizations • Biological Visualizations • Adding your own visualizations

  65. What is Galaxy Charts? • Use Galaxy • Create tabular results • Visualize with Galaxy Charts

  66. Import data files: Click on Shared Data and select Data Libraries. Navigate to the Chart library and import it into your history (data reference: http://dna.cs.byu.edu/treesaap and bacteriome.org).

  67. Make a new chart (1 of 4): Wait for the upload to complete. Select your dataset and click on the Visualization icon, then select Charts.

  68. Give your chart a name: Name your chart Unclustered Heatmap.

  69. Select a chart type: Double-click on the Heatmap icon.

  70. Select data columns: First click on Row labels and select Column 2. Then, click on Draw.

  71. Unclustered Heatmap

  72. Make a new chart (2 of 4): Select your dataset and click on the Visualization icon, then select Charts.

  73. Give your chart a name: Name your chart Clustered Heatmap.

  74. Select a new chart type: Double-click on the Clustered Heatmap icon.

  75. Select data columns: First click on Row labels and select Column 2. Then, click on Draw.

  76. Clustered Heatmap: Use the mouse wheel or your touchpad to zoom into the highlighted area.
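
A clustered heatmap reorders rows so that similar rows sit next to each other. The slides don't say which algorithm Galaxy Charts uses; as a generic sketch, here is average-linkage agglomerative clustering with Euclidean distance in plain Python, returning the dendrogram leaf order for a toy matrix (function names hypothetical).

```python
"""Sketch: order heatmap rows by average-linkage hierarchical clustering (pure Python)."""
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(rows):
    """Return row indices in dendrogram leaf order (average linkage, Euclidean distance)."""
    # Each cluster is a list of row indices; merging keeps left-then-right order.
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster row pairs.
                d = sum(euclidean(rows[a], rows[b]) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


matrix = [
    [9, 8, 1],  # row 0
    [1, 2, 9],  # row 1
    [8, 9, 2],  # row 2
    [2, 1, 8],  # row 3
]
print(cluster_order(matrix))  # [0, 2, 1, 3]
```

This brute-force version is only suitable for small matrices; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` is the usual choice.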

  77. Enlarged view: Tooltips pop up if you move the mouse pointer over a box. Here the interaction between B4143 and B3295 is highlighted. Click on Editor again to further customize this chart.

  78. Chart configuration: Go to the Configuration tab.

  79. Chart settings: Heatmap-specific options are highlighted. Feel free to set axis labels or other options.

  80. Define a URL template: Paste a database URL into the template URL field and add the __LABEL__ tag. You may use http://www.ncbi.nlm.nih.gov or any other database. Click on Draw to redraw the chart.

  81. Data points linked to web sources: Double-click on a box and the browser will open two new tabs using the previously defined URL template.

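Linking data points to web sources amounts to substituting a label into the URL template. A minimal sketch, assuming one tab per label (row and column) of the double-clicked box; the helper name is hypothetical and the NCBI Gene search URL is just an example template.

```python
"""Sketch of filling a URL template with a heatmap cell's labels (hypothetical helper)."""
from urllib.parse import quote

PLACEHOLDER = "__LABEL__"  # tag named on the slide


def links_for_cell(template, row_label, col_label):
    """Return one URL per label, substituting the URL-encoded label for __LABEL__."""
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}")
    return [template.replace(PLACEHOLDER, quote(label)) for label in (row_label, col_label)]


template = "https://www.ncbi.nlm.nih.gov/gene/?term=__LABEL__"
for url in links_for_cell(template, "B4143", "B3295"):
    print(url)
```

URL-encoding the label keeps names containing spaces or symbols from breaking the link.
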
  82. Cluster selection and analysis: Select one element from each highlighted row. What are the corresponding protein functions?

  83. Identified protein categories: Chemotaxis • RNA Polymerase • Chaperone • Flagella • Please return to the Editor.

  84. Make a new chart (3 of 4): Select your dataset and click on the Visualization icon, then select Charts.

  85. Give your chart a name: Name your chart Score Histogram.

  86. Analyze the score distribution: Double-click on the Histogram icon and click on Draw.
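
A histogram divides the score range into equal-width bins and counts the scores in each. A small sketch with toy integer scores; the function is hypothetical, not the Charts implementation.

```python
"""Sketch of the binning behind a histogram of numeric scores (toy data)."""


def histogram(values, n_bins):
    """Split [min, max] into n_bins equal-width bins and count values per bin.

    Returns (edges, counts); the maximum value is counted in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


scores = [1, 2, 2, 3, 5, 5, 6, 8, 9, 10]  # toy scores
edges, counts = histogram(scores, n_bins=3)
print(edges, counts)  # [1.0, 4.0, 7.0, 10.0] [4, 3, 3]
```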

  87. Give your chart a name: Click on Draw.

  88. Export as PNG: Click on Screenshot and select Save as PNG. Finally, return to the Editor.

  89. Make a new chart (4 of 4): Select your dataset and click on the Visualization icon, then select Charts.

  90. Give your chart a name: Name your chart Discrete Histogram.

  91. Analyze the protein distribution: Double-click on the Discrete Histogram icon.

  92. Add more data: Click on Add Data.

  93. Select a second data group: First click on Observations and select Column 2. Then, click on Draw.
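
A discrete histogram counts how often each distinct value occurs, and a second data group adds a second series of counts. This sketch uses made-up interaction pairs and `collections.Counter`; summing the two columns gives each protein's total number of interactions.

```python
"""Sketch of a discrete histogram: count how often each category occurs (toy data)."""
from collections import Counter

# Hypothetical protein-interaction pairs: (protein_a, protein_b).
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P5", "P2")]

# Two data groups, one per column, mirroring a chart with two observation series.
col1 = Counter(a for a, _ in pairs)
col2 = Counter(b for _, b in pairs)
total = col1 + col2  # interactions per protein across both columns

for protein, n in total.most_common():
    print(protein, col1[protein], col2[protein], n)
```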

  94. Which proteins have the most interactions? Chaperone • RNA Polymerase • Done with Part I.

  95. Scratchbook

  96. Activate the Scratchbook: Activate the Scratchbook by clicking on the icon shown above.

  97. Activate the Scratchbook: Click on Saved Visualizations.

  98. Activate the Scratchbook: Select a visualization and repeat the process by selecting Saved Visualizations again.

  99. Scratchbook for multiple charts: Resize all visualizations so they fit on the screen.

  100. More Examples

  101. Create a pie chart: Select the imported datasets, create a new chart and select Pie chart. Then, click on Add data.

  102. Add first data group: Configure the Helix frequency column.

  103. Add second data group: Configure the Beta frequency column.

  104. Configure the pie chart: Configure the Pie chart as shown above. Then, click on Draw.

  105. Configure the pie chart: Glutamic acid seems to fit much better into helices than into beta sheets. This is consistent with “Aspartic and Glutamic Acids are Important for Alpha-Helix Folding”, JBSD 2007.

  106. Create a bar diagram: Create data groups for the following features: Hydrophobicity, Membrane frequency, Flexibility, Helix frequency and Beta frequency.

  107. Bar diagram of amino acid features: Use the tooltips to identify the amino acids that are likely to be found within membrane proteins. Methionine • Leucine

  108. Topics • Visualization history and introduction • Numerical Visualizations • Biological Visualizations • Adding your own visualizations

  109. Adding your own Visualizations • Go to config/plugins/visualizations/charts • Create a directory in: charts/others/YOURVIZNAME • Add three files to this directory: ✦ logo (logo.png) ✦ configuration (config.js) ✦ wrapper (wrapper.js) • Add your visualization to the list in: charts/types.js • Rebuild by typing ‘npm install’ and ‘grunt’
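
The file layout on this slide can be scaffolded with a short script. This is a hypothetical helper: only the directory and file names come from the slide, `base_dir` stands for the directory the slide calls `charts/` (its exact location under config/plugins/visualizations/charts is an assumption), and the files are created empty, so the real logo, config.js, and wrapper.js contents still have to be written by hand.

```python
"""Hypothetical scaffold for the plugin layout listed on the slide.

Only the directory and file names come from the slide. The files are created
empty (an empty logo.png is not a valid image); registering the plugin in
charts/types.js and rebuilding with 'npm install' and 'grunt' stay manual steps.
"""
import tempfile
from pathlib import Path

PLUGIN_FILES = ("logo.png", "config.js", "wrapper.js")


def scaffold(base_dir, viz_name):
    """Create <base_dir>/others/<viz_name>/ with empty placeholder files."""
    target = Path(base_dir) / "others" / viz_name
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing plugin
    for filename in PLUGIN_FILES:
        (target / filename).touch()
    return target


# Demo in a throwaway directory standing in for the slide's charts/ directory.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "charts", "myviz")
    print(sorted(p.name for p in created.iterdir()))  # ['config.js', 'logo.png', 'wrapper.js']
    print("Next: add 'myviz' to charts/types.js, then run 'npm install' and 'grunt'.")
```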

  110. Workshop Materials • Will be available on the training day page: http://gcc2015.tsl.ac.uk/training-day/ • For this workshop: ✦ Galaxy page on usegalaxy.org with: ✦ published history ✦ published visualization • https://usegalaxy.org/u/jeremy/p/visualization-workshop