2015 GCC Visualization Workshop

Slides from the 2015 Galaxy Community Conference (GCC) Visualization Workshop. Datasets and visualizations from the workshop are available at https://usegalaxy.org/u/jeremy/p/visualization-workshop

Jeremy Goecks

July 06, 2015

Transcript

  1. Galaxy Community Conference 2015:
    Visualization Workshop
    Jeremy Goecks, Aysam Guerler, and
    Carl Eberhard

  2. Recommended Web
    Browsers

    Chrome will probably work best

    Updated Safari/Firefox should work well

    Internet Explorer and old versions of
    Safari/Firefox may have problems

  3. Topics

    Visualization history and introduction

    Numerical Visualizations

    Biological Visualizations

    Adding your own visualizations

  4. Why Visualize?

  5. Why Visualize?

    Quick check: did it work?

    Exploration and hypothesis generation

    Sharing/publishing

  6. Anscombe’s Quartet
    http://en.wikipedia.org/wiki/Anscombe's_quartet

    Property            Value
    Mean x              9
    Variance x          11
    Mean y              7.5
    Variance y          ~4.125
    Correlation         0.816
    Linear regression   y = 0.5x + 3
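
The shared summary statistics can be checked directly. The following is a minimal sketch that recomputes them from the published quartet values (Anscombe, 1973) using sample variance; the names `QUARTET`, `summarize`, and `SUMMARIES` are illustrative and not part of the workshop materials.

```python
from math import sqrt

# Anscombe's quartet; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets give nearly identical numbers even though their scatter plots look very different, which is the argument for plotting data rather than relying on summaries alone.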

  7. Timeline of Visualization
    in Galaxy
    (timeline from 2005 to 2015)
    2005: 1st Galaxy paper published
    2008: Display applications
    2010: Visualization development started
    2011: 1st visualization paper published

  8. Timeline of Visualization
    in Galaxy
    (timeline from 2005 to 2015)
    2005: 1st Galaxy paper published
    2008: Display applications
    2010: Visualization development started
    2011: 1st visualization paper published

    1. visualization in Galaxy is nascent
    2. you will be working with awesome new features
    3. there may be bugs; help us fix them!

  9. Workshop Goals

    Participants: learn how to visualize
    your data in Galaxy
    ✦ biological visualizations
    ✦ numerical visualizations
    ✦ what Galaxy is doing under the covers

    Instructors: get feedback from you about what
    you like, don’t like, and where to go next

  10. Galaxy Visualizations

    Visualizations are first-class objects in Galaxy, just like tools

    A visualization can be added to Galaxy via a configuration
    file that specifies:
    ✦ datasets that can be used
    ✦ location of visualization code (client-side or on server)

    Galaxy handles visualization integration and data
    management
    ✦ users can focus on analyzing data
    ✦ developers can focus on creating visualizations
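
To make the configuration idea concrete, here is a minimal sketch of how a registry could match datasets to visualizations. It is not Galaxy's actual configuration format or API: the `VisualizationConfig` class, the `REGISTRY` entries, the datatype names, and the paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualizationConfig:
    """Hypothetical stand-in for one visualization's configuration file."""
    name: str
    accepted_datatypes: frozenset  # datasets that can be used
    code_location: str             # where the visualization code lives
    runs_on: str                   # "client" or "server"


# Illustrative entries only; names, datatypes, and paths are assumptions.
REGISTRY = [
    VisualizationConfig("genome_browser", frozenset({"bed", "bam", "vcf"}),
                        "plugins/genome_browser/main.js", "client"),
    VisualizationConfig("charts", frozenset({"tabular"}),
                        "plugins/charts/main.js", "client"),
]


def applicable_visualizations(datatype, registry=REGISTRY):
    """Return the names of visualizations whose config accepts this datatype."""
    return [cfg.name for cfg in registry if datatype in cfg.accepted_datatypes]
```

With a registry like this, the framework can offer the right visualizations for each dataset, and a developer only has to supply the configuration entry and the visualization code.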

  11. Visualizations are 1st class
    Galaxy objects

    Can be saved and versioned for reproducibility

    Have a human-readable URL for sharing a fully
    interactive visualization: 

    http://usegalaxy.org/u/jgoecks/v/tumor-mutations

    Can embed interactive visualizations in online
    supplementary materials via Galaxy Pages

  12. Visualization Architecture

    Client-server architecture

    Lots of moving pieces
    ✦ prepare/process data on server
    ✦ send to client
    ✦ render on client
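
The three steps can be sketched as two functions joined by a JSON payload. This is a simplified illustration that assumes a non-empty region (`end > start`), not Galaxy's implementation; `prepare_on_server` and `render_on_client` are hypothetical names, and "rendering" here only maps genomic positions to pixel columns.

```python
import json


def prepare_on_server(positions, values, start, end):
    """Server side: keep only the points inside [start, end) and serialize them."""
    points = [(p, v) for p, v in zip(positions, values) if start <= p < end]
    return json.dumps({"start": start, "end": end, "points": points})


def render_on_client(payload, width_px):
    """Client side: decode the payload and map each position to a pixel column."""
    data = json.loads(payload)
    span = data["end"] - data["start"]
    return [((pos - data["start"]) * width_px // span, value)
            for pos, value in data["points"]]
```

Only the requested region crosses the network, so the client can redraw it (for example, with different styling) without another round trip.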

  13. Topics

    Visualization history and introduction

    Numerical Visualizations

    Biological Visualizations

    Adding your own visualizations


  14. Analysis goal: what similarities and
    differences can be found in cancer cell
    lines using exome and transcriptome
    sequencing?

  15. Sequencing and Analysis

    Sequenced exomes and transcriptomes of 3 pancreatic
    cancer cell lines
    ✦ MiaPaCa2, HPAC, and PANC-1

    Datasets available in published history:
    ✦ exome subset: KRAS, STK11, ERBB2 aligned reads, with duplicates removed
    and read pileup created
    ✦ transcriptome subset: KRAS, STK11, ERBB2 aligned reads
    ✦ gene fusions from all cell lines
    ✦ whole transcriptome aligned reads coverage
    ✦ (gene annotation)

  16. Display Applications

  17. Display Applications
    Numbered callouts (such as 1) are used throughout the
    slides to show actions to take

  18. Display Applications

  19. Display Applications

    Advantages
    ✦ use familiar tools
    ✦ easy to view your data alongside public datasets

    Disadvantages
    ✦ cannot easily share/version visualization
    ✦ many more visualizations than display applications
    in Galaxy
    ✦ no data processing or visual analysis, only
    visualization

  20. Trackster: Galaxy’s Genome Browser


  21. Trackster: Galaxy’s Genome Browser

    Genome browsers are a foundational genome
    visualization tool

    Trackster is for the high-throughput sequencing era
    ✦ very large datasets, numerous simultaneous tracks
    ✦ maximum flexibility for customization (e.g. rainbow tracks)
    ✦ 2-3 indices per dataset for fast visualization

    Supported formats: BED, GFF/GTF, interval, SAM/BAM, VCF, Wiggle, BigWig,
    BigBed, BedGraph

  22. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

    3. Save visualization

    4. Exit

    5. Reopen visualization

  23. Let’s visualize our data in
    Trackster

    1. Create visualization

  24. Let’s visualize our data in
    Trackster

    1. Create visualization

  25. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

  26. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

  27. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

    3. Save visualization

  28. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

    3. Save visualization

    4. Exit

  29. Let’s visualize our data in
    Trackster

    1. Create visualization

    2. Add gene annotation (RefSeq)

    3. Save visualization

    4. Exit

    5. Reopen visualization

  30. Behind the Scenes

    Galaxy is indexing datasets for
    ✦ viewing large genomic regions (coverage plots)
    ✦ viewing small genomic regions (getting
    individual data points)
    ✦ feature names and locations

    Indexes are the primary way that big datasets
    are visualized quickly
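
The two access patterns can be sketched with a toy index: precomputed per-bin counts answer coverage queries over large regions, and a sorted list of start positions narrows the search for individual features in small regions. This is an illustrative simplification, not the index format Galaxy uses; `RegionIndex` and its methods are hypothetical, and a production index would also bound the scan on the left.

```python
from bisect import bisect_left


class RegionIndex:
    """Toy index over features given as (start, end) pairs on one chromosome."""

    def __init__(self, features, bin_size=1000):
        self.features = sorted(features)
        self.starts = [start for start, _ in self.features]
        self.bin_size = bin_size
        # Coverage summary: number of features overlapping each fixed-size bin.
        self.bins = {}
        for start, end in self.features:
            for b in range(start // bin_size, (end - 1) // bin_size + 1):
                self.bins[b] = self.bins.get(b, 0) + 1

    def coverage(self, start, end):
        """Large regions: return per-bin counts instead of individual features."""
        first, last = start // self.bin_size, (end - 1) // self.bin_size
        return [self.bins.get(b, 0) for b in range(first, last + 1)]

    def features_in(self, start, end):
        """Small regions: return individual features overlapping [start, end)."""
        # Features starting at or after `end` cannot overlap the region.
        hi = bisect_left(self.starts, end)
        return [f for f in self.features[:hi] if f[1] > start]
```

A browser could call `coverage` when zoomed out and switch to `features_in` once the visible region is small enough to draw each feature.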

  31. Display Modes

    Tracks can be displayed differently
    ✦ coverage to individual features
    ✦ similar language to UCSC


    Let’s try different modes
    ✦ this is fast because data is sent from the Galaxy server and rendered in
    your Web browser

  32. Searching

    Can search for named features such as gene annotations
    ✦ BED, GFF/GTF

    Let’s try searching for a gene: ERBB2 
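
Name search over annotation files can be sketched as a lookup table built from the name column. The example below is a minimal illustration for BED input (name in the fourth column), not Trackster's search code; the function names are hypothetical and the coordinates in `EXAMPLE_BED` are placeholders rather than real gene positions.

```python
def build_name_index(bed_lines):
    """Map lower-cased feature names to (chrom, start, end) from BED text lines."""
    index = {}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or line.startswith(("#", "track", "browser")):
            continue  # the name is the optional 4th BED column
        chrom, start, end, name = fields[:4]
        index.setdefault(name.lower(), []).append((chrom, int(start), int(end)))
    return index


def search(index, query):
    """Return features whose name starts with the query (case-insensitive)."""
    q = query.lower()
    return {name: locs for name, locs in sorted(index.items()) if name.startswith(q)}


# Placeholder coordinates for illustration only.
EXAMPLE_BED = [
    "chr17\t1000\t2000\tERBB2",
    "chr12\t500\t900\tKRAS",
    "chr19\t10\t20",  # no name column, so it is not searchable
]

if __name__ == "__main__":
    print(search(build_name_index(EXAMPLE_BED), "erb"))
```

Prefix matching lets a user type part of a gene name and jump to the matching location.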







  33. Let’s Call Variants

    VarScan
    ✦ Sample names:
    MiaPaCa2, PANC1, HPAC
    ✦ Run

    Rename output: “Cell line variants”

  34. Let’s Assemble Transcripts

    Cufflinks
    ✦ select transcriptome
    datasets
    ✦ run

    Rename assembled
    transcripts for
    MiaPaCa2: “MiaPaCa2
    Assembled Transcripts”

  35. Let’s add data to Trackster

    Add exome data for all cell lines and called variants…

    …but where is our data?

  36. Let’s add data to Trackster

    Add exome data for all cell lines and called variants…

    …but where is our data?

    Save again

  37. Circster

    Interactive Circos
    plot

    Whole genome
    view with structural
    variation

  38. Let’s view our data in Circster

  39. Let’s view our data in Circster

  40. Let’s view our data in Circster

    Double-click or use trackpad to zoom in

    Drag around using mouse/trackpad

    What do we see?

  41. Let’s view our data in Circster

    Change min/max by clicking on labels

    What do we see?

  42. Let’s add data to Circster
    and adjust options

    1. Add transcriptome coverage data

  43. Let’s add data to Circster
    and adjust options

    1. Add transcriptome coverage data

  44. Let’s add data to Circster
    and adjust options

    1. Add transcriptome coverage data

  45. Let’s add data to Circster
    and adjust options

    2. Change arc dataset height

  46. Let’s add data to Circster
    and adjust options

    2. Change arc dataset height

  47. Let’s add data to Circster
    and adjust options

    3. Change max for tracks

  48. Let’s add data to Circster
    and adjust options

    3. Change max for tracks

    What do we see?

  49. Let’s add data to Circster
    and adjust options

    Add gene fusions

  50. Let’s add data to Circster
    and adjust options

    Add gene fusions

  51. Let’s add data to Circster
    and adjust options

    4. Save visualization

  52. Back to Trackster: Rainbow
    Track for Coverage

    1. Remove gene fusions track

    2. Navigate to ERBB2 gene

    3. Create group

    4. Add transcriptome coverage tracks to group

    5. Create composite track

    6. Adjust max

    7. What do we see?

  53. Back to Trackster: Rainbow
    Track for Coverage

    1. Remove gene fusions track

  54. Back to Trackster: Rainbow
    Track for Coverage

    2. Navigate to ERBB2 gene

  55. Back to Trackster: Rainbow
    Track for Coverage

    3. Create group

  56. Back to Trackster: Rainbow
    Track for Coverage

    4. Add transcriptome coverage tracks to group

  57. Back to Trackster: Rainbow
    Track for Coverage

    5. Create composite track

  58. Back to Trackster: Rainbow
    Track for Coverage

    6. Adjust max and height; change name

  59. Add More Data

    Add RNA-seq mapped reads, variants, and assembled
    transcripts

    Look at ERBB2
    ✦ bookmark

    Look at STK11
    ✦ bookmark

    Look at KRAS
    ✦ bookmark

  60. Share and Publish

  61. Share and Publish

  62. Share and Publish

  63. Demo: Visual Analysis

  64. Topics

    Visualization history and introduction

    Numerical Visualizations

    Biological Visualizations

    Adding your own visualizations

  65. What is Galaxy Charts?
    Use Galaxy
    Create Tabular Results
    Visualize with Galaxy Charts

  66. Import data files
    Click on Shared Data and select Data Libraries. Navigate to the Chart library and import it
    into your history (data reference: http://dna.cs.byu.edu/treesaap and bacteriome.org).

  67. Make a new chart (1 of 4)
    Wait for the upload to complete. Select your Dataset and click on the Visualization Icon, then
    select Charts.

  68. Give your chart a name
    Name your chart Unclustered Heatmap.

  69. Select a chart type
    Double click on the Heatmap icon.

  70. Select data columns
    First, click on Row labels and select Column 2. Then click on Draw.

  71. Unclustered Heatmap

  72. Make a new chart (2 of 4)
    Select your Dataset and click on the Visualization Icon, then select Charts.

  73. Give your chart a name
    Name your chart Clustered Heatmap.

  74. Select a new chart type
    Double click on the Clustered Heatmap icon.

  75. Select data columns
    First, click on Row labels and select Column 2. Then click on Draw.

  76. Clustered Heatmap
    Use the mouse wheel or your touch pad to zoom into the highlighted area.

  77. Enlarged view
    Tooltips pop up if you move the mouse pointer over a box. Here the interaction between
    B4143 and B3295 is highlighted. Click on Editor again to further customize this chart.

  78. Chart configuration
    Go to the Configuration tab.

  79. Chart settings
    Heatmap-specific options are highlighted. Feel free to set axis labels or other options.

  80. Define a URL template
    Paste a database URL into the template URL field and add the __LABEL__ tag. You may use
    http://www.ncbi.nlm.nih.gov or any other database. Click on Draw to redraw the chart.
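
The substitution itself is simple string replacement. The sketch below shows one plausible way to expand such a template; it is not the Charts source code. `expand_url_template` is a hypothetical helper, and the `/gene/?term=` query path in the usage example is an assumption about a convenient NCBI search URL, not something the slides specify.

```python
from urllib.parse import quote

LABEL_TAG = "__LABEL__"


def expand_url_template(template, label):
    """Substitute a URL-escaped label for every __LABEL__ tag in the template."""
    if LABEL_TAG not in template:
        raise ValueError("template has no __LABEL__ tag")
    return template.replace(LABEL_TAG, quote(label, safe=""))


if __name__ == "__main__":
    # Assumed template; any database URL containing the tag would work.
    template = "http://www.ncbi.nlm.nih.gov/gene/?term=__LABEL__"
    for label in ("B4143", "B3295"):
        print(expand_url_template(template, label))
```

Escaping the label keeps the URL valid even when a label contains spaces or other reserved characters.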

  81. Data points linked to web sources
    Double click on a box and the browser will open two new tabs using the previously
    defined URL template.

  82. Cluster selection and analysis
    Select one element from each highlighted row. What are the corresponding protein
    functions?

  83. Identified protein categories
    Chemotaxis
    RNA Polymerase
    Chaperone
    Flagella

    Please return to the Editor.

  84. Make a new chart (3 of 4)
    Select your Dataset and click on the Visualization Icon, then select Charts.

  85. Give your chart a name
    Name your chart Score Histogram.

  86. Analyze the score distribution
    Double click on the Histogram icon and click on Draw.
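
For readers who want to see what a histogram computes, here is a minimal sketch using equal-width bins over a numeric score column. How Galaxy Charts chooses its bins is not stated in the slides, so the binning rule and the `histogram` function are assumptions for illustration.

```python
def histogram(values, bin_count=10):
    """Count values in equal-width bins.

    Returns a list of (bin_start, bin_end, count) tuples.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are equal.
    width = (hi - lo) / bin_count or 1.0
    counts = [0] * bin_count
    for v in values:
        # The maximum value is placed in the last bin.
        i = min(int((v - lo) / width), bin_count - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```

Calling `histogram(scores, bin_count=20)` on a list of scores returns the bin edges and counts that a bar renderer would draw.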

  87. Give your chart a name
    Click on Draw.

  88. Export as PNG
    Click on Screenshot and select Save as PNG. Finally, return to the Editor again.

  89. Make a new chart (4 of 4)
    Select your Dataset and click on the Visualization Icon, then select Charts.

  90. Give your chart a name
    Name your chart Discrete Histogram.

  91. Analyze the protein distribution
    Double click on the Discrete Histogram icon.
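
A discrete histogram counts how often each label occurs rather than binning numbers. The sketch below assumes a table of pairwise interactions with one protein label per column and counts labels across the selected columns; the `discrete_histogram` function and the single-letter labels are illustrative, not taken from the workshop dataset.

```python
from collections import Counter


def discrete_histogram(rows, columns):
    """Count how often each label occurs in the selected zero-based columns.

    Returns (label, count) pairs sorted from most to least frequent.
    """
    counts = Counter()
    for row in rows:
        for col in columns:
            counts[row[col]] += 1
    return counts.most_common()


# Illustrative interaction pairs; the labels are placeholders.
EXAMPLE_ROWS = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]

if __name__ == "__main__":
    print(discrete_histogram(EXAMPLE_ROWS, columns=[0, 1]))
```

Counting both columns treats an interaction as symmetric, so a protein is credited whether it appears first or second in a pair.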

  92. Add more data
    Click on Add Data.

  93. Select a second data group
    First, click on Observations and select Column 2. Then click on Draw.

  94. Which proteins have the most interactions?
    Chaperone
    RNA Polymerase

    Done with Part I.

  95. Activate the Scratchbook
    Activate the Scratchbook by clicking on the above icon.

  96. Activate the Scratchbook
    Click on Saved Visualizations.

  97. Activate the Scratchbook
    Select a Visualization and repeat the process by selecting Saved Visualizations again.

  98. Scratchbook for multiple charts
    Resize all visualizations so they fit into the screen.

  99. More Examples

  100. Create a pie chart
    Select the imported datasets, create a new chart and select Pie chart. Then, click on Add
    data.

  101. Add first data group
    Configure the Helix frequency column.

  102. Add second data group
    Configure the Beta frequency column.

  103. Configure the pie chart
    Configure the Pie chart as shown above. Then, click on Draw.

  104. Configure the pie chart
    Glutamic acids seem to fit much better into helices than beta sheets. In other words,
    “Aspartic and Glutamic Acids are Important for Alpha-helix Folding”, JBSD 2007.

  105. Create a bar diagram
    Create data groups for the following features: Hydrophobicity, Membrane frequency,
    Flexibility, Helix frequency and Beta frequency.

  106. Bar diagram of amino acid features
    Use the tooltips to identify the amino acids which are likely to be found within membrane
    proteins.
    Methionine
    Leucine

  107. Topics

    Visualization history and introduction

    Numerical Visualizations

    Biological Visualizations

    Adding your own visualizations

  108. Adding your own Visualizations
    Go to config/plugins/visualizations/charts
    Create a directory in: charts/others/YOURVIZNAME
    Add three files to this directory:
    ✦ Logo (logo.png)
    ✦ Configuration (config.js)
    ✦ Wrapper (wrapper.js)
    Add your visualization to the list in: charts/types.js
    Rebuild by typing ‘npm install’ and ‘grunt’
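
The directory setup can be scripted. The sketch below only creates the folder and three empty placeholder files; it assumes that `charts/others/YOURVIZNAME` is relative to `config/plugins/visualizations/charts`, which the slide does not state explicitly. `scaffold_chart_plugin` is a hypothetical helper, and the contents of `config.js` and `wrapper.js`, the entry in `charts/types.js`, and the `npm install` and `grunt` rebuild still have to be done by hand.

```python
from pathlib import Path

# The three files named on the slide.
PLUGIN_FILES = ("logo.png", "config.js", "wrapper.js")


def scaffold_chart_plugin(galaxy_root, viz_name):
    """Create charts/others/<viz_name> with empty placeholder files.

    Assumes the relative path from the slide sits under
    config/plugins/visualizations/charts. Returns the new directory.
    """
    charts_dir = Path(galaxy_root) / "config" / "plugins" / "visualizations" / "charts"
    plugin_dir = charts_dir / "charts" / "others" / viz_name
    plugin_dir.mkdir(parents=True, exist_ok=True)
    for filename in PLUGIN_FILES:
        # Placeholders only; real contents must be written by hand.
        (plugin_dir / filename).touch()
    return plugin_dir
```

Running `scaffold_chart_plugin("/path/to/galaxy", "myviz")` against a scratch copy first is a safe way to confirm the layout before touching a real installation.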

  109. Workshop Materials

    Will be available on training day page: 

    http://gcc2015.tsl.ac.uk/training-day/

    For this workshop:
    ✦ Galaxy page on usegalaxy.org with:
    ✦ published history
    ✦ published visualization

    https://usegalaxy.org/u/jeremy/p/visualization-workshop