Slide 1

Slide 1 text

Galaxy Community Conference 2015: Visualization Workshop
Jeremy Goecks
Aysam Guerler and Carl Eberhard

Slide 2

Slide 2 text

Recommended Web Browsers
• Chrome will probably work best
• Updated Safari/Firefox should work well
• Internet Explorer and old versions of Safari/Firefox may have problems

Slide 3

Slide 3 text

Topics
• Visualization history and introduction
• Numerical Visualizations
• Biological Visualizations
• Adding your own visualizations

Slide 4

Slide 4 text

Why Visualize?

Slide 5

Slide 5 text

Why Visualize?
• Quick check: did it work?
• Exploration and hypothesis generation
• Sharing/publishing

Slide 6

Slide 6 text

Anscombe’s Quartet
http://en.wikipedia.org/wiki/Anscombe's_quartet
Property            Value
Mean x              9
Variance x          11
Mean y              7.5
Variance y          ~4.125
Correlation         0.816
Linear regression   y = 0.5x + 3
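The summary properties on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the workshop materials: it assumes the first two of the four datasets as published by Anscombe (1973), and the helper name `summarize` is made up for illustration. Both datasets give nearly identical summary statistics even though their scatter plots look very different, which is the argument for plotting the data.

```python
from math import sqrt

# Datasets I and II of Anscombe's quartet; both share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(xs, ys):
    """Return the summary properties listed on the slide (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products.
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "correlation": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summary_1 = summarize(x, y1)
summary_2 = summarize(x, y2)
for name, value in summary_1.items():
    print(f"{name}: dataset I = {value:.3f}, dataset II = {summary_2[name]:.3f}")
```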

Slide 7

Slide 7 text

Timeline of Visualization in Galaxy (2005 to 2015)
• 2005: 1st Galaxy paper published
• 2008: Display applications
• 2010: Visualization development started
• 2011: 1st visualization paper published

Slide 8

Slide 8 text

Timeline of Visualization in Galaxy (2005 to 2015)
• 2005: 1st Galaxy paper published
• 2008: Display applications
• 2010: Visualization development started
• 2011: 1st visualization paper published
1. visualization in Galaxy is nascent
2. you will be working with awesome new features
3. there may be bugs; help us fix them!

Slide 9

Slide 9 text

Workshop Goals
• Participants: learn how to visualize your data in Galaxy
✦ biological visualizations
✦ numerical visualizations
✦ what Galaxy is doing under the covers
• Instructors: get feedback from you about what you like, don’t like, and where to go next

Slide 10

Slide 10 text

Galaxy Visualizations
• Visualizations are first-class objects in Galaxy, just like tools
• A visualization can be added to Galaxy via a configuration file that specifies:
✦ datasets that can be used
✦ location of visualization code (client-side or on server)
• Galaxy handles visualization integration and data management
✦ users can focus on analyzing data
✦ developers can focus on creating visualizations

Slide 11

Slide 11 text

Visualizations are 1st class Galaxy objects
• Can be saved and versioned for reproducibility
• Have a human-readable URL for sharing a fully interactive visualization:
http://usegalaxy.org/u/jgoecks/v/tumor-mutations
• Can embed interactive visualizations in online supplementary materials via Galaxy Pages

Slide 12

Slide 12 text

Visualization Architecture
• Client-server architecture
• Lots of moving pieces
✦ prepare/process data on server
✦ send to client
✦ render on client

Slide 13

Slide 13 text

Topics
• Visualization history and introduction
• Numerical Visualizations
• Biological Visualizations
• Adding your own visualizations

Slide 14

Slide 14 text

• Analysis goal: what similarities and differences can be found in cancer cell lines using exome and transcriptome sequencing?

Slide 15

Slide 15 text

Sequencing and Analysis
• Sequenced exomes and transcriptomes of 3 pancreatic cancer cell lines
✦ MiaPaCa2, HPAC, and PANC-1
• Datasets available in published history:
✦ exome subset: KRAS, STK11, ERBB2 aligned reads, duplicates removed, read pileup created
✦ transcriptome subset: KRAS, STK11, ERBB2 aligned reads
✦ gene fusions from all cell lines
✦ whole transcriptome aligned reads coverage
✦ (gene annotation)

Slide 16

Slide 16 text

Display Applications

Slide 17

Slide 17 text

Display Applications
Numbered markers are used throughout the slides to show actions to take

Slide 18

Slide 18 text

Display Applications

Slide 19

Slide 19 text

Display Applications
• Advantages
✦ use familiar tools
✦ easy to view your data alongside public datasets
• Disadvantages
✦ cannot easily share/version visualization
✦ many more visualizations than display applications in Galaxy
✦ no data processing or visual analysis, only visualization

Slide 20

Slide 20 text

Trackster: Galaxy’s Genome Browser

Slide 21

Slide 21 text

Trackster: Galaxy’s Genome Browser
• Genome browsers are a foundational genome visualization tool
• Trackster is for the high-throughput sequencing era
✦ very large datasets, numerous simultaneous tracks
✦ maximum flexibility for customization (e.g. rainbow tracks)
✦ 2-3 indices per dataset for fast visualization
• Supported formats: BED, GFF/GTF, interval, SAM/BAM, VCF, Wiggle, BigWig, BigBed, BedGraph

Slide 22

Slide 22 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)
3. Save visualization
4. Exit
5. Reopen visualization

Slide 23

Slide 23 text

Let’s visualize our data in Trackster
1. Create visualization

Slide 24

Slide 24 text

Let’s visualize our data in Trackster
1. Create visualization

Slide 25

Slide 25 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)

Slide 26

Slide 26 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)

Slide 27

Slide 27 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)
3. Save visualization

Slide 28

Slide 28 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)
3. Save visualization
4. Exit

Slide 29

Slide 29 text

Let’s visualize our data in Trackster
1. Create visualization
2. Add gene annotation (RefSeq)
3. Save visualization
4. Exit
5. Reopen visualization

Slide 30

Slide 30 text

Behind the Scenes
• Galaxy is indexing datasets for
✦ viewing large genomic regions (coverage plots)
✦ viewing small genomic regions (getting individual data points)
✦ feature names and locations
• Indexes are the primary way that big datasets are visualized quickly
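To illustrate why an index makes large regions fast to draw, here is a minimal Python sketch of a binned coverage summary. It is an assumption-laden toy, not Galaxy's actual index format: the function names, the fixed `bin_size`, and the half-open `(start, end)` interval convention are all hypothetical. The idea is that the server summarizes a dataset once, and a zoomed-out view then needs only one number per bin instead of every individual feature.

```python
from collections import defaultdict


def build_coverage_index(intervals, bin_size=1000):
    """Summarize half-open (start, end) intervals as covered bases per bin.

    Hypothetical helper for illustration; not Galaxy's real index.
    """
    bins = defaultdict(int)
    for start, end in intervals:
        pos = start
        while pos < end:
            bin_start = (pos // bin_size) * bin_size
            bin_end = bin_start + bin_size
            # Count only the part of the interval that falls in this bin.
            bins[bin_start] += min(end, bin_end) - pos
            pos = bin_end
    return dict(bins)


def query_coverage(index, region_start, region_end, bin_size=1000):
    """Return (bin_start, covered_bases) pairs for every bin touching the region."""
    first_bin = (region_start // bin_size) * bin_size
    return [(b, index.get(b, 0)) for b in range(first_bin, region_end, bin_size)]


# Three toy aligned reads; the second one spans a bin boundary.
reads = [(100, 300), (900, 1100), (2500, 2600)]
index = build_coverage_index(reads)
overview = query_coverage(index, 0, 3000)
print(overview)
```

A zoomed-in view would instead fetch the individual intervals that overlap a small region, which is the second kind of index listed on the slide.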

Slide 31

Slide 31 text

Display Modes
• Tracks can be displayed differently
✦ from coverage to individual features
✦ similar language to UCSC
• Let’s try different modes
✦ this is fast because data is sent from the Galaxy server and rendered in your Web browser

Slide 32

Slide 32 text

Searching
• Can search for named features such as gene annotations
✦ BED, GFF/GTF
• Let’s try searching for a gene: ERBB2

Slide 33

Slide 33 text

Let’s Call Variants
• VarScan
✦ Sample names: MiaPaCa2, PANC1, HPAC
✦ Run
• Rename output: “Cell line variants”

Slide 34

Slide 34 text

Let’s Assemble Transcripts
• Cufflinks
✦ select transcriptome datasets
✦ run
• Rename assembled transcripts for MiaPaCa2: “MiaPaCa2 Assembled Transcripts”

Slide 35

Slide 35 text

Let’s add data to Trackster
• Add exome data for all cell lines and called variants…
• …but where is our data?

Slide 36

Slide 36 text

Let’s add data to Trackster
• Add exome data for all cell lines and called variants…
• …but where is our data?
• Save again

Slide 37

Slide 37 text

Circster
• Interactive Circos plot
• Whole genome view with structural variation

Slide 38

Slide 38 text

Let’s view our data in Circster

Slide 39

Slide 39 text

Let’s view our data in Circster

Slide 40

Slide 40 text

Let’s view our data in Circster
• Double-click or use trackpad to zoom in
• Drag around using mouse/trackpad
• What do we see?

Slide 41

Slide 41 text

Let’s view our data in Circster
• Change min/max by clicking on labels
• What do we see?

Slide 42

Slide 42 text

Let’s add data to Circster and adjust options
1. Add transcriptome coverage data

Slide 43

Slide 43 text

Let’s add data to Circster and adjust options
1. Add transcriptome coverage data

Slide 44

Slide 44 text

Let’s add data to Circster and adjust options
1. Add transcriptome coverage data

Slide 45

Slide 45 text

Let’s add data to Circster and adjust options
2. Change arc dataset height

Slide 46

Slide 46 text

Let’s add data to Circster and adjust options
2. Change arc dataset height

Slide 47

Slide 47 text

Let’s add data to Circster and adjust options
3. Change max for tracks

Slide 48

Slide 48 text

Let’s add data to Circster and adjust options
3. Change max for tracks
What do we see?

Slide 49

Slide 49 text

Let’s add data to Circster and adjust options
• Add gene fusions

Slide 50

Slide 50 text

Let’s add data to Circster and adjust options
• Add gene fusions

Slide 51

Slide 51 text

Let’s add data to Circster and adjust options
4. Save visualization

Slide 52

Slide 52 text

Back to Trackster: Rainbow Track for Coverage
1. Remove gene fusions track
2. Navigate to ERBB2 gene
3. Create group
4. Add transcriptome coverage tracks to group
5. Create composite track
6. Adjust max
7. What do we see?

Slide 53

Slide 53 text

Back to Trackster: Rainbow Track for Coverage
1. Remove gene fusions track

Slide 54

Slide 54 text

Back to Trackster: Rainbow Track for Coverage
2. Navigate to ERBB2 gene

Slide 55

Slide 55 text

Back to Trackster: Rainbow Track for Coverage
3. Create group

Slide 56

Slide 56 text

Back to Trackster: Rainbow Track for Coverage
4. Add transcriptome coverage tracks to group

Slide 57

Slide 57 text

Back to Trackster: Rainbow Track for Coverage
5. Create composite track

Slide 58

Slide 58 text

Back to Trackster: Rainbow Track for Coverage
6. Adjust max and height; change name

Slide 59

Slide 59 text

Add More Data
• Add RNA-seq mapped reads, variants, and assembled transcripts
• Look at ERBB2
✦ bookmark
• Look at STK11
✦ bookmark
• Look at KRAS
✦ bookmark

Slide 60

Slide 60 text

Share and Publish

Slide 61

Slide 61 text

Share and Publish

Slide 62

Slide 62 text

Share and Publish

Slide 63

Slide 63 text

Demo: Visual Analysis

Slide 64

Slide 64 text

Topics
• Visualization history and introduction
• Numerical Visualizations
• Biological Visualizations
• Adding your own visualizations

Slide 65

Slide 65 text

What is Galaxy Charts?
• Use Galaxy
• Create Tabular Results
• Visualize with Galaxy Charts

Slide 66

Slide 66 text

Import data files
Click on Shared Data and select Data Libraries. Navigate to the Chart library and import it into your history (data reference: http://dna.cs.byu.edu/treesaap and bacteriome.org).

Slide 67

Slide 67 text

Make a new chart (1 of 4)
Wait for the upload to complete. Select your Dataset and click on the Visualization Icon, then select Charts.

Slide 68

Slide 68 text

Give your chart a name
Name your chart Unclustered Heatmap.

Slide 69

Slide 69 text

Select a chart type
Double click on the Heatmap icon.

Slide 70

Slide 70 text

Select data columns
First, click on Row labels and select Column 2. Then click on Draw.

Slide 71

Slide 71 text

Unclustered Heatmap

Slide 72

Slide 72 text

Make a new chart (2 of 4)
Select your Dataset and click on the Visualization Icon, then select Charts.

Slide 73

Slide 73 text

Give your chart a name
Name your chart Clustered Heatmap.

Slide 74

Slide 74 text

Select a new chart type
Double click on the Clustered Heatmap icon.

Slide 75

Slide 75 text

Select data columns
First, click on Row labels and select Column 2. Then click on Draw.

Slide 76

Slide 76 text

Clustered Heatmap
Use the mouse wheel or your touch pad to zoom into the highlighted area.

Slide 77

Slide 77 text

Enlarged view
Tooltips pop up if you move the mouse pointer over a box. Here the interaction between B4143 and B3295 is highlighted. Click on Editor again to further customize this chart.

Slide 78

Slide 78 text

Chart configuration
Go to the Configuration tab.

Slide 79

Slide 79 text

Chart settings
Heatmap-specific options are highlighted. Feel free to set axis labels or other options.

Slide 80

Slide 80 text

Define a URL template
Paste a database URL into the template URL field and add the __LABEL__ tag. You may use http://www.ncbi.nlm.nih.gov or any other database. Click on Draw to redraw the chart.

Slide 81

Slide 81 text

Data points linked to web sources
Double click on a box and the browser will open two new tabs using the previously defined URL template.
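The URL template mechanism can be sketched in Python as a plain placeholder substitution. This is an illustration only, not the Charts implementation: the function names and the `example.org` template are hypothetical, and it is an assumption that the two tabs correspond to the row label and the column label of the clicked box.

```python
from urllib.parse import quote

PLACEHOLDER = "__LABEL__"  # tag named on the slide


def fill_template(template, label):
    """Substitute a percent-encoded label for the placeholder (hypothetical helper)."""
    return template.replace(PLACEHOLDER, quote(label, safe=""))


def urls_for_box(template, row_label, column_label):
    """Build one URL per label of a heatmap box (assumed: one tab per label)."""
    return [fill_template(template, row_label), fill_template(template, column_label)]


# Placeholder template; any database search URL could be used instead.
template = "https://example.org/search?term=__LABEL__"
urls = urls_for_box(template, "B4143", "B3295")
for url in urls:
    print(url)
```

Percent-encoding the label keeps the URL valid when a label contains spaces or other reserved characters.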

Slide 82

Slide 82 text

Cluster selection and analysis
Select one element from each highlighted row. What are the corresponding protein functions?

Slide 83

Slide 83 text

Identified protein categories
• Chemotaxis
• RNA Polymerase
• Chaperone
• Flagella
Please return to the Editor.

Slide 84

Slide 84 text

Make a new chart (3 of 4)
Select your Dataset and click on the Visualization Icon, then select Charts.

Slide 85

Slide 85 text

Give your chart a name
Name your chart Score Histogram.

Slide 86

Slide 86 text

Analyze the score distribution
Double click on the Histogram icon and click on Draw.

Slide 87

Slide 87 text

Give your chart a name
Click on Draw.

Slide 88

Slide 88 text

Export as PNG
Click on Screenshot and select Save as PNG. Finally, return to the Editor again.

Slide 89

Slide 89 text

Make a new chart (4 of 4)
Select your Dataset and click on the Visualization Icon, then select Charts.

Slide 90

Slide 90 text

Give your chart a name
Name your chart Discrete Histogram.

Slide 91

Slide 91 text

Analyze the protein distribution
Double click on the Discrete Histogram icon.

Slide 92

Slide 92 text

Add more data
Click on Add Data.

Slide 93

Slide 93 text

Select a second data group
First, click on Observations and select Column 2. Then click on Draw.

Slide 94

Slide 94 text

Which proteins have the most interactions?
• Chaperone
• RNA Polymerase
Done with Part I.

Slide 95

Slide 95 text

Scratchbook

Slide 96

Slide 96 text

Activate the Scratchbook
Activate the Scratchbook by clicking on the above icon.

Slide 97

Slide 97 text

Activate the Scratchbook
Click on Saved Visualizations.

Slide 98

Slide 98 text

Activate the Scratchbook
Select a Visualization and repeat the process by selecting Saved Visualizations again.

Slide 99

Slide 99 text

Scratchbook for multiple charts
Resize all visualizations so they fit on the screen.

Slide 100

Slide 100 text

More Examples

Slide 101

Slide 101 text

Create a pie chart
Select the imported datasets, create a new chart and select Pie chart. Then, click on Add data.

Slide 102

Slide 102 text

Add first data group
Configure the Helix frequency column.

Slide 103

Slide 103 text

Add second data group
Configure the Beta frequency column.

Slide 104

Slide 104 text

Configure the pie chart
Configure the Pie chart as shown above. Then, click on Draw.

Slide 105

Slide 105 text

Configure the pie chart
Glutamic acids seem to fit much better into helices than beta sheets. In other words, “Aspartic and Glutamic Acids are Important for Alpha-helix Folding”, JBSD 2007.

Slide 106

Slide 106 text

Create a bar diagram
Create data groups for the following features: Hydrophobicity, Membrane frequency, Flexibility, Helix frequency and Beta frequency.

Slide 107

Slide 107 text

Bar diagram of amino acid features
Use the tooltips to identify the amino acids which are likely to be found within membrane proteins.
• Methionine
• Leucine

Slide 108

Slide 108 text

Topics
• Visualization history and introduction
• Numerical Visualizations
• Biological Visualizations
• Adding your own visualizations

Slide 109

Slide 109 text

Adding your own Visualizations
• Go to config/plugins/visualizations/charts
• Create a directory in: charts/others/YOURVIZNAME
• Add three files to this directory:
✦ Logo (logo.png)
✦ Configuration (config.js)
✦ Wrapper (wrapper.js)
• Add your visualization to the list in: charts/types.js
• Rebuild by typing ‘npm install’ and ‘grunt’
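The directory layout described on this slide can be scaffolded with a short script. This is a hedged sketch: the helper name `scaffold_chart_plugin` is made up, the three files are created empty because the slide does not specify their contents, and the demo runs in a temporary directory rather than a real Galaxy checkout. Registering the plugin in charts/types.js and rebuilding with 'npm install' and 'grunt' remain manual steps.

```python
import tempfile
from pathlib import Path

# File names taken from the slide; their required contents are not given there.
PLUGIN_FILES = ("logo.png", "config.js", "wrapper.js")


def scaffold_chart_plugin(charts_dir, viz_name):
    """Create charts/others/<viz_name> with empty placeholder files (hypothetical helper)."""
    plugin_dir = Path(charts_dir) / "others" / viz_name
    plugin_dir.mkdir(parents=True, exist_ok=True)
    for filename in PLUGIN_FILES:
        (plugin_dir / filename).touch()
    return plugin_dir


# Demo in a throwaway directory that mimics the path named on the slide.
with tempfile.TemporaryDirectory() as tmp:
    charts_dir = Path(tmp) / "config" / "plugins" / "visualizations" / "charts"
    created = scaffold_chart_plugin(charts_dir, "YOURVIZNAME")
    created_files = sorted(p.name for p in created.iterdir())
    relative_path = created.relative_to(charts_dir).as_posix()

print(relative_path, created_files)
```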

Slide 110

Slide 110 text

Workshop Materials
• Will be available on training day page:
http://gcc2015.tsl.ac.uk/training-day/
• For this workshop:
✦ Galaxy page on usegalaxy.org with:
✦ published history
✦ published visualization
• https://usegalaxy.org/u/jeremy/p/visualization-workshop