Bridging data analysis and interactive visualization

Clickme is an R package that lets you generate interactive visualizations directly from R. I presented the latest iteration at the 2013 IBSB conference in Kyoto.

Nacho Caballero

August 02, 2013

Transcript

  1. Bridging Data Analysis & Interactive Visualization. Nacho Caballero, Boston University

    I’m going to talk about data exploration, which is something that most of us do all day. We explore data to answer questions like: what genes have expression patterns that can discriminate between different types of tumor, or what are the oscillation dynamics of yeast metabolites.
  2. ?

    These are the big questions, but before they can be answered we need to tackle the little questions: what format should I use to store my data, or how should I normalize it.
  3. ? ? ? ? ? ? ? ? ? ?

    ? ? ? ? ? ? ? ? ? ? These are the big questions, but before they can be answered we need to tackle the little questions: what format should I use to store my data, or how should I normalize it.
  4. [Slide: raw comma-separated gene expression values, one gene per row (ISG20, KCTD14, SOCS1, GADD45B, TAP1, ...)]

    There are two types of little questions: those that require manipulating data in bulk, and those that require presenting it visually. I have found the R programming language an incredibly useful tool for working with data in bulk because it’s fast and it’s flexible. Unfortunately, R can only display static plots, which slow down the process of data exploration, wasting time that could be better spent thinking about the big questions.
  6. Demo: rclickme.com

    In the year 2013, there is no reason why I shouldn’t be able to simply type the name of a data point and see where it shows up in my data. No technological reason prevents me from zooming in to a specific region and hovering over a point to show additional information on demand. During the past few years, a thriving community of JS developers has turned your internet browser into a very powerful visualization platform, but these advantages are only now starting to be adopted by the R community. I didn’t want to have to choose between R’s ability to work with data in bulk and JS’s ability to display data interactively, so I built an R package to get the best of both worlds. It’s called Clickme, and it’s available at rclickme.com.
  7. R:

        data <- data.frame(
          x = c(1, 2, 3),
          y = c("a", "b", "c")
        )

    I encountered two major problems while trying to make both platforms talk to each other. The first was working with different data formats. This is an example of how R and JS store a table containing numbers and strings. Clickme converts R data to JS-formatted data by using translator functions, which ensure that every data type in R has the right format in JS.
  8. R:

        data <- data.frame(
          x = c(1, 2, 3),
          y = c("a", "b", "c")
        )

    JS:

        var data = [
          {"x":1, "y":"a"},
          {"x":2, "y":"b"},
          {"x":3, "y":"c"}
        ];
  9. R:

        data <- data.frame(
          x = c(1, 2, 3),
          y = c("a", "b", "c")
        )

    JS:

        var data = [
          {"x":1, "y":"a"},
          {"x":2, "y":"b"},
          {"x":3, "y":"c"}
        ];

    Translator:

        translate(data)
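As a rough illustration of what a translator has to do, the sketch below converts a column-oriented table (the way an R data frame is laid out) into the row-oriented array of objects shown on the slide. This is not Clickme's implementation, which runs in R; the function name `columnsToRows` and the input shape are assumptions made for this example.

```javascript
// Hypothetical helper (not part of Clickme): turn a column-oriented table,
// similar to an R data frame, into an array of row objects.
function columnsToRows(columns) {
  const names = Object.keys(columns);
  if (names.length === 0) return [];
  const nRows = columns[names[0]].length;
  // Every column must have the same length, as in a data frame.
  for (const name of names) {
    if (columns[name].length !== nRows) {
      throw new Error(`Column "${name}" has a different length`);
    }
  }
  const rows = [];
  for (let i = 0; i < nRows; i++) {
    const row = {};
    for (const name of names) row[name] = columns[name][i];
    rows.push(row);
  }
  return rows;
}

// Same data as on the slide, stored column by column.
const columns = { x: [1, 2, 3], y: ["a", "b", "c"] };
const rows = columnsToRows(columns);
console.log(JSON.stringify(rows));
// prints [{"x":1,"y":"a"},{"x":2,"y":"b"},{"x":3,"y":"c"}]
```

Numbers stay numbers and strings stay quoted strings in the output, which is the kind of type-preserving conversion the translator functions are described as guaranteeing.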
  10. R:

        data <- data.frame(
          x = c(1, 2, 3),
          y = c("a", "b", "c")
        )

    Template:

        var data = {{ translate(data) }};

    The other major problem was data reusability: how do you tell the JS code responsible for generating the visualization what data to use? Clickme does this by using templates: hybrid files that contain both R and JS code. This makes it possible for the same template to be used to visualize different data sets. A template contains mostly JS code, but at critical points it has R code surrounded by double (or triple) braces. When the user asks to generate a plot, the R code is evaluated and the braces are replaced with the results, generating a visualization customized to your data.
  11. R:

        data <- data.frame(
          x = c(1, 2, 3),
          y = c("a", "b", "c")
        )

    Template:

        var data = {{ translate(data) }};

    JS:

        var data = [
          {"x":1, "y":"a"},
          {"x":2, "y":"b"},
          {"x":3, "y":"c"}
        ];
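The brace substitution can be sketched in a few lines. Clickme evaluates R code inside the braces; the JavaScript below is only an analogy, with a hypothetical `renderTemplate` function and a stand-in `translate` that serializes with `JSON.stringify`. It handles double braces only and assumes the template comes from a trusted source, since the expressions are evaluated as code.

```javascript
// Hypothetical sketch (not Clickme's code): fill {{ expression }} placeholders.
// Expressions are evaluated with the names in `scope` available as variables.
function renderTemplate(template, scope) {
  const names = Object.keys(scope);
  const values = Object.values(scope);
  return template.replace(/\{\{\s*([\s\S]+?)\s*\}\}/g, (match, expression) => {
    // Build a function whose parameters are the scope names,
    // then call it with the matching values.
    const evaluate = new Function(...names, `return (${expression});`);
    return String(evaluate(...values));
  });
}

// Stand-in for a translator: serialize a value as JSON text.
const translate = (value) => JSON.stringify(value);

const templateText = "var data = {{ translate(data) }};";
const scope = {
  translate,
  data: [{ x: 1, y: "a" }, { x: 2, y: "b" }, { x: 3, y: "c" }],
};

console.log(renderTemplate(templateText, scope));
// prints var data = [{"x":1,"y":"a"},{"x":2,"y":"b"},{"x":3,"y":"c"}];
```

Rendering the same template text with a different `data` value in `scope` produces a different script, which is the kind of reuse the templates are meant to provide.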
  12. Clickme plots are easy to create and share

    The main reason why you should use Clickme for your daily plotting needs is that dynamic plots are as easy to generate as static plots. You simply call a function and send it a template and some data. The plots are also easy to share: simply upload them to a server or email them.
  13. rclickme.com @nachocaballero

    You can try Clickme by visiting rclickme.com and following the instructions to install the package in R. Right now, you can only create scatter plots, but I’m working on adding more types of visualizations (line plots, heatmaps). If you have a visualization that you would like to be able to use directly from R, let me know and I’ll send you an email when the developer guide is ready. I hope Clickme helps you solve the little problems more quickly, so you can spend the extra time thinking about the big questions.