
Gene search discoverability with WikiPathways and IdeogramJS

A presentation to the WikiPathways group, discussing technical details and future ideas for the gene search recommendation UI at https://eweitz.github.io/ideogram/related-genes.

Eric Weitz

June 09, 2021

Transcript

  1. Gene search discoverability with WikiPathways and Ideogram.js
    Eric Weitz, https://github.com/eweitz/ideogram
    WikiPathways meeting 2021-06-09
  2. Overview
    1. Search UI design and genomics
    2. Why and how Ideogram.js uses WikiPathways
    3. How WikiPathways could improve gene search
    4. Plans for Ideogram and gene search
  3. What is an ideogram?
    An ideogram is a drawing of a karyotype. A karyotype is a photograph of an organism's chromosomes.
    You can draw genomes with Ideogram.js: https://github.com/eweitz/ideogram
  4. Gene search
    Biomedical applications often support searching by gene symbol, like “BRCA1” or “ACE2”.
    Gene search usually returns only exact matches.
    Displaying related results is common in search UIs outside genomics.
    Mapping data to the genome enriches the user experience.
    Showing related genes in ideograms improves scientific exploration.
  5. Design goal
    Help users explore and discover genomic data by widening search results.
    Challenges:
    • Users must already know which genes are related to the gene they search for
    • Screen real estate for new UI components is scarce
    • Visualizing search results in genomic space is useful but hard
  6. Related results in domain-specific search UIs

  7. Localizable results are mapped in generic search UIs

  8. Localizable results are mapped in generic search UIs

  9. Related genes
    For a given gene, it is easy to find two kinds of related genes:
    • Interacting genes: adjacent nodes in the same biochemical pathway.
    • Paralogs: evolutionarily related genes in the same species, arising from gene duplication. They often have comparable roles in different pathways.
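
To make the two kinds of related genes concrete, here is a minimal Python sketch. It assumes a toy representation that is hypothetical (not the WikiPathways GPML format or the Ideogram.js data model): each pathway is a list of directed edges between gene symbols, and paralogs come from a precomputed lookup table standing in for Ensembl data. The pathway names and edges are illustrative only.

```python
from collections import defaultdict

# Toy data, for illustration only: pathway name -> directed edges between gene symbols.
TOY_PATHWAYS = {
    "Pathway A": [("BRCA2", "RAD51"), ("RAD51", "XRCC3")],
    "Pathway B": [("RAD51", "BRCA1"), ("TP53", "MDM2")],
}

# Hypothetical stand-in for a paralog lookup (e.g. one built from Ensembl data).
TOY_PARALOGS = {"RAD51": ["RAD51B", "RAD51C", "RAD51D", "XRCC2", "XRCC3"]}


def interacting_genes(gene, pathways):
    """Map each gene adjacent to `gene` (in either edge direction) to the pathways they share."""
    neighbors = defaultdict(set)
    for pathway, edges in pathways.items():
        for source, target in edges:
            if source == gene:
                neighbors[target].add(pathway)
            elif target == gene:
                neighbors[source].add(pathway)
    return {partner: sorted(names) for partner, names in neighbors.items()}


def related_genes(gene, pathways, paralogs):
    """Combine both kinds of related genes for one searched gene."""
    return {
        "interacting": interacting_genes(gene, pathways),
        "paralogs": paralogs.get(gene, []),
    }
```

With this toy data, `related_genes("RAD51", TOY_PATHWAYS, TOY_PARALOGS)` reports BRCA2 and XRCC3 (via Pathway A) and BRCA1 (via Pathway B) as interacting genes, alongside the paralog list. Edge direction is ignored on purpose, since "adjacent" here means either upstream or downstream.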
  10. Related genes for RAD51 https://eweitz.github.io/ideogram/related-genes?q=RAD51

  11. Why chromosomes?
    • Expression of neighboring genes often correlates.
    • Knowing where something is not is useful, and standard in search result maps.
    • Cytogenetic features (e.g. centromeres, stalks) can explain null results.
  12. Where the data comes from
    • Genomic coordinates for genes: Ensembl, via MyGene.info
    • Interacting genes: WikiPathways
    • Paralogs: Ensembl
    All are free, open-source REST APIs.
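
A small Python sketch of how the three lookups could be addressed. The WikiPathways URL reproduces the example on slide 16; the MyGene.info query parameters (`q`, `species`, `fields=genomic_pos`) and the Ensembl REST `homology/symbol` path with `type=paralogues` are assumptions based on those services' public documentation, not necessarily what Ideogram.js requests, so check them against current docs before relying on them. Network calls are left out; each URL can be handed to any HTTP client.

```python
from urllib.parse import quote, urlencode


def wikipathways_interactions_url(symbol):
    """WikiPathways findInteractions request, in the form shown on slide 16."""
    params = urlencode({"query": symbol, "format": "json"})
    return f"https://webservice.wikipathways.org/findInteractions?{params}"


def mygene_coordinates_url(symbol, species="human"):
    """MyGene.info query for a gene's genomic position (assumed parameters)."""
    params = urlencode({"q": f"symbol:{symbol}", "species": species, "fields": "genomic_pos"})
    return f"https://mygene.info/v3/query?{params}"


def ensembl_paralogs_url(symbol, species="human"):
    """Ensembl REST homology lookup restricted to paralogues (assumed endpoint)."""
    params = urlencode({"type": "paralogues", "content-type": "application/json"})
    return f"https://rest.ensembl.org/homology/symbol/{species}/{quote(symbol)}?{params}"


for build_url in (wikipathways_interactions_url, mygene_coordinates_url, ensembl_paralogs_url):
    print(build_url("RAD51"))
```

Using `urlencode` rather than string concatenation keeps symbols and values such as `symbol:RAD51` correctly escaped.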
  13. Tooltips for related genes aid discovery https://eweitz.github.io/ideogram/related-genes?q=RAD51

  14. Design considerations
    Benefits of showing related genes in an ideogram:
    • Space efficient. A short, wide row of chromosomes uses otherwise empty page real estate.
    • Easy interaction. Plotting genes as large features makes them easy to see and click.
    • Domain specific. Users engage with and recall tailored graphics more than generic lists.
  15. Design at a glance
    • Big, easily clickable features. Clicking one triggers a new genomic search.
    • Brief, noticeable legend and call to action.
    • Short and wide: shows many genomic search results in the space of a few rows.
  16. Built with WikiPathways
    Ideogram.js uses the WikiPathways REST API to get interactions for a given gene.
    • E.g. https://webservice.wikipathways.org/findInteractions?query=RAD51&format=json
    • Algorithm:
      ◦ Find interactions
      ◦ Filter by organism
      ◦ Omit non-genes
      ◦ Group pathways for each interacting gene
    • Explore the code: related-genes.js
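
A minimal Python sketch of the last three algorithm steps, applied to interactions already fetched in step 1. It is not the code in related-genes.js. It assumes each interaction has been flattened into a dict with hypothetical keys `species`, `pathway`, `left` and `right` (the real findInteractions JSON is shaped differently and would need mapping first), and it assumes non-genes such as metabolites can be recognized by checking against a `known_genes` set, e.g. all gene symbols for the organism.

```python
from collections import defaultdict


def group_interacting_genes(interactions, gene, organism, known_genes):
    """Filter, clean and group already-fetched interactions for one searched gene.

    Returns interacting gene symbol -> sorted list of pathways linking it to `gene`.
    """
    pathways_by_gene = defaultdict(set)
    for ix in interactions:
        # Filter by organism: skip interactions from other species' pathways.
        if ix["species"] != organism:
            continue
        # The partner is whichever side of the interaction is not the searched gene.
        if ix["left"] == gene:
            partner = ix["right"]
        elif ix["right"] == gene:
            partner = ix["left"]
        else:
            continue
        # Omit non-genes (metabolites, labels, etc.) and self-interactions.
        if partner == gene or partner not in known_genes:
            continue
        # Group pathways for each interacting gene.
        pathways_by_gene[partner].add(ix["pathway"])
    return {partner: sorted(p) for partner, p in sorted(pathways_by_gene.items())}


# Toy input in the assumed flattened shape.
toy_interactions = [
    {"species": "Homo sapiens", "pathway": "WP_A", "left": "RAD51", "right": "BRCA2"},
    {"species": "Homo sapiens", "pathway": "WP_B", "left": "BRCA2", "right": "RAD51"},
    {"species": "Mus musculus", "pathway": "WP_C", "left": "Rad51", "right": "Brca2"},
    {"species": "Homo sapiens", "pathway": "WP_A", "left": "RAD51", "right": "ATP"},
]
print(group_interacting_genes(toy_interactions, "RAD51", "Homo sapiens", {"RAD51", "BRCA2"}))
```

With the toy input, only BRCA2 survives: the mouse interaction is filtered out by organism, ATP is dropped as a non-gene, and the two human pathways linking RAD51 and BRCA2 are grouped under BRCA2.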
  17. WikiPathways API is awesome
    • Familiar: docs are in a conventional format (Swagger) and easy to find
    • Fast: returns interactions for a given gene in < 2 seconds
    • Feature-rich: easily filter by organism and by immediately adjacent nodes
  18. WikiPathways could drastically improve gene search discoverability by improving support for orthologous pathways
  19. Non-human pathways are few and rough
    • Number of pathways in mouse is 21% of that in human; in Arabidopsis, 7%; ...
    • Only 35 organisms
    • Often mapped from human, then not updated
      ◦ Yields less useful pathways due to disconnected interactions, etc.
  20. Source and method

  21. Interacting genes are uncommon in non-human searches
                              Non-human   Human
    % of gene searches        68%         32%
    % with 0 pathway genes    82%         61%
    Non-human searches more often find no pathway genes, but they are about 2x as frequent as human searches. Better orthology support would fix this.
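
Combining the two rows gives a rough sense of the overall effect. A worked estimate, assuming the 82% and 61% are fractions of non-human and human searches respectively:

```latex
\begin{aligned}
P(\text{0 pathway genes}) &= 0.68 \times 0.82 + 0.32 \times 0.61 = 0.5576 + 0.1952 \approx 0.75 \\
P(\text{non-human} \mid \text{0 pathway genes}) &= 0.5576 / 0.7528 \approx 0.74
\end{aligned}
```

Under that assumption, roughly three in four searches that find no pathway genes are non-human. If non-human searches reached the human rate of 61%, the overall share of such searches would fall from about 75% to 61%, about 14 percentage points.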
  22. Interacting pathway genes for ACE2: human vs. mouse

  23. Improve orthologous pathways with links
    Link orthologous pathways both ways:
    • Target -> source (mouse -> human) links are common, but not universal.
    • Source -> targets (human -> mouse, chicken, etc.) links would be useful.
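
One way to picture the two link directions is as a pair of indexes. This Python sketch uses hypothetical pathway IDs and assumes each target pathway records a single source link; WikiPathways does not necessarily store links this way.

```python
from collections import defaultdict

# Hypothetical target -> source links (e.g. a mouse pathway pointing at its human source).
TARGET_TO_SOURCE = {
    "WP_MOUSE_1": "WP_HUMAN_1",
    "WP_CHICKEN_1": "WP_HUMAN_1",
    "WP_MOUSE_2": "WP_HUMAN_2",
}


def invert_links(target_to_source):
    """Derive source -> targets links from existing target -> source links."""
    source_to_targets = defaultdict(list)
    for target, source in target_to_source.items():
        source_to_targets[source].append(target)
    return {source: sorted(targets) for source, targets in source_to_targets.items()}


def missing_backlinks(source_to_targets, target_to_source):
    """List (source, target) pairs where the target does not link back to the source."""
    return sorted(
        (source, target)
        for source, targets in source_to_targets.items()
        for target in targets
        if target_to_source.get(target) != source
    )
```

Inverting the existing target -> source links is only a starting point, since those links cover most but not all orthologous pathways; `missing_backlinks` flags pairs where a source lists a target that does not point back.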
  24. Propagate changes from source to targets
    1. Bot watches all recent pathway changes
    2. User edits a source pathway
    3. Bot reads the list of target pathways from links in the source pathway
    4. Bot updates each target pathway, using a fresh map of orthologous genes
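
The four steps could be sketched as a single bot pass. Everything here is hypothetical: the bot, its inputs (recently edited pathway IDs, each pathway's gene symbols, source -> targets links, each target's species, and per-species ortholog maps) and its output, which only computes new gene lists rather than writing them back to WikiPathways.

```python
def run_bot_once(recent_changes, pathway_genes, source_to_targets, target_species, orthologs):
    """One pass of a hypothetical propagation bot.

    Returns target pathway ID -> {"genes": mapped symbols,
                                  "unmapped": source symbols with no ortholog in the map}.
    """
    updates = {}
    # Steps 1-2: the bot sees which pathways users recently edited.
    for pathway_id in recent_changes:
        # Step 3: only source pathways that link to targets trigger updates.
        for target_id in source_to_targets.get(pathway_id, []):
            # Step 4: map each source gene through the ortholog table for the target species.
            mapping = orthologs.get(target_species[target_id], {})
            genes = pathway_genes[pathway_id]
            updates[target_id] = {
                "genes": [mapping[g] for g in genes if g in mapping],
                "unmapped": [g for g in genes if g not in mapping],
            }
    return updates


# Toy usage; REN is left out of the toy ortholog map on purpose to show an unmapped gene.
print(run_bot_once(
    ["WP_HUMAN_1", "WP_UNLINKED"],
    {"WP_HUMAN_1": ["ACE2", "AGT", "REN"], "WP_UNLINKED": ["TP53"]},
    {"WP_HUMAN_1": ["WP_MOUSE_1"]},
    {"WP_MOUSE_1": "Mus musculus"},
    {"Mus musculus": {"ACE2": "Ace2", "AGT": "Agt"}},
))
```

Reporting unmapped genes separately, instead of silently dropping them, is one way a bot could avoid the disconnected interactions mentioned on slide 19.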
  25. Plans for Ideogram.js and gene search
    • Fix edge cases for gene labels
    • Overlay a heatmap of gene density on chromosomes
    • Responsive ideogram dimensions
    • Search all genes in a pathway
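
For the planned gene-density overlay, the core computation is binning gene positions along each chromosome. A minimal Python sketch, assuming gene start coordinates in base pairs and a fixed bin size; how Ideogram.js would actually bin or color such a heatmap is not specified here.

```python
def gene_density(gene_starts, chrom_length, bin_size):
    """Count genes whose start position falls in each fixed-size bin of one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division, so a final partial bin is kept
    counts = [0] * n_bins
    for start in gene_starts:
        if 0 <= start < chrom_length:  # ignore coordinates outside the chromosome
            counts[start // bin_size] += 1
    return counts


def normalize(counts):
    """Scale counts to 0..1, e.g. for mapping onto a heatmap color scale."""
    peak = max(counts, default=0)
    return [c / peak if peak else 0.0 for c in counts]
```

Normalizing per chromosome makes each chromosome's hot spots visible; normalizing by a genome-wide peak instead would make densities comparable across chromosomes.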
  26. Thank you!
    Eric Weitz (eric.m.weitz@gmail.com, https://github.com/eweitz)
    More about Ideogram.js:
    • Source code: https://github.com/eweitz/ideogram
    • Gene search recommendation UI: https://eweitz.github.io/ideogram/related-genes
    • 20+ examples: https://eweitz.github.io/ideogram