Galaxy Architecture 2015

Presentation from Nate Coraor and James Taylor at Galaxy Community Conference 2015 Training Day.

James Taylor

July 06, 2015
Transcript

  1. Galaxy Architecture 2015: Nate (@natefoo) and James (@jxtx), #usegalaxy

  2. Please interrupt! We’re here to answer your questions about Galaxy architecture!
  3. 0. Getting Involved in Galaxy

  4. IRC: irc.freenode.net#galaxyproject; GitHub: github.com/galaxyproject; Trello: https://trello.com/b/75c1kASa/; Twitter: #usegalaxy, @galaxyproject

  5. Contributing: All Galaxy development has moved to GitHub over the last year. New official contribution guidelines: https://github.com/galaxyproject/galaxy/blob/dev/CONTRIBUTING.md
  6. 1. The family of /galaxyproject projects

  7. (Image-only slide; no text.)
  8. github.com/galaxyproject/galaxy: The main Galaxy application (web interface, database model, job running, etc.). Also includes other web applications, including the ToolShed and Reports.
  9. github.com/galaxyproject/cloudman: Galaxy CloudMan, a framework and interface to orchestrate clusters on infrastructure clouds; the main component of Galaxy cloud images.
    github.com/galaxyproject/cloudlaunch: CloudLaunch, a web application that makes it easy to launch images on a cloud; drives http://launch.usegalaxy.org
  10. github.com/galaxyproject/tools-{devteam,iuc}: Galaxy tools maintained by the devteam (the PSU/Hopkins group) and the IUC (the “Intergalactic Utilities Commission”). A variety of tools, generally of high quality, including the core tools for Galaxy main. They demonstrate current tool development best practices: development on GitHub, then deployment to the test and main ToolSheds.
  11. github.com/galaxyproject/docker-build: Builds Galaxy tool dependencies for the ToolShed in Docker containers, and builds Galaxy framework dependencies as Python wheels.
  12. github.com/galaxyproject/planemo: Command-line utilities to assist in the development of Galaxy tools: linting, testing, deploying to ToolSheds… The best-practice approach for Galaxy tool development!
    github.com/galaxyproject/planemo-machine: Builds Galaxy environments for tool development, including Docker containers, virtual machines, Google Compute images...
  13. github.com/galaxyproject/{ansible-*, *-playbook}: Ansible components to automate almost every aspect of Galaxy installation and maintenance. Ansible is an advanced configuration management system. These playbooks are used to maintain Galaxy main, cloud images, virtual machines, ...
  14. github.com/galaxyproject/pulsar: Distributed job execution engine for Galaxy. Allows staging data, scripts, and configuration. Can run jobs on Windows machines. Can act as its own queueing system or access an existing cluster DRM.
  15. github.com/galaxyproject/bioblend: Official Python client for the Galaxy and CloudMan APIs.
  16. 2. Galaxy app architecture

  17. Client: JavaScript, Backbone.js models, source in the `client` directory
    (HTTP)
    Galaxy API: RESTful HTTP API for accessing and controlling the Galaxy application
    Managers: manage resources in the scope of a user transaction; abstract most logic out of controllers
    Models: encapsulate all persistent state except raw data (users, metadata, workflows, …), persisted to a relational database (ideally PostgreSQL) through the SQLAlchemy layer
    Execution: getting jobs run; managing a possibly wide variety of execution engines behind a single Galaxy instance
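
The layering above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Galaxy code. `History`, `Trans`, `HistoryManager`, and `HistoriesController` are hypothetical names, and an in-memory dict stands in for the SQLAlchemy-backed database. The controller only translates request data into manager calls. The manager holds the logic and works with the model objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class History:
    """Model: persistent state (a dict stands in for the database here)."""
    id: int
    name: str
    user: str


@dataclass
class Trans:
    """Hypothetical stand-in for the per-request transaction ("trans")."""
    user: str


class HistoryManager:
    """Manager: business logic scoped to the requesting user's transaction."""

    def __init__(self) -> None:
        self._histories: Dict[int, History] = {}
        self._next_id = 1

    def create(self, trans: Trans, name: str) -> History:
        history = History(id=self._next_id, name=name, user=trans.user)
        self._histories[history.id] = history
        self._next_id += 1
        return history

    def list_for_user(self, trans: Trans) -> List[History]:
        # Ownership rule lives here, not in the controller.
        return [h for h in self._histories.values() if h.user == trans.user]


class HistoriesController:
    """API controller: thin layer that turns manager results into JSON-ready dicts."""

    def __init__(self, manager: HistoryManager) -> None:
        self.manager = manager

    def index(self, trans: Trans) -> List[dict]:
        return [{"id": h.id, "name": h.name} for h in self.manager.list_for_user(trans)]

    def create(self, trans: Trans, payload: dict) -> dict:
        history = self.manager.create(trans, payload.get("name", "Unnamed history"))
        return {"id": history.id, "name": history.name}
```

The ownership check sits in the manager. Any other entry point, such as a second controller or a script, therefore gets the same rule without duplicating it.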
  18. Browser / server diagram. The old way: Universe App… → controllers → HTML on the wire, typically from Mako → renderer + progressive JS in the browser. The new way: controllers.api → JSON on the wire → Backbone.js MVC in the browser.
  19. The old way: user stuff (prefs, etc.), tool forms, reports, Tool Shed*
    In between: workflows, data libraries
    The new way: visualizations, history, tool menu, most grids
    *Many of these have an API, but it is not yet used by the UI.
  20. The old way: user stuff (prefs, etc.), tool forms, reports, Tool Shed*
    In between: workflows, data libraries (beta)
    The new way: visualizations, history, tool menu, most grids
    *Many of these have an API, but it is not yet used by the UI.
  21. Process diagram: WSGI web server (galaxy.util.pastescript.serve) → middleware stack → galaxy.webapps.galaxy.GalaxyWebApplication → galaxy.app.UniverseApplication (toolbox, job creation, model, datatypes registry, ...)
  22. Job diagram: job creation feeds several job handlers, each running in a galaxy.main process; job runners run as a <<thread>> inside galaxy.main.
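
The handler/runner split can be sketched roughly as follows. The sketch assumes hypothetical `JobHandlerSketch` and `LocalRunnerSketch` classes built only on the standard library `queue` and `threading` modules. A handler thread picks up newly created jobs and passes them to a runner, and the runner executes them on a fixed pool of worker threads. The startup log later in this deck shows, for example, "Starting 5 LocalRunner workers". Galaxy's real handlers also track job state in the database, which this sketch omits.

```python
import queue
import threading
from typing import Callable, List, Tuple

Job = Tuple[int, Callable[[], str]]  # (job id, command to run)


class LocalRunnerSketch:
    """Hypothetical runner: executes queued jobs on a fixed pool of worker threads."""

    def __init__(self, nworkers: int) -> None:
        self.work_queue: "queue.Queue[Job]" = queue.Queue()
        self.results: List[str] = []
        self._lock = threading.Lock()
        for _ in range(nworkers):
            threading.Thread(target=self._work, daemon=True).start()

    def put(self, job: Job) -> None:
        self.work_queue.put(job)

    def _work(self) -> None:
        while True:
            job_id, command = self.work_queue.get()
            try:
                output = command()
            except Exception as exc:  # a failing job must not kill the worker thread
                output = f"failed: {exc}"
            with self._lock:
                self.results.append(f"job {job_id}: {output}")
            self.work_queue.task_done()


class JobHandlerSketch:
    """Hypothetical handler: monitors newly created jobs and dispatches them to a runner."""

    def __init__(self, runner: LocalRunnerSketch) -> None:
        self.runner = runner
        self.new_jobs: "queue.Queue[Job]" = queue.Queue()
        threading.Thread(target=self._monitor, daemon=True).start()

    def submit(self, job_id: int, command: Callable[[], str]) -> None:
        self.new_jobs.put((job_id, command))

    def _monitor(self) -> None:
        while True:
            job = self.new_jobs.get()
            self.runner.put(job)
            self.new_jobs.task_done()
```

Separating "notice new jobs" from "execute jobs" is what allows one handler to feed several runner types (local, cluster, Pulsar) behind a single Galaxy instance.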
  23. 3. Galaxy components and object model

  24. The Galaxy data model is not database-entity driven. Entities are defined in galaxy.model as objects, and SQLAlchemy is used for object-relational mapping. Mappings are defined in galaxy.model.mapping in two parts: a table definition, and a mapping between objects and tables, including relationships. Migrations allow the schema to be migrated forward automatically.
  25. https://wiki.galaxyproject.org/Admin/Internals/DataModel

  26. (Image-only slide; no text.)
  27. Core components: Dataset

  28. Core components (run analysis): Dataset, History, History Dataset Association, Metadata, Tool, Job
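
The relationships can be made concrete with a hedged sketch using hypothetical dataclasses. Field names such as `file_path`, `extension`, and `metadata`, and the `columns` metadata key in the usage, are assumptions for illustration. The sketch shows the key idea behind the History Dataset Association: an HDA is the per-history view of a Dataset. Copying an HDA into another history therefore gives the copy its own name and metadata while it still points at the same underlying file.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """The underlying data file; may be shared by many HDAs."""
    id: int
    file_path: str


@dataclass
class HistoryDatasetAssociation:
    """A Dataset as it appears in one history, with its own name, datatype and metadata."""
    dataset: Dataset
    name: str
    extension: str
    metadata: Dict[str, object] = field(default_factory=dict)

    def copy(self) -> "HistoryDatasetAssociation":
        # New association and metadata dict, same underlying Dataset: no file is duplicated.
        return HistoryDatasetAssociation(self.dataset, self.name, self.extension, dict(self.metadata))


@dataclass
class History:
    name: str
    contents: List[HistoryDatasetAssociation] = field(default_factory=list)

    def add(self, hda: HistoryDatasetAssociation) -> HistoryDatasetAssociation:
        self.contents.append(hda)
        return hda


@dataclass
class Job:
    """One tool execution, linking input HDAs to the outputs it creates."""
    tool_id: str
    inputs: List[HistoryDatasetAssociation]
    outputs: List[HistoryDatasetAssociation] = field(default_factory=list)
```

Keeping metadata on the association rather than on the Dataset lets two histories describe the same file differently without touching the file itself.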
  29. Metadata: structured data, with different keys/types for different datatypes. Can be used by tools to dynamically control the tool form.
  30. Core components (workflow): Workflow Step, Workflow, Stored Workflow, Workflow Module (Tool Module, Input Dataset, etc.)
  31. Core components (workflow run): Workflow Step, Workflow, Stored Workflow, Workflow Module (Tool Module, Input Dataset, etc.), WF Step Invocation, WF Invocation, Tool, Job, Dataset
  32. Data Libraries: LibraryDataset, LibraryFolder, Library, LDDA, Dataset

  33. Data Libraries (permissions): LibraryDataset, LibraryFolder, Library, Dataset, LDDA, LibraryPermission, Role, User / Group, action
  34. Reference data “cache”: Galaxy Data Managers, data files, Tool Data Tables, location files, tools
  35. Visualization Plugins: adding new visualizations to a Galaxy instance requires
    - a configuration file (XML)
    - a base template (Mako)
    - additional static data if needed (CSS, JS, …)
  36. Visualization Plugins: Charts

  37. Visualization Plugins: Data. How do I efficiently access data for my viz?
    - The framework provides a direct link to read the raw dataset
    - or use data providers:
    - in the config, assert that the visualization requires a given type of data provider
    - data providers process data before sending it to the browser: slice, filter, reformat, ...
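
A data provider of the kind described above might look like the following sketch. The function name `column_data_provider` and its parameters (`start`, `count`, `columns`, `keep`) are hypothetical. The function filters, slices, and reformats tab-separated lines so that only the requested rows and columns are sent to the browser.

```python
import itertools
from typing import Callable, Iterable, List, Optional

Row = List[str]


def column_data_provider(
    lines: Iterable[str],
    start: int = 0,
    count: Optional[int] = None,
    columns: Optional[List[int]] = None,
    keep: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Filter, slice and reformat tab-separated lines before they go to the browser."""
    # Skip blank lines and '#' comment/header lines, split the rest into fields.
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip() and not line.startswith("#"))
    wanted = (row for row in rows if keep(row))            # filter
    stop = None if count is None else start + count
    window = itertools.islice(wanted, start, stop)         # slice
    if columns is None:
        return list(window)
    return [[row[i] for i in columns] for row in window]   # reformat
```

Every stage is a lazy generator, and `itertools.islice` stops after `start + count` matching rows. A provider built this way does not need to read rows beyond the requested window, which matters for large datasets.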
  38. Interactive Environments: the Galaxy side is identical to visualization plugins: a config and a base template.
    - Within the base template, launch a Docker container running a web-accessible process
    - Build a UI that accesses that process through a proxy
  39. Dataset Collections: hundreds or thousands of similar datasets are unwieldy; how do you get a handle on them?
    - Group datasets into a single unit
    - Perform complex operations on that unit
    - Operations are performed on each group element
    - The output of each operation is a new group
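
Mapping an operation over a collection, as described above and used for map/reduce in workflows, can be sketched as follows. The helpers `map_over_collection` and `reduce_collection` are hypothetical, and a plain dict from element identifier to data stands in for a list collection. In the map step each element is processed independently, and the results form a new collection with the same identifiers. In the reduce step a single invocation sees all elements at once.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_over_collection(collection: Dict[str, T], tool: Callable[[T], U]) -> Dict[str, U]:
    """Run `tool` on each element independently; the output collection keeps the identifiers."""
    return {identifier: tool(element) for identifier, element in collection.items()}


def reduce_collection(collection: Dict[str, T], tool: Callable[[List[T]], U]) -> U:
    """Run a single `tool` invocation over all elements, in collection order."""
    return tool(list(collection.values()))
```

For example, trimming every sample and then concatenating the results is `reduce_collection(map_over_collection(samples, trim), "".join)`. Here `trim` is any per-element function.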
  40. Diagram: individual datasets, a collection, and the collection's contents

  41. Map/reduce in workflows

  42. Histories: HDA, History, Dataset

  43. Dataset Collections: DC Element, DatasetCollection, History, Dataset, LDDA, HDA, LibraryFolder, LDCA, HDCA
  44. Object Store
    Direct file access:
    >>> fh = open(dataset.file_path, 'w')
    >>> fh.write('foo')
    >>> fh.close()
    >>> fh = open(dataset.file_path, 'r')
    >>> fh.read()
    Object Store:
    >>> update_from_file(dataset, file_name='foo.txt')
    >>> get_data(dataset)
    >>> get_data(dataset, start=42, count=4096)
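
The object store idea can be sketched with a hypothetical disk-backed class: callers ask the store for data instead of opening `dataset.file_path` themselves. The method names mirror the calls shown on the slide (`update_from_file`, and `get_data` with `start`/`count`). The class, the sharded directory layout, and the `Dataset` stand-in are assumptions for illustration. Galaxy's actual object store interface has more methods and options.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in: the store only needs a stable id."""
    id: int


class DiskObjectStoreSketch:
    """Disk-backed store; callers never build file paths themselves."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, dataset: Dataset) -> str:
        # Shard files into subdirectories, e.g. 000/dataset_42.dat
        subdir = os.path.join(self.root, f"{dataset.id // 1000:03d}")
        os.makedirs(subdir, exist_ok=True)
        return os.path.join(subdir, f"dataset_{dataset.id}.dat")

    def update_from_file(self, dataset: Dataset, file_name: str) -> None:
        shutil.copyfile(file_name, self._path(dataset))

    def get_data(self, dataset: Dataset, start: int = 0, count: int = -1) -> bytes:
        # count=-1 reads to the end of the file.
        with open(self._path(dataset), "rb") as fh:
            fh.seek(start)
            return fh.read(count)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "foo.txt")
        with open(src, "wb") as fh:
            fh.write(b"hello object store")
        store = DiskObjectStoreSketch(os.path.join(root, "objects"))
        ds = Dataset(42)
        store.update_from_file(ds, file_name=src)
        print(store.get_data(ds, start=6, count=6))  # b'object'
```

Callers only use `update_from_file` and `get_data`, so the disk class could be swapped for an S3 or iRODS backend without changing them.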
  45. Object Store diagram: a data consumer talks to the Object Store, which may be a nested store (Distributed, Hierarchical, Caching) or a backend store (Disk, S3/Swift, IRODS).
  46. 4. Galaxy startup

  47. Pre-startup
    nate@weyerbacher% git clone https://github.com/galaxyproject/galaxy.git galaxy-stable
    Cloning into 'galaxy-stable'...
    remote: Counting objects: 173809, done.
    remote: Total 173809 (delta 0), reused 0 (delta 0), pack-reused 173809
    Receiving objects: 100% (173809/173809), 55.18 MiB | 11.08 MiB/s, done.
    Resolving deltas: 100% (137885/137885), done.
    Checking connectivity... done.
    nate@weyerbacher% cd galaxy-stable
    nate@weyerbacher% git checkout -b master origin/master
    Branch master set up to track remote branch master from origin.
    Switched to a new branch 'master'
    nate@weyerbacher%
  48. First run: Initialize configs
    nate@weyerbacher% sh run.sh
    Initializing config/migrated_tools_conf.xml from migrated_tools_conf.xml.sample
    Initializing config/shed_tool_conf.xml from shed_tool_conf.xml.sample
    Initializing config/shed_tool_data_table_conf.xml from shed_tool_data_table_conf.xml.sample
    Initializing config/shed_data_manager_conf.xml from shed_data_manager_conf.xml.sample
    Initializing lib/tool_shed/scripts/bootstrap_tool_shed/user_info.xml from user_info.xml.sample
    Initializing tool-data/shared/ucsc/builds.txt from builds.txt.sample
    Initializing tool-data/shared/ucsc/ucsc_build_sites.txt from ucsc_build_sites.txt.sample
    . . .
    Initializing tool-data/sift_db.loc from sift_db.loc.sample
    Initializing tool-data/srma_index.loc from srma_index.loc.sample
    Initializing tool-data/twobit.loc from twobit.loc.sample
    Initializing static/welcome.html from welcome.html.sample
    * There are many more configs in the config/ directory, used in their “.sample” state. shed_* are mutable configs. Start hacking with: cp galaxy.ini.sample galaxy.ini
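
The copy-if-missing behavior in the log above can be sketched as follows. In the log, this step is performed by `sh run.sh` in shell. The Python below is only an illustration under assumptions: the helper `initialize_configs` is hypothetical, and it discovers files by scanning one directory for `*.sample`, whereas the real startup works through its own set of files across several directories.

```python
import os
import shutil
import tempfile
from typing import List

SUFFIX = ".sample"


def initialize_configs(directory: str) -> List[str]:
    """Copy every `<name>.sample` in `directory` to `<name>` unless `<name>` already exists."""
    initialized = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SUFFIX):
            continue
        sample = os.path.join(directory, name)
        target = sample[: -len(SUFFIX)]
        if os.path.exists(target):
            continue  # never overwrite a config the admin has already created or edited
        shutil.copyfile(sample, target)
        print(f"Initializing {target} from {name}")
        initialized.append(target)
    return initialized


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as config_dir:
        for name in ("galaxy.ini.sample", "tool_conf.xml.sample"):
            with open(os.path.join(config_dir, name), "w") as fh:
                fh.write("# sample\n")
        initialize_configs(config_dir)         # first run: copies both samples
        print(initialize_configs(config_dir))  # later runs: [] (nothing to do)
```

Running this on every startup is safe because existing files are never touched. That matches the log: these messages appear only on the first run.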
  49. First run: Fetch eggs
    Some eggs are out of date, attempting to fetch...
    Fetched http://eggs.galaxyproject.org/Mako/Mako-0.4.1-py2.7.egg
    Fetched http://eggs.galaxyproject.org/repoze.lru/repoze.lru-0.6-py2.7.egg
    Fetched http://eggs.galaxyproject.org/pycrypto/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg
    Fetched http://eggs.galaxyproject.org/boto/boto-2.27.0-py2.7.egg
    Fetched http://eggs.galaxyproject.org/Paste/Paste-1.7.5.1-py2.7.egg
    Fetched http://eggs.galaxyproject.org/wsgiref/wsgiref-0.1.2-py2.7.egg
    . . .
    Fetched http://eggs.galaxyproject.org/pytz/pytz-2013.9-py2.7.egg
    Fetched http://eggs.galaxyproject.org/nose/nose-0.11.1-py2.7.egg
    Fetched http://eggs.galaxyproject.org/requests/requests-2.2.1-py2.7.egg
    Fetched http://eggs.galaxyproject.org/anyjson/anyjson-0.3.3-py2.7.egg
    Fetched http://eggs.galaxyproject.org/WebError/WebError-0.8a-py2.7.egg
    Fetched http://eggs.galaxyproject.org/twill/twill-0.9-py2.7.egg
    Fetch successful.
    * As a result of this year’s GCC Hackathon, eggs will soon be replaced by wheels, which will be installed the usual Pythonic way (with pip).
  50. Every run: Load the application
    python path is: /home/nate/galaxy-stable/eggs/Babel-1.3-py2.7.egg, /home/nate/galaxy-stable/eggs/pytz-2013.9-py2.7.egg, . . . /home/nate/galaxy-stable/eggs/Paste-1.7.5.1-py2.7.egg, /home/nate/galaxy-stable/lib, /home/nate/.venvburrito/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg, /home/nate/.venvburrito/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg, /home/nate/.venvburrito/lib/python2.7/site-packages, /usr/lib/python2.7, /usr/lib/python2.7/plat-x86_64-linux-gnu, /usr/lib/python2.7/lib-tk, /usr/lib/python2.7/lib-old, /usr/lib/python2.7/lib-dynload, /usr/local/lib/python2.7/dist-packages, /usr/lib/python2.7/dist-packages/gtk-2.0, /usr/lib/python2.7/dist-packages
    galaxy.queue_worker INFO 2015-07-06 06:44:54,638 Initalizing main Galaxy Queue Worker on sqlalchemy+sqlite:///./database/control.sqlite?isolation_level=IMMEDIATE
    tool_shed.tool_shed_registry DEBUG 2015-07-06 06:44:54,660 Loading references to tool sheds from ./config/tool_sheds_conf.xml.sample
    tool_shed.tool_shed_registry DEBUG 2015-07-06 06:44:54,660 Loaded reference to tool shed: Galaxy Main Tool Shed
    galaxy.app DEBUG 2015-07-06 06:44:54,660 Using "galaxy.ini" config file: /home/nate/galaxy-stable/config/galaxy.ini.sample
  51. Every run: Load database migrations
    migrate.versioning.repository DEBUG 2015-07-06 06:44:54,742 Loading repository lib/galaxy/model/migrate...
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Loading script lib/galaxy/model/migrate/versions/0001_initial_tables.py...
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Script lib/galaxy/model/migrate/versions/0001_initial_tables.py loaded successfully
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Loading script lib/galaxy/model/migrate/versions/0002_metadata_file_table.py...
    . . .
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Script lib/galaxy/model/migrate/versions/0128_session_timeout.py loaded successfully
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Loading script lib/galaxy/model/migrate/versions/0129_job_external_output_metadata_validity.py...
    migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Script lib/galaxy/model/migrate/versions/0129_job_external_output_metadata_validity.py loaded successfully
    migrate.versioning.repository DEBUG 2015-07-06 06:44:54,752 Repository lib/galaxy/model/migrate loaded successfully
    galaxy.model.migrate.check DEBUG 2015-07-06 06:44:54,754 pysqlite>=2 egg successfully loaded for sqlite dialect
  52. First run: Initialize database
    galaxy.model.migrate.check INFO 2015-07-06 06:44:54,770 No database, initializing
    galaxy.model.migrate.check INFO 2015-07-06 06:44:54,851 Migrating 0 -> 1...
    galaxy.model.migrate.check INFO 2015-07-06 06:44:56,383
    galaxy.model.migrate.check INFO 2015-07-06 06:44:56,383 Migrating 1 -> 2...
    galaxy.model.migrate.check INFO 2015-07-06 06:44:56,559
    . . .
    galaxy.model.migrate.check INFO 2015-07-06 06:46:12,074 Migrating 127 -> 128...
    galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834
    galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 Migration script to add session update time (used for timeouts)
    galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834
    galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 Migrating 128 -> 129...
    galaxy.model.migrate.check INFO 2015-07-06 06:46:13,635
    galaxy.model.migrate.check INFO 2015-07-06 06:46:13,635 Migration script to allow invalidation of job external output metadata temp files
    galaxy.model.migrate.check INFO 2015-07-06 06:46:13,636
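
The "Migrating N -> N+1" lines reflect numbered migration scripts applied in order, from the database's recorded version up to the latest. Galaxy uses sqlalchemy-migrate for this, as the `migrate.versioning` loggers show. The following is only a dependency-free sketch of the same stepping logic. It uses a hypothetical `migrate` function, a plain dict standing in for the database, and two toy scripts.

```python
from typing import Callable, Dict, List

Schema = dict
Script = Callable[[Schema], None]


def migrate(schema: Schema, scripts: Dict[int, Script]) -> List[str]:
    """Apply every script numbered above the schema's current version, in ascending order."""
    log = []
    current = schema.get("version", 0)  # a brand-new database starts at version 0
    for target in sorted(v for v in scripts if v > current):
        log.append(f"Migrating {current} -> {target}...")
        scripts[target](schema)
        schema["version"] = current = target  # record progress after each step
    return log


# Toy migration scripts (hypothetical; real ones alter database tables).
def add_user_table(schema: Schema) -> None:
    schema.setdefault("tables", set()).add("galaxy_user")


def add_metadata_file_table(schema: Schema) -> None:
    schema["tables"].add("metadata_file")


SCRIPTS = {1: add_user_table, 2: add_metadata_file_table}
```

A second call on an up-to-date schema applies nothing. This is why the version check can run on every startup while the migrations themselves run only on the first run or after an upgrade.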
  53. Everything after here happens every time

  54. Load tool migrations
    migrate.versioning.repository DEBUG 2015-07-06 06:46:13,646 Loading repository lib/tool_shed/galaxy_install/migrate...
    migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,646 Loading script lib/tool_shed/galaxy_install/migrate/versions/0001_tools.py...
    . . .
    migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,647 Loading script lib/tool_shed/galaxy_install/migrate/versions/0012_tools.py...
    migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,647 Script lib/tool_shed/galaxy_install/migrate/versions/0012_tools.py loaded successfully
    migrate.versioning.repository DEBUG 2015-07-06 06:46:13,647 Repository lib/tool_shed/galaxy_install/migrate loaded successfully
    tool_shed.galaxy_install.migrate.check DEBUG 2015-07-06 06:46:13,649 pysqlite>=2 egg successfully loaded for sqlite dialect
    tool_shed.galaxy_install.migrate.check DEBUG 2015-07-06 06:46:13,663 The main Galaxy tool shed is not currently available, so skipped tool migration 1 until next server startup
    galaxy.model.orm DEBUG 2015-07-06 06:46:13,665 pysqlite>=2 egg successfully loaded for sqlite dialect
    galaxy.config INFO 2015-07-06 06:46:13,675 Install database targetting Galaxy's database configuration.
  55. Load datatypes
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Loading datatypes from ./config/datatypes_conf.xml.sample
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.assembly from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.text from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.data from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.sequence from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.tabular from the datatype registry.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry.
    . . .
  56. Load datatype sniffers
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.tabular:Vcf'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Bam'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Sff'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Sra'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.xml:Phyloxml'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.xml:Owl'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Maf'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Lav'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Fasta'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Fastq'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,734 Loaded sniffer for datatype 'galaxy.datatypes.images:Html'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,734 Loaded sniffer for datatype 'galaxy.datatypes.images:Pdf'
    . . .
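
Sniffers let Galaxy guess a datatype from file contents, and the registry tries them in a configured order. Below is a minimal sketch of that idea. The sniffer functions and `sniff_datatype` are hypothetical and deliberately simplistic; Galaxy's real sniffers for these formats are more thorough.

```python
from typing import Callable, List, Tuple

Lines = List[str]


def sniff_vcf(lines: Lines) -> bool:
    return bool(lines) and lines[0].startswith("##fileformat=VCF")


def sniff_fastq(lines: Lines) -> bool:
    # Four-line records: '@' header, sequence, '+' separator, qualities.
    return len(lines) >= 4 and lines[0].startswith("@") and lines[2].startswith("+")


def sniff_fasta(lines: Lines) -> bool:
    return bool(lines) and lines[0].startswith(">")


# Checked in order; the first sniffer that accepts the data decides the datatype.
SNIFFERS: List[Tuple[str, Callable[[Lines], bool]]] = [
    ("vcf", sniff_vcf),
    ("fastq", sniff_fastq),
    ("fasta", sniff_fasta),
]


def sniff_datatype(text: str, default: str = "txt") -> str:
    head = text.splitlines()[:100]  # only the beginning of the file is inspected
    for ext, sniffer in SNIFFERS:
        if sniffer(head):
            return ext
    return default
```

Order matters. A generic sniffer placed early could claim files that a more specific one would recognize, which is why the sniff order is part of the datatypes configuration.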
  57. Load build (display) sites
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ucsc': tool-data/shared/ucsc/ucsc_build_sites.txt with display sites: main,test,archaea,ucla
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'gbrowse': tool-data/shared/gbrowse/gbrowse_build_sites.txt with display sites: modencode,sgd_yeast,tair,wormbase,wormbase_ws120,wormbase_ws140,wormbase_ws170,wormbase_ws180,wormbase_ws190,wormbase_ws200,wormbase_ws204,wormbase_ws210,wormbase_ws220,wormbase_ws225
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ensembl': tool-data/shared/ensembl/ensembl_sites.txt
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ensembl_data_url': tool-data/shared/ensembl/ensembl_sites_data_URL.txt
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'igv': tool-data/shared/igv/igv_build_sites.txt
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'rviewer': tool-data/shared/rviewer/rviewer_build_sites.txt
    . . .
  58. Load data tables
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,766 Loaded tool data table 'all_fasta'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bfast_indexes'
    galaxy.tools.data WARNING 2015-07-06 06:46:13,767 Cannot find index file 'tool-data/blastdb_p.loc' for tool data table 'blastdb_p'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'blastdb_p'
    galaxy.tools.data WARNING 2015-07-06 06:46:13,767 Cannot find index file 'tool-data/bwa_index.loc' for tool data table 'bwa_indexes'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bwa_indexes'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bwa_indexes_color'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'indexed_maf_files'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'ngs_sim_fasta'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'perm_base_indexes'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'perm_color_indexes'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,768 Loaded tool data table 'picard_indexes'
    galaxy.tools.data DEBUG 2015-07-06 06:46:13,768 Loaded tool data table 'srma_indexes'
    . . .
  59. Read job configuration file
    galaxy.jobs WARNING 2015-07-06 06:46:14,022 Job configuration "./job_conf.xml" does not exist, using legacy job configuration from Galaxy config file "/home/nate/galaxy-stable/config/galaxy.ini.sample" instead
    galaxy.jobs DEBUG 2015-07-06 06:46:14,022 Loading job configuration from /home/nate/galaxy-stable/config/galaxy.ini.sample
    galaxy.jobs DEBUG 2015-07-06 06:46:14,022 Done loading job configuration
    * config/job_conf.xml will automatically be read if created; see job_conf.xml.sample_advanced for fully documented examples of all possible configurations.
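
For reference, a minimal `job_conf.xml` declares runner plugins, handlers, and destinations. The sketch below embeds such a file in Galaxy's job_conf.xml format and reads it with the standard library's `xml.etree.ElementTree`. The `summarize_job_conf` helper is hypothetical. A real Galaxy job configuration supports many more elements and attributes (see job_conf.xml.sample_advanced).

```python
import xml.etree.ElementTree as ET
from typing import Dict

JOB_CONF = """<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations>
        <destination id="local" runner="local"/>
    </destinations>
</job_conf>
"""


def summarize_job_conf(xml_text: str) -> Dict[str, object]:
    """Collect runner plugins, handler ids, and destination -> runner mappings."""
    root = ET.fromstring(xml_text)
    runners = {
        p.get("id"): (p.get("load"), p.get("workers"))
        for p in root.iter("plugin")
        if p.get("type") == "runner"
    }
    handlers = [h.get("id") for h in root.iter("handler")]
    destinations = {d.get("id"): d.get("runner") for d in root.iter("destination")}
    return {"runners": runners, "handlers": handlers, "destinations": destinations}
```

Saving a file like this as config/job_conf.xml replaces the legacy fallback reported by the warning in the log.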
  60. Load tools
    galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,127 Parsing the tool configuration ./config/tool_conf.xml.sample
    galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,132 Loaded tool id: upload1, version: 1.1.4 into tool panel..
    galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,133 Loaded tool id: ucsc_table_direct1, version: 1.0.0 into tool panel..
    . . .
    galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,276 Loaded tool id: qual_stats_boxplot, version: 1.0.0 into tool panel..
    galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,278 Loaded tool id: vcf_to_maf_customtrack1, version: 1.0.0 into tool panel..
    galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,278 Parsing the tool configuration ./config/shed_tool_conf.xml
    galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,278 Parsing the tool configuration ./config/migrated_tools_conf.xml
    galaxy.tools.search DEBUG 2015-07-06 06:46:14,304 Starting to build toolbox index.
    galaxy.tools.search DEBUG 2015-07-06 06:46:14,890 Toolbox index finished.
    * shed_tool_conf.xml is empty on the first run, so only the tools provided with Galaxy are loaded; after tools are installed from the Tool Shed, they will load here.
  61. Associate display apps with datatypes
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,894 Loaded display application 'ucsc_bam' for datatype 'bam', inherit=False.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,895 Loaded display application 'ensembl_bam' for datatype 'bam', inherit=False.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igv_bam' for datatype 'bam', inherit=False.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igb_bam' for datatype 'bam', inherit=False.
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igb_bed' for datatype 'bed', inherit=False.
    . . .
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'ensembl_gff' to datatype 'gtf'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'igv_gff' to datatype 'gtf'
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'rviewer_interval' to datatype 'bedstrict'
  62. Load implicit datatype converters
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,978 Loaded converter: CONVERTER_Bam_Bai_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,979 Loaded converter: CONVERTER_bam_to_bigwig_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,979 Loaded converter: CONVERTER_bed_to_gff_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,980 Loaded converter: CONVERTER_bed_to_bgzip_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,980 Loaded converter: CONVERTER_bed_to_tabix_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,981 Loaded converter: CONVERTER_bed_gff_or_vcf_to_bigwig_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,981 Loaded converter: CONVERTER_bed_to_fli_0
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,982 Loaded converter: CONVERTER_bedgraph_to_bigwig
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,982 Loaded converter: CONVERTER_len_to_linecount
    . . .
    galaxy.datatypes.registry DEBUG 2015-07-06 06:46:15,006 Loaded external metadata tool: __SET_METADATA__
    galaxy.tools.imp_exp DEBUG 2015-07-06 06:46:15,008 Loaded history export tool: __EXPORT_HISTORY__
    galaxy.tools.imp_exp DEBUG 2015-07-06 06:46:15,008 Loaded history import tool: __IMPORT_HISTORY__
    * A few internal operations are defined as tools so that they can run via Galaxy’s job system, and are loaded here as well.
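
Implicit conversion means that when a consumer needs a different format, Galaxy can run a registered converter tool to produce it. The lookup can be sketched as below. The converter ids are taken from the log above. The `(source, target)` table, the target extension names, the small datatype hierarchy, and the rule that a subtype may use its parent's converters are all assumptions made for this illustration.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset of a datatype hierarchy (child extension -> parent extension).
PARENT: Dict[str, Optional[str]] = {
    "bedstrict": "bed",
    "bed": "interval",
    "interval": "tabular",
    "tabular": None,
    "bam": None,
}

# (source extension, target extension) -> converter tool id (ids as they appear in the log).
CONVERTERS: Dict[Tuple[str, str], str] = {
    ("bam", "bai"): "CONVERTER_Bam_Bai_0",
    ("bam", "bigwig"): "CONVERTER_bam_to_bigwig_0",
    ("bed", "gff"): "CONVERTER_bed_to_gff_0",
    ("bed", "fli"): "CONVERTER_bed_to_fli_0",
}


def find_converter(source: str, target: str) -> Optional[str]:
    """Return a converter tool id for source -> target, falling back to parent datatypes."""
    ext: Optional[str] = source
    while ext is not None:
        tool_id = CONVERTERS.get((ext, target))
        if tool_id is not None:
            return tool_id
        ext = PARENT.get(ext)  # unknown extensions have no parent, which ends the search
    return None
```

With this lookup, a `bedstrict` dataset can reuse the `bed` converter without registering its own.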
  63. Load visualization plugins
    galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,009 VisualizationsRegistry, loaded plugin: charts
    galaxy.visualization.registry INFO 2015-07-06 06:46:15,010 Visualizations plugin disabled: Circster. Skipping...
    galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: graphview
    galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: phyloviz
    galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: scatterplot
    galaxy.visualization.registry INFO 2015-07-06 06:46:15,012 Visualizations plugin disabled: Sweepster. Skipping...
    galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,012 VisualizationsRegistry, loaded plugin: trackster
  64. Initialize job handlers
    galaxy.jobs.manager DEBUG 2015-07-06 06:46:15,022 Starting job handler
    galaxy.jobs INFO 2015-07-06 06:46:15,022 Handler 'main' will load all configured runner plugins
    galaxy.jobs.runners.state_handler_factory DEBUG 2015-07-06 06:46:15,024 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
    galaxy.jobs.runners DEBUG 2015-07-06 06:46:15,024 Starting 5 LocalRunner workers
    galaxy.jobs DEBUG 2015-07-06 06:46:15,025 Loaded job runner 'galaxy.jobs.runners.local:LocalJobRunner' as 'local'
    galaxy.jobs.runners.state_handler_factory DEBUG 2015-07-06 06:46:15,049 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
    galaxy.jobs.runners DEBUG 2015-07-06 06:46:15,049 Starting 3 LWRRunner workers
    galaxy.jobs.runners.lwr_client.manager INFO 2015-07-06 06:46:15,049 Setting LWR client class to standard, non-caching variant.
    galaxy.jobs DEBUG 2015-07-06 06:46:15,049 Loaded job runner 'galaxy.jobs.runners.lwr:LwrJobRunner' as 'lwr'
    galaxy.jobs DEBUG 2015-07-06 06:46:15,050 Legacy destination with id 'local:///', url 'local:///' converted, got params:
    galaxy.jobs.handler DEBUG 2015-07-06 06:46:15,050 Loaded job runners plugins: lwr:local
    galaxy.jobs.handler INFO 2015-07-06 06:46:15,050 job handler stop queue started
    galaxy.jobs.handler INFO 2015-07-06 06:46:15,057 job handler queue started
  65. Initialize web controllers
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,300 Enabling 'admin_toolshed' controller, class: AdminGalaxy
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,300 Enabling 'admin_toolshed' controller, class: AdminToolshed
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,302 Enabling 'biostar' controller, class: BiostarController
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,389 Enabling 'cloudlaunch' controller, class: CloudController
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,389 Enabling 'error' controller, class: Error
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,407 Enabling 'forms' controller, class: Forms
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,414 Enabling 'history' controller, class: HistoryController
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,432 Enabling 'library' controller, class: Library
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,433 Enabling 'library_admin' controller, class: LibraryAdmin
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,433 Enabling 'library_common' controller, class: LibraryCommon
    . . .
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,564 Enabling 'tool_data' API controller, class: ToolData
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,567 Enabling 'tools' API controller, class: ToolsController
    galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,569 Enabling 'workflows' API controller, class: WorkflowsAPIController
  66. Load WSGI middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,713 Enabling 'httpexceptions' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,714 Enabling 'recursive' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,749 Enabling 'eval exceptions' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'trans logger' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'x-forwarded-host' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'Request ID' middleware
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/charts/static, ./config/plugins/visualizations/charts/static
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/graphview/static, ./config/plugins/visualizations/graphview/static
    galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/scatterplot/static, ./config/plugins/visualizations/scatterplot/static
    galaxy.queue_worker INFO 2015-07-06 06:46:15,753 Binding and starting galaxy control worker for main
    Starting server in PID 8192.
    serving on http://127.0.0.1:8080
  67. serving on http://127.0.0.1:8080

  68. 5. A Galaxy request

  69. Beginning of a request
    - TCP connection from the client on port 80
    - The WSGI server is responsible for picking up the connection, parsing the HTTP headers, and reformatting them into a dictionary according to the WSGI spec; this dict is called “environ”
    - In a default Galaxy install, this server is currently Paste#http
  70. Middleware
    - The WSGI interface is based around function calls: def app(environ, start_response): …
    - Middleware components act as filters: they modify the environ and then pass it through to the next webapp
    - Galaxy uses several middleware components (galaxy.webapps.galaxy.buildapp#wrap_in_middleware): error handling, logging, proxy hostname, debug, static, ...
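
The filter pattern can be shown with a runnable WSGI example that uses only the standard library. `wsgiref.util.setup_testing_defaults` fills in a minimal environ. `app`, `XForwardedHostSketch`, and `build_stack` are hypothetical stand-ins for Galaxy's web application, its x-forwarded-host middleware, and its middleware wiring. The middleware adjusts `environ` and then calls the wrapped application, which runs unchanged.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Innermost WSGI application (a stand-in for Galaxy's WebApplication)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"host={environ['HTTP_HOST']}".encode("utf-8")]


class XForwardedHostSketch:
    """Middleware: take the host from the proxy's X-Forwarded-Host header, then pass through."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_HOST")
        if forwarded:
            # Proxies may append hosts to the header; the first entry is the original host.
            environ["HTTP_HOST"] = forwarded.split(",")[0].strip()
        return self.application(environ, start_response)


def build_stack(application):
    """Wrap the app in middleware; the outermost wrapper sees each request first."""
    return XForwardedHostSketch(application)


if __name__ == "__main__":
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_X_FORWARDED_HOST"] = "usegalaxy.org"
    body = b"".join(build_stack(app)(environ, lambda status, headers: None))
    print(body)  # b'host=usegalaxy.org'
```

Each middleware has the same call signature as the app it wraps. Layers can therefore be stacked in any number, and the innermost application never needs to know which filters ran before it.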
  71. WebApplication (galaxy.web.framework.base#WebApplication)
    - Galaxy’s custom web framework; shares a lot of ideas with Pylons
    - Its __call__ method supports the WSGI spec
    - Takes the environ and creates a wrapper object, GalaxyWebTransaction: this is the ubiquitous trans!
    - Parses path_info from the environment to determine which controller method to call. The mapping approach is implemented by a library called Routes that maps …
  72. Routes (galaxy.webapps.galaxy.buildapp#paste_app_factory)
    webapp.add_route('/datasets/:dataset_id/display/{filename:.+?}', controller='dataset', action='display', dataset_id=None, filename=None)
    A URL like /datasets/278043/display matches the route pattern, so Galaxy:
    - looks up the controller named “dataset”
    - looks for a method named “display” that is exposed
    - calls it, passing dataset_id and filename as keyword args
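
The route-to-method dispatch described above can be sketched without the Routes library. Galaxy delegates this to Routes, and everything below is a hypothetical reimplementation of the idea only (`Mapper`, `expose`, `DatasetController`). Placeholders in the slide's pattern are turned into regular-expression groups, and only methods marked as exposed can be reached. Unlike Routes, this sketch does not make trailing segments with defaults optional, so the URL must include a filename.

```python
import re
from typing import Dict


def expose(func):
    """Mark a controller method as reachable from a URL."""
    func.exposed = True
    return func


class DatasetController:
    @expose
    def display(self, trans, dataset_id=None, filename=None):
        return f"dataset {dataset_id}, filename {filename}"

    def helper(self, trans):
        # Not exposed: the dispatcher refuses to call it even if a route points here.
        return "internal"


class Mapper:
    def __init__(self) -> None:
        self.routes = []  # (compiled regex, controller name, action name)

    def add_route(self, pattern: str, controller: str, action: str) -> None:
        # '{name:regex}' becomes a named group with that regex; ':name' matches one path segment.
        regex = re.sub(r"\{(\w+):([^}]+)\}", r"(?P<\1>\2)", pattern)
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", regex)
        self.routes.append((re.compile(regex), controller, action))

    def dispatch(self, path: str, controllers: Dict[str, object], trans=None):
        for regex, controller, action in self.routes:
            match = regex.fullmatch(path)
            if match is None:
                continue
            method = getattr(controllers[controller], action, None)
            if method is None or not getattr(method, "exposed", False):
                raise LookupError(f"{controller}.{action} is not exposed")
            # Matched URL parts become keyword arguments, after trans.
            return method(trans, **match.groupdict())
        raise LookupError(f"no route matches {path}")
```

The `exposed` flag is what keeps helper methods on a controller class from becoming URLs by accident.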
  73. Controllers (e.g. galaxy.webapps.galaxy.controllers)
    - Provide endpoints for WebApplication to call
    - Controller methods accept trans as their first argument, plus additional keyword arguments
    - By default, any <name>.py in the webapp’s controllers directory is loaded as controller <name>
    - Can return a file or any iterable, which will be streamed to the browser
    - Often a helper like self.fill_template is used
  74. API Controllers (e.g. galaxy.webapps.galaxy.controllers.api)
    - Similar to regular controllers, but handle routes starting with “/api”
    - Should only return JSON
    - All new functionality should be implemented using API controllers; eventually most or all non-API controllers will be eliminated
  75. Request path: HTTP / TCP → WSGI Server → wsgi environ dictionary → Middleware → wsgi environ dictionary → WebApplication → GalaxyWebTransaction object → Controller → managers, models, ...
  76. Request path as above, plus the response path: the Controller returns an iterable, nested iterable, callable, string, file object, ...; the WebApplication turns this into a byte iterable or file; the WSGI Server sends it as a stream of bytes.
  77. 6. A production Galaxy

  78. By default → Production
    SQLite → PostgreSQL
    Paste#http → uWSGI/nginx
    Single process → Multiple processes
    Single host → Multiple hosts
    Local jobs → Cluster jobs
    usegalaxy.org/production
  79. Paste#http alternatives
    uWSGI: master process/log control, scales, less painful restarts, infinite configurability
    lib/galaxy/main.py: Galaxy without the web stack, useful for job handlers
  80. Cluster support

  81. Composition of a production Galaxy (usegalaxy.org): Universe View
    VMware: web-01, web-02, db-01
    Dedicated cluster: cluster-01, cluster-02, ... cluster-16 (slurm)
    rabbitmq
    Stampede (pulsar)
    Corral (DDN via NFS)
    Rodeo (OpenStack): slurm, instances
  82. Composition of a production Galaxy (usegalaxy.org): Galaxy View
    web-01: nginx, uWSGI (8p, 8t), Galaxy main.py (3p), ProFTPD, supervisord
    web-02: nginx, uWSGI (8p, 8t), Galaxy main.py (3p), Paste#http installer, ProFTPD, supervisord
    db-01: PostgreSQL
    DNS round-robin across web-01 and web-02
  83. Composition of a production Galaxy (usegalaxy.org): Solar System View
    web-{01,02} local filesystem: Galaxy server processes, immutable configs, mutable configs*, Galaxy web code
    Corral filesystem: Galaxy handler code, logs, Shed-installed tools, tool dependencies, datasets
    * Mutable configs are distributed from web-01 to web-02 with Ansible
  84. Q&A