Slide 1

Slide 1 text

Galaxy Architecture 2015 Nate, James @natefoo / @jxtx / #usegalaxy

Slide 2

Slide 2 text

Please Interrupt! We’re here to answer your questions about Galaxy architecture!

Slide 3

Slide 3 text

0. Getting Involved in Galaxy

Slide 4

Slide 4 text

IRC: irc.freenode.net#galaxyproject
GitHub: github.com/galaxyproject
Trello: https://trello.com/b/75c1kASa/
Twitter: #usegalaxy, @galaxyproject

Slide 5

Slide 5 text

Contributing
All Galaxy development has moved to GitHub over the last year.
New official contribution guidelines: https://github.com/galaxyproject/galaxy/blob/dev/CONTRIBUTING.md

Slide 6

Slide 6 text

1. The family of /galaxyproject projects

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

github.com/galaxyproject/galaxy
The main Galaxy application: web interface, database model, job running, etc. Also includes other web applications, including the ToolShed and Reports.

Slide 9

Slide 9 text

github.com/galaxyproject/cloudman
Galaxy CloudMan: framework and interface to orchestrate clusters on infrastructure clouds. Main component of Galaxy cloud images.
github.com/galaxyproject/cloudlaunch
CloudLaunch: web application that makes it easy to launch images on a cloud; drives http://launch.usegalaxy.org

Slide 10

Slide 10 text

github.com/galaxyproject/tools-{devteam,iuc}
Galaxy tools maintained by devteam (the PSU/Hopkins group) and iuc (the “Intergalactic Utilities Commission”). A variety of tools, generally of high quality, including the core tools for Galaxy main. Demonstrates current tool development best practices: development on GitHub, then deployment to the test/main ToolSheds.

Slide 11

Slide 11 text

github.com/galaxyproject/docker-build
Build Galaxy tool dependencies for the ToolShed in Docker containers.
Build Galaxy framework dependencies as Python wheels.

Slide 12

Slide 12 text

github.com/galaxyproject/planemo
Command line utilities to assist in the development of Galaxy tools. Linting, testing, deploying to ToolSheds… The best practice approach for Galaxy tool development!
github.com/galaxyproject/planemo-machine
Builds Galaxy environments for Galaxy tool development, including Docker containers, virtual machines, Google compute images...

Slide 13

Slide 13 text

github.com/galaxyproject/{ansible-*, *-playbook}
Ansible components to automate almost every aspect of Galaxy installation and maintenance. Ansible is an advanced configuration management system. These playbooks are used to maintain Galaxy main, cloud images, virtual machines, ...

Slide 14

Slide 14 text

github.com/galaxyproject/pulsar
Distributed job execution engine for Galaxy. Allows staging data, scripts, configuration. Can run jobs on Windows machines. Can act as its own queueing system or access an existing cluster DRM.

Slide 15

Slide 15 text

github.com/galaxyproject/bioblend
Official Python client for the Galaxy and CloudMan APIs.

Slide 16

Slide 16 text

2. Galaxy app architecture

Slide 17

Slide 17 text

Client: JavaScript, BackboneJS models, source in `client` directory
(HTTP)
Galaxy API: RESTful HTTP API for accessing and controlling the Galaxy application
Managers: manage resources in the scope of a user transaction, abstract most logic out of controllers
Models: encapsulate all persistent state (except for raw data): users, metadata, workflows, …; persisted to a relational database (ideally PostgreSQL) using the SQLAlchemy layer
Execution: getting jobs run; managing a possibly wide variety of execution engines behind a single Galaxy instance

Slide 18

Slide 18 text

Browser / Server (Universe App…)
The old way: controllers; HTML on the wire, typically from Mako; browser is a renderer + progressive JS
The new way: controllers.api; JSON on the wire; Backbone.js MVC on browser

Slide 19

Slide 19 text

The old way: User stuff (prefs, etc), Tool forms, Reports, Tool shed
* Many of these have an API but it is not yet used by the UI
The new way: Visualizations, History, Tool menu, Most grids
In between: Workflows, Data Libraries

Slide 20

Slide 20 text

The old way: User stuff (prefs, etc), Tool forms, Reports, Tool shed
* Many of these have an API but it is not yet used by the UI
The new way: Visualizations, History, Tool menu, Most grids
In between: Workflows, Data Libraries (beta)

Slide 21

Slide 21 text

galaxy.util.pastescript.serve Middleware Stack galaxy.webapps.galaxy.GalaxyWebApplication WSGI Web Server galaxy.app.UniverseApplication Toolbox Job Creation Model Datatypes Reg. ...

Slide 22

Slide 22 text

galaxy.main Job Handlers Job Creation Job Handlers Job Handlers Job Runner galaxy.main <>

Slide 23

Slide 23 text

3. Galaxy components and object model

Slide 24

Slide 24 text

Galaxy data model is not database entity driven
Entities are defined in galaxy.model as objects
SQLAlchemy is used for object-relational mapping
Mappings are defined in galaxy.model.mapping in two parts: a table definition, and a mapping between objects and tables, including relationships
Migrations allow the schema to be migrated forward automatically

Slide 25

Slide 25 text

https://wiki.galaxyproject.org/Admin/Internals/DataModel

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Core components Dataset

Slide 28

Slide 28 text

Core components: run analysis Dataset History History Dataset Association Metadata Tool Job

Slide 29

Slide 29 text

Metadata
Structured data
Different keys/types for different datatypes
Can be used by tools to dynamically control the tool form

Slide 30

Slide 30 text

Core components: workflow Workflow Step Workflow Stored Workflow Workflow Module (Tool Module, Input Dataset, etc)

Slide 31

Slide 31 text

Core components: workflow run Workflow Step Workflow Stored Workflow Workflow Module (Tool Module, Input Dataset, etc) WF Step Invocation WF Invocation Tool Job Dataset

Slide 32

Slide 32 text

Data Libraries LibraryDataset LibraryFolder Library LDDA Dataset

Slide 33

Slide 33 text

Data Libraries: Permissions LibraryDataset LibraryFolder Library Dataset LDDA LibraryPermission Role User / Group action

Slide 34

Slide 34 text

Reference data “cache” Galaxy Data Managers Data files Tool Data Tables Location files Tools

Slide 35

Slide 35 text

Visualization Plugins
Adding new visualizations to a Galaxy instance
- Configuration file (XML)
- Base template (Mako)
- Additional static data if needed (CSS, JS, …)

Slide 36

Slide 36 text

Visualization Plugins: Charts

Slide 37

Slide 37 text

Visualization Plugins: Data
How do I efficiently access data for my viz?
- Framework provides a direct link to read the raw dataset
- Or use data providers
- In config, assert that the visualization requires a given type of data provider
- Data providers process data before sending it to the browser: slice, filter, reformat, ...
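To make "slice, filter, reformat" concrete, here is a minimal sketch of a provider written as a lazy generator. The function name `column_provider` and its parameters are hypothetical and are not Galaxy's data provider API; the sketch only assumes tab-separated input with `#` comment lines.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


def column_provider(
    lines: Iterable[str],
    columns: List[int],
    offset: int = 0,
    limit: Optional[int] = None,
    keep: Callable[[List[str]], bool] = lambda fields: True,
) -> Iterator[List[str]]:
    """Yield selected columns from tab-separated lines (hypothetical helper)."""
    # Reformat: split each line into fields, skipping blanks and comments.
    rows = (
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    # Filter: keep only rows accepted by the predicate.
    kept = (fields for fields in rows if keep(fields))
    # Slice: apply offset/limit lazily so the whole dataset is never loaded.
    stop = None if limit is None else offset + limit
    for fields in islice(kept, offset, stop):
        yield [fields[i] for i in columns]


bed_lines = [
    "#chrom\tstart\tend\n",
    "chr1\t100\t200\n",
    "chr2\t150\t300\n",
    "chr1\t400\t900\n",
]
chr1_rows = list(
    column_provider(bed_lines, columns=[1, 2], keep=lambda f: f[0] == "chr1")
)
```

Because every stage is a generator, a visualization that asks for a small window of a large dataset only pays for the rows it actually reads.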

Slide 38

Slide 38 text

Interactive Environments
Galaxy side is identical to visualization plugins: config and base template
- Within the base template, launch a Docker container running a web-accessible process
- Build a UI that accesses that process through a proxy

Slide 39

Slide 39 text

Dataset Collections
Hundreds or thousands of similar datasets are unwieldy; how do you get a handle on them?
- Group datasets into a single unit
- Perform complex operations on that unit
- Operations are performed on each group element
- Output of each operation is a new group
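The "operate on each element, get a new group back" idea can be sketched in a few lines. The `Dataset`, `Collection`, and `map_over` names below are illustrative stand-ins, not Galaxy's model classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Dataset:
    name: str
    content: str


@dataclass(frozen=True)
class Collection:
    """A named group of datasets keyed by element identifier (hypothetical)."""
    name: str
    elements: Dict[str, Dataset]


def map_over(collection: Collection, tool: Callable[[Dataset], Dataset], label: str) -> Collection:
    # Run the operation once per element; the outputs form a new collection
    # that keeps the same element identifiers as the input.
    outputs = {ident: tool(ds) for ident, ds in collection.elements.items()}
    return Collection(name=f"{label} on {collection.name}", elements=outputs)


def to_upper(ds: Dataset) -> Dataset:
    # Stand-in for a real tool run on a single dataset.
    return Dataset(name=f"upper({ds.name})", content=ds.content.upper())


samples = Collection("samples", {
    "s1": Dataset("s1.fa", "acgt"),
    "s2": Dataset("s2.fa", "ttga"),
})
result = map_over(samples, to_upper, label="to_upper")
```

Keeping the element identifiers stable is what lets a later step pair outputs with the inputs they came from.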

Slide 40

Slide 40 text

Individual Datasets Collection Collection Contents

Slide 41

Slide 41 text

Map/reduce in workflows

Slide 42

Slide 42 text

Histories HDA History Dataset

Slide 43

Slide 43 text

Dataset Collections DC Element DatasetCollection History Dataset LDDA HDA LibraryFolder LDCA HDCA

Slide 44

Slide 44 text

>>> fh = open( dataset.file_path, 'w' )
>>> fh.write( 'foo' )
>>> fh.close()
>>> fh = open( dataset.file_path, 'r' )
>>> fh.read()
>>> update_from_file( dataset, file_name='foo.txt' )
>>> get_data( dataset )
>>> get_data( dataset, start=42, count=4096 )
Object Store
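The contrast on this slide is between opening `dataset.file_path` directly and going through store calls. A minimal disk-backed sketch of that interface follows; the class name, the integer object ids, and the file layout are assumptions for illustration, not Galaxy's actual object store implementation.

```python
import os
import shutil
import tempfile


class DiskObjectStoreSketch:
    """Hypothetical, minimal disk-backed store; not Galaxy's actual class."""

    def __init__(self, root):
        self.root = root

    def _path(self, obj_id):
        # Callers never see this path; only the store knows the layout.
        return os.path.join(self.root, "dataset_%s.dat" % obj_id)

    def exists(self, obj_id):
        return os.path.exists(self._path(obj_id))

    def update_from_file(self, obj_id, file_name):
        # Copy an external file into the store as the object's new content.
        shutil.copyfile(file_name, self._path(obj_id))

    def get_data(self, obj_id, start=0, count=-1):
        # Read `count` bytes beginning at `start`; count=-1 reads to the end.
        with open(self._path(obj_id), "rb") as fh:
            fh.seek(start)
            return fh.read(count)


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "foo.txt")
with open(source, "wb") as fh:
    fh.write(b"hello object store")

store = DiskObjectStoreSketch(workdir)
store.update_from_file(1, file_name=source)
chunk = store.get_data(1, start=6, count=6)
```

Since consumers only call `update_from_file` and `get_data`, the same code keeps working if the bytes later live somewhere other than a local disk.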

Slide 45

Slide 45 text

Object Store Nested Store Disk Store Data Consumer Object Store Distributed Hierarchical Caching Store S3/Swift IRODS
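One way to read the nested-store diagram is that a hierarchical store presents several backends as one. The sketch below assumes a simple policy (read from the first backend that has the object, write to the first backend); that policy and all class names are illustrative assumptions, with in-memory dictionaries standing in for disk, S3/Swift, or iRODS.

```python
class MemoryBackend:
    """Hypothetical in-memory backend standing in for a real store."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def exists(self, obj_id):
        return obj_id in self.blobs

    def create(self, obj_id, data):
        self.blobs[obj_id] = data

    def get_data(self, obj_id):
        return self.blobs[obj_id]


class HierarchicalStoreSketch:
    """Presents an ordered list of backends as a single store."""

    def __init__(self, backends):
        self.backends = list(backends)

    def _find(self, obj_id):
        # Search backends in priority order.
        for backend in self.backends:
            if backend.exists(obj_id):
                return backend
        return None

    def exists(self, obj_id):
        return self._find(obj_id) is not None

    def create(self, obj_id, data):
        # Assumed policy: new objects always land in the first backend.
        self.backends[0].create(obj_id, data)

    def get_data(self, obj_id):
        backend = self._find(obj_id)
        if backend is None:
            raise KeyError(obj_id)
        return backend.get_data(obj_id)


fast = MemoryBackend("local disk")
archive = MemoryBackend("S3/Swift")
archive.create(7, b"old dataset")

nested = HierarchicalStoreSketch([fast, archive])
nested.create(8, b"new dataset")
```

The data consumer talks only to `nested`, so it does not need to know which backend holds a given object.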

Slide 46

Slide 46 text

4. Galaxy startup

Slide 47

Slide 47 text

Pre-startup
nate@weyerbacher% git clone https://github.com/galaxyproject/galaxy.git galaxy-stable
Cloning into 'galaxy-stable'...
remote: Counting objects: 173809, done.
remote: Total 173809 (delta 0), reused 0 (delta 0), pack-reused 173809
Receiving objects: 100% (173809/173809), 55.18 MiB | 11.08 MiB/s, done.
Resolving deltas: 100% (137885/137885), done.
Checking connectivity... done.
nate@weyerbacher% cd galaxy-stable
nate@weyerbacher% git checkout -b master origin/master
Branch master set up to track remote branch master from origin.
Switched to a new branch 'master'
nate@weyerbacher%

Slide 48

Slide 48 text

First run: Initialize configs
nate@weyerbacher% sh run.sh
Initializing config/migrated_tools_conf.xml from migrated_tools_conf.xml.sample
Initializing config/shed_tool_conf.xml from shed_tool_conf.xml.sample
Initializing config/shed_tool_data_table_conf.xml from shed_tool_data_table_conf.xml.sample
Initializing config/shed_data_manager_conf.xml from shed_data_manager_conf.xml.sample
Initializing lib/tool_shed/scripts/bootstrap_tool_shed/user_info.xml from user_info.xml.sample
Initializing tool-data/shared/ucsc/builds.txt from builds.txt.sample
Initializing tool-data/shared/ucsc/ucsc_build_sites.txt from ucsc_build_sites.txt.sample
. . .
Initializing tool-data/sift_db.loc from sift_db.loc.sample
Initializing tool-data/srma_index.loc from srma_index.loc.sample
Initializing tool-data/twobit.loc from twobit.loc.sample
Initializing static/welcome.html from welcome.html.sample
* There are many more configs in the config/ directory, used in their “.sample” state. shed_* are mutable configs.
Start hacking with: cp galaxy.ini.sample galaxy.ini
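The "Initializing X from X.sample" step amounts to copying a sample into place only when the real file is missing. The sketch below shows that idea in Python; the function name and the explicit `names` list are assumptions for illustration, and this is not the logic of `run.sh` itself.

```python
import os
import shutil
import tempfile


def init_from_samples(config_dir, names):
    """Copy NAME.sample to NAME for each listed name that does not exist yet."""
    created = []
    for name in names:
        target = os.path.join(config_dir, name)
        sample = target + ".sample"
        # Never overwrite a file the admin (or Galaxy) may already have edited.
        if not os.path.exists(target) and os.path.exists(sample):
            shutil.copyfile(sample, target)
            created.append(name)
    return created


demo_dir = tempfile.mkdtemp()
for sample_name in ("shed_tool_conf.xml.sample", "galaxy.ini.sample"):
    with open(os.path.join(demo_dir, sample_name), "w") as fh:
        fh.write("sample contents\n")

first = init_from_samples(demo_dir, ["shed_tool_conf.xml"])
second = init_from_samples(demo_dir, ["shed_tool_conf.xml"])
```

Only the listed files are materialized; anything else, such as `galaxy.ini` here, keeps being read from its `.sample` until someone copies it by hand.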

Slide 49

Slide 49 text

First run: Fetch eggs
Some eggs are out of date, attempting to fetch...
Fetched http://eggs.galaxyproject.org/Mako/Mako-0.4.1-py2.7.egg
Fetched http://eggs.galaxyproject.org/repoze.lru/repoze.lru-0.6-py2.7.egg
Fetched http://eggs.galaxyproject.org/pycrypto/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg
Fetched http://eggs.galaxyproject.org/boto/boto-2.27.0-py2.7.egg
Fetched http://eggs.galaxyproject.org/Paste/Paste-1.7.5.1-py2.7.egg
Fetched http://eggs.galaxyproject.org/wsgiref/wsgiref-0.1.2-py2.7.egg
. . .
Fetched http://eggs.galaxyproject.org/pytz/pytz-2013.9-py2.7.egg
Fetched http://eggs.galaxyproject.org/nose/nose-0.11.1-py2.7.egg
Fetched http://eggs.galaxyproject.org/requests/requests-2.2.1-py2.7.egg
Fetched http://eggs.galaxyproject.org/anyjson/anyjson-0.3.3-py2.7.egg
Fetched http://eggs.galaxyproject.org/WebError/WebError-0.8a-py2.7.egg
Fetched http://eggs.galaxyproject.org/twill/twill-0.9-py2.7.egg
Fetch successful.
* As a result of this year’s GCC Hackathon, eggs will soon be replaced by wheels, and will be installed the usual Pythonic way (with pip)

Slide 50

Slide 50 text

Every run: Load the application
python path is: /home/nate/galaxy-stable/eggs/Babel-1.3-py2.7.egg, /home/nate/galaxy-stable/eggs/pytz-2013.9-py2.7.egg, . . . /home/nate/galaxy-stable/eggs/Paste-1.7.5.1-py2.7.egg, /home/nate/galaxy-stable/lib, /home/nate/.venvburrito/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg, /home/nate/.venvburrito/lib/python2.7/site-packages/setuptools-12.3-py2.7.egg, /home/nate/.venvburrito/lib/python2.7/site-packages, /usr/lib/python2.7, /usr/lib/python2.7/plat-x86_64-linux-gnu, /usr/lib/python2.7/lib-tk, /usr/lib/python2.7/lib-old, /usr/lib/python2.7/lib-dynload, /usr/local/lib/python2.7/dist-packages, /usr/lib/python2.7/dist-packages/gtk-2.0, /usr/lib/python2.7/dist-packages
galaxy.queue_worker INFO 2015-07-06 06:44:54,638 Initalizing main Galaxy Queue Worker on sqlalchemy+sqlite:///./database/control.sqlite?isolation_level=IMMEDIATE
tool_shed.tool_shed_registry DEBUG 2015-07-06 06:44:54,660 Loading references to tool sheds from ./config/tool_sheds_conf.xml.sample
tool_shed.tool_shed_registry DEBUG 2015-07-06 06:44:54,660 Loaded reference to tool shed: Galaxy Main Tool Shed
galaxy.app DEBUG 2015-07-06 06:44:54,660 Using "galaxy.ini" config file: /home/nate/galaxy-stable/config/galaxy.ini.sample

Slide 51

Slide 51 text

Every run: Load database migrations migrate.versioning.repository DEBUG 2015-07-06 06:44:54,742 Loading repository lib/galaxy/model/migrate... migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Loading script lib/galaxy/model/migrate/versions/0001_initial_tables.py... migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Script lib/galaxy/model/migrate/versions/0001_initial_tables.py loaded successfully migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,743 Loading script lib/galaxy/model/migrate/versions/0002_metadata_file_table.py... . . . migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Script lib/galaxy/model/migrate/versions/0128_session_timeout.py loaded successfully migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Loading script lib/galaxy/model/migrate/versions/0129_job_external_output_metadata_validity.py... migrate.versioning.script.base DEBUG 2015-07-06 06:44:54,752 Script lib/galaxy/model/migrate/versions/0129_job_external_output_metadata_validity.py loaded successfully migrate.versioning.repository DEBUG 2015-07-06 06:44:54,752 Repository lib/galaxy/model/migrate loaded successfully galaxy.model.migrate.check DEBUG 2015-07-06 06:44:54,754 pysqlite>=2 egg successfully loaded for sqlite dialect

Slide 52

Slide 52 text

First run: Initialize database galaxy.model.migrate.check INFO 2015-07-06 06:44:54,770 No database, initializing galaxy.model.migrate.check INFO 2015-07-06 06:44:54,851 Migrating 0 -> 1... galaxy.model.migrate.check INFO 2015-07-06 06:44:56,383 galaxy.model.migrate.check INFO 2015-07-06 06:44:56,383 Migrating 1 -> 2... galaxy.model.migrate.check INFO 2015-07-06 06:44:56,559 . . . galaxy.model.migrate.check INFO 2015-07-06 06:46:12,074 Migrating 127 -> 128... galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 Migration script to add session update time (used for timeouts) galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 galaxy.model.migrate.check INFO 2015-07-06 06:46:12,834 Migrating 128 -> 129... galaxy.model.migrate.check INFO 2015-07-06 06:46:13,635 galaxy.model.migrate.check INFO 2015-07-06 06:46:13,635 Migration script to allow invalidation of job external output metadata temp files galaxy.model.migrate.check INFO 2015-07-06 06:46:13,636
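The "Migrating N -> N+1" lines reflect a simple pattern: numbered scripts applied in order, with the current version recorded in the database. The sketch below illustrates that pattern with the standard-library `sqlite3` module rather than the migration library Galaxy uses; the table names, columns, and migration bodies are made up for illustration.

```python
import sqlite3


def migration_1(conn):
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, state TEXT)")


def migration_2(conn):
    conn.execute("ALTER TABLE dataset ADD COLUMN file_size INTEGER")


# Index i holds the script that moves the schema from version i to i + 1.
MIGRATIONS = [migration_1, migration_2]


def current_version(conn):
    # The version table name here is an illustrative assumption.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        return 0
    return row[0]


def migrate(conn):
    """Apply every pending migration in order and return log-style messages."""
    applied = []
    version = current_version(conn)
    while version < len(MIGRATIONS):
        MIGRATIONS[version](conn)
        version += 1
        conn.execute("UPDATE schema_version SET version = ?", (version,))
        conn.commit()
        applied.append("Migrating %d -> %d..." % (version - 1, version))
    return applied


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)
second_run = migrate(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(dataset)")]
```

On a fresh database every script runs; on later startups the stored version equals the number of scripts, so nothing is applied.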

Slide 53

Slide 53 text

Everything after here happens every time

Slide 54

Slide 54 text

Load tool migrations migrate.versioning.repository DEBUG 2015-07-06 06:46:13,646 Loading repository lib/tool_shed/galaxy_install/migrate... migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,646 Loading script lib/tool_shed/galaxy_install/migrate/versions/0001_tools.py... . . . migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,647 Loading script lib/tool_shed/galaxy_install/migrate/versions/0012_tools.py... migrate.versioning.script.base DEBUG 2015-07-06 06:46:13,647 Script lib/tool_shed/galaxy_install/migrate/versions/0012_tools.py loaded successfully migrate.versioning.repository DEBUG 2015-07-06 06:46:13,647 Repository lib/tool_shed/galaxy_install/migrate loaded successfully tool_shed.galaxy_install.migrate.check DEBUG 2015-07-06 06:46:13,649 pysqlite>=2 egg successfully loaded for sqlite dialect tool_shed.galaxy_install.migrate.check DEBUG 2015-07-06 06:46:13,663 The main Galaxy tool shed is not currently available, so skipped tool migration 1 until next server startup galaxy.model.orm DEBUG 2015-07-06 06:46:13,665 pysqlite>=2 egg successfully loaded for sqlite dialect galaxy.config INFO 2015-07-06 06:46:13,675 Install database targetting Galaxy's database configuration.

Slide 55

Slide 55 text

Load datatypes galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Loading datatypes from ./config/datatypes_conf.xml.sample galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.assembly from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.text from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.data from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.sequence from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.tabular from the datatype registry. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,724 Retrieved datatype module galaxy.datatypes.binary from the datatype registry. . . .

Slide 56

Slide 56 text

Load datatype sniffers
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.tabular:Vcf'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Bam'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Sff'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.binary:Sra'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.xml:Phyloxml'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.xml:Owl'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Maf'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Lav'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Fasta'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,733 Loaded sniffer for datatype 'galaxy.datatypes.sequence:Fastq'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,734 Loaded sniffer for datatype 'galaxy.datatypes.images:Html'
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,734 Loaded sniffer for datatype 'galaxy.datatypes.images:Pdf'
. . .
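A sniffer is, in essence, a predicate over the start of a file, and the registry tries them in a fixed order. The sketch below shows that idea with deliberately simplistic checks; the function names, the ordering, and the heuristics are illustrative assumptions and are much weaker than real format detection.

```python
def sniff_vcf(head):
    return head.startswith("##fileformat=VCF")


def sniff_fastq(head):
    lines = head.splitlines()
    return len(lines) >= 4 and lines[0].startswith("@") and lines[2].startswith("+")


def sniff_fasta(head):
    return head.startswith(">")


# Order matters: more specific sniffers run before more permissive ones.
SNIFF_ORDER = [("vcf", sniff_vcf), ("fastq", sniff_fastq), ("fasta", sniff_fasta)]


def guess_ext(head, default="data"):
    """Return the extension of the first sniffer that accepts the file head."""
    for ext, sniffer in SNIFF_ORDER:
        if sniffer(head):
            return ext
    return default


guessed = [
    guess_ext(">seq1\nACGT\n"),
    guess_ext("@read1\nACGT\n+\nIIII\n"),
    guess_ext("##fileformat=VCFv4.2\n#CHROM\tPOS\n"),
    guess_ext("just some text"),
]
```

Falling back to a generic type when nothing matches keeps uploads from failing just because a format is unknown.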

Slide 57

Slide 57 text

Load build (display) sites
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ucsc': tool-data/shared/ucsc/ucsc_build_sites.txt with display sites: main,test,archaea,ucla
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'gbrowse': tool-data/shared/gbrowse/gbrowse_build_sites.txt with display sites: modencode,sgd_yeast,tair,wormbase,wormbase_ws120,wormbase_ws140,wormbase_ws170,wormbase_ws180,wormbase_ws190,wormbase_ws200,wormbase_ws204,wormbase_ws210,wormbase_ws220,wormbase_ws225
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ensembl': tool-data/shared/ensembl/ensembl_sites.txt
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'ensembl_data_url': tool-data/shared/ensembl/ensembl_sites_data_URL.txt
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'igv': tool-data/shared/igv/igv_build_sites.txt
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:13,735 Loaded build site 'rviewer': tool-data/shared/rviewer/rviewer_build_sites.txt
. . .

Slide 58

Slide 58 text

Load data tables galaxy.tools.data DEBUG 2015-07-06 06:46:13,766 Loaded tool data table 'all_fasta' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bfast_indexes' galaxy.tools.data WARNING 2015-07-06 06:46:13,767 Cannot find index file 'tool-data/blastdb_p.loc' for tool data table 'blastdb_p' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'blastdb_p' galaxy.tools.data WARNING 2015-07-06 06:46:13,767 Cannot find index file 'tool-data/bwa_index.loc' for tool data table 'bwa_indexes' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bwa_indexes' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'bwa_indexes_color' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'indexed_maf_files' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'ngs_sim_fasta' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'perm_base_indexes' galaxy.tools.data DEBUG 2015-07-06 06:46:13,767 Loaded tool data table 'perm_color_indexes' galaxy.tools.data DEBUG 2015-07-06 06:46:13,768 Loaded tool data table 'picard_indexes' galaxy.tools.data DEBUG 2015-07-06 06:46:13,768 Loaded tool data table 'srma_indexes' . . .
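Each tool data table is backed by a `.loc` location file: tab-separated rows, with `#` lines as comments. A minimal parser sketch follows; the function name, the column names, and the example paths are hypothetical and chosen only for illustration.

```python
def parse_loc(lines, columns):
    """Parse tab-separated .loc lines into a list of dicts keyed by column name."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < len(columns):
            continue  # skip malformed rows rather than failing the whole table
        entries.append(dict(zip(columns, fields)))
    return entries


# Hypothetical contents of a bwa_index.loc file.
loc_text = [
    "# value\tdbkey\tname\tpath\n",
    "hg19\thg19\tHuman (hg19)\t/data/hg19/bwa/hg19.fa\n",
    "\n",
    "mm10\tmm10\tMouse (mm10)\t/data/mm10/bwa/mm10.fa\n",
]
bwa_indexes = parse_loc(loc_text, ["value", "dbkey", "name", "path"])
```

A missing `.loc` file, as in the warnings above, simply corresponds to a table with no entries.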

Slide 59

Slide 59 text

Read job configuration file
galaxy.jobs WARNING 2015-07-06 06:46:14,022 Job configuration "./job_conf.xml" does not exist, using legacy job configuration from Galaxy config file "/home/nate/galaxy-stable/config/galaxy.ini.sample" instead
galaxy.jobs DEBUG 2015-07-06 06:46:14,022 Loading job configuration from /home/nate/galaxy-stable/config/galaxy.ini.sample
galaxy.jobs DEBUG 2015-07-06 06:46:14,022 Done loading job configuration
* config/job_conf.xml will automatically be read if created, see job_conf.xml.sample_advanced for fully documented examples of all possible configurations

Slide 60

Slide 60 text

Load tools
galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,127 Parsing the tool configuration ./config/tool_conf.xml.sample
galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,132 Loaded tool id: upload1, version: 1.1.4 into tool panel..
galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,133 Loaded tool id: ucsc_table_direct1, version: 1.0.0 into tool panel..
. . .
galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,276 Loaded tool id: qual_stats_boxplot, version: 1.0.0 into tool panel..
galaxy.tools.toolbox.base DEBUG 2015-07-06 06:46:14,278 Loaded tool id: vcf_to_maf_customtrack1, version: 1.0.0 into tool panel..
galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,278 Parsing the tool configuration ./config/shed_tool_conf.xml
galaxy.tools.toolbox.base INFO 2015-07-06 06:46:14,278 Parsing the tool configuration ./config/migrated_tools_conf.xml
galaxy.tools.search DEBUG 2015-07-06 06:46:14,304 Starting to build toolbox index.
galaxy.tools.search DEBUG 2015-07-06 06:46:14,890 Toolbox index finished.
* shed_tool_conf.xml is empty on the first run so only tools provided with Galaxy are loaded, but after tools are installed from the Tool Shed, they will load here

Slide 61

Slide 61 text

Associate display apps with datatypes galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,894 Loaded display application 'ucsc_bam' for datatype 'bam', inherit=False. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,895 Loaded display application 'ensembl_bam' for datatype 'bam', inherit=False. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igv_bam' for datatype 'bam', inherit=False. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igb_bam' for datatype 'bam', inherit=False. galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,912 Loaded display application 'igb_bed' for datatype 'bed', inherit=False. . . . galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'ensembl_gff' to datatype 'gtf' galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'igv_gff' to datatype 'gtf' galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,976 Adding inherited display application 'rviewer_interval' to datatype 'bedstrict'

Slide 62

Slide 62 text

Load implicit datatype converters
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,978 Loaded converter: CONVERTER_Bam_Bai_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,979 Loaded converter: CONVERTER_bam_to_bigwig_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,979 Loaded converter: CONVERTER_bed_to_gff_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,980 Loaded converter: CONVERTER_bed_to_bgzip_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,980 Loaded converter: CONVERTER_bed_to_tabix_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,981 Loaded converter: CONVERTER_bed_gff_or_vcf_to_bigwig_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,981 Loaded converter: CONVERTER_bed_to_fli_0
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,982 Loaded converter: CONVERTER_bedgraph_to_bigwig
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:14,982 Loaded converter: CONVERTER_len_to_linecount
. . .
galaxy.datatypes.registry DEBUG 2015-07-06 06:46:15,006 Loaded external metadata tool: __SET_METADATA__
galaxy.tools.imp_exp DEBUG 2015-07-06 06:46:15,008 Loaded history export tool: __EXPORT_HISTORY__
galaxy.tools.imp_exp DEBUG 2015-07-06 06:46:15,008 Loaded history import tool: __IMPORT_HISTORY__
* A few internal operations are defined as tools to allow them to run via Galaxy’s job system, and are loaded here as well

Slide 63

Slide 63 text

Load visualization plugins galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,009 VisualizationsRegistry, loaded plugin: charts galaxy.visualization.registry INFO 2015-07-06 06:46:15,010 Visualizations plugin disabled: Circster. Skipping... galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: graphview galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: phyloviz galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,011 VisualizationsRegistry, loaded plugin: scatterplot galaxy.visualization.registry INFO 2015-07-06 06:46:15,012 Visualizations plugin disabled: Sweepster. Skipping... galaxy.web.base.pluginframework INFO 2015-07-06 06:46:15,012 VisualizationsRegistry, loaded plugin: trackster

Slide 64

Slide 64 text

Initialize job handlers
galaxy.jobs.manager DEBUG 2015-07-06 06:46:15,022 Starting job handler
galaxy.jobs INFO 2015-07-06 06:46:15,022 Handler 'main' will load all configured runner plugins
galaxy.jobs.runners.state_handler_factory DEBUG 2015-07-06 06:46:15,024 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
galaxy.jobs.runners DEBUG 2015-07-06 06:46:15,024 Starting 5 LocalRunner workers
galaxy.jobs DEBUG 2015-07-06 06:46:15,025 Loaded job runner 'galaxy.jobs.runners.local:LocalJobRunner' as 'local'
galaxy.jobs.runners.state_handler_factory DEBUG 2015-07-06 06:46:15,049 Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
galaxy.jobs.runners DEBUG 2015-07-06 06:46:15,049 Starting 3 LWRRunner workers
galaxy.jobs.runners.lwr_client.manager INFO 2015-07-06 06:46:15,049 Setting LWR client class to standard, non-caching variant.
galaxy.jobs DEBUG 2015-07-06 06:46:15,049 Loaded job runner 'galaxy.jobs.runners.lwr:LwrJobRunner' as 'lwr'
galaxy.jobs DEBUG 2015-07-06 06:46:15,050 Legacy destination with id 'local:///', url 'local:///' converted, got params:
galaxy.jobs.handler DEBUG 2015-07-06 06:46:15,050 Loaded job runners plugins: lwr:local
galaxy.jobs.handler INFO 2015-07-06 06:46:15,050 job handler stop queue started
galaxy.jobs.handler INFO 2015-07-06 06:46:15,057 job handler queue started
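"Starting 5 LocalRunner workers" suggests a pool of threads pulling jobs from a shared queue. The sketch below shows that general pattern with the standard library; the class name, method names, and result bookkeeping are assumptions for illustration and not Galaxy's runner code.

```python
import queue
import threading


class LocalRunnerSketch:
    """Hypothetical runner: N worker threads consuming a shared job queue."""

    def __init__(self, nworkers):
        self.work_queue = queue.Queue()
        self.results = {}
        self._lock = threading.Lock()
        self.workers = [
            threading.Thread(target=self._run_next, daemon=True)
            for _ in range(nworkers)
        ]
        for worker in self.workers:
            worker.start()

    def _run_next(self):
        while True:
            item = self.work_queue.get()
            if item is None:
                return  # sentinel: this worker should stop
            job_id, func = item
            try:
                outcome = ("ok", func())
            except Exception as exc:
                # A failing job must not kill the worker thread.
                outcome = ("error", str(exc))
            with self._lock:
                self.results[job_id] = outcome

    def queue_job(self, job_id, func):
        self.work_queue.put((job_id, func))

    def shutdown(self):
        # One sentinel per worker, queued after all pending jobs.
        for _ in self.workers:
            self.work_queue.put(None)
        for worker in self.workers:
            worker.join()


runner = LocalRunnerSketch(nworkers=5)
for n in range(3):
    runner.queue_job(n, lambda n=n: n * n)
runner.queue_job("bad", lambda: 1 / 0)
runner.shutdown()
```

Because the sentinels are queued last, every submitted job is picked up before any worker exits.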

Slide 65

Slide 65 text

Initialize web controllers galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,300 Enabling 'admin_toolshed' controller, class: AdminGalaxy galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,300 Enabling 'admin_toolshed' controller, class: AdminToolshed galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,302 Enabling 'biostar' controller, class: BiostarController galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,389 Enabling 'cloudlaunch' controller, class: CloudController galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,389 Enabling 'error' controller, class: Error galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,407 Enabling 'forms' controller, class: Forms galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,414 Enabling 'history' controller, class: HistoryController galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,432 Enabling 'library' controller, class: Library galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,433 Enabling 'library_admin' controller, class: LibraryAdmin galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,433 Enabling 'library_common' controller, class: LibraryCommon . . . galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,564 Enabling 'tool_data' API controller, class: ToolData galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,567 Enabling 'tools' API controller, class: ToolsController galaxy.web.framework.base DEBUG 2015-07-06 06:46:15,569 Enabling 'workflows' API controller, class: WorkflowsAPIController

Slide 66

Slide 66 text

Load WSGI middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,713 Enabling 'httpexceptions' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,714 Enabling 'recursive' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,749 Enabling 'eval exceptions' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'trans logger' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'x-forwarded-host' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,750 Enabling 'Request ID' middleware galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/charts/static, ./config/plugins/visualizations/charts/static galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/graphview/static, ./config/plugins/visualizations/graphview/static galaxy.webapps.galaxy.buildapp DEBUG 2015-07-06 06:46:15,752 added url, path to static middleware: /plugins/visualizations/scatterplot/static, ./config/plugins/visualizations/scatterplot/static galaxy.queue_worker INFO 2015-07-06 06:46:15,753 Binding and starting galaxy control worker for main Starting server in PID 8192. serving on http://127.0.0.1:8080

Slide 67

Slide 67 text

serving on http://127.0.0.1:8080

Slide 68

Slide 68 text

5. A Galaxy request

Slide 69

Slide 69 text

Beginning of request
- TCP connection from client on port 80
- WSGI server is responsible for picking up the connection, parsing HTTP headers, and reformatting them into a dictionary according to the WSGI spec; this dict is called “environ”
- In a default Galaxy install this is currently Paste#http

Slide 70

Slide 70 text

Middleware
- The WSGI interface is based around function calls: def app( environ, start_response ): …
- Middleware act as filters: they modify the environ and then pass it through to the next webapp
- Galaxy uses several middleware components (galaxy.webapps.galaxy.buildapp#wrap_in_middleware): error handling, logging, proxy hostname, debug, static, ...
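The filter idea can be sketched with a plain WSGI callable and one wrapper around it. The middleware class below is a hypothetical stand-in for a proxy-hostname filter, not Galaxy's own component; `call_wsgi` is a small test harness built on the standard-library `wsgiref.util.setup_testing_defaults`.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Innermost application: report the host it believes it is serving.
    body = ("host=%s" % environ["HTTP_HOST"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class ForwardedHostMiddleware:
    """Rewrite HTTP_HOST from X-Forwarded-Host, then delegate to the wrapped app."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_HOST")
        if forwarded:
            environ["HTTP_HOST"] = forwarded.split(",")[0].strip()
        return self.wrapped(environ, start_response)


def call_wsgi(application, **extra):
    """Invoke a WSGI callable with a synthetic environ and collect the response."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(extra)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body


status, body = call_wsgi(
    ForwardedHostMiddleware(app), HTTP_X_FORWARDED_HOST="usegalaxy.org"
)
```

Since the wrapper has the same call signature as the app it wraps, any number of such filters can be stacked in the same way.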

Slide 71

Slide 71 text

WebApplication (galaxy.web.framework.base#WebApplication)
- Galaxy’s custom web framework, shares a lot of ideas with Pylons
- __call__ method supports the WSGI spec
- Takes environ and creates a wrapper object, GalaxyWebTransaction: this is the ubiquitous trans!
- Parses path_info from the environment to determine which controller method to call. The mapping is implemented by a library called Routes, which maps URL patterns to controllers and actions

Slide 72

Slide 72 text

Routes (galaxy.webapps.galaxy.buildapp#paste_app_factory)
webapp.add_route('/datasets/:dataset_id/display/{filename:.+?}', controller='dataset', action='display', dataset_id=None, filename=None)
A URL like /datasets/278043/display matches the route pattern, so:
- look up the controller named “dataset”
- look for a method named “display” that is exposed
- call it, passing dataset_id and filename as keyword args
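The lookup steps can be sketched without the Routes library by using a hand-written regular expression in place of the route pattern. Everything below (the regex, the `exposed` set, the `dispatch` function) is a hypothetical approximation of the behaviour described on the slide, not how Routes or Galaxy implement it.

```python
import re


class DatasetController:
    # Only methods listed here may be reached from a URL.
    exposed = {"display"}

    def display(self, trans, dataset_id=None, filename=None):
        return "dataset=%s filename=%s" % (dataset_id, filename)


CONTROLLERS = {"dataset": DatasetController()}

# Hand-written stand-in for '/datasets/:dataset_id/display/{filename:.+?}'.
ROUTES = [
    (
        re.compile(r"^/datasets/(?P<dataset_id>[^/]+)/display(?:/(?P<filename>.+))?$"),
        {"controller": "dataset", "action": "display",
         "dataset_id": None, "filename": None},
    ),
]


def dispatch(path_info, trans=None):
    for pattern, defaults in ROUTES:
        match = pattern.match(path_info)
        if match is None:
            continue
        # Start from the route defaults, then overlay what the URL supplied.
        kwargs = dict(defaults)
        kwargs.update({k: v for k, v in match.groupdict().items() if v is not None})
        controller = CONTROLLERS[kwargs.pop("controller")]
        action = kwargs.pop("action")
        if action not in controller.exposed:
            raise LookupError("action not exposed: %s" % action)
        return getattr(controller, action)(trans, **kwargs)
    raise LookupError("no route matches %s" % path_info)


short = dispatch("/datasets/278043/display")
full = dispatch("/datasets/278043/display/index.html")
```

The `exposed` check is what keeps helper methods on a controller from becoming reachable just because a URL happens to name them.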

Slide 73

Slide 73 text

Controllers (e.g. galaxy.webapps.galaxy.controllers)
Provide endpoints for WebApplication to call
Controller methods accept trans as their first argument, plus additional keyword arguments
By default, any .py file in the webapp's controllers directory is loaded as a controller
Can return a file or any iterable, which will be streamed to the browser
Often a helper like self.fill_template is used

Slide 74

Slide 74 text

API Controllers (e.g. galaxy.webapps.galaxy.controllers.api)
Similar to regular controllers, handle routes starting with “/api”
Should only return JSON
All new functionality should be implemented using API controllers; eventually most or all non-API controllers will be eliminated
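"Should only return JSON" is commonly enforced with a decorator that serializes whatever the method returns. The sketch below illustrates that approach; the decorator name `expose_api_sketch`, the `FakeTrans` object, and the error payload shape are all hypothetical and do not reproduce Galaxy's own decorators.

```python
import json
from functools import wraps


def expose_api_sketch(func):
    """Mark a controller method as exposed and serialize its result as JSON."""

    @wraps(func)
    def wrapper(self, trans, **kwd):
        try:
            payload = func(self, trans, **kwd)
            trans.status = 200
        except KeyError as exc:
            # Translate a missing resource into a JSON error body.
            trans.status = 404
            payload = {"err_msg": "not found: %s" % exc.args[0]}
        trans.content_type = "application/json"
        return json.dumps(payload, sort_keys=True)

    wrapper.exposed = True
    return wrapper


class FakeTrans:
    """Minimal stand-in for the per-request transaction object."""
    status = None
    content_type = None


class HistoriesSketchController:
    histories = {"abc": {"id": "abc", "name": "Unnamed history"}}

    @expose_api_sketch
    def show(self, trans, id=None, **kwd):
        return self.histories[id]


controller = HistoriesSketchController()
trans = FakeTrans()
found = controller.show(trans, id="abc")
found_status = trans.status
missing = controller.show(trans, id="zzz")
missing_status = trans.status
```

Putting serialization in one place means individual methods return plain dictionaries and cannot accidentally emit HTML.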

Slide 75

Slide 75 text

WSGI Server WebApplication Middleware HTTP / TCP wsgi environ dictionary wsgi environ dictionary Controller GalaxyWebTransaction object Managers, models, ...

Slide 76

Slide 76 text

WSGI Server WebApplication Middleware HTTP / TCP wsgi environ dictionary wsgi environ dictionary Controller GalaxyWebTransaction object Managers, models, ... Iterable, Nested iterable, Callable, String, File object, ... Byte iterable or file Stream of bytes
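The return path in this diagram, where a controller may hand back a string, a file, a callable, or nested iterables and the server ultimately needs bytes, can be sketched as one recursive generator. The function name and the handling order are assumptions for illustration.

```python
import io


def to_byte_iterable(result, encoding="utf-8", chunk_size=4096):
    """Flatten a controller return value into a stream of byte chunks."""
    if callable(result):
        # A callable is invoked and its return value is processed instead.
        result = result()
    if isinstance(result, bytes):
        yield result
    elif isinstance(result, str):
        yield result.encode(encoding)
    elif hasattr(result, "read"):
        # File-like object: stream it in fixed-size chunks.
        while True:
            chunk = result.read(chunk_size)
            if not chunk:
                break
            yield chunk if isinstance(chunk, bytes) else chunk.encode(encoding)
    else:
        # Any other iterable: flatten each item recursively.
        for item in result:
            yield from to_byte_iterable(item, encoding, chunk_size)


mixed = ["a", [b"b", "c"], io.BytesIO(b"d"), lambda: "e"]
flattened = b"".join(to_byte_iterable(mixed))
```

Because the function is a generator, a large file is streamed chunk by chunk rather than being read into memory first.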

Slide 77

Slide 77 text

6. A production Galaxy

Slide 78

Slide 78 text

By default: SQLite, Paste#http, single process, single host, local jobs
Production: PostgreSQL, uWSGI/nginx, multiple processes, multiple hosts, cluster jobs
usegalaxy.org/production

Slide 79

Slide 79 text

Paste#http alternatives
uWSGI: master process/log control, scales, less painful restarts, infinite configurability
lib/galaxy/main.py: Galaxy without the web stack, useful for job handlers

Slide 80

Slide 80 text

Cluster support

Slide 81

Slide 81 text

dedicated cluster Composition of a production Galaxy (usegalaxy.org) Universe View VMWare web-01 web-02 db-01 cluster-01 cluster-02 ... cluster-16 slurm rabbitmq Stampede pulsar Corral (DDN via NFS) Rodeo (openstack) slurm instance instance instance instance

Slide 82

Slide 82 text

web-01 Composition of a production Galaxy (usegalaxy.org) Galaxy View db-01 uWSGI (8p, 8t) Galaxy main.py (3p) nginx PostgreSQL web-02 uWSGI (8p, 8t) Galaxy main.py (3p) nginx Paste#http installer ProFTPD ProFTPD supervisord supervisord DNS round-robin

Slide 83

Slide 83 text

Composition of a production Galaxy (usegalaxy.org) Solar System View web-{01,02} Galaxy server processes Local filesystem Immutable configs Mutable configs Galaxy web code Corral filesystem Galaxy handler code Logs Shed-installed tools Tool dependencies * Mutable configs distributed from web-01 to web-02 with Ansible Datasets

Slide 84

Slide 84 text

Q&A