Genome Model Basics

gscjim
April 28, 2014

An introduction to basic command line operations for the Genome Modeling System


Transcript

  1. Where Do I Start?
     • Just type genome and hit enter! (then build up commands from there)
  2. Contents
     I. What is a genome model?
     II. Samples
     III. Instrument Data
     IV. Types of Genome Models
     V. Processing Profiles
     VI. Parameters vs. Inputs
     VII. Builds
     VIII. Genome Models
     IX. Other Documentation Resources
  3. What is a Genome Model?
     • A model is a preconfigured set of operations and transformations that run on a given set of inputs.
     • Each model is designed to answer a specific set of questions about your data.
     • Model inputs are often Instrument Data.
  4. Samples
     How do we get instrument data from what we know?
     • What do we already know?
       • As an example, we will find a Sample from a patient common name and get instrument data from that.
     • What do we mean when we say Sample?
       • A Sample is an entity that we use to represent a collection of cells from a single extraction event of a tissue (e.g. AML31 tumor, LUC7 tumor, PRC5 normal).
     • What kinds of things aren’t samples?
       • Individuals
       • Libraries
  5. Samples
     • How do I find a sample in the Genome Modeling System?
     • If you know any distinguishing properties of the sample, such as patient name, you can query for it at the command line:
       genome sample list patient_common_name~'LUC7%'
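The query on slide 5 can also be assembled from a script. What follows is a minimal sketch that assumes only the command shown on the slide (`genome sample list` with a `property~pattern` filter). The helper name `sample_list_command` is hypothetical, and the command is printed rather than executed.

```python
import shlex


def sample_list_command(prop: str, pattern: str) -> list[str]:
    """Build the argument list for `genome sample list <prop>~<pattern>`.

    Hypothetical helper: it only assembles the command shown on the slide.
    """
    # The slide's filter joins a property name and a pattern with `~`.
    # The trailing `%` in the slide's example presumably acts as a wildcard.
    return ["genome", "sample", "list", f"{prop}~{pattern}"]


argv = sample_list_command("patient_common_name", "LUC7%")
# shlex.join quotes the filter so a shell would pass it through unchanged.
print(shlex.join(argv))
```

Keeping the command as an argument list avoids hand-written shell quoting; the list can be handed to `subprocess.run` as-is on a machine where `genome` is installed.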
  6. Instrument Data
     What do we mean by Instrument Data?
     • Instrument data is an entity in the system that wraps data files containing sequence information (FASTQs, BAMs, etc.) with metadata and functionality:
       • Read length
       • Read count
       • Type
       • Source
       • Etc.
     • Instrument data objects are the interface through which other system entities should access this kind of data.
  7. Instrument Data
     • How do I find instrument data?
     • You can use the command line to find data for a sample:
       genome instrument-data list solexa sample_name='H_JG-1193-S.9723'
  8. Instrument Data
     • Note in the previous example that solexa is explicitly mentioned in the command. This means that only Solexa instrument data will be returned.
     • Additionally, notice that you can use --show= to specify which fields will be displayed in your terminal:
       --show='id,flow_cell_id,lane,sample_name,read_length,is_paired_end'
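The two pieces from slides 7 and 8 (a `sample_name=` filter and a `--show=` field list) can be combined in one invocation. The sketch below assumes they may appear together in that order. The helper `instrument_data_command` and the sample name `MY_SAMPLE_NAME` are placeholders, not names from the system, and the command is printed rather than executed.

```python
import shlex

# Field names copied from the slide's --show example.
FIELDS = ["id", "flow_cell_id", "lane", "sample_name", "read_length", "is_paired_end"]


def instrument_data_command(sample_name: str, fields: list[str]) -> list[str]:
    """Assemble `genome instrument-data list solexa` with a filter and --show.

    Hypothetical helper; the argument order is an assumption.
    """
    return [
        "genome", "instrument-data", "list", "solexa",
        f"sample_name={sample_name}",
        # --show takes a single comma-separated value with no spaces.
        "--show=" + ",".join(fields),
    ]


argv = instrument_data_command("MY_SAMPLE_NAME", FIELDS)  # placeholder sample name
print(shlex.join(argv))
```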
  9. Instrument Data
     • Now that you have some instrument data, you probably want to know what you can do with it.
     • You can define a model and assign your data to that model. This will allow you to process, transform, and interrogate your data in a predefined way.
     • Each model is designed to answer different questions that you might have about your data.
  10. Types of Genome Models
     • Reference Alignment
     • RNASeq
     • Somatic Variation
     • Somatic Validation
     • Genotype Microarray
     • Phenotype Correlation
     • ClinSeq
     • DeNovo Assembly
  11. Processing Profiles
     Now that you have data (inputs) and a model picked out (RNASeq, RefAlign, etc.), you will need a processing profile.
     • What is a processing profile?
       • A processing profile is a container for parameters, settings, and configuration that a model needs in order to run.
       • A model has a predefined series of operations to run, and the processing profile tells the model how to run each step.
       • This can take the form of flags that are passed directly through to underlying programs.
       • This can also take the form of more complex entities (strategy strings) that modify the workflow itself.
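To make the "container for parameters" idea on slide 11 concrete, here is a small illustrative sketch. It is not the Genome Modeling System's implementation: the `ProcessingProfile` class, the step name `align`, and the settings are all made up for the example. It shows only the simplest case described on the slide, where settings are turned into flags for an underlying program.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessingProfile:
    """Hypothetical stand-in: a named bundle of per-step settings."""
    name: str
    params: dict = field(default_factory=dict)

    def flags_for(self, step: str) -> list[str]:
        """Render one step's settings as command-line flags."""
        flags = []
        # Sort keys so the rendered flags are stable from run to run.
        for key, value in sorted(self.params.get(step, {}).items()):
            flags.extend([f"--{key}", str(value)])
        return flags


profile = ProcessingProfile(
    name="example-refalign-profile",                        # made-up name
    params={"align": {"threads": 4, "seed-length": 32}},   # made-up settings
)
print(profile.flags_for("align"))  # ['--seed-length', '32', '--threads', '4']
```

Strategy strings, which change the workflow itself, are not modelled here.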
  12. Params vs Inputs
     • A processing profile supplies parameters to a model. Meanwhile, a model also takes inputs (instrument data, for example). What’s the difference?
     • Parameters dictate the manner in which data is processed. They are fundamental to the identity of a model.
       • Changing the parameters to a model suggests the need to create a new model by definition.
     • Inputs are representative of the current available data used in that model. As more data comes in, it can be added to an existing model to update it.
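The distinction on slide 12 can be sketched in a few lines. This is an illustration under assumed names (`Model`, `add_input`, `with_params`, and all the string values are hypothetical), not the system's actual classes: adding data mutates the existing model, while changing parameters produces a separate model.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical model: parameters are fixed, inputs can grow."""
    name: str
    params: tuple  # immutable on purpose: parameters define the model
    inputs: list = field(default_factory=list)

    def add_input(self, instrument_data_id: str) -> None:
        """New data arriving updates the existing model in place."""
        self.inputs.append(instrument_data_id)

    def with_params(self, name: str, params: tuple) -> "Model":
        """Different parameters mean a different model, so return a new one."""
        return Model(name=name, params=params, inputs=list(self.inputs))


model = Model("example-model", params=(("aligner", "example-aligner"),))
model.add_input("instrument-data-1")
model.add_input("instrument-data-2")  # same model, more data

variant = model.with_params("example-model-v2", (("aligner", "other-aligner"),))
print(model is variant, model.inputs == variant.inputs)  # False True
```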
  13. Builds
     You’ve set up a model with inputs and a processing profile. Now what?
     • You want to ‘run’ the model to get an answer out.
     • Each run of a model is called a build, and a model can have many builds.
     • A build contains the output/answers from your model given the inputs at the time it was built.
     • Possible Metaphors
       • Class and Instance
       • Recipe and Cake
       • Blueprint and Building
       • Assembly line and Car
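The model/build relationship on slide 13 can be illustrated as follows. The class and method names (`Model`, `Build`, `start_build`) are hypothetical and chosen for the example. The point shown is that each build records the inputs as they were when it ran, so one model accumulates many builds over time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Build:
    """One run of a model: records the inputs as they were at build time."""
    model_name: str
    inputs: tuple


@dataclass
class Model:
    """Hypothetical model that keeps a history of its builds."""
    name: str
    inputs: list = field(default_factory=list)
    builds: list = field(default_factory=list)

    def start_build(self) -> Build:
        # Copy the inputs into a tuple so later additions to the model
        # do not change what this build recorded.
        build = Build(self.name, tuple(self.inputs))
        self.builds.append(build)
        return build


model = Model("example-model", inputs=["data-1"])
first = model.start_build()
model.inputs.append("data-2")  # more data arrives after the first build
second = model.start_build()
print(len(model.builds), first.inputs, second.inputs)
```

The first build still reports only `data-1`, while the second reflects both inputs.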
  14. Genome Models
     • What is the origin of our use of the term Genome Model?
       http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/
       • Ex: Human Genome Model - build 36
     • What exactly is a Genome Model?
       • Genome Models are an idea conceived to organize common analysis tool chains and software around a central concept. In practice, they usually take the form of a processing profile, a series of processes, and one or more inputs centered on answering a specific set of questions about a genome.