Instrument Data
IV. Types of Genome Models
V. Processing Profiles
VI. Parameters vs. Inputs
VII. Builds
VIII. Genome Models
IX. Other Documentation Resources
preconfigured set of operations and transformations that run on a given set of inputs.
• Each model is designed to answer a specific set of questions about your data.
• Model inputs are often Instrument Data.
know?
• What do we already know?
• As an example, we will find a Sample from a patient common name and get instrument data from it.
• What do we mean when we say Sample?
  • A Sample is an entity that represents a collection of cells from a single extraction event of a tissue (e.g. AML31 tumor, LUC7 tumor, PRC5 normal).
• What kinds of things aren’t samples?
  • Individuals
  • Libraries
Genome Modeling System?
• If you know any distinguishing properties of the sample, such as patient name, you can query for it at the command line.

    genome sample list patient_common_name~'LUC7%'
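The filter `patient_common_name~'LUC7%'` in the command above reads like a SQL-style LIKE match, with `%` standing for any run of characters. The Python sketch below illustrates that matching rule only. It rests on the assumption that `~` behaves this way; it is not the Genome Modeling System's implementation, and `like_match` and the list of names are hypothetical.

```python
import re


def like_match(pattern: str, value: str) -> bool:
    """Return True if value matches a LIKE-style pattern.

    Assumption: '%' matches any run of characters, including none.
    Every other character must match literally.
    """
    parts = [".*" if ch == "%" else re.escape(ch) for ch in pattern]
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Hypothetical patient common names, for illustration only.
names = ["LUC7", "LUC70", "AML31", "PRC5"]
matches = [n for n in names if like_match("LUC7%", n)]
print(matches)  # ['LUC7', 'LUC70']
```

Under this reading, `LUC7%` selects every name that starts with `LUC7`, which is why a trailing `%` is useful when you only know part of an identifier.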
Instrument data is an entity in the system that wraps data files containing sequence information (FASTQs, BAMs, etc.) with metadata and functionality.
  • Read length
  • Read count
  • Type
  • Source
  • Etc.
• Instrument data objects are the interface through which other system entities should access this kind of data.
is explicitly mentioned in the command. This means that only Solexa instrument data will be returned.
• Additionally, notice that you can use --show= to specify which fields will be displayed in your terminal.

    --show='id,flow_cell_id,lane,sample_name,read_length,is_paired_end'
you probably want to know what you can do with it.
• You can define a model and assign your data to that model. This will allow you to process, transform, and interrogate your data in a predefined way.
• Each model is designed to answer different questions that you might have about your data.
model picked out (RnaSeq, RefAlign, etc.), you will need a processing profile.
• What is a processing profile?
  • A processing profile is a container for parameters, settings, and configuration that a model needs in order to run.
  • A model has a predefined series of operations to run, and the processing profile tells the model how to run each step.
  • This can take the form of flags that are passed directly through to underlying programs.
  • It can also take the form of more complex entities (strategy strings) that modify the workflow itself.
a model. Meanwhile a model also takes inputs (instrument data for example). What’s the difference?
• Parameters dictate the manner in which data is processed. They are fundamental to the identity of a model.
  • Changing a model's parameters means, by definition, creating a new model.
• Inputs represent the data currently available to that model. As more data comes in, it can be added to an existing model to update it.
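The distinction above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Genome Modeling System's actual classes: `ProcessingProfile`, `Model`, `add_input`, `with_parameters`, and all field values are hypothetical names chosen for the example. The profile is frozen (parameters are part of the model's identity), while the input list is free to grow.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ProcessingProfile:
    """Parameters: fixed for the life of a model (hypothetical fields)."""
    aligner: str
    aligner_params: str


@dataclass
class Model:
    """Pairs one fixed profile with a list of inputs that can grow."""
    name: str
    profile: ProcessingProfile
    instrument_data: list = field(default_factory=list)

    def add_input(self, data_id: str) -> None:
        # New data updates the existing model in place.
        self.instrument_data.append(data_id)

    def with_parameters(self, name: str, **changes) -> "Model":
        # Changed parameters yield a new model; the original is untouched.
        new_profile = replace(self.profile, **changes)
        return Model(name, new_profile, list(self.instrument_data))


profile = ProcessingProfile(aligner="example-aligner", aligner_params="--threads 4")
model = Model("example-model", profile)
model.add_input("lane-1")
model.add_input("lane-2")  # same model, more data

variant = model.with_parameters("example-model-v2", aligner_params="--threads 8")
print(model.profile.aligner_params, variant.profile.aligner_params)
```

Adding `lane-2` leaves `model` the same model, whereas changing `aligner_params` produces a separate `variant` with its own name and profile.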
processing profile, now what?
• You want to ‘run’ the model to get an answer out.
• Each run of a model is called a build, and a model can have many builds.
• A build contains the output/answers from your model given the inputs at the time it was built.
• Possible Metaphors
  • Class and Instance
  • Recipe and Cake
  • Blueprint and Building
  • Assembly line and Car
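The "Class and Instance" metaphor can be made concrete with a small Python sketch. It assumes only what the bullets above say, that a build reflects the inputs at the time it was built; the `Model`, `Build`, and `start_build` names and the lane identifiers are hypothetical and do not mirror the real system's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Build:
    """One run of a model: the inputs as they were at build time."""
    model_name: str
    number: int
    inputs: tuple


@dataclass
class Model:
    name: str
    inputs: list = field(default_factory=list)
    builds: list = field(default_factory=list)

    def start_build(self) -> Build:
        # Copy the inputs so later additions do not change earlier builds.
        build = Build(self.name, len(self.builds) + 1, tuple(self.inputs))
        self.builds.append(build)
        return build


model = Model("example-model")
model.inputs.append("lane-1")
first = model.start_build()

model.inputs.append("lane-2")  # more data arrives
second = model.start_build()

print(first.inputs)   # ('lane-1',)
print(second.inputs)  # ('lane-1', 'lane-2')
```

One model ends up with two builds, and the first build still records only the data that existed when it ran.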
of the term Genome Model http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/
  • Ex: Human Genome Model - build 36
• What exactly is a Genome Model?
  • Genome Models are an idea conceived to organize common analysis tool chains and software around a central concept. In practice, they usually take the form of a processing profile, a series of processes, and one or more inputs centered on answering a specific set of questions about a genome.