• Dropout: B.S. Psychology, ISU 1995 • Dropout: B.A. Philosophy, ISU 1995 • Part-time, stipend-funded hobbyist 2006-2010 • University of Nebraska, Omaha • University of Nebraska Medical Center • Dropout: B.S. Bioinformatics, UNO 2010 • http://www.bioperl.org/wiki/Jay_Hannah • Self-taught DB / web developer since 1995. • http://github.com/jhannah Tuesday, June 11, 13
(A, C, G, or T) • Fully extended, the DNA from a single cell would have a total length of almost 6 feet. • All the DNA in your cells could reach the moon ... 6,000 times! • 24 distinct chromosomes • Estimated 20,000-25,000 genes http://en.wikipedia.org/wiki/Human_genome http://www.rothamsted.ac.uk/notebook/courses/guide/dnast.htm
different between humans and mice. Only 1% different from chimpanzees. [1] • “We share half our genes [DNA] with the banana.” [2] • DNA is the blueprint of ALL life. You grew from a single cell to an adult human. What made you you? Why aren’t you me? Or a chimp? Or a banana tree, a whale shark, plankton, a clover or a giant redwood? • Answer: proteins. 1. Mural, R.J., et al., Science, v. 296, May 31, 2002, p. 1661. 2. May, R., Quoted in Coglan & Boyce, New Scientist 167 (July
      N  Asparagine
Asp   D  Aspartic acid
Cys   C  Cysteine
Gln   Q  Glutamine
Glu   E  Glutamic acid
Gly   G  Glycine
His   H  Histidine
Ile   I  Isoleucine
Leu   L  Leucine
Lys   K  Lysine
Met   M  Methionine
Phe   F  Phenylalanine
Pro   P  Proline
Ser   S  Serine
Thr   T  Threonine
Trp   W  Tryptophan
Tyr   Y  Tyrosine
Val   V  Valine
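The code table above is just a lookup, so it drops straight into a Perl hash. A minimal sketch, assuming the 20 standard residues only; `spell_out` and the 'Xaa' placeholder for unknown letters are hypothetical names chosen for illustration, not part of BioPerl.

```perl
use strict;
use warnings;

# One-letter => three-letter amino acid codes (the 20 standard residues).
my %three_letter = (
    A => 'Ala', R => 'Arg', N => 'Asn', D => 'Asp', C => 'Cys',
    Q => 'Gln', E => 'Glu', G => 'Gly', H => 'His', I => 'Ile',
    L => 'Leu', K => 'Lys', M => 'Met', F => 'Phe', P => 'Pro',
    S => 'Ser', T => 'Thr', W => 'Trp', Y => 'Tyr', V => 'Val',
);

# Hypothetical helper: spell out a one-letter peptide string.
# Letters outside the table become 'Xaa' (an assumption of this sketch).
sub spell_out {
    my ($peptide) = @_;
    my @codes = map { $three_letter{ uc $_ } // 'Xaa' } split //, $peptide;
    return join '-', @codes;
}

print spell_out('MKV'), "\n";    # prints "Met-Lys-Val"
```

Going the other way (three-letter to one-letter) is just `reverse %three_letter`, since the codes are unique in both directions.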
But these text manipulation examples are just simple map-based transforms • And the example isn't even terribly relevant. Is this even a coding sequence? • LOL dashed params
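To make "simple map-based transforms" concrete, here is a sketch in plain Perl with no BioPerl involved. It assumes the standard genetic code and reading frame 1; `reverse_complement`, `translate`, and `%codon_table` are names made up for this illustration, and mapping unknown codons to 'X' is a choice of the sketch.

```perl
use strict;
use warnings;

# Reverse complement: reverse the string, then swap bases with tr///,
# which is itself a fixed character-to-character map.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;    # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Build the standard genetic code as a codon => amino acid hash.
# Bases run in T, C, A, G order to match the 64-letter string below.
my @bases = qw(T C A G);
my @aa = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = shift @aa;
        }
    }
}

# Translate reading frame 1. Codons not in the table (e.g. containing N)
# become 'X'; a trailing partial codon is ignored.
sub translate {
    my ($dna) = @_;
    my @codons = uc($dna) =~ /(...)/g;
    return join '', map { $codon_table{$_} // 'X' } @codons;
}

print reverse_complement('ATGGCC'), "\n";    # prints "GGCCAT"
print translate('ATGGCCTAA'), "\n";          # prints "MA*"
```

Both transforms are pure lookups. Neither can tell you whether the input is actually a coding sequence, which is exactly the objection raised above.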
entrezgene largefasta seqxml ace excel lasergene strider agave exp locuslink swiss alf fasta mbsout swissdriver asciitree fastq metafasta tab bsml flybase_chadoxml table bsml_sax game nexml tigr chadoxml gbdriver phd tigrxml chaos gbxml pir tinyseq chaosxml gcg pln ztr ctf genbank qual Seq::SeqIO:: Look at all these formats!! Guess what? All the code for dealing with all these formats is completely non-standardized, and most of it was written by a graduate student who has fallen off the face of the planet.
that write_seq() method... https://metacpan.org/source/CJFIELDS/BioPerl-1.6.901/Bio/SeqIO.pm#L519 Well... let's look at the constructor... https://metacpan.org/source/CJFIELDS/BioPerl-1.6.901/Bio/SeqIO.pm#L350
=> $gb_file);
my $seq_object = $seqio_object->next_seq;

# Walk every feature of the sequence and dump its tags and values
for my $feat_object ($seq_object->get_SeqFeatures) {
    say "primary tag: ", $feat_object->primary_tag;
    for my $tag ($feat_object->get_all_tags) {
        say " tag: $tag";
        for my $value ($feat_object->get_tag_values($tag)) {
            say " value: $value";
        }
    }
}
$gb_file);
my $seq_object = $seqio_object->next_seq;

for my $feat_object ($seq_object->get_SeqFeatures) {
    if ($feat_object->primary_tag eq "CDS") {
        say $feat_object->spliced_seq->seq;
        # e.g. 'ATTATTTTCGCTCGCTTCTCGCGCTTTTGCGT...'
        if ($feat_object->has_tag('gene')) {
            for my $val ($feat_object->get_tag_values('gene')) {
                say "gene: $val";
                # e.g. 'NDP', from a line like '/gene="NDP"'
            }
        }
    }
}
BioPerl to the rescue!
of code. It just needs some love. Okay, a lot of love. If you have the skills to contribute, please join in and help with the cleanup.