
Log 05/03

Liang Bo Wang
May 03, 2013


Transcript

  1. Work Log 05/03: miRNA-Seq adapter trimming, TCGA Project Result

    Bioinformatics and Biostatistics Core, NTU Center of Genomic Medicine
  2. Illustration of different constructs and the reads produced

    •  In = insert
    •  R = single-end read
    •  R1, R2 = paired-end reads
    A)  In.length ≥ R.length
    B)  In.length < R.length
    C)  In.length ≥ 2 × R.length
    D)  R.length < In.length < 2 × R.length
    E)  In.length < R.length
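The length relations above decide whether a read runs past the insert into the adapter, and whether paired-end mates overlap. A minimal sketch of that arithmetic follows. The function names are hypothetical, and reading cases A and B as single-end and C to E as paired-end is an assumption, not something the slide states. The 101-base read length is what the first run's report implies (804805572 bases over 7968372 reads); the 22-base insert is an assumed typical miRNA length.

```python
def adapter_bases_read(insert_len: int, read_len: int) -> int:
    """Bases of a read that run past the insert into the 3' adapter."""
    return max(0, read_len - insert_len)


def mates_overlap(insert_len: int, read_len: int) -> int:
    """Bases of the insert covered by both mates of a paired-end read."""
    covered = min(insert_len, read_len)  # a mate cannot cover more than the insert
    return max(0, 2 * covered - insert_len)


if __name__ == "__main__":
    read_len = 101    # implied by the first run's report
    mirna_insert = 22  # assumed typical miRNA length, for illustration only
    print(adapter_bases_read(mirna_insert, read_len))  # In.length < R.length
    print(mates_overlap(250, read_len))                # In.length >= 2 x R.length
    print(mates_overlap(150, read_len))                # R.length < In.length < 2 x R.length
```

With short miRNA inserts, most of each read is adapter, which is why trimming matters so much for this data type.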
  3. Adapter Trimming – using Cutadapt

    •  MIT licensed, actively maintained
    •  Handles multiple adapters
    •  Allows matching errors and quality checks
    •  Detects adapters at different positions
    •  Fast, with a detailed report
    •  Python with an embedded C module
    •  Processes paired-end reads together (in development)
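To illustrate what "allows matching errors" and "detection at different positions" mean for a 3' adapter, here is a simplified sketch. It is not Cutadapt's implementation: Cutadapt aligns the adapter and also tolerates insertions and deletions, whereas this version counts mismatches only. The function names and the `min_overlap` default are assumptions made for the example.

```python
def find_adapter_start(read: str, adapter: str,
                       max_error_rate: float = 0.1,
                       min_overlap: int = 3) -> int:
    """Return the leftmost read position where the adapter matches.

    A partial adapter at the 3' end of the read also counts, as long as
    at least `min_overlap` bases overlap. Returns -1 if nothing matches.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        # Error budget scales with the number of bases actually compared.
        if mismatches <= int(max_error_rate * overlap):
            return start
    return -1


def trim_3prime_adapter(read: str, adapter: str, **kwargs) -> str:
    """Cut the read at the adapter start; return it unchanged if no match."""
    pos = find_adapter_start(read, adapter, **kwargs)
    return read if pos < 0 else read[:pos]


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"  # first 21 bases of the adapter in this log
    insert = "A" * 20                  # made-up insert, for illustration only
    print(trim_3prime_adapter(insert + adapter, adapter))
    print(trim_3prime_adapter(insert + "TGGAA", adapter))
```

Scaling the error budget with the overlap length means a short partial match at the end of a read must be exact, while a full-length match may contain a couple of sequencing errors.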
  4. Result – 1st Run

    •  Adapter sequence (length = 69)
    TGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

    Maximum error rate: 10.00%
    No. of adapters: 1
    Processed reads: 7968372
    Processed bases: 804805572 bp (804.8 Mbp)
    Trimmed reads: 7764654 (97.4%)
    Quality-trimmed: 59716594 bp (508.2 Mbp) (7.42% of total)
    Trimmed bases: 508188865 bp (508.2 Mbp) (63.14% of total)
    Too short reads: 51386 (0.6% of processed reads)
    Too long reads: 0 (0.0% of processed reads)
    Total time: 555.33 s
    Time per read: 0.07 ms
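The percentages in this report can be re-derived from its raw counts, which is a quick sanity check when comparing runs. The sketch below only uses numbers copied from the report; the variable and function names are hypothetical.

```python
# Raw counts copied from the 1st-run report.
processed_reads = 7_968_372
processed_bases = 804_805_572
trimmed_reads = 7_764_654
quality_trimmed_bases = 59_716_594
trimmed_bases = 508_188_865
too_short_reads = 51_386
total_time_s = 555.33


def percent(part: float, whole: float) -> float:
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


mean_read_length = processed_bases / processed_reads
trimmed_read_pct = percent(trimmed_reads, processed_reads)
quality_trimmed_pct = percent(quality_trimmed_bases, processed_bases)
trimmed_base_pct = percent(trimmed_bases, processed_bases)
too_short_pct = percent(too_short_reads, processed_reads)
time_per_read_ms = 1000.0 * total_time_s / processed_reads

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.1f} bp")
    print(f"trimmed reads:    {trimmed_read_pct:.1f}%")
    print(f"quality-trimmed:  {quality_trimmed_pct:.2f}%")
    print(f"trimmed bases:    {trimmed_base_pct:.2f}%")
    print(f"too short reads:  {too_short_pct:.1f}%")
    print(f"time per read:    {time_per_read_ms:.2f} ms")
```

The mean read length works out to 101 bases, so removing about 63% of all bases corresponds to roughly 64 bases trimmed per read on average.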
  5. Length trimmed per read

    [Figure: number of reads by length trimmed per read; axis ticks 0 to 100 and 0 to 2,000,000; series: Expected (Rand), Trimmed]
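One common way to obtain an "expected by chance" curve like the one in this figure is to assume that bases are uniform and independent, so an adapter prefix of length k matches a given position with probability 0.25^k. Whether the slide's Expected (Rand) series was computed this way is an assumption; the function name is hypothetical, and the read count comes from the 1st-run report.

```python
def expected_random_matches(n_reads: int, match_len: int) -> float:
    """Expected number of reads whose last `match_len` bases equal the
    adapter prefix by chance, assuming uniform, independent bases."""
    return n_reads * 0.25 ** match_len


if __name__ == "__main__":
    n_reads = 7_968_372  # processed reads in the 1st run
    for k in (1, 2, 3, 5, 10):
        print(k, round(expected_random_matches(n_reads, k), 1))
```

Under this model the chance expectation drops by a factor of four per extra base, so any sizable count at long trimmed lengths reflects real adapter sequence rather than random matches.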
  6. Result – 2nd Run

    Maximum error rate: 10.00%
    No. of adapters: 1
    Processed reads: 7713268
    Processed bases: 229515361 bp (229.5 Mbp)
    Trimmed reads: 3827577 (49.6%)
    Quality-trimmed: 180344 bp (117.8 Mbp) (0.08% of total)
    Trimmed bases: 117799429 bp (117.8 Mbp) (51.33% of total)
    Too short reads: 3827607 (49.6% of processed reads)
    Too long reads: 0 (0.0% of processed reads)
    Total time: 295.97 s
    Time per read: 0.04 ms

    === Adapter 1 ===

    Adapter 'unknown_hifreq' (GCATTGGTGGTTCAGTGGTAGAATTCTCGCC), length 31, was trimmed 3827577 times.

    Lengths of removed sequences
    length    count      expected    max. errors
    28        6792       0.0         2
    29        66555      0.0         2
    30        703674     0.0         3
    31        3049065    0.0         3
    32        1333       0.0         3
    33        105        0.0         3
    34        53         0.0         3
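The length table can be cross-checked against the rest of the report. The sketch below confirms that the per-length counts add up to the number of trimmed reads, and reproduces the "max. errors" column under the assumption that the allowance is the error rate times the matched length, rounded down and capped at the adapter length. That rule is inferred from the values in the table, not quoted from Cutadapt's documentation; names are hypothetical.

```python
ADAPTER_LEN = 31
MAX_ERROR_RATE = 0.10
TRIMMED_READS = 3_827_577

# (removed length, count, max. errors) copied from the 2nd-run report.
rows = [
    (28, 6_792, 2),
    (29, 66_555, 2),
    (30, 703_674, 3),
    (31, 3_049_065, 3),
    (32, 1_333, 3),
    (33, 105, 3),
    (34, 53, 3),
]


def allowed_errors(removed_len: int) -> int:
    """Assumed rule: floor(rate * matched length), capped at adapter length."""
    return int(MAX_ERROR_RATE * min(removed_len, ADAPTER_LEN))


total_count = sum(count for _, count, _ in rows)
full_length_share = 100.0 * rows[3][1] / TRIMMED_READS  # rows[3] is length 31

if __name__ == "__main__":
    print("counts sum to trimmed reads:", total_count == TRIMMED_READS)
    print("max. errors column reproduced:",
          all(allowed_errors(n) == errs for n, _, errs in rows))
    print(f"full-length (31 bp) matches: {full_length_share:.1f}% of trimmed reads")
```

Nearly all removals are 30 or 31 bases long, which is consistent with 'unknown_hifreq' being a complete, highly abundant sequence rather than a partial adapter match at read ends.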