Unlocking Data Trapped in Audio and Video Files (PyTexas 2014)

As more and more apps record audio and video files, we need to start thinking about what to do with those files. Playing them back isn't enough. Media files are full of data that developers can start exploiting, thanks to an emerging category of signal-processing and natural-language-processing APIs.

Let's look at some of the APIs that let us extract this data, including words, emotion, and identity.

Paul Murphy

October 04, 2014

Transcript

  1. [Chart: exabytes (0 to 4,500) plotted against time: 40,000 BC (cave painting), 1440 (invention of the printing press), 1969 (invention of the Internet), 2000, 2005, 2010, 2013.]
  2. Any good?
    • Some are pretty standard REST.
    • Some are clearly an afterthought.
    • Some include timestamps.
    • Some include confidence scores.
    • Some include more information, comprehensible only to speech experts.
  3. Now we’re back in the game!
    • upper(), lower()
    • split(), join()
    • index()
    • etc., etc.
  4. Audio is
    [Slide: the sentence "The quick brown fox jumps over the lazy dog." written in Russian, Arabic, Hebrew, Japanese, and English, alongside "???".]
  5. One step at a time
    • A timeline is just a sequence
      • …(timestamp, word) 2-tuples
      • …(timestamp, word, probability) 3-tuples
  6. Identity
    • Agnitio, VoiceVault, OneVault, Nuance
    • White papers, case studies, benefits, video
    • No API (I think)
  7. Quick recap
    • Words – Lots of options!
    • Emotions – A bit tougher
    • Identity – Lots of nothing
  8. Also, remember this?
    [Slide: "The quick brown fox jumps over the lazy dog." again, written in Russian, Arabic, Hebrew, and English.]
  9. Under development
    • Word recognition
    • Object matching
    • Object recognition
    • Emotions
    • ID
    • Events
  10. A few things to notice
    • A lot of this data is complementary
      • Words
      • Emotions
      • ID
    • We can use different signals to find the same data
    • There is no standard way of accessing or representing it
  11. Is there a solution?
    • Lots of partial solutions, and a lot more coming.
    • Classification is critical.
    • Some sort of coherent way of describing this data will emerge.
    • My company is working to speed things up.
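Slide 2 ("Any good?") observes that speech APIs differ in what they return: some include timestamps, some include confidence scores. Below is a minimal Python sketch of coping with that variation. It assumes two made-up JSON response shapes; the field names `words`, `start`, `confidence`, and `transcript` are hypothetical and do not come from any real vendor's API.

```python
import json
from typing import Optional

# Two made-up response bodies. The field names are HYPOTHETICAL and do not
# belong to any real speech API.
RESPONSE_WITH_DETAILS = """
{"words": [
  {"word": "the", "start": 0.00, "confidence": 0.98},
  {"word": "quick", "start": 0.21, "confidence": 0.91},
  {"word": "brown", "start": 0.55, "confidence": 0.87}
]}
"""
RESPONSE_TEXT_ONLY = '{"transcript": "the quick brown"}'


def to_timeline(body: str) -> list[tuple[Optional[float], str, Optional[float]]]:
    """Normalize either response shape into (timestamp, word, confidence) tuples.

    Missing timestamps or confidence scores become None rather than being guessed.
    """
    data = json.loads(body)
    if "words" in data:
        # Detailed shape: one object per word, optional timing and confidence.
        return [(w.get("start"), w["word"], w.get("confidence")) for w in data["words"]]
    # Text-only shape: no timing or confidence information at all.
    return [(None, word, None) for word in data.get("transcript", "").split()]


if __name__ == "__main__":
    print(to_timeline(RESPONSE_WITH_DETAILS))
    print(to_timeline(RESPONSE_TEXT_ONLY))
```

Using None for missing fields, instead of a default such as 0.0, keeps it visible which service supplied which information.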
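Slide 3 ("Now we’re back in the game!") makes the point that once speech has been turned into text, ordinary Python string methods work again. A small sketch using the methods the slide lists, applied to a hypothetical transcript string (no particular speech service is assumed):

```python
# A transcript as it might come back from some speech-to-text service
# (hypothetical text for illustration).
transcript = "The quick brown fox jumps over the lazy dog"

words = transcript.lower().split()       # lower() + split(): case-folded tokens
first_fox = words.index("fox")           # index(): first position of a word (ValueError if absent)
headline = " ".join(words[:4]).upper()   # join() + upper(): rebuild and restyle a phrase

print(words)
print(first_fox)  # 3
print(headline)   # THE QUICK BROWN FOX
```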
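Slide 5 ("One step at a time") describes a timeline as a sequence of (timestamp, word) 2-tuples or (timestamp, word, probability) 3-tuples. A sketch of that representation, using a `NamedTuple` so fields can be read by name while each entry is still a plain 3-tuple. The recognizer output and the 0.8 threshold are invented for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    timestamp: float    # seconds from the start of the recording
    word: str
    probability: float  # recognizer's confidence, 0.0 to 1.0


# Hypothetical recognizer output, for illustration only.
timeline = [
    Token(0.00, "the", 0.98),
    Token(0.21, "quick", 0.91),
    Token(0.55, "brown", 0.62),
    Token(0.90, "fox", 0.95),
]


def when_said(timeline: list[Token], word: str) -> list[float]:
    """Return every timestamp at which `word` was recognized."""
    return [t.timestamp for t in timeline if t.word == word]


def confident_words(timeline: list[Token], threshold: float = 0.8) -> list[tuple[float, str]]:
    """Drop low-probability tokens, reducing 3-tuples to (timestamp, word) 2-tuples."""
    return [(t.timestamp, t.word) for t in timeline if t.probability >= threshold]


print(when_said(timeline, "fox"))  # [0.9]
print(confident_words(timeline))   # [(0.0, 'the'), (0.21, 'quick'), (0.9, 'fox')]
```

Because each `Token` is a tuple, code that expects bare (timestamp, word, probability) triples can consume the same list unchanged.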
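Slide 10 ("A few things to notice") points out that words, emotions, and ID are complementary, yet each comes back in its own representation. Below is a minimal sketch of combining two such signals. It assumes hypothetical outputs from two separate services: a word timeline of (timestamp, word) pairs and emotion segments of (start_time, label) pairs, where a label holds until the next segment begins. Aligning them by timestamp with `bisect` is one possible choice, not a method from the talk.

```python
from bisect import bisect_right

# HYPOTHETICAL outputs from two different services for the same recording.
words = [(0.2, "thanks"), (0.5, "for"), (0.7, "calling"),
         (2.1, "this"), (2.3, "is"), (2.5, "unacceptable")]
emotions = [(0.0, "neutral"), (2.0, "angry")]  # sorted by start time


def label_words(words, emotions):
    """Attach the emotion segment active at each word's timestamp.

    Returns (timestamp, word, label) triples; label is None when the word
    precedes the first emotion segment.
    """
    starts = [start for start, _ in emotions]  # must be sorted ascending
    labeled = []
    for timestamp, word in words:
        # Index of the last segment that starts at or before this word.
        i = bisect_right(starts, timestamp) - 1
        labeled.append((timestamp, word, emotions[i][1] if i >= 0 else None))
    return labeled


for row in label_words(words, emotions):
    print(row)
```

The lookup is a binary search over segment start times, so each word costs O(log n) in the number of emotion segments.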